OpenMS
OpenSwathOSWParquetReader Class Reference

Reader for OpenSwath OSW Parquet output.

#include <OpenMS/ANALYSIS/OPENSWATH/OpenSwathOSWParquetReader.h>


Classes

struct  PeakGroupFeatureScoresResult
 Result container for fetchPeakGroupFeatures().
 
struct  Row
 Single extracted row combining feature + precursor + run metadata.
 
struct  TransitionFeaturesResult
 Result container for transition-level features.
 
struct  UnscoredResult
 Result container for an unscored table.
 

Public Member Functions

 OpenSwathOSWParquetReader ()=default
 Default constructor.
 
 OpenSwathOSWParquetReader (const String &oswpq_dir)
 Convenience constructor that loads from the given oswpq path.
 
void load (const String &oswpq_dir)
 Load and extract rows from an OSW Parquet directory or .oswpq archive.
 
const String & oswpqPath () const
 Return the originally provided oswpq path (may be empty).
 
const std::vector< Row > & rows () const
 Return extracted rows.
 
PeakGroupFeatureScoresResult fetchPeakGroupFeatures (const String &oswpq_dir, const String &level="ms2", const String &main_score="") const
 Extract MS2-level feature rows across all runs.
 
TransitionFeaturesResult fetchTransitionFeatures (const String &oswpq_dir) const
 Extract transition-level feature rows across all runs (structure-of-arrays layout).
 
UnscoredResult fetchUnscoredData (const String &oswpq_dir) const
 Read an "unscored" table and return a column-oriented result.
 

Private Attributes

std::vector< Row > rows_
 
String oswpq_dir_
 

Detailed Description

Reader for OpenSwath OSW Parquet output.

This class reads the Parquet output layout produced by OpenSwathOSWParquetWriter (library/precursors.parquet, runs/runs.parquet and per-run features.parquet) and exposes a flat table of feature rows that combine feature-level scores with precursor/run metadata. This output is meant for downstream scoring workflows (PyProphet) or for simple single table exports.
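The "flat table" idea can be pictured as a left join of feature rows onto precursor metadata. The sketch below is a hypothetical, self-contained illustration: `PrecursorMeta`, `FeatureRecord`, `FlatRow`, and `flattenFeatures` are not OpenMS types, the sketch does not read Parquet, and it assumes features reference the library through `precursor_id` (a field that Row also carries). Run metadata would be joined on `run_id` in the same way.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for library/precursors.parquet rows and
// per-run features.parquet rows (not OpenMS types).
struct PrecursorMeta
{
  int charge = 0;
  bool decoy = false;
};

struct FeatureRecord
{
  std::int64_t feature_id = 0;
  std::int64_t run_id = 0;
  std::int64_t precursor_id = 0;
  double exp_rt = 0.0;
};

// One flat output row: feature columns plus the precursor metadata joined onto it.
struct FlatRow
{
  FeatureRecord feature;
  PrecursorMeta precursor;
};

// Left join: every feature yields exactly one row; a feature whose precursor_id
// is not in the library keeps the default-initialised PrecursorMeta.
std::vector<FlatRow> flattenFeatures(
    const std::vector<FeatureRecord>& features,
    const std::unordered_map<std::int64_t, PrecursorMeta>& precursors)
{
  std::vector<FlatRow> rows;
  rows.reserve(features.size());
  for (const FeatureRecord& f : features)
  {
    FlatRow row{f, PrecursorMeta{}};
    auto it = precursors.find(f.precursor_id);
    if (it != precursors.end())
    {
      row.precursor = it->second;
    }
    rows.push_back(row);
  }
  return rows;
}
```

A left join (rather than an inner join) keeps the row count equal to the number of features, which matches the idea of one output row per feature.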


Class Documentation

◆ OpenMS::OpenSwathOSWParquetReader::PeakGroupFeatureScoresResult

struct OpenMS::OpenSwathOSWParquetReader::PeakGroupFeatureScoresResult

Result container for fetchPeakGroupFeatures(). Contains the discovered MS2 and optional MS1 score columns alongside the core feature columns.

Class Members
vector< bool > decoy
vector< double > exp_rt
vector< int64_t > feature_id
vector< String > group_id
vector< String > ms1_columns
vector< vector< double > > ms1_values
vector< String > ms2_columns
vector< vector< double > > ms2_values
vector< int > precursor_charge
vector< int64_t > precursor_id
vector< int64_t > run_id
vector< int64_t > transition_count
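Score columns are stored as parallel name and value vectors. The sketch below shows one way to look up a score by column name. It assumes that `ms2_values[c]` holds the column named `ms2_columns[c]` with one entry per feature row, by analogy with the documented layout of `transition_var_columns`/`transition_var_values`; that layout is not stated explicitly for this struct. `ScoreColumns`, `findColumn`, and `scoreAt` are hypothetical names, and the column names in any usage are placeholders.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the score part of PeakGroupFeatureScoresResult.
// ASSUMPTION: values[c] is the column named columns[c], holding one entry per
// feature row (column-major, as documented for transition_var_values).
struct ScoreColumns
{
  std::vector<std::string> columns;
  std::vector<std::vector<double>> values;
};

// Pointer to the named score column, or nullptr if it was not discovered.
const std::vector<double>* findColumn(const ScoreColumns& s, const std::string& name)
{
  for (std::size_t c = 0; c < s.columns.size() && c < s.values.size(); ++c)
  {
    if (s.columns[c] == name)
    {
      return &s.values[c];
    }
  }
  return nullptr;
}

// Score of feature row `row` in column `name`, if both exist.
std::optional<double> scoreAt(const ScoreColumns& s, const std::string& name, std::size_t row)
{
  const std::vector<double>* col = findColumn(s, name);
  if (col == nullptr || row >= col->size())
  {
    return std::nullopt;
  }
  return (*col)[row];
}
```

Returning `std::optional` keeps "column not discovered" and "row out of range" distinct from a legitimate score value such as 0.0.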

◆ OpenMS::OpenSwathOSWParquetReader::Row

struct OpenMS::OpenSwathOSWParquetReader::Row

Single extracted row combining feature + precursor + run metadata.

Class Members
bool decoy = false
double exp_rt = 0.0
int64_t feature_id = 0
String group_id
double ms2_apex_intensity = 0.0
double ms2_area_intensity = 0.0
double ms2_total_area_intensity = 0.0
int precursor_charge = 0
int64_t precursor_id = 0
int64_t run_id = 0
int64_t transition_count = 0
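Because each Row carries `run_id` and `decoy`, simple per-run summaries are a single pass over `rows()`. The sketch below counts targets and decoys per run. It uses a hypothetical `RowLite` mirror of two Row fields so that it compiles without OpenMS; with the real class, the loop would iterate over the vector returned by `rows()` after `load()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical mirror of two Row fields; the real Row also carries
// feature_id, precursor_id, exp_rt, MS2 intensities, etc.
struct RowLite
{
  std::int64_t run_id = 0;
  bool decoy = false;
};

// Per-run (targets, decoys) counts, keyed and ordered by run_id.
std::map<std::int64_t, std::pair<std::size_t, std::size_t>>
countTargetsDecoys(const std::vector<RowLite>& rows)
{
  std::map<std::int64_t, std::pair<std::size_t, std::size_t>> counts;
  for (const RowLite& r : rows)
  {
    auto& c = counts[r.run_id];  // value-initialised to (0, 0) on first access
    if (r.decoy)
    {
      ++c.second;
    }
    else
    {
      ++c.first;
    }
  }
  return counts;
}
```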

◆ OpenMS::OpenSwathOSWParquetReader::TransitionFeaturesResult

struct OpenMS::OpenSwathOSWParquetReader::TransitionFeaturesResult

Result container for transition-level features.

Class Members
vector< double > apex_intensity
vector< double > apex_rt
vector< double > area_intensity
vector< bool > decoy
vector< double > exp_rt
vector< int64_t > feature_id
vector< String > group_id
vector< double > masserror_ppm
vector< int > precursor_charge
vector< int64_t > precursor_id
vector< int > product_charge
vector< double > rt_fwhm
vector< int64_t > run_id
vector< double > total_area_intensity
vector< double > total_mi
vector< int64_t > transition_id
vector< String > transition_var_columns
vector< vector< double > > transition_var_values

◆ OpenMS::OpenSwathOSWParquetReader::UnscoredResult

struct OpenMS::OpenSwathOSWParquetReader::UnscoredResult

Result container for an unscored table.

An "unscored" table contains feature-level columns but does not include discriminant scores or FDR estimates. It provides feature, precursor, and run columns, MS1/MS2 metrics, and the discovered feature score columns.

Class Members
vector< double > aggr_prec_Peak_Apex
vector< double > aggr_prec_Peak_Area
vector< double > assay_rt
vector< double > assay_RT
vector< int > Charge
vector< bool > decoy
vector< double > delta_rt
vector< double > delta_RT
vector< double > EXP_IM
vector< String > filename
vector< int64_t > id
vector< int64_t > id_peptide
vector< int64_t > id_run
vector< double > IM_leftWidth
vector< double > IM_rightWidth
vector< double > Intensity
vector< double > leftWidth
vector< String > ms1_columns
vector< vector< double > > ms1_values
vector< String > ms2_columns
vector< vector< double > > ms2_values
vector< double > mz
vector< double > rightWidth
vector< double > RT
vector< int64_t > run_id
vector< int64_t > transition_group_id

Constructor & Destructor Documentation

◆ OpenSwathOSWParquetReader() [1/2]

Default constructor.

◆ OpenSwathOSWParquetReader() [2/2]

OpenSwathOSWParquetReader ( const String & oswpq_dir )

Convenience constructor that loads from the given oswpq path.

This constructor calls load(oswpq_dir), so the object is ready to use after construction. It is provided for Python ergonomics, so callers can create an instance with a single argument, as with other Parquet helper classes.

Parameters
[in] oswpq_dir: Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read.

Member Function Documentation

◆ fetchPeakGroupFeatures()

PeakGroupFeatureScoresResult fetchPeakGroupFeatures ( const String & oswpq_dir,
const String & level = "ms2",
const String & main_score = ""
) const

Extract MS2-level feature rows across all runs.

This method reads the per-run features.parquet files and returns a PeakGroupFeatureScoresResult containing feature identifiers, retention times, precursor metadata, and the discovered MS2 (and optionally MS1) score columns. The result is sorted by run_id, precursor_id, and exp_rt to match the ordering used by the SQLite-based extractor.
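In a structure-of-arrays result, sorting by (run_id, precursor_id, exp_rt) means computing one permutation and applying it to every column. The sketch below illustrates that ordering with `std::stable_sort` and `std::tie`. It is a hypothetical illustration of the documented sort key, not the OpenMS implementation; `sortOrder` and `applyOrder` are made-up names, and the comparison assumes `exp_rt` contains no NaN values (NaN would break the strict weak ordering that sorting requires).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <tuple>
#include <vector>

// Permutation that orders SoA rows lexicographically by (run_id, precursor_id, exp_rt).
// The columns themselves are not modified.
std::vector<std::size_t> sortOrder(const std::vector<std::int64_t>& run_id,
                                   const std::vector<std::int64_t>& precursor_id,
                                   const std::vector<double>& exp_rt)
{
  assert(run_id.size() == precursor_id.size() && run_id.size() == exp_rt.size());
  std::vector<std::size_t> idx(run_id.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
    return std::tie(run_id[a], precursor_id[a], exp_rt[a]) <
           std::tie(run_id[b], precursor_id[b], exp_rt[b]);
  });
  return idx;
}

// Reorder one column according to a permutation produced by sortOrder().
template <typename T>
std::vector<T> applyOrder(const std::vector<T>& col, const std::vector<std::size_t>& idx)
{
  std::vector<T> out;
  out.reserve(idx.size());
  for (std::size_t i : idx)
  {
    out.push_back(col[i]);
  }
  return out;
}
```

Sorting an index vector once and reusing it keeps all columns aligned, which is the main invariant a SoA result has to preserve.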

Parameters
[in] oswpq_dir: Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read.
[in] level: "ms2" (default), or "ms1ms2" to also include MS1 scores.
[in] main_score: Optional main score name to be used downstream. If provided and present among the discovered MS2 columns, the specified column is placed first in ms2_columns/ms2_values to ease downstream usage where a primary score is expected.
Returns
PeakGroupFeatureScoresResult populated with discovered columns and core feature fields.
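The main_score behavior amounts to moving one name/value pair to the front while keeping names and values aligned. The sketch below does this with `std::rotate`. The reference only says the column is "placed first"; keeping the remaining columns in their original order is an assumption of this sketch, and `moveMainScoreFirst` is a hypothetical helper, not part of the OpenMS API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Move the column named main_score to index 0, keeping columns[i] and values[i]
// aligned. ASSUMPTION: the other columns keep their relative order.
void moveMainScoreFirst(std::vector<std::string>& columns,
                        std::vector<std::vector<double>>& values,
                        const std::string& main_score)
{
  assert(columns.size() == values.size());
  if (main_score.empty())
  {
    return;  // no main score requested: leave discovery order unchanged
  }
  const auto it = std::find(columns.begin(), columns.end(), main_score);
  if (it == columns.end())
  {
    return;  // requested score was not discovered
  }
  const auto pos = it - columns.begin();
  // std::rotate(first, middle, last) makes *middle the new first element and
  // shifts [first, middle) one step to the right.
  std::rotate(columns.begin(), columns.begin() + pos, columns.begin() + pos + 1);
  std::rotate(values.begin(), values.begin() + pos, values.begin() + pos + 1);
}
```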

◆ fetchTransitionFeatures()

TransitionFeaturesResult fetchTransitionFeatures ( const String & oswpq_dir ) const

Extract transition-level feature rows across all runs (structure-of-arrays layout).

Reads per-run feature_transition.parquet files and joins them with the per-run features.parquet and library tables to produce a column-oriented (structure-of-arrays) result. The returned TransitionFeaturesResult contains per-transition columns including:

  • feature identifiers and run/precursor metadata: feature_id, run_id, precursor_id, exp_rt, precursor_charge
  • transition identifiers and properties: transition_id, product_charge, transition-level decoy flag
  • basic transition peak metrics: area_intensity, total_area_intensity, apex_intensity, apex_rt, rt_fwhm, masserror_ppm, total_mi
  • discovered transition-level score columns (e.g. var_ms2_*) are collected into transition_var_columns with their per-row values in transition_var_values (one vector per discovered column)
  • group_id strings in the form run_feature_precursor_transition to allow easy grouping or joins on the Python side.

The ordering of rows is deterministic (sorted by run, precursor, feature, transition) to match other extractors.
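The group_id strings described above make it easy to regroup transitions by feature. The sketch below builds a key of the documented form run_feature_precursor_transition and derives a per-feature key by dropping the trailing transition id. It assumes plain decimal ids joined by underscores; the exact formatting OpenMS uses is not specified in this reference, and both functions are hypothetical helpers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical re-creation of a group_id of the form
// run_feature_precursor_transition, ASSUMING plain decimal ids joined by '_'.
std::string makeTransitionGroupId(std::int64_t run_id, std::int64_t feature_id,
                                  std::int64_t precursor_id, std::int64_t transition_id)
{
  return std::to_string(run_id) + "_" + std::to_string(feature_id) + "_" +
         std::to_string(precursor_id) + "_" + std::to_string(transition_id);
}

// Key shared by all transitions of one feature: strip the trailing "_<transition>".
std::string featureKey(const std::string& group_id)
{
  const std::string::size_type pos = group_id.rfind('_');
  return pos == std::string::npos ? group_id : group_id.substr(0, pos);
}
```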

Parameters
[in] oswpq_dir: Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read.
Returns
TransitionFeaturesResult populated with transition-level columns.

◆ fetchUnscoredData()

UnscoredResult fetchUnscoredData ( const String & oswpq_dir ) const

Read an "unscored" table and return a column-oriented result.

Reads per-run features.parquet (and supporting tables) and returns a rich set of per-feature columns in a Structure-of-Arrays layout convenient for conversion to a pandas.DataFrame. The returned UnscoredResult contains the following columns (vectors of length N):

  • feature/run identifiers: id_run, id_peptide (optional, 0 if unknown), transition_group_id (precursor id), decoy, run_id, filename
  • retention time fields: RT (feature EXP_RT), assay_rt (EXP_RT - DELTA_RT), delta_rt (FEATURE.DELTA_RT), assay_RT (precursor/library RT), delta_RT (norm_rt - library RT)
  • feature identifiers and precursor metadata: id (FEATURE.ID), Charge, mz
  • MS2/MS1 intensity metrics: Intensity (FEATURE_MS2.AREA_INTENSITY), aggr_prec_Peak_Area, aggr_prec_Peak_Apex
  • peak boundary widths: left_width, right_width (returned as leftWidth/rightWidth in the struct; NaN if absent)
  • optional ion-mobility fields: canonical Parquet names exp_im, exp_im_leftwidth, exp_im_rightwidth (returned as EXP_IM/IM_leftWidth/IM_rightWidth in the struct)
  • discovered score columns: ms2_columns / ms2_values and ms1_columns / ms1_values (discovered across runs, e.g. var_ms2_*)
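Two of the retention time columns listed above are simple element-wise differences: assay_rt = EXP_RT - DELTA_RT and delta_RT = norm_rt - library RT. The sketch below writes those formulas out over column vectors. The function names are hypothetical, and `norm_rt` is taken at face value from the list above; this reference does not define it further.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise a[i] - b[i] over two equal-length columns. NaN inputs
// (missing optional values) propagate to NaN outputs.
std::vector<double> elementwiseDiff(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size() && "columns must have equal length");
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    out[i] = a[i] - b[i];
  }
  return out;
}

// assay_rt = EXP_RT - DELTA_RT, as listed for UnscoredResult.
std::vector<double> assayRt(const std::vector<double>& exp_rt, const std::vector<double>& delta_rt)
{
  return elementwiseDiff(exp_rt, delta_rt);
}

// delta_RT = norm_rt - library RT, as listed for UnscoredResult.
std::vector<double> deltaRtFromLibrary(const std::vector<double>& norm_rt,
                                       const std::vector<double>& library_rt)
{
  return elementwiseDiff(norm_rt, library_rt);
}
```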

Optional columns that are not present in a given Parquet file will be populated with default values (NaN for floating fields, empty strings for filenames, 0 for integer ids) so all returned vectors have equal length and are safe to convert into tabular formats.
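The default-filling rule can be expressed as one small template: pick a per-type fill value and either pass a present column through or synthesize a column of that value. This is a hypothetical sketch of the documented rule (NaN for floating-point fields, empty strings, 0 for integer ids), not the reader's implementation; `missingValue` and `columnOrDefault` are made-up names, and the sketch requires C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <type_traits>
#include <vector>

// Default for a missing optional column, following the documented rule:
// NaN for floating-point fields, empty string for filenames, 0 for integer ids.
template <typename T>
T missingValue()
{
  if constexpr (std::is_floating_point_v<T>)
  {
    return std::numeric_limits<T>::quiet_NaN();
  }
  else
  {
    return T{};  // "" for std::string, 0 for integral ids
  }
}

// Return the column if the Parquet file provided it (non-null), otherwise
// n default values, so every output column has exactly n entries.
template <typename T>
std::vector<T> columnOrDefault(const std::vector<T>* present, std::size_t n)
{
  if (present != nullptr)
  {
    assert(present->size() == n && "present column must match row count");
    return *present;
  }
  return std::vector<T>(n, missingValue<T>());
}
```

When converting to a pandas.DataFrame on the Python side, NaN in a float column is read as a missing value, which is why it is a natural fill for absent floating-point fields.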

Parameters
[in] oswpq_dir: Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read.
Returns
UnscoredResult containing the assembled column vectors.

◆ load()

void load ( const String & oswpq_dir )

Load and extract rows from an OSW Parquet directory or .oswpq archive.

Parameters
[in] oswpq_dir: Path to the unzipped directory or .oswpq archive.

◆ oswpqPath()

const String & oswpqPath ( ) const
inline

Return the originally provided oswpq path (may be empty)

◆ rows()

const std::vector< Row > & rows ( ) const
inline

Return extracted rows.

Member Data Documentation

◆ oswpq_dir_

String oswpq_dir_
private

◆ rows_

std::vector<Row> rows_
private