OpenMS
Reader for OpenSwath OSW Parquet output.
#include <OpenMS/ANALYSIS/OPENSWATH/OpenSwathOSWParquetReader.h>
Classes | |
| struct | PeakGroupFeatureScoresResult |
| struct | Row |
| Single extracted row combining feature + precursor + run metadata. | |
| struct | TransitionFeaturesResult |
| Result container for transition-level features. | |
| struct | UnscoredResult |
| Result container for an unscored table. | |
Public Member Functions | |
| OpenSwathOSWParquetReader ()=default | |
| Default constructor. | |
| OpenSwathOSWParquetReader (const String &oswpq_dir) | |
| Convenience constructor that loads from the given oswpq path. | |
| void | load (const String &oswpq_dir) |
| Load and extract rows from an OSW Parquet directory or .oswpq archive. | |
| const String & | oswpqPath () const |
| Return the originally provided oswpq path (may be empty) | |
| const std::vector< Row > & | rows () const |
| Return extracted rows. | |
| PeakGroupFeatureScoresResult | fetchPeakGroupFeatures (const String &oswpq_dir, const String &level="ms2", const String &main_score="") const |
| Extract MS2-level feature rows across all runs. | |
| TransitionFeaturesResult | fetchTransitionFeatures (const String &oswpq_dir) const |
| Extract transition-level feature rows across all runs (SOA) | |
| UnscoredResult | fetchUnscoredData (const String &oswpq_dir) const |
| Read an "unscored" table and return a column-oriented result. | |
Private Attributes | |
| std::vector< Row > | rows_ |
| String | oswpq_dir_ |
Reader for OpenSwath OSW Parquet output.
This class reads the Parquet output layout produced by OpenSwathOSWParquetWriter (library/precursors.parquet, runs/runs.parquet and per-run features.parquet) and exposes a flat table of feature rows that combine feature-level scores with precursor and run metadata. The output is meant for downstream scoring workflows (PyProphet) or for simple single-table exports.
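The flat-table idea can be sketched without any Parquet I/O: feature records are matched against precursor metadata by id and copied into one row each. The types `FeatureRec`, `PrecursorRec` and `FlatRow` and the function `joinFeatures` are hypothetical stand-ins rather than OpenMS types; only the field names follow the documented Row struct. Keeping features whose precursor is unknown (with default metadata) is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for records read from the library and feature tables.
struct PrecursorRec { std::int64_t precursor_id; int charge; bool decoy; };
struct FeatureRec { std::int64_t feature_id; std::int64_t run_id; std::int64_t precursor_id; double exp_rt; };

// Flat row mirroring a subset of the documented Row fields.
struct FlatRow
{
  std::int64_t feature_id = 0;
  std::int64_t run_id = 0;
  std::int64_t precursor_id = 0;
  int precursor_charge = 0;
  bool decoy = false;
  double exp_rt = 0.0;
};

// Join each feature with its precursor metadata (assumed behaviour: unmatched
// features are kept and retain the default charge and decoy flag).
std::vector<FlatRow> joinFeatures(const std::vector<FeatureRec>& features,
                                  const std::vector<PrecursorRec>& precursors)
{
  std::unordered_map<std::int64_t, const PrecursorRec*> by_id;
  for (const PrecursorRec& p : precursors) by_id[p.precursor_id] = &p;

  std::vector<FlatRow> out;
  out.reserve(features.size());
  for (const FeatureRec& f : features)
  {
    FlatRow r;
    r.feature_id = f.feature_id;
    r.run_id = f.run_id;
    r.precursor_id = f.precursor_id;
    r.exp_rt = f.exp_rt;
    auto it = by_id.find(f.precursor_id);
    if (it != by_id.end())
    {
      r.precursor_charge = it->second->charge;
      r.decoy = it->second->decoy;
    }
    out.push_back(r);
  }
  assert(out.size() == features.size());
  return out;
}
```

Calling `joinFeatures(features, precursors)` yields one flat row per feature, which is the shape the reader exposes through rows().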
| struct OpenMS::OpenSwathOSWParquetReader::PeakGroupFeatureScoresResult |
Result container for fetchPeakGroupFeatures. Contains discovered MS2 and optional MS1 score columns alongside core feature columns.
| Class Members | ||
|---|---|---|
| vector< bool > | decoy | |
| vector< double > | exp_rt | |
| vector< int64_t > | feature_id | |
| vector< String > | group_id | |
| vector< String > | ms1_columns | |
| vector< vector< double > > | ms1_values | |
| vector< String > | ms2_columns | |
| vector< vector< double > > | ms2_values | |
| vector< int > | precursor_charge | |
| vector< int64_t > | precursor_id | |
| vector< int64_t > | run_id | |
| vector< int64_t > | transition_count | |
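The paired ms2_columns / ms2_values members suggest a lookup by column name. The sketch below assumes that ms2_values[i] holds the per-row values of the column named ms2_columns[i] (one vector per column); that layout is stated for transition_var_values further down but is an assumption here. `ScoresSketch` and `findScoreColumn` are hypothetical names, not part of OpenMS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the score-related members of the result struct.
struct ScoresSketch
{
  std::vector<std::string> ms2_columns;
  std::vector<std::vector<double>> ms2_values;  // assumed: one vector per column
};

// Return a pointer to the values of the named column, or nullptr if absent.
const std::vector<double>* findScoreColumn(const ScoresSketch& s, const std::string& name)
{
  // Names and value vectors must stay aligned for the lookup to be meaningful.
  assert(s.ms2_columns.size() == s.ms2_values.size());
  for (std::size_t i = 0; i < s.ms2_columns.size(); ++i)
  {
    if (s.ms2_columns[i] == name) return &s.ms2_values[i];
  }
  return nullptr;
}
```

Returning a pointer rather than a copy avoids duplicating a potentially long column; callers should check for nullptr before dereferencing.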
| struct OpenMS::OpenSwathOSWParquetReader::Row |
Single extracted row combining feature + precursor + run metadata.
| Class Members | ||
|---|---|---|
| bool | decoy = false | |
| double | exp_rt = 0.0 | |
| int64_t | feature_id = 0 | |
| String | group_id | |
| double | ms2_apex_intensity = 0.0 | |
| double | ms2_area_intensity = 0.0 | |
| double | ms2_total_area_intensity = 0.0 | |
| int | precursor_charge = 0 | |
| int64_t | precursor_id = 0 | |
| int64_t | run_id = 0 | |
| int64_t | transition_count = 0 | |
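Row is an array-of-structures record, whereas the result containers on this page are column-oriented. A minimal sketch of converting between the two layouts follows; `RowSketch`, `ColumnsSketch` and `toColumns` are hypothetical and cover only a subset of the documented fields.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a few Row fields (array-of-structures layout).
struct RowSketch
{
  bool decoy = false;
  double exp_rt = 0.0;
  std::int64_t feature_id = 0;
  std::int64_t run_id = 0;
};

// Column-oriented (structure-of-arrays) counterpart of RowSketch.
struct ColumnsSketch
{
  std::vector<bool> decoy;
  std::vector<double> exp_rt;
  std::vector<std::int64_t> feature_id;
  std::vector<std::int64_t> run_id;
};

// Copy every row into per-field columns; all columns end up with rows.size() entries.
ColumnsSketch toColumns(const std::vector<RowSketch>& rows)
{
  ColumnsSketch cols;
  cols.decoy.reserve(rows.size());
  cols.exp_rt.reserve(rows.size());
  cols.feature_id.reserve(rows.size());
  cols.run_id.reserve(rows.size());
  for (const RowSketch& r : rows)
  {
    cols.decoy.push_back(r.decoy);
    cols.exp_rt.push_back(r.exp_rt);
    cols.feature_id.push_back(r.feature_id);
    cols.run_id.push_back(r.run_id);
  }
  assert(cols.decoy.size() == rows.size() && cols.run_id.size() == rows.size());
  return cols;
}
```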
| struct OpenMS::OpenSwathOSWParquetReader::TransitionFeaturesResult |
Result container for transition-level features.
| Class Members | ||
|---|---|---|
| vector< double > | apex_intensity | |
| vector< double > | apex_rt | |
| vector< double > | area_intensity | |
| vector< bool > | decoy | |
| vector< double > | exp_rt | |
| vector< int64_t > | feature_id | |
| vector< String > | group_id | |
| vector< double > | masserror_ppm | |
| vector< int > | precursor_charge | |
| vector< int64_t > | precursor_id | |
| vector< int > | product_charge | |
| vector< double > | rt_fwhm | |
| vector< int64_t > | run_id | |
| vector< double > | total_area_intensity | |
| vector< double > | total_mi | |
| vector< int64_t > | transition_id | |
| vector< String > | transition_var_columns | |
| vector< vector< double > > | transition_var_values | |
| struct OpenMS::OpenSwathOSWParquetReader::UnscoredResult |
Result container for an unscored table.
An "unscored" table contains feature-level columns but does not include discriminant scores or FDR estimates. The result provides many columns (feature, precursor, run, MS1/MS2 metrics and discovered feature score columns).
| Class Members | ||
|---|---|---|
| vector< double > | aggr_prec_Peak_Apex | |
| vector< double > | aggr_prec_Peak_Area | |
| vector< double > | assay_rt | |
| vector< double > | assay_RT | |
| vector< int > | Charge | |
| vector< bool > | decoy | |
| vector< double > | delta_rt | |
| vector< double > | delta_RT | |
| vector< double > | EXP_IM | |
| vector< String > | filename | |
| vector< int64_t > | id | |
| vector< int64_t > | id_peptide | |
| vector< int64_t > | id_run | |
| vector< double > | IM_leftWidth | |
| vector< double > | IM_rightWidth | |
| vector< double > | Intensity | |
| vector< double > | leftWidth | |
| vector< String > | ms1_columns | |
| vector< vector< double > > | ms1_values | |
| vector< String > | ms2_columns | |
| vector< vector< double > > | ms2_values | |
| vector< double > | mz | |
| vector< double > | rightWidth | |
| vector< double > | RT | |
| vector< int64_t > | run_id | |
| vector< int64_t > | transition_group_id | |
| OpenSwathOSWParquetReader | ( | ) | default |
Default constructor.
| OpenSwathOSWParquetReader | ( | const String & | oswpq_dir | ) |
Convenience constructor that loads from the given oswpq path.
This constructor calls load(oswpq_dir), so the object is ready to use after construction. It is provided for Python ergonomics, letting callers create an instance from a single argument as with other Parquet helper classes.
| [in] | oswpq_dir | Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read. |
| PeakGroupFeatureScoresResult fetchPeakGroupFeatures | ( | const String & oswpq_dir, const String & level = "ms2", const String & main_score = "" | ) | const |
Extract MS2-level feature rows across all runs.
This method reads per-run features.parquet files and returns a PeakGroupFeatureScoresResult containing feature identifiers, retention times, precursor metadata and discovered MS2 (and optionally MS1) score columns. The returned result is sorted by run_id, precursor_id and exp_rt, matching the ordering used by the SQLite-based extractor.
| [in] | oswpq_dir | Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read. |
| [in] | level | "ms2" (default) or "ms1ms2" to also include MS1 scores |
| [in] | main_score | Optional main score name to be used downstream. If provided and present among discovered MS2 columns, the specified column will be placed first in ms2_columns/ms2_values to ease downstream usage where a primary score is expected. |
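The documented row ordering and the main_score placement can both be illustrated on plain column vectors. This is a sketch under assumptions, not the reader's implementation: `sortedOrder` and `promoteMainScore` are hypothetical helpers, and leaving the remaining score columns in their original relative order is a choice made here, not something the documentation states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

// Return the row permutation that orders rows by (run_id, precursor_id, exp_rt).
// Applying the same permutation to every column keeps the columns aligned.
std::vector<std::size_t> sortedOrder(const std::vector<std::int64_t>& run_id,
                                     const std::vector<std::int64_t>& precursor_id,
                                     const std::vector<double>& exp_rt)
{
  assert(run_id.size() == precursor_id.size() && run_id.size() == exp_rt.size());
  std::vector<std::size_t> idx(run_id.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
    return std::tie(run_id[a], precursor_id[a], exp_rt[a]) <
           std::tie(run_id[b], precursor_id[b], exp_rt[b]);
  });
  return idx;
}

// Move the column named main_score to index 0 in both the name list and the
// value list. Returns false if main_score is empty or not among the columns.
bool promoteMainScore(std::vector<std::string>& columns,
                      std::vector<std::vector<double>>& values,
                      const std::string& main_score)
{
  assert(columns.size() == values.size());
  auto it = std::find(columns.begin(), columns.end(), main_score);
  if (main_score.empty() || it == columns.end()) return false;
  const auto pos = it - columns.begin();
  // Rotating [begin, it + 1) brings the found element to the front and
  // shifts the preceding elements one slot to the right.
  std::rotate(columns.begin(), it, it + 1);
  std::rotate(values.begin(), values.begin() + pos, values.begin() + pos + 1);
  return true;
}
```

Sorting an index vector instead of the data avoids moving every column during comparison; the permutation is computed once and can then be applied to each column in turn.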
| TransitionFeaturesResult fetchTransitionFeatures | ( | const String & | oswpq_dir | ) | const |
Extract transition-level feature rows across all runs (SOA)
Reads per-run feature_transition.parquet files and joins them with the per-run features.parquet and library tables to produce a column-oriented (structure-of-arrays) result. The returned TransitionFeaturesResult contains per-transition columns including:
- feature_id, run_id, precursor_id, exp_rt, precursor_charge
- transition_id, product_charge, transition-level decoy flag
- area_intensity, total_area_intensity, apex_intensity, apex_rt, rt_fwhm, masserror_ppm, total_mi
- Discovered score columns (var_ms2_*) are collected into transition_var_columns with their per-row values in transition_var_values (one vector per discovered column)
- group_id strings in the form run_feature_precursor_transition to allow easy grouping or joins on the Python side.

The ordering of rows is deterministic (sorted by run, precursor, feature, transition) to match other extractors.
| [in] | oswpq_dir | Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read. |
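The group_id form run_feature_precursor_transition can be illustrated as follows. The exact encoding is an assumption of this sketch: the four ids are taken to be written in decimal and joined with underscores. `makeGroupIds` is a hypothetical helper, not an OpenMS function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Build one group id per row from four aligned id columns.
// Assumed format: "<run>_<feature>_<precursor>_<transition>" in decimal.
std::vector<std::string> makeGroupIds(const std::vector<std::int64_t>& run_id,
                                      const std::vector<std::int64_t>& feature_id,
                                      const std::vector<std::int64_t>& precursor_id,
                                      const std::vector<std::int64_t>& transition_id)
{
  const std::size_t n = run_id.size();
  assert(feature_id.size() == n && precursor_id.size() == n && transition_id.size() == n);

  std::vector<std::string> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    out.push_back(std::to_string(run_id[i]) + "_" + std::to_string(feature_id[i]) + "_" +
                  std::to_string(precursor_id[i]) + "_" + std::to_string(transition_id[i]));
  }
  return out;
}
```

Because each id combines all four keys, rows sharing a group_id refer to the same transition of the same feature in the same run, which is what makes the string usable as a join key.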
| UnscoredResult fetchUnscoredData | ( | const String & | oswpq_dir | ) | const |
Read an "unscored" table and return a column-oriented result.
Reads per-run features.parquet (and supporting tables) and returns a rich set of per-feature columns in a Structure-of-Arrays layout convenient for conversion to a pandas.DataFrame. The returned UnscoredResult contains the following columns (vectors of length N):
- id_run, id_peptide (optional, 0 if unknown), transition_group_id (precursor id), decoy, run_id, filename
- RT (feature EXP_RT), assay_rt (EXP_RT - DELTA_RT), delta_rt (FEATURE.DELTA_RT), assay_RT (precursor/library RT), delta_RT (norm_rt - library RT)
- id (FEATURE.ID), Charge, mz
- Intensity (FEATURE_MS2.AREA_INTENSITY), aggr_prec_Peak_Area, aggr_prec_Peak_Apex
- left_width, right_width (may be NaN if absent; available in the returned struct as leftWidth/rightWidth)
- exp_im, exp_im_leftwidth, exp_im_rightwidth (returned as EXP_IM/IM_leftWidth/IM_rightWidth in the struct)
- ms2_columns / ms2_values and ms1_columns / ms1_values (discovered across runs, e.g. var_ms2_*)

Optional columns that are not present in a given Parquet file will be populated with default values (NaN for floating fields, empty strings for filenames, 0 for integer ids) so all returned vectors have equal length and are safe to convert into tabular formats.
| [in] | oswpq_dir | Path to the unzipped OSW Parquet directory or a .oswpq archive (zip) that will be read. |
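The default-filling rule for optional floating-point columns, and the derived assay_rt column (EXP_RT - DELTA_RT), might look like this in isolation. `Table`, `columnOrNaN` and `subtractColumns` are hypothetical, and the string keys used to look columns up are illustrative rather than confirmed Parquet column names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory table: column name -> per-row values.
using Table = std::map<std::string, std::vector<double>>;

// Return the named column, or n_rows NaN values if the column is absent,
// so that every returned vector has the same length.
std::vector<double> columnOrNaN(const Table& table, const std::string& name, std::size_t n_rows)
{
  auto it = table.find(name);
  if (it == table.end())
  {
    return std::vector<double>(n_rows, std::numeric_limits<double>::quiet_NaN());
  }
  assert(it->second.size() == n_rows);
  return it->second;
}

// Element-wise difference a - b, e.g. assay_rt = EXP_RT - DELTA_RT.
// NaN inputs propagate to the output.
std::vector<double> subtractColumns(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] - b[i];
  return out;
}
```

Filling with NaN rather than 0.0 keeps "value absent" distinguishable from a genuine zero once the columns are converted into a table.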
| void load | ( | const String & | oswpq_dir | ) |
Load and extract rows from an OSW Parquet directory or .oswpq archive.
| [in] | oswpq_dir | Path to the unzipped directory or .oswpq archive |
| const String & oswpqPath | ( | ) | const | inline |
Return the originally provided oswpq path (may be empty)
| const std::vector< Row > & rows | ( | ) | const | inline |
Return extracted rows.