OpenMS
Reader for OpenSWATH chromatogram Parquet files (.xic).
#include <OpenMS/FORMAT/XICParquetFile.h>
Classes | |
| struct | XICAnalyte |
| Analyte metadata container. | |
| struct | XICChromatogram |
| Lightweight chromatogram container for XIC parquet rows. | |
| struct | XICRunInfo |
| Unique run information (run_id, source_file). | |
Public Member Functions | |
| XICParquetFile (const String &filename) | |
| Construct from a single .xic file. | |
| XICParquetFile (const std::vector< String > &filenames) | |
| Construct from multiple .xic files. | |
| XICParquetFile (const XICParquetFile &rhs)=default | |
| XICParquetFile & | operator= (const XICParquetFile &rhs)=default |
| const String & | getFilename () const |
| Return the primary filename. | |
| const std::vector< String > & | getFilenames () const |
| Return all filenames associated with this instance. | |
| void | load (std::vector< XICChromatogram > &output) const |
| Load all chromatograms from the file(s). | |
| void | getChromatograms (std::vector< XICChromatogram > &output, Int64 precursor_id=-1, Int64 transition_id=-1, const String &modified_sequence="", Int64 precursor_charge=-1, Int64 product_charge=-1, Int64 ms_level=-1, Int64 run_id=-1, const String &filter="") const |
| Load chromatograms with optional filtering. | |
| void | getChromatograms (std::vector< XICChromatogram > &output, const ParquetFilter &filter) const |
| Return chromatograms using a typed filter expression. | |
| void | getChromatograms (std::vector< XICChromatogram > &output, const ParquetFilterBuilder &filter) const |
| Return chromatograms using a typed filter builder. | |
| void | getRuns (std::vector< XICRunInfo > &output) const |
| Return unique run metadata (run_id, source_file). | |
| void | getAnalytes (std::vector< XICAnalyte > &output, const std::vector< String > &columns={}, bool nest_transitions=true) const |
| Return unique analyte metadata. | |
| void | getColumns (std::vector< String > &output) const |
| Return the parquet schema column names. | |
Private Member Functions | |
| void | getChromatograms_ (std::vector< XICChromatogram > &output, const FilterExpression &extra_filter, Int64 precursor_id, Int64 transition_id, const String &modified_sequence, Int64 precursor_charge, Int64 product_charge, Int64 ms_level, Int64 run_id, const String &filter) const |
Private Attributes | |
| String | filename_ |
| std::vector< String > | filenames_ |
Reader for OpenSWATH chromatogram Parquet files (.xic).
Supports loading single or multiple files and filtering on metadata columns (e.g., precursor id, transition id, annotations). Filters are applied before decoding RT/intensity binary arrays.
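The point of filtering before decoding is that rejected rows never pay the cost of unpacking their arrays. The following is a minimal sketch of that idea, not the OpenMS implementation: `EncodedRow`, `decodeBlob`, `encodeBlob` and `loadMatching` are hypothetical names, and the "blob" is assumed to be plain packed doubles rather than the compressed arrays used by real .xic files.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for one stored row: cheap metadata plus an encoded array.
struct EncodedRow
{
  std::int64_t precursor_id;
  std::string annotation;
  std::vector<unsigned char> intensity_blob; // packed doubles, native byte order (assumption)
};

// Counts how often the expensive step runs (for illustration only).
static int g_decode_calls = 0;

// Stand-in encoder used to build example blobs.
std::vector<unsigned char> encodeBlob(const std::vector<double>& values)
{
  std::vector<unsigned char> blob(values.size() * sizeof(double));
  if (!blob.empty()) std::memcpy(blob.data(), values.data(), blob.size());
  return blob;
}

// Stand-in decoder: copies the bytes back into doubles. Decompression is omitted.
std::vector<double> decodeBlob(const std::vector<unsigned char>& blob)
{
  ++g_decode_calls;
  assert(blob.size() % sizeof(double) == 0);
  std::vector<double> values(blob.size() / sizeof(double));
  if (!values.empty()) std::memcpy(values.data(), blob.data(), blob.size());
  return values;
}

// Check the metadata first; decode only the rows that pass.
std::vector<std::vector<double>> loadMatching(const std::vector<EncodedRow>& rows,
                                              std::int64_t wanted_precursor)
{
  std::vector<std::vector<double>> decoded;
  for (const EncodedRow& row : rows)
  {
    if (row.precursor_id != wanted_precursor) continue; // skipped rows are never decoded
    decoded.push_back(decodeBlob(row.intensity_blob));
  }
  return decoded;
}
```

With three rows of which two match, `decodeBlob` runs twice rather than three times; on large files with selective filters that difference is where the savings would come from.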
The filter argument in getChromatograms() accepts simple boolean expressions over column names, for example PRECURSOR_ID=1 OR TRANSITION_ID in [2,3].
Values can be integers or strings; strings may be unquoted if they contain no spaces or commas (e.g., annotation=y3^1), otherwise use quotes.
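The value rules above could be implemented roughly as follows. This is an illustrative sketch only: `FilterValue`, `looksLikeInteger` and `parseFilterValue` are hypothetical names, and accepting both single and double quotes is an assumption, since the text only says "quotes".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical result type: a filter value is either an integer or a string.
struct FilterValue
{
  bool is_integer = false;
  std::int64_t integer = 0;
  std::string text;
};

// True if the token is an optionally signed run of decimal digits.
bool looksLikeInteger(const std::string& token)
{
  std::size_t start = (!token.empty() && (token[0] == '-' || token[0] == '+')) ? 1 : 0;
  if (start == token.size()) return false;
  for (std::size_t i = start; i < token.size(); ++i)
  {
    if (!std::isdigit(static_cast<unsigned char>(token[i]))) return false;
  }
  return true;
}

FilterValue parseFilterValue(const std::string& token)
{
  FilterValue value;
  // Quoted token: strip the matching quotes and keep the contents verbatim.
  if (token.size() >= 2 && (token.front() == '"' || token.front() == '\'') &&
      token.back() == token.front())
  {
    value.text = token.substr(1, token.size() - 2);
    return value;
  }
  // Unquoted tokens must be non-empty and free of spaces and commas.
  if (token.empty() || token.find_first_of(" ,") != std::string::npos)
  {
    throw std::invalid_argument("value must be quoted: '" + token + "'");
  }
  if (looksLikeInteger(token))
  {
    value.is_integer = true;
    value.integer = std::stoll(token);
    return value;
  }
  value.text = token; // e.g. y3^1
  return value;
}
```

Under these assumptions `y3^1` is accepted as an unquoted string, `42` as an integer, and a value such as `hello, world` has to be wrapped in quotes.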
Supported filter columns (case-insensitive): RUN_ID, SOURCE_FILE, MS_LEVEL, PRECURSOR_ID, TRANSITION_ID, MODIFIED_SEQUENCE, PRECURSOR_CHARGE, PRODUCT_CHARGE, DETECTING_TRANSITION, PRECURSOR_DECOY, PRODUCT_DECOY, TRANSITION_ORDINAL, TRANSITION_TYPE, ANNOTATION. RT and INTENSITY are not filterable because they are stored as compressed binary arrays.
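A case-insensitive check against the column list above might look like the sketch below. `toUpper` and `isFilterableColumn` are hypothetical helpers written for illustration; they are not part of the XICParquetFile interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Upper-case a copy of the string (ASCII only, which suffices for column names).
std::string toUpper(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  return s;
}

// True if the name is one of the documented filter columns, ignoring case.
// RT and INTENSITY are deliberately absent: they are stored as binary arrays.
bool isFilterableColumn(const std::string& name)
{
  static const std::vector<std::string> columns = {
    "RUN_ID", "SOURCE_FILE", "MS_LEVEL", "PRECURSOR_ID", "TRANSITION_ID",
    "MODIFIED_SEQUENCE", "PRECURSOR_CHARGE", "PRODUCT_CHARGE",
    "DETECTING_TRANSITION", "PRECURSOR_DECOY", "PRODUCT_DECOY",
    "TRANSITION_ORDINAL", "TRANSITION_TYPE", "ANNOTATION"};
  const std::string upper = toUpper(name);
  return std::find(columns.begin(), columns.end(), upper) != columns.end();
}
```

So `precursor_id` and `Annotation` would both be accepted, while `RT` and `intensity` would be rejected.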
The implementation uses an Arrow-based pipeline:
These steps are implemented in helper functions in the corresponding .cpp file (e.g., dataset scan vs. compute filter fallback and filter parsing). Keeping the helpers in the implementation file avoids exposing Arrow types in the public header.
| struct OpenMS::XICParquetFile::XICAnalyte |
Analyte metadata container.
If nest_transitions is false in getAnalytes(), transition-level fields are stored in the scalar members (transition_id, product_charge, etc.). If nest_transitions is true, transition-level fields are stored in the vector members (transition_ids, product_charges, etc.), with one entry per unique transition belonging to the precursor.
| Class Members | ||
|---|---|---|
| String | annotation | |
| vector< String > | annotations | |
| Int64 | detecting_transition {0} | |
| vector< Int64 > | detecting_transitions | |
| bool | has_detecting_transition {false} | |
| bool | has_precursor_charge {false} | |
| bool | has_precursor_decoy {false} | |
| bool | has_precursor_id {false} | |
| bool | has_product_charge {false} | |
| bool | has_product_decoy {false} | |
| bool | has_transition_id {false} | |
| bool | has_transition_ordinal {false} | |
| String | modified_sequence | |
| Int64 | precursor_charge {0} | |
| Int64 | precursor_decoy {0} | |
| Int64 | precursor_id {0} | |
| Int64 | product_charge {0} | |
| vector< Int64 > | product_charges | |
| Int64 | product_decoy {0} | |
| vector< Int64 > | product_decoys | |
| Int64 | transition_id {0} | |
| vector< Int64 > | transition_ids | |
| Int64 | transition_ordinal {0} | |
| vector< Int64 > | transition_ordinals | |
| String | transition_type | |
| vector< String > | transition_types | |
| struct OpenMS::XICParquetFile::XICChromatogram |
Lightweight chromatogram container for XIC parquet rows.
| Class Members | ||
|---|---|---|
| String | annotation | |
| Int64 | detecting_transition {0} | |
| bool | has_detecting_transition {false} | |
| bool | has_precursor_charge {false} | |
| bool | has_precursor_decoy {false} | |
| bool | has_precursor_id {false} | |
| bool | has_product_charge {false} | |
| bool | has_product_decoy {false} | |
| bool | has_transition_id {false} | |
| bool | has_transition_ordinal {false} | |
| vector< double > | intensity | |
| String | modified_sequence | |
| Int64 | ms_level {0} | |
| Int64 | precursor_charge {0} | |
| Int64 | precursor_decoy {0} | |
| Int64 | precursor_id {0} | |
| Int64 | product_charge {0} | |
| Int64 | product_decoy {0} | |
| vector< double > | rt | |
| Int64 | run_id {0} | |
| String | source_file | |
| Int64 | transition_id {0} | |
| Int64 | transition_ordinal {0} | |
| String | transition_type | |
| struct OpenMS::XICParquetFile::XICRunInfo |
Unique run information (run_id, source_file).
| XICParquetFile | ( | const String & | filename | ) | explicit |
Construct from a single .xic file.
| [in] | filename | Path to an OpenSWATH chromatogram parquet file. |
| XICParquetFile | ( | const std::vector< String > & | filenames | ) | explicit |
Construct from multiple .xic files.
| [in] | filenames | Paths to OpenSWATH chromatogram parquet files. |
| XICParquetFile | ( | const XICParquetFile & | rhs | ) | default |
| void getAnalytes | ( | std::vector< XICAnalyte > & | output, |
| const std::vector< String > & | columns = {}, |
||
| bool | nest_transitions = true |
||
| ) | const |
Return unique analyte metadata.
If nest_transitions is false, each row represents a unique precursor-transition pair. If nest_transitions is true, each row represents a unique precursor with transition-level fields aggregated into vectors.
This method never decodes RT/intensity arrays and always returns distinct entries.
| [out] | output | Output analyte metadata |
| [in] | columns | Optional list of analyte columns to return (empty for defaults) |
| [in] | nest_transitions | Aggregate transition fields per precursor |
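To make the two output shapes concrete, the sketch below groups flat precursor-transition rows into one record per precursor. It only illustrates the relationship between the two modes; `FlatAnalyte`, `NestedAnalyte` and `nestTransitions` are hypothetical stand-ins with a reduced set of fields, not the XICAnalyte struct or the library's own aggregation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Shape of a row when nest_transitions is false: one precursor-transition pair.
struct FlatAnalyte
{
  std::int64_t precursor_id;
  std::string modified_sequence;
  std::int64_t transition_id;
  std::string annotation;
};

// Shape of a row when nest_transitions is true: one precursor, vectors of transitions.
struct NestedAnalyte
{
  std::int64_t precursor_id;
  std::string modified_sequence;
  std::vector<std::int64_t> transition_ids;
  std::vector<std::string> annotations;
};

// Group flat rows by precursor_id, keeping precursors in first-seen order.
// Assumes the input already holds unique precursor-transition pairs.
std::vector<NestedAnalyte> nestTransitions(const std::vector<FlatAnalyte>& flat)
{
  std::vector<NestedAnalyte> nested;
  std::map<std::int64_t, std::size_t> index_of; // precursor_id -> position in nested
  for (const FlatAnalyte& row : flat)
  {
    auto it = index_of.find(row.precursor_id);
    if (it == index_of.end())
    {
      it = index_of.emplace(row.precursor_id, nested.size()).first;
      nested.push_back({row.precursor_id, row.modified_sequence, {}, {}});
    }
    NestedAnalyte& target = nested[it->second];
    target.transition_ids.push_back(row.transition_id);
    target.annotations.push_back(row.annotation);
  }
  return nested;
}
```

Three flat rows covering two precursors collapse into two nested records, with the vector members running in parallel (entry i of each vector describes the same transition).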
| void getChromatograms | ( | std::vector< XICChromatogram > & | output, |
| const ParquetFilter & | filter | ||
| ) | const |
Return chromatograms using a typed filter expression.
| [out] | output | Output chromatograms |
| [in] | filter | Typed filter expression |
| void getChromatograms | ( | std::vector< XICChromatogram > & | output, |
| const ParquetFilterBuilder & | filter | ||
| ) | const |
Return chromatograms using a typed filter builder.
| [out] | output | Output chromatograms |
| [in] | filter | Typed filter builder |
| void getChromatograms | ( | std::vector< XICChromatogram > & | output, |
| Int64 | precursor_id = -1, |
||
| Int64 | transition_id = -1, |
||
| const String & | modified_sequence = "", |
||
| Int64 | precursor_charge = -1, |
||
| Int64 | product_charge = -1, |
||
| Int64 | ms_level = -1, |
||
| Int64 | run_id = -1, |
||
| const String & | filter = "" |
||
| ) | const |
Load chromatograms with optional filtering.
| [out] | output | Output chromatograms |
| [in] | precursor_id | Optional precursor id (-1 to ignore) |
| [in] | transition_id | Optional transition id (-1 to ignore) |
| [in] | modified_sequence | Optional sequence filter (empty to ignore) |
| [in] | precursor_charge | Optional charge filter (-1 to ignore) |
| [in] | product_charge | Optional product charge filter (-1 to ignore) |
| [in] | ms_level | Optional MS level filter (-1 to ignore) |
| [in] | run_id | Optional run_id filter (-1 to ignore) |
| [in] | filter | Optional filter expression on columns (e.g., "PRECURSOR_ID=1 OR TRANSITION_ID in [2,3]") |
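The sentinel convention in the table (-1 or an empty string means "ignore") can be pictured as a row predicate. The sketch below is a simplified illustration with hypothetical names (`RowMeta`, `Query`, `matches`) and a reduced set of fields. It assumes that all active constraints must hold together (logical AND), which the documentation above does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical subset of a row's metadata.
struct RowMeta
{
  std::int64_t precursor_id;
  std::int64_t transition_id;
  std::string modified_sequence;
  std::int64_t ms_level;
};

// Hypothetical query mirroring the defaulted arguments: -1 / "" mean "ignore".
struct Query
{
  std::int64_t precursor_id = -1;
  std::int64_t transition_id = -1;
  std::string modified_sequence;
  std::int64_t ms_level = -1;
};

// A row matches if every active (non-sentinel) constraint is satisfied.
// Combining constraints with AND is an assumption of this sketch.
bool matches(const RowMeta& row, const Query& q)
{
  if (q.precursor_id != -1 && row.precursor_id != q.precursor_id) return false;
  if (q.transition_id != -1 && row.transition_id != q.transition_id) return false;
  if (!q.modified_sequence.empty() && row.modified_sequence != q.modified_sequence) return false;
  if (q.ms_level != -1 && row.ms_level != q.ms_level) return false;
  return true;
}
```

A default-constructed `Query` matches every row, which corresponds to calling getChromatograms() with only the output argument.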
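| void getChromatograms_ | ( | std::vector< XICChromatogram > & | output, |
| const FilterExpression & | extra_filter, |
| Int64 | precursor_id, |
| Int64 | transition_id, |
| const String & | modified_sequence, |
| Int64 | precursor_charge, |
| Int64 | product_charge, |
| Int64 | ms_level, |
| Int64 | run_id, |
| const String & | filter | ||
| ) | const | private |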
| void getColumns | ( | std::vector< String > & | output | ) | const |
Return the parquet schema column names.
| [out] | output | Column names. |
| const String & getFilename | ( | ) | const |
Return the primary filename.
For multi-file instances this is the first file in the list.
| const std::vector< String > & getFilenames | ( | ) | const |
Return all filenames associated with this instance.
| void getRuns | ( | std::vector< XICRunInfo > & | output | ) | const |
Return unique run metadata (run_id, source_file).
This method never decodes RT/intensity arrays and always returns distinct rows.
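Because every chromatogram row carries its run_id and source_file, returning distinct rows amounts to deduplicating those pairs. A minimal sketch of that step, using a hypothetical `RunInfo` stand-in and `distinctRuns` helper rather than the library's own code:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the (run_id, source_file) pair.
struct RunInfo
{
  std::int64_t run_id;
  std::string source_file;
};

// Keep the first occurrence of each (run_id, source_file) pair, in input order.
std::vector<RunInfo> distinctRuns(const std::vector<RunInfo>& per_row)
{
  std::set<std::pair<std::int64_t, std::string>> seen;
  std::vector<RunInfo> unique_runs;
  for (const RunInfo& run : per_row)
  {
    // emplace().second is true only when the pair was not already present.
    if (seen.emplace(run.run_id, run.source_file).second)
    {
      unique_runs.push_back(run);
    }
  }
  return unique_runs;
}
```

Only the two metadata columns are touched here, which is consistent with the statement that RT/intensity arrays are never decoded for this query.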
| void load | ( | std::vector< XICChromatogram > & | output | ) | const |
Load all chromatograms from the file(s).
| [out] | output | Output chromatograms. |
| XICParquetFile & operator= | ( | const XICParquetFile & | rhs | ) | default |
| String filename_ | private |
| std::vector< String > filenames_ | private |