OpenMS
XIMParquetFile Class Reference

Reader for OpenSWATH mobilogram Parquet files (.xim).

#include <OpenMS/FORMAT/XIMParquetFile.h>


Classes

struct  XIMAnalyte
 Analyte metadata container.
 
struct  XIMMobilogram
 Lightweight mobilogram container for XIM parquet rows.
 
struct  XIMRunInfo
 Unique run information (run_id, source_file).
 

Public Member Functions

 XIMParquetFile (const String &filename)
 Construct from a single .xim file.
 
 XIMParquetFile (const std::vector< String > &filenames)
 Construct from multiple .xim files.
 
 XIMParquetFile (const XIMParquetFile &rhs)=default
 
XIMParquetFile & operator= (const XIMParquetFile &rhs)=default
 
const String & getFilename () const
 Return the primary filename.
 
const std::vector< String > & getFilenames () const
 Return all filenames associated with this instance.
 
void load (std::vector< XIMMobilogram > &output) const
 Load all mobilograms from the file(s).
 
void getMobilograms (std::vector< XIMMobilogram > &output, Int64 precursor_id=-1, Int64 transition_id=-1, const String &modified_sequence="", Int64 precursor_charge=-1, Int64 product_charge=-1, Int64 ms_level=-1, Int64 run_id=-1, const String &mobilogram_type="", Int64 feature_id=-1, double feature_rt=-1.0, const String &filter="") const
 Load mobilograms with optional filtering.
 
void getMobilograms (std::vector< XIMMobilogram > &output, const ParquetFilter &filter) const
 Return mobilograms using a typed filter expression.
 
void getMobilograms (std::vector< XIMMobilogram > &output, const ParquetFilterBuilder &filter) const
 Return mobilograms using a typed filter builder.
 
void getRuns (std::vector< XIMRunInfo > &output) const
 Return unique run metadata (run_id, source_file).
 
void getAnalytes (std::vector< XIMAnalyte > &output, const std::vector< String > &columns={}, bool nest_transitions=true) const
 Return unique analyte metadata.
 
void getColumns (std::vector< String > &output) const
 Return the parquet schema column names.
 

Private Member Functions

void getMobilograms_ (std::vector< XIMMobilogram > &output, const FilterExpression &extra_filter, Int64 precursor_id, Int64 transition_id, const String &modified_sequence, Int64 precursor_charge, Int64 product_charge, Int64 ms_level, Int64 run_id, const String &mobilogram_type, Int64 feature_id, double feature_rt, const String &filter) const
 

Private Attributes

String filename_
 
std::vector< String > filenames_
 

Detailed Description

Reader for OpenSWATH mobilogram Parquet files (.xim).

Supports loading single or multiple files and filtering on metadata columns (e.g., precursor id, transition id, annotations). Filters are applied before decoding mobility/intensity binary arrays.

Filter syntax

The filter argument in getMobilograms() accepts simple boolean expressions over column names. Supported operators are:

  • Comparison: =, ==, !=, <, <=, >, >=
  • Set membership: in [v1, v2, ...]
  • Boolean: AND/OR (also accepts &&, ||, &, |)

Values can be integers or strings; strings may be unquoted if they contain no spaces or commas (e.g., annotation=y3^1), otherwise use quotes.

Supported filter columns (case-insensitive): RUN_ID, SOURCE_FILE, MS_LEVEL, MOBILOGRAM_TYPE, PRECURSOR_ID, TRANSITION_ID, FEATURE_ID, FEATURE_RT, MODIFIED_SEQUENCE, PRECURSOR_CHARGE, PRODUCT_CHARGE, DETECTING_TRANSITION, PRECURSOR_DECOY, PRODUCT_DECOY, TRANSITION_ORDINAL, TRANSITION_TYPE, ANNOTATION. MOBILITY and INTENSITY are not filterable because they are stored as compressed binary arrays.
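
The quoting and set-membership rules above can be sketched with a few string helpers. This is only an illustration using the standard library: `quoteIfNeeded`, `inList`, and `equals` are hypothetical names that are not part of OpenMS, and double quotes are an assumption, since the text above only says to "use quotes".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenMS): quote a string value only when the
// documented rule requires it, i.e. when it contains a space or a comma.
// ASSUMPTION: double quotes are accepted by the filter parser.
std::string quoteIfNeeded(const std::string& value)
{
  if (value.find_first_of(" ,") == std::string::npos) return value;
  return "\"" + value + "\"";
}

// Hypothetical helper: build "<column> in [v1, v2, ...]" from integer values.
std::string inList(const std::string& column, const std::vector<std::int64_t>& values)
{
  assert(!column.empty() && !values.empty());
  std::string out = column + " in [";
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (i > 0) out += ", ";
    out += std::to_string(values[i]);
  }
  return out + "]";
}

// Hypothetical helper: build "<column>=<value>" for a string-valued column.
std::string equals(const std::string& column, const std::string& value)
{
  assert(!column.empty());
  return column + "=" + quoteIfNeeded(value);
}
```

A string such as `"PRECURSOR_ID=1 OR " + inList("TRANSITION_ID", {2, 3})` could then be passed as the `filter` argument of getMobilograms().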

Internal processing notes

The implementation uses an Arrow-based pipeline:

  • If Arrow Dataset is available, filters are translated into Arrow expressions and pushed down via dataset scanning.
  • If dataset filtering is unavailable or fails, the same filter expression is evaluated in-memory using Arrow compute.
  • mobility/intensity binary arrays are decoded only after filtering.

These steps are implemented in helper functions in the corresponding .cpp file (e.g., dataset scan vs. compute filter fallback and filter parsing). Keeping the helpers in the implementation file avoids exposing Arrow types in the public header.
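
The benefit of decoding only after filtering can be shown with a minimal standard-library sketch. It does not use Arrow and is not the OpenMS implementation: `EncodedRow`, `DecodedRow`, `encode`, `decode`, and `filterThenDecode` are hypothetical stand-ins, and the payload is a plain byte copy of doubles rather than the compressed arrays stored in real .xim files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one stored row: metadata plus an undecoded payload.
struct EncodedRow
{
  std::int64_t precursor_id;
  std::string raw_intensity; // raw bytes of packed doubles (no compression here)
};

struct DecodedRow
{
  std::int64_t precursor_id;
  std::vector<double> intensity;
};

// Pack doubles into a byte string (toy encoding for this sketch only).
std::string encode(const std::vector<double>& values)
{
  std::string bytes(values.size() * sizeof(double), '\0');
  if (!values.empty()) std::memcpy(&bytes[0], values.data(), bytes.size());
  return bytes;
}

// Unpack the bytes again and count how often decoding actually happens.
std::vector<double> decode(const std::string& bytes, std::size_t& decode_calls)
{
  assert(bytes.size() % sizeof(double) == 0);
  ++decode_calls;
  std::vector<double> values(bytes.size() / sizeof(double));
  if (!values.empty()) std::memcpy(values.data(), bytes.data(), bytes.size());
  return values;
}

// Evaluate the metadata predicate first; decode only the rows that pass.
std::vector<DecodedRow> filterThenDecode(const std::vector<EncodedRow>& rows,
                                         const std::function<bool(const EncodedRow&)>& keep,
                                         std::size_t& decode_calls)
{
  std::vector<DecodedRow> out;
  for (const EncodedRow& row : rows)
  {
    if (!keep(row)) continue; // metadata-only check, payload stays untouched
    out.push_back(DecodedRow{row.precursor_id, decode(row.raw_intensity, decode_calls)});
  }
  return out;
}
```

In this sketch, rows rejected by the predicate never reach `decode`, so the decoding cost scales with the number of matching rows rather than the file size.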

Note
The .xim schema is defined by MobilogramParquetConsumer.
See also
OpenMS::MobilogramParquetConsumer

Class Documentation

◆ OpenMS::XIMParquetFile::XIMAnalyte

struct OpenMS::XIMParquetFile::XIMAnalyte

Analyte metadata container.

If nest_transitions is false in getAnalytes(), transition-level fields are stored in the scalar members (transition_id, product_charge, etc.). If nest_transitions is true, transition-level fields are stored in the vector members (transition_ids, product_charges, etc.), with one entry per unique transition belonging to the precursor.

Class Members
String annotation
vector< String > annotations
Int64 detecting_transition {0}
vector< Int64 > detecting_transitions
bool has_detecting_transition {false}
bool has_precursor_charge {false}
bool has_precursor_decoy {false}
bool has_precursor_id {false}
bool has_product_charge {false}
bool has_product_decoy {false}
bool has_transition_id {false}
bool has_transition_ordinal {false}
String modified_sequence
Int64 precursor_charge {0}
Int64 precursor_decoy {0}
Int64 precursor_id {0}
Int64 product_charge {0}
vector< Int64 > product_charges
Int64 product_decoy {0}
vector< Int64 > product_decoys
Int64 transition_id {0}
vector< Int64 > transition_ids
Int64 transition_ordinal {0}
vector< Int64 > transition_ordinals
String transition_type
vector< String > transition_types

◆ OpenMS::XIMParquetFile::XIMMobilogram

struct OpenMS::XIMParquetFile::XIMMobilogram

Lightweight mobilogram container for XIM parquet rows.

Class Members
String annotation
Int64 detecting_transition {0}
Int64 feature_id {0}
double feature_rt {0.0}
bool has_detecting_transition {false}
bool has_feature_id {false}
bool has_feature_rt {false}
bool has_precursor_charge {false}
bool has_precursor_decoy {false}
bool has_precursor_id {false}
bool has_product_charge {false}
bool has_product_decoy {false}
bool has_transition_id {false}
bool has_transition_ordinal {false}
vector< double > intensity
vector< double > mobility
String mobilogram_type
String modified_sequence
Int64 ms_level {0}
Int64 precursor_charge {0}
Int64 precursor_decoy {0}
Int64 precursor_id {0}
Int64 product_charge {0}
Int64 product_decoy {0}
Int64 run_id {0}
String source_file
Int64 transition_id {0}
Int64 transition_ordinal {0}
String transition_type

◆ OpenMS::XIMParquetFile::XIMRunInfo

struct OpenMS::XIMParquetFile::XIMRunInfo

Unique run information (run_id, source_file).

Class Members
Int64 run_id {0}
String source_file

Constructor & Destructor Documentation

◆ XIMParquetFile() [1/3]

XIMParquetFile ( const String &  filename)
explicit

Construct from a single .xim file.

Parameters
[in] filename  Path to an OpenSWATH mobilogram parquet file.

◆ XIMParquetFile() [2/3]

XIMParquetFile ( const std::vector< String > &  filenames)
explicit

Construct from multiple .xim files.

Parameters
[in] filenames  Paths to OpenSWATH mobilogram parquet files.

◆ XIMParquetFile() [3/3]

XIMParquetFile ( const XIMParquetFile &  rhs)
default

Member Function Documentation

◆ getAnalytes()

void getAnalytes ( std::vector< XIMAnalyte > &  output,
const std::vector< String > &  columns = {},
bool  nest_transitions = true 
) const

Return unique analyte metadata.

If nest_transitions is false, each row represents a unique precursor-transition pair. If nest_transitions is true, each row represents a unique precursor with transition-level fields aggregated into vectors.

This method never decodes mobility/intensity arrays and always returns distinct entries.

Parameters
[out] output  Output analyte metadata
[in] columns  Optional list of analyte columns to return (empty for defaults)
[in] nest_transitions  Aggregate transition fields per precursor
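
The difference between the two `nest_transitions` modes can be sketched as a grouping step over flat precursor-transition pairs. This is a standard-library illustration only: `FlatAnalyte`, `NestedAnalyte`, and `nestTransitions` are hypothetical names with a reduced set of fields, and the output order (first appearance of each precursor) is an assumption, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat record: one unique precursor-transition pair
// (the shape described for nest_transitions = false).
struct FlatAnalyte
{
  std::int64_t precursor_id;
  std::int64_t transition_id;
  std::string annotation;
};

// Hypothetical nested record: one precursor with per-transition vectors
// (the shape described for nest_transitions = true).
struct NestedAnalyte
{
  std::int64_t precursor_id;
  std::vector<std::int64_t> transition_ids;
  std::vector<std::string> annotations;
};

// Group flat rows by precursor_id; the input is assumed to be distinct already.
std::vector<NestedAnalyte> nestTransitions(const std::vector<FlatAnalyte>& flat)
{
  std::vector<NestedAnalyte> nested;
  std::map<std::int64_t, std::size_t> index_of; // precursor_id -> position in nested
  for (const FlatAnalyte& row : flat)
  {
    auto it = index_of.find(row.precursor_id);
    if (it == index_of.end())
    {
      it = index_of.emplace(row.precursor_id, nested.size()).first;
      nested.push_back(NestedAnalyte{row.precursor_id, {}, {}});
    }
    NestedAnalyte& target = nested[it->second];
    target.transition_ids.push_back(row.transition_id);
    target.annotations.push_back(row.annotation);
    // The vector members stay aligned: entry i describes the same transition.
    assert(target.transition_ids.size() == target.annotations.size());
  }
  return nested;
}
```

Keeping the per-transition vectors index-aligned means that entry `i` of each vector refers to the same transition, which mirrors the paired vector members of XIMAnalyte.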

◆ getColumns()

void getColumns ( std::vector< String > &  output) const

Return the parquet schema column names.

Parameters
[out] output  Column names.

◆ getFilename()

const String & getFilename ( ) const

Return the primary filename.

For multi-file instances this is the first file in the list.

Returns
Primary filename.

◆ getFilenames()

const std::vector< String > & getFilenames ( ) const

Return all filenames associated with this instance.

Returns
All filenames associated with this instance.

◆ getMobilograms() [1/3]

void getMobilograms ( std::vector< XIMMobilogram > &  output,
const ParquetFilter &  filter 
) const

Return mobilograms using a typed filter expression.

Parameters
[out] output  Output mobilograms
[in] filter  Typed filter builder expression

◆ getMobilograms() [2/3]

void getMobilograms ( std::vector< XIMMobilogram > &  output,
const ParquetFilterBuilder &  filter 
) const

Return mobilograms using a typed filter builder.

Parameters
[out] output  Output mobilograms
[in] filter  Typed filter builder

◆ getMobilograms() [3/3]

void getMobilograms ( std::vector< XIMMobilogram > &  output,
Int64  precursor_id = -1,
Int64  transition_id = -1,
const String &  modified_sequence = "",
Int64  precursor_charge = -1,
Int64  product_charge = -1,
Int64  ms_level = -1,
Int64  run_id = -1,
const String &  mobilogram_type = "",
Int64  feature_id = -1,
double  feature_rt = -1.0,
const String &  filter = "" 
) const

Load mobilograms with optional filtering.

Parameters
[out] output  Output mobilograms
[in] precursor_id  Optional precursor id (-1 to ignore)
[in] transition_id  Optional transition id (-1 to ignore)
[in] modified_sequence  Optional sequence filter (empty to ignore)
[in] precursor_charge  Optional charge filter (-1 to ignore)
[in] product_charge  Optional product charge filter (-1 to ignore)
[in] ms_level  Optional MS level filter (-1 to ignore)
[in] run_id  Optional run_id filter (-1 to ignore)
[in] mobilogram_type  Optional mobilogram type filter (empty to ignore)
[in] feature_id  Optional feature id filter (-1 to ignore)
[in] feature_rt  Optional feature RT filter (< 0 to ignore)
[in] filter  Optional filter expression on columns (e.g., "PRECURSOR_ID=1 OR TRANSITION_ID in [2,3]")

◆ getMobilograms_()

void getMobilograms_ ( std::vector< XIMMobilogram > &  output,
const FilterExpression &  extra_filter,
Int64  precursor_id,
Int64  transition_id,
const String &  modified_sequence,
Int64  precursor_charge,
Int64  product_charge,
Int64  ms_level,
Int64  run_id,
const String &  mobilogram_type,
Int64  feature_id,
double  feature_rt,
const String &  filter 
) const
private

◆ getRuns()

void getRuns ( std::vector< XIMRunInfo > &  output) const

Return unique run metadata (run_id, source_file).

This method never decodes mobility/intensity arrays and always returns distinct rows.
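
What "distinct rows" means for run metadata can be sketched as deduplication on the (run_id, source_file) pair. This is a standard-library illustration, not the OpenMS implementation: `RunInfo` and `distinctRuns` are hypothetical names, the file names in the usage note are made up, and keeping first-appearance order is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the two documented XIMRunInfo fields.
struct RunInfo
{
  std::int64_t run_id;
  std::string source_file;
};

// Keep the first occurrence of every (run_id, source_file) pair.
std::vector<RunInfo> distinctRuns(const std::vector<RunInfo>& rows)
{
  std::set<std::pair<std::int64_t, std::string>> seen;
  std::vector<RunInfo> out;
  for (const RunInfo& row : rows)
  {
    // insert() reports via .second whether the pair was new.
    if (seen.insert(std::make_pair(row.run_id, row.source_file)).second)
    {
      out.push_back(row);
    }
  }
  assert(out.size() == seen.size()); // one output row per distinct pair
  return out;
}
```

Because every mobilogram row repeats its run_id and source_file, input such as `{1, "a.mzML"}, {1, "a.mzML"}, {2, "b.mzML"}` collapses to two entries in this sketch.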

◆ load()

void load ( std::vector< XIMMobilogram > &  output) const

Load all mobilograms from the file(s).

Parameters
[out] output  Output mobilograms.

◆ operator=()

XIMParquetFile & operator= ( const XIMParquetFile &  rhs)
default

Member Data Documentation

◆ filename_

String filename_
private

◆ filenames_

std::vector<String> filenames_
private