OpenMS
MSChromatogramParquetConsumer Class Reference

Writes chromatograms to a Parquet file with a PyProphet-compatible schema.

#include <OpenMS/FORMAT/DATAACCESS/MSChromatogramParquetConsumer.h>


Public Member Functions

 MSChromatogramParquetConsumer (const String &filename, UInt64 run_id, const String &source_file, const OpenSwath::LightTargetedExperiment &transition_exp)
 Construct a Parquet consumer for chromatogram export.

 ~MSChromatogramParquetConsumer () override
 Destructor flushes pending data and closes the Parquet writer.

void consumeSpectrum (SpectrumType &s) override
 Consume a spectrum (no-op; spectra are ignored for chromatogram export).

void consumeChromatogram (ChromatogramType &c) override
 Consume a chromatogram and append it to the Parquet output.

void finalize ()
 Finalize and write the Parquet file.
 
void setExpectedSize (Size expectedSpectra, Size expectedChromatograms) override
 Reserve storage for expected data sizes.
 
void setExperimentalSettings (const ExperimentalSettings &exp) override
 Set experimental settings (currently unused).
 
Public Member Functions inherited from IMSDataConsumer
virtual ~IMSDataConsumer ()
 

Private Attributes

std::unique_ptr< MSChromatogramParquetConsumerImpl > impl_
 

Additional Inherited Members

Public Types inherited from IMSDataConsumer
typedef MSSpectrum SpectrumType
 
typedef MSChromatogram ChromatogramType
 

Detailed Description

Writes chromatograms to a Parquet file with a PyProphet-compatible schema.

The schema includes precursor and transition metadata, the RT and intensity arrays, and the compression scheme identifier for each array. It additionally carries the RUN_ID, SOURCE_FILE, and MS_LEVEL columns.

The Parquet output has the following columns (one row per chromatogram):

Column                 Type               Description
RUN_ID                 int64              Run identifier
SOURCE_FILE            string             Input source filename
MS_LEVEL               int64              MS level (1 for precursor traces, 2 for fragment traces)
PRECURSOR_ID           int64 (nullable)   Precursor id
TRANSITION_ID          int64 (nullable)   Transition id
MODIFIED_SEQUENCE      string (nullable)  Modified peptide sequence
PRECURSOR_CHARGE       int64 (nullable)   Precursor charge
PRODUCT_CHARGE         int64 (nullable)   Product charge
DETECTING_TRANSITION   int64 (nullable)   Detecting transition flag
PRECURSOR_DECOY        int64 (nullable)   Precursor decoy flag
PRODUCT_DECOY          int64 (nullable)   Product decoy flag
TRANSITION_ORDINAL     int64 (nullable)   Transition ordinal
TRANSITION_TYPE        string (nullable)  Transition type (e.g., y, b)
ANNOTATION             string (nullable)  Transition annotation (e.g., y3^1)
RT_DATA                binary             Encoded RT array (format given by RT_COMPRESSION)
INTENSITY_DATA         binary             Encoded intensity array (format given by INTENSITY_COMPRESSION)
RT_COMPRESSION         int64              RT compression scheme id
INTENSITY_COMPRESSION  int64              Intensity compression scheme id
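To make the split between required and nullable columns concrete, the sketch below mirrors one row as a plain C++ struct, using std::optional for the columns marked nullable. ChromatogramRow and makePrecursorTraceRow are hypothetical names for illustration, not OpenMS types. Leaving the transition-level fields empty for an MS1 trace is also an assumption: this page does not say which fields the consumer fills for each kind of chromatogram.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of one output row (one chromatogram).
// Not an OpenMS type: field names follow the documented columns, and
// std::optional marks the columns documented as nullable.
struct ChromatogramRow
{
  std::int64_t run_id = 0;                          // RUN_ID
  std::string source_file;                          // SOURCE_FILE
  std::int64_t ms_level = 0;                        // MS_LEVEL (1 or 2)
  std::optional<std::int64_t> precursor_id;         // PRECURSOR_ID
  std::optional<std::int64_t> transition_id;        // TRANSITION_ID
  std::optional<std::string> modified_sequence;     // MODIFIED_SEQUENCE
  std::optional<std::int64_t> precursor_charge;     // PRECURSOR_CHARGE
  std::optional<std::int64_t> product_charge;       // PRODUCT_CHARGE
  std::optional<std::int64_t> detecting_transition; // DETECTING_TRANSITION
  std::optional<std::int64_t> precursor_decoy;      // PRECURSOR_DECOY
  std::optional<std::int64_t> product_decoy;        // PRODUCT_DECOY
  std::optional<std::int64_t> transition_ordinal;   // TRANSITION_ORDINAL
  std::optional<std::string> transition_type;       // TRANSITION_TYPE
  std::optional<std::string> annotation;            // ANNOTATION
  std::vector<std::uint8_t> rt_data;                // RT_DATA (encoded bytes)
  std::vector<std::uint8_t> intensity_data;         // INTENSITY_DATA (encoded bytes)
  std::int64_t rt_compression = 0;                  // RT_COMPRESSION
  std::int64_t intensity_compression = 0;           // INTENSITY_COMPRESSION
};

// Hypothetical helper: a row for an MS1 (precursor) trace. Leaving the
// transition-level fields empty is an assumption made for illustration.
ChromatogramRow makePrecursorTraceRow(std::int64_t run_id, const std::string& source_file,
                                      std::int64_t precursor_id, const std::string& sequence,
                                      std::int64_t precursor_charge)
{
  ChromatogramRow row;
  row.run_id = run_id;
  row.source_file = source_file;
  row.ms_level = 1;
  row.precursor_id = precursor_id;
  row.modified_sequence = sequence;
  row.precursor_charge = precursor_charge;
  return row;  // transition_id, product_charge, annotation, ... stay std::nullopt
}
```

RUN_ID and SOURCE_FILE are plain members here because every row carries them; they correspond to the run_id and source_file constructor arguments.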

Compression identifiers:

Column                 Value  Description
RT_COMPRESSION         0      No compression (raw doubles)
RT_COMPRESSION         1      Zlib-compressed raw doubles
RT_COMPRESSION         5      MSNumpress (linear) with lossy compression
INTENSITY_COMPRESSION  0      No compression (raw doubles)
INTENSITY_COMPRESSION  1      Zlib-compressed raw doubles
INTENSITY_COMPRESSION  6      MSNumpress (short logged float) with lossy compression
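As an illustration of the simplest case, the sketch below decodes a blob stored with compression id 0 back into doubles. It assumes, without confirmation from this page, that raw doubles are 8-byte IEEE-754 values in the reader's native byte order with no header or padding. decodeArray and encodeRaw are hypothetical helpers. Ids 1, 5, and 6 are rejected rather than guessed at, because decoding them needs zlib or MSNumpress.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption for this sketch: "raw doubles" means packed 8-byte doubles.
static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

// Decode an RT_DATA / INTENSITY_DATA blob whose compression id is 0.
// Other ids (1 = zlib, 5/6 = MSNumpress) are not handled here.
std::vector<double> decodeArray(const std::vector<std::uint8_t>& blob, std::int64_t compression_id)
{
  if (compression_id != 0)
  {
    throw std::runtime_error("compression id " + std::to_string(compression_id) +
                             " is not handled by this sketch");
  }
  if (blob.size() % sizeof(double) != 0)
  {
    throw std::runtime_error("blob size is not a multiple of sizeof(double)");
  }
  std::vector<double> values(blob.size() / sizeof(double));
  if (!values.empty())  // avoid memcpy with a null pointer for empty input
  {
    std::memcpy(values.data(), blob.data(), blob.size());
  }
  return values;
}

// Inverse helper, packing doubles in native byte order (same assumption).
std::vector<std::uint8_t> encodeRaw(const std::vector<double>& values)
{
  std::vector<std::uint8_t> blob(values.size() * sizeof(double));
  if (!blob.empty())
  {
    std::memcpy(blob.data(), values.data(), blob.size());
  }
  return blob;
}
```

A reader in another language (for example PyProphet on the Python side) would apply the same idea: dispatch on the *_COMPRESSION column before interpreting the matching *_DATA bytes.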

Constructor & Destructor Documentation

◆ MSChromatogramParquetConsumer()

MSChromatogramParquetConsumer (const String &filename,
                               UInt64 run_id,
                               const String &source_file,
                               const OpenSwath::LightTargetedExperiment &transition_exp)

Construct a Parquet consumer for chromatogram export.

Parameters
[in] filename        Output Parquet filename.
[in] run_id          Run identifier to store with each chromatogram.
[in] source_file     Source mzML filename to store with each chromatogram.
[in] transition_exp  Transition metadata used to annotate chromatograms.

◆ ~MSChromatogramParquetConsumer()

Destructor flushes pending data and closes the Parquet writer.

Member Function Documentation

◆ consumeChromatogram()

void consumeChromatogram (ChromatogramType &c)
override, virtual

Consume a chromatogram and append it to the Parquet output.

Implements IMSDataConsumer.

◆ consumeSpectrum()

void consumeSpectrum (SpectrumType &s)
override, virtual

Consume a spectrum (no-op; spectra are ignored for chromatogram export).

Implements IMSDataConsumer.

◆ finalize()

void finalize ( )

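Finalize and write the Parquet file.

The destructor also flushes pending data and closes the writer, but an exception must not escape a destructor. Call finalize() explicitly to surface write errors during normal control flow.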
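The reason for calling finalize() explicitly can be sketched with a small hypothetical writer. SketchWriter is not part of OpenMS and this is not the class's actual implementation; it only shows the general pattern: finalize() reports failure by throwing, while the destructor repeats the flush inside a try/catch because it must not let an exception escape.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical writer illustrating explicit finalize() vs. destructor flush.
class SketchWriter
{
public:
  explicit SketchWriter(bool fail_on_flush) : fail_on_flush_(fail_on_flush) {}

  void append(const std::string& row) { pending_.push_back(row); }

  // Flush pending rows and "close". Idempotent, so the destructor can call it again.
  void finalize()
  {
    if (finalized_) return;
    finalized_ = true;  // set first so a failed flush is not retried in the destructor
    if (fail_on_flush_) throw std::runtime_error("simulated write error");
    written_ += pending_.size();
    pending_.clear();
  }

  // Destructors are noexcept by default: errors here can only be logged.
  ~SketchWriter()
  {
    try
    {
      finalize();
    }
    catch (const std::exception& e)
    {
      std::cerr << "error during implicit flush: " << e.what() << '\n';
    }
  }

  std::size_t written() const { return written_; }

private:
  bool fail_on_flush_;
  bool finalized_ = false;
  std::vector<std::string> pending_;
  std::size_t written_ = 0;
};
```

In this sketch, a caller that invokes finalize() sees the failure as an exception it can handle; a caller that relies on the destructor only gets a message on std::cerr.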

◆ setExpectedSize()

void setExpectedSize (Size expectedSpectra,
                      Size expectedChromatograms)
override, virtual

Reserve storage for the expected number of chromatograms; the expected number of spectra is ignored.

Parameters
[in] expectedSpectra        Expected number of spectra (ignored).
[in] expectedChromatograms  Expected number of chromatograms.

Implements IMSDataConsumer.
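The documented consumer contract (spectra ignored, chromatograms appended, expected chromatogram count used as a reservation hint) can be mimicked with a small standalone class. Spectrum, Chromatogram, and BufferingChromatogramConsumer are simplified hypothetical stand-ins for MSSpectrum, MSChromatogram, and the real consumer. Whether the real class keeps every chromatogram in memory or writes in batches is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified hypothetical stand-ins for MSSpectrum / MSChromatogram.
struct Spectrum { std::vector<double> mz, intensity; };
struct Chromatogram { std::vector<double> rt, intensity; };

// Minimal sketch of the documented contract, not the OpenMS class.
class BufferingChromatogramConsumer
{
public:
  // Only the chromatogram count is used, as a capacity hint; size stays 0.
  void setExpectedSize(std::size_t /*expectedSpectra*/, std::size_t expectedChromatograms)
  {
    buffer_.reserve(expectedChromatograms);
  }

  void consumeSpectrum(Spectrum& /*s*/) {}  // no-op: spectra are ignored

  void consumeChromatogram(Chromatogram& c)
  {
    buffer_.push_back(c);  // copied: the caller keeps its own chromatogram
  }

  const std::vector<Chromatogram>& buffered() const { return buffer_; }

private:
  std::vector<Chromatogram> buffer_;
};
```

Reserving up front only avoids repeated reallocation while chromatograms arrive; it does not change what is written.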

◆ setExperimentalSettings()

void setExperimentalSettings (const ExperimentalSettings &exp)
override, virtual

Set experimental settings (currently unused).

Parameters
[in] exp  Experimental settings to store for context.

Implements IMSDataConsumer.

Member Data Documentation

◆ impl_

std::unique_ptr<MSChromatogramParquetConsumerImpl> impl_
private
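impl_ follows the pointer-to-implementation (pimpl) idiom: the public header can get by with a forward declaration of MSChromatogramParquetConsumerImpl, so implementation details stay out of it. A plausible motivation, assumed here rather than documented, is keeping Arrow/Parquet headers out of the public interface. The single-file sketch below shows the idiom with hypothetical names. The key detail is that the destructor is defined after the Impl type is complete, which std::unique_ptr's default deleter requires.

```cpp
#include <cassert>
#include <memory>
#include <string>

// "Header" part: the public class only holds an opaque pointer.
class ParquetConsumerFacade
{
public:
  explicit ParquetConsumerFacade(const std::string& filename);
  ~ParquetConsumerFacade();  // declared here, defined where Impl is complete
  std::string filename() const;

private:
  struct Impl;                  // forward declaration only
  std::unique_ptr<Impl> impl_;
};

// "Source" part: Impl could hold heavy dependencies (e.g. a file writer)
// without exposing them to users of the header.
struct ParquetConsumerFacade::Impl
{
  std::string filename;
};

ParquetConsumerFacade::ParquetConsumerFacade(const std::string& filename)
  : impl_(std::make_unique<Impl>(Impl{filename}))
{
}

// Defaulted here, where Impl is a complete type, so unique_ptr can delete it.
ParquetConsumerFacade::~ParquetConsumerFacade() = default;

std::string ParquetConsumerFacade::filename() const { return impl_->filename; }
```

Declaring the destructor in the class but defining it out of line is what lets the header compile without the full Impl definition.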