OpenMS
ConsensusMapArrowExport Class Reference

Export ConsensusMap feature data to Apache Arrow format following QPX feature schema.

#include <OpenMS/FORMAT/ConsensusMapArrowExport.h>

Static Public Member Functions

static std::shared_ptr< arrow::Table > exportToArrow (const ConsensusMap &cmap)
 Export ConsensusMap to Apache Arrow Table.
 
static bool exportToParquet (const ConsensusMap &cmap, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export ConsensusMap to Parquet file.
 
static bool exportToParquetStreaming (const ConsensusMap &cmap, const std::string &filename, size_t batch_size=1000000, const ParquetWriteConfig &config=ParquetWriteConfig{}, int n_threads=1)
 Stream a ConsensusMap to a Parquet file in row batches (bounded peak memory)
 

Detailed Description

Export ConsensusMap feature data to Apache Arrow format following QPX feature schema.

This class provides static methods to export ConsensusMap data to Apache Arrow Tables and Parquet files. The schema follows the QPX (Quantitative Proteomics Exchange) feature format.

Experimental classes:
This API is experimental and may change in future versions.

Member Function Documentation

◆ exportToArrow()

static std::shared_ptr< arrow::Table > exportToArrow ( const ConsensusMap & cmap )
static

Export ConsensusMap to Apache Arrow Table.

Exports consensus features following the QPX feature schema. Each ConsensusFeature becomes one row with identification, quantification, and protein group information.

Parameters
[in]  cmap  The ConsensusMap to export
Returns
Shared pointer to Arrow Table, or nullptr on error

◆ exportToParquet()

static bool exportToParquet ( const ConsensusMap & cmap,
const std::string & filename,
const ParquetWriteConfig & config = ParquetWriteConfig{}
)
static

Export ConsensusMap to Parquet file.

Parameters
[in]  cmap      The ConsensusMap to export
[in]  filename  Output file path
[in]  config    Parquet writing options
Returns
true on success, false on error

◆ exportToParquetStreaming()

static bool exportToParquetStreaming ( const ConsensusMap & cmap,
const std::string & filename,
size_t batch_size = 1000000,
const ParquetWriteConfig & config = ParquetWriteConfig{},
int n_threads = 1
)
static

Stream a ConsensusMap to a Parquet file in row batches (bounded peak memory)

Functionally equivalent to exportToParquet(), but builds and flushes the feature table one batch_size-sized range at a time through a persistent parquet::arrow::FileWriter instead of materializing the whole N-row Arrow table in memory before a single write. For isobaric data (one consensus feature per PSM), N can be in the millions, and the transient memory peak of the one-shot path can drive the process into swap or out-of-memory (OOM) failure; with streaming, peak memory stays bounded by one batch.
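The batching described above can be pictured as splitting the N feature indices into consecutive half-open ranges. The following is only a minimal sketch of that idea and not the OpenMS implementation: batchRanges and kDefaultBatchSize are hypothetical names introduced for illustration, and only the default value 1000000 and the "0 is treated as the default" rule come from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Default taken from the documented signature (batch_size = 1000000).
constexpr std::size_t kDefaultBatchSize = 1000000;

// Hypothetical helper, NOT part of OpenMS: split n_rows feature indices into
// consecutive [begin, end) ranges of at most batch_size rows each.
std::vector<std::pair<std::size_t, std::size_t>> batchRanges(std::size_t n_rows,
                                                             std::size_t batch_size)
{
  if (batch_size == 0) batch_size = kDefaultBatchSize; // 0 is treated as the default

  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  std::size_t begin = 0;
  while (begin < n_rows)
  {
    // The last range may be shorter than batch_size.
    const std::size_t remaining = n_rows - begin;
    const std::size_t end = (remaining < batch_size) ? n_rows : begin + batch_size;
    assert(begin < end && end <= n_rows);
    ranges.emplace_back(begin, end);
    begin = end; // ranges are contiguous, so every row is covered exactly once
  }
  return ranges;
}
```

In a streaming writer of this shape, only the rows of one such range would need to be held in memory at a time, which is where the bound on peak memory comes from.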

Each batch is optionally partitioned and built in parallel with OpenMP and written in index order (the Parquet writer stays serial), so the written rows and their order are identical to exportToParquet() and deterministic for any thread count; only the Parquet row-group layout may differ.
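Why the output is independent of the thread count can be illustrated with a small sketch: each partition of a batch is built into its own slot, and the slots are then emitted strictly in index order. This is an assumption-laden illustration, not the library's code: buildRow and buildBatch are hypothetical names, and rows are modeled as plain strings rather than Arrow record batches. If the code is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for converting feature index i into one output row.
std::string buildRow(std::size_t i) { return "row" + std::to_string(i); }

// Hypothetical helper, NOT part of OpenMS: build rows [begin, end) in
// partitions (possibly in parallel), then concatenate them in index order.
std::vector<std::string> buildBatch(std::size_t begin, std::size_t end, int n_threads)
{
  int threads = n_threads;
#ifdef _OPENMP
  if (threads <= 0) threads = omp_get_max_threads(); // 0 = all available cores
#else
  threads = 1; // no OpenMP: the pragma below is ignored, the loop is serial
#endif

  const std::size_t n = end - begin;
  const std::size_t parts =
      std::max<std::size_t>(1, std::min<std::size_t>(static_cast<std::size_t>(threads), n));
  std::vector<std::vector<std::string>> partial(parts);

  // Each partition writes only to its own slot, so no locking is needed.
  #pragma omp parallel for num_threads(threads) schedule(static)
  for (long p = 0; p < static_cast<long>(parts); ++p)
  {
    const std::size_t up = static_cast<std::size_t>(p);
    const std::size_t lo = begin + n * up / parts;
    const std::size_t hi = begin + n * (up + 1) / parts;
    for (std::size_t i = lo; i < hi; ++i) partial[up].push_back(buildRow(i));
  }

  // Serial, index-ordered concatenation: the result does not depend on
  // how many partitions were used or in which order they finished.
  std::vector<std::string> out;
  for (const auto& part : partial) out.insert(out.end(), part.begin(), part.end());
  return out;
}
```

Because the concatenation step is serial and ordered, calling buildBatch with 1, 2, or 8 threads yields the same sequence of rows; only the internal partition boundaries differ.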

Parameters
[in]  cmap        The ConsensusMap to export
[in]  filename    Output file path
[in]  batch_size  Number of consensus features materialized per batch (0 is treated as the default)
[in]  config      Parquet writing options
[in]  n_threads   Number of OpenMP threads for the per-batch build: 1 = serial (default), 0 = all available cores (honors OMP_NUM_THREADS), N = a fixed count of N threads
Returns
true on success, false on error