OpenMS
Export ConsensusMap feature data to Apache Arrow format following QPX feature schema.
#include <OpenMS/FORMAT/ConsensusMapArrowExport.h>
Static Public Member Functions
| static std::shared_ptr< arrow::Table > | exportToArrow (const ConsensusMap &cmap) |
| Export ConsensusMap to Apache Arrow Table. | |
| static bool | exportToParquet (const ConsensusMap &cmap, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{}) |
| Export ConsensusMap to Parquet file. | |
| static bool | exportToParquetStreaming (const ConsensusMap &cmap, const std::string &filename, size_t batch_size=1000000, const ParquetWriteConfig &config=ParquetWriteConfig{}, int n_threads=1) |
| Stream a ConsensusMap to a Parquet file in row batches (bounded peak memory) | |
This class provides static methods to export ConsensusMap data to Apache Arrow Tables and Parquet files. The schema follows the QPX (Quantitative Proteomics Exchange) feature format.
static std::shared_ptr< arrow::Table > exportToArrow (const ConsensusMap &cmap)
Export ConsensusMap to Apache Arrow Table.
Exports consensus features following the QPX feature schema. Each ConsensusFeature becomes one row with identification, quantification, and protein group information.
| [in] | cmap | The ConsensusMap to export |
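static bool exportToParquet (const ConsensusMap &cmap, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})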
Export ConsensusMap to Parquet file.
| [in] | cmap | The ConsensusMap to export |
| [in] | filename | Output file path |
| [in] | config | Parquet writing options |
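static bool exportToParquetStreaming (const ConsensusMap &cmap, const std::string &filename, size_t batch_size=1000000, const ParquetWriteConfig &config=ParquetWriteConfig{}, int n_threads=1)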
Stream a ConsensusMap to a Parquet file in row batches (bounded peak memory)
Functionally equivalent to exportToParquet(), but builds and flushes the feature table one batch at a time (batch_size consensus features per batch) through a persistent parquet::arrow::FileWriter, instead of materializing the whole N-row Arrow table in memory before a single write. For isobaric data (one consensus feature per PSM), N can be in the millions, and the one-shot path's transient memory peak can drive the process into swap or an out-of-memory failure; with streaming, peak memory stays bounded by one batch.
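The batching described above can be sketched as a split of the row index range into consecutive slices. This is a minimal illustration only, not the OpenMS implementation: `batchRanges` is a hypothetical helper, and the real exporter builds an Arrow table per slice and hands it to the Parquet writer rather than returning index pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenMS): split the row range [0, n_rows)
// into consecutive half-open slices of at most batch_size rows each.
std::vector<std::pair<std::size_t, std::size_t>>
batchRanges(std::size_t n_rows, std::size_t batch_size)
{
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (batch_size == 0)
  {
    return ranges; // this sketch expects a positive batch size
  }
  std::size_t begin = 0;
  while (begin < n_rows)
  {
    // The last slice may be shorter than batch_size.
    const std::size_t end = begin + std::min(batch_size, n_rows - begin);
    ranges.emplace_back(begin, end);
    begin = end;
  }
  return ranges;
}
```

For 10 rows and a batch size of 4, the helper yields the slices [0, 4), [4, 8) and [8, 10). Only one slice needs to be materialized at a time, which is what keeps peak memory bounded.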
Each batch can optionally be partitioned and built in parallel with OpenMP; the partitions are then written in index order (the Parquet writer stays serial). The written rows and their order are therefore identical to exportToParquet() and deterministic for any thread count; only the Parquet row-group layout may differ.
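The sketch below illustrates why a partitioned build followed by an in-order write gives the same rows for any thread count. It is an assumption-laden illustration, not the OpenMS code: `buildRow` and `buildBatch` are hypothetical names, rows are plain strings instead of Arrow arrays, and the final concatenation stands in for the serial Parquet write.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for converting one consensus feature into one row.
std::string buildRow(std::size_t index)
{
  return "row_" + std::to_string(index);
}

// Build rows [begin, end) in n_parts partitions, then emit them in index order.
std::vector<std::string> buildBatch(std::size_t begin, std::size_t end, int n_parts)
{
  if (n_parts < 1)
  {
    n_parts = 1;
  }
  const std::size_t n = end > begin ? end - begin : 0;
  const std::size_t parts_count = static_cast<std::size_t>(n_parts);
  std::vector<std::vector<std::string>> parts(parts_count);

  // Each partition writes only to its own slot, so no locking is needed.
  // Without OpenMP enabled, the loop simply runs serially.
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int p = 0; p < n_parts; ++p)
  {
    const std::size_t up = static_cast<std::size_t>(p);
    // Partition p owns the contiguous slice [lo, hi) of the batch.
    const std::size_t lo = begin + n * up / parts_count;
    const std::size_t hi = begin + n * (up + 1) / parts_count;
    parts[up].reserve(hi - lo);
    for (std::size_t i = lo; i < hi; ++i)
    {
      parts[up].push_back(buildRow(i));
    }
  }

  // Serial step: concatenate partitions in index order.
  std::vector<std::string> rows;
  rows.reserve(n);
  for (const auto& part : parts)
  {
    rows.insert(rows.end(), part.begin(), part.end());
  }
  return rows;
}
```

Because every partition covers a contiguous index slice and the slices are appended in ascending order, the output does not depend on which thread finished first or on how many partitions were used.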
| [in] | cmap | The ConsensusMap to export |
| [in] | filename | Output file path |
| [in] | batch_size | Consensus features materialized per batch (0 is treated as the default) |
| [in] | config | Parquet writing options |
| [in] | n_threads | OpenMP threads for the per-batch build: 1 = serial (default), 0 = all available cores (honors OMP_NUM_THREADS), N = exactly N threads |
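The special values of batch_size and n_threads in the table above can be read as a small normalization step. The following is a hedged sketch of that reading, not the library's code: `effectiveBatchSize` and `effectiveThreads` are hypothetical helpers, `std::thread::hardware_concurrency()` is used as a stand-in for the OpenMP runtime query (so OMP_NUM_THREADS is not consulted here), and the handling of negative n_threads is an assumption because the documentation does not specify it.

```cpp
#include <cstddef>
#include <thread>

// Default taken from the documented signature (batch_size=1000000).
constexpr std::size_t kDefaultBatchSize = 1000000;

// Hypothetical helper: 0 is treated as the default batch size.
std::size_t effectiveBatchSize(std::size_t batch_size)
{
  return batch_size == 0 ? kDefaultBatchSize : batch_size;
}

// Hypothetical helper: 1 = serial, N > 1 = exactly N threads, 0 = all cores.
// Assumption: negative values are handled like 0 (not specified in the docs).
int effectiveThreads(int n_threads)
{
  if (n_threads > 0)
  {
    return n_threads;
  }
  // hardware_concurrency() may return 0 when the count is unknown.
  const unsigned hw = std::thread::hardware_concurrency();
  return hw == 0 ? 1 : static_cast<int>(hw);
}
```

With these helpers, `effectiveBatchSize(0)` gives 1000000 and `effectiveThreads(1)` gives 1; the result of `effectiveThreads(0)` depends on the machine.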