OpenMS
ProteinGroupArrowExport Class Reference

Export protein group data to Apache Arrow format following QPX pg schema.

#include <OpenMS/FORMAT/ProteinGroupArrowExport.h>

Static Public Member Functions

static std::shared_ptr< arrow::Table > exportToArrow (const ConsensusMap &cmap)
 Export protein group data to Apache Arrow Table.
 
static bool exportToParquet (const ConsensusMap &cmap, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export protein group data to Parquet file.
 
static std::shared_ptr< arrow::Table > exportToArrow (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications)
 Export protein group data to Arrow table from identification data (no quantification)
 
static bool exportToParquet (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export protein group data to Parquet file from identification data (no quantification)
 
static bool exportToParquet (const std::shared_ptr< arrow::Table > &table, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Write a pre-built QPX protein group Arrow table to a Parquet file.
 

Detailed Description

Export protein group data to Apache Arrow format following QPX pg schema.

This class provides static methods to export protein group quantification data from a ConsensusMap to Apache Arrow Tables and Parquet files. Additional overloads export protein groups from identification data alone, without quantification. The schema follows the QPX (Quantitative Proteomics Exchange) protein group format.

When exporting from a ConsensusMap, protein groups must have quantification annotated via PeptideAndProteinQuant::annotateQuantificationsToProteins() before export.

Experimental classes:
This API is experimental and may change in future versions.

Member Function Documentation

◆ exportToArrow() [1/2]

static std::shared_ptr< arrow::Table > exportToArrow ( const ConsensusMap & cmap )
static

Export protein group data to Apache Arrow Table.

Exports indistinguishable protein groups following the QPX pg schema. One row is emitted per protein group per run file.

Parameters
[in]  cmap  The ConsensusMap with annotated protein group quantification
Returns
Shared pointer to Arrow Table, or nullptr on error
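The "one row per protein group per run file" layout can be pictured with a small standalone sketch. It does not use OpenMS or Arrow: `ProteinGroup`, `PgRow`, and `expandRows` are hypothetical names, only the two field names are borrowed from the QPX pg columns mentioned on this page, and it assumes every group is reported for every run in group-major order, which the exporter itself may not do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not OpenMS or Arrow types.
using ProteinGroup = std::vector<std::string>;  // accessions of one indistinguishable group

struct PgRow
{
  std::vector<std::string> pg_accessions;  // all accessions in the group
  std::string run_file_name;               // run the row refers to
};

// Emit one row per (protein group, run file) pair.
// Assumption: every group appears in every run, in group-major order.
std::vector<PgRow> expandRows(const std::vector<ProteinGroup>& groups,
                              const std::vector<std::string>& run_files)
{
  std::vector<PgRow> rows;
  rows.reserve(groups.size() * run_files.size());
  for (const ProteinGroup& group : groups)
  {
    for (const std::string& run : run_files)
    {
      rows.push_back(PgRow{group, run});
    }
  }
  return rows;
}
```

Under these assumptions, two groups measured across three run files yield six rows, so a consumer that wants one record per group has to aggregate over `run_file_name`.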

◆ exportToArrow() [2/2]

static std::shared_ptr< arrow::Table > exportToArrow ( const std::vector< ProteinIdentification > & protein_identifications,
const PeptideIdentificationList & peptide_identifications
)
static

Export protein group data to an Arrow table from identification data (no quantification).

Intended for search-engine output where no ConsensusMap is available. This overload populates the required QPX pg fields (pg_accessions, anchor_protein, run_file_name, is_decoy, peptides) and sets the quantification columns (intensities, additional_intensities) to null.

Parameters
[in]  protein_identifications  Protein identifications with protein groups
[in]  peptide_identifications  Peptide identifications (for peptide-per-protein counts)
Returns
Shared pointer to Arrow Table (empty table if no groups, never nullptr)
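The two exportToArrow() overloads document different return contracts: the ConsensusMap overload may return nullptr on error, while the identification overload returns an empty table when there are no groups. A caller that accepts tables from either path may want to distinguish the three outcomes. The following is a minimal sketch of such a check; `ExportStatus`, `classifyExport`, and `FakeTable` are hypothetical names that are not part of OpenMS. The template only assumes a `num_rows()` member, which `arrow::Table` provides, so it is exercised here with a stand-in type instead of a real Arrow table.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical caller-side result categories; not part of OpenMS.
enum class ExportStatus
{
  Error,     // null pointer: documented error signal of the ConsensusMap overload
  NoGroups,  // valid table with zero rows: documented result of the identification overload
  Ok         // valid table with at least one row
};

// Works with any table type that exposes num_rows(), as arrow::Table does.
template <typename TableT>
ExportStatus classifyExport(const std::shared_ptr<TableT>& table)
{
  if (!table)
  {
    return ExportStatus::Error;
  }
  if (table->num_rows() == 0)
  {
    return ExportStatus::NoGroups;
  }
  return ExportStatus::Ok;
}

// Stand-in used only to exercise the helper without linking against Arrow.
struct FakeTable
{
  std::int64_t rows;
  std::int64_t num_rows() const { return rows; }
};
```

In real code the argument would be the `std::shared_ptr< arrow::Table >` returned by either overload; checking for null first keeps the helper safe for both.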

◆ exportToParquet() [1/3]

static bool exportToParquet ( const ConsensusMap & cmap,
const String & filename,
const ParquetWriteConfig & config = ParquetWriteConfig{}
)
static

Export protein group data to Parquet file.

Parameters
[in]  cmap      The ConsensusMap with annotated protein group quantification
[in]  filename  Output file path
[in]  config    Parquet writing options
Returns
true on success, false on error

◆ exportToParquet() [2/3]

static bool exportToParquet ( const std::shared_ptr< arrow::Table > & table,
const String & filename,
const ParquetWriteConfig & config = ParquetWriteConfig{}
)
static

Write a pre-built QPX protein group Arrow table to a Parquet file.

The table is expected to follow QPXPgSchema (e.g., a table returned by exportToArrow()). This overload attaches QPX file metadata (qpx_version, file_type="pg", UUID, creation_date) before writing. Use it when the caller has already built the table (e.g., for merged output) to avoid rebuilding it.

Parameters
[in]  table     QPX pg Arrow table (must not be null)
[in]  filename  Output file path
[in]  config    Parquet writing options
Returns
true on success, false on error
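Because this overload requires a non-null table and reports failure only through its bool return value, callers may prefer to wrap it so that both conditions surface as exceptions. The sketch below shows one way to do that; `writeOrThrow` is a hypothetical helper, not an OpenMS function, and the writer is passed in as a callable so the example compiles without OpenMS or Arrow.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical wrapper; not part of OpenMS.
// `write` is any callable taking (table, filename) and returning bool,
// mirroring the true-on-success contract documented for exportToParquet().
template <typename TableT, typename WriteFn>
void writeOrThrow(const std::shared_ptr<TableT>& table,
                  const std::string& filename,
                  WriteFn write)
{
  // Enforce the documented precondition before calling the writer.
  if (!table)
  {
    throw std::invalid_argument("table must not be null");
  }
  // Turn the bool failure signal into an exception that names the file.
  if (!write(table, filename))
  {
    throw std::runtime_error("Parquet export failed: " + filename);
  }
}
```

With OpenMS available, the callable would presumably be a lambda that forwards to `ProteinGroupArrowExport::exportToParquet(table, filename)`; that wiring is an assumption and is left out so the block stays self-contained.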

◆ exportToParquet() [3/3]

static bool exportToParquet ( const std::vector< ProteinIdentification > & protein_identifications,
const PeptideIdentificationList & peptide_identifications,
const String & filename,
const ParquetWriteConfig & config = ParquetWriteConfig{}
)
static

Export protein group data to a Parquet file from identification data (no quantification).

Parameters
[in]  protein_identifications  Protein identifications with protein groups
[in]  peptide_identifications  Peptide identifications (for peptide-per-protein counts)
[in]  filename                 Output file path
[in]  config                   Parquet writing options
Returns
true on success, false on error