Export protein group data to Apache Arrow format following QPX pg schema.
#include <OpenMS/FORMAT/ProteinGroupArrowExport.h>
This class provides static methods to export protein group quantification data from a ConsensusMap to Apache Arrow Tables and Parquet files. The schema follows the QPX (Quantitative Proteomics Exchange) protein group format.
Protein groups must have quantification annotated via PeptideAndProteinQuant::annotateQuantificationsToProteins() before export.
- Experimental classes:
- This API is experimental and may change in future versions.
◆ exportToArrow() [1/2]
static std::shared_ptr< arrow::Table > exportToArrow(const ConsensusMap & cmap)
Export protein group data to Apache Arrow Table.
Exports indistinguishable protein groups following the QPX pg schema. One row is emitted per protein group per run file.
- Parameters
| [in] | cmap | The ConsensusMap with annotated protein group quantification |
- Returns
- Shared pointer to Arrow Table, or nullptr on error
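The row layout described for this overload (one row per protein group per run file) can be pictured with a small stand-alone sketch. It uses neither OpenMS nor Arrow: `GroupQuant`, `PgRow`, and `expandRows` are hypothetical names invented for illustration, and the single `intensity` value per row is a simplification of the QPX pg columns. The exporter's actual internals may differ.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a quantified protein group (not an OpenMS type).
struct GroupQuant
{
  std::vector<std::string> accessions;             // indistinguishable proteins in the group
  std::map<std::string, double> intensity_by_run;  // run file name -> intensity
};

// Hypothetical stand-in for one output row (simplified, not the full QPX pg schema).
struct PgRow
{
  std::vector<std::string> pg_accessions;
  std::string run_file_name;
  double intensity;
};

// Emit one row per (protein group, run file) pair.
std::vector<PgRow> expandRows(const std::vector<GroupQuant>& groups)
{
  std::vector<PgRow> rows;
  for (const auto& group : groups)
  {
    for (const auto& run_and_intensity : group.intensity_by_run)
    {
      rows.push_back(PgRow{group.accessions, run_and_intensity.first, run_and_intensity.second});
    }
  }
  return rows;
}
```

Under these assumptions, a group quantified in two runs contributes two rows and a group quantified in one run contributes one, so the row count is the sum of runs over all groups.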
◆ exportToArrow() [2/2]
Export protein group data to an Arrow Table from identification data (no quantification).
Intended for search-engine output where no ConsensusMap is available. Populates the required QPX pg fields (pg_accessions, anchor_protein, run_file_name, is_decoy, peptides) and sets the quantification columns (intensities, additional_intensities) to null.
- Parameters
| [in] | protein_identifications | Protein identifications with protein groups |
| [in] | peptide_identifications | Peptide identifications (for peptide-per-protein counts) |
- Returns
- Shared pointer to Arrow Table (empty table if no groups, never nullptr)
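As a rough illustration of the identification-only case, the sketch below builds one row with per-protein peptide counts and a flag marking the quantification columns as null. It is a stand-alone model, not the OpenMS implementation: `PeptideEvidence`, `IdOnlyPgRow`, and `buildIdOnlyRow` are hypothetical names, and counting distinct peptide sequences per accession is an assumption about how the peptide-per-protein counts could be derived.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide hit and the proteins it maps to.
struct PeptideEvidence
{
  std::string sequence;
  std::vector<std::string> protein_accessions;
};

// Hypothetical identification-only row; quantification is flagged as null.
struct IdOnlyPgRow
{
  std::vector<std::string> pg_accessions;
  std::map<std::string, std::size_t> peptides;  // accession -> distinct peptide count
  bool intensities_is_null;
};

IdOnlyPgRow buildIdOnlyRow(const std::vector<std::string>& group_accessions,
                           const std::vector<PeptideEvidence>& peptides)
{
  // Start with an empty set per group member so proteins without peptides count as 0.
  std::map<std::string, std::set<std::string> > sequences_by_protein;
  for (const auto& accession : group_accessions)
  {
    sequences_by_protein[accession];
  }

  // Collect distinct sequences, ignoring proteins outside this group.
  for (const auto& peptide : peptides)
  {
    for (const auto& accession : peptide.protein_accessions)
    {
      auto it = sequences_by_protein.find(accession);
      if (it != sequences_by_protein.end())
      {
        it->second.insert(peptide.sequence);
      }
    }
  }

  IdOnlyPgRow row;
  row.pg_accessions = group_accessions;
  row.intensities_is_null = true;  // no ConsensusMap, so nothing to quantify
  for (const auto& entry : sequences_by_protein)
  {
    row.peptides[entry.first] = entry.second.size();
  }
  return row;
}
```

The boolean flag mirrors the idea of a null column value; in a real Arrow table, nulls are tracked by the array's validity information rather than a separate field.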
◆ exportToParquet() [1/3]
Export protein group data to Parquet file.
- Parameters
| [in] | cmap | The ConsensusMap with annotated protein group quantification |
| [in] | filename | Output file path |
| [in] | config | Parquet writing options |
- Returns
- true on success, false on error
◆ exportToParquet() [2/3]
Write a pre-built QPX protein group Arrow table to a Parquet file.
The table is expected to follow QPXPgSchema (e.g., from exportToArrow). Attaches QPX file metadata (qpx_version, file_type="pg", UUID, creation_date) before writing. Use this overload when the caller already has the table built (e.g., for merged output) to avoid rebuilding it.
- Parameters
| [in] | table | QPX pg Arrow table (must not be null) |
| [in] | filename | Output file path |
| [in] | config | Parquet writing options |
- Returns
- true on success, false on error
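The file metadata attached by this overload (qpx_version, file_type="pg", UUID, creation_date) can be modeled as a plain key-value map. The sketch below is stand-alone and does not call OpenMS, Arrow, or Parquet. The function names, the `uuid` key spelling, the random version 4 UUID, and the ISO 8601 UTC timestamp format are all assumptions made for illustration, and the version string is passed in by the caller because the actual value is not stated here.

```cpp
#include <cstddef>
#include <ctime>
#include <map>
#include <random>
#include <string>

// Generate a random (version 4) UUID string of the form 8-4-4-4-12 hex digits.
std::string makeUuidV4()
{
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<int> hex(0, 15);
  const char* digits = "0123456789abcdef";

  std::string uuid(36, '-');
  for (std::size_t i = 0; i < uuid.size(); ++i)
  {
    if (i == 8 || i == 13 || i == 18 || i == 23) continue;  // keep the dashes
    if (i == 14) { uuid[i] = '4'; continue; }               // version nibble
    int value = hex(gen);
    if (i == 19) value = (value & 0x3) | 0x8;               // variant bits 10xx
    uuid[i] = digits[value];
  }
  return uuid;
}

// Current UTC time as an ISO 8601 string (the format QPX expects is an assumption).
std::string utcTimestamp()
{
  std::time_t now = std::time(nullptr);
  char buffer[32];
  std::strftime(buffer, sizeof(buffer), "%Y-%m-%dT%H:%M:%SZ", std::gmtime(&now));
  return buffer;
}

// Assemble the four metadata entries named in the description above.
std::map<std::string, std::string> makePgFileMetadata(const std::string& qpx_version)
{
  std::map<std::string, std::string> metadata;
  metadata["qpx_version"] = qpx_version;  // caller-supplied; real value not shown here
  metadata["file_type"] = "pg";
  metadata["uuid"] = makeUuidV4();        // key name is hypothetical
  metadata["creation_date"] = utcTimestamp();
  return metadata;
}
```

Generating the UUID and timestamp at write time, rather than when the table is built, fits the description of this overload: a table that was built once (for example, merged output) still receives fresh per-file metadata when it is written.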
◆ exportToParquet() [3/3]
Export protein group data to a Parquet file from identification data (no quantification).
- Parameters
| [in] | protein_identifications | Protein identifications with protein groups |
| [in] | peptide_identifications | Peptide identifications (for peptide-per-protein counts) |
| [in] | filename | Output file path |
| [in] | config | Parquet writing options |
- Returns
- true on success, false on error