OpenMS
QPXFile Class Reference

Export PSM (Peptide Spectrum Match) data to Apache Arrow format following QPX PSM schema.

#include <OpenMS/FORMAT/QPXFile.h>

Static Public Member Functions

static std::shared_ptr< arrow::Table > exportToArrow (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, bool export_all_psms=false)
 Export PSMs to Arrow table using PSMSchema for lossless round-trips.
 
static std::shared_ptr< arrow::Table > exportPSMsToQPXArrow (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, bool export_all_psms=false)
 Export PSMs to QPX Parquet eXchange format Arrow table (QPXPSMSchema).
 
static bool exportToParquet (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, const String &filename, bool export_all_psms=false, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export PSM data to Parquet file.
 
static bool exportToParquet (const std::shared_ptr< arrow::Table > &table, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Write a pre-built QPX PSM Arrow table to a Parquet file.
 

Detailed Description

Export PSM (Peptide Spectrum Match) data to Apache Arrow format following QPX PSM schema.

This class provides static methods to export PeptideIdentification/ProteinIdentification data to Apache Arrow Tables and Parquet files. The schema follows the QPX (Quantitative Proteomics Exchange) PSM format.

Experimental classes:
This API is experimental and may change in future versions.

Member Function Documentation

◆ exportPSMsToQPXArrow()

static std::shared_ptr< arrow::Table > exportPSMsToQPXArrow (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, bool export_all_psms=false)

Export PSMs to QPX Parquet eXchange format Arrow table (QPXPSMSchema).

Unlike exportToArrow() which produces a PSMSchema table for lossless round-trips, this method produces a QPXPSMSchema table optimized for cross-tool exchange (quantms format).

Parameters
protein_identifications: Protein identifications (for file name lookup)
peptide_identifications: Peptide identifications to export
export_all_psms: If true, export all PSM hits; if false, export only the best hit per spectrum
Returns
Arrow table with QPXPSMSchema columns, or nullptr on failure
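
The effect of export_all_psms can be illustrated with a minimal, self-contained sketch. The types Hit, SpectrumMatches and Row and the function selectRows are hypothetical stand-ins, not OpenMS classes, and the sketch assumes that each spectrum's hits are already stored best-first; how QPXFile itself determines the best hit is not stated in this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT OpenMS types.
struct Hit
{
  std::string sequence;
  double score;
};

struct SpectrumMatches
{
  std::string spectrum_ref;
  std::vector<Hit> hits; // ASSUMPTION: stored best-first
};

struct Row
{
  std::string spectrum_ref;
  std::string sequence;
  double score;
  std::size_t rank; // 1-based position of the hit within its spectrum
};

// Flatten per-spectrum hits into table rows, mirroring the documented switch:
// all hits when export_all_psms is true, otherwise only the first (best) hit.
std::vector<Row> selectRows(const std::vector<SpectrumMatches>& ids,
                            bool export_all_psms = false)
{
  std::vector<Row> rows;
  for (const SpectrumMatches& id : ids)
  {
    if (id.hits.empty()) continue; // a spectrum without hits contributes no row
    const std::size_t n = export_all_psms ? id.hits.size() : 1;
    for (std::size_t i = 0; i < n; ++i)
    {
      rows.push_back({id.spectrum_ref, id.hits[i].sequence, id.hits[i].score, i + 1});
    }
  }
  return rows;
}

// Usage: three spectra holding two, one and zero hits respectively.
void exampleSelectRows()
{
  const SpectrumMatches s1{"scan=1", {Hit{"PEPTIDEK", 0.9}, Hit{"PEPTLDEK", 0.4}}};
  const SpectrumMatches s2{"scan=2", {Hit{"ELVISK", 0.7}}};
  const SpectrumMatches s3{"scan=3", {}};
  const std::vector<SpectrumMatches> ids{s1, s2, s3};

  assert(selectRows(ids).size() == 2);       // best hit of scan=1 and scan=2
  assert(selectRows(ids, true).size() == 3); // every hit of every spectrum
}
```

With the default (false) the row count is bounded by the number of spectra that have at least one hit, so the flag mainly trades file size against keeping lower-ranked candidates for later rescoring.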

◆ exportToArrow()

static std::shared_ptr< arrow::Table > exportToArrow (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, bool export_all_psms=false)

Export PSMs to Arrow table using PSMSchema for lossless round-trips.

Produces a table with PSMSchema columns (score, score_type, rank, etc.) suitable for FeatureMapArrowIO and ConsensusMapArrowIO round-trips. For QPX exchange format output, use exportPSMsToQPXArrow() instead.

◆ exportToParquet() [1/2]

static bool exportToParquet (const std::shared_ptr< arrow::Table > &table, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})

Write a pre-built QPX PSM Arrow table to a Parquet file.

The table is expected to follow QPXPSMSchema (e.g., from exportPSMsToQPXArrow). Attaches QPX file metadata (qpx_version, file_type="psm", UUID, creation_date) before writing. Use this overload when the caller already has the table built (e.g., for merged output) to avoid rebuilding it.

Parameters
[in] table: QPX PSM Arrow table (must not be null)
[in] filename: Output file path
[in] config: Parquet writing options
Returns
true on success, false on error
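
The reason for having a table-taking overload can be sketched without Arrow: build the table once, then write it to several destinations, checking the boolean result each time. Everything below is a hypothetical stand-in (Table, FakeFileSystem, buildTable, writePrebuilt, writeFromIds); it is not the OpenMS or Arrow API, it writes to an in-memory map instead of a Parquet file, and it is only an assumption that the convenience overload delegates to the table overload. Only the file_type="psm" entry is set, since that is the one metadata value this reference spells out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an Arrow table; NOT arrow::Table.
struct Table
{
  std::vector<std::string> rows;
  std::map<std::string, std::string> metadata;
};

// Hypothetical in-memory replacement for the file system (filename -> content).
using FakeFileSystem = std::map<std::string, Table>;

// Stand-in for the conversion step that turns identifications into a table.
std::shared_ptr<Table> buildTable(const std::vector<std::string>& psms)
{
  auto table = std::make_shared<Table>();
  table->rows = psms;
  return table;
}

// Table overload: reject a null table, attach file-level metadata to a copy
// (ASSUMPTION: the caller's table is left untouched), then "write" it.
bool writePrebuilt(const std::shared_ptr<Table>& table,
                   const std::string& filename,
                   FakeFileSystem& fs)
{
  if (!table || filename.empty()) return false;
  Table out = *table;
  out.metadata["file_type"] = "psm";
  fs[filename] = out;
  return true;
}

// Convenience overload: build the table, then delegate (ASSUMPTION).
bool writeFromIds(const std::vector<std::string>& psms,
                  const std::string& filename,
                  FakeFileSystem& fs)
{
  return writePrebuilt(buildTable(psms), filename, fs);
}

// Usage: one build, two writes, and a rejected null table.
void exampleWriteTwice()
{
  FakeFileSystem fs;
  const std::shared_ptr<Table> merged = buildTable({"psm_a", "psm_b"});
  const bool ok_first = writePrebuilt(merged, "run1.parquet", fs);
  const bool ok_second = writePrebuilt(merged, "run2.parquet", fs);
  const bool ok_null = writePrebuilt(nullptr, "bad.parquet", fs);
  assert(ok_first && ok_second);
  assert(!ok_null);
  assert(fs.size() == 2);
}
```

Because the functions report failure through a bool rather than an exception, callers should check the return value of every write, as exampleWriteTwice does.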

◆ exportToParquet() [2/2]

static bool exportToParquet (const std::vector< ProteinIdentification > &protein_identifications, const PeptideIdentificationList &peptide_identifications, const String &filename, bool export_all_psms=false, const ParquetWriteConfig &config=ParquetWriteConfig{})

Export PSM data to Parquet file.

Parameters
[in] protein_identifications: Vector of protein identifications
[in] peptide_identifications: List of peptide identifications
[in] filename: Output file path
[in] export_all_psms: If true, export all hits per spectrum (default: false, only best hit)
[in] config: Parquet writing options
Returns
true on success, false on error