OpenMS
OpenMS::ArrowIOHelpers Namespace Reference

Public helpers for writing and concatenating Arrow tables to Parquet files.

Functions

std::string generateUuidV4 ()
 Generate a lowercase hyphenated RFC 4122 version-4 UUID string.
 
bool writeTableToParquet (const std::shared_ptr< arrow::Table > &table, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Write an Arrow table to a Parquet file.
 
bool concatenateAndWriteToParquet (const std::vector< std::shared_ptr< arrow::Table > > &tables, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Concatenate a vector of Arrow tables and write the result to a Parquet file.
 
std::shared_ptr< arrow::Array > getColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name, bool required=true)
 Fetch a named column from a table, combining chunks if needed.
 
std::string getStringValue (const std::shared_ptr< arrow::Array > &array, int64_t row)
 Read a string at row, or "" if null/out-of-bounds.
 
double getDoubleValue (const std::shared_ptr< arrow::Array > &array, int64_t row, double default_val=0.0)
 Read a double at row, or default_val if null.
 
float getFloatValue (const std::shared_ptr< arrow::Array > &array, int64_t row, float default_val=0.0f)
 Read a float at row, or default_val if null.
 
int32_t getInt32Value (const std::shared_ptr< arrow::Array > &array, int64_t row, int32_t default_val=0)
 Read an int32 at row, or default_val if null.
 
int64_t getInt64Value (const std::shared_ptr< arrow::Array > &array, int64_t row, int64_t default_val=0)
 Read an int64 at row, or default_val if null.
 
bool getBoolValue (const std::shared_ptr< arrow::Array > &array, int64_t row, bool default_val=false)
 Read a bool at row, or default_val if null.
 
bool isNull (const std::shared_ptr< arrow::Array > &array, int64_t row)
 Check whether array is null at row (or the array itself is unset).
 
void readMetaValues (const std::shared_ptr< arrow::Array > &array, int64_t row, MetaInfoInterface &target, const std::unordered_set< std::string > &excluded_keys={})
 Read metavalues from a list<struct{name,value,value_type}> column.
 

Detailed Description

Public helpers for writing and concatenating Arrow tables to Parquet files.

TOPP tools link against libOpenMS (which exports these helpers) but not directly against Arrow/Parquet. These wrappers keep all Arrow/Parquet API calls inside libOpenMS so downstream binaries don't need to import Arrow symbols.

Function Documentation

◆ concatenateAndWriteToParquet()

bool concatenateAndWriteToParquet ( const std::vector< std::shared_ptr< arrow::Table > > &  tables,
const std::string &  filename,
const ParquetWriteConfig &  config = ParquetWriteConfig{} 
)

Concatenate a vector of Arrow tables and write the result to a Parquet file.

All tables must share the same schema. An empty input vector is a no-op (returns true without writing).

Parameters
[in]  tables    Vector of Arrow tables to concatenate (must share schema)
[in]  filename  Output file path
[in]  config    Parquet writer configuration
Returns
true on success (or if tables is empty), false on error

◆ generateUuidV4()

std::string generateUuidV4 ( )

Generate a lowercase hyphenated RFC 4122 version-4 UUID string.

Used by QPX Parquet exporters when attaching file metadata.

Returns
UUID string, e.g. "550e8400-e29b-41d4-a716-446655440000"
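
To make the documented output format concrete, the following is a minimal standalone sketch of how a lowercase hyphenated version-4 UUID can be assembled. It is not the OpenMS implementation; the function name makeUuidV4Sketch and the choice of std::mt19937_64 as the random source are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical stand-in, NOT the OpenMS implementation of generateUuidV4().
// Produces 16 random bytes, stamps the RFC 4122 version and variant bits,
// and formats them as 8-4-4-4-12 lowercase hex groups.
std::string makeUuidV4Sketch()
{
  // Assumption: a per-thread Mersenne Twister seeded from std::random_device.
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<std::uint32_t> byte_dist(0, 255);

  std::array<std::uint8_t, 16> bytes{};
  for (auto& b : bytes)
  {
    b = static_cast<std::uint8_t>(byte_dist(rng));
  }

  // Version 4: high nibble of byte 6 is 0100.
  bytes[6] = static_cast<std::uint8_t>((bytes[6] & 0x0F) | 0x40);
  // RFC 4122 variant: top two bits of byte 8 are 10.
  bytes[8] = static_cast<std::uint8_t>((bytes[8] & 0x3F) | 0x80);

  static const char hex[] = "0123456789abcdef";
  std::string out;
  out.reserve(36);
  for (std::size_t i = 0; i < bytes.size(); ++i)
  {
    // Hyphens separate the groups after bytes 4, 6, 8 and 10.
    if (i == 4 || i == 6 || i == 8 || i == 10)
    {
      out.push_back('-');
    }
    out.push_back(hex[bytes[i] >> 4]);
    out.push_back(hex[bytes[i] & 0x0F]);
  }
  return out;
}
```

In any string of this form, the character at index 14 is always '4' and the character at index 19 is one of '8', '9', 'a' or 'b', which is a quick way to sanity-check a value returned by generateUuidV4().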

◆ getBoolValue()

bool getBoolValue ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
bool  default_val = false 
)

Read a bool at row, or default_val if null.

◆ getColumn()

std::shared_ptr< arrow::Array > getColumn ( const std::shared_ptr< arrow::Table > &  table,
const std::string &  name,
bool  required = true 
)

Fetch a named column from a table, combining chunks if needed.

Returns nullptr if the column is missing or contains no chunks. When required is true, missing columns are logged as errors.

◆ getDoubleValue()

double getDoubleValue ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
double  default_val = 0.0 
)

Read a double at row, or default_val if null.

◆ getFloatValue()

float getFloatValue ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
float  default_val = 0.0f 
)

Read a float at row, or default_val if null.

◆ getInt32Value()

int32_t getInt32Value ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
int32_t  default_val = 0 
)

Read an int32 at row, or default_val if null.

◆ getInt64Value()

int64_t getInt64Value ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
int64_t  default_val = 0 
)

Read an int64 at row, or default_val if null.

◆ getStringValue()

std::string getStringValue ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row 
)

Read a string at row, or "" if null/out-of-bounds.
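
The fallback contract described above can be modelled without Arrow. The sketch below uses a std::vector of std::optional<std::string> as a hypothetical stand-in for a nullable string column; the type alias StringColumnSketch and the function getStringOrEmptySketch are illustrative names, not part of OpenMS, and treating a negative row as out-of-bounds is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a nullable Arrow string column.
using StringColumnSketch = std::vector<std::optional<std::string>>;

// Mirrors the documented contract of getStringValue(): return the string
// at `row`, or "" if the cell is null or `row` lies outside the column.
std::string getStringOrEmptySketch(const StringColumnSketch& column, std::int64_t row)
{
  // Assumption: negative indices count as out-of-bounds.
  if (row < 0 || row >= static_cast<std::int64_t>(column.size()))
  {
    return "";
  }
  const auto& cell = column[static_cast<std::size_t>(row)];
  return cell.has_value() ? *cell : std::string{};
}
```

One consequence of this contract is that a null cell and a genuinely empty string both come back as "". Callers that need to tell the two apart would presumably check isNull() first.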

◆ isNull()

bool isNull ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row 
)

Check whether array is null at row (or the array itself is unset).

◆ readMetaValues()

void readMetaValues ( const std::shared_ptr< arrow::Array > &  array,
int64_t  row,
MetaInfoInterface &  target,
const std::unordered_set< std::string > &  excluded_keys = {} 
)

Read metavalues from a list<struct{name,value,value_type}> column.

Decodes typed entries (int, double/float, *_list, string) and assigns them to target. Keys in excluded_keys are skipped.
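
As a rough illustration of this kind of type-tagged decoding, the sketch below processes a list of (name, value, value_type) entries into a typed map while skipping excluded keys. Everything in it is hypothetical: the struct MetaEntrySketch, the assumption that values are stored as text, and the tag spellings "int", "double" and "float" are not taken from OpenMS, and the *_list types are left out for brevity.

```cpp
#include <map>
#include <string>
#include <unordered_set>
#include <variant>
#include <vector>

// Hypothetical flattened form of one list<struct{name,value,value_type}> element.
// Assumption: the value is carried as text and value_type names its real type.
struct MetaEntrySketch
{
  std::string name;
  std::string value;
  std::string value_type;
};

using MetaValueSketch = std::variant<long long, double, std::string>;

// Decode entries into a typed map, skipping any key in excluded_keys.
// The tag spellings below are illustrative assumptions; list types are omitted.
std::map<std::string, MetaValueSketch> decodeMetaEntriesSketch(
    const std::vector<MetaEntrySketch>& entries,
    const std::unordered_set<std::string>& excluded_keys = {})
{
  std::map<std::string, MetaValueSketch> target;
  for (const auto& e : entries)
  {
    if (excluded_keys.count(e.name) != 0)
    {
      continue; // excluded keys are never written to the target
    }
    if (e.value_type == "int")
    {
      target[e.name] = std::stoll(e.value);
    }
    else if (e.value_type == "double" || e.value_type == "float")
    {
      target[e.name] = std::stod(e.value);
    }
    else
    {
      target[e.name] = e.value; // fall back to the raw string
    }
  }
  return target;
}
```

In the real function the decoded values are assigned to the MetaInfoInterface passed as target rather than returned; the map here only stands in for that object.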

◆ writeTableToParquet()

bool writeTableToParquet ( const std::shared_ptr< arrow::Table > &  table,
const std::string &  filename,
const ParquetWriteConfig &  config = ParquetWriteConfig{} 
)

Write an Arrow table to a Parquet file.

Parameters
[in]  table     The Arrow table to write (must not be null)
[in]  filename  Output file path
[in]  config    Parquet writer configuration (compression, row group size, ...)
Returns
true on success, false on error (errors are logged)