OpenMS
Public helpers for writing and concatenating Arrow tables to Parquet files.
Functions | |
| std::string | generateUuidV4 () |
| Generate a lowercase hyphenated RFC 4122 version-4 UUID string. | |
| bool | writeTableToParquet (const std::shared_ptr< arrow::Table > &table, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{}) |
| Write an Arrow table to a Parquet file. | |
| bool | concatenateAndWriteToParquet (const std::vector< std::shared_ptr< arrow::Table > > &tables, const std::string &filename, const ParquetWriteConfig &config=ParquetWriteConfig{}) |
| Concatenate a vector of Arrow tables and write the result to a Parquet file. | |
| std::shared_ptr< arrow::Array > | getColumn (const std::shared_ptr< arrow::Table > &table, const std::string &name, bool required=true) |
| Fetch a named column from a table, combining chunks if needed. | |
| std::string | getStringValue (const std::shared_ptr< arrow::Array > &array, int64_t row) |
| Read a string at row, or "" if null/out-of-bounds. | |
| double | getDoubleValue (const std::shared_ptr< arrow::Array > &array, int64_t row, double default_val=0.0) |
| Read a double at row, or default_val if null. | |
| float | getFloatValue (const std::shared_ptr< arrow::Array > &array, int64_t row, float default_val=0.0f) |
| Read a float at row, or default_val if null. | |
| int32_t | getInt32Value (const std::shared_ptr< arrow::Array > &array, int64_t row, int32_t default_val=0) |
| Read an int32 at row, or default_val if null. | |
| int64_t | getInt64Value (const std::shared_ptr< arrow::Array > &array, int64_t row, int64_t default_val=0) |
| Read an int64 at row, or default_val if null. | |
| bool | getBoolValue (const std::shared_ptr< arrow::Array > &array, int64_t row, bool default_val=false) |
| Read a bool at row, or default_val if null. | |
| bool | isNull (const std::shared_ptr< arrow::Array > &array, int64_t row) |
| Whether array is null at row (or unset). | |
| void | readMetaValues (const std::shared_ptr< arrow::Array > &array, int64_t row, MetaInfoInterface &target, const std::unordered_set< std::string > &excluded_keys={}) |
| Read metavalues from a list<struct{name,value,value_type}> column. | |
Public helpers for writing and concatenating Arrow tables to Parquet files.
TOPP tools link against libOpenMS (which exports these helpers) but not directly against Arrow/Parquet. These wrappers keep all Arrow/Parquet API calls inside libOpenMS so downstream binaries don't need to import Arrow symbols.
| bool concatenateAndWriteToParquet | ( | const std::vector< std::shared_ptr< arrow::Table > > & | tables, |
| const std::string & | filename, | ||
| const ParquetWriteConfig & | config = ParquetWriteConfig{} |
||
| ) |
Concatenate a vector of Arrow tables and write the result to a Parquet file.
All tables must share the same schema. An empty input vector is a no-op (returns true without writing).
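The contract described above (shared schema required, empty input is a successful no-op) can be illustrated with a toy model. This is only a sketch using plain standard-library types: `ToyTable`, `toyConcatenateAndWrite`, and the `written` vector that stands in for the output file are hypothetical names, and none of this is the OpenMS or Arrow implementation.

```cpp
#include <string>
#include <vector>

// Toy stand-in for arrow::Table (hypothetical): column names plus row-major values.
struct ToyTable
{
  std::vector<std::string> schema;
  std::vector<std::vector<double>> rows;
};

// Hypothetical model of the documented contract. `written` stands in for the
// Parquet file on disk. Returns true on success or empty input, false on a
// schema mismatch (one possible error case).
bool toyConcatenateAndWrite(const std::vector<ToyTable>& tables,
                            std::vector<ToyTable>& written)
{
  if (tables.empty()) return true; // no-op: nothing is written

  ToyTable merged;
  merged.schema = tables.front().schema;
  for (const ToyTable& t : tables)
  {
    if (t.schema != merged.schema) return false; // all tables must share one schema
    merged.rows.insert(merged.rows.end(), t.rows.begin(), t.rows.end());
  }
  written.push_back(merged); // "write" only after every table was accepted
  return true;
}
```

Because an empty input also returns true, a caller that needs the output file to exist afterwards should check for an empty vector before calling.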
| [in] | tables | Vector of Arrow tables to concatenate (must share schema) |
| [in] | filename | Output file path |
| [in] | config | Parquet writer configuration |
Returns: true on success (or if tables is empty), false on error
| std::string generateUuidV4 | ( | ) |
Generate a lowercase hyphenated RFC 4122 version-4 UUID string.
Used by QPX Parquet exporters when attaching file metadata.
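For readers unfamiliar with the format, the following sketch shows what a lowercase hyphenated version-4 UUID looks like and how the version and variant bits are set. It is an illustration under stated assumptions, not the OpenMS implementation: `sketchUuidV4` is a hypothetical name, and the choice of `std::mt19937_64` as the random source is an assumption (it is not cryptographically secure).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Illustrative only: NOT the OpenMS implementation of generateUuidV4().
std::string sketchUuidV4()
{
  // Assumption: a per-thread Mersenne Twister seeded from std::random_device.
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<unsigned int> byte_dist(0, 255);

  std::array<std::uint8_t, 16> bytes{};
  for (auto& b : bytes) b = static_cast<std::uint8_t>(byte_dist(rng));

  // RFC 4122: high nibble of byte 6 is the version (4),
  // top two bits of byte 8 are the variant (binary 10).
  bytes[6] = static_cast<std::uint8_t>((bytes[6] & 0x0F) | 0x40);
  bytes[8] = static_cast<std::uint8_t>((bytes[8] & 0x3F) | 0x80);

  static const char* hex = "0123456789abcdef";
  std::string out;
  out.reserve(36);
  for (std::size_t i = 0; i < bytes.size(); ++i)
  {
    // Hyphens separate the 8-4-4-4-12 hex digit groups.
    if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
    out.push_back(hex[bytes[i] >> 4]);
    out.push_back(hex[bytes[i] & 0x0F]);
  }
  return out;
}
```

The result is always 36 characters long, with `4` as the first digit of the third group and one of `8`, `9`, `a`, or `b` as the first digit of the fourth group.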
| bool getBoolValue | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| bool | default_val = false |
||
| ) |
Read a bool at row, or default_val if null.
| std::shared_ptr< arrow::Array > getColumn | ( | const std::shared_ptr< arrow::Table > & | table, |
| const std::string & | name, | ||
| bool | required = true |
||
| ) |
Fetch a named column from a table, combining chunks if needed.
Returns nullptr if the column is missing or contains no chunks. When required is true, missing columns are logged as errors.
| double getDoubleValue | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| double | default_val = 0.0 |
||
| ) |
Read a double at row, or default_val if null.
| float getFloatValue | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| float | default_val = 0.0f |
||
| ) |
Read a float at row, or default_val if null.
| int32_t getInt32Value | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| int32_t | default_val = 0 |
||
| ) |
Read an int32 at row, or default_val if null.
| int64_t getInt64Value | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| int64_t | default_val = 0 |
||
| ) |
Read an int64 at row, or default_val if null.
| std::string getStringValue | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row | ||
| ) |
Read a string at row, or "" if null/out-of-bounds.
| bool isNull | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row | ||
| ) |
Whether array is null at row (or unset).
| void readMetaValues | ( | const std::shared_ptr< arrow::Array > & | array, |
| int64_t | row, | ||
| MetaInfoInterface & | target, | ||
| const std::unordered_set< std::string > & | excluded_keys = {} |
||
| ) |
Read metavalues from a list<struct{name,value,value_type}> column.
Decodes typed entries (int, double/float, *_list, string) and assigns them to target. Keys in excluded_keys are skipped.
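The decoding step can be pictured with a small stand-alone model. Everything here is hypothetical and simplified: `ToyMetaEntry`, `ToyMetaValue`, and `toyReadMetaValues` are illustrative names, a `std::map` stands in for `MetaInfoInterface`, the value is assumed to be stored as text, the type tags `"int"`, `"double"`, `"float"`, and `"string"` are assumed spellings, and list types are left out.

```cpp
#include <map>
#include <string>
#include <unordered_set>
#include <variant>
#include <vector>

// Toy stand-in for one struct{name,value,value_type} entry (hypothetical layout).
struct ToyMetaEntry
{
  std::string name;
  std::string value;      // assumption: the value is stored as text
  std::string value_type; // assumption: tags such as "int", "double", "string"
};

using ToyMetaValue = std::variant<int, double, std::string>;

// Hypothetical model: decode each entry by its type tag and assign it to
// target, skipping any key listed in excluded_keys.
void toyReadMetaValues(const std::vector<ToyMetaEntry>& entries,
                       std::map<std::string, ToyMetaValue>& target,
                       const std::unordered_set<std::string>& excluded_keys = {})
{
  for (const ToyMetaEntry& e : entries)
  {
    if (excluded_keys.count(e.name) != 0) continue;

    if (e.value_type == "int")
    {
      target[e.name] = std::stoi(e.value);
    }
    else if (e.value_type == "double" || e.value_type == "float")
    {
      target[e.name] = std::stod(e.value);
    }
    else
    {
      target[e.name] = e.value; // fall back to the raw string
    }
  }
}
```

Excluding keys is useful when some metavalues have already been mapped to dedicated fields and should not be assigned a second time.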
| bool writeTableToParquet | ( | const std::shared_ptr< arrow::Table > & | table, |
| const std::string & | filename, | ||
| const ParquetWriteConfig & | config = ParquetWriteConfig{} |
||
| ) |
Write an Arrow table to a Parquet file.
| [in] | table | The Arrow table to write (must not be null) |
| [in] | filename | Output file path |
| [in] | config | Parquet writer configuration (compression, row group size, ...) |