OpenMS
Configuration for Parquet file writing.
#include <OpenMS/FORMAT/MSExperimentArrowExport.h>
Public Types

  enum class Compression { NONE, SNAPPY, GZIP, LZ4, ZSTD }
      Compression algorithm.

Public Attributes

  Compression compression = Compression::ZSTD
      Compression algorithm (default: ZSTD for best ratio/speed).
  int compression_level = 3
      Compression level (interpretation depends on algorithm).
  int64_t row_group_size = 128 * 1024 * 1024
      Target row group size in bytes (default: 128 MB).
  bool write_statistics = true
      Write column statistics for each row group.
  int64_t data_page_size = 1024 * 1024
      Data page size in bytes (default: 1 MB).
Configuration for Parquet file writing.
Controls compression, row group size, and other Parquet-specific settings. These settings affect file size, read performance, and memory usage.
Parquet automatically applies run-length encoding (RLE) and dictionary encoding where beneficial during write. Repetitive values (like ms_level repeated per peak in long format) are compressed efficiently without explicit configuration.
Performance guidelines for MS data:

- Keep the default ZSTD at level 3 for a good ratio/speed balance; raise the level only when file size matters more than write time.
- The 128 MB default row group typically holds 1-2 million long-format peak rows; use smaller groups when readers need more parallelism, larger ones for better compression.
- Leave write_statistics enabled so m/z and RT range queries can skip row groups via predicate pushdown.
Compression compression = Compression::ZSTD

    Compression algorithm (default: ZSTD for best ratio/speed).
int compression_level = 3

    Compression level (interpretation depends on algorithm):
    - ZSTD: 1-22 (default 3; higher = better ratio, slower)
    - GZIP: 1-9 (default 6)
    - LZ4/SNAPPY: ignored
int64_t data_page_size = 1024 * 1024

    Data page size in bytes (default: 1 MB). Affects the granularity of reads within a row group.
int64_t row_group_size = 128 * 1024 * 1024

    Target row group size in bytes (default: 128 MB). Smaller groups give readers more parallelism; larger groups compress better. For MS data with millions of peaks, 128 MB typically yields 1-2 million rows per group.
bool write_statistics = true

    Write column statistics (min/max/null_count) for each row group. Statistics enable predicate pushdown for efficient m/z and RT range queries, at a small overhead (~1% file size increase).