All classes for file I/O can be found in the FORMAT folder. For most file formats, you can use the generic interfaces described in the section "File adapter classes". For some applications (e.g. where CPU or memory usage is a concern), also consult the specific interfaces that OpenMS provides for the mzML data format (see section "File I/O for MzML").
The interface of most file adapter classes is very similar. They implement a load and a store method, each of which takes a file name and the appropriate data structure. Usually these methods expect a complete in-memory representation of an MS run (see below for other ways to access MS data).
The following example (Tutorial_FileIO.cpp) demonstrates the use of OpenMS::MzMLFile and OpenMS::MzXMLFile to convert one format into another using OpenMS::MSExperiment to hold the temporary data:
For MzML, several additional interfaces exist that make some data processing tasks easier. In particular, it is not always feasible to load the complete data of an LC-MS/MS run into memory, and special interfaces exist for these cases. One such interface is OpenMS::OnDiscMSExperiment, which abstracts an mzML file that contains an index. Individual spectra and chromatograms can be obtained by calling getSpectrum or getChromatogram.
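To illustrate why an index makes on-disc random access cheap, the following is a minimal sketch that is independent of OpenMS. The class IndexedRecordStore and its members are hypothetical names invented for this illustration, a string stream stands in for the file on disk, and one text line stands in for one spectrum. It is not how OnDiscMSExperiment is implemented; it only shows the general idea of seeking to a stored offset and parsing a single record.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an indexed file (not an OpenMS class): records are
// stored back to back, and an index remembers the byte offset of each record.
class IndexedRecordStore
{
public:
  // Append one record (must not contain '\n') and note its start offset.
  void append(const std::string& record)
  {
    offsets_.push_back(bytes_written_);
    data_ << record << '\n';
    bytes_written_ += static_cast<std::streamoff>(record.size()) + 1;
  }

  std::size_t size() const { return offsets_.size(); }

  // Random access: jump to the indexed offset and read only that record.
  std::string get(std::size_t i)
  {
    assert(i < offsets_.size());
    data_.clear();              // drop any stale stream state flags
    data_.seekg(offsets_[i]);   // moves only the read position
    std::string record;
    std::getline(data_, record);
    return record;
  }

private:
  std::stringstream data_;                // stands in for the file on disk
  std::vector<std::streamoff> offsets_;   // stands in for the file's index
  std::streamoff bytes_written_ = 0;      // total bytes appended so far
};
```

With such a store, get(2) reads the third record without touching the first two, which is the property that keeps memory usage low when only a few spectra of a large run are needed.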
The following example (Tutorial_FileIO_mzML.cpp) demonstrates the use of OpenMS::IndexedMzMLFileLoader to obtain a representation of a mass spectrometric experiment in the form of an OnDiscMSExperiment:
In addition, OpenMS::MzMLFile offers the OpenMS::MzMLFile::transform function, which loads an MzML file and, while loading, feeds the spectra and chromatograms into the provided MSDataConsumer (which inherits from the interface IMSDataConsumer). This allows memory-efficient implementations of algorithms that only need to access spectra and chromatograms sequentially. Several MSDataConsumer implementations can be chained by writing an implementation that hands the results of its computation directly to the next consumer. One common usage scenario uses an OpenMS::PlainMSDataWritingConsumer or an OpenMS::CachedMzMLConsumer as the final consumer, which writes the data to disk after processing. In this way, only the memory needed to hold a single spectrum is used, so algorithms whose memory usage is constant with respect to the amount of data processed can be implemented.
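The chaining idea can be sketched without OpenMS as follows. All types here (Spectrum, SpectrumConsumer, ThresholdConsumer, CountingSink) are hypothetical simplifications made up for this illustration; only the method name consumeSpectrum is borrowed from the text above, and the real IMSDataConsumer interface has more members than shown. The sink merely counts what it receives, where a real final consumer would write to disk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified spectrum: only a list of peak intensities.
struct Spectrum
{
  std::vector<double> intensities;
};

// Hypothetical consumer interface, loosely modelled on the consumer idea.
class SpectrumConsumer
{
public:
  virtual ~SpectrumConsumer() = default;
  virtual void consumeSpectrum(Spectrum& s) = 0;
};

// Processing step: removes peaks below a threshold, then hands the
// spectrum directly to the next consumer in the chain.
class ThresholdConsumer : public SpectrumConsumer
{
public:
  ThresholdConsumer(double threshold, SpectrumConsumer& next)
    : threshold_(threshold), next_(next)
  {
    assert(threshold >= 0.0);  // intensities are assumed non-negative
  }

  void consumeSpectrum(Spectrum& s) override
  {
    std::vector<double> kept;
    for (double v : s.intensities)
    {
      if (v >= threshold_) kept.push_back(v);
    }
    s.intensities.swap(kept);
    next_.consumeSpectrum(s);
  }

private:
  double threshold_;
  SpectrumConsumer& next_;
};

// Final consumer: stands in for a writing consumer. It keeps only running
// totals, so its memory usage does not grow with the number of spectra.
class CountingSink : public SpectrumConsumer
{
public:
  void consumeSpectrum(Spectrum& s) override
  {
    ++spectra_;
    peaks_ += s.intensities.size();
  }

  std::size_t spectra() const { return spectra_; }
  std::size_t peaks() const { return peaks_; }

private:
  std::size_t spectra_ = 0;
  std::size_t peaks_ = 0;
};
```

A loader would call consumeSpectrum on the first consumer once per spectrum as it parses the file. Because each spectrum is passed down the chain and then discarded, at most one spectrum is held in memory at a time.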
The following example (Tutorial_FileIO_Consumer.cpp) demonstrates the use of a consumer to read all spectra contained in a file sequentially, apply some data processing, and write them to disk without ever loading the whole file into memory:
However, if random access to the data is needed, the consumer-based approach will not suffice. In that case, one can either use the OpenMS::OnDiscMSExperiment approach discussed above or, for very fast access to individual spectra (no parsing of base64-encoded data needed), first cache the data to disk using the OpenMS::CachedmzML class.
In summary, the following table shows when to use which file access mode:
Use Case | Access Type | Interface |
---|---|---|
Random Access in Memory | Read | MSExperiment (through MzMLFile.load) |
Random Access in Memory | Write | MSExperiment (through MzMLFile.store) |
Random Access on Disc (indexed) | Read | OnDiscMSExperiment (through IndexedMzMLFileLoader.load) |
Random Access on Disc (indexed) | Write | random write access not possible |
Sequential Processing | Read | IMSDataConsumer (through MzMLFile.transform) |
Sequential Processing | Write | MSDataWritingConsumer (through consumeSpectrum or consumeChromatogram) |
In order to have more control over loading data from files, most adapters can be configured using PeakFileOptions. The following options are available:
OpenMS / TOPP release 2.3.0 | Documentation generated on Tue Jan 9 2018 18:22:05 using doxygen 1.8.13 |