File access

All classes for file I/O can be found in the FORMAT folder. For most file formats, you can use the generic interfaces described in the section "File adapter classes". For some applications (e.g. where CPU or memory usage is a concern), also consult the specific interfaces that OpenMS provides for the mzML data format (see section "File I/O for MzML").

File adapter classes

The interfaces of most file adapter classes are very similar. Each implements a load and a store method that take a file name and the appropriate data structure. Usually these methods expect a complete in-memory representation of an MS run (see below for other ways to access MS data).

The following example (Tutorial_FileIO.cpp) demonstrates the use of OpenMS::MzMLFile and OpenMS::MzXMLFile to convert one format into another using OpenMS::MSExperiment to hold the temporary data:

#include <OpenMS/FORMAT/MzMLFile.h>
#include <OpenMS/FORMAT/MzXMLFile.h>
#include <OpenMS/KERNEL/MSExperiment.h>

using namespace OpenMS;

int main(int argc, const char** argv)
{
  if (argc < 2) return 1;
  // the path to the data should be given on the command line
  String tutorial_data_path(argv[1]);

  MzXMLFile mzxml;
  MzMLFile mzml;

  // temporary data storage
  PeakMap map;

  // convert MzXML to MzML
  mzxml.load(tutorial_data_path + "/data/Tutorial_FileIO.mzXML", map);
  mzml.store("Tutorial_FileIO.mzML", map);

  return 0;
} //end of main

FileHandler
To make handling different file types easier, the class FileHandler can be used. It loads a file into the appropriate data structure independently of the file type. The file type is determined from the file extension or the file contents:
MSExperiment in;
// note: "FileHandler handler();" would declare a function, not an object
FileHandler handler;
handler.loadExperiment("input.mzML", in);
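
The "extension first, contents second" idea can be illustrated with a small standalone sketch. It is not the FileHandler implementation: ToyFileType, typeByName, typeByContent and detectType are hypothetical names introduced only for this illustration, and the content check simply looks for the XML root element names of mzML and mzXML in the first bytes of a file.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical toy types and functions; NOT part of the OpenMS API.
enum class ToyFileType { MZML, MZXML, UNKNOWN };

// Lower-case copy, so that ".mzML" and ".MZML" compare equal.
inline std::string toLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

inline bool endsWith(const std::string& s, const std::string& suffix)
{
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// First attempt: decide by file extension only.
inline ToyFileType typeByName(const std::string& filename)
{
  const std::string name = toLower(filename);
  if (endsWith(name, ".mzml")) return ToyFileType::MZML;
  if (endsWith(name, ".mzxml")) return ToyFileType::MZXML;
  return ToyFileType::UNKNOWN;
}

// Fallback: look for a root element name in the beginning of the file.
inline ToyFileType typeByContent(const std::string& head)
{
  if (head.find("<mzML") != std::string::npos) return ToyFileType::MZML;
  if (head.find("<mzXML") != std::string::npos) return ToyFileType::MZXML;
  return ToyFileType::UNKNOWN;
}

// The extension wins; the contents are only consulted if it is inconclusive.
inline ToyFileType detectType(const std::string& filename, const std::string& head)
{
  assert(!filename.empty());
  const ToyFileType by_name = typeByName(filename);
  return by_name != ToyFileType::UNKNOWN ? by_name : typeByContent(head);
}
```

In this sketch, a file named "data.xml" whose first bytes contain an mzML root element is still recognized as mzML, while "input.mzML" is classified without reading the file at all.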

File I/O for MzML

For MzML, several additional interfaces exist which make some data processing tasks easier. In particular, it is not always feasible to load the complete data of an LC-MS/MS run into memory, and special interfaces exist for these cases. One such interface is OpenMS::OnDiscMSExperiment, which abstracts an mzML file that contains an index. Individual spectra and chromatograms can be obtained by calling getSpectrum or getChromatogram.
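
Why an index enables random access can be sketched with a standalone toy reader. This is an illustration under simplifying assumptions, not the OpenMS implementation: ToyIndexedReader is a hypothetical class, each "record" is just one line of text, and the index is built by scanning the stream once, whereas an indexed mzML file already stores the byte offsets of its spectra and chromatograms in the file itself.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy reader; NOT the OpenMS implementation.
// Each "record" is one line of text; the index stores its byte offset.
class ToyIndexedReader
{
public:
  explicit ToyIndexedReader(std::istream& in) : in_(in)
  {
    // One sequential pass records where every record starts.
    std::string line;
    std::streampos pos = in_.tellg();
    while (std::getline(in_, line))
    {
      offsets_.push_back(pos);
      pos = in_.tellg();
    }
    in_.clear(); // reset the stream state so that later seeks succeed
  }

  std::size_t size() const { return offsets_.size(); }

  // Random access: seek to the stored offset and read only that record.
  std::string getRecord(std::size_t i)
  {
    assert(i < offsets_.size());
    in_.clear();
    in_.seekg(offsets_[i]);
    std::string line;
    std::getline(in_, line);
    return line;
  }

private:
  std::istream& in_;
  std::vector<std::streampos> offsets_;
};
```

Only the offsets are held in memory; the record data stays in the stream until getRecord is called, which mirrors the idea of fetching a single spectrum from disk on demand.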

Indexed mzML

The following example (Tutorial_FileIO_mzML.cpp) demonstrates the use of OpenMS::IndexedMzMLFileLoader to obtain a representation of a mass spectrometric experiment in the form of an OnDiscMSExperiment:

#include <OpenMS/FORMAT/IndexedMzMLFileLoader.h>
#include <OpenMS/KERNEL/OnDiscMSExperiment.h>
#include <iostream>

using namespace OpenMS;

int main(int argc, const char** argv)
{
  if (argc < 2) return 1;
  // the path to the data should be given on the command line
  String tutorial_data_path(argv[1]);

  IndexedMzMLFileLoader imzml;

  // load data from an indexed MzML file
  OnDiscMSExperiment map; // on-disc representation; this declaration was missing in the listing
  imzml.load(tutorial_data_path + "/data/Tutorial_FileIO_indexed.mzML", map);

  // get the first spectrum in memory, do some constant (non-changing) data processing
  MSSpectrum s = map.getSpectrum(0);
  std::cout << "There are " << map.getNrSpectra() << " spectra in the input file." << std::endl;
  std::cout << "The first spectrum has " << s.size() << " peaks." << std::endl;

  // store the (unmodified) data in a different file
  imzml.store("Tutorial_FileIO_output.mzML", map);

  return 0;
} //end of main

Sequential Reading/Writing of mzML

In addition, OpenMS::MzMLFile offers the OpenMS::MzMLFile::transform function, which loads an MzML file and, while loading, feeds the spectra and chromatograms into the provided MSDataConsumer (which inherits from the interface IMSDataConsumer). This allows memory-efficient implementations of algorithms that only need to access spectra and chromatograms sequentially. Several MSDataConsumer implementations can be chained by writing an implementation which hands the results of its computation directly to the next consumer. A common usage scenario uses an OpenMS::PlainMSDataWritingConsumer or an OpenMS::CachedMzMLConsumer as the final consumer, which writes the data to disk after processing. In this way, only the memory needed for a single spectrum is used, so algorithms can be implemented whose memory usage is constant with respect to the amount of data processed.

The following example (Tutorial_FileIO_Consumer.cpp) demonstrates the use of a consumer to read all spectra contained in a file sequentially, apply some data processing and then write them to disk without ever loading the whole file into memory:

#include <OpenMS/FORMAT/DataAccess/MSDataWritingConsumer.h>
#include <OpenMS/FORMAT/MzMLFile.h>
#include <iostream>

using namespace OpenMS;

class TICWritingConsumer : public MSDataWritingConsumer
{
  // Inheriting from MSDataWritingConsumer allows changing the data before
  // they are written to disk (to "filename") using the processSpectrum_ and
  // processChromatogram_ functions.
public:
  double TIC;
  int nr_spectra;

  // Create new consumer, set TIC and spectrum count to zero
  TICWritingConsumer(String filename) : MSDataWritingConsumer(filename)
  { TIC = 0.0; nr_spectra = 0; }

  // Add a data processing step for spectra before they are written to disk:
  // here the intensities are only summed up, the spectrum itself is unchanged
  void processSpectrum_(MSDataWritingConsumer::SpectrumType& s)
  {
    for (Size i = 0; i < s.size(); i++) { TIC += s[i].getIntensity(); }
    nr_spectra++;
  }

  // Empty chromatogram data processing
  void processChromatogram_(MSDataWritingConsumer::ChromatogramType& /* c */) {}
};

int main(int argc, const char** argv)
{
  if (argc < 2) return 1;
  // the path to the data should be given on the command line
  String tutorial_data_path(argv[1]);

  // Create the consumer, set output file name, transform
  TICWritingConsumer* consumer = new TICWritingConsumer("Tutorial_FileIO_output.mzML");
  MzMLFile().transform(tutorial_data_path + "/data/Tutorial_FileIO_indexed.mzML", consumer);

  std::cout << "There are " << consumer->nr_spectra << " spectra in the input file." << std::endl;
  std::cout << "The total ion current is " << consumer->TIC << std::endl;
  delete consumer;

  return 0;
} //end of main
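
The chaining of consumers mentioned above, where one consumer hands its result directly to the next, can be sketched independently of OpenMS. The following is a minimal standalone illustration, not OpenMS code: ToySpectrum, ToyConsumer, ThresholdConsumer, TICConsumer and feed are hypothetical names, and the final consumer sums intensities instead of writing to disk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT OpenMS types.
struct ToySpectrum
{
  std::vector<double> intensities;
};

// Minimal consumer interface, loosely modelled on the idea of IMSDataConsumer.
struct ToyConsumer
{
  virtual ~ToyConsumer() = default;
  virtual void consumeSpectrum(ToySpectrum& s) = 0;
};

// Intermediate link: removes peaks below a threshold, then forwards the spectrum.
class ThresholdConsumer : public ToyConsumer
{
public:
  ThresholdConsumer(double threshold, ToyConsumer* next) :
    threshold_(threshold), next_(next)
  {
    assert(next_ != nullptr); // an intermediate link needs a successor
  }

  void consumeSpectrum(ToySpectrum& s) override
  {
    std::vector<double> kept;
    for (double v : s.intensities)
    {
      if (v >= threshold_) kept.push_back(v);
    }
    s.intensities.swap(kept);
    next_->consumeSpectrum(s); // hand the result directly to the next consumer
  }

private:
  double threshold_;
  ToyConsumer* next_;
};

// Final link: accumulates a total ion current instead of writing to disk.
class TICConsumer : public ToyConsumer
{
public:
  void consumeSpectrum(ToySpectrum& s) override
  {
    for (double v : s.intensities) tic += v;
    ++nr_spectra;
  }

  double tic = 0.0;
  std::size_t nr_spectra = 0;
};

// Drives the chain one spectrum at a time, as a sequential reader would.
inline void feed(const std::vector<ToySpectrum>& run, ToyConsumer& first)
{
  for (ToySpectrum s : run) // working copy: the chain sees one spectrum at a time
  {
    first.consumeSpectrum(s);
  }
}
```

Each link only sees one spectrum at a time and passes it on immediately, so the memory needed by the chain does not grow with the number of spectra. In an OpenMS setting the last link would typically be a writing consumer.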

Summary

If random access to the data is needed, the consumer-based approach will not suffice. In that case, one can either use the OpenMS::OnDiscMSExperiment approach discussed above or, for very fast access to individual spectra (no parsing of base64-encoded data needed), first cache the data to disk using the OpenMS::CachedmzML class.

The following table summarizes when to use which file access mode:

Use Case                          Access   Type
Random Access in Memory           Read     MSExperiment (through MzMLFile.load)
                                  Write    MSExperiment (through MzMLFile.store)
Random Access on Disc (indexed)   Read     OnDiscMSExperiment (through IndexedMzMLFileLoader.load)
                                  Write    random write access not possible
Sequential Processing             Read     IMSDataConsumer (through MzMLFile.transform)
                                  Write    MSDataWritingConsumer (through consumeSpectrum or consumeChromatogram)

PeakFileOptions

In order to have more control over loading data from files, most adapters can be configured using PeakFileOptions. The following options are available:


OpenMS / TOPP release 2.3.0 Documentation generated on Tue Jan 9 2018 18:22:05 using doxygen 1.8.13