OpenMS
IndexedMzMLDecoder Class Reference

A class to analyze indexedmzML files and extract the offsets of individual tags.

#include <OpenMS/FORMAT/HANDLERS/IndexedMzMLDecoder.h>

Public Types

typedef std::vector< std::pair< std::string, std::streampos > > OffsetVector
 The vector containing binary offsets.
 

Public Member Functions

int parseOffsets (const String &filename, std::streampos indexoffset, OffsetVector &spectra_offsets, OffsetVector &chromatograms_offsets)
 Tries to extract the offsets of all spectra and chromatograms from an indexedmzML.
 
std::streampos findIndexListOffset (const String &filename, int buffersize=1023)
 Tries to extract the indexList offset from an indexedmzML.
 

Protected Member Functions

int domParseIndexedEnd_ (const std::string &in, OffsetVector &spectra_offsets, OffsetVector &chromatograms_offsets)
 Extract data from a string containing an <indexList> tag.
 

Detailed Description

A class to analyze indexedmzML files and extract the offsets of individual tags.

Specifically, this class allows one to extract the offsets of the <indexList> tag and of all <spectrum> and <chromatogram> tags using the indices found at the end of the indexedmzML XML structure.

findIndexListOffset tries to extract the offset of the <indexList> tag from the end of the file (by default, the last 1023 bytes). Given this offset, parseOffsets can then read all elements contained in the <indexList> tag and thereby provide the offsets of all spectra and chromatograms.

Member Typedef Documentation

◆ OffsetVector

typedef std::vector< std::pair<std::string, std::streampos> > OffsetVector

The vector containing binary offsets.
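
As an illustration of how such a vector might be consumed, the following minimal sketch looks up one entry by its string key. It assumes that the first element of each pair holds the identifier of the entry (for example the idRef of an <offset> tag) and the second its byte position; the helper findOffsetByIdSketch is hypothetical and not part of OpenMS.

```cpp
#include <ios>
#include <string>
#include <utility>
#include <vector>

// Same shape as the OffsetVector typedef documented above.
using OffsetVector = std::vector<std::pair<std::string, std::streampos>>;

// Hypothetical helper (not part of OpenMS): linear search for the entry whose
// key equals `id`. Returns its byte offset, or -1 if no entry matches.
std::streamoff findOffsetByIdSketch(const OffsetVector& offsets, const std::string& id)
{
  for (const auto& entry : offsets)
  {
    if (entry.first == id)
    {
      return static_cast<std::streamoff>(entry.second);
    }
  }
  return -1;
}
```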

Member Function Documentation

◆ domParseIndexedEnd_()

int domParseIndexedEnd_ ( const std::string &  in,
OffsetVector &  spectra_offsets,
OffsetVector &  chromatograms_offsets 
)
protected

Extract data from a string containing an <indexList> tag.

This function parses the contained <offset> tags inside the indexList tag and stores the contents in the spectra and chromatogram offset vectors.

This function expects an input string that contains a root XML tag with an <indexList> tag among its children, as defined by the mzML 1.1.0 index wrapper schema. Usually the root is an indexedmzML tag, which _must_ contain an indexList tag, while dx:mzML, indexListOffset and fileChecksum are optional (their presence is not checked).

Even so, do not pass invalid XML to this function (e.g. non-matching open/close tags). In practice this usually means that you will at least have to add an opening <indexedmzML> tag. Valid input for this function would be, for example:

<indexedmzML>
<indexList count="1">
<index name="chromatogram">
<offset idRef="1">9752</offset>
</index>
</indexList>
<indexListOffset>26795</indexListOffset>
<fileChecksum>0</fileChecksum>
</indexedmzML>
Parameters
in: String containing the XML with an indexedmzML parent and an indexList child tag
spectra_offsets: Output vector containing the positions of all spectra in the file
chromatograms_offsets: Output vector containing the positions of all chromatograms in the file
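
The extraction described above can be sketched as follows. This is only an illustration, not the OpenMS implementation: it uses std::regex instead of an XML parser, it assumes that the relevant <index> elements are named "spectrum" and "chromatogram", and the function name parseIndexListSketch is hypothetical. It is meant for small inputs such as the example above.

```cpp
#include <cassert>
#include <ios>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using OffsetVector = std::vector<std::pair<std::string, std::streampos>>;

// Hypothetical regex-based stand-in (not the OpenMS implementation).
// Collects (idRef, offset) pairs from <index name="spectrum"> and
// <index name="chromatogram"> blocks. Returns 0 if at least one such
// block was found and -1 otherwise.
int parseIndexListSketch(const std::string& in,
                         OffsetVector& spectra_offsets,
                         OffsetVector& chromatograms_offsets)
{
  // "<index" must be followed by whitespace, so <indexList> is not matched.
  static const std::regex index_re(
      "<index\\s+name=\"([^\"]+)\"\\s*>([\\s\\S]*?)</index>");
  static const std::regex offset_re(
      "<offset\\s+idRef=\"([^\"]*)\"[^>]*>\\s*([0-9]+)\\s*</offset>");

  bool found = false;
  const std::sregex_iterator end;
  for (std::sregex_iterator it(in.begin(), in.end(), index_re); it != end; ++it)
  {
    const std::string name = (*it)[1].str();
    const std::string body = (*it)[2].str();

    OffsetVector* target = nullptr;
    if (name == "spectrum") target = &spectra_offsets;
    else if (name == "chromatogram") target = &chromatograms_offsets;
    else continue; // ignore indices of any other name

    assert(target != nullptr);
    found = true;
    for (std::sregex_iterator o(body.begin(), body.end(), offset_re); o != end; ++o)
    {
      target->emplace_back((*o)[1].str(),
                           std::streampos(std::stoll((*o)[2].str())));
    }
  }
  return found ? 0 : -1;
}
```

Applied to the example input shown above, this sketch leaves the spectra vector empty and stores a single chromatogram entry with key "1" and offset 9752.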

◆ findIndexListOffset()

std::streampos findIndexListOffset ( const String &  filename,
int  buffersize = 1023 
)

Tries to extract the indexList offset from an indexedmzML.

By default, this function reads the last 1023 bytes (the default buffersize) of the given input file and searches them for the <indexListOffset> tag. The assumption is that the string <indexListOffset>xxx</indexListOffset> occurs somewhere near the end of the file named by filename. The function returns the xxx part converted to an integer.

Note
Since this function cannot determine where it will start reading the XML, no regular XML parser can be used for this. Therefore it uses regex to do its job. It matches the <indexListOffset> part and any numerical characters that follow.
Parameters
filename: Filename of the input indexedmzML file
buffersize: How many bytes at the end of the input file should be searched for the tag
Returns
The content of the indexListOffset tag as a positive integer, or -1 if no tag was found. On failure you can retry with a larger buffersize, but most likely the file is not an indexed mzML. Returning -1 follows the convention described in the reference documentation: http://en.cppreference.com/w/cpp/io/streamoff
Exceptions
FileNotFound: thrown if the file cannot be found
ParseError: thrown if the offset cannot be parsed
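
The tail search described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the OpenMS implementation: the function name findIndexListOffsetSketch is hypothetical, it returns std::streamoff rather than std::streampos, and it throws std::runtime_error instead of the OpenMS exception types.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the OpenMS implementation): returns the numeric
// content of <indexListOffset>...</indexListOffset> found in the last
// `buffersize` bytes of the file, or -1 if no such tag is found there.
std::streamoff findIndexListOffsetSketch(const std::string& filename,
                                         int buffersize = 1023)
{
  assert(buffersize > 0);
  std::ifstream f(filename, std::ios::binary);
  if (!f) throw std::runtime_error("cannot open " + filename);

  // Determine the file size and read at most `buffersize` bytes from the end.
  f.seekg(0, std::ios::end);
  const std::streamoff size = f.tellg();
  const std::streamoff n = std::min<std::streamoff>(size, buffersize);
  f.seekg(size - n, std::ios::beg);
  std::string tail(static_cast<std::size_t>(n), '\0');
  f.read(&tail[0], n);

  // The tail may start in the middle of an element, so it is searched as
  // plain text rather than parsed as XML.
  static const std::regex re("<indexListOffset>([0-9]+)</indexListOffset>");
  std::smatch m;
  if (!std::regex_search(tail, m, re)) return -1;
  return std::stoll(m[1].str());
}
```

If the tag lies further from the end of the file than buffersize bytes, the sketch returns -1, which mirrors the advice to retry with a larger buffersize.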

◆ parseOffsets()

int parseOffsets ( const String &  filename,
std::streampos  indexoffset,
OffsetVector &  spectra_offsets,
OffsetVector &  chromatograms_offsets 
)

Tries to extract the offsets of all spectra and chromatograms from an indexedmzML.

Given the start of the <indexList> element, this function tries to read this tag from the given indexedmzML file and stores the result in the spectra and chromatogram offset vectors.

Parameters
filename: Filename of the input indexedmzML file
indexoffset: Position in the file at which the XML tag "<indexList" is expected to occur
spectra_offsets: Output vector containing the positions of all spectra in the file
chromatograms_offsets: Output vector containing the positions of all chromatograms in the file
Returns
0 in case of success and -1 otherwise (failure, no offset was found)
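
One way to obtain a string suitable for domParseIndexedEnd_ from a known index offset is sketched below. It is an assumption about how the reading step could look, not the OpenMS implementation: the hypothetical helper readIndexTailSketch reads everything from the offset to the end of the file and prepends an opening <indexedmzML> tag so that the closing tag at the end of the file has a match.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the OpenMS implementation): reads the file from
// `index_offset` to its end and prepends an opening <indexedmzML> tag, so
// that the result has matching open/close root tags.
std::string readIndexTailSketch(const std::string& filename,
                                std::streamoff index_offset)
{
  assert(index_offset >= 0);
  std::ifstream f(filename, std::ios::binary);
  if (!f) throw std::runtime_error("cannot open " + filename);

  // Jump to the expected start of "<indexList" and read the rest of the file.
  f.seekg(index_offset, std::ios::beg);
  const std::string tail((std::istreambuf_iterator<char>(f)),
                         std::istreambuf_iterator<char>());
  return "<indexedmzML>" + tail;
}
```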