OpenMS  2.5.0
PepXMLFile Class Reference

Used to load and store PepXML files.

#include <OpenMS/FORMAT/PepXMLFile.h>

Inheritance diagram for PepXMLFile:
XMLHandler XMLFile

Classes

struct  AminoAcidModification
 

Public Member Functions

 PepXMLFile ()
 Constructor.
 
 ~PepXMLFile () override
 Destructor.
 
void load (const String &filename, std::vector< ProteinIdentification > &proteins, std::vector< PeptideIdentification > &peptides, const String &experiment_name, const SpectrumMetaDataLookup &lookup)
 Loads peptide sequences with modifications out of a PepXML file.
 
void load (const String &filename, std::vector< ProteinIdentification > &proteins, std::vector< PeptideIdentification > &peptides, const String &experiment_name="")
 Load function with empty defaults for some parameters (see above).
 
void store (const String &filename, std::vector< ProteinIdentification > &protein_ids, std::vector< PeptideIdentification > &peptide_ids, const String &mz_file="", const String &mz_name="", bool peptideprophet_analyzed=false, double rt_tolerance=0.01)
 Stores idXML as a PepXML file.
 
void keepNativeSpectrumName (bool keep)
 Whether we should keep the native spectrum name of the pepXML.
 
- Public Member Functions inherited from XMLFile
 XMLFile ()
 Default constructor.
 
 XMLFile (const String &schema_location, const String &version)
 Constructor that sets the schema location.
 
virtual ~XMLFile ()
 Destructor.
 
bool isValid (const String &filename, std::ostream &os)
 Checks if a file validates against the XML schema.
 
const String & getVersion () const
 Returns the version of the schema.
 

Protected Member Functions

void endElement (const XMLCh *const, const XMLCh *const, const XMLCh *const qname) override
 Documented in base class.
 
void startElement (const XMLCh *const, const XMLCh *const, const XMLCh *const qname, const xercesc::Attributes &attributes) override
 Documented in base class.
 
- Protected Member Functions inherited from XMLHandler
void writeUserParam_ (const String &tag_name, std::ostream &os, const MetaInfoInterface &meta, UInt indent) const
 Writes the content of MetaInfoInterface to the file.
 
Int asInt_ (const String &in)
 Conversion of a String to an integer value.
 
Int asInt_ (const XMLCh *in)
 Conversion of a Xerces string to an integer value.
 
UInt asUInt_ (const String &in)
 Conversion of a String to an unsigned integer value.
 
double asDouble_ (const String &in)
 Conversion of a String to a double value.
 
float asFloat_ (const String &in)
 Conversion of a String to a float value.
 
bool asBool_ (const String &in)
 Conversion of a String to a boolean value.
 
DateTime asDateTime_ (String date_string)
 Conversion of an xs:datetime string to a DateTime value.
 
bool equal_ (const XMLCh *a, const XMLCh *b) const
 Returns whether two Xerces strings are equal.
 
SignedSize cvStringToEnum_ (const Size section, const String &term, const char *message, const SignedSize result_on_error=0)
 
String attributeAsString_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to a String.
 
Int attributeAsInt_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to an Int.
 
double attributeAsDouble_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to a double.
 
DoubleList attributeAsDoubleList_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to a DoubleList.
 
IntList attributeAsIntList_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to an IntList.
 
StringList attributeAsStringList_ (const xercesc::Attributes &a, const char *name) const
 Converts an attribute to a StringList.
 
bool optionalAttributeAsString_ (String &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the String value if the attribute is present.
 
bool optionalAttributeAsInt_ (Int &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the Int value if the attribute is present.
 
bool optionalAttributeAsUInt_ (UInt &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the UInt value if the attribute is present.
 
bool optionalAttributeAsDouble_ (double &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the double value if the attribute is present.
 
bool optionalAttributeAsDoubleList_ (DoubleList &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the DoubleList value if the attribute is present.
 
bool optionalAttributeAsStringList_ (StringList &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the StringList value if the attribute is present.
 
bool optionalAttributeAsIntList_ (IntList &value, const xercesc::Attributes &a, const char *name) const
 Assigns the attribute content to the IntList value if the attribute is present.
 
String attributeAsString_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to a String.
 
Int attributeAsInt_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to an Int.
 
double attributeAsDouble_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to a double.
 
DoubleList attributeAsDoubleList_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to a DoubleList.
 
IntList attributeAsIntList_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to an IntList.
 
StringList attributeAsStringList_ (const xercesc::Attributes &a, const XMLCh *name) const
 Converts an attribute to a StringList.
 
bool optionalAttributeAsString_ (String &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the String value if the attribute is present.
 
bool optionalAttributeAsInt_ (Int &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the Int value if the attribute is present.
 
bool optionalAttributeAsUInt_ (UInt &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the UInt value if the attribute is present.
 
bool optionalAttributeAsDouble_ (double &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the double value if the attribute is present.
 
bool optionalAttributeAsDoubleList_ (DoubleList &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the DoubleList value if the attribute is present.
 
bool optionalAttributeAsIntList_ (IntList &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the IntList value if the attribute is present.
 
bool optionalAttributeAsStringList_ (StringList &value, const xercesc::Attributes &a, const XMLCh *name) const
 Assigns the attribute content to the StringList value if the attribute is present.
 
 XMLHandler (const String &filename, const String &version)
 Default constructor.
 
 ~XMLHandler () override
 Destructor.
 
void reset ()
 Release internal memory used for parsing.
 
void fatalError (const xercesc::SAXParseException &exception) override
 
void error (const xercesc::SAXParseException &exception) override
 
void warning (const xercesc::SAXParseException &exception) override
 
void fatalError (ActionMode mode, const String &msg, UInt line=0, UInt column=0) const
 Fatal error handler. Throws a ParseError exception.
 
void error (ActionMode mode, const String &msg, UInt line=0, UInt column=0) const
 Error handler for recoverable errors.
 
void warning (ActionMode mode, const String &msg, UInt line=0, UInt column=0) const
 Warning handler.
 
void characters (const XMLCh *const chars, const XMLSize_t length) override
 Parsing method for character data.
 
void startElement (const XMLCh *const uri, const XMLCh *const localname, const XMLCh *const qname, const xercesc::Attributes &attrs) override
 Parsing method for opening tags.
 
void endElement (const XMLCh *const uri, const XMLCh *const localname, const XMLCh *const qname) override
 Parsing method for closing tags.
 
virtual void writeTo (std::ostream &)
 Writes the contents to a stream.
 
String errorString ()
 Returns the last error description.
 
virtual LOADDETAIL getLoadDetail () const
 Handlers that support partial loading implement this method.
 
virtual void setLoadDetail (const LOADDETAIL d)
 Handlers that support partial loading implement this method.
 
void checkUniqueIdentifiers_ (const std::vector< ProteinIdentification > &prot_ids)
 
- Protected Member Functions inherited from XMLFile
void parse_ (const String &filename, XMLHandler *handler)
 Parses the XML file given by filename using the handler given by handler.
 
void parseBuffer_ (const std::string &buffer, XMLHandler *handler)
 Parses the in-memory buffer given by buffer using the handler given by handler.
 
void save_ (const String &filename, XMLHandler *handler) const
 Stores the contents of the XML handler given by handler in the file given by filename.
 
void enforceEncoding_ (const String &encoding)
 

Private Member Functions

void makeScanMap_ ()
 Fill scan_map_.
 
void readRTMZCharge_ (const xercesc::Attributes &attributes)
 Read RT, m/z, charge information from attributes of "spectrum_query".
 
void matchModification_ (const double mass, const String &origin, String &modification_description)
 Finds the modification name given a modified AA mass.
 

Private Attributes

std::vector< ProteinIdentification > * proteins_
 Pointer to the list of identified proteins.
 
std::vector< PeptideIdentification > * peptides_
 Pointer to the list of identified peptides.
 
const SpectrumMetaDataLookup * lookup_
 Pointer to wrapper for looking up spectrum meta data.
 
String exp_name_
 Name of the associated experiment (filename of the data file, extension will be removed)
 
String search_engine_
 Name of the search engine.
 
String native_spectrum_name_
 Several optional attributes of spectrum_query.
 
String experiment_label_
 
String swath_assay_
 
String status_
 
bool use_precursor_data_
 Get RT and m/z for peptide ID from precursor scan (should only matter for RT)?
 
std::map< Size, Size > scan_map_
 Mapping between scan number in the pepXML file and index in the corresponding MSExperiment.
 
Element hydrogen_
 Hydrogen data (for mass types)
 
bool analysis_summary_
 Are we currently in an "analysis_summary" element (should be skipped)?
 
bool keep_native_name_
 Whether we should keep the native spectrum name of the pepXML.
 
bool search_score_summary_
 Are we currently in a "search_score_summary" element (should be skipped)?
 
bool search_summary_
 Are we currently in a "search_summary" element (should be skipped)?
 
bool wrong_experiment_
 Do current entries belong to the experiment of interest (for pepXML files that bundle results from different experiments)?
 
bool seen_experiment_
 Have we seen the experiment of interest at all?
 
bool checked_base_name_
 Have we checked the "base_name" attribute in the "msms_run_summary" element?
 
String current_base_name_
 Current base name.
 
std::vector< std::vector< ProteinIdentification >::iterator > current_proteins_
 References to currently active ProteinIdentifications.
 
ProteinIdentification::SearchParameters params_
 Search parameters of the current identification run.
 
String enzyme_
 Enzyme name associated with the current identification run.
 
PeptideIdentification current_peptide_
 PeptideIdentification instance currently being processed.
 
PeptideHit::PepXMLAnalysisResult current_analysis_result_
 Analysis result instance currently being processed.
 
PeptideHit peptide_hit_
 PeptideHit instance currently being processed.
 
String current_sequence_
 Sequence of the current peptide hit.
 
double rt_
 RT and m/z of current PeptideIdentification.
 
double mz_
 
Int charge_
 Precursor ion charge.
 
UInt search_id_
 ID of current search result.
 
String prot_id_
 Identifier linking PeptideIdentifications and ProteinIdentifications.
 
DateTime date_
 Date the pepXML file was generated.
 
double hydrogen_mass_
 Mass of a hydrogen atom (monoisotopic/average depending on case)
 
std::vector< std::pair< String, Size > > current_modifications_
 The modifications of the current peptide hit (position is 1-based)
 
std::vector< AminoAcidModification > fixed_modifications_
 Fixed amino acid modifications.
 
std::vector< AminoAcidModification > variable_modifications_
 Variable amino acid modifications.
 

Static Private Attributes

static const double mod_tol_
 
static const double xtandem_artificial_mod_tol_
 

Additional Inherited Members

- Protected Types inherited from XMLHandler
enum  ActionMode { LOAD, STORE }
 Action to set the current mode (for error messages)
 
enum  LOADDETAIL { LD_ALLDATA, LD_RAWCOUNTS, LD_COUNTS_WITHOPTIONS }
 
- Static Protected Member Functions inherited from XMLHandler
static String writeXMLEscape (const String &to_escape)
 Escapes a string and returns the escaped string.
 
- Protected Attributes inherited from XMLHandler
String error_message_
 Error message of the last error.
 
String file_
 File name.
 
String version_
 Schema version.
 
StringManager sm_
 Helper class for string conversion.
 
std::vector< String > open_tags_
 Stack of open XML tags.
 
LOADDETAIL load_detail_
 Parse only until the total number of scans and chromatograms has been determined from attributes.
 
std::vector< std::vector< String > > cv_terms_
 Array of CV term lists (one sublist denotes one term and its children)
 
- Protected Attributes inherited from XMLFile
String schema_location_
 XML schema file location.
 
String schema_version_
 Version string.
 
String enforced_encoding_
 Encoding string that replaces the encoding (system dependent or specified in the XML). Disabled if empty. Used as a workaround for XTandem output XML.
 

Detailed Description

Used to load and store PepXML files.

This class is used to load and store documents that implement the schema of PepXML files.

A documented schema for this format comes with the TPP and can also be found at https://github.com/OpenMS/OpenMS/tree/develop/share/OpenMS/SCHEMAS

Constructor & Destructor Documentation

◆ PepXMLFile()

PepXMLFile ( )

Constructor.

◆ ~PepXMLFile()

~PepXMLFile ( )
override

Destructor.

Member Function Documentation

◆ endElement()

void endElement ( const XMLCh * const  ,
const XMLCh * const  ,
const XMLCh *const  qname 
)
override protected

Documented in base class.

◆ keepNativeSpectrumName()

void keepNativeSpectrumName ( bool  keep)
inline

Whether we should keep the native spectrum name of the pepXML.

Note
This will lead to a "pepxml_spectrum_name" meta value being added to each PeptideIdentification containing the original name of the spectrum in TPP format.
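
To illustrate what such a stored name typically carries, here is a minimal sketch that splits a spectrum name of the commonly used TPP form `basename.startscan.endscan.charge`. The struct `TppSpectrumName` and the function `parseTppSpectrumName` are hypothetical helpers for this example and are not part of OpenMS; the exact layout of names in a given pepXML file is an assumption and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical helper type for illustration; not an OpenMS class.
struct TppSpectrumName
{
  bool valid = false;            // true only if all four fields were parsed
  std::string base_name;         // run name, may itself contain dots
  unsigned long start_scan = 0;
  unsigned long end_scan = 0;
  int charge = 0;
};

// Splits "basename.startscan.endscan.charge" from the right, so that dots
// inside the base name do not confuse the parser.
TppSpectrumName parseTppSpectrumName(const std::string& name)
{
  TppSpectrumName result;
  const std::size_t npos = std::string::npos;
  const std::size_t p3 = name.rfind('.');
  if (p3 == npos || p3 == 0) return result;
  const std::size_t p2 = name.rfind('.', p3 - 1);
  if (p2 == npos || p2 == 0) return result;
  const std::size_t p1 = name.rfind('.', p2 - 1);
  if (p1 == npos || p1 == 0) return result;
  try
  {
    result.start_scan = std::stoul(name.substr(p1 + 1, p2 - p1 - 1));
    result.end_scan = std::stoul(name.substr(p2 + 1, p3 - p2 - 1));
    result.charge = std::stoi(name.substr(p3 + 1));
  }
  catch (const std::exception&)
  {
    return result; // a numeric field was empty or not a number
  }
  result.base_name = name.substr(0, p1);
  result.valid = true;
  return result;
}
```

Parsing from the right is a design choice for this sketch: the three trailing fields are numeric, while the base name is free text.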

◆ load() [1/2]

void load ( const String &  filename,
std::vector< ProteinIdentification > &  proteins,
std::vector< PeptideIdentification > &  peptides,
const String &  experiment_name,
const SpectrumMetaDataLookup &  lookup 
)

Loads peptide sequences with modifications out of a PepXML file.

Parameters
filename  PepXML file to load
proteins  Protein identification output
peptides  Peptide identification output
experiment_name  Experiment file name, which is used to extract the corresponding search results from the PepXML file.
lookup  Helper for looking up retention times (PepXML may contain only scan numbers).
Exceptions
Exception::FileNotFound  is thrown if the file could not be opened
Exception::ParseError  is thrown if an error occurs during parsing

◆ load() [2/2]

void load ( const String &  filename,
std::vector< ProteinIdentification > &  proteins,
std::vector< PeptideIdentification > &  peptides,
const String &  experiment_name = "" 
)

Load function with empty defaults for some parameters (see above).

Exceptions
Exception::FileNotFound  is thrown if the file could not be opened
Exception::ParseError  is thrown if an error occurs during parsing

◆ makeScanMap_()

void makeScanMap_ ( )
private

Fill scan_map_.

◆ matchModification_()

void matchModification_ ( const double  mass,
const String &  origin,
String &  modification_description 
)
private

Finds the modification name given a modified AA mass.

Matches the mass of a modified AA to a modification in our modification database. For ambiguous modifications, the first (arbitrary) match is returned. If no modification is found, an error is issued and the returned string is empty.

Note
A duplicate of this function is also used in ProtXMLFile.
Parameters
mass  Modified AA's mass
origin  AA one-letter code
modification_description  [out] Name of the modification, e.g. 'Carboxymethyl (C)'
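
The matching behaviour described above (first hit within a mass tolerance, empty string when nothing matches) can be sketched without OpenMS as follows. The `KnownMod` type, the two-entry table, and the tolerance of 0.002 Da used in the usage note are assumptions for illustration; the real implementation consults the OpenMS modification database, and the value of `mod_tol_` is not stated on this page.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical table entry; not an OpenMS type.
struct KnownMod
{
  std::string name;      // e.g. "Oxidation (M)"
  char origin;           // one-letter residue code
  double modified_mass;  // monoisotopic residue mass plus modification delta
};

// Returns the name of the first entry whose residue matches `origin` and
// whose mass lies within `tolerance` of `mass`; returns "" if none matches.
std::string matchByMass(const std::vector<KnownMod>& table, double mass,
                        char origin, double tolerance)
{
  for (const KnownMod& mod : table)
  {
    if (mod.origin == origin && std::fabs(mod.modified_mass - mass) <= tolerance)
    {
      return mod.name;
    }
  }
  return "";
}

// Small illustrative table (monoisotopic masses, rounded to 5 decimals).
std::vector<KnownMod> exampleTable()
{
  return {
    {"Carbamidomethyl (C)", 'C', 160.03065},
    {"Oxidation (M)", 'M', 147.03540},
  };
}
```

Calling `matchByMass(exampleTable(), 160.0307, 'C', 0.002)` yields `Carbamidomethyl (C)`, while a mass with no entry in range yields an empty string, mirroring the "error issued, empty result" case.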

◆ readRTMZCharge_()

void readRTMZCharge_ ( const xercesc::Attributes &  attributes)
private

Read RT, m/z, charge information from attributes of "spectrum_query".
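
As a rough illustration of how m/z can be derived from such attributes: pepXML "spectrum_query" elements usually report a neutral precursor mass and an assumed charge rather than m/z directly. The sketch below assumes the conversion adds one hydrogen mass per charge (suggested by the `hydrogen_mass_` member, but not confirmed by this page); `neutralMassToMz` is a hypothetical helper, not an OpenMS function.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Converts a neutral precursor mass to m/z for a positive charge state.
// `hydrogen_mass` is passed in so the caller can choose the monoisotopic or
// the average value, depending on the mass type used in the file.
double neutralMassToMz(double neutral_mass, int charge, double hydrogen_mass)
{
  if (charge <= 0)
  {
    throw std::invalid_argument("charge must be positive");
  }
  return (neutral_mass + charge * hydrogen_mass) / charge;
}
```

With a neutral mass of 1000.0, charge 2 and the monoisotopic hydrogen mass 1.00782503, the function computes (1000.0 + 2 × 1.00782503) / 2.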

◆ startElement()

void startElement ( const XMLCh * const  ,
const XMLCh * const  ,
const XMLCh *const  qname,
const xercesc::Attributes &  attributes 
)
override protected

Documented in base class.

◆ store()

void store ( const String &  filename,
std::vector< ProteinIdentification > &  protein_ids,
std::vector< PeptideIdentification > &  peptide_ids,
const String &  mz_file = "",
const String &  mz_name = "",
bool  peptideprophet_analyzed = false,
double  rt_tolerance = 0.01 
)

Stores idXML as a PepXML file.

Exceptions
Exception::UnableToCreateFile  is thrown if the file could not be opened for writing

Member Data Documentation

◆ analysis_summary_

bool analysis_summary_
private

Are we currently in an "analysis_summary" element (should be skipped)?

◆ charge_

Int charge_
private

Precursor ion charge.

◆ checked_base_name_

bool checked_base_name_
private

Have we checked the "base_name" attribute in the "msms_run_summary" element?

◆ current_analysis_result_

PeptideHit::PepXMLAnalysisResult current_analysis_result_
private

Analysis result instance currently being processed.

◆ current_base_name_

String current_base_name_
private

Current base name.

◆ current_modifications_

std::vector<std::pair<String, Size> > current_modifications_
private

The modifications of the current peptide hit (position is 1-based)
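
Because the positions are 1-based while C++ string indices are 0-based, an off-by-one is easy to introduce. The following sketch shows one way to apply a list of (name, position) pairs to a plain sequence; `annotateSequence` and the bracket notation it emits are assumptions for illustration and do not reproduce how OpenMS builds its sequence objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Appends "(name)" after every residue that carries a modification.
// `modifications` holds (name, 1-based position) pairs.
std::string annotateSequence(
    const std::string& sequence,
    const std::vector<std::pair<std::string, std::size_t>>& modifications)
{
  std::string annotated;
  for (std::size_t i = 0; i < sequence.size(); ++i)
  {
    annotated += sequence[i];
    for (const auto& mod : modifications)
    {
      if (mod.second == i + 1) // convert 0-based index to 1-based position
      {
        annotated += "(" + mod.first + ")";
      }
    }
  }
  return annotated;
}
```

For example, a modification at position 5 of `PEPTMIDE` attaches to the methionine, the fifth residue.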

◆ current_peptide_

PeptideIdentification current_peptide_
private

PeptideIdentification instance currently being processed.

◆ current_proteins_

std::vector<std::vector<ProteinIdentification>::iterator> current_proteins_
private

References to currently active ProteinIdentifications.

◆ current_sequence_

String current_sequence_
private

Sequence of the current peptide hit.

◆ date_

DateTime date_
private

Date the pepXML file was generated.

◆ enzyme_

String enzyme_
private

Enzyme name associated with the current identification run.

◆ exp_name_

String exp_name_
private

Name of the associated experiment (filename of the data file, extension will be removed)
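
A minimal sketch of deriving such a name from a data file path is shown below. `experimentBaseName` is a hypothetical helper; stripping the directory part and removing only the last extension are assumptions of this example, not documented behaviour of PepXMLFile.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the file name without directory and without its last extension.
std::string experimentBaseName(const std::string& path)
{
  const std::size_t slash = path.find_last_of("/\\");
  std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
  const std::size_t dot = name.find_last_of('.');
  if (dot != std::string::npos && dot != 0) // keep names like ".hidden" intact
  {
    name.erase(dot);
  }
  return name;
}
```

A name obtained this way could then be compared against the "base_name" attribute of each "msms_run_summary" element to pick the search results of interest.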

◆ experiment_label_

String experiment_label_
private

◆ fixed_modifications_

std::vector<AminoAcidModification> fixed_modifications_
private

Fixed amino acid modifications.

◆ hydrogen_

Element hydrogen_
private

Hydrogen data (for mass types)

◆ hydrogen_mass_

double hydrogen_mass_
private

Mass of a hydrogen atom (monoisotopic/average depending on case)

◆ keep_native_name_

bool keep_native_name_
private

Whether we should keep the native spectrum name of the pepXML.

◆ lookup_

const SpectrumMetaDataLookup* lookup_
private

Pointer to wrapper for looking up spectrum meta data.

◆ mod_tol_

const double mod_tol_
static private

◆ mz_

double mz_
private

◆ native_spectrum_name_

String native_spectrum_name_
private

Several optional attributes of spectrum_query.

◆ params_

ProteinIdentification::SearchParameters params_
private

Search parameters of the current identification run.

◆ peptide_hit_

PeptideHit peptide_hit_
private

PeptideHit instance currently being processed.

◆ peptides_

std::vector<PeptideIdentification>* peptides_
private

Pointer to the list of identified peptides.

◆ prot_id_

String prot_id_
private

Identifier linking PeptideIdentifications and ProteinIdentifications.

◆ proteins_

std::vector<ProteinIdentification>* proteins_
private

Pointer to the list of identified proteins.

◆ rt_

double rt_
private

RT and m/z of current PeptideIdentification.

◆ scan_map_

std::map<Size, Size> scan_map_
private

Mapping between scan number in the pepXML file and index in the corresponding MSExperiment.
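
The idea behind such a mapping can be sketched with standard containers only. Here `buildScanMap` is a hypothetical function that receives the scan numbers in the order the spectra appear in the experiment; how the scan numbers are obtained from an MSExperiment is outside the scope of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Builds a lookup from scan number to the index of the spectrum in the
// experiment. If a scan number occurs twice, the first index is kept.
std::map<std::size_t, std::size_t> buildScanMap(
    const std::vector<std::size_t>& scan_numbers)
{
  std::map<std::size_t, std::size_t> scan_map;
  for (std::size_t index = 0; index < scan_numbers.size(); ++index)
  {
    scan_map.emplace(scan_numbers[index], index); // no overwrite on duplicates
  }
  return scan_map;
}
```

A lookup with `find` or `count` then tells whether a scan number from the pepXML file has a corresponding spectrum, and at which index.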

◆ search_engine_

String search_engine_
private

Name of the search engine.

◆ search_id_

UInt search_id_
private

ID of current search result.

◆ search_score_summary_

bool search_score_summary_
private

Are we currently in a "search_score_summary" element (should be skipped)?

◆ search_summary_

bool search_summary_
private

Are we currently in a "search_summary" element (should be skipped)?

◆ seen_experiment_

bool seen_experiment_
private

Have we seen the experiment of interest at all?

◆ status_

String status_
private

◆ swath_assay_

String swath_assay_
private

◆ use_precursor_data_

bool use_precursor_data_
private

Get RT and m/z for peptide ID from precursor scan (should only matter for RT)?

◆ variable_modifications_

std::vector<AminoAcidModification> variable_modifications_
private

Variable amino acid modifications.

◆ wrong_experiment_

bool wrong_experiment_
private

Do current entries belong to the experiment of interest (for pepXML files that bundle results from different experiments)?

◆ xtandem_artificial_mod_tol_

const double xtandem_artificial_mod_tol_
static private