OpenMS
Representation of a peptide/protein sequence.
#include <OpenMS/CHEMISTRY/AASequence.h>
Classes | |
| class | ConstIterator |
| ConstIterator for AASequence. | |
| class | Iterator |
| Iterator class for AASequence. | |
Public Member Functions | |
Constructors and Destructors | |
| AASequence ()=default | |
| Default constructor. | |
| AASequence (const AASequence &)=default | |
| Copy constructor. | |
| AASequence (AASequence &&)=default | |
| Move constructor. | |
| ~AASequence ()=default | |
| Destructor. | |
| AASequence & | operator= (const AASequence &)=default |
| Assignment operator. | |
| AASequence & | operator= (AASequence &&)=default |
| Move assignment operator. | |
| bool | empty () const |
| checks if the sequence is empty | |
Accessors | |
| std::string | toString () const |
| returns the peptide as string with modifications embedded in brackets | |
| std::string | toUnmodifiedString () const |
| returns the peptide as a string without any modifications (e.g., "PEPTIDER") | |
| std::string | toUniModString () const |
| returns the peptide as string with UniMod-style modifications embedded in brackets | |
| std::string | toBracketString (bool integer_mass=true, bool mass_delta=false, const std::vector< std::string > &fixed_modifications=std::vector< std::string >()) const |
| create a TPP compatible string of the modified sequence using bracket notation. | |
| void | setModification (Size index, const std::string &modification) |
| void | setModification (Size index, const Residue *modification) |
| sets the modification of the AA at index by providing an already (potentially) modified residue | |
| void | setModification (Size index, const ResidueModification *modification) |
| sets the modification of the AA at index by providing a pointer to a ResidueModification object found in the ModificationsDB | |
| void | setModification (Size index, const ResidueModification &modification) |
| void | setModificationByDiffMonoMass (Size index, double diffMonoMass) |
| modifies the residue at index in the sequence and potentially in the ResidueDB | |
| void | setNTerminalModification (const std::string &modification) |
| void | setNTerminalModification (const ResidueModification *modification) |
| sets the N-terminal modification | |
| void | setNTerminalModification (const ResidueModification &mod) |
| sets the N-terminal modification (copies and adds to database if not present) | |
| void | setNTerminalModificationByDiffMonoMass (double diffMonoMass, bool protein_term) |
| sets the N-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present) | |
| const std::string & | getNTerminalModificationName () const |
| returns the name (ID) of the N-terminal modification, or an empty string if none is set | |
| const ResidueModification * | getNTerminalModification () const |
| returns a pointer to the N-terminal modification, or zero if none is set | |
| void | setCTerminalModification (const std::string &modification) |
| void | setCTerminalModification (const ResidueModification *modification) |
| sets the C-terminal modification (must be present in the database) | |
| void | setCTerminalModification (const ResidueModification &mod) |
| sets the C-terminal modification (copies and adds to database if not present) | |
| void | setCTerminalModificationByDiffMonoMass (double diffMonoMass, bool protein_term) |
| sets the C-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present) | |
| const std::string & | getCTerminalModificationName () const |
| returns the name (ID) of the C-terminal modification, or an empty string if none is set | |
| const ResidueModification * | getCTerminalModification () const |
| returns a pointer to the C-terminal modification, or zero if none is set | |
| const Residue & | getResidue (Size index) const |
| returns a reference to the residue at position index | |
| EmpiricalFormula | getFormula (Residue::ResidueType type=Residue::Full, Int charge=0) const |
| returns the formula of the peptide | |
| double | getAverageWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const |
| returns the average weight of the peptide | |
| double | getMonoWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const |
| double | getMZ (Int charge, Residue::ResidueType type=Residue::Full) const |
| const Residue & | operator[] (Size index) const |
| returns a reference to the residue at the given position | |
| AASequence | operator+ (const AASequence &peptide) const |
| adds the residues of the peptide | |
| AASequence & | operator+= (const AASequence &) |
| adds the residues of a peptide | |
| AASequence | operator+ (const Residue *residue) const |
| appends the given residue to the peptide | |
| AASequence & | operator+= (const Residue *) |
| appends the given residue to the peptide | |
| Size | size () const |
| returns the number of residues | |
| AASequence | getPrefix (Size index) const |
| returns a peptide sequence of the first index residues | |
| AASequence | getSuffix (Size index) const |
| returns a peptide sequence of the last index residues | |
| AASequence | getSubsequence (Size index, UInt number) const |
| returns a peptide sequence of number residues, beginning at position index | |
| void | getAAFrequencies (std::map< std::string, Size > &frequency_table) const |
| compute frequency table of amino acids | |
Predicates | |
| bool | has (const Residue &residue) const |
| returns true if the peptide contains the given residue | |
| bool | hasSubsequence (const AASequence &peptide) const |
| bool | hasPrefix (const AASequence &peptide) const |
| bool | hasSuffix (const AASequence &peptide) const |
| bool | hasNTerminalModification () const |
| predicate which is true if the peptide is N-term modified | |
| bool | hasCTerminalModification () const |
| predicate which is true if the peptide is C-term modified | |
| bool | isModified () const |
| returns true if any of the residues or termini are modified | |
| bool | operator== (const AASequence &rhs) const |
| equality operator. Two sequences are equal iff all amino acids including PTMs are equal | |
| bool | operator< (const AASequence &rhs) const |
| less-than operator which compares the C-term mods, the sequence including PTMs, and the N-term mods; can be used for maps | |
| bool | operator!= (const AASequence &rhs) const |
| inequality operator. Complement of equality operator. | |
Iterators | |
| Iterator | begin () |
| ConstIterator | begin () const |
| Iterator | end () |
| ConstIterator | end () const |
Stream operators | |
| std::vector< const Residue * > | peptide_ |
| const ResidueModification * | n_term_mod_ = nullptr |
| const ResidueModification * | c_term_mod_ = nullptr |
| AASequence (const std::string &s) | |
| constructor from String | |
| AASequence (const char *s) | |
| constructor from C string | |
| AASequence (const std::string &s, bool permissive) | |
| constructor from String | |
| AASequence (const char *s, bool permissive) | |
| constructor from C string | |
| std::ostream & | operator<< (std::ostream &os, const AASequence &peptide) |
| writes a peptide to an output stream | |
| std::istream & | operator>> (std::istream &is, const AASequence &peptide) |
| reads a peptide from an input stream | |
| static AASequence | fromString (const std::string &s, bool permissive=true) |
| create AASequence object by parsing an OpenMS string | |
| static AASequence | fromString (const char *s, bool permissive=true) |
| create AASequence object by parsing a C string (character array) | |
| static std::string::const_iterator | parseModRoundBrackets_ (const std::string::const_iterator str_it, const std::string &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity) |
| Parses modifications in round brackets (an identifier) | |
| static std::string::const_iterator | parseModSquareBrackets_ (const std::string::const_iterator str_it, const std::string &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity) |
| Parses modifications in square brackets (a mass) | |
| static void | parseString_ (const std::string &peptide, AASequence &aas, bool permissive=true) |
Representation of a peptide/protein sequence.
This class represents amino acid sequences in OpenMS. An AASequence instance primarily contains a sequence of residues. The sequence is represented as a vector of pointers to instances of Residue. Each amino acid has only one instance, which is accessible using the ResidueDB instance (singleton).
To create an AASequence instance for a specific amino acid sequence, use the AASequence::fromString function. For example, AASequence::fromString(".DFPIANGER.") produces an instance of AASequence for the peptide "DFPIANGER". Note that the N- and C-termini are both explicitly represented by dots.
A critical property of amino acid sequences is that they can be modified, meaning that one or more amino acids are chemically altered, e.g. oxidized. This is represented by Residue instances that carry a ResidueModification object. Modified residues are also handled in the ResidueDB.
Modifications are specified either by a unique string identifier from the ModificationsDB, written in round brackets after the modified amino acid, or by a mass in square brackets (the absolute residue mass, e.g. [147], or a mass difference, e.g. [+16]). For example, AASequence::fromString(".DFPIAM(Oxidation)GER.") creates an instance of the peptide "DFPIAMGER" with an oxidized methionine; AASequence::fromString(".DFPIAM(UniMod:35)GER."), AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[147]GER.") are all equivalent to it. N- and C-terminal modifications are represented by brackets to the right of the dots terminating the sequence. For example, ".(Dimethyl)DFPIAMGER." and ".DFPIAMGER.(Label:18O(2))" represent the labelling of the N- and C-terminus respectively, but ".DFPIAMGER(Phospho)." will be interpreted as a phosphorylation of the last arginine at its side chain.
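The notation above can be illustrated with a small standalone tokenizer. This is only a sketch of the string format, not the parser OpenMS uses: `ParsedPeptide`, `readBracket` and `splitNotation` are hypothetical names, residues are assumed to be single upper-case letters, and modification labels are kept as raw text without any database lookup.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: raw labels only, no lookup in any modification database.
struct ParsedPeptide
{
  std::string n_term;                    // label after the leading dot, "" if none
  std::string c_term;                    // label after the trailing dot, "" if none
  std::string residues;                  // plain one-letter sequence
  std::vector<std::string> residue_mods; // one entry per residue, "" if unmodified
};

// Reads a bracketed label starting at s[pos] ('(' or '['), honouring nesting
// such as "(Label:18O(2))". Advances pos past the closing bracket.
std::string readBracket(const std::string& s, std::size_t& pos)
{
  const char open = s[pos];
  const char close = (open == '(') ? ')' : ']';
  const std::size_t start = pos + 1;
  int depth = 0;
  for (; pos < s.size(); ++pos)
  {
    if (s[pos] == open)
    {
      ++depth;
    }
    else if (s[pos] == close && --depth == 0)
    {
      const std::string label = s.substr(start, pos - start);
      ++pos;
      return label;
    }
  }
  throw std::invalid_argument("unbalanced bracket in: " + s);
}

ParsedPeptide splitNotation(const std::string& s)
{
  ParsedPeptide out;
  std::size_t pos = 0;
  bool after_trailing_dot = false;
  while (pos < s.size())
  {
    const char c = s[pos];
    if (c == '.')
    {
      // A dot that follows at least one residue marks the C-terminus.
      after_trailing_dot = !out.residues.empty();
      ++pos;
    }
    else if (c == '(' || c == '[')
    {
      const std::string label = readBracket(s, pos);
      if (after_trailing_dot) out.c_term = label;            // ".PEPTIDE.(label)"
      else if (out.residues.empty()) out.n_term = label;     // ".(label)PEPTIDE."
      else out.residue_mods.back() = label;                  // side-chain modification
    }
    else if (std::isupper(static_cast<unsigned char>(c)))
    {
      out.residues += c;
      out.residue_mods.emplace_back();
      ++pos;
    }
    else
    {
      throw std::invalid_argument("unexpected character in: " + s);
    }
  }
  return out;
}
```

With this sketch, ".DFPIAMGER(Phospho)." attaches "Phospho" to the last residue, while ".DFPIAMGER.(Label:18O(2))" stores the label as the C-terminal one, matching the distinction drawn in the text.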
Note there is a subtle difference between AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[+15.9949]GER.") – while the former will try to find the first modification matching to a mass difference of 16 +/- 0.5, the latter will try to find the closest matching modification to the exact mass. This usually gives the intended results while the first approach may not.
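The difference between the two lookup strategies can be sketched on a toy table. Everything here is an assumption made for illustration: `ModEntry`, `firstWithinTolerance` and `closestMatch` are hypothetical names, the table is not the ModificationsDB, and the entry "HypotheticalMod" is invented so that two candidates fall inside the 16 +/- 0.5 window.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table entry: a modification name and its monoisotopic mass delta.
struct ModEntry
{
  std::string name;
  double delta;
};

// Index of the first entry whose delta lies within +/- tol of target, or -1 if none does.
int firstWithinTolerance(const std::vector<ModEntry>& mods, double target, double tol)
{
  for (std::size_t i = 0; i < mods.size(); ++i)
  {
    if (std::fabs(mods[i].delta - target) <= tol) return static_cast<int>(i);
  }
  return -1;
}

// Index of the entry whose delta is closest to target, or -1 if the table is empty.
int closestMatch(const std::vector<ModEntry>& mods, double target)
{
  int best = -1;
  double best_diff = 0.0;
  for (std::size_t i = 0; i < mods.size(); ++i)
  {
    const double diff = std::fabs(mods[i].delta - target);
    if (best < 0 || diff < best_diff)
    {
      best = static_cast<int>(i);
      best_diff = diff;
    }
  }
  return best;
}

// Toy table (illustrative values only): both entries lie within 16 +/- 0.5.
std::vector<ModEntry> toyTable()
{
  return {{"HypotheticalMod", 15.9772}, {"Oxidation", 15.9949}};
}
```

On this table, a search for 16 with a tolerance of 0.5 stops at the first entry, whereas a search for the closest match to 15.9949 selects "Oxidation". This is why the exact mass is usually the safer choice.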
Arbitrary/unknown amino acids (usually due to an unknown modification) can be specified using tags preceded by X: "X[weight]". This indicates a new amino acid ("X") with the specified weight, e.g. "RX[148.5]T". Note that this tag does not alter the amino acids to the left (R) or right (T); rather, X represents an amino acid of its own. Be careful when converting such AASequence objects to an EmpiricalFormula using getFormula(): tags are not considered in this case, because no formula exists for them. They do, however, influence getMonoWeight() and getAverageWeight().
|
default |
Default constructor.
|
default |
Copy constructor.
|
default |
Move constructor.
|
default |
Destructor.
|
explicit |
constructor from String
| [in] | s | A std::string representing the amino acid sequence |
|
explicit |
constructor from C string
| [in] | s | A C-style string representing the amino acid sequence |
|
explicit |
constructor from String
| [in] | s | A std::string representing the amino acid sequence |
| [in] | permissive | If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing |
|
explicit |
constructor from C string
| [in] | s | A C-style string representing the amino acid sequence |
| [in] | permissive | If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing |
|
inline |
|
inline |
| bool empty | ( | ) | const |
checks if the sequence is empty
Referenced by OPXLDataStructs::ProteinProteinCrossLink::getType().
|
inline |
|
inline |
|
static |
create AASequence object by parsing a C string (character array)
| [in] | s | Input string |
| [in] | permissive | If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing |
| Exception::ParseError | if an invalid string representation of an AA sequence is passed |
|
static |
create AASequence object by parsing an OpenMS string
| [in] | s | Input string |
| [in] | permissive | If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing |
| Exception::ParseError | if an invalid string representation of an AA sequence is passed |
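The clean-up that the `permissive` flag describes can be sketched as a standalone pre-processing step. This is an illustration under stated assumptions, not the OpenMS parser: `normalizePermissive` is a hypothetical helper, and it only handles plain sequences without modification brackets (a "+" inside "[+16]" would need bracket-aware handling).

```cpp
#include <string>

// Skips spaces and replaces the stop codon symbols '*', '#' and '+' by 'X'
// (unknown amino acid). Assumes the input contains no modification brackets.
std::string normalizePermissive(const std::string& raw)
{
  std::string out;
  out.reserve(raw.size());
  for (char c : raw)
  {
    if (c == ' ') continue;                          // skip spaces
    if (c == '*' || c == '#' || c == '+') c = 'X';   // stop codon symbol -> 'X'
    out += c;
  }
  return out;
}
```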
| void getAAFrequencies | ( | std::map< std::string, Size > & | frequency_table | ) | const |
compute frequency table of amino acids
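A frequency table of this shape can be sketched on a plain one-letter string. The helper name `countResidues` is hypothetical, and two details are assumptions of the example: keys are taken to be one-letter codes, and the table is cleared before counting (the text above does not say whether the OpenMS method does either).

```cpp
#include <cstddef>
#include <map>
#include <string>

// Fills frequency_table with the number of occurrences of each one-letter code.
// Assumption: the table is cleared first, so repeated calls do not accumulate.
void countResidues(const std::string& sequence,
                   std::map<std::string, std::size_t>& frequency_table)
{
  frequency_table.clear();
  for (char c : sequence)
  {
    ++frequency_table[std::string(1, c)];
  }
}
```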
| double getAverageWeight | ( | Residue::ResidueType | type = Residue::Full, |
| Int | charge = 0 |
||
| ) | const |
returns the average weight of the peptide
| const ResidueModification * getCTerminalModification | ( | ) | const |
returns a pointer to the C-terminal modification, or zero if none is set
| const std::string & getCTerminalModificationName | ( | ) | const |
returns the name (ID) of the C-terminal modification, or an empty string if none is set
| EmpiricalFormula getFormula | ( | Residue::ResidueType | type = Residue::Full, |
| Int | charge = 0 |
||
| ) | const |
returns the formula of the peptide
| double getMonoWeight | ( | Residue::ResidueType | type = Residue::Full, |
| Int | charge = 0 |
||
| ) | const |
returns the monoisotopic weight of the peptide in the given ionic form
| double getMZ | ( | Int | charge, |
| Residue::ResidueType | type = Residue::Full |
||
| ) | const |
returns mass-to-charge ratio of the peptide in the given ionic form
| Exception::InvalidValue | if charge==0 |
| const ResidueModification * getNTerminalModification | ( | ) | const |
returns a pointer to the N-terminal modification, or zero if none is set
| const std::string & getNTerminalModificationName | ( | ) | const |
returns the name (ID) of the N-terminal modification, or an empty string if none is set
| AASequence getPrefix | ( | Size | index | ) | const |
returns a peptide sequence of the first index residues
| AASequence getSubsequence | ( | Size | index, |
| UInt | number | ||
| ) | const |
returns a peptide sequence of number residues, beginning at position index
| AASequence getSuffix | ( | Size | index | ) | const |
returns a peptide sequence of the last index residues
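The semantics of the three slicing accessors can be sketched on a plain string. The helpers `prefixOf`, `suffixOf` and `subsequenceOf` are hypothetical, the position is assumed to be zero-based, and the out-of-range behaviour (throwing `std::out_of_range`) is a choice made for this example only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// First n residues.
std::string prefixOf(const std::string& seq, std::size_t n)
{
  if (n > seq.size()) throw std::out_of_range("prefix longer than sequence");
  return seq.substr(0, n);
}

// Last n residues.
std::string suffixOf(const std::string& seq, std::size_t n)
{
  if (n > seq.size()) throw std::out_of_range("suffix longer than sequence");
  return seq.substr(seq.size() - n);
}

// `number` residues, beginning at zero-based position `index`.
std::string subsequenceOf(const std::string& seq, std::size_t index, std::size_t number)
{
  if (index > seq.size() || number > seq.size() - index)
  {
    throw std::out_of_range("subsequence exceeds sequence");
  }
  return seq.substr(index, number);
}
```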
| bool has | ( | const Residue & | residue | ) | const |
returns true if the peptide contains the given residue
| bool hasCTerminalModification | ( | ) | const |
predicate which is true if the peptide is C-term modified
| bool hasNTerminalModification | ( | ) | const |
predicate which is true if the peptide is N-term modified
| bool hasPrefix | ( | const AASequence & | peptide | ) | const |
returns true if the peptide has the given prefix. The N-terminal modification is also checked (and the C-terminal modification as well, if the prefix has the same length as the peptide)
| bool hasSubsequence | ( | const AASequence & | peptide | ) | const |
returns true if the peptide contains the given peptide
| bool hasSuffix | ( | const AASequence & | peptide | ) | const |
returns true if the peptide has the given suffix. The C-terminal modification is also checked (and the N-terminal modification as well, if the suffix has the same length as the peptide)
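The terminal-modification rule for prefix and suffix checks can be sketched with a simplified peptide type. `SimplePeptide`, `hasPrefixSketch` and `hasSuffixSketch` are hypothetical, modifications are reduced to name strings, and per-residue modifications are ignored in this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified peptide: plain residues plus optional terminal labels.
struct SimplePeptide
{
  std::string residues;
  std::string n_term_mod; // "" if none
  std::string c_term_mod; // "" if none
};

// The N-terminal label must always match; the C-terminal label only matters
// when the prefix spans the whole peptide.
bool hasPrefixSketch(const SimplePeptide& peptide, const SimplePeptide& prefix)
{
  if (prefix.residues.size() > peptide.residues.size()) return false;
  if (prefix.n_term_mod != peptide.n_term_mod) return false;
  if (prefix.residues.size() == peptide.residues.size() &&
      prefix.c_term_mod != peptide.c_term_mod) return false;
  return peptide.residues.compare(0, prefix.residues.size(), prefix.residues) == 0;
}

// Mirror image: the C-terminal label must always match; the N-terminal label
// only matters when the suffix spans the whole peptide.
bool hasSuffixSketch(const SimplePeptide& peptide, const SimplePeptide& suffix)
{
  if (suffix.residues.size() > peptide.residues.size()) return false;
  if (suffix.c_term_mod != peptide.c_term_mod) return false;
  if (suffix.residues.size() == peptide.residues.size() &&
      suffix.n_term_mod != peptide.n_term_mod) return false;
  const std::size_t offset = peptide.residues.size() - suffix.residues.size();
  return peptide.residues.compare(offset, suffix.residues.size(), suffix.residues) == 0;
}
```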
| bool isModified | ( | ) | const |
returns true if any of the residues or termini are modified
| bool operator!= | ( | const AASequence & | rhs | ) | const |
inequality operator. Complement of equality operator.
| AASequence operator+ | ( | const AASequence & | peptide | ) | const |
adds the residues of the peptide
| AASequence operator+ | ( | const Residue * | residue | ) | const |
appends the given residue to the peptide
| AASequence & operator+= | ( | const AASequence & | ) |
adds the residues of a peptide
| AASequence & operator+= | ( | const Residue * | ) |
appends the given residue to the peptide
| bool operator< | ( | const AASequence & | rhs | ) | const |
less-than operator which compares the C-term mods, the sequence including PTMs, and the N-term mods; can be used for maps
|
default |
Move assignment operator.
|
default |
Assignment operator.
| bool operator== | ( | const AASequence & | rhs | ) | const |
equality operator. Two sequences are equal iff all amino acids including PTMs are equal
|
staticprotected |
Parses modifications in round brackets (an identifier)
If dot notation is used, it resolves the C-terminal ambiguity based on the presence of the dot.
| [in] | str_it | Current position in the string to be parsed |
| [in] | str | Full input string |
| [in,out] | aas | Current AASequence object (will be modified with the correct residue added) |
| [in] | specificity | Whether the current modification should be interpreted as N- or C-terminal |
|
staticprotected |
Parses modifications in square brackets (a mass)
If dot notation is used, it resolves the C-terminal ambiguity based on the presence of the dot.
| [in] | str_it | Current position in the string to be parsed |
| [in] | str | Full input string |
| [in,out] | aas | Current AASequence object (will be modified with the correct residue added) |
| [in] | specificity | Whether the current modification should be interpreted as N- or C-terminal |
|
staticprotected |
| void setCTerminalModification | ( | const ResidueModification & | mod | ) |
sets the C-terminal modification (copies and adds to database if not present)
| void setCTerminalModification | ( | const ResidueModification * | modification | ) |
sets the C-terminal modification (must be present in the database)
| void setCTerminalModification | ( | const std::string & | modification | ) |
sets the C-terminal modification by looking up the name among the modification names of the ModificationsDB. Throws if nothing is found, since the name alone is not enough information to create a new modification.
| void setCTerminalModificationByDiffMonoMass | ( | double | diffMonoMass, |
| bool | protein_term | ||
| ) |
sets the C-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present)
| void setModification | ( | Size | index, |
| const Residue * | modification | ||
| ) |
sets the modification of the AA at index by providing an already (potentially) modified residue
| void setModification | ( | Size | index, |
| const ResidueModification & | modification | ||
| ) |
sets the modification of the AA at index by providing a ResidueModification object. This is stricter than just looking for the name, and it adds the modification to the DB if it is not present.
| void setModification | ( | Size | index, |
| const ResidueModification * | modification | ||
| ) |
sets the modification of AA at index by providing a pointer to a ResidueModification object found in the ModificationsDB
| void setModification | ( | Size | index, |
| const std::string & | modification | ||
| ) |
sets the modification of the residue at position index. If an empty string is passed, the residue is replaced with its unmodified version.
| void setModificationByDiffMonoMass | ( | Size | index, |
| double | diffMonoMass | ||
| ) |
modifies the residue at index in the sequence and potentially in the ResidueDB
| void setNTerminalModification | ( | const ResidueModification & | mod | ) |
sets the N-terminal modification (copies and adds to database if not present)
| void setNTerminalModification | ( | const ResidueModification * | modification | ) |
sets the N-terminal modification
| void setNTerminalModification | ( | const std::string & | modification | ) |
sets the N-terminal modification by looking up the name among the modification names of the ModificationsDB. Throws if nothing is found, since the name alone is not enough information to create a new modification.
| void setNTerminalModificationByDiffMonoMass | ( | double | diffMonoMass, |
| bool | protein_term | ||
| ) |
sets the N-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present)
| Size size | ( | ) | const |
returns the number of residues
Referenced by NuXLLinearRescore::apply(), and AAIndex::calculateGB().
| std::string toBracketString | ( | bool | integer_mass = true, |
| bool | mass_delta = false, |
||
| const std::vector< std::string > & | fixed_modifications = std::vector< std::string >() |
||
| ) | const |
create a TPP compatible string of the modified sequence using bracket notation.
Instead of using the modification names, it writes the modification masses in brackets.
For example, M[147] (absolute mass) or M[+16] (relative mass delta) will be produced, depending on whether relative or absolute masses are used.
| [in] | integer_mass | Whether to use integer masses in brackets (default is true, if false, accurate masses will be written) |
| [in] | mass_delta | Whether to write absolute masses M[147] or relative mass deltas M[+16] (default is false) |
| [in] | fixed_modifications | Optional list of fixed modifications that should not be added to the output (they are considered to be present in all cases) |
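How the two flags change the bracket content can be sketched for a single modified residue. `bracketTag` is a hypothetical helper, not part of OpenMS. Rounding to the nearest integer and the default stream precision for accurate masses are assumptions of this example; the text above does not specify either.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Formats one modified residue in bracket notation.
//   residue_mass: monoisotopic mass of the unmodified residue
//   delta:        mass difference introduced by the modification
std::string bracketTag(char aa, double residue_mass, double delta,
                       bool integer_mass = true, bool mass_delta = false)
{
  // Relative notation writes only the delta, absolute notation the total mass.
  const double value = mass_delta ? delta : residue_mass + delta;
  std::ostringstream out;
  out << aa << '[';
  if (mass_delta && value >= 0) out << '+';   // explicit sign marks a delta
  if (integer_mass) out << std::lround(value); // assumption: round to nearest integer
  else out << value;                           // assumption: default stream precision
  out << ']';
  return out.str();
}
```

For an oxidized methionine (residue mass about 131.04, delta about 15.99), the defaults give "M[147]", and setting `mass_delta` to true gives "M[+16]".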
| std::string toString | ( | ) | const |
returns the peptide as string with modifications embedded in brackets
Uses round brackets when possible (id is known) or square brackets for unknown modifications where only the mass is known.
e.g.: .[43]PEPC(Carbamidomethyl)PEPM[147]PEPR.[-1]
Referenced by MapAlignmentAlgorithmIdentification::getRetentionTimes_(), and IDBoostGraph::LabelVisitor::operator()().
| std::string toUniModString | ( | ) | const |
returns the peptide as string with UniMod-style modifications embedded in brackets
Annotates modification with UniMod identifier (when identifier is known) and uses square brackets for unknown modifications (only mass is known).
e.g.: .[43]PEPC(UniMod:4)PEPM[147]PEPR.[16]
| std::string toUnmodifiedString | ( | ) | const |
returns the peptide as a string without any modifications (e.g., "PEPTIDER")
Referenced by NuXLRTPrediction::buildPredictorsAndResponse_(), NuXLRTPrediction::buildPredictorsAndResponseFromIdentifiedFeatures_(), IDFilter::PeptideDigestionFilter::operator()(), and IDBoostGraph::PrintAddressVisitor< CharT >::operator()().
|
friend |
writes a peptide to an output stream
|
friend |
reads a peptide from an input stream
|
protected |
|
protected |
|
protected |