OpenMS  2.4.0
AASequence Class Reference

Representation of a peptide/protein sequence.

#include <OpenMS/CHEMISTRY/AASequence.h>

Classes

class  ConstIterator
 ConstIterator for AASequence.

class  Iterator
 Iterator class for AASequence.
 

Public Member Functions

AASequence & operator= (const AASequence &rhs)
 assignment operator

bool empty () const
 check if sequence is empty

Constructors and Destructors
 AASequence ()
 default constructor

 AASequence (const AASequence &rhs)
 copy constructor

virtual ~AASequence ()
 destructor
 
Accessors
String toString () const
 returns the peptide as string with modifications embedded in brackets

String toUnmodifiedString () const
 returns the peptide as string without any modifications

String toUniModString () const
 returns the peptide as string with UniMod-style modifications embedded in brackets

String toBracketString (bool integer_mass=true, const std::vector< String > &fixed_modifications=std::vector< String >()) const
 create a TPP compatible string of the modified sequence using bracket notation.
 
void setModification (Size index, const String &modification)
 
void setNTerminalModification (const String &modification)
 sets the N-terminal modification

const String & getNTerminalModificationName () const
 returns the name (ID) of the N-terminal modification, or an empty string if none is set

const ResidueModification * getNTerminalModification () const
 returns a pointer to the N-terminal modification, or zero if none is set

void setCTerminalModification (const String &modification)
 sets the C-terminal modification

const String & getCTerminalModificationName () const
 returns the name (ID) of the C-terminal modification, or an empty string if none is set

const ResidueModification * getCTerminalModification () const
 returns a pointer to the C-terminal modification, or zero if none is set

const Residue & getResidue (Size index) const
 returns a reference to the residue at position index

EmpiricalFormula getFormula (Residue::ResidueType type=Residue::Full, Int charge=0) const
 returns the formula of the peptide

double getAverageWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const
 returns the average weight of the peptide
 
double getMonoWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const
 
const Residue & operator[] (Size index) const
 returns a reference to the residue at the given position

AASequence operator+ (const AASequence &peptide) const
 adds the residues of the peptide

AASequence & operator+= (const AASequence &)
 adds the residues of a peptide

AASequence operator+ (const Residue *residue) const
 adds the given residue

AASequence & operator+= (const Residue *)
 adds the given residue

Size size () const
 returns the number of residues

AASequence getPrefix (Size index) const
 returns a peptide sequence of the first index residues

AASequence getSuffix (Size index) const
 returns a peptide sequence of the last index residues

AASequence getSubsequence (Size index, UInt number) const
 returns a peptide sequence of number residues, beginning at position index

void getAAFrequencies (Map< String, Size > &frequency_table) const
 compute frequency table of amino acids
 
Predicates
bool has (const Residue &residue) const
 returns true if the peptide contains the given residue
 
bool hasSubsequence (const AASequence &peptide) const
 
bool hasPrefix (const AASequence &peptide) const
 
bool hasSuffix (const AASequence &peptide) const
 
bool hasNTerminalModification () const
 predicate which is true if the peptide is N-term modified

bool hasCTerminalModification () const
 predicate which is true if the peptide is C-term modified

bool isModified () const
 returns true if any of the residues or termini are modified

bool operator== (const AASequence &rhs) const
 equality operator. Two sequences are equal iff all amino acids including PTMs are equal

bool operator< (const AASequence &rhs) const
 less-than operator which compares the C-term mods, sequence including PTMs and N-term mods; can be used for maps

bool operator!= (const AASequence &rhs) const
 inequality operator. Complement of equality operator.
 
Iterators
Iterator begin ()
 
ConstIterator begin () const
 
Iterator end ()
 
ConstIterator end () const
 

Static Public Member Functions

static AASequence fromString (const String &s, bool permissive=true)
 create AASequence object by parsing an OpenMS string

static AASequence fromString (const char *s, bool permissive=true)
 create AASequence object by parsing a C string (character array)


Static Protected Member Functions

static String::ConstIterator parseModRoundBrackets_ (const String::ConstIterator str_it, const String &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity)
 Parses modifications in round brackets (an identifier)

static String::ConstIterator parseModSquareBrackets_ (const String::ConstIterator str_it, const String &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity)
 Parses modifications in square brackets (a mass)
 
static void parseString_ (const String &peptide, AASequence &aas, bool permissive=true)
 

Protected Attributes

std::vector< const Residue * > peptide_
 
const ResidueModification * n_term_mod_

const ResidueModification * c_term_mod_
 

Friends

Stream operators
std::ostream & operator<< (std::ostream &os, const AASequence &peptide)
 writes a peptide to an output stream

std::istream & operator>> (std::istream &is, const AASequence &peptide)
 reads a peptide from an input stream
 

Detailed Description

Representation of a peptide/protein sequence.

This class represents amino acid sequences in OpenMS. An AASequence instance primarily contains a sequence of residues. The sequence is represented as a vector of pointers to instances of Residue. Each amino acid has only one instance, which is accessible using the ResidueDB instance (singleton).

To create an AASequence instance for a specific amino acid sequence, use the AASequence::fromString function. For example, AASequence::fromString(".DFPIANGER.") produces an instance of AASequence for the peptide "DFPIANGER". Note that both the N- and the C-terminus are explicitly represented by dots.

A critical property of amino acid sequences is that they can be modified, meaning that one or more amino acids are chemically altered, e.g. oxidized. A modification is represented by a Residue instance that carries a ResidueModification object. Modified residues are also managed by the ResidueDB.

Modifications are specified either by a unique string identifier from the ModificationsDB, written in round brackets after the modified amino acid, or by the mass of the residue, written in square brackets. For example, AASequence::fromString(".DFPIAM(Oxidation)GER.") creates an instance of the peptide "DFPIAMGER" with an oxidized methionine; AASequence::fromString(".DFPIAM(UniMod:35)GER."), AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[147]GER.") are all equivalent. N- and C-terminal modifications are represented by brackets to the right of the dots terminating the sequence. For example, ".(Dimethyl)DFPIAMGER." and ".DFPIAMGER.(Label:18O(2))" represent the labelling of the N- and C-terminus respectively, but ".DFPIAMGER(Phospho)." will be interpreted as a phosphorylation of the last arginine at its side chain.
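The bracket notation described above can be illustrated with a small standalone tokenizer. This is only a sketch in plain C++ and not the OpenMS parser: the Token type and the tokenize() function are hypothetical names, the modification tags are kept as raw text rather than looked up in a database, and terminal modifications are simply attached to the dot token that precedes them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed position: the symbol plus the raw text found between the
// brackets that follow it. Hypothetical type, not an OpenMS class.
struct Token
{
  char residue;     // one-letter code; '.' stands for a terminus
  std::string mod;  // e.g. "Oxidation", "+16", "Label:18O(2)"; empty if none
};

// Splits a string such as ".DFPIAM(Oxidation)GER." into tokens.
// Brackets may nest (as in "Label:18O(2)"), so the nesting depth is tracked.
std::vector<Token> tokenize(const std::string& s)
{
  std::vector<Token> out;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    const char c = s[i];
    if (c == '(' || c == '[')
    {
      if (out.empty()) throw std::invalid_argument("bracket without residue or terminus");
      const char open = c;
      const char close = (c == '(') ? ')' : ']';
      int depth = 1;
      std::size_t j = i + 1;
      for (; j < s.size(); ++j)
      {
        if (s[j] == open) ++depth;
        else if (s[j] == close && --depth == 0) break;
      }
      if (j == s.size()) throw std::invalid_argument("unbalanced bracket");
      out.back().mod = s.substr(i + 1, j - i - 1);  // text between the brackets
      i = j;                                        // resume after the closing bracket
    }
    else
    {
      out.push_back(Token{c, ""});
    }
  }
  return out;
}
```

With this sketch, ".DFPIAMGER(Phospho)." attaches "Phospho" to the R token, while ".DFPIAMGER.(Label:18O(2))" attaches the label to the trailing dot, which mirrors the side-chain versus C-terminal distinction made in the text.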

Note that there is a subtle difference between AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[+15.9949]GER."): the former tries to find the first modification that matches a mass difference of 16 +/- 0.5, whereas the latter tries to find the modification closest to the exact mass. The exact-mass form usually gives the intended result, while the integer form may not.
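The difference between the two lookup strategies can be sketched as follows. The code is a standalone illustration, not the OpenMS implementation: the Mod record, both function names and the order of the table are assumptions made for the example, and "HypotheticalMod" is an invented entry that exists only to show how the two strategies can disagree.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical modification record; not the OpenMS ResidueModification class.
struct Mod
{
  std::string name;
  double delta;  // mass difference in Da
};

// Models an integer tag such as [+16]: returns the index of the FIRST entry
// whose mass difference lies within +/- 0.5 Da of the tag, or -1 if none does.
int firstWithinHalfDalton(const std::vector<Mod>& db, double tag)
{
  for (std::size_t i = 0; i < db.size(); ++i)
  {
    if (std::fabs(db[i].delta - tag) <= 0.5) return static_cast<int>(i);
  }
  return -1;
}

// Models an exact tag such as [+15.9949]: returns the index of the entry
// CLOSEST to the tag, or -1 for an empty table.
int closestMatch(const std::vector<Mod>& db, double tag)
{
  int best = -1;
  double best_diff = 0.0;
  for (std::size_t i = 0; i < db.size(); ++i)
  {
    const double diff = std::fabs(db[i].delta - tag);
    if (best < 0 || diff < best_diff)
    {
      best = static_cast<int>(i);
      best_diff = diff;
    }
  }
  return best;
}
```

For a table that lists {"HypotheticalMod", 16.3} before {"Oxidation", 15.9949}, the integer lookup for 16 stops at the first entry, whereas the closest-match lookup for 15.9949 selects Oxidation.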

Arbitrary/unknown amino acids (usually due to an unknown modification) can be specified using tags preceded by X: "X[weight]". This indicates a new amino acid ("X") with the specified weight, e.g. "RX[148.5]T". Note that this tag does not alter the amino acids to the left (R) or right (T). Rather, X represents an amino acid on its own. Be careful when converting such AASequence objects to an EmpiricalFormula using getFormula(), as tags will not be considered in this case (there exists no formula for them). However, they do influence getMonoWeight() and getAverageWeight()!

Constructor & Destructor Documentation

◆ AASequence() [1/2]

default constructor

◆ AASequence() [2/2]

AASequence ( const AASequence & rhs)

copy constructor

◆ ~AASequence()

virtual ~AASequence ( )
virtual

destructor

Member Function Documentation

◆ begin() [1/2]

Iterator begin ( )
inline

◆ begin() [2/2]

ConstIterator begin ( ) const
inline

◆ empty()

bool empty ( ) const

check if sequence is empty

Referenced by OPXLDataStructs::ProteinProteinCrossLink::getType().

◆ end() [1/2]

Iterator end ( )
inline

◆ end() [2/2]

ConstIterator end ( ) const
inline

◆ fromString() [1/2]

static AASequence fromString ( const String & s,
bool  permissive = true 
)
static

◆ fromString() [2/2]

static AASequence fromString ( const char *  s,
bool  permissive = true 
)
static

create AASequence object by parsing a C string (character array)

Parameters
s: Input string
permissive: If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing
Exceptions
Exception::ParseError: if an invalid string representation of an AA sequence is passed
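The effect of the permissive flag can be pictured as a preprocessing pass over the input. The following sketch is a standalone illustration and not the OpenMS parser: normalizePermissive() is a hypothetical helper, and leaving the contents of brackets untouched (so that a tag such as "[+16]" keeps its plus sign) is an assumption of this example.

```cpp
#include <string>

// Drops spaces and maps the stop codon symbols '*', '#' and '+' to 'X',
// but only outside of round or square brackets.
std::string normalizePermissive(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  int depth = 0;  // current bracket nesting depth
  for (char c : in)
  {
    if (c == '(' || c == '[')
    {
      ++depth;
    }
    else if (c == ')' || c == ']')
    {
      if (depth > 0) --depth;
    }
    else if (depth == 0)
    {
      if (c == ' ') continue;                          // skip spaces
      if (c == '*' || c == '#' || c == '+') c = 'X';   // unknown amino acid
    }
    out.push_back(c);
  }
  return out;
}
```

For instance, normalizePermissive("PEP TIDE*") yields "PEPTIDEX".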

◆ getAAFrequencies()

void getAAFrequencies ( Map< String, Size > &  frequency_table) const

compute frequency table of amino acids
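A frequency table of this kind can be sketched with standard containers. The example below is standalone and uses std::map and std::string in place of the OpenMS Map and String types; aaFrequencies() is a hypothetical name, residues are identified by their one-letter code only, and clearing the table before counting is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Counts how often each one-letter residue code occurs in seq.
void aaFrequencies(const std::string& seq, std::map<std::string, std::size_t>& table)
{
  table.clear();  // assumption: earlier contents are discarded
  for (char c : seq)
  {
    ++table[std::string(1, c)];
  }
}
```

For "PEPTIDE" the resulting table has five entries, with P and E counted twice each.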

◆ getAverageWeight()

double getAverageWeight ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

returns the average weight of the peptide

◆ getCTerminalModification()

const ResidueModification* getCTerminalModification ( ) const

returns a pointer to the C-terminal modification, or zero if none is set

◆ getCTerminalModificationName()

const String& getCTerminalModificationName ( ) const

returns the name (ID) of the C-terminal modification, or an empty string if none is set

◆ getFormula()

EmpiricalFormula getFormula ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

◆ getMonoWeight()

double getMonoWeight ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

returns the monoisotopic weight of the peptide in the given ionic form

Note
This function does not (and cannot) check whether the requested ion can exist (e.g. x/c ions for monomers), because it does not perform fragmentation; it only adds to or subtracts from the sequence to obtain its ionic form.

Referenced by TOPPOpenPepXLLF::main_(), SimpleSearchEngine::main_(), TOPPOpenPepXL::main_(), RNPxlSearch::main_(), TOPPMetaProSIP::main_(), and TOPPViewBase::showSpectrumGenerationDialog().
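As a rough illustration of what a monoisotopic weight for the full peptide involves, the sketch below sums standard monoisotopic residue masses, adds one water for the termini and adds one proton mass per charge. It is standalone code, not the OpenMS implementation: monoWeightSketch() is a hypothetical function, it covers only unmodified residues and the Residue::Full case, and treating the charge as added protons without dividing by the charge is an assumption.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Monoisotopic weight of an unmodified peptide given as one-letter codes.
double monoWeightSketch(const std::string& seq, int charge = 0)
{
  // Standard monoisotopic residue masses in Da, rounded to five decimals.
  static const std::map<char, double> residue_mass = {
    {'G', 57.02146},  {'A', 71.03711},  {'S', 87.03203},  {'P', 97.05276},
    {'V', 99.06841},  {'T', 101.04768}, {'C', 103.00919}, {'L', 113.08406},
    {'I', 113.08406}, {'N', 114.04293}, {'D', 115.02694}, {'Q', 128.05858},
    {'K', 128.09496}, {'E', 129.04259}, {'M', 131.04049}, {'H', 137.05891},
    {'F', 147.06841}, {'R', 156.10111}, {'Y', 163.06333}, {'W', 186.07931}};
  const double water = 18.01056;  // H2O completing the N- and C-terminus
  const double proton = 1.00728;  // assumed charge carrier

  double weight = water;
  for (char c : seq)
  {
    const auto it = residue_mass.find(c);
    if (it == residue_mass.end()) throw std::invalid_argument("unknown residue");
    weight += it->second;
  }
  return weight + charge * proton;
}
```

Under these assumptions, an m/z value would be obtained by dividing the result for a non-zero charge by that charge.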

◆ getNTerminalModification()

const ResidueModification* getNTerminalModification ( ) const

returns a pointer to the N-terminal modification, or zero if none is set

◆ getNTerminalModificationName()

const String& getNTerminalModificationName ( ) const

returns the name (ID) of the N-terminal modification, or an empty string if none is set

◆ getPrefix()

AASequence getPrefix ( Size  index) const

returns a peptide sequence of the first index residues

Referenced by RNPxlSearch::postScoreHits_().

◆ getResidue()

const Residue& getResidue ( Size  index) const

returns a reference to the residue at position index

◆ getSubsequence()

AASequence getSubsequence ( Size  index,
UInt  number 
) const

returns a peptide sequence of number residues, beginning at position index

◆ getSuffix()

AASequence getSuffix ( Size  index) const

returns a peptide sequence of the last index residues

Referenced by RNPxlSearch::postScoreHits_().
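The index semantics of getPrefix(), getSubsequence() and getSuffix() can be pictured on a plain string of one-letter codes. This is a standalone sketch with hypothetical function names; throwing std::out_of_range for requests that exceed the sequence length is an assumption of the example and says nothing about how OpenMS reports such errors.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// First n residues.
std::string prefixOf(const std::string& seq, std::size_t n)
{
  if (n > seq.size()) throw std::out_of_range("prefix longer than sequence");
  return seq.substr(0, n);
}

// Last n residues.
std::string suffixOf(const std::string& seq, std::size_t n)
{
  if (n > seq.size()) throw std::out_of_range("suffix longer than sequence");
  return seq.substr(seq.size() - n);
}

// number residues, beginning at zero-based position index.
std::string subsequenceOf(const std::string& seq, std::size_t index, std::size_t number)
{
  if (index > seq.size() || number > seq.size() - index)
  {
    throw std::out_of_range("subsequence exceeds sequence");
  }
  return seq.substr(index, number);
}
```

For "DFPIANGER", the prefix of length 3 is "DFP", the suffix of length 3 is "GER", and the subsequence of 4 residues starting at position 2 is "PIAN".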

◆ has()

bool has ( const Residue & residue) const

returns true if the peptide contains the given residue

◆ hasCTerminalModification()

bool hasCTerminalModification ( ) const

predicate which is true if the peptide is C-term modified

◆ hasNTerminalModification()

bool hasNTerminalModification ( ) const

predicate which is true if the peptide is N-term modified

◆ hasPrefix()

bool hasPrefix ( const AASequence & peptide) const

returns true if the peptide has the given prefix. The N-terminal modification is also checked (and the C-terminal modification as well, if the prefix has the same length as the peptide)
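The rule stated for hasPrefix() can be sketched on a simplified peptide model. The Peptide struct and hasPrefixSketch() are hypothetical and standalone: residues are stored as strings that include any residue modification (for example "M(Oxidation)"), and terminal modifications are stored as plain names, with an empty string meaning unmodified.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified peptide model; not the OpenMS AASequence class.
struct Peptide
{
  std::string n_term_mod;             // empty if the N-terminus is unmodified
  std::vector<std::string> residues;  // e.g. {"P", "E", "M(Oxidation)"}
  std::string c_term_mod;             // empty if the C-terminus is unmodified
};

bool hasPrefixSketch(const Peptide& p, const Peptide& prefix)
{
  if (prefix.residues.size() > p.residues.size()) return false;
  // The N-terminal modification always takes part in the comparison.
  if (p.n_term_mod != prefix.n_term_mod) return false;
  // The C-terminal modification only matters when both have the same length.
  if (prefix.residues.size() == p.residues.size() && p.c_term_mod != prefix.c_term_mod)
  {
    return false;
  }
  return std::equal(prefix.residues.begin(), prefix.residues.end(), p.residues.begin());
}
```

A suffix check would mirror this logic, always comparing the C-terminal modification and comparing the N-terminal one only for equal lengths.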

◆ hasSubsequence()

bool hasSubsequence ( const AASequence & peptide) const

returns true if the peptide contains the given peptide

Note
c-term and n-term mods are ignored

◆ hasSuffix()

bool hasSuffix ( const AASequence & peptide) const

returns true if the peptide has the given suffix. The C-terminal modification is also checked (and the N-terminal modification as well, if the suffix has the same length as the peptide)

◆ isModified()

bool isModified ( ) const

returns true if any of the residues or termini are modified

◆ operator!=()

bool operator!= ( const AASequence & rhs) const

inequality operator. Complement of equality operator.

◆ operator+() [1/2]

AASequence operator+ ( const AASequence & peptide) const

adds the residues of the peptide

◆ operator+() [2/2]

AASequence operator+ ( const Residue * residue) const

adds the given residue

◆ operator+=() [1/2]

AASequence& operator+= ( const AASequence & )

adds the residues of a peptide

◆ operator+=() [2/2]

AASequence& operator+= ( const Residue * )

adds the given residue

◆ operator<()

bool operator< ( const AASequence & rhs) const

less-than operator which compares the C-term mods, sequence including PTMs and N-term mods; can be used for maps
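A strict ordering over the residues and both terminal modifications is what makes a sequence usable as a key in an ordered map. The sketch below shows the idea on a hypothetical PeptideKey struct; it is standalone code, and the order in which the three components are compared here is an assumption, not the order OpenMS uses.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Simplified key type; not the OpenMS AASequence class.
struct PeptideKey
{
  std::string n_term_mod;
  std::vector<std::string> residues;  // residue strings including PTMs
  std::string c_term_mod;
};

// Lexicographic comparison over all three components, so two keys are
// equivalent only if residues, PTMs and both terminal modifications agree.
bool operator<(const PeptideKey& a, const PeptideKey& b)
{
  return std::tie(a.residues, a.n_term_mod, a.c_term_mod) <
         std::tie(b.residues, b.n_term_mod, b.c_term_mod);
}

// Example use: count occurrences per distinct (modified) peptide.
using PeptideCounts = std::map<PeptideKey, int>;
```

With such an operator, a modified and an unmodified form of the same peptide occupy separate entries in the map.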

◆ operator=()

AASequence& operator= ( const AASequence & rhs)

assignment operator

◆ operator==()

bool operator== ( const AASequence & rhs) const

equality operator. Two sequences are equal iff all amino acids including PTMs are equal

◆ operator[]()

const Residue& operator[] ( Size  index) const

returns a reference to the residue at the given position

◆ parseModRoundBrackets_()

static String::ConstIterator parseModRoundBrackets_ ( const String::ConstIterator  str_it,
const String & str,
AASequence & aas,
const ResidueModification::TermSpecificity & specificity 
)
static protected

Parses modifications in round brackets (an identifier)

If dot notation is used it resolves cterm ambiguity based on the presence of the dot.

Parameters
str_it: Current position in the string to be parsed
str: Full input string
aas: Current AASequence object (will be modified with the correct residue added)
specificity: Whether the current modification should be interpreted as N- or C-terminal
Returns
Position at which to continue parsing

◆ parseModSquareBrackets_()

static String::ConstIterator parseModSquareBrackets_ ( const String::ConstIterator  str_it,
const String & str,
AASequence & aas,
const ResidueModification::TermSpecificity & specificity 
)
static protected

Parses modifications in square brackets (a mass)

If dot notation is used it resolves cterm ambiguity based on the presence of the dot.

Parameters
str_it: Current position in the string to be parsed
str: Full input string
aas: Current AASequence object (will be modified with the correct residue added)
specificity: Whether the current modification should be interpreted as N- or C-terminal
Returns
Position at which to continue parsing

◆ parseString_()

static void parseString_ ( const String & peptide,
AASequence & aas,
bool  permissive = true 
)
static protected

◆ setCTerminalModification()

void setCTerminalModification ( const String & modification)

sets the C-terminal modification

◆ setModification()

void setModification ( Size  index,
const String & modification 
)

sets the modification of the residue at position index; if an empty string is passed, the residue is replaced with its unmodified version

◆ setNTerminalModification()

void setNTerminalModification ( const String & modification)

sets the N-terminal modification

◆ size()

Size size ( ) const

◆ toBracketString()

String toBracketString ( bool  integer_mass = true,
const std::vector< String > &  fixed_modifications = std::vector< String >() 
) const

create a TPP compatible string of the modified sequence using bracket notation.

Instead of using the modification names, it writes the modification masses in brackets.

e.g.: n[35]RQLNK[162]LQHK[162]GEA
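The bracket output format can be sketched as follows. This is standalone code, not the OpenMS implementation: bracketStringSketch() is a hypothetical function that takes each residue as a letter plus the total mass to print (0 meaning unmodified), it ignores terminal modifications and the fixed_modifications parameter, and rounding to the nearest integer when integer_mass is set is an assumption.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes each residue letter, followed by its mass in square brackets
// when the residue carries a modification (mass > 0).
std::string bracketStringSketch(const std::vector<std::pair<char, double>>& residues,
                                bool integer_mass = true)
{
  std::ostringstream os;
  for (const auto& r : residues)
  {
    os << r.first;
    if (r.second > 0.0)
    {
      os << '[';
      if (integer_mass) os << std::lround(r.second);  // assumed rounding mode
      else os << r.second;                            // default stream formatting
      os << ']';
    }
  }
  return os.str();
}
```

For the residues R, K with mass 162.1 and L, the sketch produces "RK[162]L" when integer_mass is true.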

◆ toString()

String toString ( ) const

returns the peptide as string with modifications embedded in brackets

Uses round brackets when possible (id is known) or square brackets for unknown modifications where only the mass is known.

e.g.: .n[43]PEPC(Carbamidomethyl)PEPM[147]PEPR.[16]

Referenced by MetaProSIPReporting::createPeptideCentricCSVReport(), TOPPOpenPepXLLF::main_(), TOPPOpenPepXL::main_(), TOPPMetaProSIP::main_(), and RNPxlSearch::postScoreHits_().

◆ toUniModString()

String toUniModString ( ) const

returns the peptide as string with UniMod-style modifications embedded in brackets

Uses round brackets when possible (id is known) or square brackets for unknown modifications where only the mass is known.

e.g.: .n[43]PEPC(UniMod:4)PEPM[147]PEPR.[16]

◆ toUnmodifiedString()

String toUnmodifiedString ( ) const

Friends And Related Function Documentation

◆ operator<<

std::ostream& operator<< ( std::ostream &  os,
const AASequence & peptide 
)
friend

writes a peptide to an output stream

◆ operator>>

std::istream& operator>> ( std::istream &  is,
const AASequence & peptide 
)
friend

reads a peptide from an input stream

Member Data Documentation

◆ c_term_mod_

const ResidueModification* c_term_mod_
protected

◆ n_term_mod_

const ResidueModification* n_term_mod_
protected

◆ peptide_

std::vector<const Residue*> peptide_
protected