OpenMS
2.4.0
Class for the enzymatic digestion of proteins. More...
#include <OpenMS/CHEMISTRY/ProteaseDigestion.h>
Public Member Functions | |
void | setEnzyme (const String &name) |
Sets the enzyme for the digestion (by name) More... | |
Size | digest (const AASequence &protein, std::vector< AASequence > &output, Size min_length=1, Size max_length=0) const |
Performs the enzymatic digestion of a protein. More... | |
Size | peptideCount (const AASequence &protein) |
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings. More... | |
bool | isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const |
Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage. More... | |
bool | isValidProduct (const AASequence &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const |
forwards to isValidProduct using protein.toUnmodifiedString() More... | |
void | setEnzyme (const DigestionEnzyme *enzyme) |
Sets the enzyme for the digestion. More... | |
Public Member Functions inherited from EnzymaticDigestion | |
EnzymaticDigestion () | |
Default constructor. More... | |
virtual | ~EnzymaticDigestion () |
Destructor. More... | |
Size | getMissedCleavages () const |
Returns the number of missed cleavages for the digestion. More... | |
void | setMissedCleavages (Size missed_cleavages) |
Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when log model is used. More... | |
String | getEnzymeName () const |
Returns the enzyme for the digestion. More... | |
void | setEnzyme (const DigestionEnzyme *enzyme) |
Sets the enzyme for the digestion. More... | |
Specificity | getSpecificity () const |
Returns the specificity for the digestion. More... | |
void | setSpecificity (Specificity spec) |
Sets the specificity for the digestion (default is SPEC_FULL). More... | |
Size | digestUnmodified (const StringView &sequence, std::vector< StringView > &output, Size min_length=1, Size max_length=0) const |
Performs the enzymatic digestion of an unmodified sequence. More... | |
bool | isValidProduct (const String &sequence, int pos, int length, bool ignore_missed_cleavages=true) const |
Is the peptide fragment starting at position pos with length length within the sequence sequence generated by the current enzyme? More... | |
bool | filterByMissedCleavages (const String &sequence, std::function< bool(const Int)> filter) const |
Filter based on the number of missed cleavages. More... | |
Additional Inherited Members | |
Public Types inherited from EnzymaticDigestion | |
enum | Specificity { SPEC_FULL, SPEC_SEMI, SPEC_NONE, SIZE_OF_SPECIFICITY } |
when querying for valid digestion products, this determines if the specificity of the two peptide ends is considered important More... | |
Static Public Member Functions inherited from EnzymaticDigestion | |
static Specificity | getSpecificityByName (const String &name) |
Static Public Attributes inherited from EnzymaticDigestion | |
static const std::string | NamesOfSpecificity [SIZE_OF_SPECIFICITY] |
Names of the Specificity. More... | |
static const std::string | UnspecificCleavage |
Name for unspecific cleavage. More... | |
Protected Member Functions inherited from EnzymaticDigestion | |
bool | isValidProduct_ (const String &sequence, int pos, int length, bool ignore_missed_cleavages, bool allow_nterm_protein_cleavage, bool allow_random_asp_pro_cleavage) const |
Shared implementation that also supports the functionality of ProteaseDigestion (which is deeply woven into this function). To avoid code duplication, the logic is kept here and called by wrappers. Do not duplicate the code just for the sake of semantics (unless a clean separation can be found). Note: the overhead of allow_nterm_protein_cleavage and allow_random_asp_pro_cleavage is marginal; the main runtime is spent in tokenize_(). More... | |
std::vector< int > | tokenize_ (const String &sequence, int start=0, int end=-1) const |
Digests the sequence using the enzyme's regular expression. More... | |
Size | digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< StringView > &output, Size min_length=0, Size max_length=-1) const |
Helper function for digestUnmodified() More... | |
Size | countMissedCleavages_ (const std::vector< int > &cleavage_positions, Size seq_start, Size seq_end) const |
Counts the number of missed cleavages in a sequence fragment. More... | |
Protected Attributes inherited from EnzymaticDigestion | |
Size | missed_cleavages_ |
Number of missed cleavages. More... | |
const DigestionEnzyme * | enzyme_ |
Used enzyme. More... | |
boost::regex | re_ |
Regex for tokenizing (huge speedup by making this a member instead of stack object in tokenize_()) More... | |
Specificity | specificity_ |
specificity of enzyme More... | |
Class for the enzymatic digestion of proteins.
Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin. Missed cleavages can also be modeled, i.e. adjacent peptides stay joined because the enzyme failed to cut (enzyme malfunction or restricted access to the site). If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned. No random selection of just n specific missed cleavage sites is performed.
An alternative model is also available in EnzymaticDigestionLogModel.
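The missed-cleavage enumeration described above can be illustrated with a small standalone sketch. It does not use OpenMS and is not the library's implementation: the names `fragmentStarts` and `digestSketch` are hypothetical, and the trypsin rule is simplified to "cut after K or R unless the next residue is P".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: start indices of the fragments obtained under a
// simplified trypsin rule (cut after K or R unless the next residue is P).
std::vector<std::size_t> fragmentStarts(const std::string& protein)
{
  std::vector<std::size_t> starts;
  if (protein.empty()) return starts;
  starts.push_back(0);
  for (std::size_t i = 0; i + 1 < protein.size(); ++i)
  {
    const bool cleavable = (protein[i] == 'K' || protein[i] == 'R');
    if (cleavable && protein[i + 1] != 'P') starts.push_back(i + 1);
  }
  return starts;
}

// Hypothetical sketch: enumerate every peptide with 0..max_missed missed
// cleavages, i.e. every run of (mc + 1) adjacent fragments.
std::vector<std::string> digestSketch(const std::string& protein, std::size_t max_missed)
{
  const std::vector<std::size_t> starts = fragmentStarts(protein);
  std::vector<std::string> peptides;
  for (std::size_t mc = 0; mc <= max_missed && mc < starts.size(); ++mc)
  {
    for (std::size_t i = 0; i + mc < starts.size(); ++i)
    {
      const std::size_t begin = starts[i];
      const std::size_t end =
          (i + mc + 1 < starts.size()) ? starts[i + mc + 1] : protein.size();
      peptides.push_back(protein.substr(begin, end - begin));
    }
  }
  return peptides;
}
```

For the sequence `ACKDERPFKGH`, the site after R is skipped because P follows it, so the fully cleaved fragments are `ACK`, `DERPFK` and `GH`; allowing one missed cleavage adds `ACKDERPFK` and `DERPFKGH`.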
Size digest (const AASequence &protein, std::vector< AASequence > &output, Size min_length=1, Size max_length=0) const
Performs the enzymatic digestion of a protein.
protein | Sequence to digest |
output | Digestion products (peptides) |
min_length | Minimal length of reported products |
max_length | Maximal length of reported products (0 = no restriction) |
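The length parameters can be read as a filter on the digestion products, where a `max_length` of 0 disables the upper bound. The following standalone sketch illustrates that reading; `filterByLength` is a hypothetical name, and treating both bounds as inclusive is an assumption rather than something the table above states.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical filter mirroring the documented parameter semantics:
// max_length == 0 means "no upper limit". Inclusive bounds are assumed.
std::vector<std::string> filterByLength(const std::vector<std::string>& peptides,
                                        std::size_t min_length = 1,
                                        std::size_t max_length = 0)
{
  std::vector<std::string> kept;
  for (const std::string& p : peptides)
  {
    const bool long_enough = p.size() >= min_length;
    const bool short_enough = (max_length == 0) || (p.size() <= max_length);
    if (long_enough && short_enough) kept.push_back(p);
  }
  return kept;
}
```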
bool isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const
Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage.
Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here.
protein | Protein sequence |
pep_pos | Starting index of potential peptide |
pep_length | Length of potential peptide |
ignore_missed_cleavages | Do not compare the number of missed cleavages (MCs) of the potential peptide to the maximum allowed number of MCs |
allow_nterm_protein_cleavage | Regard the peptide as N-terminal to the protein if it starts at pos=1 or 2 and the protein starts with 'M' |
allow_random_asp_pro_cleavage | Allow cleavage at D|P sites to count as n/c-terminal. |
Referenced by PeptideIndexing::FoundProteinFunctor::addHit(), and IDFilter::DigestionFilter::operator()().
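As a rough illustration of what such a validity check involves under full specificity, the standalone sketch below tests whether both ends of a candidate peptide coincide with cleavage sites or protein termini, and optionally counts internal sites as missed cleavages. It is not the OpenMS implementation: `isCleavageSite` and `isValidProductSketch` are hypothetical names, the trypsin rule is simplified, the `max_missed` parameter stands in for the class's missed-cleavage setting, and the N-terminal and D|P options are left out.

```cpp
#include <cstddef>
#include <string>

// Simplified trypsin rule (an assumption of this sketch): a cleavage site lies
// directly before index i if residue i-1 is K or R and residue i is not P.
// The protein termini (i == 0 and i == size) always count as sites.
bool isCleavageSite(const std::string& protein, std::size_t i)
{
  if (i == 0 || i >= protein.size()) return true;
  return (protein[i - 1] == 'K' || protein[i - 1] == 'R') && protein[i] != 'P';
}

// Hypothetical check for full specificity: both peptide ends must be sites.
bool isValidProductSketch(const std::string& protein,
                          std::size_t pep_pos,
                          std::size_t pep_length,
                          bool ignore_missed_cleavages = true,
                          std::size_t max_missed = 0)
{
  // Reject empty peptides and peptides that do not fit into the protein.
  if (pep_length == 0 || pep_pos >= protein.size()) return false;
  if (pep_length > protein.size() - pep_pos) return false;

  const std::size_t pep_end = pep_pos + pep_length;
  if (!isCleavageSite(protein, pep_pos) || !isCleavageSite(protein, pep_end)) return false;
  if (ignore_missed_cleavages) return true;

  // Every cleavage site strictly inside the peptide is one missed cleavage.
  std::size_t missed = 0;
  for (std::size_t i = pep_pos + 1; i < pep_end; ++i)
  {
    if (isCleavageSite(protein, i)) ++missed;
  }
  return missed <= max_missed;
}
```

With the sequence `ACKDERPFKGH`, the peptide at position 3 with length 6 (`DERPFK`) passes, while the one at position 6 with length 3 (`PFK`) fails because the R|P boundary is not a cleavage site under this rule.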
bool isValidProduct (const AASequence &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const
Forwards to isValidProduct() using protein.toUnmodifiedString().
Size peptideCount (const AASequence &protein)
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings.
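The count can be reasoned about without generating any peptides: if full cleavage yields f fragments, there are f - k peptides that span exactly k missed cleavages, so the total is the sum of f - k for k from 0 up to the allowed maximum (capped at f - 1). The sketch below shows that arithmetic; `peptideCountSketch` is a hypothetical name, and the sketch assumes full enzyme specificity and no length filtering.

```cpp
#include <cstddef>

// fragments: number of fragments under full cleavage (cleavage sites + 1).
// Returns the number of peptides with 0..max_missed missed cleavages.
std::size_t peptideCountSketch(std::size_t fragments, std::size_t max_missed)
{
  std::size_t total = 0;
  for (std::size_t mc = 0; mc <= max_missed && mc < fragments; ++mc)
  {
    total += fragments - mc; // peptides spanning exactly mc missed cleavages
  }
  return total;
}
```

For three fragments this gives 3 peptides with no missed cleavages, 5 with up to one, and 6 with up to two or more.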
void setEnzyme |
Sets the enzyme for the digestion.
void setEnzyme | ( | const String & | name | ) |
Sets the enzyme for the digestion (by name)
Referenced by TOPPOpenPepXLLF::main_(), SimpleSearchEngine::main_(), TOPPOpenPepXL::main_(), RNPxlSearch::main_(), and PeptideIndexing::run().