OpenMS
2.4.0
Extended Aho-Corasick algorithm capable of matching ambiguous amino acids in the searched text (i.e. the protein sequence).
#include <OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h>
Public Types
typedef ::seqan::StringSet<::seqan::AAString > | PeptideDB
typedef ::seqan::Pattern< PeptideDB, ::seqan::FuzzyAC > | FuzzyACPattern
Public Member Functions
AhoCorasickAmbiguous ()
Default constructor; call setProtein() before using findNext().
AhoCorasickAmbiguous (const String &protein_sequence)
Prepare to start searching for hits in a new protein sequence.
void | setProtein (const String &protein_sequence)
Reset to a new protein sequence. All previous data is forgotten.
bool | findNext (const FuzzyACPattern &pattern)
Enumerate hits.
Size | getHitDBIndex ()
Get the index of the hit into the peptide database of the pattern.
Int | getHitProteinPosition ()
Offset into the protein sequence where the hit was found.
Static Public Member Functions
static void | initPattern (const PeptideDB &pep_db, const int aaa_max, const int mm_max, FuzzyACPattern &pattern)
Construct a trie from a set of peptide sequences (which are to be found in a protein).
Private Types
typedef FuzzyACPattern::KeyWordLengthType | KeyWordLengthType
Private Attributes
::seqan::Finder< seqan::AAString > | finder_
Locates the next peptide hit in the protein.
::seqan::AAString | protein_
The protein sequence; it must be stored here because the finder only keeps a pointer to the protein it was constructed with.
::seqan::PatternAuxData< PeptideDB > | dh_
Auxiliary data that holds the state after searching.
Extended Aho-Corasick algorithm capable of matching ambiguous amino acids in the searched text (i.e. the protein sequence).
...
Features:
+ blazingly fast
+ low memory usage
+ number of allowed ambiguous amino acids can be capped by the user (default 3)
This implementation is based on the original AC in SeqAn.
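To make the matching semantics concrete, here is a minimal, self-contained sketch of what "matching ambiguous amino acids" means. It is a naive scan, not the trie-based algorithm this class uses, and it does not call OpenMS or SeqAn. The names `residueMatches`, `naiveFind` and `Hit` are hypothetical, and the expansion of the ambiguity codes (B as D or N, Z as E or Q, X as any residue) is an assumption based on the usual amino acid conventions rather than on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: can protein residue `prot` stand for peptide residue `pep`?
// `ambiguous` is set to true only when the match relied on an ambiguity code.
// Assumed codes: B = D or N, Z = E or Q, X = any residue.
bool residueMatches(char prot, char pep, bool& ambiguous)
{
  ambiguous = false;
  if (prot == pep) return true;
  ambiguous = true;
  switch (prot)
  {
    case 'B': return pep == 'D' || pep == 'N';
    case 'Z': return pep == 'E' || pep == 'Q';
    case 'X': return true;
    default:  ambiguous = false; return false;
  }
}

// One hit: which peptide matched and where it starts in the protein.
struct Hit
{
  std::size_t db_index;
  std::size_t protein_pos;
};

// Naive scan (illustration only): try every peptide at every protein offset and
// accept it if at most `aaa_max` ambiguous residues and `mm_max` mismatches are used.
std::vector<Hit> naiveFind(const std::vector<std::string>& peptides,
                           const std::string& protein, int aaa_max, int mm_max)
{
  std::vector<Hit> hits;
  for (std::size_t pos = 0; pos < protein.size(); ++pos)
  {
    for (std::size_t i = 0; i < peptides.size(); ++i)
    {
      const std::string& pep = peptides[i];
      if (pep.empty() || pos + pep.size() > protein.size()) continue;
      int aaa = 0, mm = 0;
      bool ok = true;
      for (std::size_t k = 0; k < pep.size() && ok; ++k)
      {
        bool amb = false;
        if (residueMatches(protein[pos + k], pep[k], amb))
        {
          if (amb && ++aaa > aaa_max) ok = false; // too many ambiguous residues
        }
        else if (++mm > mm_max)
        {
          ok = false; // too many mismatches
        }
      }
      if (ok) hits.push_back({i, pos});
    }
  }
  return hits;
}
```

With the peptides `DEK` and `NQK` and the protein `AABZKA`, both peptides match at offset 2 when two ambiguous residues are allowed, and neither matches when the cap is lowered to one. How the real implementation counts ambiguous residues and mismatches together may differ from this sketch.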
typedef ::seqan::Pattern<PeptideDB, ::seqan::FuzzyAC> FuzzyACPattern
typedef FuzzyACPattern::KeyWordLengthType KeyWordLengthType | private
typedef ::seqan::StringSet<::seqan::AAString> PeptideDB
AhoCorasickAmbiguous () | inline
Default constructor; call setProtein() before using findNext().
AhoCorasickAmbiguous (const String &protein_sequence) | inline
Prepare to start searching for hits in a new protein sequence.
This only sets the sequence; no computation is performed. Use findNext() to enumerate the hits.
Parameters:
protein_sequence | Sequence (ambiguous characters allowed)
References AhoCorasickAmbiguous::setProtein().
bool findNext (const FuzzyACPattern &pattern) | inline
Enumerate hits.
Parameters:
pattern | The pattern (i.e. trie) created with initPattern().
References AhoCorasickAmbiguous::dh_, seqan::find(), and AhoCorasickAmbiguous::finder_.
Referenced by PeptideIndexing::addHits_().
Size getHitDBIndex () | inline
Get the index of the hit into the peptide database of the pattern.
Only valid if the preceding call to findNext() returned true.
References AhoCorasickAmbiguous::dh_, and seqan::position().
Referenced by PeptideIndexing::addHits_().
Int getHitProteinPosition () | inline
Offset into the protein sequence where the hit was found.
Only valid if the preceding call to findNext() returned true.
References AhoCorasickAmbiguous::finder_, and seqan::position().
Referenced by PeptideIndexing::addHits_().
static void initPattern (const PeptideDB &pep_db, const int aaa_max, const int mm_max, FuzzyACPattern &pattern) | inline static
Construct a trie from a set of peptide sequences (which are to be found in a protein).
Peptides must not contain ambiguous characters or unknown characters (such as J or U); an exception is thrown otherwise. Ambiguous characters are only allowed in protein sequences.
Usage: Build the pattern only once and reuse it for every call to findNext().
Parameters:
pep_db | Set of peptides
aaa_max | Maximum number of ambiguous characters allowed in the matching protein sequence
mm_max | Maximum number of mismatches allowed in the matching protein sequence
pattern | The pattern to be created
Exceptions:
Exception::InvalidValue | if a peptide contains an unknown (U,J,...) or ambiguous character
Referenced by PeptideIndexing::run().
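The call sequence described above (build the pattern once, then for each protein call setProtein() and loop on findNext(), reading the hit getters only after a true return) can be sketched as follows. This is a self-contained illustration under stated assumptions: `ToyMatcher` is a hypothetical stand-in that mirrors the member names on this page but does exact matching only, its `Pattern` alias is just a list of strings rather than a FuzzyACPattern, and `collectHits` and `Hit` are made up for the example. It does not use OpenMS or SeqAn.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT the OpenMS class: exact matching only, no trie.
class ToyMatcher
{
public:
  using Pattern = std::vector<std::string>; // stands in for FuzzyACPattern

  // Reset to a new protein; all previous search state is forgotten.
  void setProtein(const std::string& protein)
  {
    protein_ = protein;
    pos_ = 0;
    idx_ = 0;
    has_hit_ = false;
  }

  // Advance to the next hit; returns false once the protein is exhausted.
  bool findNext(const Pattern& pattern)
  {
    if (has_hit_) ++idx_; // resume just after the previously reported hit
    for (; pos_ < protein_.size(); ++pos_, idx_ = 0)
    {
      for (; idx_ < pattern.size(); ++idx_)
      {
        const std::string& pep = pattern[idx_];
        if (!pep.empty() && protein_.compare(pos_, pep.size(), pep) == 0)
        {
          has_hit_ = true;
          return true;
        }
      }
    }
    has_hit_ = false;
    return false;
  }

  // Both getters are meaningful only after findNext() returned true.
  std::size_t getHitDBIndex() const { return idx_; }
  int getHitProteinPosition() const { return static_cast<int>(pos_); }

private:
  std::string protein_;
  std::size_t pos_ = 0;
  std::size_t idx_ = 0;
  bool has_hit_ = false;
};

struct Hit
{
  std::size_t protein; // index of the protein in the input list
  std::size_t peptide; // index of the peptide in the pattern
  int offset;          // start of the hit within the protein
};

// The pattern is built once by the caller and reused for every protein.
std::vector<Hit> collectHits(const ToyMatcher::Pattern& pattern,
                             const std::vector<std::string>& proteins)
{
  std::vector<Hit> hits;
  ToyMatcher matcher;
  for (std::size_t i = 0; i < proteins.size(); ++i)
  {
    matcher.setProtein(proteins[i]);
    while (matcher.findNext(pattern))
    {
      hits.push_back({i, matcher.getHitDBIndex(), matcher.getHitProteinPosition()});
    }
  }
  return hits;
}
```

Keeping the pattern outside the per-protein loop reflects the usage note: constructing the trie is the expensive step, while resetting the matcher for a new protein is cheap.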
void setProtein (const String &protein_sequence) | inline
Reset to a new protein sequence. All previous data is forgotten.
References AhoCorasickAmbiguous::dh_, AhoCorasickAmbiguous::finder_, AhoCorasickAmbiguous::protein_, and PatternAuxData< TNeedle >::reset().
Referenced by PeptideIndexing::addHits_(), and AhoCorasickAmbiguous::AhoCorasickAmbiguous().
::seqan::PatternAuxData<PeptideDB> dh_ | private
Auxiliary data that holds the state after searching.
Referenced by AhoCorasickAmbiguous::findNext(), AhoCorasickAmbiguous::getHitDBIndex(), and AhoCorasickAmbiguous::setProtein().
::seqan::Finder<seqan::AAString> finder_ | private
Locates the next peptide hit in the protein.
Referenced by AhoCorasickAmbiguous::findNext(), AhoCorasickAmbiguous::getHitProteinPosition(), and AhoCorasickAmbiguous::setProtein().
::seqan::AAString protein_ | private
The protein sequence; it must be stored here because the finder only keeps a pointer to the protein it was constructed with.
Referenced by AhoCorasickAmbiguous::setProtein().