OpenMS 2.4.0
Collection of functions for filtering peptide and protein identifications.
#include <OpenMS/FILTERING/ID/IDFilter.h>
Classes
struct | DigestionFilter
Is the peptide evidence a digestion product of some protein?
struct | GetMatchingItems
Builds a map index of data that have a String index to find matches and return the objects.
struct | HasDecoyAnnotation
Is this a decoy hit?
struct | HasGoodScore
Is the score of this hit at least as good as the given value?
struct | HasMatchingAccession
Given a list of protein accessions, do any occur in the annotation(s) of this hit?
struct | HasMaxMetaValue
Does a meta value of this hit have at most the given value?
struct | HasMaxRank
Is the rank of this hit below or at the given cut-off?
struct | HasMetaValue
Is a meta value with given key and value set on this hit?
struct | HasNoHits
Is the list of hits of this peptide/protein ID empty?
class | PeptideDigestionFilter
Filters peptide hits by their digestion product.
Public Member Functions
IDFilter ()
Constructor.
virtual | ~IDFilter ()
Destructor.
Static Public Member Functions
Higher-order filter functions
Functions for filtering a container based on a predicate.
template<class Container , class Predicate >
static void | removeMatchingItems (Container &items, const Predicate &pred)
Remove items that satisfy a condition from a container (e.g. vector).
template<class Container , class Predicate >
static void | keepMatchingItems (Container &items, const Predicate &pred)
Keep items that satisfy a condition in a container (e.g. vector), removing all others.
Helper functions
template<class IdentificationType >
static Size | countHits (const std::vector< IdentificationType > &ids)
Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.
template<class IdentificationType >
static bool | getBestHit (const std::vector< IdentificationType > &identifications, bool assume_sorted, typename IdentificationType::HitType &best_hit)
Finds the best-scoring hit in a vector of peptide or protein identifications.
static void | extractPeptideSequences (const std::vector< PeptideIdentification > &peptides, std::set< String > &sequences, bool ignore_mods=false)
Extracts all unique peptide sequences from a list of peptide IDs.
template<class EvidenceFilter >
static void | FilterPeptideEvidences (EvidenceFilter &filter, std::vector< PeptideIdentification > &peptides)
Removes peptide evidences based on a filter.
Clean-up functions
template<class IdentificationType >
static void | updateHitRanks (std::vector< IdentificationType > &ids)
Updates the hit ranks on all peptide or protein IDs.
static void | removeUnreferencedProteins (std::vector< ProteinIdentification > &proteins, const std::vector< PeptideIdentification > &peptides)
Removes protein hits from proteins that are not referenced by a peptide in peptides.
static void | updateProteinReferences (std::vector< PeptideIdentification > &peptides, const std::vector< ProteinIdentification > &proteins, bool remove_peptides_without_reference=false)
Removes references to missing proteins.
static bool | updateProteinGroups (std::vector< ProteinIdentification::ProteinGroup > &groups, const std::vector< ProteinHit > &hits)
Update protein groups after protein hits were filtered.
Filter functions for peptide or protein IDs
template<class IdentificationType >
static void | removeEmptyIdentifications (std::vector< IdentificationType > &ids)
Removes peptide or protein identifications that have no hits in them.
template<class IdentificationType >
static void | filterHitsByScore (std::vector< IdentificationType > &ids, double threshold_score)
Filters peptide or protein identifications according to the score of the hits.
template<class IdentificationType >
static void | filterHitsBySignificance (std::vector< IdentificationType > &ids, double threshold_fraction=1.0)
Filters peptide or protein identifications according to the significance threshold of the hits.
template<class IdentificationType >
static void | keepNBestHits (std::vector< IdentificationType > &ids, Size n)
Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.
template<class IdentificationType >
static void | filterHitsByRank (std::vector< IdentificationType > &ids, Size min_rank, Size max_rank)
Filters peptide or protein identifications according to the ranking of the hits.
template<class IdentificationType >
static void | removeDecoyHits (std::vector< IdentificationType > &ids)
Removes hits annotated as decoys from peptide or protein identifications.
template<class IdentificationType >
static void | removeHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions)
Filters peptide or protein identifications according to the given proteins (negative).
template<class IdentificationType >
static void | keepHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions)
Filters peptide or protein identifications according to the given proteins (positive).
Filter functions for peptide IDs only
static void | keepBestPeptideHits (std::vector< PeptideIdentification > &peptides, bool strict=false)
Filters peptide identifications keeping only the single best-scoring hit per ID.
static void | filterPeptidesByLength (std::vector< PeptideIdentification > &peptides, Size min_length, Size max_length=UINT_MAX)
Filters peptide identifications according to peptide sequence length.
static void | filterPeptidesByCharge (std::vector< PeptideIdentification > &peptides, Int min_charge, Int max_charge)
Filters peptide identifications according to charge state.
static void | filterPeptidesByRT (std::vector< PeptideIdentification > &peptides, double min_rt, double max_rt)
Filters peptide identifications by precursor RT, keeping only IDs in the given range.
static void | filterPeptidesByMZ (std::vector< PeptideIdentification > &peptides, double min_mz, double max_mz)
Filters peptide identifications by precursor m/z, keeping only IDs in the given range.
static void | filterPeptidesByMZError (std::vector< PeptideIdentification > &peptides, double mass_error, bool unit_ppm)
Filters peptide identifications according to mass deviation.
template<class Filter >
static void | filterPeptideEvidences (Filter &filter, std::vector< PeptideIdentification > &peptides)
Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.
static void | filterPeptidesByRTPredictPValue (std::vector< PeptideIdentification > &peptides, const String &metavalue_key, double threshold=0.05)
Filters peptide identifications according to p-values from RTPredict.
static void | removePeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
Removes all peptide hits that have at least one of the given modifications.
static void | keepPeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
Keeps only peptide hits that have at least one of the given modifications.
static void | removePeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &bad_peptides, bool ignore_mods=false)
Removes all peptide hits with a sequence that matches one in bad_peptides.
static void | keepPeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &good_peptides, bool ignore_mods=false)
Removes all peptide hits with a sequence that does not match one in good_peptides.
static void | keepUniquePeptidesPerProtein (std::vector< PeptideIdentification > &peptides)
Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer).
static void | removeDuplicatePeptideHits (std::vector< PeptideIdentification > &peptides, bool seq_only=false)
Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).
Filter functions for MS/MS experiments
static void | filterHitsByScore (PeakMap &experiment, double peptide_threshold_score, double protein_threshold_score)
Filters an MS/MS experiment according to score thresholds.
static void | filterHitsBySignificance (PeakMap &experiment, double peptide_threshold_fraction, double protein_threshold_fraction)
Filters an MS/MS experiment according to fractions of the significance thresholds.
static void | keepNBestHits (PeakMap &experiment, Size n)
Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.
static void | keepHitsMatchingProteins (PeakMap &experiment, const std::vector< FASTAFile::FASTAEntry > &proteins)
Filters an MS/MS experiment according to the given proteins.
Detailed Description
This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.
The filter functions modify their inputs, rather than creating filtered copies.
Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.
The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, ...). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.
The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.
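The chaining pattern described above can be illustrated without OpenMS. The following is only a minimal sketch: `Hit`, `Id`, `filterByMinScore`, `keepFirstN` and `removeEmptyIds` are hypothetical stand-ins, not part of the IDFilter API. It shows two hit-level filters that modify their input in place, followed by a single clean-up step at the end of the chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for hits and identifications (not OpenMS types).
struct Hit { double score; };
struct Id { std::vector<Hit> hits; };

// Hit-level filter: removes low-scoring hits in place; IDs may end up empty.
void filterByMinScore(std::vector<Id>& ids, double min_score)
{
  for (Id& id : ids)
  {
    id.hits.erase(std::remove_if(id.hits.begin(), id.hits.end(),
                                 [min_score](const Hit& h) { return h.score < min_score; }),
                  id.hits.end());
  }
}

// Hit-level filter: keeps at most n hits per ID (hits assumed sorted best-first).
void keepFirstN(std::vector<Id>& ids, std::size_t n)
{
  for (Id& id : ids)
  {
    if (id.hits.size() > n) id.hits.resize(n);
  }
}

// Clean-up step, run once after all filters: removes IDs without hits.
void removeEmptyIds(std::vector<Id>& ids)
{
  ids.erase(std::remove_if(ids.begin(), ids.end(),
                           [](const Id& id) { return id.hits.empty(); }),
            ids.end());
}

// Usage: chain two filters, then clean up once.
std::vector<Id> chainDemo()
{
  std::vector<Id> ids = { Id{ { Hit{0.9}, Hit{0.8}, Hit{0.2} } }, Id{ { Hit{0.1} } } };
  filterByMinScore(ids, 0.5); // second ID is now empty but still present
  keepFirstN(ids, 1);
  removeEmptyIds(ids);
  assert(ids.size() == 1 && ids[0].hits.size() == 1);
  return ids;
}
```

Deferring the clean-up means the vector of IDs is compacted once, no matter how many filters run before it.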
Constructor & Destructor Documentation
IDFilter ()
Constructor.
virtual ~IDFilter () [virtual]
Destructor.
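Member Function Documentation
template<class IdentificationType >
static Size countHits (const std::vector< IdentificationType > &ids) [inline, static]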
Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.
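static void extractPeptideSequences (const std::vector< PeptideIdentification > &peptides, std::set< String > &sequences, bool ignore_mods=false) [static]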
Extracts all unique peptide sequences from a list of peptide IDs.
peptides | Input |
sequences | Output |
ignore_mods | Extract sequences without modifications? |
template<class IdentificationType >
static void filterHitsByRank (std::vector< IdentificationType > &ids, Size min_rank, Size max_rank) [inline, static]
Filters peptide or protein identifications according to the ranking of the hits.
The hits between min_rank and max_rank (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. max_rank is ignored if it is smaller than min_rank.
Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same).
This method is useful if a range of higher hits is needed for decoy fairness analysis.
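The rank semantics can be sketched in standalone C++. `RankedHit`, `assignRanks` and `filterByRank` are hypothetical names, not OpenMS functions. That tied scores share a rank follows the note above; that the next distinct score then receives the next integer (dense ranking) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a peptide/protein hit (not an OpenMS type).
struct RankedHit
{
  double score;
  std::size_t rank;
};

// Sort best-first and assign ranks starting at 1. Equal scores share a rank;
// giving the next distinct score the next integer is an assumption.
void assignRanks(std::vector<RankedHit>& hits, bool higher_score_better)
{
  std::stable_sort(hits.begin(), hits.end(),
                   [higher_score_better](const RankedHit& a, const RankedHit& b) {
                     return higher_score_better ? a.score > b.score : a.score < b.score;
                   });
  std::size_t rank = 0;
  for (std::size_t i = 0; i < hits.size(); ++i)
  {
    if (i == 0 || hits[i].score != hits[i - 1].score) ++rank;
    hits[i].rank = rank;
  }
}

// Keep hits with min_rank <= rank <= max_rank (ranks are recomputed first).
// max_rank is ignored if it is smaller than min_rank.
void filterByRank(std::vector<RankedHit>& hits, std::size_t min_rank,
                  std::size_t max_rank, bool higher_score_better)
{
  assignRanks(hits, higher_score_better);
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [min_rank, max_rank](const RankedHit& h) -> bool {
                              if (h.rank < min_rank) return true;
                              return max_rank >= min_rank && h.rank > max_rank;
                            }),
             hits.end());
}

// Usage: scores 30, 30, 20, 10 get ranks 1, 1, 2, 3; ranks 2..3 are kept.
void rankDemo()
{
  std::vector<RankedHit> hits = {{10.0, 0}, {30.0, 0}, {30.0, 0}, {20.0, 0}};
  filterByRank(hits, 2, 3, true);
  assert(hits.size() == 2 && hits[0].score == 20.0 && hits[1].score == 10.0);
}
```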
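template<class IdentificationType >
static void filterHitsByScore (std::vector< IdentificationType > &ids, double threshold_score) [inline, static]
Filters peptide or protein identifications according to the score of the hits.
Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.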
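What "at least as good as" means under the two score orientations can be made concrete with a small standalone sketch. The names `atLeastAsGood` and `keepGoodScores` are hypothetical, and plain `double` values stand in for hits.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// "At least as good as" flips with the score orientation:
// >= for scores where higher is better, <= where lower is better (e.g. q-values).
bool atLeastAsGood(double score, double threshold, bool higher_score_better)
{
  return higher_score_better ? score >= threshold : score <= threshold;
}

// Keep only scores that are at least as good as the threshold.
void keepGoodScores(std::vector<double>& scores, double threshold, bool higher_score_better)
{
  scores.erase(std::remove_if(scores.begin(), scores.end(),
                              [=](double s) { return !atLeastAsGood(s, threshold, higher_score_better); }),
               scores.end());
}

// Usage: the same threshold keeps different hits under the two orientations.
void scoreDemo()
{
  std::vector<double> lower_better = {0.01, 0.05, 0.2};
  std::vector<double> higher_better = lower_better;
  keepGoodScores(lower_better, 0.05, false);
  keepGoodScores(higher_better, 0.05, true);
  assert((lower_better == std::vector<double>{0.01, 0.05}));
  assert((higher_better == std::vector<double>{0.05, 0.2}));
}
```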
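static void filterHitsByScore (PeakMap &experiment, double peptide_threshold_score, double protein_threshold_score) [inline, static]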
Filters an MS/MS experiment according to score thresholds.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
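template<class IdentificationType >
static void filterHitsBySignificance (std::vector< IdentificationType > &ids, double threshold_fraction=1.0) [inline, static]
Filters peptide or protein identifications according to the significance threshold of the hits.
Only peptide/protein hits which reach a score above (or below, depending on score orientation) threshold_fraction * significance_threshold (as stored in the ID) are kept.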
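static void filterHitsBySignificance (PeakMap &experiment, double peptide_threshold_fraction, double protein_threshold_fraction) [inline, static]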
Filters an MS/MS experiment according to fractions of the significance thresholds.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
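template<class EvidenceFilter >
static void FilterPeptideEvidences (EvidenceFilter &filter, std::vector< PeptideIdentification > &peptides) [inline, static]
Removes peptide evidences based on a filter.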
filter | filter function that overloads ()(PeptideEvidence&) operator |
peptides | a collection of peptide evidences |
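template<class Filter >
static void filterPeptideEvidences (Filter &filter, std::vector< PeptideIdentification > &peptides) [static]
Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.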
filter | filter function on PeptideEvidence level |
peptides | PeptideIdentification that will be scanned and filtered |
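static void filterPeptidesByCharge (std::vector< PeptideIdentification > &peptides, Int min_charge, Int max_charge) [static]
Filters peptide identifications according to charge state.
Only peptide hits with a charge state between min_charge and max_charge (both inclusive) are kept. max_charge is ignored if it is smaller than min_charge.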
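The "upper bound is ignored if it is smaller than the lower bound" rule, which the charge, length and rank filters share, can be written as one small predicate. This is a sketch with hypothetical names (`inRange`, `keepChargesInRange`), using plain integers in place of peptide hits.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// True if min_value <= value <= max_value.
// The upper bound is ignored when max_value < min_value.
bool inRange(int value, int min_value, int max_value)
{
  if (value < min_value) return false;
  return max_value < min_value || value <= max_value;
}

// Keep only the charges inside the range (stand-in for filtering hits by charge).
void keepChargesInRange(std::vector<int>& charges, int min_charge, int max_charge)
{
  charges.erase(std::remove_if(charges.begin(), charges.end(),
                               [=](int z) { return !inRange(z, min_charge, max_charge); }),
                charges.end());
}

// Usage: a max below the min acts as "no upper limit".
void rangeDemo()
{
  std::vector<int> bounded = {1, 2, 3, 4};
  std::vector<int> open_ended = bounded;
  keepChargesInRange(bounded, 2, 3);
  keepChargesInRange(open_ended, 2, 0);
  assert((bounded == std::vector<int>{2, 3}));
  assert((open_ended == std::vector<int>{2, 3, 4}));
}
```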
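static void filterPeptidesByLength (std::vector< PeptideIdentification > &peptides, Size min_length, Size max_length=UINT_MAX) [static]
Filters peptide identifications according to peptide sequence length.
Only peptide hits with a sequence length between min_length and max_length (both inclusive) are kept. max_length is ignored if it is smaller than min_length.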
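static void filterPeptidesByMZ (std::vector< PeptideIdentification > &peptides, double min_mz, double max_mz) [static]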
Filters peptide identifications by precursor m/z, keeping only IDs in the given range.
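static void filterPeptidesByMZError (std::vector< PeptideIdentification > &peptides, double mass_error, bool unit_ppm) [static]
Filters peptide identifications according to mass deviation.
Only peptide hits with a low mass deviation (between theoretical peptide mass and precursor mass) are kept.
peptides | Input/output |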
mass_error | Threshold for the mass deviation |
unit_ppm | Is mass_error given in PPM? |
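The effect of the unit_ppm flag can be sketched as follows. The function name `withinMassError` and the exact formula (signed deviation, taken relative to the theoretical value when expressed in ppm) are assumptions made for illustration; the text above does not spell out how the deviation is computed.

```cpp
#include <cassert>
#include <cmath>

// Returns true if the deviation between observed and theoretical m/z is within
// mass_error. With unit_ppm the deviation is expressed in parts per million of
// the theoretical value; otherwise it is an absolute difference.
// (The formula is an assumption of this sketch.)
bool withinMassError(double observed_mz, double theoretical_mz, double mass_error, bool unit_ppm)
{
  double deviation = observed_mz - theoretical_mz;
  if (unit_ppm) deviation = deviation / theoretical_mz * 1e6;
  return std::fabs(deviation) <= mass_error;
}

// Usage: a 0.005 offset at m/z 500 is about 10 ppm.
void massErrorDemo()
{
  assert(withinMassError(500.005, 500.0, 20.0, true));   // ~10 ppm, limit 20 ppm
  assert(!withinMassError(500.005, 500.0, 5.0, true));   // ~10 ppm, limit 5 ppm
  assert(withinMassError(500.005, 500.0, 0.01, false));  // 0.005 absolute, limit 0.01
}
```

The same numeric threshold therefore means very different things depending on the flag: 5.0 is a tight tolerance in ppm but an extremely loose one as an absolute difference.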
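static void filterPeptidesByRT (std::vector< PeptideIdentification > &peptides, double min_rt, double max_rt) [static]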
Filters peptide identifications by precursor RT, keeping only IDs in the given range.
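static void filterPeptidesByRTPredictPValue (std::vector< PeptideIdentification > &peptides, const String &metavalue_key, double threshold=0.05) [static]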
Filters peptide identifications according to p-values from RTPredict.
Filters the peptide hits by the probability (p-value) of a correct peptide identification having a deviation between observed and predicted RT equal to or greater than allowed.
peptides | Input/output |
metavalue_key | Name of the meta value that holds the p-value: "predicted_RT_p_value" or "predicted_RT_p_value_first_dim" |
threshold | P-value threshold |
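template<class IdentificationType >
static bool getBestHit (const std::vector< IdentificationType > &identifications, bool assume_sorted, typename IdentificationType::HitType &best_hit) [inline, static]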
Finds the best-scoring hit in a vector of peptide or protein identifications.
If there are several hits with the best score, the first one is taken.
identifications | Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits |
assume_sorted | Are hits sorted by score (best score first) already? This allows for faster query, since only the first hit needs to be looked at |
Throws Exception::InvalidValue if the IDs have different score types (i.e. scores cannot be compared).
static void keepBestPeptideHits (std::vector< PeptideIdentification > &peptides, bool strict=false) [static]
Filters peptide identifications keeping only the single best-scoring hit per ID.
peptides | Input/output |
strict | If set, keep the best hit only if its score is unique - i.e. ties are not allowed. (Otherwise all hits with the best score are kept.) |
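The difference between the strict and non-strict modes can be sketched on a plain list of scores. `keepBest` is a hypothetical helper, higher scores are assumed to be better, and the strict behaviour shown (a tie for the best score leaves no hits at all) is one reading of "ties are not allowed", stated here as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Keep only the best score(s); higher scores are assumed to be better.
// Non-strict: all hits tied for the best score are kept.
// Strict: a tie for the best score removes every hit (assumed interpretation).
void keepBest(std::vector<double>& scores, bool strict)
{
  if (scores.empty()) return;
  const double best = *std::max_element(scores.begin(), scores.end());
  const auto n_best = std::count(scores.begin(), scores.end(), best);
  if (strict && n_best > 1)
  {
    scores.clear();
    return;
  }
  scores.erase(std::remove_if(scores.begin(), scores.end(),
                              [best](double s) { return s != best; }),
               scores.end());
}

// Usage: the same tied input under both modes.
void keepBestDemo()
{
  std::vector<double> lenient = {5.0, 9.0, 9.0, 1.0};
  std::vector<double> strict = lenient;
  keepBest(lenient, false);
  keepBest(strict, true);
  assert((lenient == std::vector<double>{9.0, 9.0}));
  assert(strict.empty());
}
```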
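template<class IdentificationType >
static void keepHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions) [inline, static]
Filters peptide or protein identifications according to the given proteins (positive).
Hits with no matching protein accession in accessions are removed.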
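static void keepHitsMatchingProteins (PeakMap &experiment, const std::vector< FASTAFile::FASTAEntry > &proteins) [inline, static]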
Filters an MS/MS experiment according to the given proteins.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
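template<class Container , class Predicate >
static void keepMatchingItems (Container &items, const Predicate &pred) [inline, static]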
Keep items that satisfy a condition in a container (e.g. vector), removing all others.
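template<class IdentificationType >
static void keepNBestHits (std::vector< IdentificationType > &ids, Size n) [inline, static]
Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.
The score orientation (are higher scores better?) is taken into account.
static void keepNBestHits (PeakMap &experiment, Size n) [inline, static]
Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.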
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
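static void keepPeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications) [static]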
Keeps only peptide hits that have at least one of the given modifications.
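static void keepPeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &good_peptides, bool ignore_mods=false) [static]
Removes all peptide hits with a sequence that does not match one in good_peptides.
If ignore_mods is set, unmodified sequences are generated and compared to the given ones.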
static void keepUniquePeptidesPerProtein (std::vector< PeptideIdentification > &peptides) [static]
Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer).
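template<class IdentificationType >
static void removeDecoyHits (std::vector< IdentificationType > &ids) [inline, static]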
Removes hits annotated as decoys from peptide or protein identifications.
Checks for meta values named "target_decoy" and "isDecoy", and removes protein/peptide hits if the values are "decoy" and "true", respectively.
static void removeDuplicatePeptideHits (std::vector< PeptideIdentification > &peptides, bool seq_only=false) [static]
Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).
By default, hits are considered duplicated if they compare as equal using PeptideHit::operator==. However, if seq_only is set, only the sequences (incl. modifications) are compared. In both cases, the first occurrence of each hit in a peptide ID is kept, later ones are removed.
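The two comparison modes and the keep-first rule can be illustrated with a simplified hit type. `SimpleHit` and `removeDuplicates` are hypothetical; the real PeptideHit::operator== compares more fields than the two used in this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide hit (not an OpenMS type).
struct SimpleHit
{
  std::string sequence;
  double score;
};

// Full equality: all fields must match.
bool operator==(const SimpleHit& a, const SimpleHit& b)
{
  return a.sequence == b.sequence && a.score == b.score;
}

// Keep the first occurrence of each hit and drop later duplicates.
// With seq_only, two hits count as duplicates if their sequences match.
void removeDuplicates(std::vector<SimpleHit>& hits, bool seq_only)
{
  std::vector<SimpleHit> kept;
  for (const SimpleHit& hit : hits)
  {
    auto is_same = [&](const SimpleHit& other) {
      return seq_only ? other.sequence == hit.sequence : other == hit;
    };
    if (std::none_of(kept.begin(), kept.end(), is_same)) kept.push_back(hit);
  }
  hits.swap(kept);
}

// Usage: the second "PEPTIDE" hit differs in score, so only seq_only removes it.
void duplicateDemo()
{
  std::vector<SimpleHit> full = {{"PEPTIDE", 1.0}, {"PEPTIDE", 2.0}, {"PEPTIDE", 1.0}, {"OTHER", 3.0}};
  std::vector<SimpleHit> by_sequence = full;
  removeDuplicates(full, false);
  removeDuplicates(by_sequence, true);
  assert(full.size() == 3);
  assert(by_sequence.size() == 2 && by_sequence[0].score == 1.0);
}
```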
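template<class IdentificationType >
static void removeEmptyIdentifications (std::vector< IdentificationType > &ids) [inline, static]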
Removes peptide or protein identifications that have no hits in them.
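template<class IdentificationType >
static void removeHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions) [inline, static]
Filters peptide or protein identifications according to the given proteins (negative).
Hits with a matching protein accession in accessions are removed.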
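template<class Container , class Predicate >
static void removeMatchingItems (Container &items, const Predicate &pred) [inline, static]
Remove items that satisfy a condition from a container (e.g. vector).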
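A predicate-driven pair of helpers like this is commonly built on the erase-remove idiom. The following standalone sketch shows one way to write them; `removeMatching`, `keepMatching` and `IsEven` are hypothetical names, and nothing here is claimed to mirror the actual OpenMS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Remove every item for which pred(item) is true (erase-remove idiom).
template <class Container, class Predicate>
void removeMatching(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(), pred), items.end());
}

// Keep only items for which pred(item) is true by removing the complement.
template <class Container, class Predicate>
void keepMatching(Container& items, const Predicate& pred)
{
  typedef typename Container::value_type Item;
  items.erase(std::remove_if(items.begin(), items.end(),
                             [&pred](const Item& item) { return !pred(item); }),
              items.end());
}

// Example predicate functor, similar in shape to the predicate structs of this class.
struct IsEven
{
  bool operator()(int value) const { return value % 2 == 0; }
};

// Usage: the same predicate drives both the negative and the positive filter.
void predicateDemo()
{
  std::vector<int> a = {1, 2, 3, 4};
  std::vector<int> b = a;
  removeMatching(a, IsEven());
  keepMatching(b, IsEven());
  assert((a == std::vector<int>{1, 3}));
  assert((b == std::vector<int>{2, 4}));
}
```

Writing each condition once as a functor and reusing it for both directions is what lets the positive and negative filter variants share their logic.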
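static void removePeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications) [static]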
Removes all peptide hits that have at least one of the given modifications.
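static void removePeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &bad_peptides, bool ignore_mods=false) [static]
Removes all peptide hits with a sequence that matches one in bad_peptides.
If ignore_mods is set, unmodified sequences are generated and compared to the given ones.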
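static void removeUnreferencedProteins (std::vector< ProteinIdentification > &proteins, const std::vector< PeptideIdentification > &peptides) [static]
Removes protein hits from proteins that are not referenced by a peptide in peptides.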
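template<class IdentificationType >
static void updateHitRanks (std::vector< IdentificationType > &ids) [inline, static]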
Updates the hit ranks on all peptide or protein IDs.
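static bool updateProteinGroups (std::vector< ProteinIdentification::ProteinGroup > &groups, const std::vector< ProteinHit > &hits) [static]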
Update protein groups after protein hits were filtered.
groups | Input/output protein groups |
hits | Available protein hits (all others are removed from the groups) |
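static void updateProteinReferences (std::vector< PeptideIdentification > &peptides, const std::vector< ProteinIdentification > &proteins, bool remove_peptides_without_reference=false) [static]
Removes references to missing proteins.
Only PeptideEvidence entries that reference protein hits in proteins are kept in the peptide hits.
If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.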