OpenMS  2.4.0
IDFilter Class Reference

Collection of functions for filtering peptide and protein identifications. More...

#include <OpenMS/FILTERING/ID/IDFilter.h>

Classes

struct  DigestionFilter
 Is peptide evidence digestion product of some protein. More...
 
struct  GetMatchingItems
 Builds a map index of data that have a String index to find matches and return the objects. More...
 
struct  HasDecoyAnnotation
 Is this a decoy hit? More...
 
struct  HasGoodScore
 Is the score of this hit at least as good as the given value? More...
 
struct  HasMatchingAccession
 Given a list of protein accessions, do any occur in the annotation(s) of this hit? More...
 
struct  HasMaxMetaValue
 Does a meta value of this hit have at most the given value? More...
 
struct  HasMaxRank
 Is the rank of this hit below or at the given cut-off? More...
 
struct  HasMetaValue
 Is a meta value with given key and value set on this hit? More...
 
struct  HasNoHits
 Is the list of hits of this peptide/protein ID empty? More...
 
class  PeptideDigestionFilter
 Filter Peptide Hit by its digestion product. More...
 

Public Member Functions

 IDFilter ()
 Constructor. More...
 
virtual ~IDFilter ()
 Destructor. More...
 

Static Public Member Functions

Higher-order filter functions

Functions for filtering a container based on a predicate

template<class Container , class Predicate >
static void removeMatchingItems (Container &items, const Predicate &pred)
 Remove items that satisfy a condition from a container (e.g. vector) More...
 
template<class Container , class Predicate >
static void keepMatchingItems (Container &items, const Predicate &pred)
 Keep items that satisfy a condition in a container (e.g. vector), removing all others. More...
 
Helper functions
template<class IdentificationType >
static Size countHits (const std::vector< IdentificationType > &ids)
 Returns the total number of peptide/protein hits in a vector of peptide/protein identifications. More...
 
template<class IdentificationType >
static bool getBestHit (const std::vector< IdentificationType > &identifications, bool assume_sorted, typename IdentificationType::HitType &best_hit)
 Finds the best-scoring hit in a vector of peptide or protein identifications. More...
 
static void extractPeptideSequences (const std::vector< PeptideIdentification > &peptides, std::set< String > &sequences, bool ignore_mods=false)
 Extracts all unique peptide sequences from a list of peptide IDs. More...
 
template<class EvidenceFilter >
static void FilterPeptideEvidences (EvidenceFilter &filter, std::vector< PeptideIdentification > &peptides)
 Removes peptide evidences based on a filter. More...
 
Clean-up functions
template<class IdentificationType >
static void updateHitRanks (std::vector< IdentificationType > &ids)
 Updates the hit ranks on all peptide or protein IDs. More...
 
static void removeUnreferencedProteins (std::vector< ProteinIdentification > &proteins, const std::vector< PeptideIdentification > &peptides)
 Removes protein hits from proteins that are not referenced by a peptide in peptides. More...
 
static void updateProteinReferences (std::vector< PeptideIdentification > &peptides, const std::vector< ProteinIdentification > &proteins, bool remove_peptides_without_reference=false)
 Removes references to missing proteins. More...
 
static bool updateProteinGroups (std::vector< ProteinIdentification::ProteinGroup > &groups, const std::vector< ProteinHit > &hits)
 Update protein groups after protein hits were filtered. More...
 
Filter functions for peptide or protein IDs
template<class IdentificationType >
static void removeEmptyIdentifications (std::vector< IdentificationType > &ids)
 Removes peptide or protein identifications that have no hits in them. More...
 
template<class IdentificationType >
static void filterHitsByScore (std::vector< IdentificationType > &ids, double threshold_score)
 Filters peptide or protein identifications according to the score of the hits. More...
 
template<class IdentificationType >
static void filterHitsBySignificance (std::vector< IdentificationType > &ids, double threshold_fraction=1.0)
 Filters peptide or protein identifications according to the significance threshold of the hits. More...
 
template<class IdentificationType >
static void keepNBestHits (std::vector< IdentificationType > &ids, Size n)
 Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID. More...
 
template<class IdentificationType >
static void filterHitsByRank (std::vector< IdentificationType > &ids, Size min_rank, Size max_rank)
 Filters peptide or protein identifications according to the ranking of the hits. More...
 
template<class IdentificationType >
static void removeDecoyHits (std::vector< IdentificationType > &ids)
 Removes hits annotated as decoys from peptide or protein identifications. More...
 
template<class IdentificationType >
static void removeHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions)
 Filters peptide or protein identifications according to the given proteins (negative). More...
 
template<class IdentificationType >
static void keepHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions)
 Filters peptide or protein identifications according to the given proteins (positive). More...
 
Filter functions for peptide IDs only
static void keepBestPeptideHits (std::vector< PeptideIdentification > &peptides, bool strict=false)
 Filters peptide identifications keeping only the single best-scoring hit per ID. More...
 
static void filterPeptidesByLength (std::vector< PeptideIdentification > &peptides, Size min_length, Size max_length=UINT_MAX)
 Filters peptide identifications according to peptide sequence length. More...
 
static void filterPeptidesByCharge (std::vector< PeptideIdentification > &peptides, Int min_charge, Int max_charge)
 Filters peptide identifications according to charge state. More...
 
static void filterPeptidesByRT (std::vector< PeptideIdentification > &peptides, double min_rt, double max_rt)
 Filters peptide identifications by precursor RT, keeping only IDs in the given range. More...
 
static void filterPeptidesByMZ (std::vector< PeptideIdentification > &peptides, double min_mz, double max_mz)
 Filters peptide identifications by precursor m/z, keeping only IDs in the given range. More...
 
static void filterPeptidesByMZError (std::vector< PeptideIdentification > &peptides, double mass_error, bool unit_ppm)
 Filter peptide identifications according to mass deviation. More...
 
template<class Filter >
static void filterPeptideEvidences (Filter &filter, std::vector< PeptideIdentification > &peptides)
 Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence. More...
 
static void filterPeptidesByRTPredictPValue (std::vector< PeptideIdentification > &peptides, const String &metavalue_key, double threshold=0.05)
 Filters peptide identifications according to p-values from RTPredict. More...
 
static void removePeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
 Removes all peptide hits that have at least one of the given modifications. More...
 
static void keepPeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
 Keeps only peptide hits that have at least one of the given modifications. More...
 
static void removePeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &bad_peptides, bool ignore_mods=false)
 Removes all peptide hits with a sequence that matches one in bad_peptides. More...
 
static void keepPeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &good_peptides, bool ignore_mods=false)
 Removes all peptide hits with a sequence that does not match one in good_peptides. More...
 
static void keepUniquePeptidesPerProtein (std::vector< PeptideIdentification > &peptides)
 Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer) More...
 
static void removeDuplicatePeptideHits (std::vector< PeptideIdentification > &peptides, bool seq_only=false)
 Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID). More...
 
Filter functions for MS/MS experiments
static void filterHitsByScore (PeakMap &experiment, double peptide_threshold_score, double protein_threshold_score)
 Filters an MS/MS experiment according to score thresholds. More...
 
static void filterHitsBySignificance (PeakMap &experiment, double peptide_threshold_fraction, double protein_threshold_fraction)
 Filters an MS/MS experiment according to fractions of the significance thresholds. More...
 
static void keepNBestHits (PeakMap &experiment, Size n)
 Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum. More...
 
static void keepHitsMatchingProteins (PeakMap &experiment, const std::vector< FASTAFile::FASTAEntry > &proteins)
 Filters an MS/MS experiment according to the given proteins. More...
 

Detailed Description

Collection of functions for filtering peptide and protein identifications.

This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.

The filter functions modify their inputs, rather than creating filtered copies.

Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.

The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, ...). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.

The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.
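
The separation between filtering and clean-up can be illustrated with a minimal, self-contained sketch. The types `Hit` and `Id` and the functions `filterByScore`, `filterByLength` and `removeEmpty` are hypothetical stand-ins written for this example; they are not the OpenMS classes or the IDFilter functions. Two hit-level filters are chained, and one ID-level clean-up pass runs at the end:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are not the OpenMS classes.
struct Hit { std::string sequence; double score; };
struct Id  { std::vector<Hit> hits; };

// Hit-level filter: drop hits scoring below min_score (assumes higher is better).
inline void filterByScore(std::vector<Id>& ids, double min_score)
{
  for (Id& id : ids)
  {
    std::vector<Hit>& h = id.hits;
    h.erase(std::remove_if(h.begin(), h.end(),
                           [min_score](const Hit& x) { return x.score < min_score; }),
            h.end());
  }
}

// Hit-level filter: drop hits whose sequence is shorter than min_length.
inline void filterByLength(std::vector<Id>& ids, std::size_t min_length)
{
  for (Id& id : ids)
  {
    std::vector<Hit>& h = id.hits;
    h.erase(std::remove_if(h.begin(), h.end(),
                           [min_length](const Hit& x) { return x.sequence.size() < min_length; }),
            h.end());
  }
}

// ID-level clean-up: drop IDs that are left without any hits.
inline void removeEmpty(std::vector<Id>& ids)
{
  ids.erase(std::remove_if(ids.begin(), ids.end(),
                           [](const Id& id) { return id.hits.empty(); }),
            ids.end());
}
```

In this sketch, an ID that loses all its hits stays in the vector (with an empty hit list) until `removeEmpty` is called, which mirrors the behaviour described above for the vector-based filters.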

Constructor & Destructor Documentation

◆ IDFilter()

IDFilter ( )

Constructor.

◆ ~IDFilter()

virtual ~IDFilter ( )
virtual

Destructor.

Member Function Documentation

◆ countHits()

static Size countHits ( const std::vector< IdentificationType > &  ids)
inline static

Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.

◆ extractPeptideSequences()

static void extractPeptideSequences ( const std::vector< PeptideIdentification > &  peptides,
std::set< String > &  sequences,
bool  ignore_mods = false 
)
static

Extracts all unique peptide sequences from a list of peptide IDs.

Parameters
peptides     Input
sequences    Output
ignore_mods  Extract sequences without modifications?

◆ filterHitsByRank()

static void filterHitsByRank ( std::vector< IdentificationType > &  ids,
Size  min_rank,
Size  max_rank 
)
inline static

Filters peptide or protein identifications according to the ranking of the hits.

The hits between min_rank and max_rank (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. max_rank is ignored if it is smaller than min_rank.

Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same).

This method is useful if a range of higher hits is needed for decoy fairness analysis.

Note
The ranks of the hits may be invalidated.
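
The rank semantics described above can be sketched in isolation. This is a stand-in illustration, not the OpenMS implementation: `Hit`, `assignRanks` and `filterByRank` are hypothetical names, and two assumptions are made that the text does not spell out: tied scores share a rank and the next distinct score gets the next integer, and a `max_rank` smaller than `min_rank` means "no upper bound".

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in hit type for illustration only.
struct Hit { double score; std::size_t rank; };

// Sort best-first and assign ranks starting at 1.
// Assumption: equal scores share a rank; the next distinct score gets rank + 1.
inline void assignRanks(std::vector<Hit>& hits, bool higher_is_better)
{
  std::stable_sort(hits.begin(), hits.end(),
                   [higher_is_better](const Hit& a, const Hit& b) {
                     return higher_is_better ? a.score > b.score : a.score < b.score;
                   });
  std::size_t rank = 0;
  for (std::size_t i = 0; i < hits.size(); ++i)
  {
    if (i == 0 || hits[i].score != hits[i - 1].score) ++rank;
    hits[i].rank = rank;
  }
}

// Keep hits with min_rank <= rank <= max_rank (both inclusive).
// Assumption: max_rank < min_rank disables the upper bound.
inline void filterByRank(std::vector<Hit>& hits, std::size_t min_rank,
                         std::size_t max_rank, bool higher_is_better)
{
  assignRanks(hits, higher_is_better);
  const bool bounded = max_rank >= min_rank;
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [=](const Hit& h) {
                              return h.rank < min_rank || (bounded && h.rank > max_rank);
                            }),
             hits.end());
}
```

With scores 10, 30, 30, 20 and higher scores being better, the two hits scoring 30 both receive rank 1 in this sketch, so a filter for ranks 2 to 3 keeps the hits scoring 20 and 10.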

◆ filterHitsByScore() [1/2]

static void filterHitsByScore ( std::vector< IdentificationType > &  ids,
double  threshold_score 
)
inline static

Filters peptide or protein identifications according to the score of the hits.

Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.
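
What "at least as good as" means under either score orientation can be shown with a small stand-in predicate. `Hit`, `isAtLeastAsGood` and `filterByScore` below are hypothetical names for this sketch; the orientation flag is passed explicitly here, whereas the text above only states that the orientation is taken into account.

```cpp
#include <algorithm>
#include <vector>

// Stand-in hit type for illustration only.
struct Hit { double score; };

// A score is "at least as good" if it is >= the threshold when higher scores
// are better, or <= the threshold when lower scores are better (e.g. E-values).
inline bool isAtLeastAsGood(double score, double threshold, bool higher_is_better)
{
  return higher_is_better ? score >= threshold : score <= threshold;
}

// Remove all hits that do not reach the threshold.
inline void filterByScore(std::vector<Hit>& hits, double threshold, bool higher_is_better)
{
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [=](const Hit& h) {
                              return !isAtLeastAsGood(h.score, threshold, higher_is_better);
                            }),
             hits.end());
}
```

A hit whose score equals the threshold is kept under both orientations in this sketch.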

◆ filterHitsByScore() [2/2]

static void filterHitsByScore ( PeakMap &  experiment,
double  peptide_threshold_score,
double  protein_threshold_score 
)
inline static

Filters an MS/MS experiment according to score thresholds.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ filterHitsBySignificance() [1/2]

static void filterHitsBySignificance ( std::vector< IdentificationType > &  ids,
double  threshold_fraction = 1.0 
)
inline static

Filters peptide or protein identifications according to the significance threshold of the hits.

Only peptide/protein hits which reach a score above (or below, depending on score orientation) threshold_fraction * significance_threshold (as stored in the ID) are kept.

◆ filterHitsBySignificance() [2/2]

static void filterHitsBySignificance ( PeakMap &  experiment,
double  peptide_threshold_fraction,
double  protein_threshold_fraction 
)
inline static

Filters an MS/MS experiment according to fractions of the significance thresholds.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ FilterPeptideEvidences()

static void FilterPeptideEvidences ( EvidenceFilter &  filter,
std::vector< PeptideIdentification > &  peptides 
)
inline static

Removes peptide evidences based on a filter.

Parameters
filter    Filter functor that overloads operator()(PeptideEvidence&)
peptides  Peptide identifications whose peptide evidences are filtered

◆ filterPeptideEvidences()

static void filterPeptideEvidences ( Filter &  filter,
std::vector< PeptideIdentification > &  peptides 
)
static

Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.

Parameters
filter    Filter function on PeptideEvidence level
peptides  PeptideIdentifications that will be scanned and filtered

◆ filterPeptidesByCharge()

static void filterPeptidesByCharge ( std::vector< PeptideIdentification > &  peptides,
Int  min_charge,
Int  max_charge 
)
static

Filters peptide identifications according to charge state.

Only peptide hits with a charge state between min_charge and max_charge (both inclusive) are kept. max_charge is ignored if it is smaller than min_charge.

Note
The ranks of the hits may be invalidated.

◆ filterPeptidesByLength()

static void filterPeptidesByLength ( std::vector< PeptideIdentification > &  peptides,
Size  min_length,
Size  max_length = UINT_MAX 
)
static

Filters peptide identifications according to peptide sequence length.

Only peptide hits with a sequence length between min_length and max_length (both inclusive) are kept. max_length is ignored if it is smaller than min_length.

Note
The ranks of the hits may be invalidated.

◆ filterPeptidesByMZ()

static void filterPeptidesByMZ ( std::vector< PeptideIdentification > &  peptides,
double  min_mz,
double  max_mz 
)
static

Filters peptide identifications by precursor m/z, keeping only IDs in the given range.

◆ filterPeptidesByMZError()

static void filterPeptidesByMZError ( std::vector< PeptideIdentification > &  peptides,
double  mass_error,
bool  unit_ppm 
)
static

Filter peptide identifications according to mass deviation.

Only peptide hits with a low mass deviation (between theoretical peptide mass and precursor mass) are kept.

Parameters
peptides    Input/output
mass_error  Threshold for the mass deviation
unit_ppm    Is mass_error given in PPM?

Note
The ranks of the hits may be invalidated.
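
How the two units of `mass_error` relate can be shown with a small worked sketch. The functions `mzDeviation` and `withinMassError` are hypothetical, and the sketch makes assumptions the text does not confirm: the deviation is taken as an absolute difference in m/z space, the ppm value is computed relative to the theoretical m/z, and the comparison against the threshold is inclusive.

```cpp
#include <cmath>

// Absolute deviation between observed and theoretical m/z.
// Assumption: ppm is expressed relative to the theoretical value.
inline double mzDeviation(double observed_mz, double theoretical_mz, bool unit_ppm)
{
  const double diff = std::fabs(observed_mz - theoretical_mz);
  return unit_ppm ? diff / theoretical_mz * 1e6 : diff;
}

// A hit passes if its deviation does not exceed the threshold.
inline bool withinMassError(double observed_mz, double theoretical_mz,
                            double mass_error, bool unit_ppm)
{
  return mzDeviation(observed_mz, theoretical_mz, unit_ppm) <= mass_error;
}
```

For example, an observed m/z of 500.005 against a theoretical m/z of 500.0 deviates by 0.005 in absolute terms, which corresponds to about 10 ppm.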

◆ filterPeptidesByRT()

static void filterPeptidesByRT ( std::vector< PeptideIdentification > &  peptides,
double  min_rt,
double  max_rt 
)
static

Filters peptide identifications by precursor RT, keeping only IDs in the given range.

◆ filterPeptidesByRTPredictPValue()

static void filterPeptidesByRTPredictPValue ( std::vector< PeptideIdentification > &  peptides,
const String &  metavalue_key,
double  threshold = 0.05 
)
static

Filters peptide identifications according to p-values from RTPredict.

Filters the peptide hits by the probability (p-value) of a correct peptide identification having a deviation between observed and predicted RT equal to or greater than allowed.

Parameters
peptides       Input/output
metavalue_key  Name of the meta value that holds the p-value: "predicted_RT_p_value" or "predicted_RT_p_value_first_dim"
threshold      P-value threshold

Note
The ranks of the hits may be invalidated.

◆ getBestHit()

static bool getBestHit ( const std::vector< IdentificationType > &  identifications,
bool  assume_sorted,
typename IdentificationType::HitType &  best_hit 
)
inline static

Finds the best-scoring hit in a vector of peptide or protein identifications.

If there are several hits with the best score, the first one is taken.

Parameters
identifications  Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits
assume_sorted    Are hits sorted by score (best score first) already? This allows for faster query, since only the first hit needs to be looked at

Exceptions
Exception::InvalidValue  if the IDs have different score types (i.e. scores cannot be compared)

Returns
true if a hit was present, false otherwise
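
The lookup logic described above (first hit wins on ties, optional shortcut for sorted hits, error on mixed score types) can be sketched with stand-in types. `Hit`, `Id` and `findBestHit` are hypothetical names, and `std::invalid_argument` stands in for `Exception::InvalidValue`. It is an assumption of this sketch that IDs without hits are skipped before score types are compared.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are not the OpenMS classes.
struct Hit { double score; std::string label; };
struct Id
{
  std::string score_type;
  bool higher_is_better;
  std::vector<Hit> hits;
};

// Returns true and fills best_hit if any hit exists.
// Throws std::invalid_argument if IDs with hits use different score types.
inline bool findBestHit(const std::vector<Id>& ids, bool assume_sorted, Hit& best_hit)
{
  bool found = false;
  const std::string* score_type = nullptr;
  for (const Id& id : ids)
  {
    if (id.hits.empty()) continue;
    if (score_type != nullptr && id.score_type != *score_type)
    {
      throw std::invalid_argument("score types differ; scores cannot be compared");
    }
    if (score_type == nullptr) score_type = &id.score_type;
    // If hits are sorted best-first, only the first hit of each ID matters.
    const std::size_t n = assume_sorted ? 1 : id.hits.size();
    for (std::size_t i = 0; i < n; ++i)
    {
      const Hit& h = id.hits[i];
      // Strict comparison: on ties, the hit found first is kept.
      if (!found || (id.higher_is_better ? h.score > best_hit.score
                                         : h.score < best_hit.score))
      {
        best_hit = h;
        found = true;
      }
    }
  }
  return found;
}
```

Setting `assume_sorted` when the hits are not actually sorted best-first would make this sketch inspect only the first hit of each ID and possibly miss the true best hit.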

◆ keepBestPeptideHits()

static void keepBestPeptideHits ( std::vector< PeptideIdentification > &  peptides,
bool  strict = false 
)
static

Filters peptide identifications keeping only the single best-scoring hit per ID.

Parameters
peptides  Input/output
strict    If set, keep the best hit only if its score is unique - i.e. ties are not allowed. (Otherwise all hits with the best score are kept.)
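
The effect of `strict` on tied best scores can be sketched as follows. `Hit` and `keepBest` are hypothetical stand-ins, and the sketch assumes that in strict mode a tie for the best score leaves the ID without any hits; the text above only says that the best hit is kept if its score is unique.

```cpp
#include <algorithm>
#include <vector>

// Stand-in hit type for illustration only.
struct Hit { double score; };

// Keep only the best-scoring hit(s) of one ID.
// Non-strict: all hits tied for the best score are kept.
// Strict (assumption): if the best score is tied, no hit is kept.
inline void keepBest(std::vector<Hit>& hits, bool higher_is_better, bool strict)
{
  if (hits.size() < 2) return;
  // "worse" orders hits so that std::max_element returns the best one.
  auto worse = [higher_is_better](const Hit& a, const Hit& b) {
    return higher_is_better ? a.score < b.score : a.score > b.score;
  };
  const double best = std::max_element(hits.begin(), hits.end(), worse)->score;
  const auto ties = std::count_if(hits.begin(), hits.end(),
                                  [best](const Hit& h) { return h.score == best; });
  if (strict && ties > 1)
  {
    hits.clear();
    return;
  }
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [best](const Hit& h) { return h.score != best; }),
             hits.end());
}
```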

◆ keepHitsMatchingProteins() [1/2]

static void keepHitsMatchingProteins ( std::vector< IdentificationType > &  ids,
const std::set< String >  accessions 
)
inline static

Filters peptide or protein identifications according to the given proteins (positive).

Hits with no matching protein accession in accessions are removed.

Note
The ranks of the hits may be invalidated.

◆ keepHitsMatchingProteins() [2/2]

static void keepHitsMatchingProteins ( PeakMap &  experiment,
const std::vector< FASTAFile::FASTAEntry > &  proteins 
)
inline static

Filters an MS/MS experiment according to the given proteins.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ keepMatchingItems()

static void keepMatchingItems ( Container &  items,
const Predicate &  pred 
)
inline static

Keep items that satisfy a condition in a container (e.g. vector), removing all others.

◆ keepNBestHits() [1/2]

static void keepNBestHits ( std::vector< IdentificationType > &  ids,
Size  n 
)
inline static

Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.

The score orientation (are higher scores better?) is taken into account.

◆ keepNBestHits() [2/2]

static void keepNBestHits ( PeakMap &  experiment,
Size  n 
)
inline static

Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ keepPeptidesWithMatchingModifications()

static void keepPeptidesWithMatchingModifications ( std::vector< PeptideIdentification > &  peptides,
const std::set< String > &  modifications 
)
static

Keeps only peptide hits that have at least one of the given modifications.

◆ keepPeptidesWithMatchingSequences()

static void keepPeptidesWithMatchingSequences ( std::vector< PeptideIdentification > &  peptides,
const std::vector< PeptideIdentification > &  good_peptides,
bool  ignore_mods = false 
)
static

Removes all peptide hits with a sequence that does not match one in good_peptides.

If ignore_mods is set, unmodified sequences are generated and compared to the given ones.

Note
The ranks of the hits may be invalidated.

◆ keepUniquePeptidesPerProtein()

static void keepUniquePeptidesPerProtein ( std::vector< PeptideIdentification > &  peptides)
static

Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer).

◆ removeDecoyHits()

static void removeDecoyHits ( std::vector< IdentificationType > &  ids)
inline static

Removes hits annotated as decoys from peptide or protein identifications.

Checks for meta values named "target_decoy" and "isDecoy", and removes protein/peptide hits if the values are "decoy" and "true", respectively.

Note
The ranks of the hits may be invalidated.
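
The decoy check described above can be sketched as a predicate over string meta values. `Hit` (with a plain `std::map` as its meta-value store), `isDecoy` and `removeDecoys` are hypothetical stand-ins for this example. The sketch assumes exact string matches, so a value such as "target+decoy" is not treated as a decoy here.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Stand-in hit type: meta values are modelled as a simple string map.
struct Hit { std::map<std::string, std::string> meta; };

// A hit counts as a decoy if "target_decoy" == "decoy" or "isDecoy" == "true".
// Assumption: values are compared exactly.
inline bool isDecoy(const Hit& hit)
{
  const auto td = hit.meta.find("target_decoy");
  if (td != hit.meta.end() && td->second == "decoy") return true;
  const auto d = hit.meta.find("isDecoy");
  return d != hit.meta.end() && d->second == "true";
}

// Remove all hits annotated as decoys.
inline void removeDecoys(std::vector<Hit>& hits)
{
  hits.erase(std::remove_if(hits.begin(), hits.end(), isDecoy), hits.end());
}
```

Hits that carry neither meta value are kept by this sketch, since there is no annotation marking them as decoys.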

◆ removeDuplicatePeptideHits()

static void removeDuplicatePeptideHits ( std::vector< PeptideIdentification > &  peptides,
bool  seq_only = false 
)
static

Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).

By default, hits are considered duplicated if they compare as equal using PeptideHit::operator==. However, if seq_only is set, only the sequences (incl. modifications) are compared. In both cases, the first occurrence of each hit in a peptide ID is kept, later ones are removed.
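
The "first occurrence wins" rule and the effect of `seq_only` can be sketched with a stand-in hit type. `Hit`, its `operator==` and `removeDuplicates` are hypothetical; in particular, the fields compared by the stand-in `operator==` are chosen for illustration and are not a statement about what `PeptideHit::operator==` compares.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in hit type for illustration only.
struct Hit { std::string sequence; double score; int charge; };

// Full equality for the stand-in type (fields chosen for this example).
inline bool operator==(const Hit& a, const Hit& b)
{
  return a.sequence == b.sequence && a.score == b.score && a.charge == b.charge;
}

// Keep the first occurrence of each hit; later duplicates are dropped.
// With seq_only, two hits are duplicates if their sequences are equal.
inline void removeDuplicates(std::vector<Hit>& hits, bool seq_only)
{
  std::vector<Hit> unique;
  for (const Hit& h : hits)
  {
    const bool seen = std::any_of(unique.begin(), unique.end(),
                                  [&h, seq_only](const Hit& u) {
                                    return seq_only ? u.sequence == h.sequence : u == h;
                                  });
    if (!seen) unique.push_back(h);
  }
  hits.swap(unique);
}
```

The linear search keeps the original order of the surviving hits, which matters when the hits are already sorted by score.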

◆ removeEmptyIdentifications()

static void removeEmptyIdentifications ( std::vector< IdentificationType > &  ids)
inline static

Removes peptide or protein identifications that have no hits in them.

◆ removeHitsMatchingProteins()

static void removeHitsMatchingProteins ( std::vector< IdentificationType > &  ids,
const std::set< String >  accessions 
)
inline static

Filters peptide or protein identifications according to the given proteins (negative).

Hits with a matching protein accession in accessions are removed.

Note
The ranks of the hits may be invalidated.

◆ removeMatchingItems()

static void removeMatchingItems ( Container &  items,
const Predicate &  pred 
)
inline static

Remove items that satisfy a condition from a container (e.g. vector)
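
A predicate-based removal of this kind is commonly written with the erase-remove idiom. The following is a minimal sketch under that assumption, not the OpenMS source; `removeMatching` and `keepMatching` are hypothetical names chosen to avoid confusion with the real member functions. The "keep" variant is the same operation with the predicate negated.

```cpp
#include <algorithm>
#include <vector>

// Remove all items for which pred(item) is true (erase-remove idiom).
template <class Container, class Predicate>
void removeMatching(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(), pred), items.end());
}

// Keep only items for which pred(item) is true, i.e. remove the rest.
template <class Container, class Predicate>
void keepMatching(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(),
                             [&pred](const typename Container::value_type& x) {
                               return !pred(x);
                             }),
              items.end());
}
```

Both functions preserve the relative order of the remaining items and work with any sequence container that supports `erase` on an iterator range, such as `std::vector`.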

◆ removePeptidesWithMatchingModifications()

static void removePeptidesWithMatchingModifications ( std::vector< PeptideIdentification > &  peptides,
const std::set< String > &  modifications 
)
static

Removes all peptide hits that have at least one of the given modifications.

◆ removePeptidesWithMatchingSequences()

static void removePeptidesWithMatchingSequences ( std::vector< PeptideIdentification > &  peptides,
const std::vector< PeptideIdentification > &  bad_peptides,
bool  ignore_mods = false 
)
static

Removes all peptide hits with a sequence that matches one in bad_peptides.

If ignore_mods is set, unmodified sequences are generated and compared to the given ones.

Note
The ranks of the hits may be invalidated.

◆ removeUnreferencedProteins()

static void removeUnreferencedProteins ( std::vector< ProteinIdentification > &  proteins,
const std::vector< PeptideIdentification > &  peptides 
)
static

Removes protein hits from proteins that are not referenced by a peptide in peptides.

◆ updateHitRanks()

static void updateHitRanks ( std::vector< IdentificationType > &  ids)
inline static

Updates the hit ranks on all peptide or protein IDs.

◆ updateProteinGroups()

static bool updateProteinGroups ( std::vector< ProteinIdentification::ProteinGroup > &  groups,
const std::vector< ProteinHit > &  hits 
)
static

Update protein groups after protein hits were filtered.

Parameters
groups  Input/output protein groups
hits    Available protein hits (all others are removed from the groups)

Returns
Whether the groups are still valid (which is the case if only whole groups, if any, were removed).
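
The validity rule in the return value can be sketched with stand-in types. `Group` and `updateGroups` are hypothetical, and the available protein hits are modelled as a set of accession strings. It is an assumption of this sketch that groups left without any members are dropped from the vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in protein group: just a list of member accessions.
struct Group { std::vector<std::string> accessions; };

// Remove accessions that are no longer available from all groups and drop
// groups that become empty. Returns false if any group lost only some of
// its members, since such a group no longer reflects the original grouping.
inline bool updateGroups(std::vector<Group>& groups, const std::set<std::string>& available)
{
  bool valid = true;
  std::vector<Group> kept;
  for (Group& g : groups)
  {
    const std::size_t before = g.accessions.size();
    g.accessions.erase(std::remove_if(g.accessions.begin(), g.accessions.end(),
                                      [&available](const std::string& acc) {
                                        return available.count(acc) == 0;
                                      }),
                       g.accessions.end());
    if (g.accessions.empty()) continue;               // whole group removed
    if (g.accessions.size() < before) valid = false;  // group partially removed
    kept.push_back(g);
  }
  groups.swap(kept);
  return valid;
}
```

A caller could use a `false` result as a signal that the protein grouping should be recomputed rather than trusted.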

◆ updateProteinReferences()

static void updateProteinReferences ( std::vector< PeptideIdentification > &  peptides,
const std::vector< ProteinIdentification > &  proteins,
bool  remove_peptides_without_reference = false 
)
static

Removes references to missing proteins.

Only PeptideEvidence entries that reference protein hits in proteins are kept in the peptide hits.

If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.
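
The two-step behaviour (prune references, then optionally drop unreferenced hits) can be sketched with stand-in types. `Hit` and `updateReferences` are hypothetical, and protein references are modelled as plain accession strings checked against a set of known accessions, which simplifies how references are matched.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Stand-in peptide hit: a sequence plus the accessions of referenced proteins.
struct Hit
{
  std::string sequence;
  std::vector<std::string> protein_accessions;
};

// Step 1: drop references to proteins that are not in `known`.
// Step 2 (optional): drop hits that are left without any reference.
inline void updateReferences(std::vector<Hit>& hits, const std::set<std::string>& known,
                             bool remove_without_reference)
{
  for (Hit& h : hits)
  {
    std::vector<std::string>& acc = h.protein_accessions;
    acc.erase(std::remove_if(acc.begin(), acc.end(),
                             [&known](const std::string& a) { return known.count(a) == 0; }),
              acc.end());
  }
  if (remove_without_reference)
  {
    hits.erase(std::remove_if(hits.begin(), hits.end(),
                              [](const Hit& h) { return h.protein_accessions.empty(); }),
               hits.end());
  }
}
```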