OpenMS
IDFilter Class Reference

Collection of functions for filtering peptide and protein identifications.

#include <OpenMS/FILTERING/ID/IDFilter.h>

Classes

struct  DigestionFilter
 Is the peptide evidence a digestion product of some protein?
 
struct  GetMatchingItems
 Builds a map index of data that have a String index to find matches and return the objects.
 
struct  HasDecoyAnnotation
 Is this a decoy hit?
 
struct  HasGoodScore
 Is the score of this hit at least as good as the given value?
 
struct  HasMatchingAccession
 Given a list of protein accessions, do any occur in the annotation(s) of this hit?
 
struct  HasMatchingAccessionUnordered
 Given a list of protein accessions, do any occur in the annotation(s) of this hit?
 
struct  HasMaxMetaValue
 Does a meta value of this hit have at most the given value?
 
struct  HasMaxRank
 Is the rank of this hit below or at the given cut-off?
 
struct  HasMetaValue
 Is a meta value with given key and value set on this hit?
 
struct  HasNoHits
 Is the list of hits of this peptide/protein ID empty?
 
class  PeptideDigestionFilter
 Filters a peptide hit by its digestion product.
 

Public Types

typedef std::map< Int, PeptideHit * > ChargeToPepHitP
 Maps a charge state to a PeptideHit pointer.
 
typedef std::unordered_map< std::string, ChargeToPepHitP > SequenceToChargeToPepHitP
 
typedef std::map< std::string, SequenceToChargeToPepHitP > RunToSequenceToChargeToPepHitP
 

Public Member Functions

 IDFilter ()=default
 Constructor.
 
virtual ~IDFilter ()=default
 Destructor.
 

Static Public Member Functions

Higher-order filter functions

Functions for filtering a container based on a predicate

template<class Container , class Predicate >
static void removeMatchingItems (Container &items, const Predicate &pred)
 Remove items that satisfy a condition from a container (e.g. a vector).
 
template<class Container , class Predicate >
static void keepMatchingItems (Container &items, const Predicate &pred)
 Keep items that satisfy a condition in a container (e.g. vector), removing all others.
 
template<class Container , class Predicate >
static void moveMatchingItems (Container &items, const Predicate &pred, Container &target)
 Move items that satisfy a condition to a container (e.g. a vector).
 
template<class IDContainer , class Predicate >
static void removeMatchingItemsUnroll (IDContainer &items, const Predicate &pred)
 Remove hits that satisfy a condition from one of the ID containers (e.g. a vector of PeptideIdentification or ProteinIdentification objects).
 
template<class IDContainer , class Predicate >
static void keepMatchingItemsUnroll (IDContainer &items, const Predicate &pred)
 Keep hits that satisfy a condition in one of the ID containers (e.g. a vector of PeptideIdentification or ProteinIdentification objects).
 
template<class MapType , class Predicate >
static void keepMatchingPeptideHits (MapType &prot_and_pep_ids, Predicate &pred)
 
template<class MapType , class Predicate >
static void removeMatchingPeptideHits (MapType &prot_and_pep_ids, Predicate &pred)
 
template<class MapType , class Predicate >
static void removeMatchingPeptideIdentifications (MapType &prot_and_pep_ids, Predicate &pred)
 
Helper functions
template<class IdentificationType >
static Size countHits (const std::vector< IdentificationType > &ids)
 Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.
 
template<class IdentificationType >
static bool getBestHit (const std::vector< IdentificationType > &identifications, bool assume_sorted, typename IdentificationType::HitType &best_hit)
 Finds the best-scoring hit in a vector of peptide or protein identifications.
 
static void extractPeptideSequences (const std::vector< PeptideIdentification > &peptides, std::set< String > &sequences, bool ignore_mods=false)
 Extracts all unique peptide sequences from a list of peptide IDs.
 
static std::map< String, std::vector< ProteinHit > > extractUnassignedProteins (ConsensusMap &cmap)
 Extracts all proteins not matched by PSMs in features.
 
template<class EvidenceFilter >
static void FilterPeptideEvidences (EvidenceFilter &filter, std::vector< PeptideIdentification > &peptides)
 Removes peptide evidences based on a filter.
 
Clean-up functions
template<class IdentificationType >
static void updateHitRanks (std::vector< IdentificationType > &ids)
 Updates the hit ranks on all peptide or protein IDs.
 
static void removeUnreferencedProteins (ConsensusMap &cmap, bool include_unassigned)
 
static void removeUnreferencedProteins (std::vector< ProteinIdentification > &proteins, const std::vector< PeptideIdentification > &peptides)
 Removes protein hits from proteins that are not referenced by a peptide in peptides.
 
static void removeUnreferencedProteins (ProteinIdentification &proteins, const std::vector< PeptideIdentification > &peptides)
 Removes protein hits from proteins that are not referenced by a peptide in peptides.
 
static void updateProteinReferences (std::vector< PeptideIdentification > &peptides, const std::vector< ProteinIdentification > &proteins, bool remove_peptides_without_reference=false)
 Removes references to missing proteins.
 
static void updateProteinReferences (ConsensusMap &cmap, bool remove_peptides_without_reference=false)
 Removes references to missing proteins.
 
static void updateProteinReferences (ConsensusMap &cmap, const ProteinIdentification &ref_run, bool remove_peptides_without_reference=false)
 Removes references to missing proteins.
 
static bool updateProteinGroups (std::vector< ProteinIdentification::ProteinGroup > &groups, const std::vector< ProteinHit > &hits)
 Update protein groups after protein hits were filtered.
 
static void removeUngroupedProteins (const std::vector< ProteinIdentification::ProteinGroup > &groups, std::vector< ProteinHit > &hits)
 Update protein hits after protein groups were filtered.
 
Filter functions for peptide or protein IDs
template<class IdentificationType >
static void removeEmptyIdentifications (std::vector< IdentificationType > &ids)
 Removes peptide or protein identifications that have no hits in them.
 
template<class IdentificationType >
static void filterHitsByScore (std::vector< IdentificationType > &ids, double threshold_score)
 Filters peptide or protein identifications according to the score of the hits.
 
static void filterGroupsByScore (std::vector< ProteinIdentification::ProteinGroup > &grps, double threshold_score, bool higher_better)
 Filters protein groups according to the score of the groups.
 
template<class IdentificationType >
static void filterHitsByScore (IdentificationType &id, double threshold_score)
 Filters peptide or protein identifications according to the score of the hits.
 
template<class IdentificationType >
static void keepNBestHits (std::vector< IdentificationType > &ids, Size n)
 Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.
 
template<class IdentificationType >
static void filterHitsByRank (std::vector< IdentificationType > &ids, Size min_rank, Size max_rank)
 Filters peptide or protein identifications according to the ranking of the hits.
 
template<class IdentificationType >
static void removeDecoyHits (std::vector< IdentificationType > &ids)
 Removes hits annotated as decoys from peptide or protein identifications.
 
template<class IdentificationType >
static void removeHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions)
 Filters peptide or protein identifications according to the given proteins (negative).
 
template<class IdentificationType >
static void keepHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > &accessions)
 Filters peptide or protein identifications according to the given proteins (positive).
 
Filter functions for peptide IDs only
static void keepBestPeptideHits (std::vector< PeptideIdentification > &peptides, bool strict=false)
 Filters peptide identifications keeping only the single best-scoring hit per ID.
 
static void filterPeptidesByLength (std::vector< PeptideIdentification > &peptides, Size min_length, Size max_length=UINT_MAX)
 Filters peptide identifications according to peptide sequence length.
 
static void filterPeptidesByCharge (std::vector< PeptideIdentification > &peptides, Int min_charge, Int max_charge)
 Filters peptide identifications according to charge state.
 
static void filterPeptidesByRT (std::vector< PeptideIdentification > &peptides, double min_rt, double max_rt)
 Filters peptide identifications by precursor RT, keeping only IDs in the given range.
 
static void filterPeptidesByMZ (std::vector< PeptideIdentification > &peptides, double min_mz, double max_mz)
 Filters peptide identifications by precursor m/z, keeping only IDs in the given range.
 
static void filterPeptidesByMZError (std::vector< PeptideIdentification > &peptides, double mass_error, bool unit_ppm)
 Filter peptide identifications according to mass deviation.
 
template<class Filter >
static void filterPeptideEvidences (Filter &filter, std::vector< PeptideIdentification > &peptides)
 Digest a collection of proteins and filter PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.
 
static void filterPeptidesByRTPredictPValue (std::vector< PeptideIdentification > &peptides, const String &metavalue_key, double threshold=0.05)
 Filters peptide identifications according to p-values from RTPredict.
 
static void removePeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
 Removes all peptide hits that have at least one of the given modifications.
 
static void removePeptidesWithMatchingRegEx (std::vector< PeptideIdentification > &peptides, const String &regex)
 
static void keepPeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications)
 Keeps only peptide hits that have at least one of the given modifications.
 
static void removePeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &bad_peptides, bool ignore_mods=false)
 Removes all peptide hits with a sequence that matches one in bad_peptides.
 
static void keepPeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &good_peptides, bool ignore_mods=false)
 Removes all peptide hits with a sequence that does not match one in good_peptides.
 
static void keepUniquePeptidesPerProtein (std::vector< PeptideIdentification > &peptides)
 Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer).
 
static void removeDuplicatePeptideHits (std::vector< PeptideIdentification > &peptides, bool seq_only=false)
 Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).
 
Filter functions for MS/MS experiments
static void filterHitsByScore (PeakMap &experiment, double peptide_threshold_score, double protein_threshold_score)
 Filters an MS/MS experiment according to score thresholds.
 
static void keepNBestHits (PeakMap &experiment, Size n)
 Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.
 
static void keepNBestSpectra (std::vector< PeptideIdentification > &peptides, Size n)
 
template<class MapType >
static void keepNBestPeptideHits (MapType &map, Size n)
 Filters a Consensus/FeatureMap by keeping the N best peptide hits for every spectrum.
 
template<class MapType >
static void removeEmptyIdentifications (MapType &prot_and_pep_ids)
 
static void keepBestPerPeptide (std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence.
 
static void keepBestPerPeptidePerRun (std::vector< ProteinIdentification > &prot_ids, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
template<class MapType >
static void annotateBestPerPeptidePerRun (MapType &prot_and_pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
template<class MapType >
static void keepBestPerPeptidePerRun (MapType &prot_and_pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
static void annotateBestPerPeptidePerRun (const std::vector< ProteinIdentification > &prot_ids, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
static void annotateBestPerPeptidePerRunWithData (RunToSequenceToChargeToPepHitP &best_peps_per_run, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
static void annotateBestPerPeptide (std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
static void annotateBestPerPeptideWithData (SequenceToChargeToPepHitP &best_pep, PeptideIdentification &pep, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum)
 
static void keepHitsMatchingProteins (PeakMap &experiment, const std::vector< FASTAFile::FASTAEntry > &proteins)
 Filters an MS/MS experiment according to the given proteins.
 
Filter functions for class IdentificationData
static void keepBestMatchPerObservation (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref)
 Filter IdentificationData to keep only the best match (e.g. PSM) for each observation (e.g. spectrum).
 
static void filterObservationMatchesByScore (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref, double cutoff)
 Filter observation matches (e.g. PSMs) in IdentificationData by score.
 
static void removeDecoys (IdentificationData &id_data)
 Filter IdentificationData to remove parent sequences annotated as decoys.
 

Detailed Description

Collection of functions for filtering peptide and protein identifications.

This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.

The filter functions modify their inputs, rather than creating filtered copies.

Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.

The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, ...). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.

The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.
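To illustrate the split between filtering and clean-up described above, here is a minimal sketch using plain standard-library types. `Hit`, `ID`, `filterByChargeSketch` and `removeEmptyIdsSketch` are hypothetical stand-ins, not OpenMS classes or functions. The charge filter follows the contract documented for filterPeptidesByCharge, and the clean-up step does what removeEmptyIdentifications is documented to do.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit / PeptideIdentification (not the OpenMS classes).
struct Hit { std::string sequence; int charge; };
struct ID { std::vector<Hit> hits; };

// Filter step: removes hits outside [min_charge, max_charge] but leaves empty IDs in place.
// As documented for filterPeptidesByCharge, max_charge is ignored if it is smaller than min_charge.
void filterByChargeSketch(std::vector<ID>& ids, int min_charge, int max_charge)
{
  const bool use_max = max_charge >= min_charge;
  for (ID& id : ids)
  {
    id.hits.erase(std::remove_if(id.hits.begin(), id.hits.end(),
                                 [&](const Hit& h) {
                                   return h.charge < min_charge || (use_max && h.charge > max_charge);
                                 }),
                  id.hits.end());
  }
}

// Separate clean-up step: run once, after the last filter in the chain.
void removeEmptyIdsSketch(std::vector<ID>& ids)
{
  ids.erase(std::remove_if(ids.begin(), ids.end(), [](const ID& id) { return id.hits.empty(); }),
            ids.end());
}
```

Any number of hit-level filters can run before the single clean-up call. An ID emptied by one filter stays in the vector until that call.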

Member Typedef Documentation

◆ ChargeToPepHitP

typedef std::map<Int, PeptideHit*> ChargeToPepHitP

Maps a charge state to a PeptideHit pointer.

◆ RunToSequenceToChargeToPepHitP

typedef std::map<std::string, SequenceToChargeToPepHitP> RunToSequenceToChargeToPepHitP

◆ SequenceToChargeToPepHitP

typedef std::unordered_map<std::string, ChargeToPepHitP> SequenceToChargeToPepHitP

Constructor & Destructor Documentation

◆ IDFilter()

IDFilter ( )
default

Constructor.

◆ ~IDFilter()

virtual ~IDFilter ( )
virtual default

Destructor.

Member Function Documentation

◆ annotateBestPerPeptide()

static void annotateBestPerPeptide ( std::vector< PeptideIdentification > &  pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

Annotates each PeptideHit in pep_ids that is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. Does not check run information; it simply goes over all peptide IDs.
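The following is a minimal sketch of the idea, assuming simplified stand-in types. `Hit`, `ID`, `annotateBestPerPeptideSketch` and the `best_for_its_peps` flag (which plays the role of the "bestForItsPeps" meta value) are hypothetical. The nesting from sequence to charge to hit pointer mirrors the SequenceToChargeToPepHitP and ChargeToPepHitP typedefs. `ignore_mods` and `nr_best_spectrum` are not modelled, and all IDs are assumed to share one score orientation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit
{
  std::string sequence;            // assumed to be in the form used for comparison
  int charge = 0;
  double score = 0.0;
  bool best_for_its_peps = false;  // stands in for the "bestForItsPeps" meta value
};
struct ID { std::vector<Hit> hits; bool higher_better = true; };

// Same nesting as ChargeToPepHitP / SequenceToChargeToPepHitP: sequence -> charge -> best hit.
using ChargeToHit = std::map<int, Hit*>;
using SequenceToChargeToHit = std::unordered_map<std::string, ChargeToHit>;

// Marks the best hit per (sequence, charge) across all IDs; with ignore_charges,
// all charge states of a sequence compete for a single slot.
void annotateBestPerPeptideSketch(std::vector<ID>& ids, bool ignore_charges)
{
  SequenceToChargeToHit best;
  for (ID& id : ids)
  {
    for (Hit& hit : id.hits)
    {
      const int key_charge = ignore_charges ? 0 : hit.charge;
      Hit*& slot = best[hit.sequence][key_charge];  // nullptr on first access
      const bool better = slot == nullptr ||
          (id.higher_better ? hit.score > slot->score : hit.score < slot->score);
      if (better) slot = &hit;
    }
  }
  for (auto& seq_entry : best)
    for (auto& charge_entry : seq_entry.second)
      charge_entry.second->best_for_its_peps = true;
}
```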

◆ annotateBestPerPeptidePerRun() [1/2]

static void annotateBestPerPeptidePerRun ( const std::vector< ProteinIdentification > &  prot_ids,
std::vector< PeptideIdentification > &  pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

Annotates each PeptideHit in pep_ids that is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering.

◆ annotateBestPerPeptidePerRun() [2/2]

static void annotateBestPerPeptidePerRun ( MapType &  prot_and_pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

◆ annotateBestPerPeptidePerRunWithData()

static void annotateBestPerPeptidePerRunWithData ( RunToSequenceToChargeToPepHitP &  best_peps_per_run,
std::vector< PeptideIdentification > &  pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

Annotates each PeptideHit in pep_ids that is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. To be used when a RunToSequenceToChargeToPepHitP map is already available.

◆ annotateBestPerPeptideWithData()

static void annotateBestPerPeptideWithData ( SequenceToChargeToPepHitP &  best_pep,
PeptideIdentification &  pep,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

Annotates each PeptideHit of pep that is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. Does not check run information; it simply goes over all peptide IDs. To be used when a SequenceToChargeToPepHitP map is already available.

References PeptideHit::getCharge(), PeptideIdentification::getHits(), PeptideHit::getScore(), PeptideHit::getSequence(), PeptideIdentification::isHigherScoreBetter(), MetaInfoInterface::setMetaValue(), PeptideIdentification::sort(), AASequence::toString(), and AASequence::toUnmodifiedString().

◆ countHits()

static Size countHits ( const std::vector< IdentificationType > &  ids)
inline static

Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.
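For plain containers, this count is the sum of the per-ID hit counts. The sketch below uses hypothetical stand-in types. The OpenMS classes expose hits through getHits() rather than through a public `hits` member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the hit and identification classes.
struct Hit { std::string accession; };
struct ID { std::vector<Hit> hits; };

// Sums the number of hits over all identifications in the vector.
template <class IdentificationType>
std::size_t countHitsSketch(const std::vector<IdentificationType>& ids)
{
  std::size_t counter = 0;
  for (const IdentificationType& id : ids) counter += id.hits.size();
  return counter;
}
```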

◆ extractPeptideSequences()

static void extractPeptideSequences ( const std::vector< PeptideIdentification > &  peptides,
std::set< String > &  sequences,
bool  ignore_mods = false 
)
static

Extracts all unique peptide sequences from a list of peptide IDs.

Parameters
peptides: Input
sequences: Output
ignore_mods: Extract sequences without modifications?

◆ extractUnassignedProteins()

static std::map<String, std::vector<ProteinHit> > extractUnassignedProteins ( ConsensusMap &  cmap)
static

Extracts all proteins not matched by PSMs in features.

Parameters
cmap: The input ConsensusMap
Returns
The extracted ProteinHits for every ID run

◆ filterGroupsByScore()

static void filterGroupsByScore ( std::vector< ProteinIdentification::ProteinGroup > &  grps,
double  threshold_score,
bool  higher_better 
)
static

Filters protein groups according to the score of the groups.

Only protein groups with a score at least as good as threshold_score are kept. The score orientation (higher_better) should be taken from the protein hits; it is assumed to be the same for the groups.

◆ filterHitsByRank()

static void filterHitsByRank ( std::vector< IdentificationType > &  ids,
Size  min_rank,
Size  max_rank 
)
inline static

Filters peptide or protein identifications according to the ranking of the hits.

The hits between min_rank and max_rank (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. max_rank is ignored if it is smaller than min_rank.

Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same).

This method is useful if a range of higher hits is needed for decoy fairness analysis.

Note
The ranks of the hits may be invalidated.
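The following sketch shows rank-based filtering under stated assumptions. Hits are sorted best-first and given dense ranks, so tied scores share a rank and the next distinct score gets the next integer. This is the sketch's reading of "several hits with the same rank". `Hit`, `ID` and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; std::size_t rank = 0; };
struct ID { std::vector<Hit> hits; bool higher_better = true; };

// Sorts best-first and assigns dense ranks starting at 1 (tied scores share a rank).
void assignRanksSketch(ID& id)
{
  std::stable_sort(id.hits.begin(), id.hits.end(), [&](const Hit& a, const Hit& b) {
    return id.higher_better ? a.score > b.score : a.score < b.score;
  });
  std::size_t rank = 0;
  for (std::size_t i = 0; i < id.hits.size(); ++i)
  {
    if (i == 0 || id.hits[i].score != id.hits[i - 1].score) ++rank;
    id.hits[i].rank = rank;
  }
}

// Keeps hits with min_rank <= rank <= max_rank; max_rank is ignored if smaller than min_rank.
void filterHitsByRankSketch(std::vector<ID>& ids, std::size_t min_rank, std::size_t max_rank)
{
  const bool use_max = max_rank >= min_rank;
  for (ID& id : ids)
  {
    assignRanksSketch(id);  // ranks are (re-)computed before filtering
    id.hits.erase(std::remove_if(id.hits.begin(), id.hits.end(), [&](const Hit& h) {
                    return h.rank < min_rank || (use_max && h.rank > max_rank);
                  }),
                  id.hits.end());
  }
}
```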

◆ filterHitsByScore() [1/3]

static void filterHitsByScore ( IdentificationType &  id,
double  threshold_score 
)
inline static

Filters peptide or protein identifications according to the score of the hits.

Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.

◆ filterHitsByScore() [2/3]

static void filterHitsByScore ( PeakMap &  experiment,
double  peptide_threshold_score,
double  protein_threshold_score 
)
inline static

Filters an MS/MS experiment according to score thresholds.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ filterHitsByScore() [3/3]

static void filterHitsByScore ( std::vector< IdentificationType > &  ids,
double  threshold_score 
)
inline static

Filters peptide or protein identifications according to the score of the hits.

Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.
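The following sketch illustrates the "at least as good as" rule with hypothetical stand-in types. Each ID carries its own score orientation, and a hit whose score equals `threshold_score` is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; };
struct ID { std::vector<Hit> hits; bool higher_better = true; };

// Keeps hits whose score is at least as good as threshold_score, using each ID's orientation.
void filterHitsByScoreSketch(std::vector<ID>& ids, double threshold_score)
{
  for (ID& id : ids)
  {
    auto worse = [&](const Hit& h) {
      return id.higher_better ? h.score < threshold_score : h.score > threshold_score;
    };
    id.hits.erase(std::remove_if(id.hits.begin(), id.hits.end(), worse), id.hits.end());
  }
}
```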

◆ filterObservationMatchesByScore()

static void filterObservationMatchesByScore ( IdentificationData &  id_data,
IdentificationData::ScoreTypeRef  score_ref,
double  cutoff 
)
static

Filter observation matches (e.g. PSMs) in IdentificationData by score.

Matches with scores of the required type that are worse than the cut-off are removed. Matches without a score of the required type are also removed. The data structure will be cleaned up (IdentificationData::cleanup) to remove any invalidated references at the end of this operation.

Parameters
id_data: Data to be filtered
score_ref: Reference to the score type used for filtering
cutoff: Score cut-off for filtering

Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().
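The following is a simplified sketch of the same rule on a flat list of matches. It assumes each match stores its scores in a map from score-type name to value. `Match` and `filterMatchesByScoreSketch` are hypothetical. The orientation is passed in explicitly (in IdentificationData it belongs to the score type), and the final IdentificationData::cleanup step is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an observation match (e.g. a PSM).
struct Match
{
  std::string observation;
  std::map<std::string, double> scores;  // score type name -> value
};

// Removes matches whose score of type score_type is worse than cutoff,
// and matches that have no score of that type at all.
void filterMatchesByScoreSketch(std::vector<Match>& matches, const std::string& score_type,
                                double cutoff, bool higher_better)
{
  matches.erase(std::remove_if(matches.begin(), matches.end(), [&](const Match& m) {
                  auto it = m.scores.find(score_type);
                  if (it == m.scores.end()) return true;  // missing score type
                  return higher_better ? it->second < cutoff : it->second > cutoff;
                }),
                matches.end());
}
```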

◆ FilterPeptideEvidences()

static void FilterPeptideEvidences ( EvidenceFilter &  filter,
std::vector< PeptideIdentification > &  peptides 
)
inline static

Removes peptide evidences based on a filter.

Parameters
filter: Filter function object that overloads operator()(PeptideEvidence&)
peptides: A collection of peptide identifications whose peptide evidences are filtered

◆ filterPeptideEvidences()

static void filterPeptideEvidences ( Filter &  filter,
std::vector< PeptideIdentification > &  peptides 
)
static

Digest a collection of proteins and filter PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.

Parameters
filter: Filter function on the PeptideEvidence level
peptides: PeptideIdentifications that will be scanned and filtered

◆ filterPeptidesByCharge()

static void filterPeptidesByCharge ( std::vector< PeptideIdentification > &  peptides,
Int  min_charge,
Int  max_charge 
)
static

Filters peptide identifications according to charge state.

Only peptide hits with a charge state between min_charge and max_charge (both inclusive) are kept. max_charge is ignored if it is smaller than min_charge.

Note
The ranks of the hits may be invalidated.

◆ filterPeptidesByLength()

static void filterPeptidesByLength ( std::vector< PeptideIdentification > &  peptides,
Size  min_length,
Size  max_length = UINT_MAX 
)
static

Filters peptide identifications according to peptide sequence length.

Only peptide hits with a sequence length between min_length and max_length (both inclusive) are kept. max_length is ignored if it is smaller than min_length.

Note
The ranks of the hits may be invalidated.

◆ filterPeptidesByMZ()

static void filterPeptidesByMZ ( std::vector< PeptideIdentification > &  peptides,
double  min_mz,
double  max_mz 
)
static

Filters peptide identifications by precursor m/z, keeping only IDs in the given range.

◆ filterPeptidesByMZError()

static void filterPeptidesByMZError ( std::vector< PeptideIdentification > &  peptides,
double  mass_error,
bool  unit_ppm 
)
static

Filter peptide identifications according to mass deviation.

Only peptide hits with a low mass deviation (between theoretical peptide mass and precursor mass) are kept.

Parameters
peptides: Input/output
mass_error: Threshold for the mass deviation
unit_ppm: Is mass_error given in PPM?
Note
The ranks of the hits may be invalidated.
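The deviation itself is simple arithmetic. In ppm, it is the difference relative to the theoretical mass, multiplied by 10^6, so a 0.005 Da deviation at 1000 Da is 5 ppm. The sketch below assumes exactly that definition and an inclusive threshold on the absolute deviation. The helper names are hypothetical. How OpenMS derives the theoretical and observed values (mass vs. m/z, charge handling) is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: deviation of the observed precursor mass from the theoretical
// peptide mass, in Da or in ppm relative to the theoretical mass.
double massDeviationSketch(double theoretical, double observed, bool unit_ppm)
{
  const double diff = observed - theoretical;
  return unit_ppm ? diff / theoretical * 1e6 : diff;
}

// Assumption: a hit passes if the absolute deviation does not exceed mass_error.
bool passesMassErrorSketch(double theoretical, double observed, double mass_error, bool unit_ppm)
{
  return std::fabs(massDeviationSketch(theoretical, observed, unit_ppm)) <= mass_error;
}
```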

◆ filterPeptidesByRT()

static void filterPeptidesByRT ( std::vector< PeptideIdentification > &  peptides,
double  min_rt,
double  max_rt 
)
static

Filters peptide identifications by precursor RT, keeping only IDs in the given range.

◆ filterPeptidesByRTPredictPValue()

static void filterPeptidesByRTPredictPValue ( std::vector< PeptideIdentification > &  peptides,
const String &  metavalue_key,
double  threshold = 0.05 
)
static

Filters peptide identifications according to p-values from RTPredict.

Filters the peptide hits by the probability (p-value) of a correct peptide identification having a deviation between observed and predicted RT equal to or greater than allowed.

Parameters
peptides: Input/output
metavalue_key: Name of the meta value that holds the p-value: "predicted_RT_p_value" or "predicted_RT_p_value_first_dim"
threshold: P-value threshold
Note
The ranks of the hits may be invalidated.

◆ getBestHit()

static bool getBestHit ( const std::vector< IdentificationType > &  identifications,
bool  assume_sorted,
typename IdentificationType::HitType &  best_hit 
)
inline static

Finds the best-scoring hit in a vector of peptide or protein identifications.

If there are several hits with the best score, the first one is taken.

Parameters
identifications: Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits
assume_sorted: Are hits already sorted by score (best score first)? This allows a faster query, since only the first hit needs to be looked at.
best_hit: Contains the best hit if successful
Exceptions
Exception::InvalidValue: If the IDs have different score types (i.e. scores cannot be compared)
Returns
true if a hit was present, false otherwise
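The following sketch implements the documented contract with hypothetical stand-in types. `std::invalid_argument` stands in for Exception::InvalidValue, and a `score_type` string stands in for the IDs' score type. The function returns false when there are no hits, keeps the first of several equally good hits, and inspects only the first hit per ID when `assume_sorted` is set. Skipping IDs without hits entirely (including for the score-type check) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; };
struct ID
{
  std::vector<Hit> hits;
  bool higher_better = true;
  std::string score_type = "score";
};

bool getBestHitSketch(const std::vector<ID>& ids, bool assume_sorted, Hit& best_hit)
{
  const Hit* best = nullptr;
  const ID* reference = nullptr;  // first non-empty ID defines score type and orientation
  for (const ID& id : ids)
  {
    if (id.hits.empty()) continue;
    if (reference == nullptr) reference = &id;
    else if (reference->score_type != id.score_type)
      throw std::invalid_argument("identifications have different score types");

    // With sorted hits (best first), only the first hit of each ID can be the best.
    const std::size_t n = assume_sorted ? 1 : id.hits.size();
    for (std::size_t i = 0; i < n; ++i)
    {
      const Hit& h = id.hits[i];
      const bool better = best == nullptr ||
          (reference->higher_better ? h.score > best->score : h.score < best->score);
      if (better) best = &h;  // strict comparison: the first of several tied hits wins
    }
  }
  if (best == nullptr) return false;
  best_hit = *best;
  return true;
}
```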

◆ keepBestMatchPerObservation()

static void keepBestMatchPerObservation ( IdentificationData &  id_data,
IdentificationData::ScoreTypeRef  score_ref 
)
static

Filter IdentificationData to keep only the best match (e.g. PSM) for each observation (e.g. spectrum).

The data structure will be cleaned up (IdentificationData::cleanup) to remove any invalidated references at the end of this operation.

See also
IdentificationData::getBestMatchPerObservation
Parameters
id_data: Data to be filtered
score_ref: Reference to the score type defining "best" matches

◆ keepBestPeptideHits()

static void keepBestPeptideHits ( std::vector< PeptideIdentification > &  peptides,
bool  strict = false 
)
static

Filters peptide identifications keeping only the single best-scoring hit per ID.

Parameters
peptides: Input/output
strict: If set, keep the best hit only if its score is unique, i.e. ties are not allowed. (Otherwise, all hits with the best score are kept.)
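The following sketch illustrates the `strict` behaviour with hypothetical stand-in types. By default, all hits tied for the best score are kept. In strict mode, none are kept when the best score is shared.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; };
struct ID { std::vector<Hit> hits; bool higher_better = true; };

void keepBestPeptideHitsSketch(std::vector<ID>& peptides, bool strict = false)
{
  for (ID& pep : peptides)
  {
    if (pep.hits.empty()) continue;
    // Find the best score, respecting the ID's score orientation.
    double best = pep.hits.front().score;
    for (const Hit& h : pep.hits)
      if (pep.higher_better ? h.score > best : h.score < best) best = h.score;
    // Collect all hits that reach the best score (there may be ties).
    std::vector<Hit> kept;
    for (const Hit& h : pep.hits)
      if (h.score == best) kept.push_back(h);
    // Strict mode: a tie means there is no unique best hit, so nothing is kept.
    if (strict && kept.size() > 1) kept.clear();
    pep.hits = kept;
  }
}
```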

◆ keepBestPerPeptide()

static void keepBestPerPeptide ( std::vector< PeptideIdentification > &  pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence.

◆ keepBestPerPeptidePerRun() [1/2]

static void keepBestPerPeptidePerRun ( MapType &  prot_and_pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

◆ keepBestPerPeptidePerRun() [2/2]

static void keepBestPerPeptidePerRun ( std::vector< ProteinIdentification > &  prot_ids,
std::vector< PeptideIdentification > &  pep_ids,
bool  ignore_mods,
bool  ignore_charges,
Size  nr_best_spectrum 
)
inline static

◆ keepHitsMatchingProteins() [1/2]

static void keepHitsMatchingProteins ( PeakMap &  experiment,
const std::vector< FASTAFile::FASTAEntry > &  proteins 
)
inline static

Filters an MS/MS experiment according to the given proteins.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ keepHitsMatchingProteins() [2/2]

static void keepHitsMatchingProteins ( std::vector< IdentificationType > &  ids,
const std::set< String > &  accessions 
)
inline static

Filters peptide or protein identifications according to the given proteins (positive).

Hits with no matching protein accession in accessions are removed.

Note
The ranks of the hits may be invalidated.
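The following sketch shows the positive protein filter. It assumes each hit stores its protein accessions directly (in OpenMS, they come from the hit's annotations). `Hit`, `ID` and the function name are hypothetical. A hit survives if at least one of its accessions is in the set.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; std::vector<std::string> accessions; };
struct ID { std::vector<Hit> hits; };

void keepHitsMatchingProteinsSketch(std::vector<ID>& ids, const std::set<std::string>& accessions)
{
  for (ID& id : ids)
  {
    // A hit is removed if none of its accessions occurs in the given set.
    auto no_match = [&](const Hit& h) {
      return std::none_of(h.accessions.begin(), h.accessions.end(),
                          [&](const std::string& acc) { return accessions.count(acc) > 0; });
    };
    id.hits.erase(std::remove_if(id.hits.begin(), id.hits.end(), no_match), id.hits.end());
  }
}
```

removeHitsMatchingProteins is documented as the negative counterpart: the same test with the removal condition inverted.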

◆ keepMatchingItems()

static void keepMatchingItems ( Container &  items,
const Predicate &  pred 
)
inline static

Keep items that satisfy a condition in a container (e.g. vector), removing all others.
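Both removeMatchingItems and keepMatchingItems can be expressed with the standard erase-remove idiom. The following is a sketch of that idiom with hypothetical names, not the OpenMS source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erase-remove idiom: drops every item for which pred(item) is true.
template <class Container, class Predicate>
void removeMatchingItemsSketch(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(), pred), items.end());
}

// Keep-variant: drops every item for which pred(item) is false.
template <class Container, class Predicate>
void keepMatchingItemsSketch(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(),
                             [&pred](const typename Container::value_type& item) { return !pred(item); }),
              items.end());
}
```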

◆ keepMatchingItemsUnroll()

static void keepMatchingItemsUnroll ( IDContainer &  items,
const Predicate &  pred 
)
inline static

Keep hits that satisfy a condition in one of the ID containers (e.g. a vector of PeptideIdentification or ProteinIdentification objects).

◆ keepMatchingPeptideHits()

static void keepMatchingPeptideHits ( MapType &  prot_and_pep_ids,
Predicate &  pred 
)
inline static

◆ keepNBestHits() [1/2]

static void keepNBestHits ( PeakMap &  experiment,
Size  n 
)
inline static

Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.

References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().

◆ keepNBestHits() [2/2]

static void keepNBestHits ( std::vector< IdentificationType > &  ids,
Size  n 
)
inline static

Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.

The score orientation (are higher scores better?) is taken into account.
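The following sketch uses hypothetical stand-in types. Each ID's hits are sorted best-first according to that ID's orientation and truncated to `n`. How the real function treats hits tied at the cut-off is not stated here; this sketch simply keeps the first `n` after a stable sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; };
struct ID { std::vector<Hit> hits; bool higher_better = true; };

void keepNBestHitsSketch(std::vector<ID>& ids, std::size_t n)
{
  for (ID& id : ids)
  {
    // Sort best-first by this ID's score orientation; ties keep their input order.
    std::stable_sort(id.hits.begin(), id.hits.end(), [&id](const Hit& a, const Hit& b) {
      return id.higher_better ? a.score > b.score : a.score < b.score;
    });
    if (id.hits.size() > n)
      id.hits.erase(id.hits.begin() + static_cast<std::ptrdiff_t>(n), id.hits.end());
  }
}
```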

◆ keepNBestPeptideHits()

static void keepNBestPeptideHits ( MapType &  map,
Size  n 
)
inline static

Filters a Consensus/FeatureMap by keeping the N best peptide hits for every spectrum.

◆ keepNBestSpectra()

static void keepNBestSpectra ( std::vector< PeptideIdentification > &  peptides,
Size  n 
)
static

Filters identifications, keeping the n best PeptideIdentification objects (one PeptideIdentification is better than another if its best PeptideHit is better). The vector is sorted and reduced to n elements. If the vector's size s is less than n, all s spectra are kept.
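The following sketch shows this ordering with hypothetical stand-in types. It assumes a single score orientation for all IDs and places IDs without hits last; the documentation does not say how empty IDs are ranked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (not the OpenMS classes).
struct Hit { std::string sequence; double score; };
struct ID { std::vector<Hit> hits; };

// Ranks IDs by the score of their best hit and keeps the first n (all of them if fewer).
void keepNBestSpectraSketch(std::vector<ID>& peptides, std::size_t n, bool higher_better = true)
{
  auto better = [higher_better](double a, double b) { return higher_better ? a > b : a < b; };
  auto best_score = [&better](const ID& id) {
    double best = id.hits.front().score;
    for (const Hit& h : id.hits)
      if (better(h.score, best)) best = h.score;
    return best;
  };
  std::stable_sort(peptides.begin(), peptides.end(), [&](const ID& a, const ID& b) {
    // IDs without hits compare as worse than any ID with hits.
    if (a.hits.empty() || b.hits.empty()) return !a.hits.empty() && b.hits.empty();
    return better(best_score(a), best_score(b));
  });
  if (peptides.size() > n)
    peptides.erase(peptides.begin() + static_cast<std::ptrdiff_t>(n), peptides.end());
}
```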

◆ keepPeptidesWithMatchingModifications()

static void keepPeptidesWithMatchingModifications ( std::vector< PeptideIdentification > &  peptides,
const std::set< String > &  modifications 
)
static

Keeps only peptide hits that have at least one of the given modifications.

◆ keepPeptidesWithMatchingSequences()

static void keepPeptidesWithMatchingSequences ( std::vector< PeptideIdentification > &  peptides,
const std::vector< PeptideIdentification > &  good_peptides,
bool  ignore_mods = false 
)
static

Removes all peptide hits with a sequence that does not match one in good_peptides.

If ignore_mods is set, unmodified sequences are generated and compared to the given ones.

Note
The ranks of the hits may be invalidated.

◆ keepUniquePeptidesPerProtein()

static void keepUniquePeptidesPerProtein ( std::vector< PeptideIdentification > &  peptides)
static

Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)

◆ moveMatchingItems()

static void moveMatchingItems ( Container &  items,
const Predicate &  pred,
Container &  target 
)
inline static

Move items that satisfy a condition to a container (e.g. a vector).
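The following sketch uses std::stable_partition, which preserves the relative order of both the kept and the moved items. The names are hypothetical. Appending to the existing contents of `target`, rather than replacing them, is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

template <class Container, class Predicate>
void moveMatchingItemsSketch(Container& items, const Predicate& pred, Container& target)
{
  // Non-matching items go to the front; relative order is preserved in both groups.
  auto first_match = std::stable_partition(
      items.begin(), items.end(),
      [&pred](const typename Container::value_type& item) { return !pred(item); });
  // Append the matching items to target (assumption: existing contents are kept).
  target.insert(target.end(), std::make_move_iterator(first_match),
                std::make_move_iterator(items.end()));
  items.erase(first_match, items.end());
}
```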

◆ removeDecoyHits()

static void removeDecoyHits(std::vector<IdentificationType>& ids)
inline static

Removes hits annotated as decoys from peptide or protein identifications.

Checks for meta values named "target_decoy" and "isDecoy", and removes protein/peptide hits if the values are "decoy" and "true", respectively.

Note
The ranks of the hits may be invalidated.
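A sketch of the decoy test described above for a single hit list. It assumes a hypothetical MetaHit type whose meta values are stored as strings in a std::map. OpenMS stores meta values differently, so this is a simplification. The documented function applies the same filter to the hits of every identification in ids.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical hit type with string-valued meta data (not the OpenMS hit classes).
struct MetaHit
{
  std::map<std::string, std::string> meta;
};

// True if the hit carries target_decoy == "decoy" or isDecoy == "true".
bool isDecoySketch(const MetaHit& hit)
{
  auto td = hit.meta.find("target_decoy");
  if (td != hit.meta.end() && td->second == "decoy") return true;
  auto dec = hit.meta.find("isDecoy");
  return dec != hit.meta.end() && dec->second == "true";
}

// Erase-remove idiom: drop all decoy hits from one hit list.
void removeDecoyHitsSketch(std::vector<MetaHit>& hits)
{
  hits.erase(std::remove_if(hits.begin(), hits.end(), isDecoySketch), hits.end());
}
```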

◆ removeDecoys()

static void removeDecoys(IdentificationData& id_data)
static

Filter IdentificationData to remove parent sequences annotated as decoys.

If any were removed, the data structure will be cleaned up (IdentificationData::cleanup) to remove any invalidated references at the end of this operation.

Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().

◆ removeDuplicatePeptideHits()

static void removeDuplicatePeptideHits(std::vector<PeptideIdentification>& peptides, bool seq_only = false)
static

Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).

By default, hits are considered duplicates if they compare as equal using PeptideHit::operator==. However, if seq_only is set, only the sequences (incl. modifications) are compared. In both cases, the first occurrence of each hit in a peptide ID is kept and later ones are removed.
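A sketch of both comparison modes for one hit list. It uses a hypothetical PepHit struct whose operator== compares sequence and score, as a simplified stand-in for PeptideHit::operator==. Because only the first occurrence may survive, the hits are scanned in order and each one not seen before is copied to the output.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical simplified peptide hit (not the OpenMS PeptideHit).
struct PepHit
{
  std::string sequence;  // incl. modifications, e.g. "PEPM(Oxidation)IDE"
  double score;
  bool operator==(const PepHit& other) const
  {
    return sequence == other.sequence && score == other.score;
  }
};

// Keep the first occurrence of each hit; later duplicates are dropped.
void removeDuplicateHitsSketch(std::vector<PepHit>& hits, bool seq_only = false)
{
  std::vector<PepHit> unique_hits;
  std::unordered_set<std::string> seen_sequences;  // used only when seq_only is set
  for (const PepHit& hit : hits)
  {
    bool duplicate = false;
    if (seq_only)
    {
      // insert() returns false in .second if the sequence was already present
      duplicate = !seen_sequences.insert(hit.sequence).second;
    }
    else
    {
      duplicate = std::find(unique_hits.begin(), unique_hits.end(), hit) != unique_hits.end();
    }
    if (!duplicate) unique_hits.push_back(hit);
  }
  hits.swap(unique_hits);
}
```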

◆ removeEmptyIdentifications() [1/2]

static void removeEmptyIdentifications(MapType& prot_and_pep_ids)
inline static

◆ removeEmptyIdentifications() [2/2]

static void removeEmptyIdentifications(std::vector<IdentificationType>& ids)
inline static

Removes peptide or protein identifications that have no hits in them.

◆ removeHitsMatchingProteins()

static void removeHitsMatchingProteins(std::vector<IdentificationType>& ids, const std::set<String> accessions)
inline static

Filters peptide or protein identifications according to the given proteins (negative).

Hits with a matching protein accession in accessions are removed.

Note
The ranks of the hits may be invalidated.

◆ removeMatchingItems()

static void removeMatchingItems(Container& items, const Predicate& pred)
inline static

Removes items that satisfy a condition from a container (e.g. a vector).
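The erase-remove idiom is the standard C++ way to remove the elements that match a predicate from a vector. The sketch below applies it with a predicate modeled on HasNoHits, which is also one way to express removing empty identifications. All names (removeMatchingItemsSketch, SimpleID, HasNoHitsSketch) are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Erase-remove idiom: shift non-matching items forward, then erase the leftover tail.
template <class Container, class Predicate>
void removeMatchingItemsSketch(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(), pred), items.end());
}

// Hypothetical ID type and a predicate in the spirit of IDFilter::HasNoHits.
struct SimpleID
{
  std::vector<double> hits;
};

struct HasNoHitsSketch
{
  bool operator()(const SimpleID& id) const { return id.hits.empty(); }
};
```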

◆ removeMatchingItemsUnroll()

static void removeMatchingItemsUnroll(IDContainer& items, const Predicate& pred)
inline static

Removes the hits that satisfy a condition from every identification in an ID container (e.g. a vector of PeptideIdentification or ProteinIdentification objects).

◆ removeMatchingPeptideHits()

static void removeMatchingPeptideHits(MapType& prot_and_pep_ids, Predicate& pred)
inline static

◆ removeMatchingPeptideIdentifications()

static void removeMatchingPeptideIdentifications(MapType& prot_and_pep_ids, Predicate& pred)
inline static

◆ removePeptidesWithMatchingModifications()

static void removePeptidesWithMatchingModifications(std::vector<PeptideIdentification>& peptides, const std::set<String>& modifications)
static

Removes all peptide hits that have at least one of the given modifications.

◆ removePeptidesWithMatchingRegEx()

static void removePeptidesWithMatchingRegEx(std::vector<PeptideIdentification>& peptides, const String& regex)
static

◆ removePeptidesWithMatchingSequences()

static void removePeptidesWithMatchingSequences(std::vector<PeptideIdentification>& peptides, const std::vector<PeptideIdentification>& bad_peptides, bool ignore_mods = false)
static

Removes all peptide hits with a sequence that matches one in bad_peptides.

If ignore_mods is set, unmodified sequences are generated and compared to the given ones.

Note
The ranks of the hits may be invalidated.

◆ removeUngroupedProteins()

static void removeUngroupedProteins(const std::vector<ProteinIdentification::ProteinGroup>& groups, std::vector<ProteinHit>& hits)
static

Update protein hits after protein groups were filtered.

Parameters
groups Available protein groups with protein accessions to keep
hits Input/output hits (hits whose accession does not occur in any of the groups are removed)
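A sketch of this update. It assumes hypothetical ProteinGroupSketch (a set of accessions) and ProteinHitSketch (one accession) types in place of ProteinIdentification::ProteinGroup and ProteinHit. The accessions of all groups are collected into one set, and hits whose accession is not in that set are erased.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified types (not the OpenMS classes).
struct ProteinGroupSketch { std::set<std::string> accessions; };
struct ProteinHitSketch { std::string accession; };

// Keep only hits whose accession occurs in at least one group.
void removeUngroupedProteinsSketch(const std::vector<ProteinGroupSketch>& groups,
                                   std::vector<ProteinHitSketch>& hits)
{
  std::set<std::string> grouped;
  for (const auto& group : groups)
  {
    grouped.insert(group.accessions.begin(), group.accessions.end());
  }
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [&grouped](const ProteinHitSketch& hit)
                            { return grouped.count(hit.accession) == 0; }),
             hits.end());
}
```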

◆ removeUnreferencedProteins() [1/3]

static void removeUnreferencedProteins(ConsensusMap& cmap, bool include_unassigned)
static

Removes protein hits from the protein IDs in cmap that are not referenced by a peptide in the features or, if include_unassigned is set, by a peptide in the list of unassigned peptides.

◆ removeUnreferencedProteins() [2/3]

static void removeUnreferencedProteins(ProteinIdentification& proteins, const std::vector<PeptideIdentification>& peptides)
static

Removes protein hits from proteins that are not referenced by a peptide in peptides.

◆ removeUnreferencedProteins() [3/3]

static void removeUnreferencedProteins(std::vector<ProteinIdentification>& proteins, const std::vector<PeptideIdentification>& peptides)
static

Removes protein hits from proteins that are not referenced by a peptide in peptides.
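A sketch of the single-run case. It assumes hypothetical, flattened types in which each peptide hit simply lists the accessions of its peptide evidences. Every referenced accession is collected first, and then protein hits outside that set are erased. This sketch does not distinguish between different search runs.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified types (not the OpenMS classes).
struct PeptideHitRef { std::vector<std::string> protein_accessions; };
struct PeptideIDRef { std::vector<PeptideHitRef> hits; };
struct ProteinHitRef { std::string accession; };

void removeUnreferencedProteinsSketch(std::vector<ProteinHitRef>& protein_hits,
                                      const std::vector<PeptideIDRef>& peptides)
{
  // Collect every accession referenced by any peptide hit.
  std::set<std::string> referenced;
  for (const auto& pep_id : peptides)
  {
    for (const auto& hit : pep_id.hits)
    {
      referenced.insert(hit.protein_accessions.begin(), hit.protein_accessions.end());
    }
  }
  // Drop protein hits that no peptide points to.
  protein_hits.erase(std::remove_if(protein_hits.begin(), protein_hits.end(),
                                    [&referenced](const ProteinHitRef& prot)
                                    { return referenced.count(prot.accession) == 0; }),
                     protein_hits.end());
}
```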

◆ updateHitRanks()

static void updateHitRanks(std::vector<IdentificationType>& ids)
inline static

Updates the hit ranks on all peptide or protein IDs.
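A sketch of rank assignment for a single hit list, using a hypothetical RankedHit type. Hits are sorted best-first and numbered from 1. Giving tied scores the same rank (dense ranking) is an assumption of this sketch and is not confirmed by this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical simplified hit with a score and a rank (not the OpenMS classes).
struct RankedHit
{
  double score;
  int rank;
};

// Sort hits best-first and assign dense ranks starting at 1 (tied scores share a rank).
void assignRanksSketch(std::vector<RankedHit>& hits, bool higher_score_better)
{
  std::stable_sort(hits.begin(), hits.end(),
                   [higher_score_better](const RankedHit& a, const RankedHit& b)
                   { return higher_score_better ? a.score > b.score : a.score < b.score; });
  int rank = 0;
  for (std::size_t i = 0; i < hits.size(); ++i)
  {
    // A new rank starts whenever the score differs from the previous hit's score.
    if (i == 0 || hits[i].score != hits[i - 1].score) ++rank;
    hits[i].rank = rank;
  }
}
```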

◆ updateProteinGroups()

static bool updateProteinGroups(std::vector<ProteinIdentification::ProteinGroup>& groups, const std::vector<ProteinHit>& hits)
static

Update protein groups after protein hits were filtered.

Parameters
groups Input/output protein groups
hits Available protein hits (all others are removed from the groups)
Returns
Whether the groups are still valid (which is the case if only whole groups, if any, were removed).
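A sketch of the group update. It assumes hypothetical GroupSketch objects that hold accession sets, and it passes the surviving hits as a set of accessions instead of ProteinHit objects. Members without a hit are dropped and groups that become empty are removed. The return value becomes false as soon as a group loses only part of its members. Keeping such partially reduced groups, rather than discarding them, is an assumption of this sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified protein group: a set of protein accessions.
struct GroupSketch { std::set<std::string> accessions; };

// Returns false if some group lost only part of its members.
bool updateProteinGroupsSketch(std::vector<GroupSketch>& groups,
                               const std::set<std::string>& hit_accessions)
{
  bool valid = true;
  std::vector<GroupSketch> kept;
  for (const GroupSketch& group : groups)
  {
    GroupSketch filtered;
    for (const std::string& acc : group.accessions)
    {
      if (hit_accessions.count(acc) > 0) filtered.accessions.insert(acc);
    }
    if (filtered.accessions.empty()) continue;  // whole group removed: still valid
    if (filtered.accessions.size() != group.accessions.size()) valid = false;  // partial removal
    kept.push_back(filtered);
  }
  groups.swap(kept);
  return valid;
}
```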

◆ updateProteinReferences() [1/3]

static void updateProteinReferences(ConsensusMap& cmap, bool remove_peptides_without_reference = false)
static

Removes references to missing proteins.

Only PeptideEvidence entries that reference protein hits in their corresponding protein run of cmap are kept in the peptide hits.

If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.

◆ updateProteinReferences() [2/3]

static void updateProteinReferences(ConsensusMap& cmap, const ProteinIdentification& ref_run, bool remove_peptides_without_reference = false)
static

Removes references to missing proteins.

Only PeptideEvidence entries that reference protein hits in ref_run are kept in the peptide hits.

If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.

◆ updateProteinReferences() [3/3]

static void updateProteinReferences(std::vector<PeptideIdentification>& peptides, const std::vector<ProteinIdentification>& proteins, bool remove_peptides_without_reference = false)
static

Removes references to missing proteins.

Only PeptideEvidence entries that reference protein hits in proteins are kept in the peptide hits.

If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.
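A sketch of the reference update for a flat list of peptide hits. It assumes hypothetical EvidenceSketch and PeptideHitSketch types in which each evidence carries a single protein accession. The overloads documented here also take runs and ConsensusMaps into account. Evidences whose accession is not among the available proteins are erased. When remove_peptides_without_reference is set, hits left without any evidence are erased as well.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified types (not PeptideEvidence / PeptideHit).
struct EvidenceSketch { std::string protein_accession; };
struct PeptideHitSketch { std::vector<EvidenceSketch> evidences; };

void updateProteinReferencesSketch(std::vector<PeptideHitSketch>& hits,
                                   const std::set<std::string>& protein_accessions,
                                   bool remove_peptides_without_reference = false)
{
  // Drop evidences that point to proteins which are no longer present.
  for (PeptideHitSketch& hit : hits)
  {
    auto& ev = hit.evidences;
    ev.erase(std::remove_if(ev.begin(), ev.end(),
                            [&](const EvidenceSketch& e)
                            { return protein_accessions.count(e.protein_accession) == 0; }),
             ev.end());
  }
  // Optionally drop hits that have no protein reference left.
  if (remove_peptides_without_reference)
  {
    hits.erase(std::remove_if(hits.begin(), hits.end(),
                              [](const PeptideHitSketch& h) { return h.evidences.empty(); }),
               hits.end());
  }
}
```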