OpenMS
Contaminants Class Reference

QualityControl metric: flag PSMs whose peptide sequences match a user-supplied contaminants FASTA (e.g. cRAP) after digestion.

#include <OpenMS/QC/Contaminants.h>


Classes

struct  ContaminantsSummary
 Result bundle returned per compute call.
 

Public Member Functions

 Contaminants ()=default
 Default constructor.
 
virtual ~Contaminants ()=default
 Destructor.
 
void compute (FeatureMap &features, const std::vector< FASTAFile::FASTAEntry > &contaminants)
 Annotate the PSMs of features with "is_contaminant" and append a summary to getResults.
 
const std::string & getName () const override
 Name of this QC metric ("Contaminants").
 
const std::vector< Contaminants::ContaminantsSummary > & getResults ()
 Per-call summaries appended by compute, in call order.
 
Status requirements () const override
 Input-data requirements of compute.
 
Public Member Functions inherited from QCBase
bool isRunnable (const Status &s) const
 

Private Member Functions

void compare_ (const std::string &key, PeptideHit &pep_hit, Int64 &total, Int64 &cont, double &sum_total, double &sum_cont, double intensity)
 Increment the contaminant counters and annotate one hit with "is_contaminant".
 

Private Attributes

const std::string name_ = "Contaminants"
 Metric name returned by getName.
 
std::vector< Contaminants::ContaminantsSummary > results_
 Per-call summaries; compute appends one entry per invocation.
 
std::unordered_set< std::string > digested_db_
 Cached digested contaminants database, filled on the first compute call and reused thereafter.
 

Additional Inherited Members

Public Types inherited from QCBase
enum class  Requires : UInt64 {
  NOTHING , RAWMZML , POSTFDRFEAT , PREFDRFEAT ,
  CONTAMINANTS , TRAFOALIGN , ID , SIZE_OF_REQUIRES
}
 Enum to encode a file type as a bit.
 
enum class  ToleranceUnit { AUTO , PPM , DA , SIZE_OF_TOLERANCEUNIT }
 
using Status = FlagSet< Requires >
 
Static Public Member Functions inherited from QCBase
static bool isLabeledExperiment (const ConsensusMap &cm)
 Check whether the IsobaricAnalyzer TOPP tool was used to create this ConsensusMap.
 
template<typename MAP >
static bool hasPepID (const MAP &fmap)
 Does the container have a PeptideIdentification, either in its members or as an unassigned peptide identification?
 
Static Public Attributes inherited from QCBase
static const std::string names_of_requires []
 Strings corresponding to enum Requires.
 
static const std::string names_of_toleranceUnit []
 Strings corresponding to enum ToleranceUnit.
 

Detailed Description

QualityControl metric: flag PSMs whose peptide sequences match a user-supplied contaminants FASTA (e.g. cRAP) after digestion.

Each call to compute marks the first PeptideHit of every PeptideIdentification in the supplied FeatureMap (both feature-attached and unassigned) with an "is_contaminant" meta value: 1 if its unmodified sequence matches an entry in the digested contaminants database, 0 otherwise. It then appends a ContaminantsSummary to the internal results list returned by getResults.

Note
The digested contaminants database is cached on the first compute call; subsequent calls reuse the same cache and ignore the contaminants argument's actual contents. The enzyme and missed-cleavage settings used for the initial digestion are also frozen at that point (taken from FeatureMap::getProteinIdentifications()[0].getSearchParameters() of the FeatureMap supplied on the first call).

Class Documentation

◆ OpenMS::Contaminants::ContaminantsSummary

struct OpenMS::Contaminants::ContaminantsSummary

Result bundle returned per compute call.

All four ratios are unitless and lie in [0, 1] for non-empty denominators; when the corresponding denominator is 0 the ratio evaluates to NaN.

Class Members
double  all_contaminants_ratio
 #contaminants in all PSMs / #all PSMs.
 
double  assigned_contaminants_intensity_ratio
 Sum of feature intensities of contaminant feature-attached PSMs / sum of feature intensities of all feature-attached PSMs.
 
double  assigned_contaminants_ratio
 #contaminants in feature-attached PSMs / #feature-attached PSMs.
 
pair< Int64, Int64 >  empty_features
 (Number of features without a peptide hit, total number of features).
 
double  unassigned_contaminants_ratio
 #contaminants in unassigned PSMs / #unassigned PSMs.
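
The NaN behaviour described for the ratios can be illustrated with a minimal, standalone sketch. It is not taken from the OpenMS sources: contaminantRatio and ratioOr are hypothetical helpers, and the sketch assumes IEEE 754 doubles, where 0.0 / 0.0 evaluates to NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of OpenMS): contaminant count over total count.
// Assumes IEEE 754 doubles, so an empty denominator (0.0 / 0.0) yields NaN.
inline double contaminantRatio(std::int64_t contaminants, std::int64_t total)
{
  assert(contaminants >= 0 && contaminants <= total);
  return static_cast<double>(contaminants) / static_cast<double>(total);
}

// Hypothetical consumer-side guard: substitute a fallback value for NaN
// before the ratio is written to a report or compared against a threshold.
inline double ratioOr(double ratio, double fallback)
{
  return std::isnan(ratio) ? fallback : ratio;
}
```

Code that reads a ContaminantsSummary may want a similar std::isnan check, because comparisons such as ratio > 0.1 are always false for NaN.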

Constructor & Destructor Documentation

◆ Contaminants()

Contaminants ( )
default

Default constructor.

◆ ~Contaminants()

virtual ~Contaminants ( )
virtual default

Destructor.

Member Function Documentation

◆ compare_()

void compare_ ( const std::string &  key,
PeptideHit &  pep_hit,
Int64 &  total,
Int64 &  cont,
double &  sum_total,
double &  sum_cont,
double  intensity 
)
private

Increment the contaminant counters and annotate one hit with "is_contaminant".

Looks key up in digested_db_ and updates the counters and pep_hit accordingly.

Parameters
[in] key  Unmodified peptide sequence to look up.
[in,out] pep_hit  Hit to annotate with "is_contaminant" set to 0 (not found) or 1 (found).
[in,out] total  Total-PSMs counter; incremented by 1.
[in,out] cont  Contaminant-PSMs counter; incremented only on a hit.
[in,out] sum_total  Total-intensity accumulator; incremented by intensity.
[in,out] sum_cont  Contaminant-intensity accumulator; incremented by intensity only on a hit.
[in] intensity  Intensity associated with this PSM.
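
The counter updates listed in the parameter table can be modelled in a few lines. The following is a sketch under stated assumptions, not the OpenMS implementation: HitModel is a hypothetical stand-in for PeptideHit that stores only the "is_contaminant" value, and compareModel takes the digested database as an explicit argument instead of reading the digested_db_ member.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for OpenMS::PeptideHit; only the annotation is modelled.
struct HitModel
{
  int is_contaminant = -1; // -1 means "not annotated yet"
};

// Model of the documented behaviour of compare_ (illustrative only).
inline void compareModel(const std::unordered_set<std::string>& digested_db,
                         const std::string& key, HitModel& hit,
                         std::int64_t& total, std::int64_t& cont,
                         double& sum_total, double& sum_cont, double intensity)
{
  ++total;                // every PSM counts towards the total
  sum_total += intensity; // and contributes its intensity to the total sum

  const bool found = digested_db.count(key) > 0;
  if (found)
  {
    ++cont;                // contaminant counter grows only on a match
    sum_cont += intensity; // as does the contaminant intensity sum
  }
  hit.is_contaminant = found ? 1 : 0;
}
```

In this model the lookup is an average constant-time hash-set query, which is consistent with digested_db_ being declared as a std::unordered_set.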

◆ compute()

void compute ( FeatureMap &  features,
const std::vector< FASTAFile::FASTAEntry > &  contaminants 
)

Annotate the PSMs of features with "is_contaminant" and append a summary to getResults.

On the first call (when the internal digested-DB cache is empty), the FASTA entries in contaminants are digested using the enzyme and missed-cleavage count from features.getProteinIdentifications()[0].getSearchParameters() and stored in a hash set. Every subsequent call to compute reuses that cache without consulting contaminants again.

The first PeptideHit of every PeptideIdentification in features (both attached to features and in getUnassignedPeptideIdentifications()) is annotated with "is_contaminant" set to 0 or 1. The aggregated ratios and the empty-feature counters are then pushed onto the internal results list (see getResults).

Parameters
[in,out] features  Source of the PSMs (annotated in place) and, on the first call, of the digestion enzyme used to build the cache.
[in] contaminants  Contaminants FASTA. Only consulted on the first call; subsequent calls ignore it.
Exceptions
Exception::MissingInformation  when contaminants is empty.
Exception::MissingInformation  when the FeatureMap has no protein identification (only on the first call).
Exception::MissingInformation  when the configured digestion enzyme is "unknown_enzyme" (only on the first call).
Note
When features is empty, a warning is logged and the method still runs (and may produce NaN ratios).
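
The build-once caching described for compute can be sketched with a small standalone class. This is an illustrative model, not OpenMS code: CachedContaminantDb and its members are hypothetical names, digestion is reduced to storing each sequence unchanged, std::invalid_argument stands in for Exception::MissingInformation, and the empty-input check is assumed to run on every call because the exception list does not restrict it to the first one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a database that is built on first use and then frozen.
class CachedContaminantDb
{
public:
  // Returns true if this call built the cache, false if an existing cache was reused.
  bool ensureBuilt(const std::vector<std::string>& contaminants)
  {
    if (contaminants.empty())
    {
      throw std::invalid_argument("contaminants must not be empty");
    }
    if (!db_.empty())
    {
      return false; // cache already filled: the argument's contents are ignored
    }
    // Placeholder for digestion: each input sequence is stored as-is.
    db_.insert(contaminants.begin(), contaminants.end());
    return true;
  }

  bool contains(const std::string& sequence) const
  {
    return db_.count(sequence) > 0;
  }

private:
  std::unordered_set<std::string> db_;
};
```

Under this reading, evaluating a second contaminants database or a different enzyme setting calls for a fresh Contaminants object, since the cache lives in the digested_db_ member of each instance.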

◆ getName()

const std::string & getName ( ) const
override virtual

Name of this QC metric ("Contaminants").

Implements QCBase.

◆ getResults()

const std::vector< Contaminants::ContaminantsSummary > & getResults ( )

Per-call summaries appended by compute, in call order.

◆ requirements()

Status requirements ( ) const
override virtual

Input-data requirements of compute.

Returns
Status flags with POSTFDRFEAT and CONTAMINANTS set.

Implements QCBase.
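
How a set of requirement flags might be checked against the available inputs can be sketched with plain bit masks. This is a simplified model and not the FlagSet implementation: the Req enum, its numbering, and the helpers bit and covers are hypothetical, and the assumption is that a metric is runnable when every required flag is present in the supplied set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the "one bit per input type" idea behind Requires.
enum class Req : std::uint64_t
{
  RAWMZML = 0,
  POSTFDRFEAT = 1,
  PREFDRFEAT = 2,
  CONTAMINANTS = 3
};

// Map an enumerator to its bit in a 64-bit mask.
constexpr std::uint64_t bit(Req r)
{
  return std::uint64_t{1} << static_cast<std::uint64_t>(r);
}

// True when every required bit is also set in the supplied mask.
constexpr bool covers(std::uint64_t supplied, std::uint64_t required)
{
  return (supplied & required) == required;
}
```

In this model, the flags documented for requirements correspond to bit(Req::POSTFDRFEAT) | bit(Req::CONTAMINANTS), and extra inputs in the supplied mask do not prevent the check from passing.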

Member Data Documentation

◆ digested_db_

std::unordered_set<std::string> digested_db_
private

Cached digested contaminants database, filled on the first compute call and reused thereafter.

◆ name_

const std::string name_ = "Contaminants"
private

Metric name returned by getName.

◆ results_

std::vector<Contaminants::ContaminantsSummary> results_
private

Per-call summaries; compute appends one entry per invocation.