OpenMS
QualityControl metric: flag PSMs whose peptide sequences match a user-supplied contaminants FASTA (e.g. cRAP) after digestion.
#include <OpenMS/QC/Contaminants.h>
Classes | |
| struct | ContaminantsSummary |
| Result bundle returned per compute call. | |
Public Member Functions | |
| Contaminants ()=default | |
| Default constructor. | |
| virtual | ~Contaminants ()=default |
| Destructor. | |
| void | compute (FeatureMap &features, const std::vector< FASTAFile::FASTAEntry > &contaminants) |
Annotate the PSMs of features with "is_contaminant" and append a summary to getResults. | |
| const std::string & | getName () const override |
Name of this QC metric ("Contaminants"). | |
| const std::vector< Contaminants::ContaminantsSummary > & | getResults () |
| Per-call summaries appended by compute, in call order. | |
| Status | requirements () const override |
| Input-data requirements of compute. | |
Public Member Functions inherited from QCBase | |
| bool | isRunnable (const Status &s) const |
Private Member Functions | |
| void | compare_ (const std::string &key, PeptideHit &pep_hit, Int64 &total, Int64 &cont, double &sum_total, double &sum_cont, double intensity) |
Increment the contaminant counters and annotate one hit with "is_contaminant". | |
Private Attributes | |
| const std::string | name_ = "Contaminants" |
| Metric name returned by getName. | |
| std::vector< Contaminants::ContaminantsSummary > | results_ |
| Per-call summaries; compute appends one entry per invocation. | |
| std::unordered_set< std::string > | digested_db_ |
| Cached digested contaminants database, filled on the first compute call and reused thereafter. | |
Additional Inherited Members | |
Public Types inherited from QCBase | |
| enum class | Requires : UInt64 { NOTHING , RAWMZML , POSTFDRFEAT , PREFDRFEAT , CONTAMINANTS , TRAFOALIGN , ID , SIZE_OF_REQUIRES } |
| Enum to encode a file type as a bit. | |
| enum class | ToleranceUnit { AUTO , PPM , DA , SIZE_OF_TOLERANCEUNIT } |
| using | Status = FlagSet< Requires > |
Static Public Member Functions inherited from QCBase | |
| static bool | isLabeledExperiment (const ConsensusMap &cm) |
| Checks whether the IsobaricAnalyzer TOPP tool was used to create this ConsensusMap. | |
| template<typename MAP > | |
| static bool | hasPepID (const MAP &fmap) |
| Checks whether the container has a PeptideIdentification, either attached to its members or among its unassigned peptide identifications. | |
Static Public Attributes inherited from QCBase | |
| static const std::string | names_of_requires [] |
| Strings corresponding to enum Requires. | |
| static const std::string | names_of_toleranceUnit [] |
| Strings corresponding to enum ToleranceUnit. | |
QualityControl metric: flag PSMs whose peptide sequences match a user-supplied contaminants FASTA (e.g. cRAP) after digestion.
Each call to compute marks the first PeptideHit of every PeptideIdentification in the supplied FeatureMap (both feature-attached and unassigned) with an "is_contaminant" meta value (1 if its unmodified sequence matches an entry in the digested contaminants database, 0 otherwise) and appends a ContaminantsSummary to the internal results list returned by getResults.
Note: the digested contaminants database is built on the first call to compute and reused by every later call, regardless of the contaminants argument's actual contents. The enzyme and missed-cleavage settings used for the initial digestion are also frozen at that point (taken from FeatureMap::getProteinIdentifications()[0].getSearchParameters() of the FeatureMap supplied on the first call).
struct OpenMS::Contaminants::ContaminantsSummary
Result bundle returned per compute call.
All four ratios are unitless and lie in [0, 1] for non-empty denominators; when the corresponding denominator is 0 the ratio evaluates to NaN.
| Class Members | ||
|---|---|---|
| double | all_contaminants_ratio | #contaminants in all PSMs / #all PSMs. |
| double | assigned_contaminants_intensity_ratio | Sum of feature intensities of contaminant feature-attached PSMs / sum of feature intensities of all feature-attached PSMs. |
| double | assigned_contaminants_ratio | #contaminants in feature-attached PSMs / #feature-attached PSMs. |
| pair< Int64, Int64 > | empty_features | (Number of features without a peptide hit, total number of features). |
| double | unassigned_contaminants_ratio | #contaminants in unassigned PSMs / #unassigned PSMs. |
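Because an empty denominator yields NaN rather than 0, code that consumes these ratios may want to test for NaN before aggregating or plotting them. The following is a minimal sketch of that arithmetic on IEEE-754 platforms; `ratioOf` and `ratioOr` are hypothetical helpers written for illustration and are not part of OpenMS.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not an OpenMS function): ratio of two PSM counters.
// With IEEE-754 doubles, 0.0 / 0.0 evaluates to NaN, which mirrors the
// documented behaviour for an empty denominator.
double ratioOf(std::int64_t contaminants, std::int64_t total)
{
  assert(contaminants >= 0 && contaminants <= total); // a subset cannot exceed the whole
  return static_cast<double>(contaminants) / static_cast<double>(total);
}

// Hypothetical consumer-side guard: substitute a fallback when the ratio is NaN.
double ratioOr(double ratio, double fallback)
{
  return std::isnan(ratio) ? fallback : ratio;
}
```

A caller could, for instance, write `ratioOr(summary.unassigned_contaminants_ratio, 0.0)` when a map has no unassigned PSMs, provided that reporting 0 for "no data" is acceptable in the downstream report.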
|
default |
Default constructor.
|
virtual default |
Destructor.
|
private |
Increment the contaminant counters and annotate one hit with "is_contaminant".
Looks key up in digested_db_ and updates the counters and pep_hit accordingly.
| [in] | key | Unmodified peptide sequence to look up. |
| [in,out] | pep_hit | Hit to annotate with "is_contaminant" set to 0 (not found) or 1 (found). |
| [in,out] | total | Total-PSMs counter; incremented by 1. |
| [in,out] | cont | Contaminant-PSMs counter; incremented only on a hit. |
| [in,out] | sum_total | Total-intensity accumulator; incremented by intensity. |
| [in,out] | sum_cont | Contaminant-intensity accumulator; incremented by intensity only on a hit. |
| [in] | intensity | Intensity associated with this PSM. |
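The bookkeeping described by the parameter list can be sketched as follows. This is a self-contained approximation and not the OpenMS source: `HitSketch` is a hypothetical stand-in for PeptideHit, and `compareSketch` takes the database as an explicit argument, whereas the real method reads the digested_db_ member.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for OpenMS::PeptideHit; only the flag matters here.
struct HitSketch
{
  int is_contaminant = -1; // -1 means "not annotated yet"
};

// Sketch of the documented behaviour: always count the PSM and its intensity,
// and count it as a contaminant only when the key is found in the database.
void compareSketch(const std::unordered_set<std::string>& digested_db,
                   const std::string& key, HitSketch& hit,
                   std::int64_t& total, std::int64_t& cont,
                   double& sum_total, double& sum_cont, double intensity)
{
  ++total;
  sum_total += intensity;
  if (digested_db.count(key) != 0)
  {
    ++cont;
    sum_cont += intensity;
    hit.is_contaminant = 1; // found in the digested contaminants database
  }
  else
  {
    hit.is_contaminant = 0; // not a contaminant
  }
}
```

Passing the counters by reference lets one loop accumulate separate totals for feature-attached and unassigned PSMs simply by supplying different counter variables.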
| void compute | ( | FeatureMap & | features, |
| const std::vector< FASTAFile::FASTAEntry > & | contaminants | ||
| ) |
Annotate the PSMs of features with "is_contaminant" and append a summary to getResults.
On the first call (when the internal digested-DB cache is empty), the FASTA entries in contaminants are digested using the enzyme and missed-cleavage count from features.getProteinIdentifications()[0].getSearchParameters() and stored in a hash set. Every subsequent call to compute reuses that cache without consulting contaminants again.
The first PeptideHit of every PeptideIdentification in features (both attached to features and in getUnassignedPeptideIdentifications()) is annotated with "is_contaminant" set to 0 or 1. The aggregated ratios and the empty-feature counters are then pushed onto the internal results list (see getResults).
| [in,out] | features | Source of the PSMs (annotated in place) and – on the first call – of the digestion enzyme used to build the cache. |
| [in] | contaminants | Contaminants FASTA. Only consulted on the first call; subsequent calls ignore it. |
| Exception::MissingInformation | when contaminants is empty. |
| Exception::MissingInformation | when the FeatureMap has no protein identification (only on the first call). |
| Exception::MissingInformation | when the configured digestion enzyme is "unknown_enzyme" (only on the first call). |
If features is empty, a warning is logged and the method still runs (and may produce NaN ratios).
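The fill-once caching described for compute has a consequence that is easy to overlook: a different contaminants list passed on a later call has no effect. The toy model below illustrates this under simplifying assumptions. `CachedDbSketch` is hypothetical, it is not the OpenMS class, and it takes already-digested peptide strings instead of performing an enzymatic digestion.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy model of the caching behaviour; not the OpenMS class.
class CachedDbSketch
{
public:
  // Returns true if `peptide` is in the cached database.
  // `contaminant_peptides` stands in for the digested FASTA and is read
  // only while the cache is still empty, i.e. on the first call.
  bool isContaminant(const std::string& peptide,
                     const std::vector<std::string>& contaminant_peptides)
  {
    if (db_.empty())
    {
      db_.insert(contaminant_peptides.begin(), contaminant_peptides.end());
    }
    return db_.count(peptide) != 0;
  }

private:
  std::unordered_set<std::string> db_; // filled once, then reused
};
```

In this model, a second call that supplies a different list still answers from the first list. By analogy, and since this page documents no way to reset the cache, using a fresh Contaminants instance for each contaminants FASTA or enzyme setting appears to be the safe choice.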
|
override virtual |
Name of this QC metric ("Contaminants").
Implements QCBase.
| const std::vector< Contaminants::ContaminantsSummary > & getResults | ( | ) |
Per-call summaries appended by compute, in call order.
|
override virtual |
|
private |
Cached digested contaminants database, filled on the first compute call and reused thereafter.
|
private |
Metric name returned by getName.
|
private |
Per-call summaries; compute appends one entry per invocation.