OpenMS
FalseDiscoveryRate Class Reference

Calculates false discovery rates (FDR) from identifications.

#include <OpenMS/ANALYSIS/ID/FalseDiscoveryRate.h>


Classes

class  DecoyStringHelper
 Finds decoy strings in ProteinIdentification runs.
 

Public Member Functions

 FalseDiscoveryRate ()
 Default constructor.

void apply (std::vector< PeptideIdentification > &fwd_ids, std::vector< PeptideIdentification > &rev_ids) const
 Calculates the FDR of two runs, a forward run and a decoy run, on peptide level.

void apply (std::vector< PeptideIdentification > &id, bool annotate_peptide_fdr=false) const
 Calculates the FDR of one run from a concatenated sequence DB search.

void apply (std::vector< ProteinIdentification > &fwd_ids, std::vector< ProteinIdentification > &rev_ids) const
 Calculates the FDR of two runs, a forward run and a decoy run, on protein level.

void apply (std::vector< ProteinIdentification > &ids) const
 Calculates the FDR of one run from a concatenated sequence DB search.

void applyEstimated (std::vector< ProteinIdentification > &ids) const
 Calculates the FDR based on PEPs or PPs (if present) and modifies the IDs in place.
 
double applyEvaluateProteinIDs (const std::vector< ProteinIdentification > &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
 Calculates a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

double applyEvaluateProteinIDs (const ProteinIdentification &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
 Calculates a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

double applyEvaluateProteinIDs (ScoreToTgtDecLabelPairs &score_to_tgt_dec_fraction_pairs, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
 Calculates a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).
 
void applyBasic (const std::vector< ProteinIdentification > &run_info, std::vector< PeptideIdentification > &ids)
 Simpler reimplementation of the apply function above for PSMs, with charge and identifier info taken from run_info.

void applyBasic (std::vector< PeptideIdentification > &ids, bool higher_score_better, int charge=0, String identifier="", bool only_best_per_pep=false)
 Simpler reimplementation of the apply function above for PSMs or peptides.

void applyBasicPeptideLevel (std::vector< PeptideIdentification > &ids)

void applyBasicPeptideLevel (ConsensusMap &ids, bool use_unassigned_peptides=true)

void applyBasic (ConsensusMap &cmap, bool use_unassigned_peptides=true)
 Simpler reimplementation of the apply function above for peptides in ConsensusMaps.

void applyBasic (ProteinIdentification &id, bool groups_too=true)
 Simpler reimplementation of the apply function above for proteins.

void applyPickedProteinFDR (ProteinIdentification &id, String decoy_string="", bool prefix=true, bool groups_too=true)
 Applies a picked protein FDR. Behaves like a normal target-decoy FDR in which only the score of the best protein per target-decoy pair is used. A pair is determined by checking accession equality after removing the decoy string. If decoy_string is empty, the function tries to guess it. If you set decoy_string, also set prefix to say whether the string is a prefix (true) or a suffix (false). groups_too decides whether an (indistinguishable) group-level FDR is calculated as well; a group score is used if not ALL proteins in the group were picked already. Targets are preferred.
 
double rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff) const
 
double rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff, const String &identifier) const
 
double rocN (const ConsensusMap &ids, Size fp_cutoff, bool include_unassigned_peptides=false) const
 
double rocN (const ConsensusMap &ids, Size fp_cutoff, const String &identifier, bool include_unassigned_peptides=false) const
 
double diffEstimatedEmpirical (const ScoreToTgtDecLabelPairs &scores_labels, double pepCutoff=1.0) const
 Calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results.
 
double rocN (const ScoreToTgtDecLabelPairs &scores_labels, Size fpCutoff=50) const
 
IdentificationData::ScoreTypeRef applyToObservationMatches (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref) const
 Calculates the FDR on the level of observation matches (e.g. peptide-spectrum matches) for "general" identification data.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Private Member Functions

 FalseDiscoveryRate (const FalseDiscoveryRate &)
 Not implemented.

FalseDiscoveryRate & operator= (const FalseDiscoveryRate &)
 Not implemented.

void calculateFDRs_ (std::map< double, double > &score_to_fdr, std::vector< double > &target_scores, std::vector< double > &decoy_scores, bool q_value, bool higher_score_better) const
 Calculates the FDR, given two vectors of scores.

void handleObservationMatch_ (IdentificationData::ObservationMatchRef match_ref, IdentificationData::ScoreTypeRef score_ref, std::vector< double > &target_scores, std::vector< double > &decoy_scores, std::map< IdentificationData::IdentifiedMolecule, bool > &molecule_to_decoy, std::map< IdentificationData::ObservationMatchRef, double > &match_to_score) const
 Helper function for applyToObservationMatches().
 
void calculateEstimatedQVal_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool higher_score_better) const
 
void calculateFDRBasic_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool qvalue, bool higher_score_better) const
 
double trapezoidal_area_xEqy (double exp1, double exp2, double act1, double act2) const
 
double trapezoidal_area (double x1, double x2, double y1, double y2) const
 Calculates the trapezoidal area for a trapezoid with a flat horizontal base, e.g. for an AUC.
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values.

- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.

void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.

- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no checking of the parameters is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 

Detailed Description

Calculates false discovery rates (FDR) from identifications.

Either two runs of forward and decoy database identification or one run containing both (with annotations) can be used to annotate each of the peptide hits with an FDR or q-value.

q-values are essentially adjusted p-values: they also range from 0 to 1, and lower values are preferable. In a list of hits ordered by q-value, a q-value of x means that x*100 percent of the hits with a q-value <= x are expected to be false positives.

Only simple target-decoy FDRs are supported, with a formula depending on the "conservative" parameter (D and T denote the numbers of decoy and target hits):

  • true: (D+1)/T.
  • false: (D+1)/(T+D) [for comparison with the protein-level FDR used by other tools, e.g. Fido].

For protein groups, a group is considered a target when it contains at least one target protein. Group-level FDRs assume the same score type as on protein level.

A peptide hit is also considered a target if it maps to both a target and a decoy protein, i.e. if its "target_decoy" meta value (e.g. annotated by PeptideIndexer) is "target+decoy".
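
The two formulas named above can be written out as a minimal sketch. This is not OpenMS code: the function names are hypothetical, and D and T are simply passed in as the counts of decoy and target hits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, not part of OpenMS: the two target-decoy FDR
// estimates for `decoys` (D) decoy hits and `targets` (T) target hits.

// (D+1)/T
double fdrDecoysOverTargets(std::size_t decoys, std::size_t targets)
{
  assert(targets > 0); // the ratio is undefined without any target hit
  return (static_cast<double>(decoys) + 1.0) / static_cast<double>(targets);
}

// (D+1)/(T+D)
double fdrDecoysOverAll(std::size_t decoys, std::size_t targets)
{
  assert(targets + decoys > 0); // at least one hit is required
  return (static_cast<double>(decoys) + 1.0) / static_cast<double>(targets + decoys);
}
```

With D = 4 and T = 100, the first estimate is 5/100 = 0.05 and the second is 5/104. Because D is never negative, (D+1)/T is always at least as large as (D+1)/(T+D).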

Note
The parameter add_decoy_proteins currently does not affect groups.

Parameters of this class are:

Name | Type | Default | Restrictions | Description
no_qvalues | string | false | true, false | If 'true' strict FDRs will be calculated instead of q-values (the default)
use_all_hits | string | false | true, false | If 'true' not only the first hit, but all are used (peptides only)
split_charge_variants | string | false | true, false | If 'true' charge variants are treated separately (for peptides of combined target/decoy searches only).
treat_runs_separately | string | false | true, false | If 'true' different search runs are treated separately (for peptides of combined target/decoy searches only).
add_decoy_peptides | string | false | true, false | If 'true' decoy peptides will be written to output file, too. The q-value is set to the closest target score.
add_decoy_proteins | string | false | true, false | If 'true' decoy proteins will be written to output file, too. The q-value is set to the closest target score.
conservative | string | true | true, false | If 'true' (D+1)/T instead of (D+1)/(T+D) is used as a formula.


Constructor & Destructor Documentation

◆ FalseDiscoveryRate() [1/2]

Default constructor.

◆ FalseDiscoveryRate() [2/2]

FalseDiscoveryRate ( const FalseDiscoveryRate & )
private

Not implemented.

Member Function Documentation

◆ apply() [1/4]

void apply ( std::vector< PeptideIdentification > &  fwd_ids,
std::vector< PeptideIdentification > &  rev_ids 
) const

Calculates the FDR of two runs, a forward run and a decoy run on peptide level.

Parameters
fwd_ids  forward peptide identifications
rev_ids  reverse peptide identifications

◆ apply() [2/4]

void apply ( std::vector< PeptideIdentification > &  id,
bool  annotate_peptide_fdr = false 
) const

Calculates the FDR of one run from a concatenated sequence DB search.

Parameters
id                    peptide identifications, containing target and decoy hits
annotate_peptide_fdr  adds the peptide q-value or peptide FDR meta value to each PSM. The calculation uses the best PSM per peptide.

◆ apply() [3/4]

void apply ( std::vector< ProteinIdentification > &  fwd_ids,
std::vector< ProteinIdentification > &  rev_ids 
) const

Calculates the FDR of two runs, a forward run and decoy run on protein level.

Parameters
fwd_ids  forward protein identifications
rev_ids  reverse protein identifications

◆ apply() [4/4]

void apply ( std::vector< ProteinIdentification > &  ids) const

Calculates the FDR of one run from a concatenated sequence DB search.

Parameters
ids  protein identifications, containing target and decoy hits

◆ applyBasic() [1/4]

void applyBasic ( ConsensusMap &  cmap,
bool  use_unassigned_peptides = true 
)

Simpler reimplementation of the apply function above for peptides in ConsensusMaps.

◆ applyBasic() [2/4]

void applyBasic ( const std::vector< ProteinIdentification > &  run_info,
std::vector< PeptideIdentification > &  ids 
)

Simpler reimplementation of the apply function above for PSMs, with charge and identifier info taken from run_info.

◆ applyBasic() [3/4]

void applyBasic ( ProteinIdentification &  id,
bool  groups_too = true 
)

Simpler reimplementation of the apply function above for proteins.

◆ applyBasic() [4/4]

void applyBasic ( std::vector< PeptideIdentification > &  ids,
bool  higher_score_better,
int  charge = 0,
String  identifier = "",
bool  only_best_per_pep = false 
)

Simpler reimplementation of the apply function above for PSMs or peptides.

◆ applyBasicPeptideLevel() [1/2]

void applyBasicPeptideLevel ( ConsensusMap &  ids,
bool  use_unassigned_peptides = true 
)

Like applyBasic with only_best_per_pep, but assigns a score to EVERY PSM sharing the peptide sequence with the best representative. Useful if all hits need to have a peptide score (e.g., for an mzTab report). No support for specific charges, runs etc. yet.

◆ applyBasicPeptideLevel() [2/2]

void applyBasicPeptideLevel ( std::vector< PeptideIdentification > &  ids)

Like applyBasic with only_best_per_pep, but assigns a score to EVERY PSM sharing the peptide sequence with the best representative. Useful if all hits need to have a peptide score (e.g., for an mzTab report). No support for specific charges, runs etc. yet.
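
The "best representative, then every PSM" idea can be sketched as follows. This is an illustration only, not the OpenMS implementation: the `Psm` struct and function name are hypothetical, a higher score is assumed to be better, and the peptide-level FDR calculation that would run on the best representatives is left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical PSM record, not an OpenMS type.
struct Psm
{
  std::string sequence;  // peptide sequence
  double score;          // PSM-level score, higher is better (assumption)
  double peptide_score;  // filled in by propagateBestPerPeptide()
};

// Finds the best PSM score per peptide sequence, then writes that value
// to EVERY PSM sharing the sequence.
void propagateBestPerPeptide(std::vector<Psm>& psms)
{
  std::map<std::string, double> best;
  for (const Psm& p : psms)
  {
    auto it = best.find(p.sequence);
    if (it == best.end())
    {
      best.emplace(p.sequence, p.score);
    }
    else if (p.score > it->second)
    {
      it->second = p.score;
    }
  }
  for (Psm& p : psms)
  {
    p.peptide_score = best.at(p.sequence);
  }
}
```

In a real pipeline, the value written back would be the peptide-level FDR or q-value derived from the best representative rather than its raw score.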

◆ applyEstimated()

void applyEstimated ( std::vector< ProteinIdentification > &  ids) const

Calculates the FDR based on PEPs or PPs (if present) and modifies the IDs in place.

Parameters
ids  protein identifications containing PEP scores, not necessarily annotated with target/decoy information.

◆ applyEvaluateProteinIDs() [1/3]

double applyEvaluateProteinIDs ( const ProteinIdentification &  ids,
double  pepCutoff = 1.0,
UInt  fpCutoff = 50,
double  diffWeight = 0.2 
) const

Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

Parameters
ids         protein identifications, containing PEP scores annotated with target/decoy.
pepCutoff   PEP up to which the differences between the two FDRs are calculated
fpCutoff    number of false positives up to which the target-decoy AUC is evaluated
diffWeight  weight of the difference. The ROC-N value gets 1 - this weight.

◆ applyEvaluateProteinIDs() [2/3]

double applyEvaluateProteinIDs ( const std::vector< ProteinIdentification > &  ids,
double  pepCutoff = 1.0,
UInt  fpCutoff = 50,
double  diffWeight = 0.2 
) const

Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

Parameters
ids         protein identifications, containing PEP scores annotated with target/decoy. Only the first run will be evaluated.
pepCutoff   PEP up to which the differences between the two FDRs are calculated
fpCutoff    number of false positives up to which the target-decoy AUC is evaluated
diffWeight  weight of the difference. The ROC-N value gets 1 - this weight.

◆ applyEvaluateProteinIDs() [3/3]

double applyEvaluateProteinIDs ( ScoreToTgtDecLabelPairs &  score_to_tgt_dec_fraction_pairs,
double  pepCutoff = 1.0,
UInt  fpCutoff = 50,
double  diffWeight = 0.2 
) const

Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

Parameters
score_to_tgt_dec_fraction_pairs  extracted scores of protein (group) identifications, containing PEP scores annotated with target/decoy fractions. Simple case: target=1, decoy=0.
pepCutoff                        PEP up to which the differences between the two FDRs are calculated
fpCutoff                         number of false positives up to which the target-decoy AUC is evaluated
diffWeight                       weight of the difference. The ROC-N value gets 1 - this weight.

◆ applyPickedProteinFDR()

void applyPickedProteinFDR ( ProteinIdentification &  id,
String  decoy_string = "",
bool  prefix = true,
bool  groups_too = true 
)

Applies a picked protein FDR. Behaves like a normal target-decoy FDR in which only the score of the best protein per target-decoy pair is used. A pair is determined by checking accession equality after removing the decoy string. If decoy_string is empty, the function tries to guess it. If you set decoy_string, also set prefix to say whether the string is a prefix (true) or a suffix (false). groups_too decides whether an (indistinguishable) group-level FDR is calculated as well; a group score is used if not ALL proteins in the group were picked already. Targets are preferred.
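
The pairing step described above can be sketched roughly as follows. This is not the OpenMS implementation: the types and function names are hypothetical, a higher score is assumed to be better, an explicit decoy string is required (no guessing), and group handling is omitted.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record, not an OpenMS type.
struct ProteinScore
{
  std::string accession;
  double score; // higher is better (assumption)
};

// Result per target-decoy pair: the winning score and whether a target won.
struct Picked
{
  double score;
  bool is_target;
};

// Removes decoy_string from the start (prefix == true) or the end
// (prefix == false) of acc. Returns true if acc carried the decoy string.
bool stripDecoy(std::string& acc, const std::string& decoy_string, bool prefix)
{
  if (decoy_string.empty() || acc.size() < decoy_string.size()) return false;
  if (prefix)
  {
    if (acc.compare(0, decoy_string.size(), decoy_string) != 0) return false;
    acc.erase(0, decoy_string.size());
  }
  else
  {
    const std::size_t pos = acc.size() - decoy_string.size();
    if (acc.compare(pos, decoy_string.size(), decoy_string) != 0) return false;
    acc.erase(pos);
  }
  return true;
}

// Keeps only the better-scoring member of each target-decoy pair,
// keyed by the accession without the decoy string. Ties go to the target.
std::map<std::string, Picked> pickBestPerPair(const std::vector<ProteinScore>& proteins,
                                              const std::string& decoy_string,
                                              bool prefix)
{
  std::map<std::string, Picked> best;
  for (const ProteinScore& p : proteins)
  {
    std::string key = p.accession;
    const bool is_target = !stripDecoy(key, decoy_string, prefix);
    auto it = best.find(key);
    if (it == best.end())
    {
      best.emplace(key, Picked{p.score, is_target});
    }
    else if (p.score > it->second.score ||
             (p.score == it->second.score && is_target))
    {
      it->second = Picked{p.score, is_target};
    }
  }
  return best;
}
```

A regular target-decoy FDR would then be computed over the picked entries only, so each pair contributes either one target or one decoy.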

◆ applyToObservationMatches()

IdentificationData::ScoreTypeRef applyToObservationMatches ( IdentificationData &  id_data,
IdentificationData::ScoreTypeRef  score_ref 
) const

Calculate FDR on the level of observation matches (e.g. peptide-spectrum matches) for "general" identification data.

Parameters
id_data    Identification data
score_ref  Key of the score to use for FDR calculation
Returns
Key of the FDR score

Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().

◆ calculateEstimatedQVal_()

void calculateEstimatedQVal_ ( std::map< double, double > &  scores_to_FDR,
ScoreToTgtDecLabelPairs &  scores_labels,
bool  higher_score_better 
) const
private

Calculates an estimated FDR (based on P(E)Ps) given a vector of score-value pairs and fills the map scores_to_FDR for lookup.

◆ calculateFDRBasic_()

void calculateFDRBasic_ ( std::map< double, double > &  scores_to_FDR,
ScoreToTgtDecLabelPairs &  scores_labels,
bool  qvalue,
bool  higher_score_better 
) const
private

Calculates the FDR with a basic and faster algorithm. It goes through the sorted scores, counts the number of decoys and targets, and annotates the FDR for each score as it goes. Q-values are optionally annotated by calculating the cumulative minimum in reversed order afterwards. The difference from the other algorithm is not documented here.

Note
The formula used depends on the Param "conservative": true -> (D+1)/T, false (e.g. used in Fido) -> (D+1)/(T+D).
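
The procedure described above (sort, count, annotate, then take a reverse cumulative minimum) might look like the sketch below. It is a standalone illustration, not the OpenMS code: `ScoreLabel` and `basicFdr` are hypothetical names, a higher score is assumed to be better, only the (D+1)/T formula is shown, capping the estimate at 1 is an assumption, and tied scores get no special treatment.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a (score, is_target) pair; not the OpenMS type.
using ScoreLabel = std::pair<double, bool>;

// Returns one FDR (or q-value, if qvalue is true) per hit,
// ordered from best to worst score.
std::vector<double> basicFdr(std::vector<ScoreLabel> hits, bool qvalue)
{
  // Best score first (assumes higher score is better).
  std::sort(hits.begin(), hits.end(),
            [](const ScoreLabel& a, const ScoreLabel& b) { return a.first > b.first; });

  std::vector<double> fdr(hits.size());
  std::size_t targets = 0;
  std::size_t decoys = 0;
  for (std::size_t i = 0; i < hits.size(); ++i)
  {
    if (hits[i].second) ++targets; else ++decoys;
    // (D+1)/T, capped at 1; set to 1 while no target has been seen.
    fdr[i] = (targets == 0) ? 1.0 : std::min(1.0, (decoys + 1.0) / targets);
  }

  if (qvalue)
  {
    // Cumulative minimum from the worst score towards the best, so the
    // q-value never decreases as the score gets worse.
    for (std::size_t i = hits.size(); i-- > 1; )
    {
      fdr[i - 1] = std::min(fdr[i - 1], fdr[i]);
    }
  }
  return fdr;
}
```

For hits labelled target, target, decoy, target, target from best to worst score, the FDRs are 1, 0.5, 1, 2/3 and 0.5, and the q-values are 0.5 throughout.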

◆ calculateFDRs_()

void calculateFDRs_ ( std::map< double, double > &  score_to_fdr,
std::vector< double > &  target_scores,
std::vector< double > &  decoy_scores,
bool  q_value,
bool  higher_score_better 
) const
private

Calculates the FDR, given two vectors of scores.

◆ diffEstimatedEmpirical()

double diffEstimatedEmpirical ( const ScoreToTgtDecLabelPairs &  scores_labels,
double  pepCutoff = 1.0 
) const

Calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results.

◆ handleObservationMatch_()

void handleObservationMatch_ ( IdentificationData::ObservationMatchRef  match_ref,
IdentificationData::ScoreTypeRef  score_ref,
std::vector< double > &  target_scores,
std::vector< double > &  decoy_scores,
std::map< IdentificationData::IdentifiedMolecule, bool > &  molecule_to_decoy,
std::map< IdentificationData::ObservationMatchRef, double > &  match_to_score 
) const
private

Helper function for applyToObservationMatches()

◆ operator=()

FalseDiscoveryRate& operator= ( const FalseDiscoveryRate & )
private

Not implemented.

◆ rocN() [1/5]

double rocN ( const ConsensusMap &  ids,
Size  fp_cutoff,
bool  include_unassigned_peptides = false 
) const

Calculates the AUC up to the first fp_cutoff false-positive peptide IDs (takes all runs together). If fp_cutoff = 0, the full AUC is calculated.

◆ rocN() [2/5]

double rocN ( const ConsensusMap &  ids,
Size  fp_cutoff,
const String &  identifier,
bool  include_unassigned_peptides = false 
) const

Calculates the AUC up to the first fp_cutoff false-positive peptide IDs, restricted to IDs from the ID run given by identifier. If fp_cutoff = 0, the full AUC is calculated.

◆ rocN() [3/5]

double rocN ( const ScoreToTgtDecLabelPairs &  scores_labels,
Size  fpCutoff = 50 
) const

Calculates the AUC of the empirical FDR up to the first fpCutoff false positives on the fly. Does not store results. Use e.g. fpCutoff = scores_labels.size() for the complete AUC.
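
One way to picture an AUC "up to the first N false positives" is the sketch below, which sums trapezoids under the curve of true positives against false positives. It is an illustration under stated assumptions, not the OpenMS code: the names are hypothetical, a higher score is assumed to be better, decoys are counted as false positives, ties are not grouped, and the result is left unnormalised because the normalisation used by rocN is not stated here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a (score, is_target) pair; not the OpenMS type.
using ScoreLabel = std::pair<double, bool>;

// Area of a trapezoid standing on the x-axis between x1 and x2,
// with heights y1 and y2 at its two sides.
double trapezoidArea(double x1, double x2, double y1, double y2)
{
  return std::fabs(x2 - x1) * (y1 + y2) / 2.0;
}

// Unnormalised area under the (false positives, true positives) curve,
// stopping once fp_cutoff decoys have been counted.
double rocNArea(std::vector<ScoreLabel> hits, std::size_t fp_cutoff)
{
  std::sort(hits.begin(), hits.end(),
            [](const ScoreLabel& a, const ScoreLabel& b) { return a.first > b.first; });

  double area = 0.0;
  double tp = 0.0;
  double fp = 0.0;
  for (const ScoreLabel& h : hits)
  {
    const double prev_tp = tp;
    const double prev_fp = fp;
    if (h.second) tp += 1.0; else fp += 1.0;
    // A target moves the curve up (zero-width trapezoid),
    // a decoy moves it right by one unit.
    area += trapezoidArea(prev_fp, fp, prev_tp, tp);
    if (fp >= static_cast<double>(fp_cutoff)) break;
  }
  return area;
}
```

For hits labelled target, target, decoy, target, decoy, target from best to worst score, the full area is 2 + 3 = 5, and with fp_cutoff = 1 it is 2.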

◆ rocN() [4/5]

double rocN ( const std::vector< PeptideIdentification > &  ids,
Size  fp_cutoff 
) const

Calculates the AUC up to the first fp_cutoff false-positive peptide IDs (currently only takes all runs together). If fp_cutoff = 0, the full AUC is calculated.

◆ rocN() [5/5]

double rocN ( const std::vector< PeptideIdentification > &  ids,
Size  fp_cutoff,
const String &  identifier 
) const

Calculates the AUC up to the first fp_cutoff false-positive peptide IDs, restricted to IDs from the ID run given by identifier. If fp_cutoff = 0, the full AUC is calculated.

◆ trapezoidal_area()

double trapezoidal_area ( double  x1,
double  x2,
double  y1,
double  y2 
) const
private

Calculates the trapezoidal area for a trapezoid with a flat horizontal base, e.g. for an AUC.

◆ trapezoidal_area_xEqy()

double trapezoidal_area_xEqy ( double  exp1,
double  exp2,
double  act1,
double  act2 
) const
private

Calculates the error area around the x=y line between two consecutive values of expected and actual, i.e. it assumes exp2 > exp1.