OpenMS  2.4.0
PosteriorErrorProbabilityModel Class Reference

Implements a mixture model of the inverse gumbel and the gauss distribution or a gaussian mixture.

#include <OpenMS/MATH/STATISTICS/PosteriorErrorProbabilityModel.h>

Inheritance diagram for PosteriorErrorProbabilityModel:
DefaultParamHandler

Public Member Functions

 PosteriorErrorProbabilityModel ()
 default constructor

 ~PosteriorErrorProbabilityModel () override
 Destructor.

bool fit (std::vector< double > &search_engine_scores)
 fits the distributions to the data points (search_engine_scores). The estimated parameters of the distributions are saved in member variables; computeProbability() can be used afterwards.

bool fit (std::vector< double > &search_engine_scores, std::vector< double > &probabilities)
 fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the second vector (probabilities).

void fillDensities (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 writes the densities of both distributions into the two vectors for a set of scores. incorrect_density represents the incorrectly assigned sequences.

double computeMaxLikelihood (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 computes the Maximum Likelihood with a log-likelihood function.

double one_minus_sum_post (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 sums (1 - posterior probabilities)

double sum_post (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 sums posterior probabilities

double sum_pos_x0 (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 helper function for the EM algorithm (for fitting)

double sum_neg_x0 (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
 helper function for the EM algorithm (for fitting)

double sum_pos_sigma (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density, double positive_mean)
 helper function for the EM algorithm (for fitting)

double sum_neg_sigma (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density, double positive_mean)
 helper function for the EM algorithm (for fitting)

GaussFitter::GaussFitResult getCorrectlyAssignedFitResult () const
 returns estimated parameters for correctly assigned sequences. Fit should be used before.

GaussFitter::GaussFitResult getIncorrectlyAssignedFitResult () const
 returns estimated parameters for incorrectly assigned sequences. Fit should be used before.

double getNegativePrior () const
 returns the estimated negative prior probability.
 
double computeProbability (double score) const
 
TextFile initPlots (std::vector< double > &x_scores)
 initializes the plots

const String getGumbelGnuplotFormula (const GaussFitter::GaussFitResult &params) const
 returns the gnuplot formula of the fitted gumbel distribution. Only x0 and sigma are used, as the location parameter alpha and the scale parameter beta, respectively.

const String getGaussGnuplotFormula (const GaussFitter::GaussFitResult &params) const
 returns the gnuplot formula of the fitted gauss distribution.

const String getBothGnuplotFormula (const GaussFitter::GaussFitResult &incorrect, const GaussFitter::GaussFitResult &correct) const
 returns the gnuplot formula of the fitted mixture distribution.

void plotTargetDecoyEstimation (std::vector< double > &target, std::vector< double > &decoy)
 plots the estimated distribution against target and decoy hits

double getSmallestScore ()
 returns the smallest score used in the last fit

void tryGnuplot (const String &gp_file)
 try to invoke 'gnuplot' on the file to create PDF automatically
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

virtual DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Static Public Member Functions

static std::map< String, std::vector< std::vector< double > > > extractAndTransformScores (const std::vector< ProteinIdentification > &protein_ids, const std::vector< PeptideIdentification > &peptide_ids, const bool split_charge, const bool top_hits_only, const bool target_decoy_available, const double fdr_for_targets_smaller)
 extract and transform score types to a range and score orientation that the PEP model can handle

static void updateScores (const PosteriorErrorProbabilityModel &PEP_model, const String &search_engine, const Int charge, const bool prob_correct, const bool split_charge, std::vector< ProteinIdentification > &protein_ids, std::vector< PeptideIdentification > &peptide_ids, bool &unable_to_fit_data, bool &data_might_not_be_well_fit)
 update score entries with PEP (or 1-PEP) estimates

static double getGumbel_ (double x, const GaussFitter::GaussFitResult &params)
 computes the gumbel density at position x with parameters params.
 

Private Member Functions

PosteriorErrorProbabilityModel & operator= (const PosteriorErrorProbabilityModel &rhs)
 assignment operator (not implemented)

 PosteriorErrorProbabilityModel (const PosteriorErrorProbabilityModel &rhs)
 Copy constructor (not implemented)
 

Static Private Member Functions

static double transformScore_ (const String &engine, const PeptideHit &hit)
 transform different score types to a range and score orientation that the model can handle (engine string is assumed in upper-case)
 

Private Attributes

GaussFitter::GaussFitResult incorrectly_assigned_fit_param_
 stores parameters for incorrectly assigned sequences. If the gumbel fit was used, A can be ignored; in that case, x0 and sigma are the location parameter alpha and the scale parameter beta, respectively.

GaussFitter::GaussFitResult correctly_assigned_fit_param_
 stores gauss parameters

double negative_prior_
 stores final prior probability for negative peptides

double max_incorrectly_
 peak of the incorrectly assigned sequences distribution

double max_correctly_
 peak of the gauss distribution (correctly assigned sequences)

double smallest_score_
 smallest score which was used for fitting the model

const String(PosteriorErrorProbabilityModel::* getNegativeGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const
 points either to getGumbelGnuplotFormula or getGaussGnuplotFormula depending on whether one uses the gumbel or the gaussian distribution for incorrectly assigned sequences.
 
const String(PosteriorErrorProbabilityModel::* getPositiveGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const
 points to getGaussGnuplotFormula (the correctly assigned sequences are modelled with the gauss distribution)
 

Additional Inherited Members

- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.

void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.

- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no parameter checking is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when the defaults are empty.
 

Detailed Description

Implements a mixture model of the inverse gumbel and the gauss distribution or a gaussian mixture.

This class uses the EM algorithm to fit a two-component mixture to a set of data points: either a Gumbel and a Gauss distribution, or two Gauss distributions. After fitting, the fit can be output as a gnuplot formula using getGumbelGnuplotFormula() and getGaussGnuplotFormula().

Note
All parameters are stored in GaussFitResult. In the case of the Gumbel distribution, x0 and sigma represent the location parameter alpha and the scale parameter beta, respectively.
Parameters of this class are:

Name | Type | Default | Restrictions | Description
out_plot | string | | | If given, some output files will be saved in the following manner: <out_plot>_scores.txt for the scores and <out_plot>, which contains the fitted values for each step of the EM algorithm. For example, out_plot = /usr/home/OMSSA123 leads to /usr/home/OMSSA123_scores.txt and /usr/home/OMSSA123 being written. If no directory is specified (e.g. just OMSSA123 instead of /usr/home/OMSSA123), the files will be written into the working directory.
number_of_bins | int | 100 | | Number of bins used for visualization. Only needed if each iteration step of the EM algorithm is visualized.
incorrectly_assigned | string | Gumbel | Gumbel, Gauss | For 'Gumbel', the Gumbel distribution is used to plot incorrectly assigned sequences. For 'Gauss', the Gauss distribution is used.
max_nr_iterations | int | 1000 | | Bounds the number of iterations for the EM algorithm when convergence is slow.


Constructor & Destructor Documentation

◆ PosteriorErrorProbabilityModel() [1/2]

default constructor

◆ ~PosteriorErrorProbabilityModel()

Destructor.

◆ PosteriorErrorProbabilityModel() [2/2]

Copy constructor (not implemented)

Member Function Documentation

◆ computeMaxLikelihood()

double computeMaxLikelihood ( std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

computes the Maximum Likelihood with a log-likelihood function.
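
As a reading aid, the quantity described here can be written as the log-likelihood of a two-component mixture. The notation is an assumption made for this sketch: \pi^{-} stands for the negative prior (cf. getNegativePrior()), and f^{-}(x_i) and f^{+}(x_i) stand for the i-th entries of incorrect_density and correct_density. The base of the logarithm is not stated in this reference.

```latex
\log L \;=\; \sum_{i=1}^{N} \log\!\Bigl( \pi^{-}\, f^{-}(x_i) \;+\; \bigl(1 - \pi^{-}\bigr)\, f^{+}(x_i) \Bigr)
```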

◆ computeProbability()

double computeProbability ( double  score) const

Returns the computed posterior error probability for a given score.

Note
fit() has to be called before using this function; otherwise the returned value is meaningless.

◆ extractAndTransformScores()

static std::map<String, std::vector<std::vector<double> > > extractAndTransformScores ( const std::vector< ProteinIdentification > &  protein_ids,
const std::vector< PeptideIdentification > &  peptide_ids,
const bool  split_charge,
const bool  top_hits_only,
const bool  target_decoy_available,
const double  fdr_for_targets_smaller 
)
static

extract and transform score types to a range and score orientation that the PEP model can handle

Parameters
protein_ids              the protein identifications
peptide_ids              the peptide identifications
split_charge             whether different charge states should be treated separately
top_hits_only            only consider rank 1
target_decoy_available   whether target decoy information is stored as meta value
fdr_for_targets_smaller  FDR threshold for targets
Returns
engine (and optional charge state) id -> vector of triplets (score, target, decoy)
Note
supported engines are: XTandem, OMSSA, MASCOT, SpectraST, MyriMatch, SimTandem, MSGFPlus, MS-GF+, Comet

◆ fillDensities()

void fillDensities ( std::vector< double > &  x_scores,
std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

writes the densities of both distributions into the two vectors for a set of scores. incorrect_density represents the incorrectly assigned sequences.
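
The idea can be sketched without OpenMS as follows. This is an illustration only: it assumes the 'Gauss' setting for both components and a normalised Gaussian density, and the type Component and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the x0/sigma pair of a fitted component.
struct Component
{
  double x0;    // centre of the distribution
  double sigma; // width of the distribution
};

// Normalised Gaussian density at x.
double gaussDensity(double x, const Component& c)
{
  const double pi = std::acos(-1.0);
  const double z = (x - c.x0) / c.sigma;
  return std::exp(-0.5 * z * z) / (c.sigma * std::sqrt(2.0 * pi));
}

// Evaluates both components at every score. The output vectors are resized
// to match x_scores, so any previous content is overwritten.
void fillDensitiesSketch(const std::vector<double>& x_scores,
                         const Component& incorrect,
                         const Component& correct,
                         std::vector<double>& incorrect_density,
                         std::vector<double>& correct_density)
{
  incorrect_density.resize(x_scores.size());
  correct_density.resize(x_scores.size());
  for (std::size_t i = 0; i < x_scores.size(); ++i)
  {
    incorrect_density[i] = gaussDensity(x_scores[i], incorrect);
    correct_density[i] = gaussDensity(x_scores[i], correct);
  }
}
```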

◆ fit() [1/2]

bool fit ( std::vector< double > &  search_engine_scores)

fits the distributions to the data points (search_engine_scores). The estimated parameters of the distributions are saved in member variables; computeProbability() can be used afterwards.

Parameters
search_engine_scores  a vector which holds the data points
Returns
true if the algorithm has run through, false otherwise. In the latter case no plot and no probabilities are calculated.
Note
the vector is sorted from smallest to biggest value!

◆ fit() [2/2]

bool fit ( std::vector< double > &  search_engine_scores,
std::vector< double > &  probabilities 
)

fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the second vector (probabilities).

Parameters
search_engine_scores  a vector which holds the data points
probabilities         a vector which holds the probability for each data point after running this function. Any existing content is overwritten.
Returns
true if the algorithm has run through, false otherwise. In the latter case no plot and no probabilities are calculated.
Note
the vectors are sorted from smallest to biggest value!
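
To make the fitting step more concrete, here is a compact EM sketch for the two-Gaussian case only. It is written under several assumptions and is not the OpenMS code: the types and names are hypothetical, start values are supplied by the caller, the loop simply runs a fixed number of iterations (cf. max_nr_iterations) without a convergence test, and the Gumbel variant would need a different update step.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the fitted parameters (not the OpenMS types).
struct Gauss
{
  double x0;    // mean
  double sigma; // standard deviation
};

struct MixtureFit
{
  Gauss incorrect;       // component for incorrectly assigned sequences
  Gauss correct;         // component for correctly assigned sequences
  double negative_prior; // weight of the incorrect component
};

// Normalised Gaussian density at x.
double gaussPdf(double x, const Gauss& g)
{
  const double pi = std::acos(-1.0);
  const double z = (x - g.x0) / g.sigma;
  return std::exp(-0.5 * z * z) / (g.sigma * std::sqrt(2.0 * pi));
}

// Runs a fixed number of EM iterations from the given start values.
// Assumes a non-empty score vector and positive start sigmas.
MixtureFit emTwoGaussians(const std::vector<double>& scores, MixtureFit fit, int max_nr_iterations)
{
  const std::size_t n = scores.size();
  const double min_sigma = 1e-6; // keeps a collapsing component from dividing by zero
  std::vector<double> post(n);   // P(correct | score_i)
  for (int iter = 0; iter < max_nr_iterations; ++iter)
  {
    // E-step: posterior probability that each score belongs to the correct component.
    double sum_pos = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
      const double neg = fit.negative_prior * gaussPdf(scores[i], fit.incorrect);
      const double pos = (1.0 - fit.negative_prior) * gaussPdf(scores[i], fit.correct);
      post[i] = (neg + pos > 0.0) ? pos / (neg + pos) : 0.5;
      sum_pos += post[i];
    }
    const double sum_neg = static_cast<double>(n) - sum_pos;
    if (sum_pos <= 0.0 || sum_neg <= 0.0)
    {
      break; // one component received no weight; keep the last estimate
    }

    // M-step: posterior-weighted means, then standard deviations, then the prior.
    double pos_x0 = 0.0;
    double neg_x0 = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
      pos_x0 += post[i] * scores[i];
      neg_x0 += (1.0 - post[i]) * scores[i];
    }
    pos_x0 /= sum_pos;
    neg_x0 /= sum_neg;
    double pos_var = 0.0;
    double neg_var = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
      pos_var += post[i] * (scores[i] - pos_x0) * (scores[i] - pos_x0);
      neg_var += (1.0 - post[i]) * (scores[i] - neg_x0) * (scores[i] - neg_x0);
    }
    fit.correct = Gauss{pos_x0, std::max(std::sqrt(pos_var / sum_pos), min_sigma)};
    fit.incorrect = Gauss{neg_x0, std::max(std::sqrt(neg_var / sum_neg), min_sigma)};
    fit.negative_prior = sum_neg / static_cast<double>(n);
  }
  return fit;
}
```

On two well-separated clusters of scores the sketch recovers one mean per cluster and a prior close to the fraction of points in the lower cluster.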

◆ getBothGnuplotFormula()

const String getBothGnuplotFormula ( const GaussFitter::GaussFitResult &  incorrect,
const GaussFitter::GaussFitResult &  correct 
) const

returns the gnuplot formula of the fitted mixture distribution.

◆ getCorrectlyAssignedFitResult()

GaussFitter::GaussFitResult getCorrectlyAssignedFitResult ( ) const
inline

returns estimated parameters for correctly assigned sequences. Fit should be used before.

◆ getGaussGnuplotFormula()

const String getGaussGnuplotFormula ( const GaussFitter::GaussFitResult &  params) const

returns the gnuplot formula of the fitted gauss distribution.
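
The exact string produced by this function is not shown in this reference. As a purely illustrative sketch, a Gaussian with amplitude A, centre x0 and width sigma could be rendered as a gnuplot expression like this; the struct, the function name and the formatting are assumptions.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the three values of a Gaussian fit result.
struct GaussParams
{
  double A;     // amplitude
  double x0;    // centre
  double sigma; // width
};

// Builds a gnuplot expression in x; '**' is gnuplot's power operator.
std::string gaussGnuplotFormulaSketch(const GaussParams& p)
{
  std::ostringstream formula;
  formula << p.A << " * exp(-((x - " << p.x0 << ") ** 2) / (2 * (" << p.sigma << ") ** 2))";
  return formula.str();
}
```

The resulting text can be pasted after `plot` in a gnuplot script.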

◆ getGumbel_()

static double getGumbel_ ( double  x,
const GaussFitter::GaussFitResult &  params 
)
inline static

computes the gumbel density at position x with parameters params.

References GaussFitter::GaussFitResult::sigma, and GaussFitter::GaussFitResult::x0.
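
For reference, the textbook Gumbel density with location alpha and scale beta can be sketched as below, with x0 playing the role of alpha and sigma the role of beta. This is an assumption about the general form; the orientation used by the class (the summary speaks of an "inverse gumbel") is not specified here, and the struct is a hypothetical stand-in for GaussFitter::GaussFitResult.

```cpp
#include <cmath>

// Hypothetical stand-in for the two fields the function reads.
struct GumbelParams
{
  double x0;    // location parameter alpha
  double sigma; // scale parameter beta
};

// Standard Gumbel density: (1 / beta) * z * exp(-z), with z = exp((alpha - x) / beta).
double gumbelDensity(double x, const GumbelParams& p)
{
  const double z = std::exp((p.x0 - x) / p.sigma);
  return z * std::exp(-z) / p.sigma;
}
```

At x = x0 the density equals exp(-1) / sigma, which is its maximum.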

◆ getGumbelGnuplotFormula()

const String getGumbelGnuplotFormula ( const GaussFitter::GaussFitResult &  params) const

returns the gnuplot formula of the fitted gumbel distribution. Only x0 and sigma are used, as the location parameter alpha and the scale parameter beta, respectively.

◆ getIncorrectlyAssignedFitResult()

GaussFitter::GaussFitResult getIncorrectlyAssignedFitResult ( ) const
inline

returns estimated parameters for incorrectly assigned sequences. Fit should be used before.

◆ getNegativePrior()

double getNegativePrior ( ) const
inline

returns the estimated negative prior probability.

◆ getSmallestScore()

double getSmallestScore ( )
inline

returns the smallest score used in the last fit

◆ initPlots()

TextFile initPlots ( std::vector< double > &  x_scores)

initializes the plots

◆ one_minus_sum_post()

double one_minus_sum_post ( std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

sums (1 - posterior probabilities)

◆ operator=()

assignment operator (not implemented)

◆ plotTargetDecoyEstimation()

void plotTargetDecoyEstimation ( std::vector< double > &  target,
std::vector< double > &  decoy 
)

plots the estimated distribution against target and decoy hits

◆ sum_neg_sigma()

double sum_neg_sigma ( std::vector< double > &  x_scores,
std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density,
double  positive_mean 
)

helper function for the EM algorithm (for fitting)

◆ sum_neg_x0()

double sum_neg_x0 ( std::vector< double > &  x_scores,
std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

helper function for the EM algorithm (for fitting)

◆ sum_pos_sigma()

double sum_pos_sigma ( std::vector< double > &  x_scores,
std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density,
double  positive_mean 
)

helper function for the EM algorithm (for fitting)

◆ sum_pos_x0()

double sum_pos_x0 ( std::vector< double > &  x_scores,
std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

helper function for the EM algorithm (for fitting)
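
The formulas behind the sum_*_x0 and sum_*_sigma helpers are not given in this reference. In a typical EM update they would be posterior-weighted sums, which the following sketch illustrates; the function names are hypothetical, and the weights are assumed to be the per-score posterior probabilities of the component being updated.

```cpp
#include <cstddef>
#include <vector>

// Posterior-weighted sum of the scores: sum_i w_i * x_i.
double weightedSum(const std::vector<double>& x, const std::vector<double>& w)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sum += w[i] * x[i];
  }
  return sum;
}

// Posterior-weighted squared deviation from a mean: sum_i w_i * (x_i - mean)^2.
double weightedSquaredDeviation(const std::vector<double>& x, const std::vector<double>& w, double mean)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sum += w[i] * (x[i] - mean) * (x[i] - mean);
  }
  return sum;
}
```

Dividing the first sum by the total weight gives an updated x0; the square root of the second sum divided by the total weight gives an updated sigma.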

◆ sum_post()

double sum_post ( std::vector< double > &  incorrect_density,
std::vector< double > &  correct_density 
)

sums posterior probabilities

◆ transformScore_()

static double transformScore_ ( const String &  engine,
const PeptideHit &  hit 
)
static private

transform different score types to a range and score orientation that the model can handle (engine string is assumed in upper-case)
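
Which transformation is applied for each engine is not documented on this page. As a generic illustration of changing a score's orientation, an e-value-like score (smaller is better) is often mapped to its negative decadic logarithm so that larger values indicate better matches. The function below is a hypothetical example of that idea, not the engine-specific logic of this class.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Maps an e-value-like score to -log10(e). Zero or negative inputs are clamped
// to the smallest positive normalised double so that the result stays finite.
double negLog10Score(double e_value)
{
  const double smallest = std::numeric_limits<double>::min();
  return -std::log10(std::max(e_value, smallest));
}
```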

◆ tryGnuplot()

void tryGnuplot ( const String &  gp_file)

try to invoke 'gnuplot' on the file to create PDF automatically

◆ updateScores()

static void updateScores ( const PosteriorErrorProbabilityModel &  PEP_model,
const String &  search_engine,
const Int  charge,
const bool  prob_correct,
const bool  split_charge,
std::vector< ProteinIdentification > &  protein_ids,
std::vector< PeptideIdentification > &  peptide_ids,
bool &  unable_to_fit_data,
bool &  data_might_not_be_well_fit 
)
static

update score entries with PEP (or 1-PEP) estimates

Parameters
PEP_model                   the PEP model used to update the scores
search_engine               the scores of this search engine will be updated
charge                      identifications with the given charge will be updated
prob_correct                report 1-PEP
split_charge                whether charge states have been treated separately
protein_ids                 the protein identifications
peptide_ids                 the peptide identifications
unable_to_fit_data          there was a problem fitting the data (probabilities are all smaller than 0 or larger than 1)
data_might_not_be_well_fit  the fit was successful but of bad quality (probabilities are all smaller than 0.8 and larger than 0.2)
Note
supported engines are: XTandem, OMSSA, MASCOT, SpectraST, MyriMatch, SimTandem, MSGFPlus, MS-GF+, Comet
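
The two output flags can be read as simple checks over the resulting probabilities. The sketch below follows the wording of the parameter descriptions literally; the struct and function names are hypothetical, and the actual comparison operators used by updateScores() are an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical pair of flags mirroring the two bool & output parameters.
struct FitQuality
{
  bool unable_to_fit_data;
  bool data_might_not_be_well_fit;
};

// Note: for an empty input both flags are true, because std::all_of
// returns true on an empty range.
FitQuality assessProbabilities(const std::vector<double>& probabilities)
{
  FitQuality q;
  // every probability lies outside the valid range [0, 1]
  q.unable_to_fit_data = std::all_of(probabilities.begin(), probabilities.end(),
                                     [](double p) { return p < 0.0 || p > 1.0; });
  // every probability is undecided, i.e. strictly between 0.2 and 0.8
  q.data_might_not_be_well_fit = std::all_of(probabilities.begin(), probabilities.end(),
                                             [](double p) { return p > 0.2 && p < 0.8; });
  return q;
}
```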

Member Data Documentation

◆ correctly_assigned_fit_param_

GaussFitter::GaussFitResult correctly_assigned_fit_param_
private

stores gauss parameters

◆ getNegativeGnuplotFormula_

const String(PosteriorErrorProbabilityModel::* getNegativeGnuplotFormula_) (const GaussFitter::GaussFitResult &params) const
private

points either to getGumbelGnuplotFormula or getGaussGnuplotFormula depending on whether one uses the gumbel or the gaussian distribution for incorrectly assigned sequences.

◆ getPositiveGnuplotFormula_

const String(PosteriorErrorProbabilityModel::* getPositiveGnuplotFormula_) (const GaussFitter::GaussFitResult &params) const
private

points to getGaussGnuplotFormula (the correctly assigned sequences are modelled with the gauss distribution)

◆ incorrectly_assigned_fit_param_

GaussFitter::GaussFitResult incorrectly_assigned_fit_param_
private

stores parameters for incorrectly assigned sequences. If the gumbel fit was used, A can be ignored; in that case, x0 and sigma are the location parameter alpha and the scale parameter beta, respectively.

◆ max_correctly_

double max_correctly_
private

peak of the gauss distribution (correctly assigned sequences)

◆ max_incorrectly_

double max_incorrectly_
private

peak of the incorrectly assigned sequences distribution

◆ negative_prior_

double negative_prior_
private

stores final prior probability for negative peptides

◆ smallest_score_

double smallest_score_
private

smallest score which was used for fitting the model