Implements a mixture model of the inverse Gumbel and the Gauss distribution, or a Gaussian mixture.

#include <OpenMS/MATH/STATISTICS/PosteriorErrorProbabilityModel.h>

Public Member Functions
	PosteriorErrorProbabilityModel ()
	default constructor

	~PosteriorErrorProbabilityModel () override
	Destructor.

bool	fit (std::vector< double > &search_engine_scores)
	fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables. computeProbability can be used afterwards.

bool	fit (std::vector< double > &search_engine_scores, std::vector< double > &probabilities)
	fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the given vector (the second one).

void	fillDensities (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	Writes the distributions' densities into the two vectors for a set of scores. incorrect_density represents the incorrectly assigned sequences.

double	computeMaxLikelihood (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	computes the Maximum Likelihood with a log-likelihood function.

double	one_minus_sum_post (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	sums (1 - posterior probabilities)

double	sum_post (std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	sums posterior probabilities

double	sum_pos_x0 (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	helper function for the EM algorithm (for fitting)

double	sum_neg_x0 (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density)
	helper function for the EM algorithm (for fitting)

double	sum_pos_sigma (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density, double positive_mean)
	helper function for the EM algorithm (for fitting)

double	sum_neg_sigma (std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density, double positive_mean)
	helper function for the EM algorithm (for fitting)

GaussFitter::GaussFitResult	getCorrectlyAssignedFitResult () const
	returns estimated parameters for correctly assigned sequences. Fit should be used before.

GaussFitter::GaussFitResult	getIncorrectlyAssignedFitResult () const
	returns estimated parameters for incorrectly assigned sequences. Fit should be used before.

double	getNegativePrior () const
	returns the estimated negative prior probability.

double	computeProbability (double score) const

TextFile	initPlots (std::vector< double > &x_scores)
	initializes the plots

const String	getGumbelGnuplotFormula (const GaussFitter::GaussFitResult &params) const
	returns the gnuplot formula of the fitted Gumbel distribution. Only x0 and sigma are used as location parameter alpha and scale parameter beta, respectively.

const String	getGaussGnuplotFormula (const GaussFitter::GaussFitResult &params) const
	returns the gnuplot formula of the fitted Gauss distribution.

const String	getBothGnuplotFormula (const GaussFitter::GaussFitResult &incorrect, const GaussFitter::GaussFitResult &correct) const
	returns the gnuplot formula of the fitted mixture distribution.

void	plotTargetDecoyEstimation (std::vector< double > &target, std::vector< double > &decoy)
	plots the estimated distribution against target and decoy hits

double	getSmallestScore ()
	returns the smallest score used in the last fit

void	tryGnuplot (const String &gp_file)
	try to invoke 'gnuplot' on the file to create PDF automatically

Public Member Functions inherited from DefaultParamHandler
	DefaultParamHandler (const String &name)
	Constructor with name that is displayed in error messages.

	DefaultParamHandler (const DefaultParamHandler &rhs)
	Copy constructor.

virtual	~DefaultParamHandler ()
	Destructor.

virtual DefaultParamHandler &	operator= (const DefaultParamHandler &rhs)
	Assignment operator.

virtual bool	operator== (const DefaultParamHandler &rhs) const
	Equality operator.

void	setParameters (const Param &param)
	Sets the parameters.

const Param &	getParameters () const
	Non-mutable access to the parameters.

const Param &	getDefaults () const
	Non-mutable access to the default parameters.

const String &	getName () const
	Non-mutable access to the name.

void	setName (const String &name)
	Mutable access to the name.

const std::vector< String > &	getSubsections () const
	Non-mutable access to the registered subsections.

Static Public Member Functions
static std::map< String, std::vector< std::vector< double > > >	extractAndTransformScores (const std::vector< ProteinIdentification > &protein_ids, const std::vector< PeptideIdentification > &peptide_ids, const bool split_charge, const bool top_hits_only, const bool target_decoy_available, const double fdr_for_targets_smaller)
	extract and transform score types to a range and score orientation that the PEP model can handle

static void	updateScores (const PosteriorErrorProbabilityModel &PEP_model, const String &search_engine, const Int charge, const bool prob_correct, const bool split_charge, std::vector< ProteinIdentification > &protein_ids, std::vector< PeptideIdentification > &peptide_ids, bool &unable_to_fit_data, bool &data_might_not_be_well_fit)
	update score entries with PEP (or 1-PEP) estimates

static double	getGumbel_ (double x, const GaussFitter::GaussFitResult &params)
	computes the gumbel density at position x with parameters params.

Private Member Functions
PosteriorErrorProbabilityModel &	operator= (const PosteriorErrorProbabilityModel &rhs)
	assignment operator (not implemented)

	PosteriorErrorProbabilityModel (const PosteriorErrorProbabilityModel &rhs)
	Copy constructor (not implemented)

Static Private Member Functions
static double	transformScore_ (const String &engine, const PeptideHit &hit)
	transform different score types to a range and score orientation that the model can handle (engine string is assumed in upper-case)

Private Attributes
GaussFitter::GaussFitResult	incorrectly_assigned_fit_param_
	stores parameters for incorrectly assigned sequences. If Gumbel fit was used, A can be ignored. Furthermore, in this case, x0 and sigma are the location parameter alpha and scale parameter beta, respectively.

GaussFitter::GaussFitResult	correctly_assigned_fit_param_
	stores gauss parameters

double	negative_prior_
	stores final prior probability for negative peptides

double	max_incorrectly_
	peak of the incorrectly assigned sequences distribution

double	max_correctly_
	peak of the gauss distribution (correctly assigned sequences)

double	smallest_score_
	smallest score which was used for fitting the model

const String(PosteriorErrorProbabilityModel::*	getNegativeGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const
	points either to getGumbelGnuplotFormula or getGaussGnuplotFormula depending on whether one uses the gumbel or the gaussian distribution for incorrectly assigned sequences.

const String(PosteriorErrorProbabilityModel::*	getPositiveGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const
	points to getGaussGnuplotFormula

Additional Inherited Members
Protected Member Functions inherited from DefaultParamHandler
virtual void	updateMembers_ ()
	This method is used to update extra member variables at the end of the setParameters() method.

void	defaultsToParam_ ()
	Updates the parameters after the defaults have been set in the constructor.

Protected Attributes inherited from DefaultParamHandler
Param	param_
	Container for current parameters.

Param	defaults_
	Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String >	subsections_
	Container for registered subsections. This member should be filled in the constructor of derived classes!

String	error_name_
	Name that is displayed in error messages during the parameter checking.

bool	check_defaults_
	If this member is set to false, no checking of the parameters is done.

bool	warn_empty_defaults_
	If this member is set to false, no warning is emitted when defaults are empty.

Detailed Description

Implements a mixture model of the inverse Gumbel and the Gauss distribution, or a Gaussian mixture.

This class uses the EM algorithm to fit either a Gumbel distribution and a Gauss distribution, or two Gaussian distributions, to a set of data points. After fitting, the fit can be output as a gnuplot formula using getGumbelGnuplotFormula() and getGaussGnuplotFormula().

Note: All parameters are stored in GaussFitResult. In the case of the Gumbel distribution, x0 and sigma represent the location parameter alpha and the scale parameter beta, respectively.
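To make the EM fitting concrete, the following is a minimal sketch of a single EM iteration for the two-Gaussian case only. It is not the OpenMS implementation: the types Component and MixtureState and the functions gaussDensity and emStep are hypothetical names introduced for illustration, and the Gumbel variant, convergence checks, and iteration limits are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not OpenMS types.
struct Component { double x0; double sigma; };
struct MixtureState
{
  Component incorrect;    // component for incorrectly assigned sequences
  Component correct;      // component for correctly assigned sequences
  double negative_prior;  // prior probability of the incorrect component
};

// Gaussian density with mean c.x0 and standard deviation c.sigma.
double gaussDensity(double x, const Component& c)
{
  const double pi = 3.14159265358979323846;
  const double z = (x - c.x0) / c.sigma;
  return std::exp(-0.5 * z * z) / (c.sigma * std::sqrt(2.0 * pi));
}

// One EM iteration for a two-Gaussian mixture.
// Sketch only: it does not guard against a component receiving zero weight.
MixtureState emStep(const std::vector<double>& scores, const MixtureState& s)
{
  assert(!scores.empty());
  const std::size_t n = scores.size();

  // E-step: posterior probability that each score belongs to the
  // incorrect component, given the current parameters.
  std::vector<double> resp(n);
  double weight_neg = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const double neg = s.negative_prior * gaussDensity(scores[i], s.incorrect);
    const double pos = (1.0 - s.negative_prior) * gaussDensity(scores[i], s.correct);
    resp[i] = neg / (neg + pos);
    weight_neg += resp[i];
  }
  const double weight_pos = static_cast<double>(n) - weight_neg;

  // M-step, part 1: posterior-weighted means.
  double mean_neg = 0.0;
  double mean_pos = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    mean_neg += resp[i] * scores[i];
    mean_pos += (1.0 - resp[i]) * scores[i];
  }
  mean_neg /= weight_neg;
  mean_pos /= weight_pos;

  // M-step, part 2: posterior-weighted variances around the new means.
  double var_neg = 0.0;
  double var_pos = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const double d_neg = scores[i] - mean_neg;
    const double d_pos = scores[i] - mean_pos;
    var_neg += resp[i] * d_neg * d_neg;
    var_pos += (1.0 - resp[i]) * d_pos * d_pos;
  }

  MixtureState next;
  next.incorrect = Component{mean_neg, std::sqrt(var_neg / weight_neg)};
  next.correct = Component{mean_pos, std::sqrt(var_pos / weight_pos)};
  next.negative_prior = weight_neg / static_cast<double>(n);
  return next;
}
```

Calling emStep repeatedly on the same scores, starting from rough initial guesses, moves the two components toward the two clusters in the data. A real fit would also stop once the log-likelihood no longer improves or an iteration limit is reached.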

Parameters of this class are:

Name	Type	Default	Restrictions	Description
out_plot	string			If given, some output files will be saved in the following manner: a file named by the value of out_plot followed by _scores.txt for the scores, and a file named by out_plot itself, which contains the fitted values for each step of the EM algorithm. For example, out_plot = /usr/home/OMSSA123 leads to /usr/home/OMSSA123_scores.txt and /usr/home/OMSSA123 being written. If no directory is specified, e.g. just OMSSA123 instead of '/usr/home/OMSSA123', the files will be written into the working directory.
number_of_bins	int	100		Number of bins used for visualization. Only needed if each iteration step of the EM algorithm is visualized.
incorrectly_assigned	string	Gumbel	Gumbel, Gauss	For 'Gumbel', the Gumbel distribution is used to plot incorrectly assigned sequences. For 'Gauss', the Gauss distribution is used.
max_nr_iterations	int	1000		Bounds the number of iterations for the EM algorithm when convergence is slow.

Constructor & Destructor Documentation

◆ PosteriorErrorProbabilityModel() [1/2]

PosteriorErrorProbabilityModel ( )

default constructor

◆ ~PosteriorErrorProbabilityModel()

~PosteriorErrorProbabilityModel ( )

override

Destructor.

◆ PosteriorErrorProbabilityModel() [2/2]

PosteriorErrorProbabilityModel ( const PosteriorErrorProbabilityModel & rhs )

private

Copy constructor (not implemented)

Member Function Documentation

◆ computeMaxLikelihood()

double computeMaxLikelihood	(	std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

computes the Maximum Likelihood with a log-likelihood function.
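As a rough illustration of what a log-likelihood over the two density vectors could look like, here is a minimal sketch. It assumes the mixture density at each data point is the prior-weighted sum of the two component densities. The function name mixtureLogLikelihood is hypothetical, and the negative prior is passed explicitly here, whereas the class stores it in negative_prior_.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical free function for illustration; not the OpenMS member.
// Assumes both vectors hold component densities for the same data points.
double mixtureLogLikelihood(const std::vector<double>& incorrect_density,
                            const std::vector<double>& correct_density,
                            double negative_prior)
{
  assert(incorrect_density.size() == correct_density.size());
  double log_likelihood = 0.0;
  for (std::size_t i = 0; i < incorrect_density.size(); ++i)
  {
    // Mixture density at data point i: prior-weighted sum of both components.
    const double mixture = negative_prior * incorrect_density[i]
                         + (1.0 - negative_prior) * correct_density[i];
    log_likelihood += std::log(mixture);
  }
  return log_likelihood;
}
```

In an EM loop, a value like this is typically compared between consecutive iterations to decide when the fit has converged.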

◆ computeProbability()

double computeProbability ( double score ) const

Returns the computed posterior error probability for a given score.

Note: fit has to be called before using this function. Otherwise the returned value is meaningless.
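For orientation, the posterior error probability of a score in a two-component mixture follows from Bayes' rule. The sketch below assumes both components are Gaussian and ignores any extra handling the class may apply (for example around smallest_score_ or the distribution peaks), which is not described here. FitParams, gaussDensity, and posteriorErrorProbability are hypothetical names, not OpenMS identifiers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the fitted parameters of one component.
struct FitParams { double x0; double sigma; };

// Gaussian density with mean p.x0 and standard deviation p.sigma.
double gaussDensity(double x, const FitParams& p)
{
  const double pi = 3.14159265358979323846;
  const double z = (x - p.x0) / p.sigma;
  return std::exp(-0.5 * z * z) / (p.sigma * std::sqrt(2.0 * pi));
}

// Posterior probability that a score stems from the incorrect component.
double posteriorErrorProbability(double score,
                                 const FitParams& incorrect,
                                 const FitParams& correct,
                                 double negative_prior)
{
  assert(negative_prior >= 0.0 && negative_prior <= 1.0);
  const double neg = negative_prior * gaussDensity(score, incorrect);
  const double pos = (1.0 - negative_prior) * gaussDensity(score, correct);
  return neg / (neg + pos);
}
```

With equal priors and equal widths, a score halfway between the two means yields a probability of 0.5, and scores closer to the incorrect component yield higher values.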

◆ extractAndTransformScores()

static std::map<String, std::vector<std::vector<double> > > extractAndTransformScores	(	const std::vector< ProteinIdentification > &	protein_ids,
		const std::vector< PeptideIdentification > &	peptide_ids,
		const bool	split_charge,
		const bool	top_hits_only,
		const bool	target_decoy_available,
		const double	fdr_for_targets_smaller
	)

static

extract and transform score types to a range and score orientation that the PEP model can handle

Parameters

protein_ids	the protein identifications
peptide_ids	the peptide identifications
split_charge	whether different charge states should be treated separately
top_hits_only	only consider rank 1
target_decoy_available	whether target decoy information is stored as meta value
fdr_for_targets_smaller	fdr threshold for targets

Returns: engine (and optional charge state) id -> vector of triplets (score, target, decoy)

Note: supported engines are: XTandem, OMSSA, MASCOT, SpectraST, MyriMatch, SimTandem, MSGFPlus, MS-GF+, Comet

◆ fillDensities()

void fillDensities	(	std::vector< double > &	x_scores,
		std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

Writes the distributions' densities into the two vectors for a set of scores. incorrect_density represents the incorrectly assigned sequences.

◆ fit() [1/2]

bool fit ( std::vector< double > & search_engine_scores )

fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables. computeProbability can be used afterwards.

Parameters

search_engine_scores	a vector which holds the data points

Returns: true if the algorithm ran to completion, false otherwise. If false, no plot and no probabilities are calculated.

Note: the vector is sorted from smallest to largest value!

◆ fit() [2/2]

bool fit	(	std::vector< double > &	search_engine_scores,
		std::vector< double > &	probabilities
	)

fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the given vector (the second one).

Parameters

search_engine_scores	a vector which holds the data points
probabilities	a vector which holds the probability for each data point after running this function. Any existing content will be overwritten.

Returns: true if the algorithm ran to completion, false otherwise. If false, no plot and no probabilities are calculated.

Note: the vectors are sorted from smallest to largest value!

◆ getBothGnuplotFormula()

const String getBothGnuplotFormula	(	const GaussFitter::GaussFitResult &	incorrect,
		const GaussFitter::GaussFitResult &	correct
	)		const

returns the gnuplot formula of the fitted mixture distribution.

◆ getCorrectlyAssignedFitResult()

GaussFitter::GaussFitResult getCorrectlyAssignedFitResult ( ) const

inline

returns estimated parameters for correctly assigned sequences. Fit should be used before.

◆ getGaussGnuplotFormula()

const String getGaussGnuplotFormula ( const GaussFitter::GaussFitResult & params ) const

returns the gnuplot formula of the fitted gauss distribution.

◆ getGumbel_()

static double getGumbel_	(	double	x,
		const GaussFitter::GaussFitResult &	params
	)

inline static

computes the gumbel density at position x with parameters params.

References GaussFitter::GaussFitResult::sigma, and GaussFitter::GaussFitResult::x0.
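As an illustration of how x0 and sigma can act as location and scale parameters, the sketch below evaluates the standard Gumbel density. Whether the class uses exactly this form or a mirrored one is an assumption here. FitParams and gumbelDensity are hypothetical names standing in for GaussFitter::GaussFitResult and getGumbel_.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in; only x0 (location alpha) and sigma (scale beta) are used.
struct FitParams { double x0; double sigma; };

// Standard Gumbel density:
//   f(x) = (1 / beta) * z * exp(-z),  with z = exp((alpha - x) / beta)
double gumbelDensity(double x, const FitParams& params)
{
  assert(params.sigma > 0.0);
  const double z = std::exp((params.x0 - x) / params.sigma);
  return z * std::exp(-z) / params.sigma;
}
```

In this form the density peaks at x = x0, where it takes the value exp(-1) / sigma.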

◆ getGumbelGnuplotFormula()

const String getGumbelGnuplotFormula ( const GaussFitter::GaussFitResult & params ) const

returns the gnuplot formula of the fitted Gumbel distribution. Only x0 and sigma are used as location parameter alpha and scale parameter beta, respectively.

◆ getIncorrectlyAssignedFitResult()

GaussFitter::GaussFitResult getIncorrectlyAssignedFitResult ( ) const

inline

returns estimated parameters for incorrectly assigned sequences. Fit should be used before.

◆ getNegativePrior()

double getNegativePrior ( ) const

inline

returns the estimated negative prior probability.

◆ getSmallestScore()

double getSmallestScore ( )

inline

returns the smallest score used in the last fit

◆ initPlots()

TextFile initPlots ( std::vector< double > & x_scores )

initializes the plots

◆ one_minus_sum_post()

double one_minus_sum_post	(	std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

sums (1 - posterior probabilities)

◆ operator=()

PosteriorErrorProbabilityModel& operator= ( const PosteriorErrorProbabilityModel & rhs )

private

assignment operator (not implemented)

◆ plotTargetDecoyEstimation()

void plotTargetDecoyEstimation	(	std::vector< double > &	target,
		std::vector< double > &	decoy
	)

plots the estimated distribution against target and decoy hits

◆ sum_neg_sigma()

double sum_neg_sigma	(	std::vector< double > &	x_scores,
		std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density,
		double	positive_mean
	)

helper function for the EM algorithm (for fitting)

◆ sum_neg_x0()

double sum_neg_x0	(	std::vector< double > &	x_scores,
		std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

helper function for the EM algorithm (for fitting)

◆ sum_pos_sigma()

double sum_pos_sigma	(	std::vector< double > &	x_scores,
		std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density,
		double	positive_mean
	)

helper function for the EM algorithm (for fitting)

◆ sum_pos_x0()

double sum_pos_x0	(	std::vector< double > &	x_scores,
		std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

helper function for the EM algorithm (for fitting)

◆ sum_post()

double sum_post	(	std::vector< double > &	incorrect_density,
		std::vector< double > &	correct_density
	)

sums posterior probabilities
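A minimal sketch of such a sum is shown below, assuming that "posterior" refers to the posterior probability of the incorrect component at each data point. That reading is an assumption and is not confirmed by this page. The function name sumIncorrectPosteriors is hypothetical, and the negative prior is passed explicitly rather than read from negative_prior_.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free function for illustration; not the OpenMS member.
double sumIncorrectPosteriors(const std::vector<double>& incorrect_density,
                              const std::vector<double>& correct_density,
                              double negative_prior)
{
  assert(incorrect_density.size() == correct_density.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < incorrect_density.size(); ++i)
  {
    const double neg = negative_prior * incorrect_density[i];
    const double pos = (1.0 - negative_prior) * correct_density[i];
    // Posterior of the incorrect component at data point i (Bayes' rule).
    sum += neg / (neg + pos);
  }
  return sum;
}
```

Under the same assumption, the sum of (1 - posterior) over n data points equals n minus this value. Sums of this kind serve as the normalizing weights in the M-step of an EM fit.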

◆ transformScore_()

static double transformScore_	(	const String &	engine,
		const PeptideHit &	hit
	)

static private

transform different score types to a range and score orientation that the model can handle (engine string is assumed in upper-case)

◆ tryGnuplot()

void tryGnuplot ( const String & gp_file )

try to invoke 'gnuplot' on the file to create PDF automatically

◆ updateScores()

static void updateScores	(	const PosteriorErrorProbabilityModel &	PEP_model,
		const String &	search_engine,
		const Int	charge,
		const bool	prob_correct,
		const bool	split_charge,
		std::vector< ProteinIdentification > &	protein_ids,
		std::vector< PeptideIdentification > &	peptide_ids,
		bool &	unable_to_fit_data,
		bool &	data_might_not_be_well_fit
	)

static

update score entries with PEP (or 1-PEP) estimates

Parameters

PEP_model	the PEP model used to update the scores
search_engine	the score of search_engine will be updated
charge	identifications with the given charge will be updated
prob_correct	report 1-PEP
split_charge	if charge states have been treated separately
protein_ids	the protein identifications
peptide_ids	the peptide identifications
unable_to_fit_data	there was a problem fitting the data (probabilities are all smaller than 0 or larger than 1)
data_might_not_be_well_fit	fit was successful but of bad quality (probabilities are all smaller than 0.8 and larger than 0.2)

Note: supported engines are: XTandem, OMSSA, MASCOT, SpectraST, MyriMatch, SimTandem, MSGFPlus, MS-GF+, Comet

Member Data Documentation

◆ correctly_assigned_fit_param_

GaussFitter::GaussFitResult correctly_assigned_fit_param_

private

stores gauss parameters

◆ getNegativeGnuplotFormula_

const String(PosteriorErrorProbabilityModel::* getNegativeGnuplotFormula_) (const GaussFitter::GaussFitResult &params) const

private

points either to getGumbelGnuplotFormula or getGaussGnuplotFormula depending on whether one uses the gumbel or the gaussian distribution for incorrectly assigned sequences.

◆ getPositiveGnuplotFormula_

const String(PosteriorErrorProbabilityModel::* getPositiveGnuplotFormula_) (const GaussFitter::GaussFitResult &params) const

private

points to getGaussGnuplotFormula

◆ incorrectly_assigned_fit_param_

GaussFitter::GaussFitResult incorrectly_assigned_fit_param_

private

stores parameters for incorrectly assigned sequences. If Gumbel fit was used, A can be ignored. Furthermore, in this case, x0 and sigma are the location parameter alpha and scale parameter beta, respectively.

◆ max_correctly_

double max_correctly_

private

peak of the gauss distribution (correctly assigned sequences)

◆ max_incorrectly_

double max_incorrectly_

private

peak of the incorrectly assigned sequences distribution

◆ negative_prior_

double negative_prior_

private

stores final prior probability for negative peptides

◆ smallest_score_

double smallest_score_

private

smallest score which was used for fitting the model
