ConsensusID

Computes a consensus from results of multiple peptide identification engines.

Potential predecessor tools: IDPosteriorErrorProbability, IDFilter, IDMapper
Potential successor tools: PeptideIndexer

Reference:

Nahnsen et al.: Probabilistic consensus scoring improves tandem mass spectrometry peptide identification (J. Proteome Res., 2011, PMID: 21644507).

Algorithms:

ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ("search engines") into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.

The available algorithms are (see also OpenMS::ConsensusIDAlgorithm and its subclasses):

PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.
PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ("shared peak count"). This algorithm, too, requires PEPs as scores.
best: For each peptide ID, this uses the best score of any search engine as the consensus score. All peptide IDs must have the same score type.
worst: For each peptide ID, this uses the worst score of any search engine as the consensus score. All peptide IDs must have the same score type.
average: For each peptide ID, this uses the average score of all search engines as the consensus score. Again, all peptide IDs must have the same score type.
ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score. The input peptide IDs do not need to have the same score type.
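
The three simplest algorithms (best, worst, average) can be illustrated with a short Python sketch. This is not the OpenMS implementation: the function name, the data layout (one dictionary per search engine) and the peptide sequences are hypothetical. It also assumes that only engines reporting a peptide contribute to that peptide's consensus score, which the text above does not specify.

```python
from statistics import mean


def consensus_scores(runs, method="best", higher_is_better=False):
    """Aggregate per-engine scores into one consensus score per peptide.

    runs: list of dicts (one per search engine) mapping peptide -> score.
    All scores are assumed to share one score type, e.g. PEPs (lower is better).
    """
    pick_best = max if higher_is_better else min
    pick_worst = min if higher_is_better else max
    aggregate = {"best": pick_best, "worst": pick_worst, "average": mean}[method]

    # Collect the scores that the individual engines assigned to each peptide.
    collected = {}
    for run in runs:
        for peptide, score in run.items():
            collected.setdefault(peptide, []).append(score)
    return {peptide: aggregate(scores) for peptide, scores in collected.items()}


# Hypothetical PEP scores from two search engines for one spectrum.
runs = [
    {"PEPTIDEK": 0.01, "PEPTLDEK": 0.20},
    {"PEPTIDEK": 0.05},
]
best = consensus_scores(runs, "best")
worst = consensus_scores(runs, "worst")
average = consensus_scores(runs, "average")
```

With PEPs, "best" picks the smallest value and "worst" the largest, which is why the score orientation is passed explicitly.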

PEPs for search results can be calculated using the IDPosteriorErrorProbability tool, which supports a variety of search engines.

Notes:

Important: All protein-level identification results will be lost by applying ConsensusID. (It is unclear how potentially conflicting protein-level results from different search engines should be combined.) If necessary, run the PeptideIndexer tool to add protein references for peptides again.
Peptides with different post-translational modifications (PTMs), or with different site localizations of the same PTMs, are treated as different peptides by all algorithms. However, a qualification applies for the PEPMatrix algorithm: The similarity scoring method used there can only take unmodified peptide sequences into account, so PTMs are ignored during that step. The PTMs are not removed from the peptides, though, and there will be separate results for differently-modified peptides.

File types:

Different input file types are supported:

idXML: A file containing multiple identification runs, typically from different search engines. Use IDMerger to merge individual idXML files from different search runs into one. During the ConsensusID analysis, the identification results will be grouped according to their originating MS2 spectra, based on retention time and precursor m/z information (see parameters rt_delta and mz_delta). One consensus identification will be generated for each group.
featureXML or consensusXML: Given (consensus) features annotated with peptide identifications from multiple search runs, one consensus identification is created for every annotated feature. Peptide identifications not assigned to features are not considered and will be removed. See IDMapper for the task of mapping peptide identifications to feature maps or consensus maps.
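
The grouping of idXML results by originating spectrum can be pictured with the following Python sketch. It is a simplified greedy grouping under stated assumptions, not the strategy ConsensusID actually uses: each identification joins the first group whose reference retention time and precursor m/z lie within rt_delta and mz_delta. The function name, the tuple layout and the example values are hypothetical.

```python
def group_by_spectrum(ids, rt_delta=0.1, mz_delta=0.1):
    """Greedily group identifications that appear to share an MS2 spectrum.

    ids: list of (rt, mz, label) tuples.
    Returns a list of groups; each group keeps the RT and m/z of its first member
    as the reference position.
    """
    groups = []
    for rt, mz, label in sorted(ids):
        for group in groups:
            # "Maximum allowed deviation" is read as an inclusive bound here.
            if abs(rt - group["rt"]) <= rt_delta and abs(mz - group["mz"]) <= mz_delta:
                group["members"].append(label)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append({"rt": rt, "mz": mz, "members": [label]})
    return groups


# Hypothetical identifications from two search engines.
ids = [
    (1200.00, 500.25, "engineA:PEPTIDEK"),
    (1200.05, 500.26, "engineB:PEPTIDEK"),
    (1350.40, 620.80, "engineA:SAMPLER"),
]
groups = group_by_spectrum(ids)
```

In this example the first two identifications fall into one group and would yield a single consensus identification; the third forms its own group.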

Note: Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

Filtering:

Generally, search results can be filtered according to various criteria using IDFilter before (or after) applying this tool. ConsensusID itself offers only a limited number of filtering options that are especially useful in its context (see the filter parameter section):

considered_hits: Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly "all vs. all" comparisons of peptide hits.
min_support: This allows filtering of peptide hits based on agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must "support" a peptide identification that should be kept. The meaning of "support" differs slightly between algorithms: For best, worst, average and ranks, each search run supports peptides that it has also identified among its top considered_hits candidates. So min_support simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set min_support to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the "support" for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.)
count_empty: Typically not all search engines will provide results for all searched MS2 spectra. This parameter determines whether search runs that provided no results should be counted in the "support" calculation; by default, they are ignored.
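
The "support" calculation for the best, worst, average and ranks algorithms, including the effect of count_empty, can be sketched in Python as follows. The function name, the data layout and the peptide sequences are hypothetical. Returning 0.0 when no other run remains is an assumption of this sketch, not documented behaviour.

```python
def support_fraction(peptide, own_run, runs, count_empty=False):
    """Fraction of the *other* search runs that also identified `peptide`.

    runs: dict mapping run name -> set of peptides among that run's top
    considered hits for the current spectrum (an empty set means no results).
    """
    others = [hits for name, hits in runs.items() if name != own_run]
    if not count_empty:
        # Default behaviour described above: runs without results are ignored.
        others = [hits for hits in others if hits]
    if not others:
        return 0.0  # assumption: no other run left to provide support
    return sum(peptide in hits for hits in others) / len(others)


# Three search runs; keep peptides identified by at least two of them.
runs = {
    "A": {"PEPTIDEK", "SAMPLER"},
    "B": {"PEPTIDEK"},
    "C": {"ELVISK"},
}
min_support = 0.5
kept = {
    pep for pep in runs["A"]
    if support_fraction(pep, "A", runs) >= min_support
}

# One run without results for this spectrum: count_empty changes the outcome.
sparse = {"A": {"PEPTIDEK"}, "B": {"PEPTIDEK"}, "C": set()}
ignored = support_fraction("PEPTIDEK", "A", sparse)
counted = support_fraction("PEPTIDEK", "A", sparse, count_empty=True)
```

Here PEPTIDEK from run A is supported by one of the two other runs (0.5) and is kept, while SAMPLER has no support and is removed. In the second example the empty run C lowers the support from 1.0 to 0.5 once count_empty is enabled.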

The command line parameters of this tool are:

ConsensusID -- Computes a consensus of peptide identifications of several identification engines.
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  ConsensusID <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*                       Input file (valid formats: 'idXML', 'featureXML', 'consensusXML')
  -out <file>*                      Output file (valid formats: 'idXML', 'featureXML', 'consensusXML')
                                    
  -rt_delta <value>                 [idXML input only] Maximum allowed retention time deviation between
                                    identifications belonging to the same spectrum. (default: '0.1' min: '0')
  -mz_delta <value>                 [idXML input only] Maximum allowed precursor m/z deviation between
                                    identifications belonging to the same spectrum. (default: '0.1' min: '0')

Options for filtering peptide hits:
  -filter:considered_hits <number>  The number of top hits in each ID run that are considered for consensus 
                                    scoring ('0' for all hits). (default: '0' min: '0')
  -filter:min_support <value>       For each peptide hit from an ID run, the fraction of other ID runs that 
                                    must support that hit (otherwise it is removed). (default: '0' min: '0'
                                    max: '1')
  -filter:count_empty               Count empty ID runs (i.e. those containing no peptide hit for the current
                                    spectrum) when calculating 'min_support'?

  -algorithm <choice>               Algorithm used for consensus scoring.
                                    * PEPMatrix: Scoring based on posterior error probabilities (PEPs) and
                                    peptide sequence similarities (scored by a substitution matrix). Requires
                                    PEPs as scores.
                                    * PEPIons: Scoring based on posterior error probabilities (PEPs) and
                                    fragment ion similarities ('shared peak count'). Requires PEPs as scores.
                                    * best: For each peptide ID, use the best score of any search engine as
                                    the consensus score. Requires the same score type in all ID runs.
                                    ...
                                    t', 'average', 'ranks')
                                    
Common TOPP options:
  -ini <file>                       Use the given TOPP INI file
  -threads <n>                      Sets the number of threads allowed to be used by the TOPP tool (default: 
                                    '1')
  -write_ini <file>                 Writes the default configuration file
  --help                            Shows options
  --helphelp                        Shows all options (including advanced)

The following configuration subsections are valid:
 - PEPIons     PEPIons algorithm parameters
 - PEPMatrix   PEPMatrix algorithm parameters

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
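
As an illustration of how the options listed above combine, the following Python sketch assembles a ConsensusID command line. It only uses flags that appear in the help text. The helper name and the file names (merged.idXML, consensus.idXML) are hypothetical, and the sketch does not run the tool itself.

```python
import shlex

# Choices as listed for the -algorithm option.
ALGORITHMS = ("PEPMatrix", "PEPIons", "best", "worst", "average", "ranks")


def consensusid_command(in_file, out_file, algorithm="PEPMatrix",
                        considered_hits=0, min_support=0.0, count_empty=False):
    """Build the argument list for a ConsensusID call (hypothetical helper)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    if considered_hits < 0:
        raise ValueError("considered_hits must be >= 0 ('0' means all hits)")
    if not 0.0 <= min_support <= 1.0:
        raise ValueError("min_support must be between 0 and 1")

    cmd = [
        "ConsensusID",
        "-in", in_file,
        "-out", out_file,
        "-algorithm", algorithm,
        "-filter:considered_hits", str(considered_hits),
        "-filter:min_support", str(min_support),
    ]
    if count_empty:
        # Boolean flag: present means "count empty ID runs".
        cmd.append("-filter:count_empty")
    return cmd


cmd = consensusid_command("merged.idXML", "consensus.idXML",
                          algorithm="best", considered_hits=5, min_support=0.5)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a system where ConsensusID is installed and on the PATH.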

INI file documentation of this tool:

+ ConsensusID: Computes a consensus of peptide identifications of several identification engines.

version (default: 2.3.0): Version of the tool that generated this parameters file.

++ 1: Instance '1' section for 'ConsensusID'

in: input file (tag: input file; valid formats: *.idXML, *.featureXML, *.consensusXML)

out: output file (tag: output file; valid formats: *.idXML, *.featureXML, *.consensusXML)

rt_delta (default: 0.1): [idXML input only] Maximum allowed retention time deviation between identifications belonging to the same spectrum. (range: 0:∞)

mz_delta (default: 0.1): [idXML input only] Maximum allowed precursor m/z deviation between identifications belonging to the same spectrum. (range: 0:∞)

algorithm (default: PEPMatrix): Algorithm used for consensus scoring.
* PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities (scored by a substitution matrix). Requires PEPs as scores.
* PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ('shared peak count'). Requires PEPs as scores.
* best: For each peptide ID, use the best score of any search engine as the consensus score. Requires the same score type in all ID runs.
* worst: For each peptide ID, use the worst score of any search engine as the consensus score. Requires the same score type in all ID runs.
* average: For each peptide ID, use the average score of all search engines as the consensus. Requires the same score type in all ID runs.
* ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score. No requirements about score types.
(valid values: PEPMatrix, PEPIons, best, worst, average, ranks)

log: Name of log file (created only when specified)

debug (default: 0): Sets the debug level

threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool

no_progress (default: false): Disables progress logging to command line (valid values: true, false)

force (default: false): Overwrite tool specific checks. (valid values: true, false)

test (default: false): Enables the test mode (needed for internal use only) (valid values: true, false)

+++ filter: Options for filtering peptide hits

considered_hits (default: 0): The number of top hits in each ID run that are considered for consensus scoring ('0' for all hits). (range: 0:∞)

min_support (default: 0): For each peptide hit from an ID run, the fraction of other ID runs that must support that hit (otherwise it is removed). (range: 0:1)

count_empty (default: false): Count empty ID runs (i.e. those containing no peptide hit for the current spectrum) when calculating 'min_support'? (valid values: true, false)

+++ PEPIons: PEPIons algorithm parameters

mass_tolerance (default: 0.5): Maximum difference between fragment masses (in Da) for fragments to be considered 'shared' between peptides. (range: 0:∞)

min_shared (default: 2): The minimal number of 'shared' fragments (between two suggested peptides) that is necessary to evaluate the similarity based on shared peak count (SPC). (range: 1:∞)
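
How mass_tolerance and min_shared interact can be illustrated with a small Python sketch that counts 'shared' fragments between two lists of fragment masses. The function names and the example masses are hypothetical. The sketch does not show how ConsensusID derives the fragment masses or how it turns the count into a similarity value, because neither is described here.

```python
def shared_peak_count(masses_a, masses_b, mass_tolerance=0.5):
    """Count fragment pairs whose masses differ by at most mass_tolerance (Da).

    Uses a two-pointer sweep over the sorted lists, so each fragment is matched
    at most once.
    """
    a, b = sorted(masses_a), sorted(masses_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= mass_tolerance:
            shared += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match b[j] or anything after it
        else:
            j += 1  # b[j] is too small to match a[i] or anything after it
    return shared


def spc_is_evaluable(masses_a, masses_b, mass_tolerance=0.5, min_shared=2):
    """True if enough fragments are shared to evaluate an SPC-based similarity."""
    return shared_peak_count(masses_a, masses_b, mass_tolerance) >= min_shared


# Hypothetical fragment masses (in Da) of two candidate peptides.
frag_a = [100.0, 200.0, 300.0, 400.0]
frag_b = [100.2, 250.0, 300.4, 500.0]
n_shared = shared_peak_count(frag_a, frag_b)
```

With the default tolerance of 0.5 Da, two fragment pairs count as shared in this example, which just meets the default min_shared of 2.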

+++ PEPMatrix: PEPMatrix algorithm parameters

matrix (default: identity): Substitution matrix to use for alignment-based similarity scoring (valid values: identity, PAM30MS)

penalty (default: 5): Alignment gap penalty (the same value is used for gap opening and extension) (range: 1:∞)
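
The role of the gap penalty in alignment-based scoring can be illustrated with a generic global alignment (Needleman-Wunsch) sketch in Python. Because the same value is used for gap opening and extension, every gap position costs the same amount. Several things here are assumptions rather than documented behaviour of ConsensusID: the choice of a global alignment, the match and mismatch values standing in for the 'identity' matrix, the function name and the example sequences.

```python
def alignment_score(seq_a, seq_b, penalty=5, match=1, mismatch=0):
    """Global alignment score with a linear gap penalty.

    match/mismatch are placeholder values for an identity-style substitution
    matrix (assumption); `penalty` is charged for every gap position.
    """
    # prev[j]: best score for aligning the processed prefix of seq_a
    # with the first j residues of seq_b.
    prev = [-penalty * j for j in range(len(seq_b) + 1)]
    for i, res_a in enumerate(seq_a, start=1):
        curr = [-penalty * i]
        for j, res_b in enumerate(seq_b, start=1):
            substitution = match if res_a == res_b else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align res_a with res_b
                prev[j] - penalty,           # gap in seq_b
                curr[j - 1] - penalty,       # gap in seq_a
            ))
        prev = curr
    return prev[-1]


identical = alignment_score("PEPTIDEK", "PEPTIDEK")
one_substitution = alignment_score("PEPTIDEK", "PEPTLDEK")
one_gap = alignment_score("PEPTIDEK", "PEPTIDE")
```

Under these placeholder values a single substitution lowers the score by one, whereas a single gap costs the full penalty of 5, so sequences of different lengths are penalized much more strongly than sequences with a substituted residue.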