Computes a protein identification score based on an aggregation of scores of identified peptides.

pot. predecessor tools                → ProteinInference →    pot. successor tools
CometAdapter (or other ID engines)                            PeptideIndexer
FalseDiscoveryRate
IDFilter

This tool counts and aggregates the scores of peptide sequences that match a protein accession. Only the top PSM for a peptide is used. By default, it also annotates each protein with the number of peptides used for the calculation (metavalue "nr_found_peptides"), which can be used for further filtering. Peptides with a probability of 0 are counted but ignored when the score aggregation method is "product".
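
As a rough illustration of the aggregation described above, the following Python sketch keeps the top PSM per peptide, groups peptides by protein accession, and combines their scores. It is not the OpenMS implementation: the function name `aggregate_protein_scores`, the tuple layout of the input, and the assumption that scores are probabilities (higher is better) are made up for this example. The 'best' method is left out because its behaviour is not described in this text.

```python
from collections import defaultdict
from math import prod


def aggregate_protein_scores(psms, method="maximum"):
    """psms: iterable of (peptide_sequence, score, [protein_accessions]).

    Returns {accession: (aggregated_score, nr_found_peptides)}.
    Hypothetical helper for illustration only.
    """
    # Keep only the top-scoring PSM per peptide sequence.
    best = {}
    for peptide, score, accessions in psms:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, tuple(accessions))

    # Collect the top score of every peptide under each protein it matches.
    per_protein = defaultdict(list)
    for score, accessions in best.values():
        for acc in accessions:
            per_protein[acc].append(score)

    result = {}
    for acc, scores in per_protein.items():
        if method == "maximum":
            agg = max(scores)
        elif method == "sum":
            agg = sum(scores)
        elif method == "product":
            # Zero-probability peptides are counted below but skipped here.
            # (If every score is 0, the empty product is 1 in this sketch.)
            agg = prod(s for s in scores if s > 0.0)
        else:
            raise ValueError(f"unsupported method: {method}")
        # len(scores) plays the role of the nr_found_peptides annotation.
        result[acc] = (agg, len(scores))
    return result


psms = [
    ("PEPTIDEK", 0.9, ["P1"]),
    ("PEPTIDEK", 0.5, ["P1"]),  # lower-scoring PSM of the same peptide, dropped
    ("SAMPLER", 0.5, ["P1", "P2"]),
    ("ZEROK", 0.0, ["P1"]),
]
scores = aggregate_protein_scores(psms, method="product")
```

In this toy input, protein P1 is supported by three peptides, but only the two non-zero scores enter its product.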

Note: Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
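
The conversion step mentioned in the note could be scripted along these lines. This is a minimal sketch that only builds the argument list and does not run anything. The file names are placeholders, and it assumes that IDFileConverter picks the output format from the extension of the `-out` file.

```python
import shlex


def idfileconverter_command(src, dst):
    """Build (but do not run) an IDFileConverter call converting src to dst."""
    # -in and -out are the usual TOPP input/output options; the target format
    # is assumed to follow from the extension of dst.
    return ["IDFileConverter", "-in", src, "-out", dst]


# Hypothetical file names, for illustration only.
to_idxml = idfileconverter_command("search_results.mzid", "search_results.idXML")
print(shlex.join(to_idxml))
```

The resulting list can be handed to `subprocess.run` on a system where OpenMS is installed.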

The command line parameters of this tool are:

ProteinInference -- Protein inference based on an aggregation of the scores of the identified peptides.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_ProteinInference.html
Version: 3.2.0 Nov 26 2024, 13:16:38, Revision: 962e60f
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  ProteinInference <options>

Options (mandatory options marked with '*'):
  -in <file>*                                                 Input file(s) (valid formats: 'idXML',
                                                              'consensusXML')
  -out <file>*                                                Output file (valid formats: 'idXML',
                                                              'consensusXML')
  -out_type <file>                                            Output file type (valid: 'idXML',
                                                              'consensusXML')
  -merge_runs <choice>                                        If your idXML contains multiple runs, merge
                                                              them beforehand? Otherwise performs inference
                                                              separately per run. (default: 'all') (valid:
                                                              'no', 'all')
  -protein_fdr <option>                                       Additionally calculate the target-decoy FDR on
                                                              protein-level after inference (default:
                                                              'false') (valid: 'true', 'false')
                                                              

Merging:
  -Merging:annotate_origin <choice>                           If true, adds a map_index MetaValue to the
                                                              PeptideIDs to annotate the IDRun they came from.
                                                              (default: 'true') (valid: 'true', 'false')
  -Merging:allow_disagreeing_settings                         Force merging of disagreeing runs. Use at your
                                                              own risk.

Algorithm:
  -Algorithm:min_peptides_per_protein <number>                Minimal number of peptides needed for a protein
                                                              identification. If set to zero, unmatched
                                                              proteins get a score of -Infinity. If bigger than
                                                              zero, proteins with less peptides are filtered
                                                              and evidences removed from the PSMs. PSMs that
                                                              do not reference any proteins anymore are
                                                              removed but the spectrum info is kept. (default:
                                                              '1') (min: '0')
  -Algorithm:score_aggregation_method <choice>                How to aggregate scores of peptides matching
                                                              to the same protein? (default: 'best') (valid:
                                                              'best', 'product', 'sum', 'maximum')
  -Algorithm:treat_charge_variants_separately <choice>        If this is true, different charge variants of
                                                              the same peptide sequence count as individual
                                                              evidences. (default: 'true') (valid: 'true',
                                                              'false')
  -Algorithm:treat_modification_variants_separately <choice>  If this is true, different modification
                                                              variants of the same peptide sequence count as
                                                              individual evidences. (default: 'true') (valid:
                                                              'true', 'false')
  -Algorithm:use_shared_peptides <choice>                     If this is true, shared peptides are used as
                                                              evidences. Note: shared_peptides are not
                                                              deleted and potentially resolved in
                                                              postprocessing as well. (default: 'true') (valid:
                                                              'true', 'false')
  -Algorithm:skip_count_annotation                            If this is set, peptide counts won't be
                                                              annotated at the proteins.
  -Algorithm:annotate_indistinguishable_groups <choice>       If this is true, calculates and annotates
                                                              indistinguishable protein groups. (default:
                                                              'true') (valid: 'true', 'false')
  -Algorithm:greedy_group_resolution                          If this is true, shared peptides will be
                                                              associated to best proteins only (i.e. become
                                                              potentially quantifiable razor peptides).

                                                              
Common TOPP options:
  -ini <file>                                                 Use the given TOPP INI file
  -threads <n>                                                Sets the number of threads allowed to be used 
                                                              by the TOPP tool (default: '1')
  -write_ini <file>                                           Writes the default configuration file
  --help                                                      Shows options
  --helphelp                                                  Shows all options (including advanced)
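
To show how the options above combine on one command line, here is a small Python sketch that assembles a ProteinInference call without running it. The helper name `protein_inference_command` and the file names are hypothetical. The sketch assumes that several input files are passed as space-separated values after `-in`, and that options shown without a `<...>` placeholder in the help are bare flags.

```python
import shlex


def protein_inference_command(in_files, out_file, options=()):
    """Assemble a ProteinInference argument list (illustrative helper).

    options: iterable of (name, value) pairs; value None marks a bare flag.
    """
    cmd = ["ProteinInference", "-in", *in_files, "-out", out_file]
    for name, value in options:
        cmd.append(f"-{name}")
        if value is not None:
            cmd.append(str(value))
    return cmd


cmd = protein_inference_command(
    ["run1.idXML", "run2.idXML"],  # placeholder input files
    "proteins.idXML",              # placeholder output file
    options=[
        ("merge_runs", "all"),
        ("Algorithm:score_aggregation_method", "product"),
        ("Algorithm:min_peptides_per_protein", 2),
        ("Algorithm:greedy_group_resolution", None),
    ],
)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems when it is later passed to `subprocess.run`.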

INI file documentation of this tool:

+ProteinInference: Protein inference based on an aggregation of the scores of the identified peptides.

version (default: '3.2.0') Version of the tool that generated this parameters file.

++1: Instance '1' section for 'ProteinInference'

in (default: '[]') (required) input file(s) (type: input file) (valid formats: '*.idXML', '*.consensusXML')

out (default: '') (required) output file (type: output file) (valid formats: '*.idXML', '*.consensusXML')

out_type (default: '') output file type (valid: 'idXML', 'consensusXML')

merge_runs (default: 'all') If your idXML contains multiple runs, merge them beforehand? Otherwise performs inference separately per run. (valid: 'no', 'all')

protein_fdr (default: 'false') Additionally calculate the target-decoy FDR on protein-level after inference (valid: 'true', 'false')

conservative_fdr (default: 'true') Use (D+1)/(T) instead of (D+1)/(T+D) for reporting protein FDRs. (valid: 'true', 'false')

picked_fdr (default: 'true') Use picked protein FDRs. (valid: 'true', 'false')

picked_decoy_string (default: '') If using picked protein FDRs, which decoy string was used? Leave blank for auto-detection.

picked_decoy_prefix (default: 'prefix') If using picked protein FDRs, was the decoy string a prefix or suffix? Ignored during auto-detection. (valid: 'prefix', 'suffix')

log (default: '') Name of log file (created only when specified)

debug (default: '0') Sets the debug level

threads (default: '1') Sets the number of threads allowed to be used by the TOPP tool

no_progress (default: 'false') Disables progress logging to command line (valid: 'true', 'false')

force (default: 'false') Overrides tool-specific checks (valid: 'true', 'false')

test (default: 'false') Enables the test mode (needed for internal use only) (valid: 'true', 'false')
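
The two formulas named in the conservative_fdr description above can be compared with a few lines of Python. This is only a sketch of the arithmetic, with T and D taken to be the numbers of target and decoy proteins above a score threshold. It does not model picked FDRs, and the function name `protein_fdr` is hypothetical.

```python
def protein_fdr(targets, decoys, conservative=True):
    """Return (D+1)/T if conservative, otherwise (D+1)/(T+D)."""
    denominator = targets if conservative else targets + decoys
    if denominator == 0:
        raise ValueError("need at least one protein above the threshold")
    return (decoys + 1) / denominator


# Example counts chosen for illustration: 100 targets, 4 decoys.
conservative = protein_fdr(100, 4, conservative=True)   # 5 / 100
standard = protein_fdr(100, 4, conservative=False)      # 5 / 104
```

For the same counts the conservative variant is never smaller than the standard one, since its denominator omits the decoys.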

+++Merging

annotate_origin (default: 'true') If true, adds a map_index MetaValue to the PeptideIDs to annotate the IDRun they came from. (valid: 'true', 'false')

allow_disagreeing_settings (default: 'false') Force merging of disagreeing runs. Use at your own risk. (valid: 'true', 'false')

+++Algorithm

min_peptides_per_protein (default: '1') Minimal number of peptides needed for a protein identification. If set to zero, unmatched proteins get a score of -Infinity. If bigger than zero, proteins with less peptides are filtered and evidences removed from the PSMs. PSMs that do not reference any proteins anymore are removed but the spectrum info is kept. (min: '0')

score_aggregation_method (default: 'best') How to aggregate scores of peptides matching to the same protein? (valid: 'best', 'product', 'sum', 'maximum')

treat_charge_variants_separately (default: 'true') If this is true, different charge variants of the same peptide sequence count as individual evidences. (valid: 'true', 'false')

treat_modification_variants_separately (default: 'true') If this is true, different modification variants of the same peptide sequence count as individual evidences. (valid: 'true', 'false')

use_shared_peptides (default: 'true') If this is true, shared peptides are used as evidences. Note: shared_peptides are not deleted and potentially resolved in postprocessing as well. (valid: 'true', 'false')

skip_count_annotation (default: 'false') If this is set, peptide counts won't be annotated at the proteins. (valid: 'true', 'false')

annotate_indistinguishable_groups (default: 'true') If this is true, calculates and annotates indistinguishable protein groups. (valid: 'true', 'false')

greedy_group_resolution (default: 'false') If this is true, shared peptides will be associated to best proteins only (i.e. become potentially quantifiable razor peptides). (valid: 'true', 'false')
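
The effect of treat_charge_variants_separately and treat_modification_variants_separately on the peptide count can be pictured with the following Python sketch. It is an assumption-laden illustration, not the tool's code: the function name `count_evidences` is hypothetical, and modifications are assumed to be written in parentheses inside the sequence string.

```python
import re


def count_evidences(psms, charge_variants_separately=True,
                    modification_variants_separately=True):
    """psms: iterable of (modified_sequence, charge) tuples for one protein.

    Modifications are assumed to look like 'PEPM(Oxidation)TIDEK'.
    """
    evidences = set()
    for sequence, charge in psms:
        if not modification_variants_separately:
            # Strip parenthesised modifications to get the bare sequence.
            sequence = re.sub(r"\([^)]*\)", "", sequence)
        key = (sequence, charge) if charge_variants_separately else (sequence,)
        evidences.add(key)
    return len(evidences)


psms = [
    ("PEPMTIDEK", 2),
    ("PEPMTIDEK", 3),
    ("PEPM(Oxidation)TIDEK", 2),
]
both = count_evidences(psms)
neither = count_evidences(psms, charge_variants_separately=False,
                          modification_variants_separately=False)
```

With both switches on, the three PSMs count as three evidences; with both off they collapse to one, which matters when the count is compared against min_peptides_per_protein.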