FidoAdapter

Runs the protein inference engine Fido.

Potential predecessor tools: PeptideIndexer, IDPosteriorErrorProbability (with `prob_correct` option), IDScoreSwitcher
$\longrightarrow$ FidoAdapter $\longrightarrow$
Potential successor tools: ProteinQuantifier (via `protein_groups` parameter)

This tool wraps the protein inference algorithm Fido (http://noble.gs.washington.edu/proj/fido/). Fido uses a Bayesian probabilistic model to group and score proteins based on peptide-spectrum matches. It was published in:

Serang et al.: Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data (J. Proteome Res., 2010).

By default, this adapter runs the Fido variant with parameter estimation (FidoChooseParameters), as recommended by the authors of Fido. However, it is also possible to run "pure" Fido by setting the prob:protein, prob:peptide and prob:spurious parameters, if appropriate values are known (e.g. from a previous Fido run). Other parameters, except for log2_states, are not applicable in this case.
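
The two modes differ only in whether the three prob: values are passed. The following is a minimal sketch of how a calling script might assemble the argument list. The helper name build_fido_command is hypothetical, the Fido executables are assumed to be globally available (so the executable paths are left out), and the requirement that all three probabilities be given together is an assumption of this sketch, not documented behavior of the tool.

```python
from typing import List, Optional


def build_fido_command(
    in_file: str,
    out_file: str,
    prob_protein: Optional[float] = None,
    prob_peptide: Optional[float] = None,
    prob_spurious: Optional[float] = None,
    log2_states: int = 0,
) -> List[str]:
    """Assemble a FidoAdapter argument list (hypothetical helper)."""
    cmd = ["FidoAdapter", "-in", in_file, "-out", out_file]
    probs = (prob_protein, prob_peptide, prob_spurious)
    if any(p is not None for p in probs):
        # "Pure" Fido mode. Assumption of this sketch: all three values
        # must be supplied together.
        if any(p is None for p in probs):
            raise ValueError("set all three prob:* values or none of them")
        cmd += [
            "-prob:protein", str(prob_protein),
            "-prob:peptide", str(prob_peptide),
            "-prob:spurious", str(prob_spurious),
        ]
    if log2_states:
        # log2_states is honored in both modes; 0 keeps the tool default.
        cmd += ["-log2_states", str(log2_states)]
    return cmd


# Default mode: parameter estimation via FidoChooseParameters.
estimate_cmd = build_fido_command("input.idXML", "output.idXML")

# "Pure" Fido with illustrative (not recommended) probability values.
pure_cmd = build_fido_command("input.idXML", "output.idXML", 0.9, 0.1, 0.01)
```

Either list could then be handed to subprocess.run(cmd, check=True) on a system where FidoAdapter is installed.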

Depending on the separate_runs setting, data from input files containing multiple protein identification runs (e.g. several replicates or different search engines) will be merged (default) or annotated separately.

Input format:

Care has to be taken to provide suitable input data for this adapter. In the peptide/protein identification results (e.g. coming from a database search engine), the proteins have to be annotated with target/decoy metadata. To achieve this, run PeptideIndexer.
In addition, the scores for peptide hits in the input data have to be posterior probabilities, as produced e.g. by PeptideProphet in the TPP or by IDPosteriorErrorProbability (with the prob_correct option switched on) in OpenMS. If scores are found to be posterior error probabilities (PEPs, lower is better), they are converted to posterior probabilities (higher is better) using "1 - PEP".
If the posterior (error) probabilities are stored in user parameters ("UserParam") in the idXML instead of in the score fields, IDScoreSwitcher can be used to rewrite the scores. (This may be the case e.g. if FalseDiscoveryRate and IDFilter were applied for FDR filtering prior to protein inference.)
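
The "1 - PEP" conversion mentioned above can be illustrated with a short sketch. The function name and the higher_score_better flag are hypothetical and chosen for illustration only; how the adapter itself detects the score type is not shown here.

```python
from typing import Iterable, List


def to_posterior_probabilities(
    scores: Iterable[float], higher_score_better: bool
) -> List[float]:
    """Return posterior probabilities for a list of peptide hit scores.

    If the scores are posterior error probabilities (lower is better),
    each one is converted using 1 - PEP; otherwise it is kept as-is.
    """
    converted = []
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"not a probability: {score}")
        converted.append(score if higher_score_better else 1.0 - score)
    return converted


# PEPs of 0.0, 0.25 and 1.0 become probabilities of 1.0, 0.75 and 0.0.
posteriors = to_posterior_probabilities([0.0, 0.25, 1.0], higher_score_better=False)
```

Scores outside the range [0, 1] are rejected in this sketch because they cannot be probabilities of either kind, which usually points to a search engine score that still needs to be processed by IDPosteriorErrorProbability.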

Output format:

The output of this tool is an augmented version of the input: The protein groups and accompanying posterior probabilities inferred by Fido are stored as "indistinguishable protein groups", attached to the protein identification run(s) of the input data. Also attached are meta values recording the Fido parameters (Fido_prob_protein, Fido_prob_peptide, Fido_prob_spurious).
The result can be passed to ProteinQuantifier via its protein_groups parameter, to have the protein grouping taken into account during quantification.
Note that if the input contains multiple identification runs and separate_runs is not set (the default), the identification data from all runs will be pooled for the Fido analysis and the result will only contain one (merged) identification run. This is the desired behavior if the protein grouping should be used by ProteinQuantifier.
When the greedy_group_resolution flag is set, "peptide to indistinguishable proteins" mappings will be unique in the output and the actual resolved groups are added as "protein groups", attached to the protein identification run(s) of the input data (in addition to the "indistinguishable protein groups").
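
To make the idea of greedy resolution more concrete, here is a simplified sketch in which every shared peptide is assigned to the group with the highest probability. This is only one plausible reading of "greedy resolution of shared peptides based on the protein probabilities", not the OpenMS implementation. The function name, the data layout and the tie-breaking rule are all assumptions.

```python
from typing import Dict, List


def resolve_shared_peptides(
    group_probs: Dict[str, float],
    peptide_to_groups: Dict[str, List[str]],
) -> Dict[str, str]:
    """Map each peptide to exactly one protein group (illustrative only)."""
    assignment = {}
    for peptide, groups in peptide_to_groups.items():
        # Pick the group with the highest probability; ties are broken
        # alphabetically by group name (an arbitrary choice of this sketch).
        assignment[peptide] = min(groups, key=lambda g: (-group_probs[g], g))
    return assignment


# Hypothetical groups and peptides, for illustration only.
group_probs = {"G1": 0.99, "G2": 0.60}
peptide_to_groups = {"PEPTIDEK": ["G1", "G2"], "SAMPLER": ["G2"]}
resolved = resolve_shared_peptides(group_probs, peptide_to_groups)
```

After resolution the shared peptide "PEPTIDEK" maps only to "G1", so every peptide points to a single group, which is the uniqueness property described above.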

Note: Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

FidoAdapter -- Runs the protein inference engine Fido.
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  FidoAdapter <options>

Options (mandatory options marked with '*'):
  -in <file>*                 Input: identification results (valid formats: 'idXML')
  -out <file>*                Output: identification results with scored/grouped proteins (valid formats: 
                              'idXML')
  -fido_executable <path>*    Path to the Fido executable to use; may be empty if the executable is globally 
                              available.
  -fidocp_executable <path>*  Path to the FidoChooseParameters executable to use; may be empty if the
                              executable is globally available.
  -separate_runs              Process multiple protein identification runs in the input separately, don't 
                              merge them. Merging results in loss of descriptive information of the single
                              protein identification runs.
  -greedy_group_resolution    Post-process Fido output with greedy resolution of shared peptides based on 
                              the protein probabilities. Also adds the resolved ambiguity groups to output.
  -no_cleanup                 Omit clean-up of peptide sequences (removal of non-letter characters,
                              replacement of I with L)
  -all_PSMs                   Consider all PSMs of each peptide, instead of only the best one
  -group_level                Perform inference on protein group level (instead of individual protein level).
                              This will lead to higher probabilities for (bigger) protein groups.
  -log2_states <number>       Binary logarithm of the max. number of connected states in a subgraph. For a 
                              value N, subgraphs that are bigger than 2^N will be split up, sacrificing
                              accuracy for runtime. '0' uses the default (18). (default: '0' min: '0')

Probability values for running Fido directly, i.e. without parameter estimation (in which case other settings,
except 'log2_states', are ignored):
  -prob:protein <value>       Protein prior probability ('gamma' parameter) (default: '0' min: '0')
  -prob:peptide <value>       Peptide emission probability ('alpha' parameter) (default: '0' min: '0')
  -prob:spurious <value>      Spurious peptide identification probability ('beta' parameter) (default: '0' 
                              min: '0')

                              
Common TOPP options:
  -ini <file>                 Use the given TOPP INI file
  -threads <n>                Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>           Writes the default configuration file
  --help                      Shows options
  --helphelp                  Shows all options (including advanced)
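
As a worked example for the log2_states option, the sketch below computes the subgraph size limit 2^N that follows from a given setting, treating '0' as "use the default (18)" as stated in the help text. The function and constant names are hypothetical.

```python
DEFAULT_LOG2_STATES = 18  # default stated in the help text for log2_states


def max_connected_states(log2_states: int = 0) -> int:
    """Return the subgraph size limit 2^N implied by a log2_states value."""
    if log2_states < 0:
        raise ValueError("log2_states must be >= 0")
    # A value of 0 means "use the default".
    effective = log2_states if log2_states > 0 else DEFAULT_LOG2_STATES
    return 2 ** effective


default_limit = max_connected_states()     # 2^18 = 262144
reduced_limit = max_connected_states(10)   # 2^10 = 1024
```

Lowering the value makes the tool split up more subgraphs, which trades accuracy for runtime.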

INI file documentation of this tool:

+FidoAdapter: Runs the protein inference engine Fido.

version: Version of the tool that generated this parameters file. (value: '2.3.0')

++1: Instance '1' section for 'FidoAdapter'

in: Input: identification results (tags: input file; restrictions: '*.idXML')

out: Output: identification results with scored/grouped proteins (tags: output file; restrictions: '*.idXML')

fido_executable: Path to the Fido executable to use; may be empty if the executable is globally available. (default: 'Fido'; tags: input file)

fidocp_executable: Path to the FidoChooseParameters executable to use; may be empty if the executable is globally available. (default: 'FidoChooseParameters'; tags: input file)

separate_runs: Process multiple protein identification runs in the input separately, don't merge them. Merging results in loss of descriptive information of the single protein identification runs. (default: 'false'; restrictions: 'true', 'false')

keep_zero_group: Keep the group of proteins with estimated probability of zero, which is otherwise removed (it may be very large) (default: 'false'; restrictions: 'true', 'false')

greedy_group_resolution: Post-process Fido output with greedy resolution of shared peptides based on the protein probabilities. Also adds the resolved ambiguity groups to output. (default: 'false'; restrictions: 'true', 'false')

no_cleanup: Omit clean-up of peptide sequences (removal of non-letter characters, replacement of I with L) (default: 'false'; restrictions: 'true', 'false')

all_PSMs: Consider all PSMs of each peptide, instead of only the best one (default: 'false'; restrictions: 'true', 'false')

group_level: Perform inference on protein group level (instead of individual protein level). This will lead to higher probabilities for (bigger) protein groups. (default: 'false'; restrictions: 'true', 'false')

accuracy: Accuracy level of start parameters. There is a trade-off between accuracy and runtime. Empty uses the default ('best'). (default: ''; restrictions: '', 'best', 'relaxed', 'sloppy')

log2_states: Binary logarithm of the max. number of connected states in a subgraph. For a value N, subgraphs that are bigger than 2^N will be split up, sacrificing accuracy for runtime. '0' uses the default (18). (default: '0'; restrictions: 0:∞)

log2_states_precalc: Like 'log2_states', but allows to set a separate limit for the precalculation (default: '0'; restrictions: 0:∞)

log: Name of log file (created only when specified)

debug: Sets the debug level (default: '0')

threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')

no_progress: Disables progress logging to command line (default: 'false'; restrictions: 'true', 'false')

force: Overwrite tool specific checks. (default: 'false'; restrictions: 'true', 'false')

test: Enables the test mode (needed for internal use only) (default: 'false'; restrictions: 'true', 'false')

+++prob: Probability values for running Fido directly, i.e. without parameter estimation (in which case other settings, except 'log2_states', are ignored)

protein: Protein prior probability ('gamma' parameter) (default: '0'; restrictions: 0:∞)

peptide: Peptide emission probability ('alpha' parameter) (default: '0'; restrictions: 0:∞)

spurious: Spurious peptide identification probability ('beta' parameter) (default: '0'; restrictions: 0:∞)