Runs the protein inference engine Fido.
| potential predecessor tools | FidoAdapter | potential successor tools |
| PeptideIndexer | | ProteinQuantifier (via protein_groups parameter) |
| IDPosteriorErrorProbability (with prob_correct option) | | |
| IDScoreSwitcher | | |
This tool wraps the protein inference algorithm Fido (http://noble.gs.washington.edu/proj/fido/). Fido uses a Bayesian probabilistic model to group and score proteins based on peptide-spectrum matches. It was published in:
Serang et al.: Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data (J. Proteome Res., 2010).
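The model behind Fido can be illustrated with a small brute-force computation. The Python sketch below is not Fido's algorithm. Fido marginalizes far more efficiently and splits large subgraphs (cf. log2_states). The sketch enumerates all 2^n presence states of the proteins in one tiny protein-peptide graph and uses the three probabilities this adapter exposes: gamma (protein prior, prob:protein), alpha (peptide emission, prob:peptide) and beta (spurious peptide, prob:spurious). The function name and data layout are hypothetical. Using each peptide's input probability directly as a soft likelihood is a simplifying assumption.

```python
from itertools import product


def protein_posteriors(proteins, peptide_proteins, peptide_prob, alpha, beta, gamma):
    """Protein posteriors for one small graph by brute-force enumeration (illustration only).

    proteins:         list of protein accessions in the graph
    peptide_proteins: dict peptide -> set of proteins containing that peptide
    peptide_prob:     dict peptide -> input posterior probability (higher is better)
    alpha, beta, gamma: emission, spurious and protein prior probabilities
    """
    weighted = {p: 0.0 for p in proteins}
    total = 0.0
    # Enumerate every present/absent assignment of the proteins (2^n states).
    for states in product((False, True), repeat=len(proteins)):
        present = {p for p, on in zip(proteins, states) if on}
        # Prior: each protein is present independently with probability gamma.
        weight = gamma ** len(present) * (1.0 - gamma) ** (len(proteins) - len(present))
        for pep, prots in peptide_proteins.items():
            k = len(prots & present)  # number of present parent proteins
            # The peptide is absent only if no present parent emits it and it is not spurious.
            p_emitted = 1.0 - (1.0 - beta) * (1.0 - alpha) ** k
            q = peptide_prob[pep]
            # Simplifying assumption: the input probability acts as a soft likelihood.
            weight *= q * p_emitted + (1.0 - q) * (1.0 - p_emitted)
        total += weight
        for p in present:
            weighted[p] += weight
    return {p: weighted[p] / total for p in proteins}


post = protein_posteriors(
    ["A", "B", "C"],
    {"PEPTIDEK": {"A", "B"}, "UNIQUEK": {"A"}},
    {"PEPTIDEK": 0.95, "UNIQUEK": 0.9},
    alpha=0.5, beta=0.01, gamma=0.5,
)
```

In this example, protein C has no peptides, so its posterior stays at the prior gamma. Protein A has a unique peptide and therefore scores higher than B, which only shares a peptide with A.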
By default, this adapter runs the Fido variant with parameter estimation (FidoChooseParameters), as recommended by the authors of Fido. However, it is also possible to run "pure" Fido by setting the prob:protein, prob:peptide and prob:spurious parameters, if appropriate values are known (e.g. from a previous Fido run). Other parameters, except for log2_states, are not applicable in this case.
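The choice between the two modes can be sketched as a helper that assembles a FidoAdapter command line. The flags come from the tool's --help output. The helper itself (fido_adapter_args) is hypothetical. It assumes the Fido and FidoChooseParameters executables are on the PATH unless explicit paths are passed.

```python
def fido_adapter_args(in_file, out_file, prob_protein=None, prob_peptide=None,
                      prob_spurious=None, log2_states=0,
                      fido_executable=None, fidocp_executable=None):
    """Build a FidoAdapter argument list (hypothetical helper; flags as in --help)."""
    args = ["FidoAdapter", "-in", in_file, "-out", out_file]
    # Explicit executable paths are optional here (assumed to be on the PATH otherwise).
    if fido_executable:
        args += ["-fido_executable", fido_executable]
    if fidocp_executable:
        args += ["-fidocp_executable", fidocp_executable]
    probs = (prob_protein, prob_peptide, prob_spurious)
    if any(p is not None for p in probs):
        # "Pure" Fido: all three probabilities are needed; parameter estimation is skipped.
        if any(p is None for p in probs):
            raise ValueError("set prob_protein, prob_peptide and prob_spurious together")
        args += ["-prob:protein", str(prob_protein),
                 "-prob:peptide", str(prob_peptide),
                 "-prob:spurious", str(prob_spurious)]
    if log2_states:
        # 0 means "use the default (18)", so the flag is only added for other values.
        args += ["-log2_states", str(log2_states)]
    return args
```

Without probabilities, the adapter falls back to its default parameter estimation. The resulting list could then be passed to subprocess.run(args, check=True).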
Depending on the separate_runs setting, data from input files containing multiple protein identification runs (e.g. several replicates or different search engines) will be merged (default) or annotated separately.
Input format:
Care has to be taken to provide suitable input data for this adapter. In the peptide/protein identification results (e.g. coming from a database search engine), the proteins have to be annotated with target/decoy meta data. To achieve this, run PeptideIndexer.
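Before running the adapter, you can check that every protein hit carries the target/decoy annotation. The sketch below assumes that PeptideIndexer stores this annotation as a UserParam named target_decoy on each ProteinHit element of the idXML file, as in the sample used here. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def missing_target_decoy(idxml_text):
    """Return accessions of ProteinHit elements that lack a 'target_decoy' UserParam."""
    root = ET.fromstring(idxml_text)
    missing = []
    # iter() walks all descendants, so the exact nesting of ProteinHit does not matter.
    for hit in root.iter("ProteinHit"):
        names = {up.get("name") for up in hit.findall("UserParam")}
        if "target_decoy" not in names:
            missing.append(hit.get("accession"))
    return missing


sample = ('<IdXML><IdentificationRun><ProteinIdentification>'
          '<ProteinHit id="PH_0" accession="P1" score="0">'
          '<UserParam type="string" name="target_decoy" value="target"/></ProteinHit>'
          '<ProteinHit id="PH_1" accession="DECOY_P1" score="0"/>'
          '</ProteinIdentification></IdentificationRun></IdXML>')
```

A non-empty result suggests that PeptideIndexer has not been run on the file yet. For a file on disk, ET.parse(path).getroot() could replace ET.fromstring.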
In addition, the scores for peptide hits in the input data have to be posterior probabilities, as produced e.g. by PeptideProphet in the TPP or by IDPosteriorErrorProbability (with the prob_correct option switched on) in OpenMS. If the scores are found to be posterior error probabilities (PEPs, lower is better), they are converted to posterior probabilities (higher is better) using "1 - PEP".
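The conversion described above is a simple transformation. In this sketch, the higher_score_better argument is a hypothetical stand-in for however the score orientation is determined. It does not reproduce the adapter's actual detection logic.

```python
def to_posterior_probabilities(scores, higher_score_better):
    """Return posterior probabilities (higher is better) for a list of PSM scores."""
    for s in scores:
        # Both PEPs and posterior probabilities must lie in [0, 1].
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is not a probability")
    if higher_score_better:
        return list(scores)  # already posterior probabilities
    return [1.0 - pep for pep in scores]  # PEP -> 1 - PEP
```

The range check catches search-engine scores (e.g. an E-value or XCorr) that were passed in by mistake. Such scores cannot be used as probabilities at all.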
If the posterior (error) probabilities are stored in user parameters ("UserParam") in the idXML instead of in the score fields, IDScoreSwitcher can be used to rewrite the scores. (This may be the case e.g. if FalseDiscoveryRate and IDFilter were applied for FDR filtering prior to protein inference.)
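The effect of IDScoreSwitcher on each hit can be pictured as swapping the score with a stored meta value. The sketch below uses a hypothetical dictionary representation of peptide hits and a hypothetical meta value key (PEP). It is not the tool's implementation, only an illustration of the effect. The previous score is kept as a meta value under a hypothetical name.

```python
def switch_score(hits, meta_key, old_score_key="old_score"):
    """Return copies of hits whose score is taken from the meta value meta_key."""
    switched = []
    for hit in hits:
        meta = dict(hit["meta"])            # copy so the input is left untouched
        meta[old_score_key] = hit["score"]  # keep the previous score as a meta value
        switched.append({**hit, "score": float(meta[meta_key]), "meta": meta})
    return switched


hits = [{"sequence": "PEPTIDEK", "score": 0.01, "meta": {"PEP": "0.2"}}]
out = switch_score(hits, "PEP")
```

After switching, the scores are PEPs (lower is better). The adapter would then convert them using "1 - PEP", as described above.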
Output format:
The output of this tool is an augmented version of the input: The protein groups and accompanying posterior probabilities inferred by Fido are stored as "indistinguishable protein groups", attached to the protein identification run(s) of the input data. Also attached are meta values recording the Fido parameters (Fido_prob_protein, Fido_prob_peptide, Fido_prob_spurious).
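Because the parameters are recorded in the output, a later run can reuse them for "pure" Fido. The sketch below assumes that the meta values appear as UserParam children of the ProteinIdentification element in the output idXML. The function and mapping names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Meta value name in the output -> FidoAdapter flag for a "pure" Fido run.
FIDO_KEYS = {"Fido_prob_protein": "-prob:protein",
             "Fido_prob_peptide": "-prob:peptide",
             "Fido_prob_spurious": "-prob:spurious"}


def fido_params_from_idxml(idxml_text):
    """Collect Fido_prob_* UserParams of the first ProteinIdentification as CLI flags."""
    root = ET.fromstring(idxml_text)
    run = root.find(".//ProteinIdentification")
    if run is None:
        return []
    args = []
    # Only direct UserParam children of the run are inspected (not those of ProteinHits).
    for up in run.findall("UserParam"):
        flag = FIDO_KEYS.get(up.get("name"))
        if flag is not None:
            args += [flag, up.get("value")]
    return args
```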
The result can be passed to ProteinQuantifier via its protein_groups parameter to have the protein grouping taken into account during quantification.
Note that if the input contains multiple identification runs and separate_runs is not set (the default), the identification data from all runs will be pooled for the Fido analysis and the result will only contain one (merged) identification run. This is the desired behavior if the protein grouping should be used by ProteinQuantifier.
When the greedy_group_resolution flag is set, "peptide to indistinguishable proteins" mappings will be unique in the output, and the actual resolved groups are added as "protein groups", attached to the protein identification run(s) of the input data (in addition to the "indistinguishable protein groups").
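The idea behind greedy group resolution can be sketched simply: each peptide is kept only for the most probable indistinguishable protein group it maps to. This is a simplified illustration with hypothetical names. The OpenMS implementation may differ in details such as tie handling or re-ranking after assignments.

```python
def greedy_resolve(group_probs, peptide_groups):
    """Map each peptide to its most probable group (simplified greedy resolution).

    group_probs:    dict group_id -> posterior probability of the group
    peptide_groups: dict peptide -> set of group_ids the peptide maps to
    """
    # Rank groups by probability; sorted() is stable, so ties keep the input order.
    order = sorted(group_probs, key=lambda g: group_probs[g], reverse=True)
    rank = {g: i for i, g in enumerate(order)}
    # Each peptide goes to the best-ranked group among those it maps to.
    return {pep: min(groups, key=lambda g: rank[g])
            for pep, groups in peptide_groups.items()}


res = greedy_resolve({"G1": 0.9, "G2": 0.4},
                     {"SHAREDK": {"G1", "G2"}, "ONLYK": {"G2"}})
```

The shared peptide SHAREDK ends up with G1, while G2 keeps only its unique peptide ONLYK.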
The command line parameters of this tool are:
FidoAdapter -- Runs the protein inference engine Fido.
Version: 2.3.0 Jan 9 2018, 17:46:23, Revision: 38ae115

Usage:
  FidoAdapter <options>

Options (mandatory options marked with '*'):
  -in <file>*                  Input: identification results (valid formats: 'idXML')
  -out <file>*                 Output: identification results with scored/grouped proteins (valid formats: 'idXML')
  -fido_executable <path>*     Path to the Fido executable to use; may be empty if the executable is globally available.
  -fidocp_executable <path>*   Path to the FidoChooseParameters executable to use; may be empty if the executable is globally available.
  -separate_runs               Process multiple protein identification runs in the input separately, don't merge them. Merging results in loss of descriptive information of the single protein identification runs.
  -greedy_group_resolution     Post-process Fido output with greedy resolution of shared peptides based on the protein probabilities. Also adds the resolved ambiguity groups to output.
  -no_cleanup                  Omit clean-up of peptide sequences (removal of non-letter characters, replacement of I with L)
  -all_PSMs                    Consider all PSMs of each peptide, instead of only the best one
  -group_level                 Perform inference on protein group level (instead of individual protein level). This will lead to higher probabilities for (bigger) protein groups.
  -log2_states <number>        Binary logarithm of the max. number of connected states in a subgraph. For a value N, subgraphs that are bigger than 2^N will be split up, sacrificing accuracy for runtime. '0' uses the default (18). (default: '0' min: '0')

  Probability values for running Fido directly, i.e. without parameter estimation (in which case other settings, except 'log2_states', are ignored):
  -prob:protein <value>        Protein prior probability ('gamma' parameter) (default: '0' min: '0')
  -prob:peptide <value>        Peptide emission probability ('alpha' parameter) (default: '0' min: '0')
  -prob:spurious <value>       Spurious peptide identification probability ('beta' parameter) (default: '0' min: '0')

Common TOPP options:
  -ini <file>                  Use the given TOPP INI file
  -threads <n>                 Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>            Writes the default configuration file
  --help                       Shows options
  --helphelp                   Shows all options (including advanced)
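The effect of the -no_cleanup and -all_PSMs options can be illustrated with a small sketch. By default, peptide sequences are cleaned (non-letter characters removed, I replaced with L) and only the best PSM per resulting peptide is kept. The helper names are hypothetical. The sketch assumes plain sequences without modification annotations, because removing non-letters from a string like "M(Oxidation)" would leave the letters of the modification name behind.

```python
import re


def clean_sequence(seq):
    """Remove non-letter characters and replace I with L (default, i.e. without -no_cleanup)."""
    return re.sub(r"[^A-Za-z]", "", seq).replace("I", "L")


def best_psm_per_peptide(psms):
    """Keep the highest probability per cleaned peptide (default, i.e. without -all_PSMs).

    psms: iterable of (sequence, posterior probability) pairs
    """
    best = {}
    for seq, prob in psms:
        key = clean_sequence(seq)
        if key not in best or prob > best[key]:
            best[key] = prob
    return best


best = best_psm_per_peptide([("PEPTIDEK", 0.8), ("PEPTLDEK", 0.95), ("AC-DK", 0.5)])
```

Because I and L have the same mass, PEPTIDEK and PEPTLDEK collapse into one peptide after clean-up. Only the better of their two PSMs is kept.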
OpenMS / TOPP release 2.3.0 | Documentation generated on Tue Jan 9 2018 18:22:06 using doxygen 1.8.13 |