OpenMS  2.5.0
PercolatorAdapter

PercolatorAdapter prepares the input for Percolator, calls it, and integrates its output. Percolator (http://per-colator.com/) is a tool that applies semi-supervised learning to peptide identification from shotgun proteomics datasets.

Experimental classes:
This tool is a work in progress; its usage and input requirements might change.
potential predecessor tools → PercolatorAdapter → potential successor tools
PSMFeatureExtractor → PercolatorAdapter → IDFilter

Percolator is search-engine sensitive, i.e. its input features vary depending on the search engine. These features must be prepared beforehand. If you do not want to use the search-engine-specific features, set the generic-feature-set flag. The generic feature set incorporates the score attribute of each PSM, so make sure the score you want is set as the main score with IDScoreSwitcher. Be aware that the generic features may well perform worse than the search-engine-specific features.

You can also perform protein inference with Percolator by activating the protein-level-fdrs parameter; in that case you also need to set the enzyme parameter. For protein groups, only the q-value is read, since Percolator has a more elaborate FDR estimation. For proteins, the q-value is added as the main score and the PEP as a meta value. For PSMs, you can choose the main score. Peptide-level FDRs cannot be parsed and used yet.
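As a rough illustration of the options described above, the following Python sketch assembles a PercolatorAdapter command line. The function name, its arguments, and the file names are hypothetical and only serve the example. It also assumes that boolean parameters such as generic-feature-set are passed as bare flags, as the help text suggests for peptide-level-fdrs and protein-level-fdrs.

```python
from typing import List, Optional, Sequence


def build_percolator_adapter_command(
    in_files: Sequence[str],
    out_file: str,
    percolator_executable: str = "percolator",
    enzyme: str = "trypsin",
    score_type: str = "q-value",
    generic_feature_set: bool = False,
    fasta: Optional[str] = None,
    decoy_pattern: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for PercolatorAdapter (hypothetical helper)."""
    cmd = [
        "PercolatorAdapter",
        "-in", *in_files,
        "-out", out_file,
        "-percolator_executable", percolator_executable,
        "-enzyme", enzyme,
        "-score_type", score_type,
    ]
    if generic_feature_set:
        # Assumption: boolean parameters are given as bare flags.
        cmd.append("-generic-feature-set")
    if fasta is not None:
        # Protein inference: -fasta and -decoy-pattern are only valid
        # together with -protein-level-fdrs.
        cmd += ["-protein-level-fdrs", "-fasta", fasta]
        if decoy_pattern is not None:
            cmd += ["-decoy-pattern", decoy_pattern]
    return cmd


# Example with made-up file names.
example = build_percolator_adapter_command(
    ["features.idXML"], "percolated.idXML", fasta="db.fasta", decoy_pattern="DECOY_"
)
print(" ".join(example))
```

The resulting list could be handed to `subprocess.run(example, check=True)` on a system where PercolatorAdapter and Percolator are installed.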

The command line parameters of this tool are:

PercolatorAdapter -- Facilitate input to Percolator and reintegrate.
Full documentation: http://www.openms.de/documentation/TOPP_PercolatorAdapter.html
Version: 2.5.0-nightly-2020-03-06 Mar  7 2020, 01:22:16, Revision: 84b1398
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  PercolatorAdapter <options>

Options (mandatory options marked with '*'):
  -in <files>                           Input file(s) (valid formats: 'mzid', 'idXML')
  -in_decoy <files>                     Input decoy file(s) in case of separate searches (valid formats: 'mzi
                                        d', 'idXML')
  -in_osw <file>                        Input file in OSW format (valid formats: 'OSW')
  -out <file>*                          Output file (valid formats: 'idXML', 'mzid', 'osw')
  -out_type <type>                      Output file type -- default: determined from file extension or conten
                                        t. (valid: 'mzid', 'idXML', 'osw')
  -enzyme <enzyme>                      Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chy
                                        motrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin,trypsinp (default:
                                        'trypsin' valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek',
                                        'thermolysin', 'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n',
                                        'glu-c', 'trypsin', 'trypsinp')
  -percolator_executable <executable>*  The Percolator executable. Provide a full or relative path, or make 
                                        sure it can be found in your PATH environment.
  -peptide-level-fdrs                   Calculate peptide-level FDRs instead of PSM-level FDRs.
  -protein-level-fdrs                   Use the picked protein-level FDR to infer protein probabilities. Use 
                                        the -fasta option and -decoy-pattern to set the Fasta file and decoy
                                        pattern.
  -osw_level <osw_level>                OSW: Either "ms1", "ms2" or "transition"; the data level selected 
                                        for scoring. (default: 'ms2')
  -score_type <type>                    Type of the peptide main score (default: 'q-value' valid: 'q-value', 
                                        'pep', 'svm')
                                        
Common TOPP options:
  -ini <file>                           Use the given TOPP INI file
  -threads <n>                          Sets the number of threads allowed to be used by the TOPP tool (defau
                                        lt: '1')
  -write_ini <file>                     Writes the default configuration file
  --help                                Shows options
  --helphelp                            Shows all options (including advanced)
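
The restricted values listed in the help text can be checked before the tool is launched. The sketch below is only an illustration: the function name and the dictionary layout are hypothetical, and the allowed values are copied from the options above. It treats only -out as required, on the assumption that -percolator_executable has a usable default.

```python
from typing import Dict, List

# Allowed values, copied from the help text of PercolatorAdapter 2.5.0.
VALID_CHOICES: Dict[str, List[str]] = {
    "out_type": ["mzid", "idXML", "osw"],
    "enzyme": [
        "no_enzyme", "elastase", "pepsin", "proteinasek", "thermolysin",
        "chymotrypsin", "lys-n", "lys-c", "arg-c", "asp-n", "glu-c",
        "trypsin", "trypsinp",
    ],
    "osw_level": ["ms1", "ms2", "transition"],
    "score_type": ["q-value", "pep", "svm"],
}


def check_options(options: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the given option values."""
    problems = []
    # -out is marked as mandatory ('*') in the help text.
    if not options.get("out"):
        problems.append("missing mandatory option: -out")
    for name, allowed in VALID_CHOICES.items():
        value = options.get(name)
        if value is not None and value not in allowed:
            problems.append(f"-{name}: '{value}' is not a valid value")
    return problems


print(check_options({"out": "result.idXML", "enzyme": "lys-c"}))
print(check_options({"enzyme": "Trypsin", "score_type": "fdr"}))
```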

INI file documentation of this tool:

+ PercolatorAdapter: Facilitate input to Percolator and reintegrate.
  version: Version of the tool that generated this parameters file. (default: '2.5.0-nightly-2020-03-06')
  ++ 1: Instance '1' section for 'PercolatorAdapter'
    in: Input file(s) (input file; default: []; valid formats: *.mzid, *.idXML)
    in_decoy: Input decoy file(s) in case of separate searches (input file; default: []; valid formats: *.mzid, *.idXML)
    in_osw: Input file in OSW format (input file; default: none; valid formats: *.OSW)
    out: Output file (output file; default: none; valid formats: *.idXML, *.mzid, *.osw)
    out_pin: Write pin file (e.g., for debugging) (output file; default: none; valid formats: *.tsv)
    out_type: Output file type -- default: determined from file extension or content. (default: none; valid: mzid, idXML, osw)
    enzyme: Type of enzyme (default: 'trypsin'; valid: no_enzyme, elastase, pepsin, proteinasek, thermolysin, chymotrypsin, lys-n, lys-c, arg-c, asp-n, glu-c, trypsin, trypsinp)
    percolator_executable: The Percolator executable. Provide a full or relative path, or make sure it can be found in your PATH environment. (input file; default: 'percolator')
    peptide-level-fdrs: Calculate peptide-level FDRs instead of PSM-level FDRs. (default: 'false'; valid: true, false)
    protein-level-fdrs: Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy-pattern to set the Fasta file and decoy pattern. (default: 'false'; valid: true, false)
    osw_level: OSW: Either "ms1", "ms2" or "transition"; the data level selected for scoring. (default: 'ms2')
    score_type: Type of the peptide main score (default: 'q-value'; valid: q-value, pep, svm)
    generic-feature-set: Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. (default: 'false'; valid: true, false)
    subset-max-train: Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. (default: '0')
    cpos: Cpos, penalty for mistakes made on positive examples. Set by cross validation if not specified. (default: '0.0')
    cneg: Cneg, penalty for mistakes made on negative examples. Set by cross validation if not specified. (default: '0.0')
    testFDR: False discovery rate threshold for evaluating best cross validation result and the reported end result. (default: '0.01')
    trainFDR: False discovery rate threshold to define positive examples in training. Set to testFDR if 0. (default: '0.01')
    maxiter: Maximal number of iterations (default: '10')
    quick-validation: Quicker execution by reduced internal cross-validation. (default: 'false'; valid: true, false)
    weights: Output final weights to the given file (output file; default: none; valid formats: *.tsv)
    init-weights: Read initial weights from the given file (input file; default: none; valid formats: *.tsv)
    default-direction: The most informative feature given as the feature name, can be negated to indicate that a lower value is better. (default: none)
    verbose: Set verbosity of output: 0=no processing info, 5=all. (default: '2')
    unitnorm: Use unit normalization [0-1] instead of standard deviation normalization (default: 'false'; valid: true, false)
    test-each-iteration: Measure performance on test set each iteration (default: 'false'; valid: true, false)
    override: Override error check and do not fall back on default score vector in case of suspect score vector (default: 'false'; valid: true, false)
    seed: Setting seed of the random number generator. (default: '1')
    doc: Include description of correct features (default: '0')
    klammer: Retention time features calculated as in Klammer et al. Only available if -doc is set (default: 'false'; valid: true, false)
    fasta: Provide the fasta file as the argument to this flag, which will be used for protein grouping based on an in-silico digest (only valid if option -protein-level-fdrs is active). (input file; default: none; valid formats: *.FASTA)
    decoy-pattern: Define the text pattern to identify the decoy proteins and/or PSMs, set this up if the label that identifies the decoys in the database is not the default (only valid if option -protein-level-fdrs is active). (default: 'random')
    post-processing-tdc: Use target-decoy competition to assign q-values and PEPs. (default: 'false'; valid: true, false)
    train-best-positive: Enforce that, for each spectrum, at most one PSM is included in the positive set during each training iteration. If the user only provides one PSM per spectrum, this filter will have no effect. (default: 'false'; valid: true, false)
    ipf_max_peakgroup_pep: OSW/IPF: Assess transitions only for candidate peak groups until maximum posterior error probability. (default: '0.7')
    ipf_max_transition_isotope_overlap: OSW/IPF: Maximum isotope overlap to consider transitions in IPF. (default: '0.5')
    ipf_min_transition_sn: OSW/IPF: Minimum log signal-to-noise level to consider transitions in IPF. Set -1 to disable this filter. (default: '0.0')
    log: Name of log file (created only when specified) (default: none)
    debug: Sets the debug level (default: '0')
    threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')
    no_progress: Disables progress logging to command line (default: 'false'; valid: true, false)
    force: Overwrite tool specific checks. (default: 'false'; valid: true, false)
    test: Enables the test mode (needed for internal use only) (default: 'false'; valid: true, false)
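
Several parameter descriptions above state dependencies between parameters: trainFDR falls back to testFDR when set to 0, fasta and decoy-pattern are only valid with protein-level-fdrs, and klammer is only available if doc is set. The Python sketch below restates these rules as code. The function names are hypothetical, the dictionary is assumed to hold only explicitly set parameters, and "doc is set" is interpreted as a non-zero value, which is an assumption. This is not how the tool itself implements these checks.

```python
from typing import Dict, List, Union

ParamValue = Union[str, int, float, bool]


def effective_train_fdr(train_fdr: float, test_fdr: float) -> float:
    """trainFDR is 'set to testFDR if 0' according to its description."""
    return test_fdr if train_fdr == 0 else train_fdr


def check_dependencies(params: Dict[str, ParamValue]) -> List[str]:
    """Check explicitly set parameters against the documented dependencies."""
    problems = []
    if not params.get("protein-level-fdrs", False):
        for name in ("fasta", "decoy-pattern"):
            if name in params:
                problems.append(
                    f"-{name} is only valid if -protein-level-fdrs is active"
                )
    # Assumption: 'doc is set' means a non-zero doc value.
    if params.get("klammer", False) and not params.get("doc", 0):
        problems.append("-klammer is only available if -doc is set")
    return problems


print(effective_train_fdr(0.0, 0.01))
print(check_dependencies({"fasta": "db.fasta"}))
```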

Percolator is written by Lukas Käll (http://per-colator.com/ Copyright Lukas Käll lukas.kall@scilifelab.se)