PercolatorAdapter

PercolatorAdapter prepares the input for Percolator, calls it, and integrates its output. Percolator (http://per-colator.com/) is a tool that applies semi-supervised learning to peptide identification from shotgun proteomics datasets.

Experimental classes:
This tool is a work in progress; its usage and input requirements might change.
pot. predecessor tools  -->  PercolatorAdapter  -->  pot. successor tools
PSMFeatureExtractor                                  IDFilter

Percolator is search-engine sensitive, i.e. its input features vary depending on the search engine. These features must be prepared beforehand (e.g. with PSMFeatureExtractor). If you do not want to use the search-engine-specific features, use the generic-feature-set flag. It will incorporate the score attribute of each PSM, so make sure the score you want is set as the main score with TOPP_IDScoreSwitcher. Be aware that you may well experience a performance loss compared to the search-engine-specific features.
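As a rough illustration of the choice described above, the following Python sketch assembles a PercolatorAdapter command line with or without `-generic-feature-set`. The option names are taken from the parameter listing on this page; the helper name `build_percolator_command` and all file names are hypothetical, and the sketch assumes that `-generic-feature-set` is passed as a bare flag in the same way as `-peptide-level-fdrs`. The command is only constructed, not executed.

```python
from typing import List, Sequence


def build_percolator_command(
    in_files: Sequence[str],
    out_file: str,
    percolator_executable: str,
    generic_feature_set: bool = False,
) -> List[str]:
    """Assemble (but do not run) a PercolatorAdapter call.

    Hypothetical helper for illustration; only the option names come from
    the tool's documented parameters.
    """
    if not in_files:
        raise ValueError("-in is mandatory: pass at least one input file")
    cmd = [
        "PercolatorAdapter",
        "-in", *in_files,
        "-out", out_file,
        "-percolator_executable", percolator_executable,
    ]
    if generic_feature_set:
        # Use only generic features (including each PSM's main score)
        # instead of search-engine-specific ones.
        cmd.append("-generic-feature-set")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names: input with search-engine-specific features.
    print(" ".join(build_percolator_command(
        ["features.idXML"], "percolated.idXML", "percolator")))
    # Hypothetical file names: input without them, generic features only.
    print(" ".join(build_percolator_command(
        ["search_results.idXML"], "percolated.idXML", "percolator",
        generic_feature_set=True)))
```

Returning the argument list rather than a single string keeps the pieces easy to inspect and avoids shell quoting issues if the list is later handed to a process launcher.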

The command line parameters of this tool are:

PercolatorAdapter -- Facilitate input to Percolator and reintegrate.
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  PercolatorAdapter <options>

Options (mandatory options marked with '*'):
  -in <files>*                          Input file(s) (valid formats: 'mzid', 'idXML')
  -in_decoy <files>                     Input decoy file(s) in case of separate searches (valid formats:
                                        'mzid', 'idXML')
  -out <file>                           Output file in idXML format (valid formats: 'idXML')
  -mzid_out <file>                      Output file in mzid format (valid formats: 'mzid')
  -enzyme <enzyme>                      Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,
                                        chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin (default: 'trypsin'
                                        valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek', 'thermolysin',
                                        'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n', 'glu-c', 'trypsin')
  -percolator_executable <executable>*  Percolator executable of the installation e.g. 'percolator.exe'
  -peptide-level-fdrs                   Calculate peptide-level FDRs instead of PSM-level FDRs.
  -protein-level-fdrs                   Use the picked protein-level FDR to infer protein probabilities. Use 
                                        the -fasta option and -decoy-pattern to set the Fasta file and decoy
                                        pattern.
                                        
Common TOPP options:
  -ini <file>                           Use the given TOPP INI file
  -threads <n>                          Sets the number of threads allowed to be used by the TOPP tool
                                        (default: '1')
  -write_ini <file>                     Writes the default configuration file
  --help                                Shows options
  --helphelp                            Shows all options (including advanced)
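
The common TOPP options above suggest a two-step workflow: write the default configuration with `-write_ini`, edit it, then pass it back with `-ini`. A minimal Python sketch of that sequence follows, assuming `PercolatorAdapter` is on the `PATH`. The helper names `run` and `ini_workflow` and all file names are hypothetical, and commands are only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess
from typing import List


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command line; execute it only when dry_run is False."""
    line = " ".join(shlex.quote(part) for part in cmd)
    print(line)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
    return line


def ini_workflow(ini_file: str, in_file: str, out_file: str,
                 dry_run: bool = True) -> List[str]:
    """Two calls: dump the default INI, then run the tool with that INI."""
    return [
        # Step 1: write the default configuration file.
        run(["PercolatorAdapter", "-write_ini", ini_file], dry_run),
        # Step 2 (after editing the INI, e.g. percolator_executable):
        # use the INI for the actual run; options given on the command
        # line here supply the input and output files.
        run(["PercolatorAdapter", "-ini", ini_file,
             "-in", in_file, "-out", out_file], dry_run),
    ]


if __name__ == "__main__":
    # Hypothetical file names.
    ini_workflow("percolator_adapter.ini", "features.idXML", "percolated.idXML")
```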

INI file documentation of this tool:

Legend: '*' marks a required parameter.

PercolatorAdapter  Facilitate input to Percolator and reintegrate.
  version  Version of the tool that generated this parameters file. (default: '2.3.0')
  1  Instance '1' section for 'PercolatorAdapter'
    in*  Input file(s) (default: []; input file; valid: *.mzid,*.idXML)
    in_decoy  Input decoy file(s) in case of separate searches (default: []; input file; valid: *.mzid,*.idXML)
    out  Output file in idXML format (output file; valid: *.idXML)
    mzid_out  Output file in mzid format (output file; valid: *.mzid)
    enzyme  Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin (default: 'trypsin'; valid: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin)
    percolator_executable*  Percolator executable of the installation e.g. 'percolator.exe' (default: 'percolator'; input file)
    peptide-level-fdrs  Calculate peptide-level FDRs instead of PSM-level FDRs. (default: 'false'; valid: true,false)
    protein-level-fdrs  Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy-pattern to set the Fasta file and decoy pattern. (default: 'false'; valid: true,false)
    generic-feature-set  Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. (default: 'false'; valid: true,false)
    subset-max-train  Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. (default: '0')
    cpos  Cpos, penalty for mistakes made on positive examples. Set by cross validation if not specified. (default: '0')
    cneg  Cneg, penalty for mistakes made on negative examples. Set by cross validation if not specified. (default: '0')
    testFDR  False discovery rate threshold for evaluating best cross validation result and the reported end result. (default: '0.01')
    trainFDR  False discovery rate threshold to define positive examples in training. Set to testFDR if 0. (default: '0.01')
    maxiter  Maximal number of iterations (default: '10')
    quick-validation  Quicker execution by reduced internal cross-validation. (default: 'false'; valid: true,false)
    weights  Output final weights to the given file (output file)
    init-weights  Read initial weights to the given file (input file)
    default-direction  The most informative feature given as the feature name, can be negated to indicate that a lower value is better.
    verbose  Set verbosity of output: 0=no processing info, 5=all. (default: '2')
    unitnorm  Use unit normalization [0-1] instead of standard deviation normalization (default: 'false'; valid: true,false)
    test-each-iteration  Measure performance on test set each iteration (default: 'false'; valid: true,false)
    override  Override error check and do not fall back on default score vector in case of suspect score vector (default: 'false'; valid: true,false)
    seed  Setting seed of the random number generator. (default: '1')
    doc  Include description of correct features (default: '0')
    klammer  Retention time features calculated as in Klammer et al. Only available if -doc is set (default: 'false'; valid: true,false)
    fasta  Provide the fasta file as the argument to this flag, which will be used for protein grouping based on an in-silico digest (only valid if option -protein-level-fdrs is active). (input file; valid: *.FASTA)
    decoy-pattern  Define the text pattern to identify the decoy proteins and/or PSMs, set this up if the label that identifies the decoys in the database is not the default (Only valid if option -protein-level-fdrs is active). (default: 'random')
    post-processing-tdc  Use target-decoy competition to assign q-values and PEPs. (default: 'false'; valid: true,false)
    log  Name of log file (created only when specified)
    debug  Sets the debug level (default: '0')
    threads  Sets the number of threads allowed to be used by the TOPP tool (default: '1')
    no_progress  Disables progress logging to command line (default: 'false'; valid: true,false)
    force  Overwrite tool specific checks. (default: 'false'; valid: true,false)
    test  Enables the test mode (needed for internal use only) (default: 'false'; valid: true,false)
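
Several parameter descriptions above state dependencies between options: `fasta` and `decoy-pattern` apply only with `protein-level-fdrs`, `klammer` needs `doc`, and `trainFDR` falls back to `testFDR` when set to 0. The Python sketch below checks a settings dictionary against those statements before a run. It is not part of the tool: the function names are hypothetical, the defaults are copied from the listing, and treating "`-doc` is set" as "non-zero" is an assumption.

```python
from typing import Any, Dict, List


def check_parameters(params: Dict[str, Any]) -> List[str]:
    """Return messages for option combinations the descriptions rule out."""
    problems = []
    protein_level = params.get("protein-level-fdrs", False)
    if params.get("fasta") and not protein_level:
        problems.append("fasta is only valid with protein-level-fdrs")
    # 'random' is the documented default decoy pattern.
    if params.get("decoy-pattern", "random") != "random" and not protein_level:
        problems.append("decoy-pattern is only valid with protein-level-fdrs")
    # Assumption: "-doc is set" means a non-zero value.
    if params.get("klammer", False) and not params.get("doc", 0):
        problems.append("klammer is only available if doc is set")
    return problems


def effective_train_fdr(params: Dict[str, Any]) -> float:
    """trainFDR as described: a value of 0 means 'use testFDR'."""
    test_fdr = params.get("testFDR", 0.01)
    train_fdr = params.get("trainFDR", 0.01)
    return test_fdr if train_fdr == 0 else train_fdr


if __name__ == "__main__":
    # Hypothetical settings: a FASTA file given without protein-level FDRs.
    settings = {"fasta": "database.fasta", "testFDR": 0.05, "trainFDR": 0}
    for message in check_parameters(settings):
        print("warning:", message)
    print("training FDR threshold:", effective_train_fdr(settings))
```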

Percolator is written by Lukas Käll (http://per-colator.com/). Copyright Lukas Käll, lukas.kall@scilifelab.se


OpenMS / TOPP release 2.3.0 Documentation generated on Tue Jan 9 2018 18:22:06 using doxygen 1.8.13