OpenMS 2.8.0
SvmTheoreticalSpectrumGeneratorTrainer

Trainer for SVM models used as input for SvmTheoreticalSpectrumGenerator.

This application requires an mzML file with MS2 spectra and the corresponding annotations in an idXML file, and trains an SVM model usable by SvmTheoreticalSpectrumGenerator. Please refer to the documentation of the corresponding class OpenMS::SvmTheoreticalSpectrumGeneratorTrainer.
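Training pairs each MS2 spectrum with its annotated peptide, and the algorithm parameters parent_tolerance and peak_tolerance decide which spectra and peaks are usable. The tool's internal matching is not described on this page, so the following is only a minimal Python sketch of the idea. It assumes absolute tolerances in Da, unmodified peptides, monoisotopic masses and singly charged b- and y-ions; all helper names are hypothetical and not part of OpenMS.

```python
# Hypothetical sketch, not the OpenMS implementation.
# Monoisotopic residue masses in Da (standard amino acids, no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass in Da
WATER = 18.010565  # monoisotopic mass of H2O in Da


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values as (label, m/z) pairs."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        ions.append((f"y{i}", sum(masses[-i:]) + WATER + PROTON))
    return ions


def annotate(peaks, peptide, peak_tolerance=0.5):
    """Assign each ion the closest peak within peak_tolerance, or None if it is missing."""
    annotation = {}
    for label, mz in fragment_ions(peptide):
        hits = [p for p in peaks if abs(p - mz) <= peak_tolerance]
        annotation[label] = min(hits, key=lambda p: abs(p - mz)) if hits else None
    return annotation


def accept_spectrum(precursor_mz, charge, peptide, parent_tolerance=2.5):
    """Keep a spectrum only if its neutral parent mass matches the peptide within parent_tolerance."""
    theoretical = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    experimental = precursor_mz * charge - charge * PROTON
    return abs(theoretical - experimental) <= parent_tolerance


print(accept_spectrum(138.09, 2, "AGK"))         # True
print(annotate([72.1, 147.0, 300.0], "AGK"))     # {'b1': 72.1, 'y1': 147.0, 'b2': None, 'y2': None}
```

Ions without a matching peak (None) correspond to the "missing" class, while matched peaks supply the intensities that a regression model would learn.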

Note
This tool is experimental!
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
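A typical run therefore converts mzIdentML identifications first and then calls the trainer with its mandatory options (-in_spectra, -in_identifications, -model_output_file). The Python sketch below only assembles the argument lists. The file names are hypothetical, and it assumes IDFileConverter infers the output format from the .idXML extension (add -out_type idXML otherwise).

```python
import shlex


def build_commands(spectra, identifications, model_prefix, precursor_charge=2, mzid=None):
    """Return argument lists for an optional mzid -> idXML conversion followed by training."""
    commands = []
    if mzid is not None:
        # mzIdentML is not read directly by the trainer, so convert it to idXML first.
        commands.append(["IDFileConverter", "-in", mzid, "-out", identifications])
    commands.append([
        "SvmTheoreticalSpectrumGeneratorTrainer",
        "-in_spectra", spectra,
        "-in_identifications", identifications,
        "-model_output_file", model_prefix,
        "-precursor_charge", str(precursor_charge),
    ])
    return commands


# Hypothetical file names; print the commands instead of running them.
for command in build_commands("train.mzML", "train.idXML", "model", mzid="train.mzid"):
    print(shlex.join(command))
```

To execute the steps, pass each list to `subprocess.run(command, check=True)` with the OpenMS binaries on the PATH.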

The command line parameters of this tool are:

SvmTheoreticalSpectrumGeneratorTrainer -- Trainer for SVM models as input for SvmTheoreticalSpectrumGenerator
Full documentation: http://www.openms.de/doxygen/release/2.8.0/html/UTILS_SvmTheoreticalSpectrumGeneratorTrainer.html
Version: 2.8.0 Feb 22 2022, 11:52:07, Revision: d203985
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  SvmTheoreticalSpectrumGeneratorTrainer <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in_spectra <file>*          Input Training Spectra in mzML (valid formats: 'mzML')
  -in_identifications <file>*  Input file with corresponding sequences in idXML (valid formats: 'idXML')
  -model_output_file <file>*   Name for output files. For each ion_type one file <filename>_residue_loss_charge.svm
                               and one <filename>.info which has to be passed to the SvmTheoreticalSpectrumGenerator
  -precursor_charge <Int>      Precursor charge state used for model training (default: '2' min: '1' max: 
                               '3')
  -write_training_files        No models are trained but input training files for libSVM command line tools 
                               are produced
                               
Common UTIL options:
  -ini <file>                  Use the given TOPP INI file
  -threads <n>                 Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>            Writes the default configuration file
  --help                       Shows options
  --helphelp                   Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/2.8.0/html/UTILS_SvmTheoreticalSpectrumGeneratorTrainer.html
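With -write_training_files, the tool writes input files for the libSVM command line tools instead of training models. libSVM reads a sparse text format with one example per line: a label followed by index:value pairs with 1-based, ascending indices, where zero-valued features may be omitted. The Python sketch below only illustrates that format. The labels and feature values are invented, and which features the tool actually writes is not described on this page.

```python
def to_libsvm_line(label, features):
    """Format one example in libSVM's sparse text format (zero features are omitted)."""
    parts = [f"{label:.10g}"]  # up to 10 significant digits
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:.10g}")
    return " ".join(parts)


def write_training_file(path, examples):
    """Write (label, features) pairs to a text file, one libSVM line per example."""
    with open(path, "w") as handle:
        for label, features in examples:
            handle.write(to_libsvm_line(label, features) + "\n")


# Invented examples: a classification row (label 1) and a regression row (intensity target).
print(to_libsvm_line(1, [0.0, 0.5, 1.0]))       # 1 2:0.5 3:1
print(to_libsvm_line(0.42, [1.0, 0.0, 0.25]))   # 0.42 1:1 3:0.25
```

Files in this format can be passed to libSVM's svm-train, for example to experiment with SVM settings outside OpenMS.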

INI file documentation of this tool:

Legend: parameters marked with '*' are required.

Name                            Value     Description (allowed values)
SvmTheoreticalSpectrumGeneratorTrainer    Trainer for SVM models as input for SvmTheoreticalSpectrumGenerator
  version                       2.8.0     Version of the tool that generated this parameters file.
  1                                       Instance '1' section for 'SvmTheoreticalSpectrumGeneratorTrainer'
    in_spectra*                           Input training spectra in mzML (input file; *.mzML)
    in_identifications*                   Input file with corresponding sequences in idXML (input file; *.idXML)
    model_output_file*                    Name for output files. For each ion_type, one file <filename>_residue_loss_charge.svm and one <filename>.info, which has to be passed to the SvmTheoreticalSpectrumGenerator (output file)
    precursor_charge            2         Precursor charge state used for model training (1:3)
    write_training_files        false     If true, no models are trained; instead, input training files for the libSVM command line tools are produced (true, false)
    log                                   Name of log file (created only when specified)
    debug                       0         Sets the debug level
    threads                     1         Sets the number of threads allowed to be used by the TOPP tool
    no_progress                 false     Disables progress logging to command line (true, false)
    force                       false     Overrides tool-specific checks (true, false)
    test                        false     Enables the test mode (needed for internal use only) (true, false)
    algorithm
      number_intensity_levels   7         The number of intensity bins (for secondary type models)
      number_regions            3         The number of regions each spectrum is split into (for secondary type models)
      parent_tolerance          2.5       The maximum difference between theoretical and experimental parent mass to accept a training spectrum
      peak_tolerance            0.5       The maximum mass error between a peak and the expected mass of an ion type
      add_b_ions                true      Train simulator for b-ions (true, false)
      add_y_ions                true      Train simulator for y-ions (true, false)
      add_a_ions                false     Train simulator for a-ions (true, false)
      add_c_ions                false     Train simulator for c-ions (true, false)
      add_x_ions                false     Train simulator for x-ions (true, false)
      add_z_ions                false     Train simulator for z-ions (true, false)
      add_losses                false     Train simulator for neutral losses of H2O and NH3 for b-ions and y-ions (true, false)
      add_b2_ions               false     Train simulator for doubly charged b-ions (true, false)
      add_y2_ions               false     Train simulator for doubly charged y-ions (true, false)
      svm                                 Parameters controlling SVM training behaviour. All parameter names are chosen as in the libSVM library; please refer to the libSVM documentation for explanations
        svc_type                0         Type of the SVC: 0=C_SVC 1=NU_SVC (0:1)
        svr_type                1         Type of the SVR: 0=EPSILON_SVR 1=NU_SVR (0:1)
        scaling                 true      Apply scaling of feature values (true, false)
        scaling_lower           0.0       Lower bound for scaling
        scaling_upper           1.0       Upper bound for scaling
        n_fold                  5         Number of folds n for the n-fold cross validation that is performed (1:∞)
        grid                    false     Perform grid search (true, false)
        additive_cv             false     Additive step size in the grid search (if false, multiplicative) (true, false)
        svc                               Parameters for the SVM classification of missing vs. abundant peaks
          kernel_type           2         Type of the kernel: 0=LINEAR 1=POLY 2=RBF 3=SIGMOID (0:3)
          degree                3         For POLY (1:∞)
          gamma                 0.0       For POLY/RBF/SIGMOID (0.0:∞)
          C                     1.0       Cost of constraint violation
          nu                    0.5       For NU_SVC, ONE_CLASS and NU_SVR
          balancing             true      Use class-balanced SVC training (true, false)
          degree_start          1         Starting point of degree (1:∞)
          degree_step_size      2         Step size of degree
          degree_stop           4         Stopping point of degree
          gamma_start           1.0e-05   Starting point of gamma (0.0:1.0)
          gamma_step_size       100       Step size of gamma
          gamma_stop            0.1       Stopping point of gamma
          c_start               0.1       Starting point of C
          c_step_size           100       Step size of C
          c_stop                1000      Stopping point of C
          nu_start              0.3       Starting point of nu (0.0:1.0)
          nu_step_size          2         Step size of nu
          nu_stop               0.6       Stopping point of nu (0.0:1.0)
        svr                               Parameters for the SVM regression of peak intensities
          kernel_type           2         Type of the kernel: 0=LINEAR 1=POLY 2=RBF 3=SIGMOID (0:3)
          degree                3         For POLY (1:∞)
          gamma                 0.0       For POLY/RBF/SIGMOID (0.0:∞)
          C                     1.0       Cost of constraint violation
          p                     0.1       The epsilon for the loss function in epsilon-SVR
          nu                    0.5       For NU_SVC, ONE_CLASS and NU_SVR
          degree_start          1         Starting point of degree (1:∞)
          degree_step_size      2         Step size of degree
          degree_stop           4         Stopping point of degree
          gamma_start           1.0e-05   Starting point of gamma (0.0:1.0)
          gamma_step_size       100       Step size of gamma
          gamma_stop            0.1       Stopping point of gamma
          p_start               1.0e-05   Starting point of p
          p_step_size           100       Step size of p
          p_stop                0.1       Stopping point of p
          c_start               0.1       Starting point of C
          c_step_size           100       Step size of C
          c_stop                1000      Stopping point of C
          nu_start              0.3       Starting point of nu (0.0:1.0)
          nu_step_size          2         Step size of nu
          nu_stop               0.6       Stopping point of nu (0.0:1.0)
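When grid is true, the *_start, *_step_size and *_stop parameters span the values that are tried, and additive_cv chooses between adding and multiplying the step size. The Python sketch below enumerates such a grid. It assumes that *_stop is inclusive, that additive_cv applies to every searched parameter, and that with the default RBF kernel and C_SVC only C and gamma are searched; grid_values is a hypothetical helper, not part of OpenMS.

```python
import itertools


def grid_values(start, step, stop, additive):
    """Grid points start, start+step (or start*step), ... up to and including stop.

    Hypothetical helper: `additive` mirrors additive_cv (False = multiplicative steps).
    """
    if additive and step <= 0:
        raise ValueError("additive step size must be positive")
    if not additive and (step <= 1 or start <= 0):
        raise ValueError("multiplicative search needs start > 0 and step > 1")
    limit = stop + 1e-9 * abs(stop)  # tolerate floating-point rounding at the stop value
    values, value = [], start
    while value <= limit:
        values.append(value)
        value = value + step if additive else value * step
    return values


# Default svc values from the parameter table, with additive_cv = false.
c_grid = grid_values(0.1, 100, 1000, additive=False)         # approx. [0.1, 10, 1000]
gamma_grid = grid_values(1.0e-05, 100, 0.1, additive=False)  # approx. [1e-05, 0.001, 0.1]
candidates = list(itertools.product(c_grid, gamma_grid))
print(len(candidates))  # 9
```

The step semantics matter for integer parameters: with degree_start = 1, degree_step_size = 2 and degree_stop = 4, a multiplicative search visits 1, 2 and 4, while an additive one visits only 1 and 3. If every (C, gamma) pair is scored with the default n_fold = 5 cross validation, the sketch's grid amounts to 9 × 5 = 45 training runs.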