OpenMS
PTModel

Used to train a model for the prediction of proteotypic peptides.

The input consists of two idXML files: one contains the positive examples (the proteotypic peptides) and the other contains the negative examples (the non-proteotypic peptides).

Parts of this model have been described in the publication:

Ole Schulz-Trieglaff, Nico Pfeifer, Clemens Gröpl, Oliver Kohlbacher and Knut Reinert: LC-MSsim - a simulation software for Liquid Chromatography Mass Spectrometry data. BMC Bioinformatics 2008, 9:423.

Several parameters of the SVM can be changed (specified in the ini file):

  • kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend OLIGO (SVMWrapper::OLIGO) for our paired oligo-border kernel (POBK)
  • border_length: border length for the POBK
  • k_mer_length: length of the signals considered in the POBK
  • sigma: the amount of positional smoothing for the POBK
  • degree: the degree parameter for the polynomial kernel
  • c: the penalty parameter of the SVM
  • nu: the nu parameter for nu-SVC

The last four parameters (sigma, degree, c and nu) can be tuned in a cross validation (CV) to find the best values for the training set. For each parameter you specify a start value, a step size and a final value; the tested value is never bigger than the given final value. By default, each new value is obtained by multiplying the current value by the step size; set additive_cv if the step size should be added instead. If you want to perform a cross validation for the parameter c, for example, you have to specify c_start, c_step_size and c_stop in the cv section of the ini file. Let's say you want to perform a CV for c from 0.1 to 2 with an additive step size of 0.1. Open your ini file with INIFileEditor, modify the fields c_start, c_step_size and c_stop accordingly, and enable additive_cv.

If the CV should test additional parameters in a certain range, specify them analogously to the example above. Furthermore, you can set the number of partitions for the CV with number_of_partitions and the number of runs with number_of_runs in the ini file.
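
The start/step/stop scheme described above can be sketched in a few lines of Python. This is an illustration only, not OpenMS code: the function name cv_grid and the small floating-point tolerance at the upper bound are assumptions, and the tool itself may treat the boundary differently. The additive argument mirrors the additive_cv option; without it, each value is multiplied by the step size.

```python
def cv_grid(start, step, stop, additive=False, tol=1e-9):
    """List the values visited from start up to (and including) stop.

    Hypothetical helper for illustration; not part of OpenMS.
    """
    if additive and step <= 0:
        raise ValueError("additive step size must be positive")
    if not additive and (start <= 0 or step <= 1):
        raise ValueError("multiplicative grids need start > 0 and step > 1")

    values = []
    i = 0
    while True:
        # Compute each value from the start to avoid accumulating rounding error.
        value = start + i * step if additive else start * step ** i
        if value > stop + tol:  # tolerance is an assumption of this sketch
            break
        values.append(value)
        i += 1
    return values


# The example from the text: c from 0.1 to 2 in additive steps of 0.1.
c_values = cv_grid(0.1, 0.1, 2.0, additive=True)
print(len(c_values), "values for c")

# The default range for c (start 1.0, step size 100.0, stop 1000.0),
# interpreted multiplicatively.
print(cv_grid(1.0, 100.0, 1000.0))
```

Note that a step size of 0.1 only makes sense additively: a multiplicative factor below 1 would shrink the value instead of moving it towards the final value, which is why the sketch rejects it.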


Consequently, there are two ways to use this application:

  1. Set the parameters of the SVM and skip the CV (skip_cv): the PTModel application trains the SVM with the training data and stores the SVM model.
  2. Give a range of parameters for which a CV should be performed: the PTModel application performs a CV to find the best parameter combination in the given range, then trains the SVM with the best parameters on the whole training data and stores the model.
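
The two modes can be illustrated by assembling the corresponding command lines. The following Python sketch only builds argument lists from the options listed in the tool's help output; the helper name ptmodel_command and the file names are hypothetical, and the sketch covers only the parameter c. It does not run PTModel.

```python
import shlex


def ptmodel_command(positive, negative, model, skip_cv,
                    c=None, c_range=None, additive=False):
    """Assemble a PTModel call as an argument list (hypothetical helper)."""
    cmd = ["PTModel",
           "-in_positive", positive,
           "-in_negative", negative,
           "-out", model]
    if skip_cv:
        # Mode 1: train directly with the given parameters, no CV.
        cmd.append("-cv:skip_cv")
        if c is not None:
            cmd += ["-c", str(c)]
    elif c_range is not None:
        # Mode 2: let the CV search c between start and stop.
        start, step, stop = c_range
        cmd += ["-cv:c_start", str(start),
                "-cv:c_step_size", str(step),
                "-cv:c_stop", str(stop)]
        if additive:
            cmd.append("-additive_cv")
    return cmd


# File names below are placeholders.
fixed = ptmodel_command("positive.idXML", "negative.idXML", "model.txt",
                        skip_cv=True, c=1.0)
searched = ptmodel_command("positive.idXML", "negative.idXML", "model.txt",
                           skip_cv=False, c_range=(0.1, 0.1, 2.0), additive=True)
print(shlex.join(fixed))
print(shlex.join(searched))
```

The resulting lists can be passed to subprocess.run once PTModel is available on the PATH; the same settings can alternatively be stored in an ini file and passed with -ini.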


The model can then be used in PTPredict to predict how likely peptides are to be proteotypic.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
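
As a rough illustration of the conversion step, the sketch below assembles an IDFileConverter call in Python. The -in and -out options are assumed to follow the usual TOPP convention (check IDFileConverter --help for your installation); the helper name and the file names are hypothetical, and the command is only built, not executed.

```python
from pathlib import Path


def idfileconverter_command(src, dst):
    """Build an IDFileConverter call for mzid <-> idXML (hypothetical helper)."""
    suffixes = {Path(src).suffix.lower(), Path(dst).suffix.lower()}
    if suffixes != {".mzid", ".idxml"}:
        raise ValueError("expected one .mzid file and one .idXML file")
    # Assumption: IDFileConverter takes -in/-out like other TOPP tools.
    return ["IDFileConverter", "-in", src, "-out", dst]


# Placeholder file names: convert search results to idXML before training.
print(" ".join(idfileconverter_command("positive.mzid", "positive.idXML")))
```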

The command line parameters of this tool are:

PTModel -- Trains a model for the prediction of proteotypic peptides from a training set.
Full documentation: http://www.openms.de/doxygen/release/3.0.0/html/TOPP_PTModel.html
Version: 3.0.0 Jul 14 2023, 11:57:33, Revision: be787e9
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  PTModel <options>

Options (mandatory options marked with '*'):
  -in_positive <file>*            Input file with positive examples (valid formats: 'idXML')
  -in_negative <file>*            Input file with negative examples (valid formats: 'idXML')
  -out <file>*                    Output file: the model in libsvm format (valid formats: 'txt')
  -out_oligo_params <file>        Output file with additional model parameters when using the OLIGO kernel 
                                  (valid formats: 'paramXML')
  -out_oligo_trainset <file>      Output file with the used training dataset when using the OLIGO kernel 
                                  (valid formats: 'txt')
  -c <float>                      The penalty parameter of the svm (default: '1.0')
  -svm_type <type>                The type of the svm (NU_SVC or C_SVC) (default: 'C_SVC') (valid: 'NU_SVC', 
                                  'C_SVC')
  -nu <float>                     The nu parameter [0..1] of the svm (for nu-SVC) (default: '0.5') (min: 
                                  '0.0' max: '1.0')
  -kernel_type <type>             The kernel type of the svm (default: 'OLIGO') (valid: 'LINEAR', 'RBF', 
                                  'POLY', 'OLIGO')
  -degree <int>                   The degree parameter of the kernel function of the svm (POLY kernel) 
                                  (default: '1') (min: '1')
  -border_length <int>            Length of the POBK (default: '22') (min: '1')
  -k_mer_length <int>             K_mer length of the POBK (default: '1') (min: '1')
  -sigma <float>                  Sigma of the POBK (default: '5.0')
  -max_positive_count <int>       Quantity of positive samples for training (randomly chosen if smaller than 
                                  available quantity) (default: '1000') (min: '1')
  -max_negative_count <int>       Quantity of negative samples for training (randomly chosen if smaller than 
                                  available quantity) (default: '1000') (min: '1')
  -redundant                      If the input sets are redundant and the redundant peptides should occur 
                                  more than once in the training set, this flag has to be set
  -additive_cv                    If the step sizes should be interpreted additively (otherwise the actual 
                                  value is multiplied with the step size to get the new value)

Parameters for the grid search / cross validation:
  -cv:skip_cv                     Has to be set if the cv should be skipped and the model should just be 
                                  trained with the specified parameters.
  -cv:number_of_runs <int>        Number of runs for the CV (default: '10') (min: '1')
  -cv:number_of_partitions <int>  Number of CV partitions (default: '10') (min: '2')
  -cv:degree_start <int>          Starting point of degree (default: '1') (min: '1')
  -cv:degree_step_size <int>      Step size of degree (default: '2')
  -cv:degree_stop <int>           Stopping point of degree (default: '4')
  -cv:c_start <float>             Starting point of c (default: '1.0')
  -cv:c_step_size <float>         Step size of c (default: '100.0')
  -cv:c_stop <float>              Stopping point of c (default: '1000.0')
  -cv:nu_start <float>            Starting point of nu (default: '0.1') (min: '0.0' max: '1.0')
  -cv:nu_step_size <float>        Step size of nu (default: '1.3')
  -cv:nu_stop <float>             Stopping point of nu (default: '0.9') (min: '0.0' max: '1.0')
  -cv:sigma_start <float>         Starting point of sigma (default: '1.0')
  -cv:sigma_step_size <float>     Step size of sigma (default: '1.3')
  -cv:sigma_stop <float>          Stopping point of sigma (default: '15.0')

                                  
Common TOPP options:
  -ini <file>                     Use the given TOPP INI file
  -threads <n>                    Sets the number of threads allowed to be used by the TOPP tool (default: 
                                  '1')
  -write_ini <file>               Writes the default configuration file
  --help                          Shows options
  --helphelp                      Shows all options (including advanced)

INI file documentation of this tool:

Legend: parameters marked with '*' are required.

PTModel: Trains a model for the prediction of proteotypic peptides from a training set.
  version: 3.0.0 (Version of the tool that generated this parameters file.)

Section 1: Instance '1' section for 'PTModel'

  Name                  Value   Restrictions              Description
  in_positive*                  input file, *.idXML       input file with positive examples
  in_negative*                  input file, *.idXML       input file with negative examples
  out*                          output file, *.txt        output file: the model in libsvm format
  out_oligo_params              output file, *.paramXML   output file with additional model parameters when using the OLIGO kernel
  out_oligo_trainset            output file, *.txt        output file with the used training dataset when using the OLIGO kernel
  c                     1.0                               the penalty parameter of the svm
  svm_type              C_SVC   NU_SVC, C_SVC             the type of the svm (NU_SVC or C_SVC)
  nu                    0.5     0.0:1.0                   the nu parameter [0..1] of the svm (for nu-SVC)
  kernel_type           OLIGO   LINEAR, RBF, POLY, OLIGO  the kernel type of the svm
  degree                1       1:∞                       the degree parameter of the kernel function of the svm (POLY kernel)
  border_length         22      1:∞                       length of the POBK
  k_mer_length          1       1:∞                       k_mer length of the POBK
  sigma                 5.0                               sigma of the POBK
  max_positive_count    1000    1:∞                       quantity of positive samples for training (randomly chosen if smaller than available quantity)
  max_negative_count    1000    1:∞                       quantity of negative samples for training (randomly chosen if smaller than available quantity)
  redundant             false   true, false               if the input sets are redundant and the redundant peptides should occur more than once in the training set, this flag has to be set
  additive_cv           false   true, false               if the step sizes should be interpreted additively (otherwise the actual value is multiplied with the step size to get the new value)
  log                                                     Name of log file (created only when specified)
  debug                 0                                 Sets the debug level
  threads               1                                 Sets the number of threads allowed to be used by the TOPP tool
  no_progress           false   true, false               Disables progress logging to command line
  force                 false   true, false               Overrides tool-specific checks
  test                  false   true, false               Enables the test mode (needed for internal use only)

Subsection cv: Parameters for the grid search / cross validation

  Name                  Value   Restrictions              Description
  skip_cv               false   true, false               Has to be set if the cv should be skipped and the model should just be trained with the specified parameters.
  number_of_runs        10      1:∞                       number of runs for the CV
  number_of_partitions  10      2:∞                       number of CV partitions
  degree_start          1       1:∞                       starting point of degree
  degree_step_size      2                                 step size of degree
  degree_stop           4                                 stopping point of degree
  c_start               1.0                               starting point of c
  c_step_size           100.0                             step size of c
  c_stop                1000.0                            stopping point of c
  nu_start              0.1     0.0:1.0                   starting point of nu
  nu_step_size          1.3                               step size of nu
  nu_stop               0.9     0.0:1.0                   stopping point of nu
  sigma_start           1.0                               starting point of sigma
  sigma_step_size       1.3                               step size of sigma
  sigma_stop            15.0                              stopping point of sigma