MSSimulator

A highly configurable simulator for mass spectrometry experiments.

This implementation is described in

Bielow C, Aiche S, Andreotti S, Reinert K
MSSimulator: Simulation of Mass Spectrometry Data
Journal of Proteome Research (2011), DOI: 10.1021/pr200155f

The most important features are:

Look at the INI file (via "MSSimulator -write_ini myini.ini") to see the available parameters and more functionality.

Input: FASTA files

Protein sequences can be provided as a FASTA file. We allow a special tag in the description of each entry to specify protein abundance. If you want to create a complex FASTA file with a Gaussian protein abundance model in log space, see the Python script shipped with your OpenMS installation (e.g., <OpenMS-dir>/share/OpenMS/examples/simulation/FASTAProteinAbundanceSampling.py). It supports (random) sampling from a large FASTA file and protein weight filtering, and it adds an intensity tag to each entry.
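
To illustrate what a Gaussian abundance model in log space means, here is a minimal sketch. It is not the shipped FASTAProteinAbundanceSampling.py script; the function name, the log10 base, and the mean/standard deviation values are assumptions chosen for illustration only.

```python
import random

def sample_log_abundances(protein_ids, mean_log10=3.0, stddev_log10=1.0, seed=42):
    """Draw one abundance per protein from a Gaussian in log10 space.

    mean_log10, stddev_log10 and seed are illustrative values, not OpenMS defaults.
    """
    rng = random.Random(seed)
    return {pid: 10.0 ** rng.gauss(mean_log10, stddev_log10) for pid in protein_ids}

abundances = sample_log_abundances(["seq1", "seq2", "seq3"])
for pid, value in abundances.items():
    print(f"{pid}\t{value:.1f}")
```

Sampling in log space keeps all abundances positive and spreads them over several orders of magnitude, which is the usual motivation for such a model.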

If multiplexed data is simulated (like SILAC or iTRAQ), you need to supply multiple FASTA input files. For the label-free setting, all FASTA input files will be merged into one before simulation.

For MS/MS simulation only a test model is shipped with OpenMS.
Please find trained models at: http://sourceforge.net/projects/open-ms/files/Supplementary/Simulation/.

To specify intensity values for certain proteins, add an abundance tag for the corresponding protein in the FASTA input file:

e.g.

>seq1 optional comment [# intensity=567.4 #]
ASQYLATARHGFLPRHRDTGILP
>seq2 optional comment [# intensity=117.4, RT=405.3 #]
QKRPSQRHGLATARHGTGGGDRA
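
The tag format shown above can be read and written with a few lines of Python. This is a sketch based only on the two example headers; the helper names are hypothetical, and the assumption that every value in the tag is numeric comes from those examples, not from the MSSimulator parser.

```python
import re

# Matches the '[# ... #]' tag anywhere in a FASTA header line.
TAG_PATTERN = re.compile(r"\[#\s*(.*?)\s*#\]")

def parse_abundance_tag(header):
    """Return the key/value pairs of a '[# key=value, ... #]' tag as floats."""
    match = TAG_PATTERN.search(header)
    if match is None:
        return {}
    values = {}
    for item in match.group(1).split(","):
        if not item.strip():
            continue  # tolerate an empty tag such as '[# #]'
        key, _, raw = item.partition("=")
        values[key.strip()] = float(raw)
    return values

def format_abundance_tag(intensity, rt=None):
    """Build a tag string in the layout used by the examples above."""
    parts = [f"intensity={intensity}"]
    if rt is not None:
        parts.append(f"RT={rt}")
    return "[# " + ", ".join(parts) + " #]"

print(parse_abundance_tag(">seq2 optional comment [# intensity=117.4, RT=405.3 #]"))
print(">seq3 " + format_abundance_tag(250.0))
```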

The command line parameters of this tool are:

MSSimulator -- A highly configurable simulator for mass spectrometry experiments.
Version: 2.3.0 Jan  9 2018, 17:46:23, Revision: 38ae115

Usage:
  MSSimulator <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <files>*       Input protein sequences (valid formats: 'FASTA')
  -out <file>        Output: simulated MS raw (profile) data (valid formats: 'mzML')
  -out_pm <file>     Output: ground-truth picked (centroided) MS data (valid formats: 'mzML')
  -out_fm <file>     Output: ground-truth features (valid formats: 'featureXML')
  -out_cm <file>     Output: ground-truth features, grouping ESI charge variants of each parent peptide (valid formats: 'consensusXML')
  -out_lcm <file>    Output: ground-truth features, grouping labeled variants (valid formats: 'consensusXML')
  -out_cntm <file>   Output: ground-truth features caused by contaminants (valid formats: 'featureXML')
  -out_id <file>     Output: ground-truth MS2 peptide identifications (valid formats: 'idXML')
                     
Common UTIL options:
  -ini <file>        Use the given TOPP INI file
  -threads <n>       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>  Writes the default configuration file
  --help             Shows options
  --helphelp         Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Algorithm parameters section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
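
When the tool is driven from a script, the command line can be assembled from the options listed above. The following is a minimal sketch: the helper function and the file names are hypothetical, and only flags that appear in the help text (-in, -out, -out_fm, -ini, -threads) are used. It assumes that several input files are passed after a single -in.

```python
import subprocess

def build_mssimulator_command(fasta_files, out_mzml, out_features=None,
                              ini_file=None, threads=1):
    """Assemble an MSSimulator argument list (hypothetical helper)."""
    command = ["MSSimulator", "-in", *fasta_files, "-out", out_mzml]
    if out_features is not None:
        command += ["-out_fm", out_features]
    if ini_file is not None:
        command += ["-ini", ini_file]
    command += ["-threads", str(threads)]
    return command

def run_mssimulator(command):
    """Run the tool; requires MSSimulator on the PATH and existing input files."""
    subprocess.run(command, check=True)

# File names are placeholders for illustration.
command = build_mssimulator_command(["proteins.fasta"], "simulated.mzML",
                                    out_features="truth.featureXML")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.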

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+ MSSimulator: A highly configurable simulator for mass spectrometry experiments.
version: Version of the tool that generated this parameters file. (default: '2.3.0')
++ 1: Instance '1' section for 'MSSimulator'
in: Input protein sequences (default: '[]'; input file; valid formats: '*.FASTA')
out: Output: simulated MS raw (profile) data (output file; valid formats: '*.mzML')
out_pm: Output: ground-truth picked (centroided) MS data (output file; valid formats: '*.mzML')
out_fm: Output: ground-truth features (output file; valid formats: '*.featureXML')
out_cm: Output: ground-truth features, grouping ESI charge variants of each parent peptide (output file; valid formats: '*.consensusXML')
out_lcm: Output: ground-truth features, grouping labeled variants (output file; valid formats: '*.consensusXML')
out_cntm: Output: ground-truth features caused by contaminants (output file; valid formats: '*.featureXML')
out_id: Output: ground-truth MS2 peptide identifications (output file; valid formats: '*.idXML')
log: Name of log file (created only when specified)
debug: Sets the debug level (default: '0')
threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')
no_progress: Disables progress logging to command line (default: 'false'; valid: true, false)
force: Overrides tool-specific checks. (default: 'false'; valid: true, false)
test: Enables the test mode (needed for internal use only) (default: 'false'; valid: true, false)
+++ algorithm: Algorithm parameters section
++++ MSSim
+++++ Digestion
enzyme: Enzyme to use for digestion (select 'no cleavage' to skip digestion) (default: 'Trypsin'; valid: Chymotrypsin, Asp-N, PepsinA, Lys-C/P, V8-E, Arg-C, Trypsin/P, Arg-C/P, CNBr, Lys-N, Formic_acid, glutamyl endopeptidase, 2-iodobenzoate, no cleavage, leukocyte elastase, proline endopeptidase, Alpha-lytic protease, Asp-N/B, V8-DE, Asp-N_ambic, Chymotrypsin/P, Lys-C, unspecific cleavage, TrypChymo, Trypsin)
model: The cleavage model to use for digestion. 'trained' is based on a log-likelihood model (see DOI:10.1021/pr060507u). (default: 'naive'; valid: trained, naive)
min_peptide_length: Minimum peptide length after digestion (shorter ones will be discarded) (default: '3'; range: 1:∞)
++++++ model_trained
threshold: Model threshold for calling a cleavage. Higher values increase the number of cleavages. -2 will give no cleavages, +4 almost full cleavage. (default: '0.5'; range: -2:4)
++++++ model_naive
missed_cleavages: Maximum number of missed cleavages considered. All possible resulting peptides will be created. (default: '1'; range: 0:∞)
+++++ RT
rt_column: Modelling of an RT or CE column (default: 'HPLC'; valid: none, HPLC, CE)
auto_scale: Scale predicted RTs/MTs to the given 'total_gradient_time'? If 'true', for CE this means that 'CE:lenght_d', 'CE:length_total' and 'CE:voltage' have no influence. (default: 'true'; valid: true, false)
total_gradient_time: The duration [s] of the gradient. (default: '2500'; range: 1e-05:∞)
sampling_rate: Time interval [s] between consecutive scans (default: '2'; range: 0.01:60)
++++++ scan_window
min: Start of RT scan window [s] (default: '500'; range: 0:∞)
max: End of RT scan window [s] (default: '1500'; range: 1:∞)
++++++ variation: Random component that simulates technical/biological variation
feature_stddev: Standard deviation of shift in retention time [s] from predicted model (applied to every single feature independently) (default: '3')
affine_offset: Global offset in retention time [s] from predicted model (default: '0')
affine_scale: Global scaling in retention time from predicted model (default: '1')
++++++ column_condition
distortion: Distortion of the elution profiles. Good presets are 0 for a perfect elution profile, 1 for a slightly distorted elution profile, etc. For trapping instruments (e.g. Orbitrap), distortion should be >4. (default: '0'; range: 0:10)
++++++ profile_shape
+++++++ width: Width of the EGH elution shape, i.e. the sigma^2 parameter, which is computed using 'value' + rnd_cauchy('variance')
value: Width of the Exponential Gaussian Hybrid distribution shape of the elution profile. This does not correspond directly to the width in [s]. (default: '9'; range: 0:∞)
variance: Random component of the width (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the variance (Note: The scale parameter has to be >= 0). (default: '1.6'; range: 0:∞)
+++++++ skewness: Skewness of the EGH elution shape, i.e. the tau parameter, which is computed using 'value' + rnd_cauchy('variance')
value: Asymmetric component of the EGH. Higher absolute(!) values lead to more skewness (negative values cause fronting, positive values cause tailing). Tau parameter of the EGH, i.e. time constant of the exponential decay of the Exponential Gaussian Hybrid distribution shape of the elution profile. (default: '0.1')
variance: Random component of skewness (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the time constant (Note: The scale parameter has to be >= 0). (default: '0.3'; range: 0:∞)
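
The profile_shape parameters can be made concrete with a small sketch. It assumes the commonly published form of the Exponential Gaussian Hybrid (height * exp(-(t - apex)^2 / (2*sigma^2 + tau*(t - apex))) where the denominator is positive, 0 elsewhere) and draws the Cauchy (Lorentzian) variation by inverse transform sampling. The function names are hypothetical, and the simulator's internal implementation may differ in detail.

```python
import math
import random

def sample_cauchy(location, scale, rng):
    """'value' + rnd_cauchy('variance'): Cauchy draw via the inverse CDF."""
    if scale == 0:
        return location  # scale 0 disables the random component
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))

def egh(t, apex_rt, height, sigma_square, tau):
    """Exponential Gaussian Hybrid intensity at time t (assumed textbook form)."""
    dt = t - apex_rt
    denominator = 2.0 * sigma_square + tau * dt
    if denominator <= 0.0:
        return 0.0
    return height * math.exp(-(dt * dt) / denominator)

rng = random.Random(7)
sigma_square = sample_cauchy(9.0, 1.6, rng)  # width: value=9, variance=1.6
tau = sample_cauchy(0.1, 0.3, rng)           # skewness: value=0.1, variance=0.3
profile = [egh(t, 100.0, 1000.0, sigma_square, tau) for t in range(80, 121, 2)]
print([round(x, 1) for x in profile])
```

With a positive tau the right flank decays more slowly than the left (tailing); a negative tau gives fronting, matching the description of the skewness parameter. A Cauchy draw can produce extreme or negative widths, so a real implementation would need to guard against them.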
++++++ HPLC
model_file: SVM model for retention time prediction (default: 'examples/simulation/RTPredict.model')
++++++ CE
pH: pH of buffer (default: '3'; range: 0:14)
alpha: Exponent Alpha used to calculate mobility (default: '0.5'; range: 0:1)
mu_eo: Electroosmotic flow (default: '0'; range: 0:5)
lenght_d: Length of capillary [cm] from injection site to MS (default: '70'; range: 0:1000)
length_total: Total length of capillary [cm] (default: '75'; range: 0:1000)
voltage: Voltage applied to capillary (default: '1000'; range: 0:∞)
+++++ Detectability
dt_simulation_on: Modelling detectability enabled? This can serve as a filter to remove peptides which ionize badly, thus reducing the peptide count (default: 'false'; valid: true, false)
min_detect: Minimum peptide detectability accepted. Peptides with a lower score will be removed (default: '0.5')
dt_model_file: SVM model for peptide detectability prediction (default: 'examples/simulation/DTPredict.model')
+++++ Ionization
++++++ esi
ionized_residues: List of residues (as three-letter code) that will be considered during ESI ionization. The N-term is always assumed to carry a charge. This parameter will be ignored during MALDI ionization. (default: '[Arg, Lys, His]'; valid: Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Sec, Ser, Thr, Val, Trp, Tyr)
charge_impurity: List of charged ions that contribute to charge with weight of occurrence (their sum is scaled to 1 internally), e.g. ['H:1'] or ['H:0.7' 'Na:0.3'], ['H:4' 'Na:1'] (which internally translates to ['H:0.8' 'Na:0.2']) (default: '[H+:1]')
max_impurity_set_size: Maximal number of combinations of charge impurities allowed (each generating one feature) per charge state. E.g. assuming charge=3 and this parameter is 2, then we could choose to allow '3H+, 2H+Na+' features (given certain 'charge_impurity' constraints), but no '3H+, 2H+Na+, 3Na+' (default: '3')
ionization_probability: Probability for the binomial distribution of the ESI charge states (default: '0.8')
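
The internal scaling described for charge_impurity (weights are rescaled so that they sum to 1) can be sketched as follows. The function is a hypothetical illustration of that normalization, not the simulator's code; it only assumes the 'adduct:weight' entry format shown in the parameter description.

```python
def normalize_charge_impurity(entries):
    """Turn entries like 'H:4', 'Na:1' into weights that sum to 1."""
    weights = {}
    for entry in entries:
        # Split on the last ':' so that adduct names such as 'H+' stay intact.
        adduct, _, weight = entry.rpartition(":")
        weights[adduct] = weights.get(adduct, 0.0) + float(weight)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("charge impurity weights must sum to a positive value")
    return {adduct: weight / total for adduct, weight in weights.items()}

print(normalize_charge_impurity(["H:4", "Na:1"]))
```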
++++++ maldi
ionization_probabilities: List of probabilities for the different charge states during MALDI ionization (the list must sum up to 1.0) (default: '[0.9, 0.1]')
++++++ mz
lower_measurement_limit: Lower m/z detector limit. (default: '200'; range: 0:∞)
upper_measurement_limit: Upper m/z detector limit. (default: '2500'; range: 0:∞)
+++++ RawSignal
enabled: Enable RAW signal simulation? (select 'false' if you only need feature-maps) (default: 'true'; valid: true, false)
peak_shape: Peak shape used around each isotope peak (be aware that the area under the curve is constant for both types, but the maximal height will differ (~ 2:3 = Lorentz:Gaussian) due to the wider base of the Lorentzian). (default: 'Gaussian'; valid: Gaussian, Lorentzian)
++++++ resolution
value: Instrument resolution at 400 Th. (default: '50000')
type: How does resolution change with increasing m/z? QTOFs usually show 'constant' behavior, FTs have linear degradation, and on Orbitraps the resolution decreases with the square root of mass. (default: 'linear'; valid: constant, linear, sqrt)
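
To make the three resolution types tangible, the sketch below scales the resolution given at 400 Th to another m/z and derives the peak width as FWHM = m/z / resolution. The scaling formulas (R * 400/mz for 'linear', R * sqrt(400/mz) for 'sqrt') are an assumption derived from the parameter description, and the function names are hypothetical.

```python
import math

def resolution_at(mz, resolution_at_400=50000.0, model="linear"):
    """Resolution at a given m/z, scaled from the value at 400 Th (assumed formulas)."""
    if model == "constant":
        return resolution_at_400
    if model == "linear":
        return resolution_at_400 * 400.0 / mz
    if model == "sqrt":
        return resolution_at_400 * math.sqrt(400.0 / mz)
    raise ValueError(f"unknown resolution model: {model}")

def fwhm(mz, resolution_at_400=50000.0, model="linear"):
    """Full width at half maximum [Th] of a peak at the given m/z."""
    return mz / resolution_at(mz, resolution_at_400, model)

for model in ("constant", "linear", "sqrt"):
    print(model, resolution_at(1600.0, model=model), fwhm(1600.0, model=model))
```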
++++++ baseline: Baseline modeling for MALDI ionization
scaling: Scale of baseline. Set to 0 to disable simulation of baseline. (default: '0'; range: 0:∞)
shape: The baseline is modeled by an exponential probability density function (pdf) with f(x) = shape*e^(- shape*x) (default: '0.5'; range: 0:∞)
++++++ mz
sampling_points: Number of raw data points per FWHM of the peak. (default: '3'; range: 2:∞)
++++++ contaminants
file: Contaminants file with sum formula and absolute RT interval. See 'OpenMS/examples/simulation/contaminants.txt' for details. (default: 'examples/simulation/contaminants.csv')
++++++ variation: Random components that simulate biological and technical variations of the simulated data.
+++++++ mz: Shifts in mass to charge dimension of the simulated signals.
error_stddev: Standard deviation for m/z errors. Set to 0 to disable simulation of m/z errors. (default: '0')
error_mean: Average systematic m/z error (Da) (default: '0')
+++++++ intensity: Variations in intensity to model randomness in feature intensity.
scale: Constant scale factor of the feature intensity. Set to 1.0 to get the real intensity values provided in the FASTA file. (default: '100'; range: 0:∞)
scale_stddev: Standard deviation of peak intensity (relative to the scaled peak height). Set to 0 to get simple rescaled intensities. (default: '0'; range: 0:∞)
++++++ noise: Parameters modeling noise in mass spectrometry measurements.
+++++++ shot: Parameters of Poisson and Exponential for shot noise modeling (set :rate OR :mean = 0 to disable).
rate: Poisson rate of shot noise per unit m/z. Set this to 0 to disable simulation of shot noise. (default: '0'; range: 0:∞)
intensity-mean: Shot noise intensity mean (exponentially distributed with given mean). (default: '1')
+++++++ white: Parameters of Gaussian distribution for white noise modeling (set :mean AND :stddev = 0 to disable).
mean: Mean value of white noise being added to each measured signal. (default: '0')
stddev: Standard deviation of white noise being added to each measured signal. (default: '0')
+++++++ detector: Parameters of Gaussian distribution for detector noise modeling (set :mean AND :stddev = 0 to disable).
mean: Mean value of the detector noise being added to the complete measurement. (default: '0')
stddev: Standard deviation of the detector noise being added to the complete measurement. (default: '0')
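
The shot noise parameters describe a Poisson rate per unit m/z combined with exponentially distributed intensities. One way to picture this is the sketch below, which places noise peaks along the m/z axis with exponentially distributed gaps (a Poisson process). This is an illustrative assumption about how the two distributions combine; the function name and seed are hypothetical, and the simulator may distribute the noise differently.

```python
import random

def simulate_shot_noise(mz_min, mz_max, rate, intensity_mean, seed=0):
    """Return (m/z, intensity) noise peaks from a Poisson process on [mz_min, mz_max)."""
    if rate <= 0 or intensity_mean <= 0:
        return []  # mirrors 'set :rate OR :mean = 0 to disable'
    rng = random.Random(seed)
    peaks = []
    # Exponential gaps with parameter 'rate' yield 'rate' events per unit m/z on average.
    mz = mz_min + rng.expovariate(rate)
    while mz < mz_max:
        # expovariate takes 1/mean as its argument.
        peaks.append((mz, rng.expovariate(1.0 / intensity_mean)))
        mz += rng.expovariate(rate)
    return peaks

noise = simulate_shot_noise(200.0, 300.0, rate=0.5, intensity_mean=1.0)
print(len(noise), noise[:3])
```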
+++++ RawTandemSignal
status: Create Tandem-MS scans? (default: 'disabled'; valid: disabled, precursor, MS^E)
tandem_mode: Algorithm to generate the tandem-MS spectra. 0 - fixed intensities, 1 - SVC prediction (abundant/missing), 2 - SVR prediction of peak intensity (default: '0'; range: 0:2)
svm_model_set_file: File containing the filenames of SVM models for different charge variants (default: 'examples/simulation/SvmModelSet.model')
++++++ Precursor
ms2_spectra_per_rt_bin: Number of allowed MS/MS spectra in a retention time bin. (default: '5'; range: 1:∞)
min_mz_peak_distance: The minimal distance (in Th) between two peaks for concurrent selection for fragmentation. Also used to define the m/z width of an exclusion window (distance +/- from m/z of precursor). If you set this lower than the isotopic envelope of a peptide, you might get multiple fragment spectra pointing to the same precursor. (default: '2'; range: 0.0001:∞)
mz_isolation_window: All peaks within a mass window (in Th) of a selected peak are also selected for fragmentation. (default: '2'; range: 0:∞)
exclude_overlapping_peaks: If true, overlapping or nearby peaks (within 'min_mz_peak_distance') are excluded for selection. (default: 'false'; valid: true, false)
charge_filter: Charges considered for MS2 fragmentation. (default: '[2, 3]'; range: 1:5)
+++++++ Exclusion
use_dynamic_exclusion: If true, dynamic exclusion is applied. (default: 'false'; valid: true, false)
exclusion_time: The time (in seconds) a feature is excluded. (default: '100'; range: 0:∞)
+++++++ ProteinBasedInclusion
max_list_size: The maximal number of precursors in the inclusion list. (default: '1000'; range: 1:∞)
++++++++ rt
min_rt: Minimal rt in seconds. (default: '960'; range: 0:∞)
max_rt: Maximal rt in seconds. (default: '3840'; range: 0:∞)
rt_step_size: rt step size in seconds. (default: '30'; range: 1:∞)
rt_window_size: rt window size in seconds. (default: '100'; range: 1:∞)
++++++++ thresholds
min_protein_id_probability: Minimal protein probability for a protein to be considered identified. (default: '0.95'; range: 0:1)
min_pt_weight: Minimal pt weight of a precursor (default: '0.5'; range: 0:1)
min_mz: Minimal mz to be considered in protein based LP formulation. (default: '500'; range: 0:∞)
max_mz: Maximal mz to be considered in protein based LP formulation. (default: '5000'; range: 0:∞)
use_peptide_rule: Use peptide rule instead of minimal protein id probability (default: 'false'; valid: true, false)
min_peptide_ids: If use_peptide_rule is true, this parameter sets the minimal number of peptide ids for a protein id (default: '2'; range: 1:∞)
min_peptide_probability: If use_peptide_rule is true, this parameter sets the minimal probability for a peptide to be safely identified (default: '0.95'; range: 0:1)
++++++ MS_E
add_single_spectra: If true, the MS2 spectra for each peptide signal are included in the output (might be a lot). They will have a meta value 'MSE_DebugSpectrum' attached, so they can be filtered out. Native MS_E spectra will have 'MSE_Spectrum' instead. (default: 'false'; valid: true, false)
++++++ TandemSim
+++++++ Simple
add_isotopes: If set to true, isotope peaks of the product ion peaks are added (default: 'false'; valid: true, false)
max_isotope: Defines the maximal isotopic peak which is added; add_isotopes must be set to true (default: '2')
add_metainfo: Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++ (default: 'false'; valid: true, false)
add_losses: Adds common losses to those ions expected to have them; only water and ammonia losses are considered (default: 'false'; valid: true, false)
add_precursor_peaks: Adds peaks of the precursor to the spectrum, which sometimes occur (default: 'false'; valid: true, false)
add_all_precursor_charges: Adds precursor peaks with all charges in the given range (default: 'false'; valid: true, false)
add_abundant_immonium_ions: Add most abundant immonium ions (default: 'false'; valid: true, false)
add_first_prefix_ion: If set to true, e.g. b1 ions are added (default: 'false'; valid: true, false)
add_y_ions: Add peaks of y-ions to the spectrum (default: 'true'; valid: true, false)
add_b_ions: Add peaks of b-ions to the spectrum (default: 'true'; valid: true, false)
add_a_ions: Add peaks of a-ions to the spectrum (default: 'false'; valid: true, false)
add_c_ions: Add peaks of c-ions to the spectrum (default: 'false'; valid: true, false)
add_x_ions: Add peaks of x-ions to the spectrum (default: 'false'; valid: true, false)
add_z_ions: Add peaks of z-ions to the spectrum (default: 'false'; valid: true, false)
y_intensity: Intensity of the y-ions (default: '1')
b_intensity: Intensity of the b-ions (default: '1')
a_intensity: Intensity of the a-ions (default: '1')
c_intensity: Intensity of the c-ions (default: '1')
x_intensity: Intensity of the x-ions (default: '1')
z_intensity: Intensity of the z-ions (default: '1')
relative_loss_intensity: Intensity of loss ions, in relation to the intact ion intensity (default: '0.1')
precursor_intensity: Intensity of the precursor peak (default: '1')
precursor_H2O_intensity: Intensity of the H2O loss peak of the precursor (default: '1')
precursor_NH3_intensity: Intensity of the NH3 loss peak of the precursor (default: '1')
+++++++ SVM
add_isotopes: If set to true, isotope peaks of the product ion peaks are added (default: 'false'; valid: true, false)
max_isotope: Defines the maximal isotopic peak which is added; add_isotopes must be set to true (default: '2')
add_metainfo: Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++ (default: 'false'; valid: true, false)
add_first_prefix_ion: If set to true, e.g. b1 ions are added (default: 'false'; valid: true, false)
hide_y_ions: Hide peaks of y-ions in the spectrum (default: 'false'; valid: true, false)
hide_y2_ions: Hide peaks of y2-ions in the spectrum (default: 'false'; valid: true, false)
hide_b_ions: Hide peaks of b-ions in the spectrum (default: 'false'; valid: true, false)
hide_b2_ions: Hide peaks of b2-ions in the spectrum (default: 'false'; valid: true, false)
hide_a_ions: Hide peaks of a-ions in the spectrum (default: 'false'; valid: true, false)
hide_c_ions: Hide peaks of c-ions in the spectrum (default: 'false'; valid: true, false)
hide_x_ions: Hide peaks of x-ions in the spectrum (default: 'false'; valid: true, false)
hide_z_ions: Hide peaks of z-ions in the spectrum (default: 'false'; valid: true, false)
hide_losses: Hide common losses (only water and ammonia losses are considered) (default: 'false'; valid: true, false)
y_intensity: Intensity of the y-ions (default: '1')
b_intensity: Intensity of the b-ions (default: '1')
a_intensity: Intensity of the a-ions (default: '1')
c_intensity: Intensity of the c-ions (default: '1')
x_intensity: Intensity of the x-ions (default: '1')
z_intensity: Intensity of the z-ions (default: '1')
relative_loss_intensity: Intensity of loss ions, in relation to the intact ion intensity (default: '0.1')
+++++ Global
ionization_type: Type of ionization (MALDI or ESI) (default: 'ESI'; valid: MALDI, ESI)
+++++ Labeling
type: Select the labeling type you want for your experiment (default: 'labelfree'; valid: ICPL, SILAC, itraq, labelfree, o18)
++++++ ICPL: ICPL labeling on MS1 level of lysines and N-term (on protein or peptide level) with either two or three channels.
ICPL_fixed_rtshift: Fixed retention time shift between labeled pairs. If set to 0.0, only the retention times computed by the RT model step are used. (default: '0')
label_proteins: Enables protein-labeling. (select 'false' if you only need peptide-labeling) (default: 'true'; valid: true, false)
ICPL_light_channel_label: UniMod Id of the light channel ICPL label. (default: 'UniMod:365')
ICPL_medium_channel_label: UniMod Id of the medium channel ICPL label. (default: 'UniMod:687')
ICPL_heavy_channel_label: UniMod Id of the heavy channel ICPL label. (default: 'UniMod:364')
++++++ SILAC: SILAC labeling on MS1 level with up to 3 channels and custom modifications.
fixed_rtshift: Fixed retention time shift between labeled peptides. If set to 0.0, only the retention times computed by the RT model step are used. (default: '0.0001'; range: 0:∞)
+++++++ medium_channel: Modifications for the medium SILAC channel.
modification_lysine: Modification of Lysine in the medium SILAC channel (default: 'UniMod:481')
modification_arginine: Modification of Arginine in the medium SILAC channel (default: 'UniMod:188')
+++++++ heavy_channel: Modifications for the heavy SILAC channel. If you want to use only 2 channels, just leave the labels as they are and provide only 2 input files.
modification_lysine: Modification of Lysine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed. (default: 'UniMod:259')
modification_arginine: Modification of Arginine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed. (default: 'UniMod:267')
++++++ itraq: iTRAQ labeling on MS2 level with up to 4 (4plex) or 8 (8plex) channels.
iTRAQ: 4plex or 8plex iTRAQ? (default: '4plex'; valid: 4plex, 8plex)
reporter_mass_shift: Allowed shift (uniformly distributed - left to right) in Da from the expected position (of e.g. 114.1, 115.1) (default: '0.1'; range: 0:0.5)
channel_active_4plex: Four-plex only: Each channel that was used in the experiment and its description (114-117) in format <channel>:<name>, e.g. "114:myref","115:liver". (default: '[114:myReference]')
channel_active_8plex: Eight-plex only: Each channel that was used in the experiment and its description (113-121) in format <channel>:<name>, e.g. "113:myref","115:liver","118:lung". (default: '[113:myReference]')
isotope_correction_values_4plex: Override default values (see Documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da> ; e.g. '114:0/0.3/4/0' , '116:0.1/0.3/3/0.2' (default: '[114:0/1/5.9/0.2, 115:0/2/5.6/0.1, 116:0/3/4.5/0.1, 117:0.1/4/3.5/0.1]')
isotope_correction_values_8plex: Override default values (see Documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da> ; e.g. '113:0/0.3/4/0' , '116:0.1/0.3/3/0.2' (default: '[113:0/0/6.89/0.22, 114:0/0.94/5.9/0.16, 115:0/1.88/4.9/0.1, 116:0/2.82/3.9/0.07, 117:0.06/3.77/2.99/0, 118:0.09/4.71/1.88/0, 119:0.14/5.66/0.87/0, 121:0.27/7.44/0.18/0]')
Y_contamination: Efficiency of labeling tyrosine ('Y') residues. 0=off, 1=full labeling (default: '0.3'; range: 0:1)
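
Isotope correction entries such as '114:0/1/5.9/0.2' pack a channel and four shift values (-2Da, -1Da, +1Da, +2Da) into one string. The following sketch shows one way to split such an entry for inspection or validation before writing it to an INI file. The function name is hypothetical and the code does not reproduce the simulator's own parser.

```python
def parse_isotope_correction(entry):
    """Split '114:0/1/5.9/0.2' into (channel, {shift label: value})."""
    channel, _, shifts = entry.partition(":")
    values = [float(v) for v in shifts.split("/")]
    if len(values) != 4:
        raise ValueError(f"expected 4 shift values in '{entry}', got {len(values)}")
    return int(channel), dict(zip(("-2Da", "-1Da", "+1Da", "+2Da"), values))

defaults_4plex = ["114:0/1/5.9/0.2", "115:0/2/5.6/0.1",
                  "116:0/3/4.5/0.1", "117:0.1/4/3.5/0.1"]
for entry in defaults_4plex:
    print(parse_isotope_correction(entry))
```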
++++++ o18: 18O labeling on MS1 level with 2 channels, requiring trypsin digestion.
labeling_efficiency: Describes the distribution of the labeled peptide over the different states (unlabeled, mono- and di-labeled) (default: '1'; range: 0:1)
++++ RandomNumberGenerators: Parameters for generating the random aspects (e.g. noise) in the simulated data. The generation is separated into two parts, the technical part, like noise in the raw signal, and the biological part, like systematic deviations in the predicted retention times.
biological: Controls the 'biological' randomness of the generated data (e.g. systematic effects like deviations in RT). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same). (default: 'random'; valid: reproducible, random)
technical: Controls the 'technical' randomness of the generated data (e.g. noise in the raw signal). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same). (default: 'random'; valid: reproducible, random)

OpenMS / TOPP release 2.3.0 Documentation generated on Tue Jan 9 2018 18:22:06 using doxygen 1.8.13