A highly configurable simulator for mass spectrometry experiments.
To see all available parameters and further functionality, write an INI file (e.g. "MSSimulator -write_ini myini.ini") and inspect it.
Protein sequences (including amino acid modifications) can be provided as a FASTA file. A special tag in the description of each entry can be used to specify the protein abundance. If you want to create a complex FASTA file with a Gaussian protein abundance model in log space, see the Python script shipped with your OpenMS installation (e.g. <OpenMS-dir>/share/OpenMS/examples/simulation/FASTAProteinAbundanceSampling.py). It supports (random) sampling from a large FASTA file and protein weight filtering, and it adds an intensity tag to each entry.
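The abundance model described above can be pictured with the following minimal sketch. It is not the shipped FASTAProteinAbundanceSampling.py script. It draws intensities whose log10 values follow a Gaussian and appends a tag to each FASTA header. The helper names are hypothetical, and the tag syntax `[# intensity=... #]` is an assumption made for illustration only, so check the documentation of your OpenMS version for the exact form.

```python
import random

def sample_log_gaussian_abundances(n, log10_mean=4.0, log10_stddev=1.0, seed=None):
    """Draw n intensities whose log10 values follow a Gaussian (hypothetical helper)."""
    rng = random.Random(seed)
    return [10 ** rng.gauss(log10_mean, log10_stddev) for _ in range(n)]

def tag_fasta_headers(fasta_text, intensities):
    """Append an intensity tag to each '>' header line, in file order.

    ASSUMPTION: the tag format '[# intensity=... #]' is illustrative only.
    """
    out, i = [], 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = f"{line} [# intensity={intensities[i]:.1f} #]"
            i += 1
        out.append(line)
    return "\n".join(out)

fasta = ">sp|P1|PROT1 first protein\nMKWVTFISLLLLFSSAYS\n>sp|P2|PROT2 second protein\nMDPKRLAGK\n"
abundances = sample_log_gaussian_abundances(2, seed=42)
print(tag_fasta_headers(fasta, abundances))
```

Sampling in log space keeps the abundances positive and spreads them over several orders of magnitude, which is the point of the log-space Gaussian model.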
If multiplexed data is simulated (e.g. SILAC or iTRAQ), you need to supply one FASTA input file per channel. In the label-free setting, all FASTA input files are merged into one before simulation.
To specify intensity values for certain proteins, add an abundance tag for the corresponding protein in the FASTA input file:
For amino acid modifications, insert their name at the respective amino acid residues. Modifications specified this way are fixed. If you need variable modifications, you have to add the desired combinatorial variants (presence/absence of one or all modifications) to the FASTA file. Valid modification names are listed in many TOPP/UTILS tools, e.g. in the -fixed_modifications parameter of MSGFPlusAdapter.
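Because modifications in the FASTA input are fixed, variable modifications have to be enumerated by hand. The sketch below generates every presence/absence combination for a set of candidate sites, and each resulting variant would become its own FASTA entry. It writes the modification name in parentheses after the residue (e.g. `M(Oxidation)`), following the usual OpenMS peptide string notation. The helper name is hypothetical.

```python
from itertools import product

def variable_mod_variants(sequence, sites):
    """Return all 2**len(sites) variants of `sequence` (hypothetical helper).

    `sites` maps a 0-based residue index to a modification name; each site is
    independently either left unmodified or modified.
    """
    positions = sorted(sites)
    variants = []
    for mask in product((False, True), repeat=len(positions)):
        chunks, last = [], 0
        for pos, use in zip(positions, mask):
            chunks.append(sequence[last:pos + 1])    # residues up to and including the site
            if use:
                chunks.append(f"({sites[pos]})")     # modification name after the residue
            last = pos + 1
        chunks.append(sequence[last:])
        variants.append("".join(chunks))
    return variants

for v in variable_mod_variants("PEPMIDEMK", {3: "Oxidation", 7: "Oxidation"}):
    print(v)
```

The number of entries grows as 2^n in the number of sites, so it is worth restricting the sites to the modifications you actually need.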
Legend:
required parameter
advanced parameter
+MSSimulator: A highly configurable simulator for mass spectrometry experiments.
version = 2.4.0
Version of the tool that generated this parameters file.
++1: Instance '1' section for 'MSSimulator'
in = []
Input protein sequences (input file: *.FASTA)
out
Output: simulated MS raw (profile) data (output file: *.mzML)
out_pm
Output: ground-truth picked (centroided) MS data (output file: *.mzML)
out_fm
Output: ground-truth features (output file: *.featureXML)
out_cm
Output: ground-truth features, grouping ESI charge variants of each parent peptide (output file: *.consensusXML)
out_lcm
Output: ground-truth features, grouping labeled variants (output file: *.consensusXML)
out_cntm
Output: ground-truth features caused by contaminants (output file: *.featureXML)
out_id
Output: ground-truth MS2 peptide identifications (output file: *.idXML)
log
Name of log file (created only when specified)
debug = 0
Sets the debug level
threads = 1
Sets the number of threads allowed to be used by the TOPP tool
no_progress = false
Disables progress logging to command line (valid: true, false)
force = false
Overwrite tool specific checks. (valid: true, false)
test = false
Enables the test mode (needed for internal use only) (valid: true, false)
+++algorithm: Algorithm parameters section
++++MSSim
+++++Digestion
enzyme = Trypsin
Enzyme to use for digestion (select 'no cleavage' to skip digestion) (valid: Arg-C, Trypsin, Asp-N_ambic, TrypChymo, Trypsin/P, V8-DE, V8-E, proline endopeptidase, leukocyte elastase, Alpha-lytic protease, glutamyl endopeptidase, 2-iodobenzoate, iodosobenzoate, staphylococcal protease/D, proline-endopeptidase/HKR, Glu-C+P, PepsinA + P, cyanogen-bromide, Clostripain/P, elastase-trypsin-chymotrypsin, Chymotrypsin, no cleavage, unspecific cleavage, Arg-C/P, Asp-N, CNBr, Formic_acid, Lys-C, Lys-N, Lys-C/P, PepsinA, Asp-N/B, Chymotrypsin/P)
model = naive
The cleavage model to use for digestion. 'Trained' is based on a log likelihood model (see DOI:10.1021/pr060507u). (valid: trained, naive)
min_peptide_length = 3
Minimum peptide length after digestion (shorter ones will be discarded) (range: 1:∞)
++++++model_trained
threshold = 0.5
Model threshold for calling a cleavage. Higher values increase the number of cleavages. -2 will give no cleavages, +4 almost full cleavage. (range: -2:4)
++++++model_naive
missed_cleavages = 1
Maximum number of missed cleavages considered. All possible resulting peptides will be created. (range: 0:∞)
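The naive model, together with `missed_cleavages` and `min_peptide_length`, can be pictured with this simplified sketch. It assumes the classic trypsin rule (cleave after K or R, but not before P). The enzyme definitions and digestion code in OpenMS may differ in detail, and the function name is hypothetical.

```python
import re

def naive_tryptic_digest(protein, missed_cleavages=1, min_length=3):
    """Enumerate peptides allowing up to `missed_cleavages` skipped sites (hypothetical helper)."""
    # Cleavage sites: after K/R unless followed by P (simplified trypsin rule).
    cut_positions = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [c for c in cut_positions if c < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to missed_cleavages + 1 consecutive fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if len(pep) >= min_length:   # mirrors min_peptide_length
                peptides.append(pep)
    return peptides

print(naive_tryptic_digest("MKWVTFISLLRPAGKDE"))
```

With the defaults, the fragments "MK" and "DE" are discarded as too short, and the R before P is not treated as a cleavage site.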
+++++RT
rt_column = HPLC
Modelling of an RT or CE column (valid: none, HPLC, CE)
auto_scale = true
Scale predicted RTs/MTs to the given 'total_gradient_time'? If 'true', for CE this means that 'CE:lenght_d', 'CE:length_total' and 'CE:voltage' have no influence. (valid: true, false)
total_gradient_time = 2500
The duration [s] of the gradient. (range: 1e-05:∞)
sampling_rate = 2
Time interval [s] between consecutive scans (range: 0.01:60)
++++++scan_window
min = 500
Start of RT scan window [s] (range: 0:∞)
max = 1500
End of RT scan window [s] (range: 1:∞)
++++++variation: Random component that simulates technical/biological variation
feature_stddev = 3
Standard deviation of shift in retention time [s] from predicted model (applied to every single feature independently)
affine_offset = 0
Global offset in retention time [s] from predicted model
affine_scale = 1
Global scaling in retention time from predicted model
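A minimal sketch of how these RT parameters could combine. Predicted retention times are rescaled to `total_gradient_time` (as with `auto_scale`). The global affine variation and a per-feature Gaussian shift are then applied, and scans are placed every `sampling_rate` seconds inside the scan window. The assumption that predictions arrive normalized to [0, 1], and the order of operations, are illustrative choices rather than the OpenMS implementation.

```python
import random

def simulate_rts(predicted_norm, total_gradient_time=2500.0, affine_offset=0.0,
                 affine_scale=1.0, feature_stddev=3.0, rng=None):
    """ASSUMPTION: predictions are normalized to [0, 1] and scaled to the gradient."""
    if rng is None:
        rng = random.Random()
    rts = []
    for p in predicted_norm:
        rt = p * total_gradient_time                  # auto_scale
        rt = affine_scale * rt + affine_offset        # global (affine) variation
        rt += rng.gauss(0.0, feature_stddev)          # independent per-feature shift
        rts.append(rt)
    return rts

def scan_times(window_min=500.0, window_max=1500.0, sampling_rate=2.0):
    """Scan times every `sampling_rate` seconds inside the RT scan window."""
    n = int((window_max - window_min) / sampling_rate) + 1
    return [window_min + i * sampling_rate for i in range(n)]

rts = simulate_rts([0.25, 0.5], rng=random.Random(0))
scans = scan_times()
print(len(scans), scans[0], scans[-1])
```

With the defaults, the window from 500 s to 1500 s sampled every 2 s yields 501 scans. Features eluting outside that window would not be observed.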
++++++column_condition
distortion = 0
Distortion of the elution profiles. Good presets are 0 for a perfect elution profile, 1 for a slightly distorted elution profile, etc. For trapping instruments (e.g. Orbitrap), distortion should be > 4. (range: 0:10)
++++++profile_shape
+++++++width: Width of the EGH elution shape, i.e. the sigma^2 parameter, which is computed as 'value' + rnd_cauchy('variance')
value = 9
Width of the Exponential Gaussian Hybrid (EGH) distribution shape of the elution profile. This does not correspond directly to the width in [s]. (range: 0:∞)
variance = 1.6
Random component of the width (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the variance (note: the scale parameter has to be >= 0). (range: 0:∞)
+++++++skewness: Skewness of the EGH elution shape, i.e. the tau parameter, which is computed as 'value' + rnd_cauchy('variance')
value = 0.1
Asymmetric component of the EGH. Higher absolute values lead to more skewness (negative values cause fronting, positive values cause tailing). Tau parameter of the EGH, i.e. time constant of the exponential decay of the Exponential Gaussian Hybrid distribution shape of the elution profile.
variance = 0.3
Random component of skewness (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the time constant (note: the scale parameter has to be > 0). (range: 0:∞)
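The width (sigma^2) and skewness (tau) parameters can be illustrated with the standard Exponential Gaussian Hybrid function. It is defined as f(t) = H * exp(-(t - t_R)^2 / (2*sigma^2 + tau*(t - t_R))) where the denominator is positive, and 0 elsewhere. Each random component is drawn from a Cauchy (Lorentzian) distribution, following 'value' + rnd_cauchy('variance'). This is a sketch: how OpenMS handles a negative sigma^2 draw is not stated above, so the clamping used here is an assumption.

```python
import math
import random

def rnd_cauchy(scale, rng):
    """Cauchy (Lorentzian) sample centred at 0; scale 0 gives 0 (no randomness)."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

def egh(t, height, t_r, sigma_sq, tau):
    """Exponential Gaussian Hybrid; zero where the denominator is not positive."""
    denom = 2.0 * sigma_sq + tau * (t - t_r)
    if denom <= 0.0:
        return 0.0
    return height * math.exp(-((t - t_r) ** 2) / denom)

rng = random.Random(1)
# 'value' + rnd_cauchy('variance'); clamping to a small positive sigma^2 is an assumption.
sigma_sq = max(9.0 + rnd_cauchy(1.6, rng), 1e-3)
tau = 0.1 + rnd_cauchy(0.3, rng)
profile = [egh(t, 1.0, 1000.0, sigma_sq, tau) for t in range(980, 1021, 2)]
```

With tau = 0 the profile reduces to a symmetric Gaussian. A positive tau widens the denominator after the apex, which produces the tailing mentioned in the parameter description.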
++++++HPLC
model_file = examples/simulation/RTPredict.model
SVM model for retention time prediction
++++++CE
pH = 3
pH of buffer (range: 0:14)
alpha = 0.5
Exponent alpha used to calculate mobility (range: 0:1)
mu_eo = 0
Electroosmotic flow (range: 0:5)
lenght_d = 70
Length of capillary [cm] from injection site to MS (range: 0:1000)
length_total = 75
Total length of capillary [cm] (range: 0:1000)
voltage = 1000
Voltage applied to capillary (range: 0:∞)
+++++Detectability
dt_simulation_on = false
Enable detectability modelling? This can serve as a filter to remove peptides that ionize badly, thus reducing the peptide count. (valid: true, false)
min_detect = 0.5
Minimum peptide detectability accepted. Peptides with a lower score will be removed.
dt_model_file = examples/simulation/DTPredict.model
SVM model for peptide detectability prediction
+++++Ionization
++++++esi
ionized_residues = [Arg, Lys, His]
List of residues (as three-letter code) that will be considered during ES ionization. The N-term is always assumed to carry a charge. This parameter will be ignored during MALDI ionization. (valid: Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Sec, Ser, Thr, Val, Trp, Tyr)
charge_impurity = [H+:1]
List of charged ions that contribute to charge with weight of occurrence (their sum is scaled to 1 internally), e.g. ['H:1'] or ['H:0.7' 'Na:0.3'], ['H:4' 'Na:1'] (which internally translates to ['H:0.8' 'Na:0.2'])
max_impurity_set_size = 3
Maximal number of combinations of charge impurities allowed (each generating one feature) per charge state. E.g. assuming charge=3 and this parameter is 2, then we could choose to allow '3H+, 2H+Na+' features (given certain 'charge_impurity' constraints), but not '3H+, 2H+Na+, 3Na+'
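The weight normalisation of `charge_impurity` and the limit set by `max_impurity_set_size` can be sketched as follows. The weights are rescaled to sum to 1, every multiset of singly charged adducts giving the requested charge is enumerated, and only the most probable combinations are kept. The ranking by multinomial probability is an assumption made for this illustration, since the text above only says which combinations *could* be allowed. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

def parse_impurities(entries):
    """['H:4', 'Na:1'] -> {'H': 0.8, 'Na': 0.2} (weights rescaled to sum to 1)."""
    weights = {}
    for entry in entries:
        ion, weight = entry.rsplit(":", 1)
        weights[ion] = float(weight)
    total = sum(weights.values())
    return {ion: w / total for ion, w in weights.items()}

def top_adduct_sets(probs, charge, max_set_size):
    """Most probable adduct multisets for `charge` (ASSUMPTION: singly charged ions)."""
    scored = []
    for combo in combinations_with_replacement(sorted(probs), charge):
        counts = Counter(combo)
        # Multinomial probability of this adduct composition.
        coeff = factorial(charge) // prod(factorial(k) for k in counts.values())
        p = coeff * prod(probs[ion] ** k for ion, k in counts.items())
        scored.append((p, combo))
    scored.sort(key=lambda item: -item[0])
    return scored[:max_set_size]

probs = parse_impurities(["H:4", "Na:1"])
for p, combo in top_adduct_sets(probs, charge=3, max_set_size=2):
    print(combo, round(p, 3))
```

For charge 3 with weights H:0.8 and Na:0.2, this keeps 3H+ (0.512) and 2H+Na+ (0.384). That matches the kind of selection described in the example above.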
ionization_probability = 0.8
Probability for the binomial distribution of the ESI charge states
++++++maldi
ionization_probabilities = [0.9, 0.1]
List of probabilities for the different charge states during MALDI ionization (the list must sum up to 1.0)
++++++mz
lower_measurement_limit = 200
Lower m/z detector limit. (range: 0:∞)
upper_measurement_limit = 2500
Upper m/z detector limit. (range: 0:∞)
+++++RawSignal
enabled = true
Enable RAW signal simulation? (select 'false' if you only need feature maps) (valid: true, false)
peak_shape = Gaussian
Peak shape used around each isotope peak (be aware that the area under the curve is constant for both types, but the maximal height will differ (~ 2:3 = Lorentz:Gaussian) due to the wider base of the Lorentzian). (valid: Gaussian, Lorentzian)
++++++resolution
value = 50000
Instrument resolution at 400 Th.
type = linear
How resolution changes with increasing m/z. QTOFs usually show 'constant' behavior, FTs have linear degradation, and on Orbitraps the resolution decreases with the square root of mass. (valid: constant, linear, sqrt)
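To see what `value` and `type` imply, the sketch below computes three quantities: the resolution at a given m/z, the peak FWHM (m/z divided by resolution), and the raw-data spacing that results from `sampling_points` points per FWHM. The scaling laws used (R falling as 1/mz for 'linear' and as 1/sqrt(mz) for 'sqrt', both anchored at 400 Th) are our reading of the parameter description, not OpenMS code.

```python
import math

def resolution_at(mz, r400=50000.0, kind="linear"):
    """Resolution at `mz`, given the resolution `r400` specified at 400 Th."""
    if kind == "constant":
        return r400
    if kind == "linear":
        return r400 * 400.0 / mz              # assumed: R falls off as 1/mz
    if kind == "sqrt":
        return r400 * math.sqrt(400.0 / mz)   # assumed: R falls off as 1/sqrt(mz)
    raise ValueError(f"unknown resolution type: {kind}")

def fwhm_and_step(mz, r400=50000.0, kind="linear", sampling_points=3):
    """Peak FWHM (R = m / delta_m) and raw data spacing for `sampling_points` per FWHM."""
    fwhm = mz / resolution_at(mz, r400, kind)
    return fwhm, fwhm / sampling_points

print(fwhm_and_step(800.0))
```

At m/z 800 with the defaults, linear scaling gives R = 25,000 and a FWHM of 0.032 Th. With 3 sampling points, that corresponds to roughly 0.0107 Th between raw data points.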
++++++baseline: Baseline modeling for MALDI ionization
scaling = 0
Scale of baseline. Set to 0 to disable simulation of baseline. (range: 0:∞)
shape = 0.5
The baseline is modeled by an exponential probability density function (pdf) with f(x) = shape*e^(-shape*x) (range: 0:∞)
++++++mz
sampling_points = 3
Number of raw data points per FWHM of the peak. (range: 2:∞)
++++++contaminants
file = examples/simulation/contaminants.csv
Contaminants file with sum formula and absolute RT interval. See 'OpenMS/examples/simulation/contaminants.txt' for details.
++++++variation: Random components that simulate biological and technical variations of the simulated data.
+++++++mz: Shifts in the mass-to-charge dimension of the simulated signals.
error_stddev = 0
Standard deviation for m/z errors. Set to 0 to disable simulation of m/z errors.
error_mean = 0
Average systematic m/z error (Da)
+++++++intensity: Variations in intensity to model randomness in feature intensity.
scale = 100
Constant scale factor of the feature intensity. Set to 1.0 to get the real intensity values provided in the FASTA file. (range: 0:∞)
scale_stddev = 0
Standard deviation of peak intensity (relative to the scaled peak height). Set to 0 to get simple rescaled intensities. (range: 0:∞)
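One way to read the `mz` and `intensity` variation parameters is sketched below. Each signal's m/z is shifted by `error_mean` plus a Gaussian error with standard deviation `error_stddev`. Each feature's abundance is multiplied by `scale` and then perturbed by a Gaussian whose standard deviation is `scale_stddev` times the scaled height. The Gaussian choice for intensity and the clipping at zero are assumptions made for this sketch, and the helper names are hypothetical.

```python
import random

def perturb_mz(mz, error_mean=0.0, error_stddev=0.0, rng=random):
    """Add a systematic offset plus a random m/z error (in Da)."""
    noise = rng.gauss(0.0, error_stddev) if error_stddev > 0 else 0.0
    return mz + error_mean + noise

def perturb_intensity(abundance, scale=100.0, scale_stddev=0.0, rng=random):
    """Scale the FASTA abundance; optionally add noise relative to the scaled height."""
    height = scale * abundance
    if scale_stddev > 0:
        height += rng.gauss(0.0, scale_stddev * height)   # ASSUMPTION: Gaussian noise
    return max(height, 0.0)                               # ASSUMPTION: clip negatives

rng = random.Random(7)
print(perturb_mz(500.25, error_mean=0.001, error_stddev=0.002, rng=rng))
print(perturb_intensity(567.4, scale=1.0, scale_stddev=0.1, rng=rng))
```

With `scale = 1.0` and `scale_stddev = 0`, the FASTA abundance passes through unchanged, which matches the parameter descriptions.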
++++++noise: Parameters modeling noise in mass spectrometry measurements.
+++++++shot: Parameters of the Poisson and exponential distributions for shot noise modeling (set :rate OR :intensity-mean = 0 to disable).
rate = 0
Poisson rate of shot noise per unit m/z. Set this to 0 to disable simulation of shot noise. (range: 0:∞)
intensity-mean = 1
Shot noise intensity mean (exponentially distributed with given mean).
+++++++white: Parameters of the Gaussian distribution for white noise modeling (set :mean AND :stddev = 0 to disable).
mean = 0
Mean value of white noise being added to each measured signal.
stddev = 0
Standard deviation of white noise being added to each measured signal.
+++++++detector: Parameters of the Gaussian distribution for detector noise modeling (set :mean AND :stddev = 0 to disable).
mean = 0
Mean value of the detector noise being added to the complete measurement.
stddev = 0
Standard deviation of the detector noise being added to the complete measurement.
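The three noise sources can be sketched with the distributions named in the parameter descriptions:

- Shot noise is modeled as a Poisson process along m/z (`rate` events per Th) with exponentially distributed intensities (`intensity-mean`).
- White noise is Gaussian noise added to each measured signal.
- Detector noise is Gaussian noise added over the complete measurement.

The text above does not say exactly what separates "each measured signal" from "the complete measurement". Applying the detector noise to every point of an m/z grid, including empty ones, is an assumption. The helper names are hypothetical.

```python
import random

def shot_noise(mz_min, mz_max, rate, intensity_mean, rng):
    """Poisson process on [mz_min, mz_max): exponential gaps with mean 1/rate."""
    peaks = []
    if rate <= 0 or intensity_mean <= 0:
        return peaks                                  # shot noise disabled
    mz = mz_min + rng.expovariate(rate)
    while mz < mz_max:
        peaks.append((mz, rng.expovariate(1.0 / intensity_mean)))  # mean = intensity_mean
        mz += rng.expovariate(rate)
    return peaks

def add_gaussian(values, mean, stddev, rng):
    """White or detector noise: add N(mean, stddev) to each value."""
    if mean == 0 and stddev == 0:
        return list(values)                           # disabled
    return [v + rng.gauss(mean, stddev) for v in values]

rng = random.Random(0)
grid = [0.0] * 10                                     # hypothetical intensities on an m/z grid
noisy = add_gaussian(grid, mean=5.0, stddev=1.0, rng=rng)
extra_peaks = shot_noise(200.0, 2500.0, rate=0.01, intensity_mean=1.0, rng=rng)
```

Drawing exponential gaps between events is a standard way to generate a homogeneous Poisson process. It avoids sampling the event count separately.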
+++++RawTandemSignal
status = disabled
Create tandem MS scans? (valid: disabled, precursor, MS^E)
tandem_mode = 0
Algorithm to generate the tandem MS spectra: 0 = fixed intensities, 1 = SVC prediction (abundant/missing), 2 = SVR prediction of peak intensity (range: 0:2)
svm_model_set_file = examples/simulation/SvmModelSet.model
File containing the filenames of SVM models for different charge variants
++++++Precursor
ms2_spectra_per_rt_bin = 5
Number of allowed MS/MS spectra in a retention time bin. (range: 1:∞)
min_mz_peak_distance = 2
The minimal distance (in Th) between two peaks for concurrent selection for fragmentation. Also used to define the m/z width of an exclusion window (distance +/- from m/z of precursor). If you set this lower than the isotopic envelope of a peptide, you might get multiple fragment spectra pointing to the same precursor. (range: 0.0001:∞)
mz_isolation_window = 2
All peaks within a mass window (in Th) of a selected peak are also selected for fragmentation. (range: 0:∞)
exclude_overlapping_peaks = false
If true, overlapping or nearby peaks (within 'min_mz_peak_distance') are excluded from selection. (valid: true, false)
charge_filter = [2, 3]
Charges considered for MS2 fragmentation. (range: 1:5)
+++++++Exclusion
use_dynamic_exclusion = false
If true, dynamic exclusion is applied. (valid: true, false)
exclusion_time = 100
The time (in seconds) a feature is excluded. (range: 0:∞)
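A simplified picture of how the precursor parameters interact within one RT bin:

1. Candidate features are ranked by intensity.
2. Features whose charge is not in `charge_filter` are skipped.
3. Chosen precursors are kept at least `min_mz_peak_distance` apart.
4. At most `ms2_spectra_per_rt_bin` precursors are chosen.
5. With `use_dynamic_exclusion`, a feature is skipped if a precursor within `min_mz_peak_distance` was fragmented less than `exclusion_time` seconds earlier.

The greedy top-N strategy and the `Feature` record are hypothetical. The actual OpenMS selection, including the protein-based inclusion list, is more involved.

```python
from dataclasses import dataclass

@dataclass
class Feature:              # hypothetical minimal feature record
    mz: float
    charge: int
    intensity: float

def select_precursors(features, rt, excluded, ms2_per_bin=5, min_mz_distance=2.0,
                      charge_filter=(2, 3), use_exclusion=False, exclusion_time=100.0):
    """Greedy top-N precursor selection for one RT bin.

    `excluded` maps a previously fragmented m/z to the time it was selected;
    it is updated in place when dynamic exclusion is on.
    """
    chosen = []
    for f in sorted(features, key=lambda x: x.intensity, reverse=True):
        if len(chosen) >= ms2_per_bin:
            break
        if f.charge not in charge_filter:
            continue
        if any(abs(f.mz - c.mz) < min_mz_distance for c in chosen):
            continue
        # Exclusion window is +/- min_mz_distance around earlier precursors.
        if use_exclusion and any(abs(f.mz - mz) < min_mz_distance and rt - t < exclusion_time
                                 for mz, t in excluded.items()):
            continue
        chosen.append(f)
        if use_exclusion:
            excluded[f.mz] = rt
    return chosen

feats = [Feature(500.0, 2, 1e6), Feature(501.0, 2, 5e5), Feature(650.3, 1, 9e5), Feature(720.4, 3, 2e5)]
excl = {}
print([f.mz for f in select_precursors(feats, 1000.0, excl, use_exclusion=True)])
```

In this example, the 1+ feature is filtered out by `charge_filter`, and the feature at m/z 501.0 lies too close to the more intense one at 500.0. Repeating the call 50 s later selects nothing, because both earlier precursors are still excluded.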
+++++++ProteinBasedInclusion
max_list_size = 1000
The maximal number of precursors in the inclusion list. (range: 1:∞)
++++++++rt
min_rt = 960
Minimal RT in seconds. (range: 0:∞)
max_rt = 3840
Maximal RT in seconds. (range: 0:∞)
rt_step_size = 30
RT step size in seconds. (range: 1:∞)
rt_window_size = 100
RT window size in seconds. (range: 1:∞)
++++++++thresholds
min_protein_id_probability = 0.95
Minimal protein probability for a protein to be considered identified. (range: 0:1)
min_pt_weight = 0.5
Minimal pt weight of a precursor (range: 0:1)
min_mz = 500
Minimal m/z to be considered in the protein-based LP formulation. (range: 0:∞)
max_mz = 5000
Maximal m/z to be considered in the protein-based LP formulation. (range: 0:∞)
use_peptide_rule = false
Use peptide rule instead of minimal protein ID probability (valid: true, false)
min_peptide_ids = 2
If use_peptide_rule is true, this parameter sets the minimal number of peptide IDs for a protein ID (range: 1:∞)
min_peptide_probability = 0.95
If use_peptide_rule is true, this parameter sets the minimal probability for a peptide to be safely identified (range: 0:1)
++++++MS_E
add_single_spectra = false
If true, the MS2 spectra for each peptide signal are included in the output (might be a lot). They will have a meta value 'MSE_DebugSpectrum' attached, so they can be filtered out. Native MS_E spectra will have 'MSE_Spectrum' instead. (valid: true, false)
++++++TandemSim
+++++++Simple
add_isotopes = false
If set to true, isotope peaks of the product ion peaks are added (valid: true, false)
max_isotope = 2
Defines the maximal isotopic peak which is added; add_isotopes must be set to true
add_metainfo = false
Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++ (valid: true, false)
add_losses = false
Adds common losses to those ions expected to have them; only water and ammonia losses are considered (valid: true, false)
add_precursor_peaks = false
Adds peaks of the precursor to the spectrum, as these sometimes occur in real spectra (valid: true, false)
add_all_precursor_charges = false
Adds precursor peaks with all charges in the given range (valid: true, false)
add_abundant_immonium_ions = false
Add the most abundant immonium ions (valid: true, false)
add_first_prefix_ion = false
If set to true, the first prefix ions (e.g. b1) are added (valid: true, false)
add_y_ions = true
Add peaks of y-ions to the spectrum (valid: true, false)
add_b_ions = true
Add peaks of b-ions to the spectrum (valid: true, false)
add_a_ions = false
Add peaks of a-ions to the spectrum (valid: true, false)
add_c_ions = false
Add peaks of c-ions to the spectrum (valid: true, false)
add_x_ions = false
Add peaks of x-ions to the spectrum (valid: true, false)
add_z_ions = false
Add peaks of z-ions to the spectrum (valid: true, false)
y_intensity = 1
Intensity of the y-ions
b_intensity = 1
Intensity of the b-ions
a_intensity = 1
Intensity of the a-ions
c_intensity = 1
Intensity of the c-ions
x_intensity = 1
Intensity of the x-ions
z_intensity = 1
Intensity of the z-ions
relative_loss_intensity = 0.1
Intensity of loss ions, relative to the intact ion intensity
precursor_intensity = 1
Intensity of the precursor peak
precursor_H2O_intensity = 1
Intensity of the H2O loss peak of the precursor
precursor_NH3_intensity = 1
Intensity of the NH3 loss peak of the precursor
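With the 'Simple' defaults (b- and y-ions on, intensity 1, no first prefix ion), singly charged fragment m/z values follow from monoisotopic residue masses. The sketch covers only b- and y-ions at charge 1+. It uses a small mass table for the residues needed here, and the function name is hypothetical. The OpenMS generator also handles the other ion types, higher charges, losses and isotopes.

```python
# Monoisotopic residue masses (Da) for a subset of amino acids; extend as needed.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
                "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
                "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111}
PROTON = 1.007276
WATER = 18.010565

def b_y_ladder(peptide, add_first_prefix_ion=False, y_intensity=1.0, b_intensity=1.0):
    """Return sorted (m/z, intensity, label) tuples for singly charged b- and y-ions."""
    peaks = []
    n = len(peptide)
    for i in range(1, n):                       # backbone cleavage after residue i
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[i:])
        if i > 1 or add_first_prefix_ion:       # b1 only if add_first_prefix_ion
            peaks.append((prefix + PROTON, b_intensity, f"b{i}+"))
        peaks.append((suffix + WATER + PROTON, y_intensity, f"y{n - i}+"))
    return sorted(peaks)

for mz, inten, label in b_y_ladder("GAK"):
    print(f"{label}\t{mz:.4f}\t{inten}")
```

A b-ion is the sum of the prefix residues plus a proton. A y-ion additionally carries the C-terminal water, which is why y1 of "GAK" lies about 18 Da above the lysine residue mass plus a proton.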
+++++++SVM
add_isotopes = false
If set to true, isotope peaks of the product ion peaks are added (valid: true, false)
max_isotope = 2
Defines the maximal isotopic peak which is added; add_isotopes must be set to true
add_metainfo = false
Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++ (valid: true, false)
add_first_prefix_ion = false
If set to true, the first prefix ions (e.g. b1) are added (valid: true, false)
hide_y_ions = false
If true, peaks of y-ions are not added to the spectrum (valid: true, false)
hide_y2_ions = false
If true, peaks of y2-ions are not added to the spectrum (valid: true, false)
hide_b_ions = false
If true, peaks of b-ions are not added to the spectrum (valid: true, false)
hide_b2_ions = false
If true, peaks of b2-ions are not added to the spectrum (valid: true, false)
hide_a_ions = false
If true, peaks of a-ions are not added to the spectrum (valid: true, false)
hide_c_ions = false
If true, peaks of c-ions are not added to the spectrum (valid: true, false)
hide_x_ions = false
If true, peaks of x-ions are not added to the spectrum (valid: true, false)
hide_z_ions = false
If true, peaks of z-ions are not added to the spectrum (valid: true, false)
hide_losses = false
If true, common losses (only water and ammonia are considered) are not added to the ions expected to have them (valid: true, false)
y_intensity = 1
Intensity of the y-ions
b_intensity = 1
Intensity of the b-ions
a_intensity = 1
Intensity of the a-ions
c_intensity = 1
Intensity of the c-ions
x_intensity = 1
Intensity of the x-ions
z_intensity = 1
Intensity of the z-ions
relative_loss_intensity = 0.1
Intensity of loss ions, relative to the intact ion intensity
+++++Global
ionization_type = ESI
Type of ionization (MALDI or ESI) (valid: MALDI, ESI)
+++++Labeling
type = labelfree
Select the labeling type you want for your experiment (valid: ICPL, SILAC, itraq, labelfree, o18)
++++++ICPL: ICPL labeling on MS1 level of lysines and N-term (on protein or peptide level) with either two or three channels.
ICPL_fixed_rtshift = 0
Fixed retention time shift between labeled pairs. If set to 0.0, only the retention times computed by the RT model step are used.
label_proteins = true
Enables protein labeling (select 'false' if you only need peptide labeling). (valid: true, false)
ICPL_light_channel_label = UniMod:365
UniMod Id of the light channel ICPL label.
ICPL_medium_channel_label = UniMod:687
UniMod Id of the medium channel ICPL label.
ICPL_heavy_channel_label = UniMod:364
UniMod Id of the heavy channel ICPL label.
++++++SILAC: SILAC labeling on MS1 level with up to 3 channels and custom modifications.
fixed_rtshift = 0.0001
Fixed retention time shift between labeled peptides. If set to 0.0, only the retention times computed by the RT model step are used. (range: 0:∞)
+++++++medium_channel: Modifications for the medium SILAC channel.
modification_lysine = UniMod:481
Modification of lysine in the medium SILAC channel
modification_arginine = UniMod:188
Modification of arginine in the medium SILAC channel
+++++++heavy_channel: Modifications for the heavy SILAC channel. If you want to use only 2 channels, just leave the labels as they are and provide only 2 input files.
modification_lysine = UniMod:259
Modification of lysine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
modification_arginine = UniMod:267
Modification of arginine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
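To see what the default SILAC labels mean for the spectra, the sketch below computes the mass and m/z shift of a peptide in each channel. The monoisotopic shifts are the UniMod values for these accessions as we understand them:

- UniMod:481, Label:2H(4): +4.025107 Da
- UniMod:188, Label:13C(6): +6.020129 Da
- UniMod:259, Label:13C(6)15N(2): +8.014199 Da
- UniMod:267, Label:13C(6)15N(4): +10.008269 Da

Verify them against UniMod before relying on them. The helper name is hypothetical.

```python
# Monoisotopic mass shifts (Da) of the default UniMod labels (verify against UniMod).
UNIMOD_SHIFT = {"UniMod:481": 4.025107, "UniMod:188": 6.020129,
                "UniMod:259": 8.014199, "UniMod:267": 10.008269}

CHANNELS = {
    "light": {},
    "medium": {"K": "UniMod:481", "R": "UniMod:188"},
    "heavy": {"K": "UniMod:259", "R": "UniMod:267"},
}

def silac_shift(peptide, channel, charge=1):
    """Return (mass shift in Da, m/z shift in Th) of `peptide` in `channel`."""
    labels = CHANNELS[channel]
    delta = sum(UNIMOD_SHIFT[labels[aa]] for aa in peptide if aa in labels)
    return delta, delta / charge

for ch in CHANNELS:
    print(ch, silac_shift("LVNELTEFAK", ch, charge=2))
```

A tryptic peptide ending in K therefore appears about 8 Da (4 Th at charge 2+) above its light partner in the heavy channel. This is also why the shift grows with each labeled residue in peptides with missed cleavages.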
++++++itraq: iTRAQ labeling on MS2 level with up to 4 (4plex) or 8 (8plex) channels.
iTRAQ = 4plex
4plex or 8plex iTRAQ? (valid: 4plex, 8plex)
reporter_mass_shift = 0.1
Allowed shift (uniformly distributed, left to right) in Da from the expected position (e.g. 114.1, 115.1) (range: 0:0.5)
channel_active_4plex = [114:myReference]
4plex only: each channel that was used in the experiment and its description (114-117) in the format <channel>:<description>, e.g. "114:myref","115:liver".
channel_active_8plex = [113:myReference]
8plex only: each channel that was used in the experiment and its description (113-121) in the format <channel>:<description>, e.g. "113:myref","115:liver","118:lung".
isotope_correction_values_4plex = [114:0/1/5.9/0.2, 115:0/2/5.6/0.1, 116:0/3/4.5/0.1, 117:0.1/4/3.5/0.1]
Override default values (see documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>; e.g. '114:0/0.3/4/0', '116:0.1/0.3/3/0.2'
isotope_correction_values_8plex = [113:0/0/6.89/0.22, 114:0/0.94/5.9/0.16, 115:0/1.88/4.9/0.1, 116:0/2.82/3.9/0.07, 117:0.06/3.77/2.99/0, 118:0.09/4.71/1.88/0, 119:0.14/5.66/0.87/0, 121:0.27/7.44/0.18/0]
Override default values (see documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da>; e.g. '113:0/0.3/4/0', '116:0.1/0.3/3/0.2'
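The isotope correction strings shown in the defaults can be parsed into a lookup table as sketched below. The sketch only reads the values, which judging by the defaults appear to be percentages. It does not perform the isotope correction itself, and the function name is hypothetical.

```python
def parse_isotope_corrections(entries):
    """'114:0/1/5.9/0.2' -> {114: {-2: 0.0, -1: 1.0, 1: 5.9, 2: 0.2}} (hypothetical helper)."""
    table = {}
    for entry in entries:
        channel, values = entry.split(":")
        parts = [float(v) for v in values.split("/")]
        if len(parts) != 4:
            raise ValueError(f"expected 4 values in {entry!r}")
        # Order is -2 Da, -1 Da, +1 Da, +2 Da, as in the format description.
        table[int(channel)] = dict(zip((-2, -1, 1, 2), parts))
    return table

defaults_4plex = ["114:0/1/5.9/0.2", "115:0/2/5.6/0.1", "116:0/3/4.5/0.1", "117:0.1/4/3.5/0.1"]
corrections = parse_isotope_corrections(defaults_4plex)
print(corrections[117])
```

Validating the count of values early catches the most common typo in overrides, which is a missing '/'-separated field.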
Y_contamination = 0.3
Efficiency of labeling tyrosine ('Y') residues. 0 = off, 1 = full labeling (range: 0:1)
++++++o18: 18O labeling on MS1 level with 2 channels, requiring trypsin digestion.
labeling_efficiency = 1
Describes the distribution of the labeled peptide over the different states (unlabeled, mono- and di-labeled) (range: 0:1)
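Suppose each of the two C-terminal carboxyl oxygens is exchanged for 18O independently, with probability `labeling_efficiency`. Then the unlabeled, mono- and di-labeled fractions follow a binomial distribution with n = 2. This independence model is our assumption for illustration, not a statement of how OpenMS computes the split, and the function name is hypothetical.

```python
def o18_state_fractions(labeling_efficiency):
    """Fractions of unlabeled, mono- and di-labeled peptides (ASSUMPTION: binomial, n = 2)."""
    p = labeling_efficiency
    if not 0.0 <= p <= 1.0:
        raise ValueError("labeling_efficiency must be in [0, 1]")
    return {"unlabeled": (1 - p) ** 2, "mono": 2 * p * (1 - p), "di": p ** 2}

print(o18_state_fractions(1.0))   # default: every peptide di-labeled
print(o18_state_fractions(0.9))
```

Under this model, the default efficiency of 1 puts all labeled peptides into the di-labeled state. Lower values move intensity toward the mono-labeled and unlabeled species.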
++++RandomNumberGenerators: Parameters for generating the random aspects (e.g. noise) in the simulated data. The generation is separated into two parts: the technical part, like noise in the raw signal, and the biological part, like systematic deviations in the predicted retention times.
biological = random
Controls the 'biological' randomness of the generated data (e.g. systematic effects like deviations in RT). If set to 'random', each experiment will look different. If set to 'reproducible', each experiment will have the same outcome (given that the input data is the same). (valid: reproducible, random)
technical = random
Controls the 'technical' randomness of the generated data (e.g. noise in the raw signal). If set to 'random', each experiment will look different. If set to 'reproducible', each experiment will have the same outcome (given that the input data is the same). (valid: reproducible, random)
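The 'reproducible' versus 'random' switch amounts to whether a random number generator is seeded with a fixed value. The sketch keeps separate generators for the biological and technical parts, mirroring the split described above. The fixed seed value and the helper name are hypothetical.

```python
import random

FIXED_SEED = 0  # hypothetical seed used in 'reproducible' mode

def make_rng(mode):
    """'reproducible' -> fixed-seed generator; 'random' -> seeded from system entropy."""
    if mode == "reproducible":
        return random.Random(FIXED_SEED)
    if mode == "random":
        return random.Random()
    raise ValueError(f"unknown mode: {mode}")

biological = make_rng("reproducible")   # e.g. systematic RT deviations
technical = make_rng("random")          # e.g. raw-signal noise
rt_shift = biological.gauss(0.0, 3.0)
noise = technical.gauss(0.0, 1.0)
```

Using two independent generators means that drawing more or less technical noise never changes the sequence of biological draws. This is what allows one part to stay reproducible while the other varies.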