A highly configurable simulator for mass spectrometry experiments.
Look at the INI file (via "MSSimulator -write_ini myini.ini") to see the available parameters and more functionality.
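As a rough illustration of the workflow above, the following Python sketch assembles the command lines for writing a default INI file and for a simulation run. Nothing is executed here. The file names are placeholders, and the `-in`, `-out` and `-ini` flags are assumed to follow the usual TOPP command-line convention for the parameters listed below.

```python
from typing import List, Optional


def build_mssimulator_command(fasta_files: List[str], out_mzml: str,
                              ini_file: Optional[str] = None) -> List[str]:
    """Assemble an MSSimulator argument list (nothing is executed here)."""
    cmd = ["MSSimulator", "-in", *fasta_files, "-out", out_mzml]
    if ini_file is not None:
        cmd += ["-ini", ini_file]
    return cmd


# Command quoted in the text: writes all default parameters to myini.ini
write_ini_cmd = ["MSSimulator", "-write_ini", "myini.ini"]

# Hypothetical run using placeholder file names
run_cmd = build_mssimulator_command(["proteins.fasta"], "simulated.mzML", "myini.ini")
print(" ".join(run_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine where MSSimulator is installed.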
Protein sequences (including amino acid modifications) can be provided as a FASTA file. We allow a special tag in the description of each entry to specify protein abundance. If you want to create a complex FASTA file with a Gaussian protein abundance model in log space, see the Python script shipping with your OpenMS installation (e.g., <OpenMS-dir>/share/OpenMS/examples/simulation/FASTAProteinAbundanceSampling.py). It supports (random) sampling from a large FASTA file and protein weight filtering, and it adds an intensity tag to each entry.
If multiplexed data is simulated (e.g. SILAC or iTRAQ), you need to supply multiple FASTA input files. In the label-free setting, all FASTA input files are merged into one before simulation.
To specify intensity values for certain proteins, add an abundance tag for the corresponding protein in the FASTA input file:
For amino acid modifications, insert their name at the respective amino acid residues. The modifications are fixed. If you need variable modifications, you have to add the desired combinatorial variants (presence/absence of one or all modifications) to the FASTA file. Valid modification names are listed in many TOPP/UTILS tools, e.g. in the -fixed_modifications parameter of MSGFPlusAdapter.
e.g.
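Because modifications are treated as fixed, variable modifications have to be expanded by hand into all presence/absence variants. The sketch below enumerates these variants for one sequence. It is only an illustration: the helper name is hypothetical, and writing the modification name in parentheses directly after the residue is an assumption about the notation.

```python
from itertools import product
from typing import Dict, List


def expand_variable_mods(sequence: str, variable_mods: Dict[str, str]) -> List[str]:
    """Return every presence/absence variant of the given modifications.

    variable_mods maps a one-letter residue to a modification name.
    ASSUMPTION: the name is written in parentheses right after the residue.
    """
    sites = [i for i, aa in enumerate(sequence) if aa in variable_mods]
    variants = []
    for mask in product([False, True], repeat=len(sites)):
        chosen = {site for site, on in zip(sites, mask) if on}
        variants.append("".join(
            aa + (f"({variable_mods[aa]})" if i in chosen else "")
            for i, aa in enumerate(sequence)))
    return variants


variants = expand_variable_mods("MKMR", {"M": "Oxidation"})
for v in variants:
    print(v)
```

The number of variants doubles with every modifiable site, so long sequences with many sites quickly produce very large FASTA files.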
Legend:
required parameter
advanced parameter
+MSSimulator: A highly configurable simulator for mass spectrometry experiments.
version2.6.0
Version of the tool that generated this parameters file.
++1: Instance '1' section for 'MSSimulator'
in[]
Input protein sequencesinput file*.FASTA
out
output: simulated MS raw (profile) dataoutput file*.mzML
out_pm
output: ground-truth picked (centroided) MS dataoutput file*.mzML
out_fm
output: ground-truth featuresoutput file*.featureXML
out_cm
output: ground-truth features, grouping ESI charge variants of each parent peptideoutput file*.consensusXML
out_lcm
output: ground-truth features, grouping labeled variantsoutput file*.consensusXML
out_cntm
output: ground-truth features caused by contaminantsoutput file*.featureXML
out_id
output: ground-truth MS2 peptide identificationsoutput file*.idXML
log
Name of log file (created only when specified)
debug0
Sets the debug level
threads1
Sets the number of threads allowed to be used by the TOPP tool
no_progressfalse
Disables progress logging to command linetrue,false
forcefalse
Overrides tool-specific checkstrue,false
testfalse
Enables the test mode (needed for internal use only)true,false
+++algorithm: Algorithm parameters section
++++MSSim
+++++Digestion
enzymeTrypsin
Enzyme to use for digestion (select 'no cleavage' to skip digestion)Arg-C,Asp-N,Asp-N/B,Asp-N_ambic,Chymotrypsin,Chymotrypsin/P,CNBr,Formic_acid,Lys-C,Lys-N,Lys-C/P,PepsinA,TrypChymo,V8-DE,Trypsin/P,V8-E,Alpha-lytic protease,leukocyte elastase,proline endopeptidase,iodosobenzoate,glutamyl endopeptidase,2-iodobenzoate,staphylococcal protease/D,proline-endopeptidase/HKR,Glu-C+P,PepsinA + P,cyanogen-bromide,Clostripain/P,Arg-C/P,unspecific cleavage,Trypsin,elastase-trypsin-chymotrypsin,no cleavage
modelnaive
The cleavage model to use for digestion. 'Trained' is based on a log likelihood model (see DOI:10.1021/pr060507u).trained,naive
min_peptide_length3
Minimum peptide length after digestion (shorter ones will be discarded)1:∞
++++++model_trained
threshold0.5
Model threshold for calling a cleavage. Higher values increase the number of cleavages. -2 will give no cleavages, +4 almost full cleavage.-2.0:4.0
++++++model_naive
missed_cleavages1
Maximum number of missed cleavages considered. All possible resulting peptides will be created.0:∞
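The naive cleavage model with 'missed_cleavages' and 'min_peptide_length' can be pictured with the following sketch. It assumes the common trypsin rule (cleave after K or R unless the next residue is P) and is not taken from the MSSimulator implementation; the function name is hypothetical.

```python
from typing import List


def tryptic_digest(protein: str, missed_cleavages: int = 1,
                   min_peptide_length: int = 3) -> List[str]:
    """Enumerate all peptides with up to `missed_cleavages` missed cleavages."""
    # ASSUMPTION: cleave after K or R unless the next residue is P
    cut_points = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(protein))

    peptides = []
    n_fragments = len(cut_points) - 1
    for start in range(n_fragments):
        for missed in range(missed_cleavages + 1):
            end = start + missed + 1
            if end > n_fragments:
                break
            peptide = protein[cut_points[start]:cut_points[end]]
            if len(peptide) >= min_peptide_length:  # shorter ones are discarded
                peptides.append(peptide)
    return peptides


peptides = tryptic_digest("AAKCCCRDDDKPEEER", missed_cleavages=1)
print(peptides)
```

With one allowed missed cleavage, each fully cleaved fragment is reported together with its concatenation with the next fragment.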
+++++RT
rt_columnHPLC
Modelling of an RT or CE columnnone,HPLC,CE
auto_scaletrue
Scale predicted RTs/MTs to the given 'total_gradient_time'? If 'true', for CE this means that 'CE:lenght_d', 'CE:length_total', 'CE:voltage' have no influence.true,false
total_gradient_time2500.0
The duration [s] of the gradient.1.0e-05:∞
sampling_rate2.0
Time interval [s] between consecutive scans0.01:60.0
++++++scan_window
min500.0
Start of RT Scan Window [s]0.0:∞
max1500.0
End of RT Scan Window [s]1.0:∞
++++++variation: Random component that simulates technical/biological variation
feature_stddev3
Standard deviation of shift in retention time [s] from predicted model (applied to every single feature independently)
affine_offset0
Global offset in retention time [s] from predicted model
affine_scale1
Global scaling in retention time from predicted model
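One way to read the three 'variation' parameters is as an affine transform of the predicted retention time plus independent Gaussian jitter per feature. The sketch below follows that reading. The order of operations (scale, then offset, then jitter) is an assumption, since the parameter descriptions do not state it.

```python
import random
from typing import Optional


def perturb_rt(predicted_rt: float, affine_scale: float = 1.0,
               affine_offset: float = 0.0, feature_stddev: float = 3.0,
               rng: Optional[random.Random] = None) -> float:
    """Apply global scale/offset and a per-feature Gaussian shift (all in seconds)."""
    rng = rng or random.Random()
    # ASSUMPTION: scale is applied first, then the global offset, then the jitter
    return affine_scale * predicted_rt + affine_offset + rng.gauss(0.0, feature_stddev)


rng = random.Random(42)
shifted = [perturb_rt(1000.0, rng=rng) for _ in range(3)]
print(shifted)
```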
++++++column_condition
distortion0
Distortion of the elution profiles. Good presets are 0 for a perfect elution profile, 1 for a slightly distorted elution profile, etc. For trapping instruments (e.g. Orbitrap), distortion should be > 4. 0:10
++++++profile_shape
+++++++width: Width of the EGH elution shape, i.e. the sigma^2 parameter, which is computed using 'value' + rnd_cauchy('variance')
value9.0
Width of the Exponential Gaussian Hybrid distribution shape of the elution profile. This does not correspond directly to the width in [s].0.0:∞
variance1.6
Random component of the width (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the variance (Note: The scale parameter has to be >= 0).0.0:∞
+++++++skewness: Skewness of the EGH elution shape, i.e. the tau parameter, which is computed using 'value' + rnd_cauchy('variance')
value0.1
Asymmetric component of the EGH. Higher absolute(!) values lead to more skewness (negative values cause fronting, positive values cause tailing). Tau parameter of the EGH, i.e. time constant of the exponential decay of the Exponential Gaussian Hybrid distribution shape of the elution profile.
variance0.3
Random component of skewness (set to 0 to disable randomness), i.e. scale parameter for the Lorentzian variation of the time constant (Note: The scale parameter has to be >= 0).0.0:∞
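To make the roles of the sigma^2 ('width') and tau ('skewness') parameters concrete, here is a sketch of the Exponential Gaussian Hybrid function in its commonly published form, together with a zero-centered Cauchy draw for the random component. Whether MSSimulator evaluates exactly this expression is an assumption, and the function names are illustrative.

```python
import math
import random


def rnd_cauchy(scale: float, rng: random.Random) -> float:
    """Draw from a Cauchy (Lorentzian) distribution centered at 0 (assumed location)."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def egh(t: float, apex_rt: float, height: float, sigma_sq: float, tau: float) -> float:
    """EGH elution profile: Gaussian core whose width changes linearly with (t - apex)."""
    dt = t - apex_rt
    denom = 2.0 * sigma_sq + tau * dt
    if denom <= 0.0:
        return 0.0  # the profile is defined as zero where the denominator is not positive
    return height * math.exp(-(dt * dt) / denom)


rng = random.Random(7)
sigma_sq = 9.0 + rnd_cauchy(0.0, rng)  # scale 0 disables the random component
profile = [egh(t, 100.0, 1.0, sigma_sq, 0.1) for t in (95.0, 100.0, 105.0)]
print(profile)
```

With a positive tau the value 5 s after the apex is larger than 5 s before it (tailing); a negative tau reverses this (fronting).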
++++++HPLC
model_fileexamples/simulation/RTPredict.model
SVM model for retention time prediction
++++++CE
pH3.0
pH of buffer0.0:14.0
alpha0.5
Exponent Alpha used to calculate mobility0.0:1.0
mu_eo0.0
Electroosmotic flow0.0:5.0
lenght_d70.0
Length of capillary [cm] from injection site to MS0.0:1000.0
length_total75.0
Total length of capillary [cm]0.0:1000.0
voltage1000.0
Voltage applied to capillary0.0:∞
+++++Detectability
dt_simulation_onfalse
Modelling detectability enabled? This can serve as a filter to remove peptides which ionize badly, thus reducing the peptide count.true,false
min_detect0.5
Minimum peptide detectability accepted. Peptides with a lower score will be removed
dt_model_fileexamples/simulation/DTPredict.model
SVM model for peptide detectability prediction
+++++Ionization
++++++esi
ionized_residues[Arg, Lys, His]
List of residues (as three letter code) that will be considered during ES ionization. The N-term is always assumed to carry a charge. This parameter will be ignored during MALDI ionizationAla,Cys,Asp,Glu,Phe,Gly,His,Ile,Lys,Leu,Met,Asn,Pro,Gln,Arg,Sec,Ser,Thr,Val,Trp,Tyr
charge_impurity[H+:1]
List of charged ions that contribute to charge with weight of occurrence (their sum is scaled to 1 internally), e.g. ['H:1'] or ['H:0.7' 'Na:0.3'], ['H:4' 'Na:1'] (which internally translates to ['H:0.8' 'Na:0.2'])
max_impurity_set_size3
Maximal number of combinations of charge impurities allowed (each generating one feature) per charge state. E.g. assuming charge=3 and this parameter is 2, then we could choose to allow '3H+, 2H+Na+' features (given certain 'charge_impurity' constraints), but not '3H+, 2H+Na+, 3Na+'
ionization_probability0.8
Probability for the binomial distribution of the ESI charge states
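The ESI parameters can be illustrated with two small helpers: one normalizes 'charge_impurity' weights so that they sum to 1, as described above, and one lists binomial charge-state probabilities for a peptide. Counting the ionizable sites as the listed residues plus the N-terminus, and using the number of sites as the binomial trial count, are assumptions made for this sketch.

```python
from math import comb
from typing import Dict, List


def normalize_charge_impurity(entries: List[str]) -> Dict[str, float]:
    """Turn entries such as 'H:4' into weights that sum to 1."""
    weights = {}
    for entry in entries:
        ion, weight = entry.rsplit(":", 1)
        weights[ion] = float(weight)
    total = sum(weights.values())
    return {ion: w / total for ion, w in weights.items()}


def count_ionizable_sites(peptide: str, residues: str = "RKH") -> int:
    """Listed residues (one-letter codes for Arg, Lys, His) plus the N-terminus."""
    return sum(aa in residues for aa in peptide) + 1


def charge_state_probabilities(n_sites: int, p: float = 0.8) -> List[float]:
    """Binomial probability of carrying k = 0..n_sites charges (ASSUMED model)."""
    return [comb(n_sites, k) * p**k * (1.0 - p)**(n_sites - k)
            for k in range(n_sites + 1)]


weights = normalize_charge_impurity(["H:4", "Na:1"])
sites = count_ionizable_sites("AAHKR")
probs = charge_state_probabilities(sites)
print(weights, sites, probs)
```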
++++++maldi
ionization_probabilities[0.9, 0.1, 0.0]
List of probabilities for different charge states (starting at charge=1, 2, ...) during MALDI ionization (the list must sum up to 1.0)
++++++mz
lower_measurement_limit200.0
Lower m/z detector limit0.0:∞
upper_measurement_limit1200.0
Upper m/z detector limit0.0:∞
+++++RawSignal
enabledtrue
Enable RAW signal simulation? (select 'false' if you only need feature-maps)true,false
peak_shapeGaussian
Peak shape used around each isotope peak (be aware that the area under the curve is constant for both types, but the maximal height will differ (~ 2:3 = Lorentz:Gaussian) due to the wider base of the Lorentzian).Gaussian,Lorentzian
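The approximate 2:3 height ratio follows from the closed-form peak heights of the two shapes. The sketch below computes both heights for equal area and equal full width at half maximum (FWHM); comparing at equal FWHM is an assumption about how the two shapes are matched.

```python
import math


def gaussian_height(area: float, fwhm: float) -> float:
    """Apex height of a Gaussian with the given area and FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return area / (sigma * math.sqrt(2.0 * math.pi))


def lorentzian_height(area: float, fwhm: float) -> float:
    """Apex height of a Lorentzian with the given area and FWHM."""
    gamma = fwhm / 2.0  # half width at half maximum
    return area / (math.pi * gamma)


ratio = lorentzian_height(1.0, 0.01) / gaussian_height(1.0, 0.01)
print(f"Lorentz:Gaussian height ratio = {ratio:.3f}")
```

The ratio is independent of the chosen area and FWHM and evaluates to sqrt(1 / (pi * ln 2)), roughly 0.68.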
++++++resolution
value50000
Instrument resolution at 400 Th
typelinear
How does resolution change with increasing m/z? QTOFs usually show 'constant' behavior, FTs have linear degradation, and on Orbitraps the resolution decreases with the square root of mass.constant,linear,sqrt
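The effect of the resolution 'type' on peak width can be sketched as follows. The functional forms are assumptions chosen to match the verbal description: 'linear' is read as resolution falling off proportionally to 400/(m/z), and 'sqrt' as falling off with sqrt(400/(m/z)). The actual implementation may differ.

```python
import math


def fwhm_at(mz: float, resolution_at_400: float = 50000.0, kind: str = "linear") -> float:
    """Peak FWHM in Th at a given m/z, using FWHM = (m/z) / resolution."""
    if kind == "constant":
        resolution = resolution_at_400
    elif kind == "linear":    # ASSUMED form: R(mz) = R400 * 400 / mz
        resolution = resolution_at_400 * 400.0 / mz
    elif kind == "sqrt":      # ASSUMED form: R(mz) = R400 * sqrt(400 / mz)
        resolution = resolution_at_400 * math.sqrt(400.0 / mz)
    else:
        raise ValueError(f"unknown resolution type: {kind}")
    return mz / resolution


for kind in ("constant", "linear", "sqrt"):
    print(kind, fwhm_at(400.0, kind=kind), fwhm_at(1600.0, kind=kind))
```

All three models agree at 400 Th and diverge at higher m/z, where 'linear' gives the broadest peaks.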
++++++baseline: Baseline modeling for MALDI ionization
scaling0.0
Scale of baseline. Set to 0 to disable simulation of baseline0.0:∞
shape0.5
The baseline is modeled by an exponential probability density function (pdf) with f(x) = shape*e^(- shape*x)0.0:∞
++++++mz
sampling_points3
Number of raw data points per FWHM of the peak2:∞
++++++contaminants
fileexamples/simulation/contaminants.csv
Contaminants file with sum formula and absolute RT interval. See 'OpenMS/examples/simulation/contaminants.txt' for details
++++++variation: Random components that simulate biological and technical variations of the simulated data
+++++++mz: Shifts in mass to charge dimension of the simulated signals
error_mean0.0
Average systematic m/z error (in Da)
error_stddev0.0
Standard deviation for m/z errors. Set to 0 to disable simulation of m/z errors
+++++++intensity: Variations in intensity to model randomness in feature intensity
scale100.0
Constant scale factor of the feature intensity. Set to 1.0 to get the real intensity values provided in the FASTA file0.0:∞
scale_stddev0.0
Standard deviation of peak intensity (relative to the scaled peak height). Set to 0 to get simple rescaled intensities0.0:∞
++++++noise: Parameters modeling noise in mass spectrometry measurements
+++++++shot: Parameters of Poisson and Exponential for shot noise modeling (set :rate OR :mean = 0 to disable)
rate0.0
Poisson rate of shot noise per unit m/z (random peaks in m/z, where the number of peaks per unit m/z follows a Poisson distribution). Set this to 0 to disable simulation of shot noise0.0:∞
intensity-mean1.0
Shot noise intensity mean (exponentially distributed with given mean)
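The shot noise description (Poisson-distributed number of random peaks per unit m/z, exponentially distributed intensities) can be sketched with the standard library only. The uniform placement of peaks within each unit interval and the helper names are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    if lam <= 0.0:
        return 0
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def shot_noise_peaks(mz_min: float, mz_max: float, rate: float,
                     intensity_mean: float, rng: random.Random) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) noise peaks; rate or mean of 0 disables the noise."""
    peaks = []
    if rate <= 0.0 or intensity_mean <= 0.0:
        return peaks
    lower = mz_min
    while lower < mz_max:
        upper = min(lower + 1.0, mz_max)  # one unit m/z interval at a time
        for _ in range(poisson(rate * (upper - lower), rng)):
            # ASSUMPTION: peak positions are uniform within the interval
            peaks.append((rng.uniform(lower, upper),
                          rng.expovariate(1.0 / intensity_mean)))
        lower = upper
    return peaks


noise = shot_noise_peaks(200.0, 1200.0, rate=0.05, intensity_mean=1.0,
                         rng=random.Random(1))
print(len(noise), "noise peaks")
```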
+++++++white: Parameters of Gaussian distribution for white noise modeling (set :mean AND :stddev = 0 to disable). No new peaks are generated; only intensity of existing ones is changed
mean0.0
Mean value of white noise (Gaussian) being added to each *measured* signal intensity
stddev0.0
Standard deviation of white noise being added to each *measured* signal intensity
+++++++detector: Parameters of Gaussian distribution for detector noise modeling (set :mean AND :stddev = 0 to disable). If enabled, ALL possible m/z positions (up to sampling frequency of detector) will receive an intensity increase/decrease according to the specified Gaussian intensity distribution (similar to a noisy baseline)
mean0.0
Mean intensity value of the detector noise (Gaussian distribution)
stddev0.0
Standard deviation of the detector noise (Gaussian distribution)
+++++RawTandemSignal
statusdisabled
Create Tandem-MS scans?disabled,precursor,MS^E
tandem_mode0
Algorithm to generate the tandem-MS spectra. 0 - fixed intensities, 1 - SVC prediction (abundant/missing), 2 - SVR prediction of peak intensity. 0:2
svm_model_set_fileexamples/simulation/SvmModelSet.model
File containing the filenames of SVM Models for different charge variants
++++++Precursor
ms2_spectra_per_rt_bin5
Number of allowed MS/MS spectra in a retention time bin.1:∞
min_mz_peak_distance2.0
The minimal distance (in Th) between two peaks for concurrent selection for fragmentation. Also used to define the m/z width of an exclusion window (distance +/- from m/z of precursor). If you set this lower than the isotopic envelope of a peptide, you might get multiple fragment spectra pointing to the same precursor.1.0e-04:∞
mz_isolation_window2.0
All peaks within a mass window (in Th) of a selected peak are also selected for fragmentation.0.0:∞
exclude_overlapping_peaksfalse
If true, overlapping or nearby peaks (within 'min_mz_peak_distance') are excluded for selection.true,false
charge_filter[2, 3]
Charges considered for MS2 fragmentation.1:5
+++++++Exclusion
use_dynamic_exclusionfalse
If true, dynamic exclusion is applied.true,false
exclusion_time100.0
The time (in seconds) a feature is excluded.0.0:∞
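How the precursor selection parameters interact with dynamic exclusion can be pictured with the sketch below. It picks the most intense candidates per scan, applies the charge filter, and blocks an m/z window of +/- 'min_mz_peak_distance' for 'exclusion_time' seconds. Ranking by intensity and treating each scan as one RT bin are assumptions, and the function is hypothetical rather than the MSSimulator algorithm.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, int]  # (m/z, intensity, charge)


def select_precursors(scans: Sequence[Tuple[float, List[Feature]]],
                      max_per_scan: int = 5,
                      charges: Sequence[int] = (2, 3),
                      exclusion_width: float = 2.0,
                      exclusion_time: float = 100.0) -> List[Tuple[float, float]]:
    """Return (rt, m/z) of selected precursors; scans must be sorted by RT."""
    excluded: List[Tuple[float, float]] = []  # (m/z, excluded until this RT)
    selected = []
    for rt, features in scans:
        # drop exclusion entries that have expired
        excluded = [(mz, until) for mz, until in excluded if until > rt]
        picked = 0
        # ASSUMPTION: candidates are ranked by decreasing intensity
        for mz, _intensity, charge in sorted(features, key=lambda f: -f[1]):
            if picked >= max_per_scan:
                break
            if charge not in charges:
                continue
            if any(abs(mz - ex_mz) <= exclusion_width for ex_mz, _ in excluded):
                continue
            selected.append((rt, mz))
            excluded.append((mz, rt + exclusion_time))
            picked += 1
    return selected


scans = [
    (0.0, [(500.0, 10.0, 2), (700.0, 5.0, 1)]),
    (50.0, [(500.5, 10.0, 2)]),
    (150.0, [(500.5, 10.0, 2)]),
]
picked = select_precursors(scans)
print(picked)
```

In this toy input the peak at 500.5 Th is skipped at 50 s because it lies inside the active exclusion window, and it is selected at 150 s after the window has expired.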
+++++++ProteinBasedInclusion
max_list_size1000
The maximal number of precursors in the inclusion list.1:∞
++++++++rt
min_rt960.0
Minimal rt in seconds.0.0:∞
max_rt3840.0
Maximal rt in seconds.0.0:∞
rt_step_size30.0
rt step size in seconds.1.0:∞
rt_window_size100
rt window size in seconds.1:∞
++++++++thresholds
min_protein_id_probability0.95
Minimal protein probability for a protein to be considered identified.0.0:1.0
min_pt_weight0.5
Minimal pt weight of a precursor0.0:1.0
min_mz500.0
Minimal mz to be considered in protein based LP formulation.0.0:∞
max_mz5000.0
Maximal mz to be considered in protein based LP formulation.0.0:∞
use_peptide_rulefalse
Use peptide rule instead of minimal protein id probabilitytrue,false
min_peptide_ids2
If use_peptide_rule is true, this parameter sets the minimal number of peptide ids for a protein id1:∞
min_peptide_probability0.95
If use_peptide_rule is true, this parameter sets the minimal probability for a peptide to be safely identified0.0:1.0
++++++MS_E
add_single_spectrafalse
If true, the MS2 spectra for each peptide signal are included in the output (might be a lot). They will have a meta value 'MSE_DebugSpectrum' attached, so they can be filtered out. Native MS_E spectra will have 'MSE_Spectrum' instead.true,false
++++++TandemSim
+++++++Simple
isotope_modelnone
Model to use for isotopic peaks ('none' means no isotopic peaks are added, 'coarse' adds isotopic peaks in unit mass distance, 'fine' uses the hyperfine isotopic generator to add accurate isotopic peaks). Note that adding isotopic peaks is very slow.none,coarse,fine
max_isotope2
Defines the maximal isotopic peak which is added if 'isotope_model' is 'coarse'
max_isotope_probability0.05
Defines the maximal isotopic probability to cover if 'isotope_model' is 'fine'
add_metainfofalse
Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++true,false
add_lossesfalse
Adds common losses to those ions expected to have them; only water and ammonia losses are considered.true,false
sort_by_positiontrue
Sort output by positiontrue,false
add_precursor_peaksfalse
Adds peaks of the unfragmented precursor ion to the spectrumtrue,false
add_all_precursor_chargesfalse
Adds precursor peaks with all charges in the given rangetrue,false
add_abundant_immonium_ionsfalse
Add most abundant immonium ionstrue,false
add_first_prefix_ionfalse
If set to true e.g. b1 ions are addedtrue,false
add_y_ionstrue
Add peaks of y-ions to the spectrumtrue,false
add_b_ionstrue
Add peaks of b-ions to the spectrumtrue,false
add_a_ionsfalse
Add peaks of a-ions to the spectrumtrue,false
add_c_ionsfalse
Add peaks of c-ions to the spectrumtrue,false
add_x_ionsfalse
Add peaks of x-ions to the spectrumtrue,false
add_z_ionsfalse
Add peaks of z-ions to the spectrumtrue,false
y_intensity1.0
Intensity of the y-ions
b_intensity1.0
Intensity of the b-ions
a_intensity1.0
Intensity of the a-ions
c_intensity1.0
Intensity of the c-ions
x_intensity1.0
Intensity of the x-ions
z_intensity1.0
Intensity of the z-ions
relative_loss_intensity0.1
Intensity of loss ions, in relation to the intact ion intensity
precursor_intensity1.0
Intensity of the precursor peak
precursor_H2O_intensity1.0
Intensity of the H2O loss peak of the precursor
precursor_NH3_intensity1.0
Intensity of the NH3 loss peak of the precursor
+++++++SVM
add_isotopesfalse
If set to true, isotope peaks of the product ion peaks are added.true,false
max_isotope2
Defines the maximal isotopic peak which is added; 'add_isotopes' must be set to true
add_metainfofalse
Adds the type of peaks as metainfo to the peaks, like y8+, [M-H2O+2H]++true,false
add_first_prefix_ionfalse
If set to true e.g. b1 ions are addedtrue,false
hide_y_ionsfalse
Hide peaks of y-ions in the spectrumtrue,false
hide_y2_ionsfalse
Hide peaks of y2-ions in the spectrumtrue,false
hide_b_ionsfalse
Hide peaks of b-ions in the spectrumtrue,false
hide_b2_ionsfalse
Hide peaks of b2-ions in the spectrumtrue,false
hide_a_ionsfalse
Hide peaks of a-ions in the spectrumtrue,false
hide_c_ionsfalse
Hide peaks of c-ions in the spectrumtrue,false
hide_x_ionsfalse
Hide peaks of x-ions in the spectrumtrue,false
hide_z_ionsfalse
Hide peaks of z-ions in the spectrumtrue,false
hide_lossesfalse
Hides common losses for those ions expected to have them; only water and ammonia losses are consideredtrue,false
y_intensity1.0
Intensity of the y-ions
b_intensity1.0
Intensity of the b-ions
a_intensity1.0
Intensity of the a-ions
c_intensity1.0
Intensity of the c-ions
x_intensity1.0
Intensity of the x-ions
z_intensity1.0
Intensity of the z-ions
relative_loss_intensity0.1
Intensity of loss ions, in relation to the intact ion intensity
+++++Global
ionization_typeESI
Type of Ionization (MALDI or ESI)MALDI,ESI
+++++Labeling
typelabelfree
Select the labeling type you want for your experimentICPL,SILAC,itraq,labelfree,o18
++++++ICPL: ICPL labeling on MS1 level of lysines and N-term (on protein or peptide level) with either two or three channels.
ICPL_fixed_rtshift0.0
Fixed retention time shift between labeled pairs. If set to 0.0, only the retention times computed by the RT model step are used.
label_proteinstrue
Enables protein-labeling. (select 'false' if you only need peptide-labeling)true,false
ICPL_light_channel_labelUniMod:365
UniMod Id of the light channel ICPL label.
ICPL_medium_channel_labelUniMod:687
UniMod Id of the medium channel ICPL label.
ICPL_heavy_channel_labelUniMod:364
UniMod Id of the heavy channel ICPL label.
++++++SILAC: SILAC labeling on MS1 level with up to 3 channels and custom modifications.
fixed_rtshift1.0e-04
Fixed retention time shift between labeled peptides. If set to 0.0, only the retention times computed by the RT model step are used.0.0:∞
+++++++medium_channel: Modifications for the medium SILAC channel.
modification_lysineUniMod:481
Modification of Lysine in the medium SILAC channel
modification_arginineUniMod:188
Modification of Arginine in the medium SILAC channel
+++++++heavy_channel: Modifications for the heavy SILAC channel. If you want to use only 2 channels, just leave the labels as they are and provide only 2 input files.
modification_lysineUniMod:259
Modification of Lysine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
modification_arginineUniMod:267
Modification of Arginine in the heavy SILAC channel. If left empty, two-channel SILAC is assumed.
++++++itraq: iTRAQ labeling on MS2 level with up to 4 (4plex) or 8 (8plex) channels.
iTRAQ4plex
4plex or 8plex iTRAQ?4plex,8plex
reporter_mass_shift0.1
Allowed shift (uniformly distributed - left to right) in Da from the expected position (of e.g. 114.1, 115.1)0.0:0.5
channel_active_4plex[114:myReference]
Four-plex only: Each channel that was used in the experiment and its description (114-117) in format <channel>:<name>, e.g. "114:myref","115:liver".
channel_active_8plex[113:myReference]
Eight-plex only: Each channel that was used in the experiment and its description (113-121) in format <channel>:<name>, e.g. "113:myref","115:liver","118:lung".
isotope_correction_values_4plex[114:0.0/1.0/5.9/0.2, 115:0.0/2.0/5.6/0.1, 116:0.0/3.0/4.5/0.1, 117:0.1/4.0/3.5/0.1]
Override default values (see Documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da> ; e.g. '114:0/0.3/4/0' , '116:0.1/0.3/3/0.2'
isotope_correction_values_8plex[113:0.0/0.0/6.89/0.22, 114:0.0/0.94/5.9/0.16, 115:0.0/1.88/4.9/0.1, 116:0.0/2.82/3.9/0.07, 117:0.06/3.77/2.99/0.0, 118:0.09/4.71/1.88/0.0, 119:0.14/5.66/0.87/0.0, 121:0.27/7.44/0.18/0.0]
Override default values (see Documentation); use the following format: <channel>:<-2Da>/<-1Da>/<+1Da>/<+2Da> ; e.g. '113:0/0.3/4/0' , '116:0.1/0.3/3/0.2'
Y_contamination0.3
Efficiency of labeling tyrosine ('Y') residues. 0=off, 1=full labeling0.0:1.0
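The colon- and slash-separated iTRAQ entries shown above are easy to check before putting them into an INI file. The following sketch parses a channel description and an isotope correction entry; the helper names and the dictionary keys are illustrative only.

```python
from typing import Dict, Tuple


def parse_channel(entry: str) -> Tuple[int, str]:
    """Split an entry such as '114:myref' into channel number and description."""
    channel, name = entry.split(":", 1)
    return int(channel), name


def parse_isotope_correction(entry: str) -> Tuple[int, Dict[str, float]]:
    """Split '114:0.0/1.0/5.9/0.2' into the channel and its four correction values."""
    channel, values = entry.split(":", 1)
    minus2, minus1, plus1, plus2 = (float(v) for v in values.split("/"))
    return int(channel), {"-2Da": minus2, "-1Da": minus1, "+1Da": plus1, "+2Da": plus2}


print(parse_channel("114:myref"))
print(parse_isotope_correction("114:0.0/1.0/5.9/0.2"))
```

An entry with fewer or more than four slash-separated values raises a `ValueError` during unpacking, which catches malformed input early.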
++++++o18: 18O labeling on MS1 level with 2 channels, requiring trypsin digestion.
labeling_efficiency1.0
Describes the distribution of the labeled peptide over the different states (unlabeled, mono- and di-labeled)0.0:1.0
++++RandomNumberGenerators: Parameters for generating the random aspects (e.g. noise) in the simulated data. The generation is separated into two parts, the technical part, like noise in the raw signal, and the biological part, like systematic deviations in the predicted retention times
biologicalrandom
Controls the 'biological' randomness of the generated data (e.g. systematic effects like deviations in RT). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same)reproducible,random
technicalrandom
Controls the 'technical' randomness of the generated data (e.g. noise in the raw signal). If set to 'random' each experiment will look different. If set to 'reproducible' each experiment will have the same outcome (given that the input data is the same)reproducible,random
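The difference between 'reproducible' and 'random' can be pictured as the choice between a fixed seed and a fresh seed per run. The sketch below keeps separate generators for the biological and technical parts, mirroring the two parameters. The seed value 0 is a hypothetical placeholder; the seeds actually used by MSSimulator are not stated here.

```python
import random


def make_rng(mode: str, fixed_seed: int = 0) -> random.Random:
    """Return a generator that repeats itself ('reproducible') or not ('random')."""
    if mode == "reproducible":
        return random.Random(fixed_seed)  # hypothetical fixed seed
    if mode == "random":
        return random.Random()            # seeded from OS entropy or the clock
    raise ValueError(f"unknown mode: {mode}")


biological_rng = make_rng("reproducible")
technical_rng = make_rng("random")
print(biological_rng.random(), technical_rng.random())
```

Setting 'biological' to 'reproducible' while leaving 'technical' at 'random' in this reading would give repeated runs the same RT deviations but different raw-signal noise.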