OpenMS 2.4.0
MascotAdapter

Identifies peptides in MS/MS spectra via Mascot.

Potential predecessor tools: any signal-/preprocessing tool (in mzML format)
Potential successor tools: IDFilter or any protein/peptide processing tool

This wrapper application obtains peptide identifications for MS/MS spectra. It uses a local installation of the Mascot server to generate the identifications. A second wrapper, MascotAdapterOnline, can perform identifications by communicating with a Mascot server over the network, so MascotAdapterOnline does not have to run on the same machine as Mascot.

The minimum supported Mascot version is 2.1.

This wrapper can be executed in three different modes:

  1. The whole ProteinIdentification process via Mascot is executed. The input file is an mzData file containing the MS/MS spectra to be identified, and the results are written to an idXML output file. This mode is selected by default.

  2. Only the first part of the ProteinIdentification process is performed: the MS/MS data is converted into Mascot Generic Format (mgf), which can be used directly with Mascot. From the cgi directory of the Mascot installation, a Mascot search can then be started as follows:

    ./nph-mascot.exe 1 -commandline -f outputfilename < inputfilename

    Consult your Mascot reference manual for further details.

    This mode is selected by the -mascot_in option on the command line.

  3. Only the second part of the ProteinIdentification process is performed: the output file of the Mascot server is converted into idXML.

    This mode is selected by the -mascot_out option on the command line.
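The manual Mascot call shown for mode 2 can be wrapped in a small shell function. This is a sketch under assumptions, not part of OpenMS: it expects bash on the Mascot machine, a hypothetical MASCOT_DIR variable pointing at the Mascot installation, and hypothetical helper names (abs_path, run_mascot_search). Only the nph-mascot.exe arguments are taken from the example above.

```shell
#!/usr/bin/env bash
# Sketch: run the Mascot search for a file written in mode 2 (-mascot_in).
# Assumptions: MASCOT_DIR (hypothetical) points at the Mascot installation
# and nph-mascot.exe sits in its cgi directory, as described above.

# Print the absolute path of FILE; its directory must already exist.
abs_path() {
  local dir
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# run_mascot_search INPUT.mgf RESULT_FILE  (hypothetical helper)
run_mascot_search() {
  local input result
  input=$(abs_path "$1") || return 1
  result=$(abs_path "$2") || return 1
  # Paths are resolved first because the call is made from inside cgi/.
  (
    cd "${MASCOT_DIR:?MASCOT_DIR is not set}/cgi" &&
      ./nph-mascot.exe 1 -commandline -f "$result" < "$input"
  )
}
```

A call could look like `MASCOT_DIR=/path/to/mascot run_mascot_search spectra.mgf search_result` (placeholder paths). The paths are made absolute up front because the search has to be launched from the cgi directory.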


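If your Mascot server is installed on the same computer as the TOPP applications, the MascotAdapter can be executed in mode 1. Otherwise, the Mascot engine has to be run manually, with modes 2 and 3 handling the conversions before and after the search. The ProteinIdentification steps then are:

  1. Run MascotAdapter with -mascot_in (mode 2) to convert the spectra into Mascot Generic Format.
  2. Run the Mascot engine manually on the resulting file, as shown for mode 2.
  3. Run MascotAdapter with -mascot_out (mode 3) to convert the Mascot results file into idXML.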
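The TOPP side of this split workflow can be scripted. This sketch assumes bash, hypothetical helper names (run, prepare_for_mascot, import_mascot_results) and a hypothetical DRY_RUN switch that only prints the MascotAdapter commands. It also assumes the search result has been copied back as the Mascot results file (.mascotXML) that mode 3 reads; the search itself stays a manual step.

```shell
#!/usr/bin/env bash
# Sketch of the split workflow when Mascot runs on another machine.
# Helper names are hypothetical; with DRY_RUN=1 (the default here) the
# MascotAdapter calls are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1, TOPP machine: mzData -> Mascot generic format (mode 2).
prepare_for_mascot() {  # prepare_for_mascot SPECTRA.mzData OUT.mgf
  run MascotAdapter -in "$1" -out "$2" -mascot_in
}

# Step 2, Mascot machine: run the search manually (see mode 2) and copy
# the Mascot results file (.mascotXML) back to the TOPP machine.

# Step 3, TOPP machine: Mascot results file -> idXML (mode 3).
import_mascot_results() {  # import_mascot_results RESULTS.mascotXML OUT.idXML
  if [ "${DRY_RUN:-1}" != 1 ] && [ ! -f "$1" ]; then
    echo "Mascot results file not found: $1" >&2
    return 1
  fi
  run MascotAdapter -in "$1" -out "$2" -mascot_out
}

prepare_for_mascot spectra.mzData spectra.mgf
import_mascot_results results.mascotXML spectra.idXML
```

Keeping the dry-run default lets the commands be reviewed before anything is executed; set DRY_RUN=0 to run them.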

For mode 1, you have to specify the directory in which the Mascot server is installed by setting the option mascot_directory in the INI file. You also have to specify a directory in which the user has write permissions by setting the option temp_data_directory. Two temporary files are created in this directory during execution and deleted at the end. Both options are also available on the command line (-mascot_directory, -temp_data_directory).
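A minimal sketch of a mode 1 call that passes both directories on the command line and checks them first. The helper names check_mode1_dirs and run_mode1 are hypothetical, and the checks only mirror the requirements stated above: the Mascot directory must exist and the temporary directory must be writable.

```shell
#!/usr/bin/env bash
# Sketch: mode 1 with both directory settings given on the command line.
# Helper names are hypothetical; the arguments are placeholders.

# Fail early if the Mascot directory is missing or the temp directory
# is not a writable directory.
check_mode1_dirs() {  # check_mode1_dirs MASCOT_DIR TEMP_DIR
  if [ ! -d "$1" ]; then
    echo "Mascot directory not found: $1" >&2
    return 1
  fi
  if [ ! -d "$2" ] || [ ! -w "$2" ]; then
    echo "not a writable directory: $2" >&2
    return 1
  fi
}

# run_mode1 MASCOT_DIR TEMP_DIR IN.mzData OUT.idXML
run_mode1() {
  check_mode1_dirs "$1" "$2" || return 1
  MascotAdapter -in "$3" -out "$4" \
    -mascot_directory "$1" -temp_data_directory "$2"
}
```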

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
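A possible way to wrap the conversion before or after MascotAdapter. This sketch assumes that IDFileConverter picks the formats from the .idXML and .mzid file extensions (check `IDFileConverter --help` for the output-type options if it does not); the helper names idxml_to_mzid and mzid_to_idxml are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert MascotAdapter results to mzIdentML and back.
# Assumption: IDFileConverter infers the formats from the file extensions.

idxml_to_mzid() {  # idxml_to_mzid RESULT.idXML  -> RESULT.mzid
  IDFileConverter -in "$1" -out "${1%.idXML}.mzid"
}

mzid_to_idxml() {  # mzid_to_idxml RESULT.mzid  -> RESULT.idXML
  IDFileConverter -in "$1" -out "${1%.mzid}.idXML"
}
```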

The command line parameters of this tool are:

MascotAdapter -- Annotates MS/MS spectra using Mascot.
Version: 2.4.0 Oct 29 2018, 15:52:19, Revision: 9690d06
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  MascotAdapter <options>

Options (mandatory options marked with '*'):
  -in <file>*                      Input file in mzData format.
                                   Note: In mode 'mascot_out' a Mascot results file (.mascotXML) is read
  -out <file>*                     Output file in idXML format.
                                   Note: In mode 'mascot_in' Mascot generic format is written.
  -mascot_in                       If this flag is set the MascotAdapter will read in mzData and write Mascot
                                   generic format
  -mascot_out                      If this flag is set the MascotAdapter will read in a Mascot results file 
                                   (.mascotXML) and write idXML
  -instrument <i>                  The instrument that was used to measure the spectra (default: 'Default')
  -precursor_mass_tolerance <tol>  The precursor mass tolerance (default: '2')
  -peak_mass_tolerance <tol>       The peak mass tolerance (default: '1')
  -taxonomy <tax>                  The taxonomy (default: 'All entries' valid: 'All entries',
                                   '. . Archaea (Archaeobacteria)', '. . Eukaryota (eucaryotes)',
                                   '. . . . Alveolata (alveolates)',
                                   '. . . . . . Plasmodium falciparum (malaria parasite)',
                                   '. . . . . . Other Alveolata', '. . . . Metazoa (Animals)',
                                   '. . . . . . Caenorhabditis elegans',
                                   '. . . . . . Drosophila (fruit flies)',
                                   '. . . . . . Chordata (vertebrates and relatives)',
                                   '. . . . . . . . bony vertebrates',
                                   '. . . . . . . . . . lobe-finned fish and tetrapod clade',
                                   '. . . . . . . . . . . . Mammalia (mammals)',
                                   '. . . . . . . . . . . . . . Primates', '. .
                                   ...
                                   ilable')
  -modifications <mods>            The modifications, e.g. Carboxymethyl (C)
  -variable_modifications <mods>   The variable modifications, e.g. Carboxymethyl (C)
  -charges [1+ 2+ ...]             The different charge states (default: '[1+ 2+ 3+]')
  -db <name>                       The database to search in (default: 'MSDB')
  -hits <num>                      The number of hits to report (default: 'AUTO')
  -cleavage <enz>                  The enzyme descriptor to the enzyme used for digestion. (Trypsin is
                                   default, None would be best for peptide input or unspecific digestion,
                                   for more please refer to your mascot server). (default: 'Trypsin' valid:
                                   'Trypsin', 'Arg-C', 'Asp-N', 'Asp-N_ambic', 'Chymotrypsin', 'CNBr',
                                   'CNBr+Trypsin', 'Formic_acid', 'Lys-C', 'Lys-C/P', 'PepsinA', 'Tryp-CNBr',
                                   'TrypChymo', 'Trypsin/P', 'V8-DE', 'V8-E', 'semiTrypsin', 'LysC+AspN', 'None')
  -missed_cleavages <num>          Number of allowed missed cleavages (default: '0' min: '0')
  -sig_threshold <num>             Significance threshold (default: '0.05')
  -pep_homol <num>                 Peptide homology threshold (default: '1')
  -pep_ident <num>                 Peptide ident threshold (default: '1')
  -pep_rank <num>                  Peptide rank (default: '1')
  -prot_score <num>                Protein score (default: '1')
  -pep_score <num>                 Peptide score (default: '1')
  -pep_exp_z <num>                 Peptide expected charge (default: '1')
  -show_unassigned <num>           Show_unassigned (default: '1')
  -first_dim_rt <num>              Additional information which is added to every peptide identification as 
                                   metavalue if set > 0 (default: '0')
  -boundary <string>               MIME boundary for mascot output format
  -mass_type <type>                Mass type (default: 'Monoisotopic' valid: 'Monoisotopic', 'Average')
  -mascot_directory <dir>          The directory in which mascot is located
  -temp_data_directory <dir>       A directory in which some temporary files can be stored
                                   
Common TOPP options:
  -ini <file>                      Use the given TOPP INI file
  -threads <n>                     Sets the number of threads allowed to be used by the TOPP tool (default: 
                                   '1')
  -write_ini <file>                Writes the default configuration file
  --help                           Shows options
  --helphelp                       Shows all options (including advanced)

INI file documentation of this tool:

Legend:
  *           required parameter
  (advanced)  advanced parameter

Name                            Default        Description [restrictions]
+MascotAdapter                                 Annotates MS/MS spectra using Mascot.
  version                       2.4.0          Version of the tool that generated this parameters file.
  ++1                                          Instance '1' section for 'MascotAdapter'
    in *                                       input file in mzData format. Note: In mode 'mascot_out' a Mascot results file (.mascotXML) is read [input file]
    out *                                      output file in idXML format. Note: In mode 'mascot_in' Mascot generic format is written [output file]
    mascot_in                   false          if this flag is set the MascotAdapter will read in mzData and write Mascot generic format [true,false]
    mascot_out                  false          if this flag is set the MascotAdapter will read in a Mascot results file (.mascotXML) and write idXML [true,false]
    instrument                  Default        the instrument that was used to measure the spectra
    precursor_mass_tolerance    2              the precursor mass tolerance
    peak_mass_tolerance         1              the peak mass tolerance
    taxonomy                    All entries    the taxonomy [All entries,. . Archaea (Archaeobacteria),. . Eukaryota (eucaryotes),. . . . Alveolata (alveolates),. . . . . . Plasmodium falciparum (malaria parasite),. . . . . . Other Alveolata,. . . . Metazoa (Animals),. . . . . . Caenorhabditis elegans,. . . . . . Drosophila (fruit flies),. . . . . . Chordata (vertebrates and relatives),. . . . . . . . bony vertebrates,. . . . . . . . . . lobe-finned fish and tetrapod clade,. . . . . . . . . . . . Mammalia (mammals),. . . . . . . . . . . . . . Primates,. . . . . . . . . . . . . . . . Homo sapiens (human),. . . . . . . . . . . . . . . . Other primates,. . . . . . . . . . . . . . Rodentia (Rodents),. . . . . . . . . . . . . . . . Mus.,. . . . . . . . . . . . . . . . . . Mus musculus (house mouse),. . . . . . . . . . . . . . . . Rattus,. . . . . . . . . . . . . . . . Other rodentia,. . . . . . . . . . . . . . Other mammalia,. . . . . . . . . . . . Xenopus laevis (African clawed frog),. . . . . . . . . . . . Other lobe-finned fish and tetrapod clade,. . . . . . . . . . Actinopterygii (ray-finned fishes),. . . . . . . . . . . . Takifugu rubripes (Japanese Pufferfish),. . . . . . . . . . . . Danio rerio (zebra fish),. . . . . . . . . . . . Other Actinopterygii,. . . . . . . . Other Chordata,. . . . . . Other Metazoa,. . . . Dictyostelium discoideum,. . . . Fungi,. . . . . . Saccharomyces Cerevisiae (baker's yeast),. . . . . . Schizosaccharomyces pombe (fission yeast),. . . . . . Pneumocystis carinii,. . . . . . Other Fungi,. . . . Viridiplantae (Green Plants),. . . . . . Arabidopsis thaliana (thale cress),. . . . . . Oryza sativa (rice),. . . . . . Other green plants,. . . . Other Eukaryota,. . Bacteria (Eubacteria),. . . . Actinobacteria (class),. . . . . . Mycobacterium tuberculosis complex,. . . . . . Other Actinobacteria (class),. . . . Firmicutes (gram-positive bacteria),. . . . . . Bacillus subtilis,. . . . . . Mycoplasma,. . . . . . Streptococcus Pneumoniae,. . . . . . Streptomyces coelicolor,. . . . . . Other Firmicutes,. . . . Proteobacteria (purple bacteria),. . . . . . Agrobacterium tumefaciens,. . . . . . Campylobacter jejuni,. . . . . . Escherichia coli,. . . . . . Neisseria meningitidis,. . . . . . Salmonella,. . . . . . Other Proteobacteria,. . . . Other Bacteria,. . Viruses,. . . . Hepatitis C virus,. . . . Other viruses,. . Other (includes plasmids and artificial sequences),. . unclassified,. . Species information unavailable]
    modifications               []             the modifications, e.g. Carboxymethyl (C)
    variable_modifications      []             the variable modifications, e.g. Carboxymethyl (C)
    charges                     [1+, 2+, 3+]   the different charge states
    db                          MSDB           the database to search in
    hits                        AUTO           the number of hits to report
    cleavage                    Trypsin        The enzyme descriptor to the enzyme used for digestion. (Trypsin is default, None would be best for peptide input or unspecific digestion, for more please refer to your mascot server). [Trypsin,Arg-C,Asp-N,Asp-N_ambic,Chymotrypsin,CNBr,CNBr+Trypsin,Formic_acid,Lys-C,Lys-C/P,PepsinA,Tryp-CNBr,TrypChymo,Trypsin/P,V8-DE,V8-E,semiTrypsin,LysC+AspN,None]
    missed_cleavages            0              number of allowed missed cleavages [0:∞]
    sig_threshold               0.05           significance threshold
    pep_homol                   1              peptide homology threshold
    pep_ident                   1              peptide ident threshold
    pep_rank                    1              peptide rank
    prot_score                  1              protein score
    pep_score                   1              peptide score
    pep_exp_z                   1              peptide expected charge
    show_unassigned             1              show_unassigned
    first_dim_rt                0              additional information which is added to every peptide identification as metavalue if set > 0
    boundary                                   MIME boundary for mascot output format
    mass_type                   Monoisotopic   mass type [Monoisotopic,Average]
    mascot_directory                           the directory in which mascot is located
    temp_data_directory                        a directory in which some temporary files can be stored
    log (advanced)                             Name of log file (created only when specified)
    debug (advanced)            0              Sets the debug level
    threads                     1              Sets the number of threads allowed to be used by the TOPP tool
    no_progress (advanced)      false          Disables progress logging to command line [true,false]
    force (advanced)            false          Overwrite tool specific checks. [true,false]
    test (advanced)             false          Enables the test mode (needed for internal use only) [true,false]

You can set the following Mascot parameters in the INI file (or with the corresponding command-line options): precursor_mass_tolerance (the peptide mass tolerance), peak_mass_tolerance (the MS/MS tolerance), taxonomy (restriction to a subset of the database), modifications, variable_modifications, charges (the possible charge states), db (the database to search), hits (the number of hits), cleavage (the cleavage enzyme), missed_cleavages (the number of allowed missed cleavages) and mass_type (Monoisotopic or Average).
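The same parameters can also be given on the command line. This sketch collects them in a bash array so that values containing spaces, such as 'All entries' or 'Carboxymethyl (C)', stay quoted. The values are illustrative picks from the defaults and valid values listed above, and run_search is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: Mascot search parameters passed on the command line.
# Values are examples only; see the option list above for valid values.
# Quoting keeps multi-word values such as "Carboxymethyl (C)" intact.
search_args=(
  -precursor_mass_tolerance 2
  -peak_mass_tolerance 1
  -taxonomy "All entries"
  -modifications "Carboxymethyl (C)"
  -charges 2+ 3+
  -db MSDB
  -hits AUTO
  -cleavage Trypsin
  -missed_cleavages 1
  -mass_type Monoisotopic
)

# run_search IN.mzData OUT.idXML [more MascotAdapter options]  (hypothetical)
run_search() {
  local input=$1 output=$2
  shift 2
  MascotAdapter -in "$input" -out "$output" "${search_args[@]}" "$@"
}

# INI-based alternative:
#   MascotAdapter -write_ini mascot.ini    # write the defaults, then edit
#   MascotAdapter -ini mascot.ini -in spectra.mzData -out spectra.idXML
```

For mode 1, the extra arguments would also carry -mascot_directory and -temp_data_directory (or an -ini file that sets them).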


Known problems with Mascot server execution:

Todo:
This adapter uses outdated internal methods and needs to be updated, e.g. to use MascotGenericFile.h instead of MascotInfile.h...