Create a decoy protein database from standard FASTA databases.

Decoy databases are used to estimate false discovery rates (FDRs) and thus to determine score cutoffs for identified spectra.
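The usual reasoning connects decoys to score cutoffs as follows. Incorrect matches are assumed to fall on target and decoy sequences about equally often. So in a search against a concatenated target-decoy database, the number of decoy hits at or above a score estimates the number of false target hits there.

The minimal Python sketch below illustrates that idea. It is not part of DecoyDatabase, which only builds the database. It assumes a list of `(score, is_decoy)` pairs in which higher scores are better, and it uses the simple decoys/targets estimate, which is one of several variants in use.

```python
from collections.abc import Sequence
from typing import Optional


def target_decoy_fdr(psms: Sequence[tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among matches scoring >= threshold as #decoys / #targets."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: treat as infinite if decoys are present, zero otherwise.
        return float("inf") if decoys else 0.0
    return decoys / targets


def score_cutoff(psms: Sequence[tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    cutoff = None
    # Walk thresholds from strict to lenient and remember the last one that passes.
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            cutoff = threshold
    return cutoff


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(score_cutoff(psms, max_fdr=0.01))  # 9.0
```

In this example, admitting the decoy hit at 8.0 raises the estimate to 1/2, so the strictest passing cutoff at 1% FDR is 9.0.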

Decoy sequences can be generated either by reversing or by shuffling the target sequences.
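Here is a minimal Python sketch of both strategies applied to whole protein sequences. It is illustrative only: DecoyDatabase also takes the enzyme and the `Decoy` subsection options into account, and this sketch ignores both.

The per-call `random.Random(seed)` mirrors the documented behavior that all sequences are shuffled with the same seed, so identical targets give identical decoys. The default seed of 1 follows the tool's `seed` parameter. With either method, the decoy keeps the target's length and amino acid composition.

```python
import random
from collections import Counter


def reverse_decoy(sequence: str) -> str:
    """Decoy made by reversing the whole sequence."""
    return sequence[::-1]


def shuffle_decoy(sequence: str, seed: int = 1) -> str:
    """Decoy made by shuffling residues with a generator re-seeded for every sequence,
    so identical input sequences always produce the same decoy."""
    rng = random.Random(seed)
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def same_composition(a: str, b: str) -> bool:
    """True if both sequences contain the same residues with the same counts."""
    return Counter(a) == Counter(b)


target = "PEPTIDEK"
print(reverse_decoy(target))  # KEDITPEP
print(same_composition(target, shuffle_decoy(target)))  # True
```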

For a contaminant database, have a look at the cRAP database at http://www.thegpm.org/crap/index.html, or find or create your own.

Multiple databases can be provided as input, which will internally be concatenated before being used for decoy generation. This allows you to specify your target database plus a contaminant file and obtain a concatenated target-decoy database using a single call, e.g., DecoyDatabase -in human.fasta crap.fasta -out human_TD.fasta
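One way to picture the concatenation step is as reading every input file in the order given and treating the result as a single list of entries. A contaminant file then simply appends its proteins after the target proteins.

The minimal Python sketch below assumes plain-text FASTA files. It is not OpenMS's FASTA reader, it ignores details such as comment lines, and the helper names `read_fasta` and `concatenate_inputs` are hypothetical.

```python
from collections.abc import Iterator


def read_fasta(path: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'.
    Lines before the first header are ignored."""
    header, chunks = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concatenate_inputs(paths: list[str]) -> list[tuple[str, str]]:
    """Read all input files, in the order given, into one list of entries."""
    entries: list[tuple[str, str]] = []
    for path in paths:
        entries.extend(read_fasta(path))
    return entries


# Mirrors the file order of `DecoyDatabase -in human.fasta crap.fasta ...`:
# entries = concatenate_inputs(["human.fasta", "crap.fasta"])
```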

By default, a combined database is created in which target and decoy sequences are interleaved (i.e., target1, decoy1, target2, decoy2, ...). If you need all targets before all decoys, use -only_decoy and concatenate the target and decoy files externally.
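The output layout and the `-decoy_string` / `-decoy_string_position` options can be sketched in Python as follows. This is a minimal illustration, not the tool's code, and it makes three assumptions:

- each entry is an `(accession, sequence)` pair;
- the decoy generator is passed in as a function;
- the helper names `decoy_accession`, `build_output` and `to_fasta` are hypothetical.

```python
from collections.abc import Callable


def decoy_accession(accession: str, decoy_string: str = "DECOY_", position: str = "prefix") -> str:
    """Attach decoy_string to an accession as a prefix or a suffix."""
    if position == "prefix":
        return decoy_string + accession
    if position == "suffix":
        return accession + decoy_string
    raise ValueError(f"position must be 'prefix' or 'suffix', got {position!r}")


def build_output(entries: list[tuple[str, str]], make_decoy: Callable[[str], str],
                 only_decoy: bool = False, decoy_string: str = "DECOY_",
                 position: str = "prefix") -> list[tuple[str, str]]:
    """Interleave target1, decoy1, target2, decoy2, ... (or emit decoys only)."""
    records = []
    for accession, sequence in entries:
        if not only_decoy:
            records.append((accession, sequence))
        records.append((decoy_accession(accession, decoy_string, position), make_decoy(sequence)))
    return records


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Serialize records as FASTA text, one sequence line per entry."""
    return "".join(f">{accession}\n{sequence}\n" for accession, sequence in records)


entries = [("P1", "MKVLLA"), ("P2", "GGK")]
print(to_fasta(build_output(entries, lambda s: s[::-1])), end="")
```

With these definitions, a targets-first file corresponds to writing `to_fasta(entries)` and then appending the output of `build_output(..., only_decoy=True)`.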

The tool will keep track of all protein identifiers and report duplicates.
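Duplicate detection amounts to counting identifiers. Below is a minimal Python sketch; the helper name is hypothetical, and the format of DecoyDatabase's own report is not shown here.

If you run it on the combined list of target and decoy accessions, it also flags a generated decoy name that collides with an existing entry. That happens, for example, when an input file already contains decoys.

```python
from collections import Counter


def find_duplicate_accessions(accessions: list[str]) -> list[str]:
    """Return every accession that occurs more than once, in first-seen order."""
    counts = Counter(accessions)
    seen: set[str] = set()
    duplicates = []
    for accession in accessions:
        if counts[accession] > 1 and accession not in seen:
            duplicates.append(accession)
            seen.add(accession)
    return duplicates


print(find_duplicate_accessions(["P1", "P2", "P1", "DECOY_P1", "DECOY_P1"]))  # ['P1', 'DECOY_P1']
```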

The command line parameters of this tool are:

DecoyDatabase -- Create decoy protein DB from forward protein DB.
Version: 2.4.0 Oct 29 2018, 15:52:19, Revision: 9690d06
To cite OpenMS:
Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
DecoyDatabase <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.

Options (mandatory options marked with '*'):
-enzyme <enzyme> Enzyme used for the digestion of the sample (default: 'Trypsin' valid:
'V8-E', 'iodosobenzoate', 'leukocyte elastase', 'proline endopeptidase',
'Alpha-lytic protease', 'glutamyl endopeptidase', '2-iodobenzoate',
'staphylococcal protease/D', 'proline-endopeptidase/HKR', 'Glu-C+P', 'PepsinA + P',
'cyanogen-bromide', 'Clostripain/P', 'elastase-trypsin-chymotrypsin', 'Trypsin',
'Arg-C', 'Arg-C/P', 'no cleavage', 'unspecific cleavage', 'Asp-N/B', 'Asp-N',
'Asp-N_ambic', 'Chymotrypsin/P', 'Chymotrypsin', 'CNBr', 'Formic_acid',
'Lys-C', 'Lys-N', 'Lys-C/P', 'PepsinA', 'TrypChymo', 'V8-DE', 'Trypsin/P')
-in <file(s)>* Input FASTA file(s), each containing a database. It is recommended to
include a contaminant database as well. (valid formats: 'fasta')
-out <file>* Output FASTA file where the decoy database will be written to. (valid
formats: 'fasta')
-decoy_string <string> String that is combined with the accession of the protein identifier to
indicate a decoy protein. (default: 'DECOY_')
-decoy_string_position <enum> Should the 'decoy_string' be prepended (prefix) or appended (suffix) to the
protein accession? (default: 'prefix' valid: 'prefix', 'suffix')
-only_decoy Write only decoy proteins to the output database instead of a combined database.
-method <enum> Method by which decoy sequences are generated from target sequences. Note
that all sequences are shuffled using the same random seed, ensuring that
identical sequences produce the same shuffled decoy sequences. Shuffled sequences
that produce highly similar output sequences are shuffled again (see
shuffle_sequence_identity_threshold). (default: 'reverse' valid: 'reverse',
'shuffle')

Common UTIL options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default:
'1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)

The following configuration subsections are valid:
- Decoy Decoy parameters section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
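The reshuffling described for `-method shuffle` can be sketched in Python as below. It also uses the INI parameters `shuffle_max_attempts`, `shuffle_sequence_identity_threshold` and `seed`. The sketch makes two assumptions, and OpenMS's exact definitions may differ, so treat it as an illustration of the idea rather than the tool's algorithm:

- sequence identity means the fraction of positions at which target and decoy carry the same residue;
- mutation replaces still-matching positions with a random standard amino acid.

```python
import random

# The 20 standard amino acids, used as replacements in the mutation fallback (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences carry the same residue."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shuffle_with_threshold(target: str, seed: int = 1, max_attempts: int = 30,
                           identity_threshold: float = 0.5) -> str:
    """Reshuffle until identity <= threshold; fall back to mutating matching positions."""
    if not target:
        return target
    # A fresh generator with the same seed for every sequence: identical targets, identical decoys.
    rng = random.Random(seed)
    residues = list(target)
    best, best_identity = target, 1.0
    for _ in range(max_attempts):
        rng.shuffle(residues)
        candidate = "".join(residues)
        identity = positional_identity(target, candidate)
        if identity < best_identity:
            best, best_identity = candidate, identity
        if best_identity <= identity_threshold:
            return best
    # Shuffling alone failed (e.g. low-complexity sequences): mutate positions that still match.
    decoy = list(best)
    matches = sum(t == d for t, d in zip(target, decoy))
    for i, residue in enumerate(target):
        if matches / len(target) <= identity_threshold:
            break
        if decoy[i] == residue:
            decoy[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residue])
            matches -= 1
    return "".join(decoy)


print(positional_identity("ABCD", "ABDC"))  # 0.5
decoy = shuffle_with_threshold("AAAA")  # shuffling cannot help here, so two positions get mutated
print(positional_identity("AAAA", decoy))  # 0.5
```

Because mutation changes residues, it also changes the decoy's composition. In this sketch it is therefore only a last resort after `max_attempts` plain shuffles.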

INI file documentation of this tool:

+DecoyDatabase: Create decoy protein DB from forward protein DB.

version (default: 2.4.0): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'DecoyDatabase'

enzyme (default: Trypsin): enzyme used for the digestion of the sample. Valid: V8-E, iodosobenzoate, leukocyte elastase, proline endopeptidase, Alpha-lytic protease, glutamyl endopeptidase, 2-iodobenzoate, staphylococcal protease/D, proline-endopeptidase/HKR, Glu-C+P, PepsinA + P, cyanogen-bromide, Clostripain/P, elastase-trypsin-chymotrypsin, Trypsin, Arg-C, Arg-C/P, no cleavage, unspecific cleavage, Asp-N/B, Asp-N, Asp-N_ambic, Chymotrypsin/P, Chymotrypsin, CNBr, Formic_acid, Lys-C, Lys-N, Lys-C/P, PepsinA, TrypChymo, V8-DE, Trypsin/P

in (required, default: []): Input FASTA file(s), each containing a database. It is recommended to include a contaminant database as well. Tags: input file. Restrictions: *.fasta

out (required, default: empty): Output FASTA file where the decoy database will be written to. Tags: output file. Restrictions: *.fasta

decoy_string (default: DECOY_): String that is combined with the accession of the protein identifier to indicate a decoy protein.

decoy_string_position (default: prefix): Should the 'decoy_string' be prepended (prefix) or appended (suffix) to the protein accession? Valid: prefix, suffix

only_decoy (default: false): Write only decoy proteins to the output database instead of a combined database. Valid: true, false

method (default: reverse): Method by which decoy sequences are generated from target sequences. Note that all sequences are shuffled using the same random seed, ensuring that identical sequences produce the same shuffled decoy sequences. Shuffled sequences that produce highly similar output sequences are shuffled again (see shuffle_sequence_identity_threshold). Valid: reverse, shuffle

shuffle_max_attempts (default: 30): shuffle: maximum attempts to lower the amino acid sequence identity between target and decoy for the shuffle algorithm

shuffle_sequence_identity_threshold (default: 0.5): shuffle: target-decoy amino acid sequence identity threshold for the shuffle algorithm. If the sequence identity is above this threshold, shuffling is repeated. In case of repeated failure, individual amino acids are 'mutated' to produce a different amino acid sequence.

seed (default: 1): Random number seed (use 'time' for system time)

log (default: empty): Name of log file (created only when specified)

debug (default: 0): Sets the debug level

threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool

no_progress (default: false): Disables progress logging to command line. Valid: true, false

force (default: false): Overwrite tool specific checks. Valid: true, false

test (default: false): Enables the test mode (needed for internal use only). Valid: true, false

+++Decoy: Decoy parameters section

non_shuffle_pattern (default: empty): Residues to not shuffle (keep at a constant position when shuffling). Separate by comma, e.g. use 'K,P,R' here.

keepPeptideNTerm (default: true): Whether to keep peptide N terminus constant when shuffling / reversing. Valid: true, false

keepPeptideCTerm (default: true): Whether to keep peptide C terminus constant when shuffling / reversing. Valid: true, false
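The `Decoy` subsection refers to peptide termini, which suggests that it acts on digested peptides rather than whole proteins, and explains why the tool takes an `-enzyme`. The Python sketch below illustrates that idea under three stated assumptions:

- a simplified trypsin rule (cleave after K or R unless followed by P);
- decoy peptides built by reversing or shuffling only the residues that are not pinned in place;
- decoy peptides joined back into a decoy protein.

Keeping the C-terminal K/R fixed means decoy peptides still end in a cleavage site. Rearranging residues keeps each decoy peptide's unmodified mass equal to its target's.

The sketch applies `fixed_residues` (modeled on `non_shuffle_pattern`) to both methods, although the documentation mentions that option only for shuffling. How OpenMS itself combines these options is not described here, so the helper names and behavior are hypothetical.

```python
import random
import re
from collections.abc import Collection


def tryptic_peptides(protein: str) -> list[str]:
    """Simplified trypsin rule (assumption): cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def rearrange_peptide(peptide: str, method: str = "reverse", keep_n_term: bool = True,
                      keep_c_term: bool = True, fixed_residues: Collection[str] = frozenset(),
                      seed: int = 1) -> str:
    """Reverse or shuffle a peptide while leaving pinned positions unchanged.

    Pinned: the N-terminus (keep_n_term), the C-terminus (keep_c_term) and every
    residue listed in fixed_residues (modeled on non_shuffle_pattern).
    """
    last = len(peptide) - 1
    pinned = [(keep_n_term and i == 0) or (keep_c_term and i == last) or aa in fixed_residues
              for i, aa in enumerate(peptide)]
    movable = [aa for aa, pin in zip(peptide, pinned) if not pin]
    if method == "reverse":
        movable.reverse()
    elif method == "shuffle":
        random.Random(seed).shuffle(movable)
    else:
        raise ValueError(f"unknown method {method!r}")
    refill = iter(movable)
    # Pinned residues stay put; the rearranged residues fill the remaining slots in order.
    return "".join(aa if pin else next(refill) for aa, pin in zip(peptide, pinned))


def peptide_level_decoy(protein: str, **options) -> str:
    """Digest, rearrange each peptide, and join the decoy peptides back into one sequence."""
    return "".join(rearrange_peptide(p, **options) for p in tryptic_peptides(protein))


print(rearrange_peptide("PEPTIDEK"))                        # PEDITPEK
print(rearrange_peptide("PEPTIDEK", fixed_residues={"P"}))  # PEPDITEK
print(peptide_level_decoy("MKPAARGGK"))                     # MAAPKRGGK
```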