Refreshes the protein references for all peptide hits from an idXML file and adds target/decoy information.
pot. predecessor tools | PeptideIndexer | pot. successor tools
IDFilter or any protein/peptide processing tool | | FalseDiscoveryRate
A detailed description of the parameters and functionality is given in PeptideIndexing.
All peptide and protein hits are annotated with target/decoy information, using the meta value "target_decoy". For proteins, the possible values are "target" and "decoy", depending on whether the protein accession contains the decoy pattern (parameter decoy_string) as a prefix or suffix (see parameter decoy_string_position). For peptides, the possible values are "target", "decoy" and "target+decoy", depending on whether the peptide sequence is found only in target proteins, only in decoy proteins, or in both. The target/decoy information is crucial for the FalseDiscoveryRate tool. (For FDR calculations, "target+decoy" peptide hits count as target hits.)
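The annotation rules above can be illustrated with a small Python sketch. It is not the OpenMS implementation; the function names are hypothetical, and the sketch assumes that "contains the decoy pattern as a prefix or suffix" means a plain string match at the start or end of the accession.

```python
def protein_target_decoy(accession, decoy_string="DECOY_", position="prefix"):
    """Label a protein accession as 'target' or 'decoy' (illustrative only)."""
    if position == "prefix":
        is_decoy = accession.startswith(decoy_string)
    elif position == "suffix":
        is_decoy = accession.endswith(decoy_string)
    else:
        raise ValueError("position must be 'prefix' or 'suffix'")
    return "decoy" if is_decoy else "target"


def peptide_target_decoy(protein_accessions, decoy_string="DECOY_", position="prefix"):
    """Label a peptide from the accessions of all proteins that contain it."""
    labels = {
        protein_target_decoy(acc, decoy_string, position)
        for acc in protein_accessions
    }
    if not labels:
        raise ValueError("peptide is not assigned to any protein")
    if labels == {"target"}:
        return "target"
    if labels == {"decoy"}:
        return "decoy"
    return "target+decoy"


def counts_as_target_for_fdr(label):
    """For FDR calculations, 'target+decoy' peptide hits count as target hits."""
    return label in ("target", "target+decoy")
```

For example, a peptide found in both "P12345" and "DECOY_P67890" would be labelled "target+decoy" under these assumptions and would be treated as a target hit in the FDR calculation.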
PeptideIndexer supports relative database filenames, which (when not found in the current working directory) are looked up in the directories specified by OpenMS.ini:id_db_dir (see TOPP for Advanced Users).
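The lookup order described above can be sketched in Python as follows. This is only an illustration of the described behaviour, not OpenMS code: the function name is hypothetical, and the list of search directories is passed in directly rather than read from OpenMS.ini, since the way id_db_dir is stored and parsed is not covered here.

```python
from pathlib import Path


def resolve_database_path(filename, search_dirs, cwd=None):
    """Return the first existing location of a database file, or None.

    A relative filename is tried in the current working directory first,
    then in each directory of search_dirs, in order.
    """
    path = Path(filename)
    if path.is_absolute():
        return path if path.is_file() else None

    base = Path(cwd) if cwd is not None else Path.cwd()
    candidate = base / path
    if candidate.is_file():
        return candidate

    for directory in search_dirs:
        candidate = Path(directory) / path
        if candidate.is_file():
            return candidate
    return None
```

A file that exists in the working directory therefore takes precedence over a file of the same name in any of the configured directories.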
The command line parameters of this tool are:
PeptideIndexer -- Refreshes the protein references for all peptide hits.
Version: 2.3.0 Jan 9 2018, 17:46:23, Revision: 38ae115

Usage:
  PeptideIndexer <options>

Options (mandatory options marked with '*'):
  -in <file>*                      Input idXML file containing the identifications. (valid formats: 'idXML')
  -fasta <file>*                   Input sequence database in FASTA format. Non-existing relative filenames
                                   are looked up via 'OpenMS.ini:id_db_dir' (valid formats: 'fasta')
  -out <file>*                     Output idXML file. (valid formats: 'idXML')
  -decoy_string <text>             String that was appended (or prefixed - see 'decoy_string_position' flag
                                   below) to the accessions in the protein database to indicate decoy
                                   proteins. (default: 'DECOY_')
  -decoy_string_position <choice>  Should the 'decoy_string' be prepended (prefix) or appended (suffix) to
                                   the protein accession? (default: 'prefix' valid: 'prefix', 'suffix')
  -missing_decoy_action <choice>   Action to take if NO peptide was assigned to a decoy protein (which
                                   indicates wrong database or decoy string): 'error' (exit with error, no
                                   output), 'warn' (exit with success, warning message) (default: 'error'
                                   valid: 'error', 'warn')
  -write_protein_sequence          If set, the protein sequences are stored as well.
  -write_protein_description       If set, the protein description is stored as well.
  -keep_unreferenced_proteins      If set, protein hits which are not referenced by any peptide are kept.
  -allow_unmatched                 If set, unmatched peptide sequences are allowed. By default (i.e. if this
                                   flag is not set) the program terminates with an error on unmatched
                                   peptides.
  -full_tolerant_search            If set, all peptide sequences are matched using tolerant search. Thus
                                   potentially more proteins (containing ambiguous amino acids) are
                                   associated. This is much slower!
  -aaa_max <number>                [tolerant search only] Maximal number of ambiguous amino acids (AAAs)
                                   allowed when matching to a protein database with AAAs. AAAs are 'B', 'Z'
                                   and 'X' (default: '4' min: '0')
  -mismatches_max <number>         [tolerant search only] Maximal number of real mismatches (will be used
                                   after checking for ambiguous AA's (see 'aaa_max' option). In general
                                   this param should only be changed if you want to look for other
                                   potential origins of a peptide which might have unknown SNPs or the
                                   like. (default: '0' min: '0')
  -IL_equivalent                   Treat the isobaric amino acids isoleucine ('I') and leucine ('L') as
                                   equivalent (indistinguishable)
  -filter_aaa_proteins             In the tolerant search for matches to proteins with ambiguous amino
                                   acids (AAAs), rebuild the search database to only consider proteins
                                   with AAAs. This may save time if most proteins don't contain AAAs and if
                                   there is a significant number of peptides that enter the tolerant
                                   search.
  -log <text>                      Name of log file (created only when specified)
  -debug <number>                  Sets the debug level (default: '0')

enzyme:
  -enzyme:name <choice>            Enzyme which determines valid cleavage sites - e.g. trypsin cleaves
                                   after lysine (K) or arginine (R), but not before proline (P). (default:
                                   'Trypsin' valid: 'Lys-C/P', 'Lys-N', 'leukocyte elastase', 'proline
                                   endopeptidase', 'Trypsin/P', 'V8-DE', 'V8-E', 'Alpha-lytic protease',
                                   'Lys-C', 'Asp-N', 'Asp-N_ambic', 'Trypsin', 'glutamyl endopeptidase',
                                   '2-iodobenzoate', 'TrypChymo', 'Asp-N/B', 'unspecific cleavage',
                                   'Chymotrypsin', 'PepsinA', 'Arg-C', 'CNBr', 'Formic_acid',
                                   'Chymotrypsin/P', 'no cleavage', 'Arg-C/P')
  -enzyme:specificity <choice>     Specificity of the enzyme.
                                   'full': both internal cleavage sites must match.
                                   'semi': one of two internal cleavage sites must match.
                                   'none': allow all peptide hits no matter their context. Therefore, the
                                   enzyme chosen does not play a role here (default: 'full' valid: 'full',
                                   'semi', 'none')

Common TOPP options:
  -ini <file>                      Use the given TOPP INI file
  -threads <n>                     Sets the number of threads allowed to be used by the TOPP tool
                                   (default: '1')
  -write_ini <file>                Writes the default configuration file
  --help                           Shows options
  --helphelp                       Shows all options (including advanced)
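As an illustration of how these options combine, the following Python sketch assembles a PeptideIndexer command line from a few of the options listed above. The helper name and the file names in the usage note are hypothetical; only flags and choice values that appear in the help text are used, and the sketch builds the argument list without running the tool.

```python
def build_peptide_indexer_command(
    in_file,
    fasta_file,
    out_file,
    decoy_string="DECOY_",
    decoy_string_position="prefix",
    missing_decoy_action="error",
    enzyme_specificity="full",
    il_equivalent=False,
):
    """Assemble an argument list for PeptideIndexer (hypothetical helper)."""
    # Reject values outside the choices listed in the help text.
    if decoy_string_position not in ("prefix", "suffix"):
        raise ValueError("decoy_string_position must be 'prefix' or 'suffix'")
    if missing_decoy_action not in ("error", "warn"):
        raise ValueError("missing_decoy_action must be 'error' or 'warn'")
    if enzyme_specificity not in ("full", "semi", "none"):
        raise ValueError("enzyme_specificity must be 'full', 'semi' or 'none'")

    cmd = [
        "PeptideIndexer",
        "-in", in_file,
        "-fasta", fasta_file,
        "-out", out_file,
        "-decoy_string", decoy_string,
        "-decoy_string_position", decoy_string_position,
        "-missing_decoy_action", missing_decoy_action,
        "-enzyme:specificity", enzyme_specificity,
    ]
    if il_equivalent:
        # Flag option: present means I and L are treated as equivalent.
        cmd.append("-IL_equivalent")
    return cmd
```

Assuming PeptideIndexer is on the PATH, the resulting list could be passed to subprocess.run(cmd, check=True), for instance with the placeholder file names "ids.idXML", "db_with_decoys.fasta" and "ids_indexed.idXML".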
INI file documentation of this tool:
OpenMS / TOPP release 2.3.0 | Documentation generated on Tue Jan 9 2018 18:22:06 using doxygen 1.8.13 |