OpenMS
UniPEFF

Converts a UniProtKB XML protein database to PEFF 1.0 with per-entry modification, processing, variant and disulfide-bond annotations.

This is an OpenMS-native port of the standalone C# UniPEFF tool (David L. Tabb, UMC Groningen). For each entry it emits a PEFF descriptor line with \PName \GName \NcbiTaxId \TaxName \Length \SV \EV \PE \ID \AltAC \ModResPsi \ModResUnimod \ModRes \VariantSimple \VariantComplex \Processed, plus \DisulfideBond when -annotation_identifiers is set.
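
The descriptor line described above can be read back as a set of key/value pairs. The following is a minimal sketch, not part of UniPEFF itself; it assumes a descriptor of the form `>prefix:accession \Key=Value \Key=Value ...`, and the function name `parse_peff_descriptor`, the accession and the values in the example line are all made up for illustration.

```python
import re

# Matches "\Key=" where Key is an annotation key such as PName or Length.
_KEY = re.compile(r"\\([A-Za-z]+)=")


def parse_peff_descriptor(line):
    """Split one descriptor line into (prefix, accession, {key: value}).

    Assumes the layout ">prefix:accession \\Key=Value \\Key=Value ...".
    """
    if not line.startswith(">"):
        raise ValueError("not a descriptor line")
    head, _, rest = line[1:].partition(" ")
    prefix, _, accession = head.partition(":")
    fields = {}
    matches = list(_KEY.finditer(rest))
    for i, match in enumerate(matches):
        # A value runs until the next "\Key=" or the end of the line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        fields[match.group(1)] = rest[match.end():end].strip()
    return prefix, accession, fields


# Hypothetical descriptor line; accession and values are invented.
example = r">sp:P00001 \PName=Example protein \GName=EXA1 \NcbiTaxId=9606 \Length=120"
prefix, accession, fields = parse_peff_descriptor(example)
```

Values that themselves contain a backslash followed by letters and `=` would confuse this simple splitter, so it is only suitable for quick inspection of well-formed output.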

UniProt's <feature type="disulfide bond"> entries are translated into half-cystine modifications (PSI-MOD:00798), which are then merged into the main modified-residue list in position-sorted order. With PEFF Option B (-annotation_identifiers), the resulting half-cystines are referenced by annotation id from \DisulfideBond=(id1,id2).
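
The merge step can be pictured with a short sketch. This is an illustration of the idea only, not the tool's code: the function name `merge_disulfides`, the tuple layout, the placeholder accession `PSI-MOD:00046` and the choice to number ids from 1 are assumptions.

```python
HALF_CYSTINE = "PSI-MOD:00798"  # accession named in the text above


def merge_disulfides(mod_res, bonds, with_ids=False):
    """Merge disulfide-bond ends into a modified-residue list.

    mod_res: list of (position, accession) tuples.
    bonds:   list of (position1, position2) tuples.
    Returns (merged, bond_refs). merged is sorted by position and holds
    (id, position, accession); id is None unless with_ids is True.
    bond_refs holds one (id1, id2) pair per bond when with_ids is True.
    """
    entries = [(pos, acc, None) for pos, acc in mod_res]
    for bond_index, (first, second) in enumerate(bonds):
        # Each bond contributes one half-cystine per cysteine.
        entries.append((first, HALF_CYSTINE, bond_index))
        entries.append((second, HALF_CYSTINE, bond_index))
    entries.sort(key=lambda entry: entry[0])  # stable sort by position

    merged, ends = [], {}
    for number, (pos, acc, bond_index) in enumerate(entries, start=1):
        ann_id = number if with_ids else None  # assumed: ids start at 1
        merged.append((ann_id, pos, acc))
        if with_ids and bond_index is not None:
            ends.setdefault(bond_index, []).append(ann_id)
    bond_refs = [tuple(ends[index]) for index in sorted(ends)]
    return merged, bond_refs


# One ordinary modification at position 30 and one bond between 12 and 48.
merged, bond_refs = merge_disulfides([(30, "PSI-MOD:00046")], [(12, 48)], with_ids=True)
```

Because the ids are assigned after sorting, a bond's two half-cystines generally receive non-adjacent ids, which is why the connectivity has to be stated explicitly.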

Modification accession lookup uses UniProt's ptmlist.txt (a snapshot is bundled under share/OpenMS/CHEMISTRY/UniProt_ptmlist.txt); override with -ptmlist. Canonical OBO names come from PSI-MOD.obo (bundled) and an optional unimod.obo; without them, names fall back to the UniProt ptmlist.txt ID and a warning is printed.
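
The fallback rule for names can be sketched as a dictionary lookup with a warning. This is a hedged illustration rather than the tool's implementation; `canonical_name`, the `obo_names` mapping and the warning text are hypothetical.

```python
import warnings


def canonical_name(accession, ptmlist_id, obo_names):
    """Return the OBO name for an accession, else the ptmlist ID.

    obo_names maps accession -> canonical name, as it might be loaded
    from an OBO file; an empty dict models a missing OBO file.
    """
    name = obo_names.get(accession)
    if name is None:
        # Mirrors the documented behaviour: fall back and warn.
        warnings.warn(
            f"no OBO name for {accession}; using ptmlist ID '{ptmlist_id}'"
        )
        return ptmlist_id
    return name
```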

Both plain .xml and .xml.gz UniProt inputs are accepted (gzip is auto-detected by the underlying parser).
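
How the underlying parser detects gzip is not stated here. One common approach, shown below purely as an assumption, is to test for the two-byte gzip magic number instead of trusting the file extension; `open_xml` is a hypothetical helper name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_xml(path):
    """Open a plain or gzip-compressed file for binary reading.

    The decision is based on the file content, not on the extension.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")
```

A helper like this is handy when pre-checking or post-processing the same inputs in a script, since a renamed `.xml.gz` file is still handled correctly.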

The command line parameters of this tool are:

UniPEFF -- Convert a UniProtKB XML protein database to PEFF 1.0 with rich annotations.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_UniPEFF.html
Version: 3.6.0-pre-nightly-2026-06-27 Jun 28 2026, 01:54:09, Revision: 221d046
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  UniPEFF <options>

Options (mandatory options marked with '*'):
  -in <file>*                      Input UniProtKB XML file (plain or gzip; gzip is auto-detected). (valid 
                                   formats: 'xml')
  -out <file>*                     Output PEFF 1.0 file. (valid formats: 'peff')
  -ptmlist <file>                  UniProt ptmlist.txt; defaults to the bundled snapshot. (valid formats: 
                                   'txt')
  -psimod_obo <file>               PSI-MOD.obo for canonical modification names; defaults to the bundled 
                                   OpenMS PSI-MOD.obo. (valid formats: 'obo')
  -unimod_obo <file>               Optional unimod.obo for canonical Unimod names; if absent, names fall back 
                                   to the UniProt ptmlist IDs. (valid formats: 'obo')
  -prefix <string>                 Force a single PEFF prefix for every entry (e.g. 'sp'); if empty, sp/tr 
                                   is derived from the UniProt dataset.
  -dbversion <string>              Value for the mandatory '# DbVersion=' PEFF header line. (default: 
                                   'unknown')
  -annotation_identifiers          Emit PEFF Option B: assign a sequential id: prefix to every annotation 
                                   tuple and emit \DisulfideBond connectivity.
  -omit_molecular_processing       Skip the \Processed annotations (initiator methionine, signal/transit 
                                   peptide, propeptide, chain).
  -omit_amino_acid_modifications   Skip \ModResPsi / \ModResUnimod / \ModRes and \DisulfideBond; ptmlist is 
                                   not read.
  -omit_sequence_variations        Skip \VariantSimple and \VariantComplex annotations.
                                   
Common TOPP options:
  -ini <file>                      Use the given TOPP INI file
  -threads <n>                     Sets the number of threads allowed to be used by the TOPP tool (0 = all 
                                   available cores) (default: '1')
  -write_ini <file>                Writes the default configuration file
  --help                           Shows options
  --helphelp                       Shows all options (including advanced)
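
When the tool is driven from a script, the options above map naturally onto an argument list. The sketch below only assembles that list from flags shown in the help text; the helper name `build_unipeff_command` and the file names in the example are hypothetical, and it assumes a `UniPEFF` executable is on the PATH when the command is eventually run.

```python
def build_unipeff_command(in_path, out_path, dbversion=None, prefix=None,
                          annotation_identifiers=False,
                          omit_sequence_variations=False):
    """Assemble a UniPEFF argument list from the documented options."""
    cmd = ["UniPEFF", "-in", in_path, "-out", out_path]
    if dbversion:
        cmd += ["-dbversion", dbversion]
    if prefix:
        cmd += ["-prefix", prefix]
    # Boolean options are passed as bare flags without a value.
    if annotation_identifiers:
        cmd.append("-annotation_identifiers")
    if omit_sequence_variations:
        cmd.append("-omit_sequence_variations")
    return cmd


# Hypothetical file names and version string.
cmd = build_unipeff_command("proteome.xml.gz", "proteome.peff",
                            dbversion="2026_01", annotation_identifiers=True)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.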

INI file documentation of this tool:

Legend:
required parameter
advanced parameter

This section lists all parameters supported by the tool. Parameters are organized into hierarchical subsections that group related settings together. Subsections may contain further subsections or individual parameters.

Each parameter entry contains the following information:

  • Name: The identifier used in configuration files and on the command line.
  • Default value: The value used if the parameter is not explicitly specified.
  • Description: A short explanation describing the purpose and behavior of the parameter.
  • Tags: Additional metadata associated with the parameter.
  • Restrictions: Allowed value ranges for numeric parameters or valid options for string parameters.

Parameter tags provide additional information about how a parameter is used. Some tags indicate whether a parameter is required or intended for advanced configuration, while others may be used internally by OpenMS or workflow tools.

Parameters highlighted as required must be specified for the tool to run successfully. Parameters marked as advanced allow fine-tuning of algorithm behavior and are typically not needed for standard workflows.

Name | Default value | Description | Tags | Restrictions

+ UniPEFF: Convert a UniProtKB XML protein database to PEFF 1.0 with rich annotations.
  version | 3.6.0-pre-nightly-2026-06-27 | Version of the tool that generated this parameters file. | (none) | (none)
  ++ 1: Instance '1' section for 'UniPEFF'
    in | (none) | Input UniProtKB XML file (plain or gzip; gzip is auto-detected). | input file, required | *.xml
    out | (none) | Output PEFF 1.0 file. | output file, required | *.peff
    ptmlist | (none) | UniProt ptmlist.txt; defaults to the bundled snapshot. | input file | *.txt
    psimod_obo | (none) | PSI-MOD.obo for canonical modification names; defaults to the bundled OpenMS PSI-MOD.obo. | input file | *.obo
    unimod_obo | (none) | Optional unimod.obo for canonical Unimod names; if absent, names fall back to the UniProt ptmlist IDs. | input file | *.obo
    prefix | (none) | Force a single PEFF prefix for every entry (e.g. 'sp'); if empty, sp/tr is derived from the UniProt dataset. | (none) | (none)
    dbversion | unknown | Value for the mandatory '# DbVersion=' PEFF header line. | (none) | (none)
    annotation_identifiers | false | Emit PEFF Option B: assign a sequential id: prefix to every annotation tuple and emit \DisulfideBond connectivity. | (none) | true, false
    omit_molecular_processing | false | Skip the \Processed annotations (initiator methionine, signal/transit peptide, propeptide, chain). | (none) | true, false
    omit_amino_acid_modifications | false | Skip \ModResPsi / \ModResUnimod / \ModRes and \DisulfideBond; ptmlist is not read. | (none) | true, false
    omit_sequence_variations | false | Skip \VariantSimple and \VariantComplex annotations. | (none) | true, false
    log | (none) | Name of log file (created only when specified) | (none) | (none)
    debug | 0 | Sets the debug level | (none) | (none)
    threads | 1 | Sets the number of threads allowed to be used by the TOPP tool (0 = all available cores) | (none) | (none)
    no_progress | false | Disables progress logging to command line | (none) | true, false
    force | false | Overrides tool-specific checks | (none) | true, false
    test | false | Enables the test mode (needed for internal use only) | (none) | true, false
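
The hierarchy above (tool section, instance section, parameters) can be flattened into colon-separated parameter paths when inspecting an INI file written with -write_ini. The sketch below assumes the file uses nested NODE elements containing ITEM elements with name and value attributes; that layout and the helper name `flatten_ini` are assumptions to verify against a real file, and the embedded sample is a hand-written fragment, not actual tool output.

```python
import xml.etree.ElementTree as ET


def flatten_ini(xml_text):
    """Map 'section:subsection:name' paths to values.

    Assumes nested <NODE name="..."> elements that contain
    <ITEM name="..." value="..."/> elements.
    """
    flat = {}

    def walk(node, path):
        for child in node:
            if child.tag == "NODE":
                walk(child, path + [child.get("name")])
            elif child.tag == "ITEM":
                flat[":".join(path + [child.get("name")])] = child.get("value")

    walk(ET.fromstring(xml_text), [])
    return flat


# Hand-written sample that mimics the assumed layout.
sample = """
<PARAMETERS>
  <NODE name="UniPEFF">
    <NODE name="1">
      <ITEM name="dbversion" value="unknown"/>
      <ITEM name="annotation_identifiers" value="false"/>
    </NODE>
  </NODE>
</PARAMETERS>
"""
params = flatten_ini(sample)
```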