OpenMS
Converts a UniProtKB XML protein database to PEFF 1.0 with per-entry modification, processing, variant and disulfide-bond annotations.
This is an OpenMS-native port of the standalone C# UniPEFF tool (David L. Tabb, UMC Groningen). For each entry it emits a PEFF descriptor line with \PName \GName \NcbiTaxId \TaxName \Length \SV \EV \PE \ID \AltAC \ModResPsi \ModResUnimod \ModRes \VariantSimple \VariantComplex \Processed, plus \DisulfideBond when -annotation_identifiers is set.
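As a rough illustration of how such a descriptor line might be assembled, the sketch below joins the keys listed above in the order given. The `>prefix:accession` lead and the `\Key=value` layout follow the general PEFF convention. The helper name `descriptor_line`, the placeholder accession `ACC123`, the example values, and the choice to skip keys that have no value are all assumptions made for this example, not behaviour confirmed for UniPEFF.

```python
# Key order as listed in the paragraph above; \DisulfideBond comes last
# because it is only emitted when -annotation_identifiers is set.
KEY_ORDER = [
    "PName", "GName", "NcbiTaxId", "TaxName", "Length", "SV", "EV", "PE",
    "ID", "AltAC", "ModResPsi", "ModResUnimod", "ModRes",
    "VariantSimple", "VariantComplex", "Processed", "DisulfideBond",
]


def descriptor_line(prefix, accession, annotations):
    """Build one PEFF-style descriptor line from a dict of key -> value.

    Hypothetical helper: keys without a value are skipped (an assumption).
    """
    parts = [f">{prefix}:{accession}"]
    for key in KEY_ORDER:
        value = annotations.get(key)
        if value not in (None, ""):
            parts.append(f"\\{key}={value}")
    return " ".join(parts)


# Placeholder entry; the accession and values are invented for illustration.
line = descriptor_line("sp", "ACC123", {"Length": 104, "PName": "Example protein"})
print(line)
```

The dict can be filled in any order; the fixed `KEY_ORDER` list is what keeps the output stable from entry to entry.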
UniProt's <feature type="disulfide bond"> entries are translated into half-cystine modifications (PSI-MOD:00798), which are then merged into the main modified-residue list in position-sorted order. In PEFF Option B (-annotation_identifiers), the resulting half-cystines are referenced by annotation id from \DisulfideBond=(id1,id2).
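The merge described above can be sketched as follows. This is a simplified model, not the tool's implementation: the names `ModRes` and `merge_disulfides` are hypothetical, ids are numbered over modified residues only (the real tool assigns ids to every annotation tuple), and each cysteine is assumed to take part in at most one bond.

```python
from dataclasses import dataclass
from typing import Optional

# Half-cystine accession; the text above writes it as PSI-MOD:00798.
HALF_CYSTINE = "MOD:00798"


@dataclass
class ModRes:
    position: int
    accession: str
    ann_id: Optional[int] = None  # only filled in for Option B


def merge_disulfides(mod_res, bonds, annotation_identifiers=False):
    """Merge disulfide bonds into a modified-residue list.

    mod_res: list of (position, accession) tuples already known for the entry.
    bonds:   list of (pos1, pos2) tuples, one per disulfide bond.
    Returns (merged list sorted by position, list of (id1, id2) links).
    """
    merged = [ModRes(pos, acc) for pos, acc in mod_res]
    for pos1, pos2 in bonds:
        # Each bond contributes one half-cystine at either end.
        merged.append(ModRes(pos1, HALF_CYSTINE))
        merged.append(ModRes(pos2, HALF_CYSTINE))
    merged.sort(key=lambda m: m.position)

    links = []
    if annotation_identifiers:
        # Sequential ids in output order (assumption: 1-based).
        for i, mod in enumerate(merged, start=1):
            mod.ann_id = i
        id_at = {m.position: m.ann_id for m in merged if m.accession == HALF_CYSTINE}
        links = [(id_at[a], id_at[b]) for a, b in bonds]
    return merged, links


merged, links = merge_disulfides([(5, "MOD:00046")], [(3, 10)], annotation_identifiers=True)
```

With one existing modification at position 5 and one bond between positions 3 and 10, the half-cystines land on either side of the existing entry, and the bond is then expressed purely through their ids.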
Modification accession lookup uses UniProt's ptmlist.txt (a snapshot is bundled under share/OpenMS/CHEMISTRY/UniProt_ptmlist.txt); override with -ptmlist. Canonical OBO names come from PSI-MOD.obo (bundled) and an optional unimod.obo; without them, names fall back to the UniProt ptmlist.txt ID and a warning is printed.
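The name fallback can be pictured with a small sketch. It reads `id:` and `name:` lines from `[Term]` stanzas of an OBO file and falls back to the ptmlist ID, with a warning, when no OBO name is available. The function names `parse_obo_names` and `canonical_name` are hypothetical, and the parser handles only the two tags needed here; it is not the tool's own OBO reader.

```python
import warnings


def parse_obo_names(lines):
    """Collect accession -> name pairs from the [Term] stanzas of an OBO file."""
    names = {}
    current_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and current_id:
            names[current_id] = line[len("name: "):]
    return names


def canonical_name(accession, ptmlist_id, obo_names):
    """Return the OBO name for an accession, or the ptmlist ID as a fallback."""
    name = obo_names.get(accession)
    if name is None:
        warnings.warn(
            f"no OBO name for {accession}; falling back to ptmlist ID {ptmlist_id!r}"
        )
        return ptmlist_id
    return name


obo = parse_obo_names([
    "[Term]",
    "id: MOD:00046",
    "name: O-phospho-L-serine",
])
print(canonical_name("MOD:00046", "Phosphoserine", obo))
```

Passing an empty dict for `obo_names` models the case where no OBO file was supplied: every lookup then returns the ptmlist ID and raises a warning.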
Both plain .xml and .xml.gz UniProt inputs are accepted (gzip is auto-detected by the underlying parser).
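How the underlying parser detects gzip is not stated here. One common approach, shown below as an assumption rather than as the parser's actual mechanism, is to check the two-byte gzip magic number (0x1f 0x8b) at the start of the file instead of trusting the file extension. The helper name `open_maybe_gzip` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path):
    """Open a file for binary reading, decompressing on the fly if it is gzip."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Because the check looks at content rather than the name, a gzip file saved with a plain `.xml` extension would still be decompressed.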
The command line parameters of this tool are:
UniPEFF -- Convert a UniProtKB XML protein database to PEFF 1.0 with rich annotations.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_UniPEFF.html
Version: 3.6.0-pre-nightly-2026-06-27 Jun 28 2026, 01:54:09, Revision: 221d046
To cite OpenMS:
+ Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.
Usage:
UniPEFF <options>
Options (mandatory options marked with '*'):
-in <file>* Input UniProtKB XML file (plain or gzip; gzip is auto-detected). (valid
formats: 'xml')
-out <file>* Output PEFF 1.0 file. (valid formats: 'peff')
-ptmlist <file> UniProt ptmlist.txt; defaults to the bundled snapshot. (valid formats:
'txt')
-psimod_obo <file> PSI-MOD.obo for canonical modification names; defaults to the bundled
OpenMS PSI-MOD.obo. (valid formats: 'obo')
-unimod_obo <file> Optional unimod.obo for canonical Unimod names; if absent, names fall back
to the UniProt ptmlist IDs. (valid formats: 'obo')
-prefix <string> Force a single PEFF prefix for every entry (e.g. 'sp'); if empty, sp/tr
is derived from the UniProt dataset.
-dbversion <string> Value for the mandatory '# DbVersion=' PEFF header line. (default:
'unknown')
-annotation_identifiers Emit PEFF Option B: assign a sequential id: prefix to every annotation
tuple and emit \DisulfideBond connectivity.
-omit_molecular_processing Skip the \Processed annotations (initiator methionine, signal/transit
peptide, propeptide, chain).
-omit_amino_acid_modifications Skip \ModResPsi / \ModResUnimod / \ModRes and \DisulfideBond; ptmlist is
not read.
-omit_sequence_variations Skip \VariantSimple and \VariantComplex annotations.
Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (0 = all
available cores) (default: '1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)
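For scripted use, the options above can be assembled into an argument list and handed to `subprocess.run`. The sketch below only builds and prints the command. It uses flags taken from the help text, while the function name `build_unipeff_command`, the file names, and the `2026_02` version string are placeholders invented for this example.

```python
import shlex


def build_unipeff_command(in_path, out_path, *, dbversion=None, prefix=None,
                          annotation_identifiers=False,
                          omit_sequence_variations=False, threads=None):
    """Assemble a UniPEFF argument list from the options in the help text."""
    cmd = ["UniPEFF", "-in", in_path, "-out", out_path]
    if dbversion:
        cmd += ["-dbversion", dbversion]
    if prefix:
        cmd += ["-prefix", prefix]
    if annotation_identifiers:
        cmd.append("-annotation_identifiers")  # PEFF Option B
    if omit_sequence_variations:
        cmd.append("-omit_sequence_variations")
    if threads is not None:
        cmd += ["-threads", str(threads)]
    return cmd


# Placeholder file names and version string.
cmd = build_unipeff_command(
    "uniprot_sprot.xml.gz",
    "uniprot_sprot.peff",
    dbversion="2026_02",
    annotation_identifiers=True,
)
print(shlex.join(cmd))
# To execute, assuming UniPEFF is on PATH:
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems when paths contain spaces; `shlex.join` is used only to display it.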
INI file documentation of this tool:
This section lists all parameters supported by the tool. Parameters are organized into hierarchical subsections that group related settings together. Subsections may contain further subsections or individual parameters.
Each parameter entry contains the following information:
Parameter tags provide additional information about how a parameter is used. Some tags indicate whether a parameter is required or intended for advanced configuration, while others may be used internally by OpenMS or workflow tools.
Parameters highlighted as required must be specified for the tool to run successfully. Parameters marked as advanced allow fine-tuning of algorithm behavior and are typically not needed for standard workflows.