OpenMS
UniProtXMLFile.h File Reference
#include <OpenMS/FORMAT/XMLFile.h>
#include <functional>
#include <string>
#include <vector>

Classes

struct  UniProtFeature
 A single <feature> element from a UniProtKB XML entry.
 
struct  UniProtEntry
 A single <entry> from a UniProtKB XML file, in a parser-neutral form.
 
class  UniProtXMLFile
 Reads UniProtKB XML protein databases (.xml, or .xml.gz with transparent decompression).
 

Namespaces

namespace  OpenMS
 Main OpenMS namespace.
 

Class Documentation

◆ OpenMS::UniProtFeature

struct OpenMS::UniProtFeature

A single <feature> element from a UniProtKB XML entry.

Fields mirror the UniProt schema after the <location> element has been collapsed: a feature is either a single point (has_position / position) or a range (has_range / begin / end). A coordinate with status="unknown" is encoded as 0; the caller decides whether that is legal for its target PEFF key.

Class Members
int begin {0} 1-based range start (0 = unknown / absent)
string description feature/@description (raw, before any cleanup)
int end {0} 1-based range end (0 = unknown / absent)
bool has_position {false} a single <position> element was present
bool has_range {false} a <begin> / <end> pair was present
string original <original> text (sequence variant)
int position {0} 1-based position (0 = unknown / absent)
string type feature/@type, e.g. "modified residue", "disulfide bond", "sequence variant"
string variation <variation> text (sequence variant; first occurrence only)
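
The point-versus-range encoding described above can be sketched as follows. This is a minimal illustration, not OpenMS code: the struct is a local stand-in that only mirrors the documented members so the example compiles without OpenMS, and formatLocation and hasKnownCoordinates are hypothetical helper names. Rejecting features with a 0 coordinate is just one possible caller policy.

```cpp
#include <string>

// Local stand-in mirroring the documented members of OpenMS::UniProtFeature.
// Real code would use the struct declared in UniProtXMLFile.h instead.
struct UniProtFeature
{
  std::string type;
  std::string description;
  std::string original;
  std::string variation;
  int position{0};
  int begin{0};
  int end{0};
  bool has_position{false};
  bool has_range{false};
};

// Hypothetical helper: renders the collapsed location as text.
// A coordinate of 0 means "unknown / absent" and is shown as "?".
std::string formatLocation(const UniProtFeature& f)
{
  auto coord = [](int c) { return c > 0 ? std::to_string(c) : std::string("?"); };
  if (f.has_position) return coord(f.position);
  if (f.has_range) return coord(f.begin) + "-" + coord(f.end);
  return "";  // neither a point nor a range was recorded
}

// Hypothetical caller policy: accept only fully known coordinates.
bool hasKnownCoordinates(const UniProtFeature& f)
{
  if (f.has_position) return f.position > 0;
  if (f.has_range) return f.begin > 0 && f.end > 0;
  return false;
}
```

A caller that tolerates unknown coordinates for some feature types could skip the second check and rely on formatLocation alone.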

◆ OpenMS::UniProtEntry

struct OpenMS::UniProtEntry

A single <entry> from a UniProtKB XML file, in a parser-neutral form.

Captures only the fields needed to build a PEFF descriptor line; isoforms (sequences without a "length" attribute) and splice-variant features are intentionally not represented because UniPEFF's PEFF emission ignores them.

Class Members
string accession first <accession> (primary id)
vector< string > alt_accessions second and subsequent <accession> entries
string dataset entry/@dataset, e.g. "Swiss-Prot" or "TrEMBL"
string entry_version <entry version="...">
vector< UniProtFeature > features <feature> elements in document order
string full_name <protein>/<recommendedName>/<fullName> (first occurrence)
string name first <name> under <entry> (mnemonic id, e.g. "KSINK_HUMAN")
string ncbi_tax_id <organism>/<dbReference type="NCBI Taxonomy" id="...">
string primary_gene <gene>/<name type="primary"> (first occurrence)
string protein_existence <proteinExistence type="..."> (raw type string, e.g. "evidence at protein level")
string sequence canonical <sequence> text (whitespace stripped); isoform sequences are skipped
string sequence_version <sequence version="...">
string tax_name <organism>/<name type="scientific">
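
As a rough sketch of how a consumer might work with the members listed above, the example below collects all accessions of an entry (primary first) and counts features by their type string. The structs are reduced local stand-ins for the documented ones, and allAccessions and countFeatures are hypothetical helpers, not part of OpenMS.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reduced stand-ins mirroring a subset of the documented members.
// Real code would use the structs declared in UniProtXMLFile.h.
struct UniProtFeature
{
  std::string type;
  std::string description;
};

struct UniProtEntry
{
  std::string accession;                    // first <accession> (primary id)
  std::vector<std::string> alt_accessions;  // second and subsequent <accession>
  std::vector<UniProtFeature> features;     // <feature> elements in document order
};

// Hypothetical helper: primary accession followed by the alternates.
std::vector<std::string> allAccessions(const UniProtEntry& e)
{
  std::vector<std::string> out;
  if (!e.accession.empty()) out.push_back(e.accession);
  out.insert(out.end(), e.alt_accessions.begin(), e.alt_accessions.end());
  return out;
}

// Hypothetical helper: number of features whose feature/@type equals `type`.
std::size_t countFeatures(const UniProtEntry& e, const std::string& type)
{
  return static_cast<std::size_t>(
    std::count_if(e.features.begin(), e.features.end(),
                  [&type](const UniProtFeature& f) { return f.type == type; }));
}
```

Because the type member holds the raw feature/@type string, the comparison is an exact, case-sensitive match against values such as "modified residue".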