OpenMS
UniProtXMLFile Class Reference

Reads UniProtKB XML protein databases (.xml or transparently .xml.gz).

#include <OpenMS/FORMAT/UniProtXMLFile.h>


Public Member Functions

 UniProtXMLFile ()
 Default constructor.
 
 ~UniProtXMLFile () override
 Destructor.
 
void load (const std::string &filename, std::vector< UniProtEntry > &entries)
 Load all entries from a UniProtKB XML file into memory.
 
void loadStreaming (const std::string &filename, const std::function< void(UniProtEntry &&)> &callback)
 Stream entries one at a time via a user-supplied callback.
 
- Public Member Functions inherited from XMLFile
 XMLFile ()
 Construct an XMLFile without schema info. schema_location_ remains unset, so isValid cannot be used unless a derived class sets schema_location_ first.
 
 XMLFile (const std::string &schema_location, const std::string &version)
 Construct with a schema location for later isValid calls.
 
virtual ~XMLFile ()
 Virtual destructor — defaulted; allows safe deletion through a base-class pointer.
 
bool isValid (const std::string &filename, std::ostream &os)
 Check if filename validates against the bound XML schema.
 
const std::string & getVersion () const
 Return the schema version string passed to the parameterised constructor; empty for default-constructed instances.
 

Private Member Functions

 UniProtXMLFile (const UniProtXMLFile &)=delete
 
UniProtXMLFile & operator= (const UniProtXMLFile &)=delete
 

Additional Inherited Members

- Protected Member Functions inherited from XMLFile
void parse_ (const std::string &filename, XMLHandler *handler)
 Parse the XML file at filename through handler.
 
void parseBuffer_ (const std::string &buffer, XMLHandler *handler)
 Parse an in-memory XML buffer through handler.
 
void save_ (const std::string &filename, XMLHandler *handler) const
 Stores the contents of the XML handler given by handler in the file given by filename.
 
void enforceEncoding_ (const std::string &encoding)
 Set or clear the XML-encoding override applied to subsequent parse_ / parseBuffer_ calls.
 
- Protected Attributes inherited from XMLFile
std::string schema_location_
 Path of the XML schema for validation; empty when the default constructor was used (isValid then throws NotImplemented).
 
std::string schema_version_
 Schema version string returned by getVersion.
 
std::string enforced_encoding_
 Optional XML encoding override applied to the InputSource in parse_ and parseBuffer_; empty disables the override. Used as a workaround for XTandem output XML which carries an encoding the parser otherwise stumbles on.
 

Detailed Description

Reads UniProtKB XML protein databases (.xml or transparently .xml.gz).

Streams the document via Xerces SAX. The on-disk format is the UniProtKB XML release schema; isoforms (sequences without the "length" attribute on <sequence>) and the <comment type="alternative products"> subtree are skipped, matching UniPEFF's PEFF-emission behaviour. Gzip is detected by magic-byte sniffing in the inherited XMLFile parser, so no special handling is required for .xml.gz inputs.
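To illustrate what magic-byte sniffing means here, the following is a minimal sketch rather than the XMLFile implementation. It assumes only that a gzip stream starts with the two bytes 0x1f 0x8b, as defined by the gzip file format. The helper names looksLikeGzip and fileLooksLikeGzip are hypothetical and do not exist in OpenMS.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of OpenMS): true when the buffer starts with
// the gzip magic number 0x1f 0x8b.
bool looksLikeGzip(const unsigned char* data, std::size_t size)
{
  return size >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}

// Hypothetical helper (not part of OpenMS): reads the first two bytes of a
// file and checks them. Returns false if the file cannot be opened or is
// shorter than two bytes.
bool fileLooksLikeGzip(const std::string& path)
{
  std::ifstream in(path, std::ios::binary);
  unsigned char header[2] = {0, 0};
  in.read(reinterpret_cast<char*>(header), sizeof(header));
  return looksLikeGzip(header, static_cast<std::size_t>(in.gcount()));
}
```

Because detection of this kind depends on the file content rather than the file name, a gzip-compressed database would be recognised even without a .gz suffix.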

Constructor & Destructor Documentation

◆ UniProtXMLFile() [1/2]

Default constructor.

◆ ~UniProtXMLFile()

~UniProtXMLFile ( )
override

Destructor.

◆ UniProtXMLFile() [2/2]

UniProtXMLFile ( const UniProtXMLFile & )
private, delete

Member Function Documentation

◆ load()

void load ( const std::string &  filename,
std::vector< UniProtEntry > &  entries 
)

Load all entries from a UniProtKB XML file into memory.

Parameters
[in] filename: Path to a .xml or .xml.gz UniProtKB file (gzip auto-detected).
[out] entries: Filled with one UniProtEntry per <entry> element.
Exceptions
Exception::FileNotFound: if filename does not exist.
Exception::ParseError: if the file is not well-formed XML.

◆ loadStreaming()

void loadStreaming ( const std::string &  filename,
const std::function< void(UniProtEntry &&)> &  callback 
)

Stream entries one at a time via a user-supplied callback.

The callback receives ownership of each fully populated UniProtEntry as soon as its </entry> is reached. This is the preferred path for databases too large to materialise in full (e.g. TrEMBL).

Parameters
[in] filename: Path to a .xml or .xml.gz UniProtKB file (gzip auto-detected).
[in] callback: Invoked once per parsed entry. The entry is passed as an rvalue reference, so the callback may move it into its own storage; the entry must not be referenced after the callback returns.
Exceptions
Exception::FileNotFound: if filename does not exist.
Exception::ParseError: if the file is not well-formed XML.
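The callback pattern can be sketched without OpenMS itself. The example below is an illustration under stated assumptions: DemoEntry and streamDemoEntries are hypothetical stand-ins for UniProtEntry and loadStreaming (the members of the real UniProtEntry are not shown in this reference), and collectLongEntries is an invented consumer. Only the callback shape, std::function< void(Entry &&)>, is taken from the signature above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for UniProtEntry; the field names are assumptions.
struct DemoEntry
{
  std::string accession;
  std::string sequence;
};

// Hypothetical stand-in for a streaming reader: hands each entry to the
// callback as an rvalue, mirroring the callback signature of loadStreaming().
void streamDemoEntries(std::vector<DemoEntry> source,
                       const std::function<void(DemoEntry&&)>& callback)
{
  assert(callback); // calling an empty std::function would throw
  for (DemoEntry& entry : source)
  {
    callback(std::move(entry));
  }
}

// Example consumer: keeps only entries with at least min_length residues,
// so entries that fail the filter are never stored.
std::vector<DemoEntry> collectLongEntries(std::vector<DemoEntry> source,
                                          std::size_t min_length)
{
  std::vector<DemoEntry> kept;
  streamDemoEntries(std::move(source), [&kept, min_length](DemoEntry&& entry)
  {
    if (entry.sequence.size() >= min_length)
    {
      kept.push_back(std::move(entry)); // take ownership without copying
    }
  });
  return kept;
}
```

With the real class, the same lambda shape would be passed as the callback argument of loadStreaming, taking UniProtEntry && instead of DemoEntry &&. Filtering inside the callback keeps peak memory proportional to the retained entries rather than to the whole database.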

◆ operator=()

UniProtXMLFile & operator= ( const UniProtXMLFile & )
private, delete