OpenMS
PEFFFile Class Reference

This class serves for reading and writing PEFF (PSI Extended FASTA Format) files.

#include <OpenMS/FORMAT/PEFFFile.h>

Inheritance: PEFFFile inherits from ProgressLogger.

Public Member Functions

 PEFFFile ()=default
 Default constructor.
 
 ~PEFFFile () override=default
 Destructor.
 
void load (const String &filename, std::vector< PEFFEntry > &entries, std::vector< PEFFDatabaseMetadata > &headers) const
 Loads a PEFF file and stores entries and headers.
 
void store (const String &filename, const std::vector< PEFFEntry > &entries, const PEFFDatabaseMetadata &header) const
 Stores entries to a PEFF file with the given header.
 
void store (const String &filename, const std::vector< PEFFEntry > &entries, const std::vector< PEFFDatabaseMetadata > &headers) const
 Stores entries to a PEFF file with multiple database headers.
 
void readStart (const String &filename)
 Prepares a PEFF file for streamed reading using readNext().
 
bool readNext (PEFFEntry &entry)
 Reads the next PEFF entry from the file.
 
const std::vector< PEFFDatabaseMetadata > & getHeaders () const
 Returns the headers parsed during readStart().
 
bool atEnd () const
 Returns true if the end of the file has been reached.
 
void writeStart (const String &filename, const PEFFDatabaseMetadata &header)
 Prepares a PEFF file for streamed writing using writeNext().
 
void writeStart (const String &filename, const std::vector< PEFFDatabaseMetadata > &headers)
 Prepares a PEFF file for streamed writing using writeNext(), with multiple headers.
 
void writeNext (const PEFFEntry &entry)
 Writes the next PEFF entry to the file.
 
void writeEnd ()
 Closes the output file (called automatically in destructor)
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.
 
virtual ~ProgressLogger ()
 Destructor.
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor.
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment Operator.
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE!
 
LogType getLogType () const
 Returns the type of progress log being used.
 
void setLogger (ProgressLoggerImpl *logger)
 Sets the logger to be used for progress logging.
 
void startProgress (SignedSize begin, SignedSize end, const String &label) const
 Initializes the progress display.
 
void setProgress (SignedSize value) const
 Sets the current progress.
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 Increments progress by 1 (according to range begin-end).
 

Static Public Member Functions

static bool isPEFFFile (const String &filename)
 Checks if a file appears to be a PEFF file (by checking for # PEFF header).
 
static String toProForma (const PEFFEntry &entry)
 Converts a PEFF entry to ProForma notation.
 

Protected Member Functions

void parseHeaderLine_ (const String &line, PEFFDatabaseMetadata &header, bool &new_db)
 Parse a header line (# Key=Value or # //)
 
void parseAnnotations_ (const String &description, PEFFEntry &entry)
 Parse annotations from the description line.
 
PEFFModification parseModification_ (const String &tuple)
 Parse a single modification tuple.
 
PEFFVariantSimple parseVariantSimple_ (const String &tuple)
 Parse a simple variant tuple.
 
PEFFVariantComplex parseVariantComplex_ (const String &tuple)
 Parse a complex variant tuple.
 
PEFFProcessedRegion parseProcessedRegion_ (const String &tuple)
 Parse a processed region tuple.
 
PEFFDisulfideBond parseDisulfideBond_ (const String &tuple)
 Parse a disulfide bond tuple.
 
std::vector< String > parseParenList_ (const String &value)
 Parse a parenthesized list of values.
 
String formatHeader_ (const PEFFDatabaseMetadata &header) const
 Format the header section for output.
 
String formatHeader_ (const std::vector< PEFFDatabaseMetadata > &headers) const
 Format the header section for output (multiple database blocks)
 
String formatEntry_ (const PEFFEntry &entry) const
 Format a single entry for output.
 
bool readEntry_ (std::string &id, std::string &description, std::string &seq)
 Read entry data (identifier, description, sequence)
 

Protected Attributes

std::fstream infile_
 Input file stream.
 
std::ofstream outfile_
 Output file stream.
 
std::vector< PEFFDatabaseMetadata > headers_
 Parsed headers.
 
Size entries_read_ {0}
 Number of entries read.
 
std::streampos fileSize_ {0}
 File size for progress.
 
std::string seq_
 Current sequence buffer.
 
std::string id_
 Current identifier buffer.
 
std::string description_
 Current description buffer.
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 

Additional Inherited Members

- Public Types inherited from ProgressLogger
enum  LogType { CMD , GUI , NONE }
 Possible log types.
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

This class serves for reading and writing PEFF (PSI Extended FASTA Format) files.

PEFF extends FASTA with rich annotations for modifications, variants, processed regions, and proteoforms. See https://github.com/HUPO-PSI/PEFF for the specification.

Use the aggregate methods load() and store() to read or write a complete set of protein sequences in one call; all entries are then held in memory at once.

For lower memory use, process entries one at a time: readStart() and readNext() for reading, and writeStart(), writeNext(), and writeEnd() for writing.
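
The memory trade-off between the two modes can be illustrated with a minimal, self-contained sketch. It does not use OpenMS: Record, StreamReader, loadAll, and totalResidues are hypothetical stand-ins that mimic the load() and readStart()/readNext() patterns on a plain FASTA-like stream, and annotations are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for PEFFEntry: header text plus sequence only.
struct Record
{
  std::string header;   // text after '>'
  std::string sequence; // concatenated sequence lines
};

// Hypothetical streamed reader: holds a single record at a time.
class StreamReader
{
public:
  explicit StreamReader(std::istream& in) : in_(in) {}

  // Returns true if a record was read, false once the input is exhausted.
  bool readNext(Record& rec)
  {
    std::string line;
    // Advance to the next '>' line unless one is already buffered.
    while (pending_.empty() && std::getline(in_, line))
    {
      if (!line.empty() && line[0] == '>') pending_ = line;
    }
    if (pending_.empty()) return false;
    rec.header = pending_.substr(1);
    rec.sequence.clear();
    pending_.clear();
    // Collect sequence lines until the next record starts.
    while (std::getline(in_, line))
    {
      if (!line.empty() && line[0] == '>') { pending_ = line; break; }
      if (!line.empty() && line[0] != '#') rec.sequence += line;
    }
    return true;
  }

private:
  std::istream& in_;
  std::string pending_; // header line of the next record, if already seen
};

// Aggregate mode: every record is kept in memory at once.
std::vector<Record> loadAll(std::istream& in)
{
  std::vector<Record> all;
  StreamReader reader(in);
  Record rec;
  while (reader.readNext(rec)) all.push_back(rec);
  return all;
}

// Streamed mode: only a running total is kept, one record in memory at a time.
std::size_t totalResidues(std::istream& in)
{
  StreamReader reader(in);
  Record rec;
  std::size_t total = 0;
  while (reader.readNext(rec)) total += rec.sequence.size();
  return total;
}
```

With the real class, the same loop shape would presumably apply: call readStart() once, then call readNext() until it returns false.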

Constructor & Destructor Documentation

◆ PEFFFile()

PEFFFile ( )
default

Default constructor.

◆ ~PEFFFile()

~PEFFFile ( )
override default

Destructor.

Member Function Documentation

◆ atEnd()

bool atEnd ( ) const

Returns true if the end of the file has been reached.

◆ formatEntry_()

String formatEntry_ ( const PEFFEntry &  entry) const
protected

Format a single entry for output.

◆ formatHeader_() [1/2]

String formatHeader_ ( const PEFFDatabaseMetadata &  header) const
protected

Format the header section for output.

◆ formatHeader_() [2/2]

String formatHeader_ ( const std::vector< PEFFDatabaseMetadata > &  headers) const
protected

Format the header section for output (multiple database blocks)

◆ getHeaders()

const std::vector< PEFFDatabaseMetadata > & getHeaders ( ) const

Returns the headers parsed during readStart().

Headers are available after calling readStart().

◆ isPEFFFile()

static bool isPEFFFile ( const String &  filename)
static

Checks if a file appears to be a PEFF file (by checking for # PEFF header).

Parameters
filename  The file to check
Returns
true if the file starts with PEFF headers
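
A check of this kind can be sketched in a few lines. The helper below is hypothetical and is not the OpenMS implementation: it works on any input stream, assumes that the marker must appear at the start of the first non-blank line, and tolerates Windows line endings.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not the OpenMS implementation: reports whether the
// first non-blank line of the stream begins with "# PEFF".
bool looksLikePEFF(std::istream& in)
{
  std::string line;
  while (std::getline(in, line))
  {
    // Tolerate Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) continue;               // skip leading blank lines
    return line.compare(0, 6, "# PEFF") == 0; // decide on the first real line
  }
  return false; // empty input
}
```

Because only the first non-blank line is inspected, the cost is independent of file size; a plain FASTA file, whose first line starts with '>', is rejected immediately.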

◆ load()

void load ( const String &  filename,
std::vector< PEFFEntry > &  entries,
std::vector< PEFFDatabaseMetadata > &  headers 
) const

Loads a PEFF file and stores entries and headers.

Parameters
filename  The PEFF file to load
entries   Output vector for PEFF entries
headers   Output vector for database metadata (one per database in file)
Exceptions
Exception::FileNotFound  is thrown if the file does not exist.
Exception::ParseError    is thrown if the file format is invalid.

◆ parseAnnotations_()

void parseAnnotations_ ( const String &  description,
PEFFEntry &  entry 
)
protected

Parse annotations from the description line.

◆ parseDisulfideBond_()

PEFFDisulfideBond parseDisulfideBond_ ( const String &  tuple)
protected

Parse a disulfide bond tuple.

◆ parseHeaderLine_()

void parseHeaderLine_ ( const String &  line,
PEFFDatabaseMetadata &  header,
bool &  new_db 
)
protected

Parse a header line (# Key=Value or # //)
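
The two line shapes named above, "# Key=Value" and "# //", can be told apart with a small classifier. This is a minimal sketch under assumptions: HeaderLine and classifyHeaderLine are hypothetical names, whitespace handling is a guess, and the real method additionally fills a PEFFDatabaseMetadata object and sets new_db, which is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for one header line.
struct HeaderLine
{
  enum class Kind { KeyValue, BlockSeparator, Other };
  Kind kind = Kind::Other;
  std::string key;
  std::string value;
};

// Classifies a line as "# Key=Value", "# //", or anything else.
HeaderLine classifyHeaderLine(const std::string& line)
{
  HeaderLine out;
  if (line.empty() || line[0] != '#') return out;

  // Strip the leading '#' and surrounding whitespace.
  const std::string ws = " \t\r";
  const std::size_t begin = line.find_first_not_of(ws, 1);
  if (begin == std::string::npos) return out;
  const std::size_t end = line.find_last_not_of(ws);
  const std::string body = line.substr(begin, end - begin + 1);

  if (body == "//")
  {
    out.kind = HeaderLine::Kind::BlockSeparator; // marks a block boundary
    return out;
  }
  const std::size_t eq = body.find('=');
  if (eq != std::string::npos && eq > 0)
  {
    out.kind = HeaderLine::Kind::KeyValue;
    out.key = body.substr(0, eq);
    out.value = body.substr(eq + 1); // everything after the first '='
  }
  return out;
}
```

Splitting at the first '=' only keeps values that themselves contain '=' intact. A version line such as "# PEFF 1.0" has no '=' and is reported as Other in this sketch.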

◆ parseModification_()

PEFFModification parseModification_ ( const String &  tuple)
protected

Parse a single modification tuple.

◆ parseParenList_()

std::vector< String > parseParenList_ ( const String &  value)
protected

Parse a parenthesized list of values.
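
As an illustration, the sketch below splits a value of the assumed form "(a|b)(c|d)" into its top-level items. splitParenList is a hypothetical helper, not the OpenMS method; keeping nested parentheses inside their enclosing item and ignoring text outside parentheses are design guesses rather than documented behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits "(a|b)(c|d)" into {"a|b", "c|d"}.
// Nested parentheses stay inside their enclosing item.
std::vector<std::string> splitParenList(const std::string& value)
{
  std::vector<std::string> items;
  std::string current;
  int depth = 0;
  for (char c : value)
  {
    if (c == '(')
    {
      if (depth > 0) current += c; // nested '(' is part of the item
      ++depth;
    }
    else if (c == ')')
    {
      if (depth == 0) continue;    // stray ')' outside any item: ignore
      --depth;
      if (depth == 0)
      {
        items.push_back(current);  // top-level item closed
        current.clear();
      }
      else
      {
        current += c;              // nested ')' is part of the item
      }
    }
    else if (depth > 0)
    {
      current += c;                // text outside parentheses is dropped
    }
  }
  return items;
}
```

Each returned item could then be split on '|' by a tuple parser such as parseModification_(). An unterminated item (a missing closing parenthesis) is silently dropped here; a stricter parser might raise an error instead.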

◆ parseProcessedRegion_()

PEFFProcessedRegion parseProcessedRegion_ ( const String &  tuple)
protected

Parse a processed region tuple.

◆ parseVariantComplex_()

PEFFVariantComplex parseVariantComplex_ ( const String &  tuple)
protected

Parse a complex variant tuple.

◆ parseVariantSimple_()

PEFFVariantSimple parseVariantSimple_ ( const String &  tuple)
protected

Parse a simple variant tuple.

◆ readEntry_()

bool readEntry_ ( std::string &  id,
std::string &  description,
std::string &  seq 
)
protected

Read entry data (identifier, description, sequence)

◆ readNext()

bool readNext ( PEFFEntry &  entry)

Reads the next PEFF entry from the file.

Parameters
entry  Output for the next entry
Returns
true if an entry was read, false if EOF was reached
Exceptions
Exception::ParseError  is thrown if parsing fails.

◆ readStart()

void readStart ( const String &  filename)

Prepares a PEFF file for streamed reading using readNext().

Parameters
filename  The PEFF file to read
Exceptions
Exception::FileNotFound     is thrown if the file does not exist.
Exception::FileNotReadable  is thrown if the file cannot be read.

◆ store() [1/2]

void store ( const String &  filename,
const std::vector< PEFFEntry > &  entries,
const PEFFDatabaseMetadata &  header 
) const

Stores entries to a PEFF file with the given header.

Parameters
filename  The output file path
entries   The entries to store
header    The database metadata header
Exceptions
Exception::UnableToCreateFile  is thrown if the file cannot be created.

◆ store() [2/2]

void store ( const String &  filename,
const std::vector< PEFFEntry > &  entries,
const std::vector< PEFFDatabaseMetadata > &  headers 
) const

Stores entries to a PEFF file with multiple database headers.

Writes a single file description block followed by one sequence database description block per provided header, then all entries.

Parameters
filename  The output file path
entries   The entries to store
headers   The database metadata headers (one per database in file)
Exceptions
Exception::UnableToCreateFile  is thrown if the file cannot be created.
Exception::InvalidParameter    is thrown if headers is empty.

◆ toProForma()

static String toProForma ( const PEFFEntry &  entry)
static

Converts a PEFF entry to ProForma notation.

Parameters
entry  The PEFF entry to convert
Returns
ProForma string representation
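
ProForma writes a modification in square brackets directly after the residue it applies to, for example PEPT[Phospho]IDE. The sketch below shows that placement rule only. annotateProForma is a hypothetical helper that takes a plain sequence and a map of 1-based positions to modification names; how toProForma() actually maps PEFF annotations (including variants or processed regions) is not described in this reference and is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: inserts "[name]" after each modified residue.
// Positions are assumed to be 1-based; positions outside the sequence
// are ignored.
std::string annotateProForma(const std::string& sequence,
                             const std::map<std::size_t, std::string>& mods)
{
  std::string out;
  for (std::size_t i = 0; i < sequence.size(); ++i)
  {
    out += sequence[i];
    const auto it = mods.find(i + 1); // convert 0-based index to 1-based position
    if (it != mods.end()) out += '[' + it->second + ']';
  }
  return out;
}
```

Using a map keyed by position limits this sketch to one modification per residue, which keeps the example short but is a simplification.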

◆ writeEnd()

void writeEnd ( )

Closes the output file (called automatically in destructor)

◆ writeNext()

void writeNext ( const PEFFEntry &  entry)

Writes the next PEFF entry to the file.

Parameters
entry  The entry to write

◆ writeStart() [1/2]

void writeStart ( const String &  filename,
const PEFFDatabaseMetadata &  header 
)

Prepares a PEFF file for streamed writing using writeNext().

Parameters
filename  The output file path
header    The database metadata header
Exceptions
Exception::UnableToCreateFile  is thrown if the file cannot be created.

◆ writeStart() [2/2]

void writeStart ( const String &  filename,
const std::vector< PEFFDatabaseMetadata > &  headers 
)

Prepares a PEFF file for streamed writing using writeNext(), with multiple headers.

Writes a single file description block followed by one database description block per header.

Parameters
filename  The output file path
headers   The database metadata headers (one per database in file)
Exceptions
Exception::UnableToCreateFile  is thrown if the file cannot be created.
Exception::InvalidParameter    is thrown if headers is empty.
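
The block layout described above (one file description block, then one database description block per header) can be sketched as follows. This is an illustration under assumptions, not the OpenMS output routine: DbHeader and formatHeaderBlocks are hypothetical, only two keys (DbName and Prefix) are written, the version string "1.0" is assumed, and std::invalid_argument stands in for Exception::InvalidParameter.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for PEFFDatabaseMetadata with two fields only.
struct DbHeader
{
  std::string name;
  std::string prefix;
};

// Builds one file description block, then one database block per header.
// Throws if no header is given, mirroring the empty-headers error case.
std::string formatHeaderBlocks(const std::vector<DbHeader>& headers)
{
  if (headers.empty())
  {
    throw std::invalid_argument("at least one header is required");
  }

  std::ostringstream out;
  out << "# PEFF 1.0\n";   // file description block (version assumed)
  for (const DbHeader& h : headers)
  {
    out << "# //\n";       // separator opens the next database block
    out << "# DbName=" << h.name << '\n';
    out << "# Prefix=" << h.prefix << '\n';
  }
  out << "# //\n";         // closes the header section
  return out.str();
}
```

Rejecting an empty header list up front means a partially written header section is never produced.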

Member Data Documentation

◆ description_

std::string description_
protected

Current description buffer.

◆ entries_read_

Size entries_read_ {0}
protected

Number of entries read.

◆ fileSize_

std::streampos fileSize_ {0}
protected

File size for progress.

◆ headers_

std::vector<PEFFDatabaseMetadata> headers_
protected

Parsed headers.

◆ id_

std::string id_
protected

Current identifier buffer.

◆ infile_

std::fstream infile_
protected

Input file stream.

◆ outfile_

std::ofstream outfile_
protected

Output file stream.

◆ seq_

std::string seq_
protected

Current sequence buffer.