This class reads and writes FASTA files. If the protein/gene sequence contains unusual symbols, such as the translation end marker (*), they are kept. The aggregate methods load() and store() read or write a whole set of protein sequences at the cost of memory.
#include <OpenMS/FORMAT/FASTAFile.h>
Classes | |
struct | FASTAEntry |
FASTA entry type (identifier, description and sequence). The first String corresponds to the identifier that is written after the > in the FASTA file. The part after the first whitespace is stored in description, and the text from the next line until the next > (exclusive) is stored in sequence. | |
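The header rule described for FASTAEntry (identifier up to the first whitespace, description after it) can be illustrated with a small standalone sketch. It uses only the C++ standard library; `HeaderParts` and `splitHeader` are hypothetical names introduced for illustration and are not part of OpenMS, and whether the library trims further whitespace from the description is not stated here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the header part of a FASTA entry.
// Only the field names "identifier" and "description" come from the text above.
struct HeaderParts
{
  std::string identifier;
  std::string description;
};

// Splits a FASTA header line (with or without the leading '>')
// at the first space or tab.
HeaderParts splitHeader(const std::string& line)
{
  const std::size_t start = (!line.empty() && line[0] == '>') ? 1 : 0;
  const std::size_t ws = line.find_first_of(" \t", start);

  HeaderParts parts;
  if (ws == std::string::npos)
  {
    // No whitespace: the whole header is the identifier.
    parts.identifier = line.substr(start);
  }
  else
  {
    parts.identifier = line.substr(start, ws - start);
    parts.description = line.substr(ws + 1);
  }
  return parts;
}
```

For a header such as `>P1 test protein`, this sketch yields the identifier `P1` and the description `test protein`.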
Public Member Functions | |
FASTAFile ()=default | |
Default constructor. | |
~FASTAFile () override=default | |
Destructor. | |
void | readStart (const String &filename) |
Prepares a FASTA file given by 'filename' for streamed reading using readNext(). | |
void | readStartWithProgress (const String &filename, const String &progress_label) |
Same as readStart(), but does internal progress logging whenever readNextWithProgress() is called. | |
bool | readNext (FASTAEntry &protein) |
Reads the next FASTA entry from file. If you want to read all entries in one go, use load(). | |
bool | readNextWithProgress (FASTAEntry &protein) |
Same as readNext(), but does internal progress logging; use readStartWithProgress() to enable this. | |
std::streampos | position () |
Current stream position when reading a file. | |
bool | atEnd () |
Is the stream at EOF? | |
bool | setPosition (const std::streampos &pos) |
Seeks the stream to pos. | |
void | writeStart (const String &filename) |
Prepares a FASTA file given by 'filename' for streamed writing using writeNext(). | |
void | writeNext (const FASTAEntry &protein) |
Stores the data given by protein. Call writeStart() once before calling writeNext(). Call writeEnd() when done to close the file! | |
void | writeEnd () |
Closes the file (flush). Called implicitly when the FASTAFile object goes out of scope. | |
void | load (const String &filename, std::vector< FASTAEntry > &data) const |
Loads the FASTA file given by 'filename' and stores its entries in 'data'. This uses more RAM than readStart() and readNext(). | |
void | store (const String &filename, const std::vector< FASTAEntry > &data) const |
Stores the entries given by 'data' in the file 'filename'. | |
Public Member Functions inherited from ProgressLogger | |
ProgressLogger () | |
Constructor. | |
virtual | ~ProgressLogger () |
Destructor. | |
ProgressLogger (const ProgressLogger &other) | |
Copy constructor. | |
ProgressLogger & | operator= (const ProgressLogger &other) |
Assignment Operator. | |
void | setLogType (LogType type) const |
Sets the progress log that should be used. The default type is NONE! | |
LogType | getLogType () const |
Returns the type of progress log being used. | |
void | setLogger (ProgressLoggerImpl *logger) |
Sets the logger to be used for progress logging. | |
void | startProgress (SignedSize begin, SignedSize end, const String &label) const |
Initializes the progress display. | |
void | setProgress (SignedSize value) const |
Sets the current progress. | |
void | endProgress (UInt64 bytes_processed=0) const |
void | nextProgress () const |
increment progress by 1 (according to range begin-end) | |
Protected Member Functions | |
bool | readEntry_ (std::string &id, std::string &description, std::string &seq) |
Reads a protein entry from the current file position and returns the ID and sequence. | |
Protected Attributes | |
std::fstream | infile_ |
Filestream for reading; initialized by FASTAFile::readStart(). | |
std::ofstream | outfile_ |
Filestream for writing; initialized by FASTAFile::writeStart(). | |
Size | entries_read_ {0} |
Internal bookkeeping during reading. | |
std::streampos | fileSize_ {} |
Total number of characters in the filestream. | |
std::string | seq_ |
Sequence of the protein currently being read. | |
std::string | id_ |
Identifier of the protein currently being read. | |
std::string | description_ |
Description of the protein currently being read. | |
Protected Attributes inherited from ProgressLogger | |
LogType | type_ |
time_t | last_invoke_ |
ProgressLoggerImpl * | current_logger_ |
Additional Inherited Members | |
Public Types inherited from ProgressLogger | |
enum | LogType { CMD , GUI , NONE } |
Possible log types. | |
Static Protected Attributes inherited from ProgressLogger | |
static int | recursion_depth_ |
This class reads and writes FASTA files. If a protein or gene sequence contains unusual symbols, such as the translation end marker (*), they are kept. Use the aggregate methods load() and store() to read or write a whole set of protein sequences at once, at the cost of memory.
Alternatively, read and write single entries with readStart(), readNext() and writeStart(), writeNext(), writeEnd() for better memory efficiency. A single FASTAFile instance can read from one FASTA file while writing to another.
FASTAFile | ( | ) | default |
Default constructor.
~FASTAFile | ( | ) | override default |
Destructor.
bool atEnd | ( | ) |
is stream at EOF?
void load | ( | const String & | filename, |
std::vector< FASTAEntry > & | data | ||
) | const |
Loads the FASTA file given by 'filename' and stores its entries in 'data'. This uses more RAM than readStart() and readNext().
Exception::FileNotFound | is thrown if the file does not exist. |
Exception::ParseError | is thrown if the file does not conform to the standard. |
Referenced by NucleicAcidSearchEngine::main_().
std::streampos position | ( | ) |
current stream position when reading a file
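bool readEntry_ | ( | std::string & | id, |
std::string & | description, | ||
std::string & | seq | ||
) | protected |
Reads a protein entry from the current file position and returns the ID and sequence.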
bool readNext | ( | FASTAEntry & | protein | ) |
Reads the next FASTA entry from file. If you want to read all entries in one go, use load().
Exception::FileNotFound | is thrown if the file does not exist. |
Exception::ParseError | is thrown if the file does not conform to the standard. |
bool readNextWithProgress | ( | FASTAEntry & | protein | ) |
Same as readNext(), but does internal progress logging; use readStartWithProgress() to enable this. Calls endProgress() when EOF is reached (i.e. when returning false).
void readStart | ( | const String & | filename | ) |
Prepares a FASTA file given by 'filename' for streamed reading using readNext().
Exception::FileNotFound | is thrown if the file does not exist. |
Exception::ParseError | is thrown if the file does not conform to the standard. |
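void readStartWithProgress | ( | const String & | filename, |
const String & | progress_label | ||
) |
Same as readStart(), but does internal progress logging whenever readNextWithProgress() is called.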
bool setPosition | ( | const std::streampos & | pos | ) |
seek stream to pos
void store | ( | const String & | filename, |
const std::vector< FASTAEntry > & | data | ||
) | const |
stores the data given by 'data' at the file 'filename'
This uses more RAM than writeStart() and writeNext().
Exception::UnableToCreateFile | is thrown if the process is not able to write the file. |
void writeEnd | ( | ) |
Closes the file (flush). Called implicitly when FASTAFile object goes out of scope.
void writeNext | ( | const FASTAEntry & | protein | ) |
Stores the data given by protein. Call writeStart() once before calling writeNext(). Call writeEnd() when done to close the file!
Exception::UnableToCreateFile | is thrown if the process is not able to write the file. |
void writeStart | ( | const String & | filename | ) |
Prepares a FASTA file given by 'filename' for streamed writing using writeNext().
Exception::UnableToCreateFile | is thrown if the process is not able to write to the file (disk full?). |
std::string description_ | protected |
description of currently read protein
Size entries_read_ {0} | protected |
some internal book-keeping during reading
std::streampos fileSize_ {} | protected |
total number of characters of filestream
std::string id_ | protected |
identifier of currently read protein
std::fstream infile_ | protected |
filestream for reading; init using FASTAFile::readStart()
std::ofstream outfile_ | protected |
filestream for writing; init using FASTAFile::writeStart()
std::string seq_ | protected |
sequence of currently read protein