FASTAContainer<TFI_File> makes FASTA entries available chunk-wise, from start to end, by loading them from a FASTA file. This avoids having to load the full file into memory. While loading, the container memorizes the file offset of each entry, so that an arbitrary i-th entry can be read again from disk. If possible, only entries from the currently cached chunk should be queried; otherwise access will be slow.
#include <OpenMS/DATASTRUCTURES/FASTAContainer.h>
template<>
class OpenMS::FASTAContainer< TFI_File >
Internally uses the FASTAFile class to read single sequences.
◆ FASTAContainer() [1/2]
◆ FASTAContainer() [2/2]
C'tor with FASTA filename.
◆ activateCache()
Swaps in the background cache of entries, read previously via cacheChunk().
If you call this function without a prior call to cacheChunk(), the cache will be empty.
- Returns
- true if cache contains data; false if empty
- Note
- Should be invoked by a single thread, followed by a barrier to sync access of subsequent calls to chunkAt()
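The note above can be illustrated with a minimal sketch, assuming OpenMP is the threading model (the text above does not say so). DoubleBuffer and totalLength are hypothetical stand-ins, not OpenMS code: one thread performs the swap inside a single construct, whose implicit barrier keeps the other threads from reading until the swap is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the foreground/background buffers (not OpenMS code).
struct DoubleBuffer
{
  std::vector<std::string> fg; // active data, read by many threads
  std::vector<std::string> bg; // prefetched data

  // Mimics activateCache(): swap the prefetched data in.
  bool activate()
  {
    fg.swap(bg);
    bg.clear();
    return !fg.empty();
  }

  // Mimics chunkAt(): read-only access to the active data.
  const std::string& at(std::size_t pos) const
  {
    assert(pos < fg.size());
    return fg[pos];
  }
};

// Sum the lengths of all sequences in the next chunk.
// Without OpenMP enabled the pragmas are ignored and the code runs serially.
std::size_t totalLength(DoubleBuffer& buf)
{
  std::size_t total = 0;
  #pragma omp parallel
  {
    // Exactly one thread swaps; 'single' ends with an implicit barrier,
    // so no thread reads 'fg' before the swap has finished.
    #pragma omp single
    {
      buf.activate();
    }

    // All threads may now read the active data concurrently.
    #pragma omp for reduction(+:total)
    for (long i = 0; i < static_cast<long>(buf.fg.size()); ++i)
    {
      total += buf.at(static_cast<std::size_t>(i)).size();
    }
  }
  return total;
}
```

The implicit barrier at the end of the worksharing loop also guarantees that all readers have finished before a later swap could take place.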
◆ cacheChunk()
bool cacheChunk(int suggested_size) [inline]
Prefetch a new cache in the background, with up to suggested_size entries (or fewer upon reaching EOF).
Call activateCache() afterwards to make the data available via chunkAt() or readAt().
- Parameters
- suggested_size: Number of FASTA entries to read from disk
- Returns
- true if new data is available; false if background data is empty
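The prefetch-then-activate cycle can be sketched as follows. MiniChunkReader, Entry and collectHeaders are hypothetical stand-ins that only mimic the documented contract of cacheChunk(), activateCache(), chunkSize() and chunkAt(); this is not the OpenMS implementation, and it reads from any std::istream rather than through FASTAFile.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical entry type (not the OpenMS one).
struct Entry
{
  std::string header;
  std::string sequence;
};

// Hypothetical stand-in that mimics the documented chunk interface.
class MiniChunkReader
{
public:
  explicit MiniChunkReader(std::istream& in) : in_(in), chunk_offset_(0) {}

  // Fill the background buffer with up to suggested_size entries.
  bool cacheChunk(int suggested_size)
  {
    data_bg_.clear();
    Entry e;
    while (static_cast<int>(data_bg_.size()) < suggested_size && readEntry(e))
    {
      data_bg_.push_back(e);
    }
    return !data_bg_.empty();
  }

  // Swap the background buffer in; the old foreground entries count as swapped out.
  bool activateCache()
  {
    chunk_offset_ += data_fg_.size();
    data_fg_.swap(data_bg_);
    data_bg_.clear();
    return !data_fg_.empty();
  }

  std::size_t chunkSize() const { return data_fg_.size(); }
  std::size_t getChunkOffset() const { return chunk_offset_; }

  const Entry& chunkAt(std::size_t pos) const
  {
    assert(pos < data_fg_.size()); // pos must be smaller than chunkSize()
    return data_fg_[pos];
  }

private:
  // Read one '>' header plus its sequence lines; false at end of input.
  bool readEntry(Entry& e)
  {
    std::string line;
    while (pending_header_.empty() && std::getline(in_, line))
    {
      if (!line.empty() && line[0] == '>') pending_header_ = line;
    }
    if (pending_header_.empty()) return false;
    e.header = pending_header_.substr(1);
    e.sequence.clear();
    pending_header_.clear();
    while (std::getline(in_, line))
    {
      if (!line.empty() && line[0] == '>') { pending_header_ = line; break; }
      e.sequence += line;
    }
    return true;
  }

  std::istream& in_;
  std::string pending_header_;  // header line already consumed for the next entry
  std::vector<Entry> data_fg_;  // active chunk
  std::vector<Entry> data_bg_;  // prefetched chunk
  std::size_t chunk_offset_;    // entries before the active chunk
};

// Walk the input chunk by chunk and collect all entry headers.
std::vector<std::string> collectHeaders(std::istream& in, int chunk_size)
{
  MiniChunkReader reader(in);
  std::vector<std::string> headers;
  reader.cacheChunk(chunk_size);    // initial prefetch
  while (reader.activateCache())    // make the prefetched chunk active
  {
    reader.cacheChunk(chunk_size);  // prefetch the next chunk
    for (std::size_t i = 0; i < reader.chunkSize(); ++i)
    {
      headers.push_back(reader.chunkAt(i).header);
    }
  }
  return headers;
}
```

In this sketch the prefetch runs sequentially before the active chunk is processed; in a threaded setting it could overlap with the processing.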
◆ chunkAt()
Retrieve a FASTA entry at cache position pos (fast).
Requires prior call to activateCache(). Index pos must be smaller than chunkSize().
- Note
- Can be used by multiple threads at a time (until activateCache() is called)
◆ chunkSize()
size_t chunkSize() const [inline]
number of entries in active cache
◆ empty()
◆ getChunkOffset()
size_t getChunkOffset() const [inline]
how many entries were read and got swapped out already
◆ readAt()
Retrieve a FASTA entry at global position pos (must not lie beyond the currently active chunk, but may precede it).
This query is fast if pos hits the currently active chunk, and slow (read from disk) for earlier entries. It can be used before reaching the end of the file, since it resets the file position after it is done reading (if reading from disk is required), but it must not be used for entries beyond the active chunk (unseen data).
- Parameters
- protein: Return value
- pos: Absolute entry number in FASTA file
- Returns
- true if reading was successful; false otherwise (e.g. EOF)
- Exceptions
-
- Note
- Not thread-safe (use chunkAt() instead)!
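The three cases described for a global position can be sketched as below. Lookup, classify and localIndex are hypothetical helper names for illustration; the arithmetic assumes only what is stated above, namely that getChunkOffset() entries precede the active chunk and that the chunk holds chunkSize() entries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of a global entry position (not OpenMS code).
enum class Lookup
{
  FromChunk, // inside the active chunk: fast
  FromDisk,  // before the active chunk: slow, re-read from disk
  Unseen     // beyond the active chunk: must not be requested
};

Lookup classify(std::size_t pos, std::size_t chunk_offset, std::size_t chunk_size)
{
  if (pos < chunk_offset) return Lookup::FromDisk;
  if (pos - chunk_offset < chunk_size) return Lookup::FromChunk;
  return Lookup::Unseen;
}

// Convert a global position inside the active chunk into a chunk-local index.
std::size_t localIndex(std::size_t pos, std::size_t chunk_offset)
{
  assert(pos >= chunk_offset);
  return pos - chunk_offset;
}
```

For example, with 10 entries already swapped out and an active chunk of 5 entries, position 12 maps to local index 2, position 3 requires a disk read, and position 15 is not yet available.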
◆ reset()
resets reading of the FASTA file, enables fresh reading of the FASTA from the beginning
◆ size()
NOT the number of entries in the FASTA file, but merely the number of already read entries (since we do not know how many are still to come)
- Note
- Data in the background cache is included here, i.e. access to size()-1 using readAt() might be slow if activateCache() was not called yet.
◆ chunk_offset_
number of entries before the current chunk
◆ data_bg_
prefetched (background) data; will become the next active data
◆ data_fg_
◆ f_
◆ offsets_
std::vector<std::streampos> offsets_ [private]
internal byte offsets into FASTA file for random access reading of previous entries.
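How stored byte offsets allow earlier entries to be re-read can be sketched with standard streams. indexEntries and headerAt are hypothetical helpers, not part of OpenMS; the sketch indexes the whole input up front for brevity, whereas the container described here records offsets while loading. Restoring the previous stream position afterwards mirrors the behaviour documented for readAt().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Record the byte offset of every header line (starting with '>').
std::vector<std::streampos> indexEntries(std::istream& in)
{
  std::vector<std::streampos> offsets;
  std::string line;
  in.clear();
  in.seekg(0);
  while (true)
  {
    const std::streampos start = in.tellg(); // position before reading the line
    if (!std::getline(in, line)) break;
    if (!line.empty() && line[0] == '>') offsets.push_back(start);
  }
  in.clear(); // reading hit EOF; clear the state so later seeks work
  return offsets;
}

// Re-read the header of entry i, then restore the previous stream position.
std::string headerAt(std::istream& in,
                     const std::vector<std::streampos>& offsets,
                     std::size_t i)
{
  assert(i < offsets.size());
  in.clear();
  const std::streampos saved = in.tellg(); // remember where we were
  in.seekg(offsets[i]);                    // jump to the stored offset
  std::string line;
  std::getline(in, line);
  in.clear();
  in.seekg(saved);                         // reset the position after reading
  return line.substr(1);                   // drop the leading '>'
}
```

Storing one std::streampos per entry keeps the index small compared with the sequence data, at the cost of a seek and a fresh read for every entry outside the active chunk.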