OpenMS
FASTAFile.h
// --------------------------------------------------------------------------
// OpenMS -- Open-Source Mass Spectrometry
// --------------------------------------------------------------------------
// Copyright The OpenMS Team -- Eberhard Karls University Tuebingen,
// ETH Zurich, and Freie Universitaet Berlin 2002-2023.
//
// This software is released under a three-clause BSD license:
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// * Neither the name of any author or any participating institution
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
// For a full list of authors, refer to the file AUTHORS.
// --------------------------------------------------------------------------
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL ANY OF THE AUTHORS OR THE CONTRIBUTING
// INSTITUTIONS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
// OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
// OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// --------------------------------------------------------------------------
// $Maintainer: Chris Bielow $
// $Authors: Chris Bielow, Nora Wild $
// --------------------------------------------------------------------------

#pragma once

#include <OpenMS/CONCEPT/ProgressLogger.h>
#include <OpenMS/DATASTRUCTURES/String.h>

#include <fstream>
#include <utility>
#include <vector>

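namespace OpenMS
{
  /// This class serves for reading in and writing FASTA files.
  /// If the protein/gene sequence contains unusua...
  class OPENMS_DLLAPI FASTAFile : public ProgressLogger
  {
  public:
    /// FASTA entry type (identifier, description and sequence).
    /// The first String corresponds to the identifie...
    struct FASTAEntry
    {
      String identifier;
      String description;
      String sequence;

      FASTAEntry() = default;

      FASTAEntry(const String& id, const String& desc, const String& seq) :
        identifier(id),
        description(desc),
        sequence(seq)
      {
      }

      FASTAEntry(const FASTAEntry& rhs) = default;

      FASTAEntry(FASTAEntry&& rhs) noexcept :
        identifier(::std::move(rhs.identifier)),
        description(::std::move(rhs.description)),
        sequence(::std::move(rhs.sequence))
      {
      }

      FASTAEntry& operator=(const FASTAEntry& rhs) = default;

      bool operator==(const FASTAEntry& rhs) const
      {
        return identifier == rhs.identifier
               && description == rhs.description
               && sequence == rhs.sequence;
      }

      bool headerMatches(const FASTAEntry& rhs) const
      {
        return identifier == rhs.identifier &&
               description == rhs.description;
      }

      bool sequenceMatches(const FASTAEntry& rhs) const
      {
        return sequence == rhs.sequence;
      }
    };

    /// Default constructor.
    FASTAFile() = default;

    /// Destructor.
    ~FASTAFile() override = default;

    /// Prepares a FASTA file given by 'filename' for streamed reading using readNext().
    void readStart(const String& filename);

    /// Reads the next FASTA entry from file. If you want to read all entries in one go, use load().
    bool readNext(FASTAEntry& protein);

    /// Current stream position.
    std::streampos position();

    /// Is the stream at EOF?
    bool atEnd();

    /// Seeks the stream to 'pos'.
    bool setPosition(const std::streampos& pos);

    /// Prepares a FASTA file given by 'filename' for streamed writing using writeNext().
    void writeStart(const String& filename);

    /// Stores the data given by 'protein'. Call writeStart() once before calling writeNext()...
    void writeNext(const FASTAEntry& protein);

    /// Closes the file (flush). Called implicitly when the FASTAFile object goes out of scope.
    void writeEnd();


    /// Loads a FASTA file given by 'filename' and stores the information in 'data'.
    /// This uses more RAM than r...
    void load(const String& filename, std::vector<FASTAEntry>& data) const;

    /// Stores the data given by 'data' in the file 'filename'.
    void store(const String& filename, const std::vector<FASTAEntry>& data) const;

  protected:
    /// Reads a protein entry from the current file position and returns its ID, description and sequence.
    bool readEntry_(std::string& id, std::string& description, std::string& seq);

    std::fstream infile_;      ///< file stream for reading; initialized by FASTAFile::readStart()
    std::ofstream outfile_;    ///< file stream for writing; initialized by FASTAFile::writeStart()
    Size entries_read_{0};
    std::streampos fileSize_{};
    std::string seq_;          ///< sequence of the currently read protein
    std::string id_;           ///< identifier of the currently read protein
    std::string description_;  ///< description of the currently read protein
  };

} // namespace OpenMS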
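The three comparison members of `FASTAEntry` look at different fields: `operator==` compares identifier, description and sequence, `headerMatches()` ignores the sequence, and `sequenceMatches()` ignores the header. The sketch below re-creates that logic in a stand-alone struct so it compiles without OpenMS; the name `Entry` is hypothetical and `std::string` stands in for `OpenMS::String`. A typical use is spotting the same sequence stored under two different accessions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for OpenMS::FASTAFile::FASTAEntry; std::string replaces OpenMS::String.
struct Entry
{
  std::string identifier;
  std::string description;
  std::string sequence;

  // All three fields must agree.
  bool operator==(const Entry& rhs) const
  {
    return identifier == rhs.identifier && description == rhs.description && sequence == rhs.sequence;
  }

  // Header only: identifier and description, sequence ignored.
  bool headerMatches(const Entry& rhs) const
  {
    return identifier == rhs.identifier && description == rhs.description;
  }

  // Sequence only: header ignored.
  bool sequenceMatches(const Entry& rhs) const
  {
    return sequence == rhs.sequence;
  }
};
```

For example, `Entry{"P1", "first", "PEPTIDE"}` and `Entry{"P2", "second", "PEPTIDE"}` satisfy `sequenceMatches()` but neither `headerMatches()` nor `operator==`.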
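`FASTAEntry` declares its move constructor `noexcept`. This matters for `std::vector<FASTAEntry>`, the container used by `load()` and `store()`: on reallocation the standard library moves existing elements only if their move constructor cannot throw (or they are not copyable), otherwise it copies them to keep the strong exception guarantee. A related C++ rule: because the copy constructor, copy assignment and move constructor are all user-declared, no move assignment operator is generated, so `a = std::move(b)` on a `FASTAEntry` uses the defaulted copy assignment. The sketch below demonstrates both effects with a hypothetical `Counted` type that mirrors those special members and counts copies (C++17 assumed for `static inline`).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type with the same set of user-declared special members as FASTAEntry:
// copy constructor, copy assignment, noexcept move constructor, and no move assignment.
struct Counted
{
  static inline int copies = 0;  // copy constructions plus copy assignments so far
  static inline int moves = 0;   // move constructions so far

  Counted() = default;
  Counted(const Counted&) { ++copies; }
  Counted(Counted&&) noexcept { ++moves; }
  Counted& operator=(const Counted&)
  {
    ++copies;
    return *this;
  }
  // No operator=(Counted&&) is implicitly declared here, so assigning from an
  // rvalue binds to the copy assignment above.
};
```

Growing a `std::vector<Counted>` with `emplace_back()` relocates elements through the `noexcept` move constructor and never copies; a subsequent `a = std::move(b)` increments the copy counter.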
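`readEntry_()` fills separate identifier and description strings from a single header line. How OpenMS itself splits the header is not shown in this file; the sketch below assumes the common FASTA convention that the identifier runs up to the first space or tab and the description is the remainder. `splitHeader` is a hypothetical helper, not an OpenMS function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits a FASTA header line into (identifier, description).
// Assumption: the identifier ends at the first space or tab; this follows a common
// FASTA convention and is not taken from the OpenMS sources.
std::pair<std::string, std::string> splitHeader(const std::string& line)
{
  // Drop the leading '>' and a trailing '\r' left by Windows line endings.
  std::string header = (!line.empty() && line[0] == '>') ? line.substr(1) : line;
  if (!header.empty() && header.back() == '\r') header.pop_back();

  const std::string::size_type ws = header.find_first_of(" \t");
  if (ws == std::string::npos)
  {
    return {header, ""};  // identifier only, no description
  }
  // Skip any run of separators between identifier and description.
  const std::string::size_type desc_start = header.find_first_not_of(" \t", ws);
  const std::string description = (desc_start == std::string::npos) ? "" : header.substr(desc_start);
  return {header.substr(0, ws), description};
}
```

Under this assumption, `>sp|P12345|TEST_HUMAN Example protein` yields the identifier `sp|P12345|TEST_HUMAN` and the description `Example protein`.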
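`readStart()` and `readNext()` let a caller process one entry at a time instead of holding the whole file in memory as `load()` does. The following is a simplified sketch of such a streaming reader over any `std::istream`. `StreamingFastaReader` and `Record` are hypothetical names, and the parsing rules (a header starts with `>`, the identifier ends at the first space, sequence lines are concatenated, blank lines are skipped, a trailing `\r` is removed) are assumptions of the sketch rather than a description of the OpenMS implementation.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for FASTAFile::FASTAEntry (std::string instead of OpenMS::String).
struct Record
{
  std::string identifier;
  std::string description;
  std::string sequence;
};

// Streaming reader in the spirit of readStart()/readNext(): each call yields one entry,
// so memory use is bounded by the largest entry rather than by the file size.
class StreamingFastaReader
{
public:
  explicit StreamingFastaReader(std::istream& in) : in_(in) {}

  // Fills `rec` with the next entry; returns false when no further entry exists.
  bool readNext(Record& rec)
  {
    std::string line;
    // Locate the next header unless the previous call already read it ahead.
    while (pending_header_.empty() && nextLine(line))
    {
      if (!line.empty() && line[0] == '>') pending_header_ = line;
    }
    if (pending_header_.empty()) return false;

    const std::string header = pending_header_.substr(1);  // drop the leading '>'
    pending_header_.clear();
    const std::string::size_type space = header.find(' ');
    rec.identifier = header.substr(0, space);
    rec.description = (space == std::string::npos) ? "" : header.substr(space + 1);
    rec.sequence.clear();

    // Concatenate sequence lines until the next header or the end of the stream.
    while (nextLine(line))
    {
      if (line.empty()) continue;
      if (line[0] == '>')
      {
        pending_header_ = line;  // keep it for the next call
        break;
      }
      rec.sequence += line;
    }
    return true;
  }

private:
  // Reads one line and strips a trailing '\r' left by Windows line endings.
  bool nextLine(std::string& line)
  {
    if (!std::getline(in_, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
  }

  std::istream& in_;
  std::string pending_header_;  // header read ahead while collecting the previous sequence
};
```

With a file stream the loop looks the same, e.g. `std::ifstream file("proteins.fasta"); StreamingFastaReader reader(file); Record r; while (reader.readNext(r)) { /* process r */ }` (the file name is hypothetical).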
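`position()` and `setPosition()` expose the underlying stream offset, which makes it possible to remember where an entry starts and jump back to it later. A sketch of that idea with plain `tellg()`/`seekg()`: it records the offset of every header line and then re-reads a header by seeking to its offset. `indexHeaders` and `lineAt` are hypothetical helpers; offsets are only meaningful for the stream they were taken from.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: records the stream offset of every header line ('>'),
// analogous to calling position() before reading each entry.
std::vector<std::streampos> indexHeaders(std::istream& in)
{
  std::vector<std::streampos> offsets;
  std::string line;
  std::streampos pos = in.tellg();
  while (std::getline(in, line))
  {
    if (!line.empty() && line[0] == '>') offsets.push_back(pos);
    pos = in.tellg();  // start of the next line
  }
  in.clear();  // reset eof/fail bits so the stream can be repositioned
  return offsets;
}

// Hypothetical helper: seeks to a recorded offset (as setPosition() would) and
// returns the line found there.
std::string lineAt(std::istream& in, std::streampos pos)
{
  in.clear();
  in.seekg(pos);
  std::string line;
  std::getline(in, line);
  return line;
}
```

For the input `">A one\nAAA\n>B two\nCCC\n"` the recorded offsets are 0 and 11, and seeking to offset 11 yields the line `>B two`.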
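`writeStart()`, `writeNext()` and `writeEnd()` form the streaming counterpart of `store()`. The sketch below writes entries to any `std::ostream`; `writeRecord`, `writeAll` and `Record` are hypothetical names, and both the header layout (`>identifier description`) and the wrapping width are assumptions of the sketch, not documented OpenMS output settings.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for FASTAFile::FASTAEntry (std::string instead of OpenMS::String).
struct Record
{
  std::string identifier;
  std::string description;
  std::string sequence;
};

// Writes one entry, in the spirit of writeNext(). Assumed layout: ">identifier description",
// then the sequence wrapped at `line_width` residues (0 means no wrapping).
void writeRecord(std::ostream& out, const Record& rec, std::size_t line_width = 80)
{
  out << '>' << rec.identifier;
  if (!rec.description.empty()) out << ' ' << rec.description;
  out << '\n';
  const std::size_t width = (line_width == 0) ? rec.sequence.size() : line_width;
  for (std::size_t i = 0; i < rec.sequence.size(); i += width)
  {
    out << rec.sequence.substr(i, width) << '\n';
  }
}

// Bulk variant in the spirit of store(): all entries are already in memory.
void writeAll(std::ostream& out, const std::vector<Record>& records, std::size_t line_width = 80)
{
  for (const Record& rec : records) writeRecord(out, rec, line_width);
  out.flush();  // comparable to writeEnd(), which flushes (and closes) the output file
}
```

Writing the entries one by one through `writeRecord` keeps only the current entry in memory, which is the point of the `writeStart()`/`writeNext()` pair; `writeAll` trades that for the convenience of a single call, as `store()` does.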