OpenMS
Builds, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from those files. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The FI has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides. More...
#include <OpenMS/ANALYSIS/ID/FragmentIndex.h>
Classes | |
| struct | Fragment |
| One entry in the fragment index. More... | |
| struct | Hit |
| struct | IonOffsets |
| Precomputed ion-type mass offsets (from Residue::getInternalTo*Ion formulas) More... | |
| struct | ModSlot |
| A candidate modification slot for a specific peptide. More... | |
| struct | Peptide |
| Compact descriptor of a peptide instance held by the FragmentIndex. More... | |
| struct | SpectrumMatch |
| Match between a query peak and an entry in the DB. More... | |
| struct | SpectrumMatchesTopN |
| Container for SpectrumMatch. Also keeps count of the total number of candidates and the total number of matches. More... | |
| struct | VarModEntry |
| Entry in the per-AA variable modification lookup table. More... | |
Public Member Functions | |
| FragmentIndex () | |
| Default constructor. | |
| ~FragmentIndex () override=default | |
| Default destructor. | |
| bool | isBuild () const |
| Indicates whether the fragment index has been built. | |
| const std::vector< Peptide > & | getPeptides () const |
| Returns a reference to the internal peptide container. | |
| void | build (const std::vector< FASTAFile::FASTAEntry > &fasta_entries) |
| Given a set of FASTA entries, builds the fragment index data structure (FID). First, all fragments are sorted by their own mass. Next, they are placed in buckets and the minimum fragment mass of each bucket is stored. Finally, the fragments within each bucket are sorted by their originating precursor mass. | |
| void | clear () |
| Deletes the fragment index and sets is_build_ to false. | |
| std::pair< size_t, size_t > | getPeptidesInMassWindow (float precursor_mass, const std::pair< float, float > &window) const |
| std::vector< Hit > | query (const Peak1D &peak, const std::pair< size_t, size_t > &peptide_idx_range, uint16_t peak_charge) |
| Queries one peak. | |
| void | querySpectrum (const MSSpectrum &spectrum, SpectrumMatchesTopN &sms) |
| Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and iterating up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass. | |
| AASequence | reconstructModifiedSequence (const Peptide &peptide, const std::vector< FASTAFile::FASTAEntry > &fasta_entries) const |
| Reconstruct a fully modified AASequence from a Peptide's bitmask. | |
Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const String &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () |
| Destructor. | |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const |
| Equality operator. | |
| void | setParameters (const Param &param) | |
| Sets the parameters. | |
| const Param & | getParameters () const |
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const |
| Non-mutable access to the default parameters. | |
| const String & | getName () const |
| Non-mutable access to the name. | |
| void | setName (const String &name) |
| Mutable access to the name. | |
| const std::vector< String > & | getSubsections () const |
| Non-mutable access to the registered subsections. | |
Static Public Member Functions | |
| static bool | isOpenSearchMode (double lower_magnitude, double upper_magnitude, bool unit_ppm) noexcept |
Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="") |
| Writes all parameters to meta values. | |
Protected Member Functions | |
| void | updateMembers_ () override |
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | generatePeptides (const std::vector< FASTAFile::FASTAEntry > &fasta_entries) |
| Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped; if it is set to true, the digestion enzyme can be set in the parameters. Additionally applies fixed and variable modifications for a restrictive PSM search. | |
| void | initModificationTables_ () |
| size_t | buildModSlots_ (const char *sequence, size_t seq_len, ModSlot *out_slots, bool is_protein_nterm=false, bool is_protein_cterm=false) const |
| void | generateFragmentsLightweight_ (std::vector< Fragment > &fragments, const char *sequence, size_t seq_len, UInt32 peptide_idx, double n_term_mod_mass, double c_term_mod_mass, const double *residue_mod_masses) const |
Protected Member Functions inherited from DefaultParamHandler | |
| void | defaultsToParam_ () |
| Updates the parameters after the defaults have been set in the constructor. | |
Static Protected Member Functions | |
| static void | initResidueMassTable_ () |
Protected Attributes | |
| bool | is_build_ {false} |
| true, if the database has been populated with fragments | |
| std::array< double, 128 > | fixed_mod_deltas_ {} |
| Per-AA fixed modification delta mass (0.0 if no fixed mod applies) | |
| std::array< const ResidueModification *, 128 > | fixed_mod_ptrs_ {} |
| Per-AA fixed modification pointer (nullptr if none) | |
| double | fixed_nterm_delta_ {0.0} |
| Fixed N-terminal mod delta (0 if none) | |
| double | fixed_cterm_delta_ {0.0} |
| Fixed C-terminal mod delta (0 if none) | |
| const ResidueModification * | fixed_nterm_mod_ptr_ {nullptr} |
| const ResidueModification * | fixed_cterm_mod_ptr_ {nullptr} |
| std::array< std::vector< VarModEntry >, 128 > | variable_mod_table_ {} |
| Per-AA variable modification table: for each ASCII char, list of possible variable mods. | |
| std::vector< VarModEntry > | variable_nterm_mods_ |
| Pure N-terminal variable mods (not residue-specific) | |
| std::vector< VarModEntry > | variable_cterm_mods_ |
| Pure C-terminal variable mods (not residue-specific) | |
| bool | mod_tables_initialized_ {false} |
| std::vector< Peptide > | fi_peptides_ |
| vector of all (digested) peptides | |
| std::vector< Fragment > | fi_fragments_ |
| vector of all theoretical fragments (b- and y- ions) | |
| float | fragment_min_mz_ |
| smallest fragment mz | |
| float | fragment_max_mz_ |
| largest fragment mz | |
| size_t | min_ion_index_ {0} |
| skip ions below this index (0=all, 2=skip b1/b2/y1/y2) | |
| size_t | bucketsize_ |
| number of fragments per outer node | |
| std::vector< float > | bucket_min_mz_ |
| vector of the smallest fragment mz of each bucket | |
| double | precursor_mass_tolerance_lower_ {20.0} |
| positive magnitude, effective lower bound is -lower | |
| double | precursor_mass_tolerance_upper_ {20.0} |
| positive magnitude, effective upper bound is +upper | |
| bool | precursor_mass_tolerance_unit_ppm_ {true} |
| float | fragment_mz_tolerance_ |
| bool | fragment_mz_tolerance_unit_ppm_ {true} |
Protected Attributes inherited from DefaultParamHandler | |
| Param | param_ |
| Container for current parameters. | |
| Param | defaults_ |
| Container for default parameters. This member should be filled in the constructor of derived classes! | |
| std::vector< String > | subsections_ |
| Container for registered subsections. This member should be filled in the constructor of derived classes! | |
| String | error_name_ |
| Name that is displayed in error messages during the parameter checking. | |
| bool | check_defaults_ |
| If this member is set to false, no checking of parameters is done. | |
| bool | warn_empty_defaults_ |
| If this member is set to false, no warning is emitted when defaults are empty. | |
Static Protected Attributes | |
| static constexpr size_t | MAX_MOD_SLOTS = 32 |
| max variable mod slots per peptide (uint32_t bitmask) | |
| static std::array< double, 128 > | residue_mass_table_ |
| static std::once_flag | mass_table_once_flag_ |
| static IonOffsets | ion_offsets_ |
Private Member Functions | |
| void | queryPeaks (SpectrumMatchesTopN &candidates, const MSSpectrum &spectrum, const std::pair< size_t, size_t > &candidates_range, const int16_t isotope_error, const uint16_t precursor_charge) |
| Queries the peaks of a given experimental spectrum against a set range of potential peptides, for a given isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...). | |
| void | searchDifferentPrecursorRanges (const MSSpectrum &spectrum, float precursor_mass, SpectrumMatchesTopN &sms, uint16_t charge) |
| In a closed search, loops over all isotope errors and, for each iteration, loops over all peaks with queryPeaks. | |
| void | trimHits (SpectrumMatchesTopN &init_hits) const |
| Places the k largest elements at the front of the input array. The elements are not sorted, neither within the k largest nor among the rest. | |
| bool | isOpenSearchMode_ () const noexcept |
| Instance delegate — same rule, reads the member bounds. | |
| std::pair< float, float > | computeMassWindow_ (float precursor_mass) const |
Private Attributes | |
| bool | add_b_ions_ |
| bool | add_y_ions_ |
| bool | add_a_ions_ |
| bool | add_c_ions_ |
| bool | add_x_ions_ |
| bool | add_z_ions_ |
| std::string | digestion_enzyme_ |
| EnzymaticDigestion::Specificity | enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL} |
| 'full' (default), 'semi' (semi-tryptic), or 'none' (e.g. immunopeptidomics) | |
| size_t | missed_cleavages_ |
| number of missed cleavages | |
| float | peptide_min_mass_ |
| float | peptide_max_mass_ |
| size_t | peptide_min_length_ |
| size_t | peptide_max_length_ |
| StringList | modifications_fixed_ |
| Modifications that are applied to all peptides. | |
| StringList | modifications_variable_ |
| Variable modifications; all possible combinations are created. | |
| size_t | max_variable_mods_per_peptide_ |
| uint16_t | min_matched_peaks_ |
| PSMs with fewer hits are discarded. | |
| int16_t | min_isotope_error_ |
| Minimal possible isotope error. | |
| int16_t | max_isotope_error_ |
| Maximal possible isotope error (both only used for closed search) | |
| uint16_t | min_precursor_charge_ |
| minimal possible precursor charge (usually 1) | |
| uint16_t | max_precursor_charge_ |
| maximal possible precursor charge | |
| uint16_t | max_fragment_charge_ |
| The maximal possible charge of the fragments. | |
| uint32_t | max_processed_hits_ |
| The number of PSMs that will be used; the rest are filtered out. | |
Builds, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from those files. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The FI has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides.
| struct OpenMS::FragmentIndex::IonOffsets |
Precomputed ion-type mass offsets (from Residue::getInternalTo*Ion formulas)
| Class Members | ||
|---|---|---|
| double | a_offset {0.0} | |
| double | b_offset {0.0} | |
| double | c_offset {0.0} | |
| double | x_offset {0.0} | |
| double | y_offset {0.0} | |
| double | z_offset {0.0} | |
| struct OpenMS::FragmentIndex::SpectrumMatch |
Match between a query peak and an entry in the DB.
| struct OpenMS::FragmentIndex::VarModEntry |
Entry in the per-AA variable modification lookup table.
| Class Members | ||
|---|---|---|
| double | delta_mass | mass delta from this modification |
| const ResidueModification * | mod_ptr | pointer to the modification (for AASequence reconstruction) |
| TermSpecificity | term_spec | where this mod can be applied |
| FragmentIndex | ( | ) |
Default constructor.
Initializes an empty FragmentIndex. Call build() before using any query functions. After clear(), the index returns to this unbuilt state.
Thread-safety: constructing the object is thread-safe as long as the instance is not shared across threads before initialization completes.
| ~FragmentIndex | ( | ) | override default |
Default destructor.
Releases owned memory. If the index was built, all internal buffers and fragment buckets are freed. No exceptions are thrown.
| void build | ( | const std::vector< FASTAFile::FASTAEntry > & | fasta_entries | ) |
Given a set of FASTA entries, builds the fragment index data structure (FID). First, all fragments are sorted by their own mass. Next, they are placed in buckets and the minimum fragment mass of each bucket is stored. Finally, the fragments within each bucket are sorted by their originating precursor mass.
| [in] | fasta_entries |
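The three build steps can be illustrated with a minimal, self-contained sketch. `ToyFragment`, `ToyIndex`, and `buildToyIndex` are hypothetical names introduced only for this illustration; the sketch stores the precursor mass directly in each fragment, which is an assumption and not necessarily how `Fragment` is laid out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a fragment entry.
struct ToyFragment
{
  float fragment_mz;    // mass-to-charge of the fragment itself
  float precursor_mass; // mass of the originating peptide (assumption: stored inline)
};

// Hypothetical container: flat fragment list plus per-bucket minimum m/z.
struct ToyIndex
{
  std::vector<ToyFragment> fragments;
  std::vector<float> bucket_min_mz;
  std::size_t bucket_size = 0;
};

ToyIndex buildToyIndex(std::vector<ToyFragment> fragments, std::size_t bucket_size)
{
  assert(bucket_size > 0);
  ToyIndex index;
  index.bucket_size = bucket_size;

  // Step 1: sort all fragments by their own m/z.
  std::sort(fragments.begin(), fragments.end(),
            [](const ToyFragment& a, const ToyFragment& b) { return a.fragment_mz < b.fragment_mz; });

  for (std::size_t start = 0; start < fragments.size(); start += bucket_size)
  {
    const std::size_t stop = std::min(start + bucket_size, fragments.size());
    // Step 2: the first element of the m/z-sorted chunk is the bucket minimum.
    index.bucket_min_mz.push_back(fragments[start].fragment_mz);
    // Step 3: re-sort the bucket by the mass of the originating precursor.
    std::sort(fragments.begin() + static_cast<std::ptrdiff_t>(start),
              fragments.begin() + static_cast<std::ptrdiff_t>(stop),
              [](const ToyFragment& a, const ToyFragment& b) { return a.precursor_mass < b.precursor_mass; });
  }
  index.fragments = std::move(fragments);
  return index;
}
```

After the build, a bucket covers a contiguous m/z interval, while its entries are ordered by precursor mass, which is what allows a two-level lookup.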
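| size_t buildModSlots_ (const char *sequence, size_t seq_len, ModSlot *out_slots, bool is_protein_nterm=false, bool is_protein_cterm=false) const | protected |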
Scan a peptide sequence to find all variable modification slots. Returns the number of slots written to out_slots (at most MAX_MOD_SLOTS). Deterministic ordering: N-term pure-terminal mods, then left-to-right residue mods (ANYWHERE + position-specific terminal), then C-term pure-terminal mods.
| sequence | raw amino acid character array |
| seq_len | length of the sequence |
| out_slots | output array for modification slots (must have space for MAX_MOD_SLOTS entries) |
| is_protein_nterm | true if this peptide starts at protein position 0 |
| is_protein_cterm | true if this peptide ends at the last protein residue |
| void clear | ( | ) |
Deletes the fragment index and sets is_build_ to false.
| std::pair< float, float > computeMassWindow_ (float precursor_mass) const | private |
Compute the signed mass window {lo, hi} around a precursor_mass, converting ppm → Da if the unit is ppm. lo is negative (or zero), hi is positive (or zero). This is the only place where positive member magnitudes become signed offsets.
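| void generateFragmentsLightweight_ (std::vector< Fragment > &fragments, const char *sequence, size_t seq_len, UInt32 peptide_idx, double n_term_mod_mass, double c_term_mod_mass, const double *residue_mod_masses) const | protected |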
Lightweight fragment generation: compute b/y ion m/z directly from amino acid chars. Bypasses AASequence::fromString and TheoreticalSpectrumGenerator.
| [out] | fragments | Output vector to append Fragment entries to |
| [in] | sequence | Raw amino acid string (no modifications) |
| [in] | seq_len | Length of sequence |
| [in] | peptide_idx | Index of this peptide in fi_peptides_ |
| [in] | n_term_mod_mass | Mass delta from N-terminal modification (0 if none) |
| [in] | c_term_mod_mass | Mass delta from C-terminal modification (0 if none) |
| [in] | residue_mod_masses | Per-residue modification mass deltas (nullptr if none; array of seq_len doubles) |
| void generatePeptides (const std::vector< FASTAFile::FASTAEntry > &fasta_entries) | protected |
Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped; if it is set to true, the digestion enzyme can be set in the parameters. Additionally applies fixed and variable modifications for a restrictive PSM search.
| [in] | fasta_entries |
| const std::vector< Peptide > & getPeptides | ( | ) | const |
Returns a reference to the internal peptide container.
Provides read-only access to all peptides currently held by the index, typically populated during build().
Preconditions: The vector may be empty if build() has not been called yet. Thread-safety: read-only view; safe to access concurrently as long as no thread mutates the index (e.g., build()/clear()).
| std::pair< size_t, size_t > getPeptidesInMassWindow | ( | float | precursor_mass, |
| const std::pair< float, float > & | window | ||
| ) | const |
Return the [begin_idx, end_idx) peptide index range such that fi_peptides_[i].precursor_mz_ ∈ [precursor_mass + window.first, precursor_mass + window.second] for all i in the returned range.
| [in] | precursor_mass | The mono-charged precursor mass (M+H). |
| [in] | window | Signed absolute offsets around the precursor mass. By convention window.first is <= 0 and window.second is >= 0 (produced by computeMassWindow_). A reversed window trivially returns an empty range; no diagnostic is emitted. No hidden tolerance is added. |
Returns the half-open index range into fi_peptides_.
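A range lookup of this kind can be sketched with two binary searches. The sketch assumes the peptides are sorted by precursor mass and represents them as a plain vector of masses; `toyPeptidesInMassWindow` is a hypothetical name, not the library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns [begin_idx, end_idx) of all masses within
// [precursor_mass + window.first, precursor_mass + window.second].
std::pair<std::size_t, std::size_t> toyPeptidesInMassWindow(const std::vector<float>& sorted_masses,
                                                            float precursor_mass,
                                                            const std::pair<float, float>& window)
{
  assert(std::is_sorted(sorted_masses.begin(), sorted_masses.end()));
  const float lo = precursor_mass + window.first;
  const float hi = precursor_mass + window.second;
  if (lo > hi) return {0, 0};  // reversed window: empty range, no diagnostic

  // First mass >= lo, then first mass > hi (searching only from `first` onwards).
  const auto first = std::lower_bound(sorted_masses.begin(), sorted_masses.end(), lo);
  const auto last = std::upper_bound(first, sorted_masses.end(), hi);
  return {static_cast<std::size_t>(first - sorted_masses.begin()),
          static_cast<std::size_t>(last - sorted_masses.begin())};
}
```

Both bounds are inclusive on the mass axis, while the returned index range is half-open, matching the convention described above.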
| void initModificationTables_ () | protected |
Build per-AA modification lookup tables from modifications_fixed_ and modifications_variable_. Called once at the start of generatePeptides().
| static void initResidueMassTable_ () | static protected |
| bool isBuild | ( | ) | const |
Indicates whether the fragment index has been built.
Thread-safety: read-only and can be called concurrently with other read-only methods. Must not race with build()/clear() on the same instance.
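| static bool isOpenSearchMode (double lower_magnitude, double upper_magnitude, bool unit_ppm) noexcept | inline static noexcept |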
Shared auto-detection: open-search iff max(lower, upper) > threshold (1000 ppm or 1 Da). Strict >: exactly 1000 ppm stays closed. This is the single source of truth for the open-search auto-detection rule and is reused by ProSEAlgorithm and the TOPP tool.
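The rule as stated can be written down in a few lines. This is a sketch of the described behavior, not the library source; the function and constant names are hypothetical.

```cpp
#include <algorithm>

// Thresholds taken from the description above.
constexpr double kOpenSearchThresholdPpm = 1000.0;
constexpr double kOpenSearchThresholdDa = 1.0;

// Open search iff the larger tolerance magnitude strictly exceeds the threshold.
bool toyIsOpenSearchMode(double lower_magnitude, double upper_magnitude, bool unit_ppm) noexcept
{
  const double threshold = unit_ppm ? kOpenSearchThresholdPpm : kOpenSearchThresholdDa;
  return std::max(lower_magnitude, upper_magnitude) > threshold;
}
```

Because the comparison is strict, a tolerance of exactly 1000 ppm (or exactly 1 Da) still selects the closed search.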
| bool isOpenSearchMode_ () const noexcept | inline private noexcept |
Instance delegate — same rule, reads the member bounds.
| std::vector< Hit > query | ( | const Peak1D & | peak, |
| const std::pair< size_t, size_t > & | peptide_idx_range, | ||
| uint16_t | peak_charge | ||
| ) |
Queries one peak.
| [in] | peak | The queried peak |
| [in] | peptide_idx_range | The range of precursors/peptides the peak could potentially belong to |
| [in] | peak_charge | The charge of the peak. Is used to calculate the mass from the mz |
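How a single-peak query can exploit the bucketed layout is sketched below. The sketch takes an explicit m/z interval instead of a peak, charge, and tolerance, and assumes that entries inside a bucket are ordered by peptide index (equivalent to precursor-mass order when peptides are sorted by mass). `ToyEntry` and `toyQuery` are hypothetical; the actual lookup in `query()` may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical fragment entry: index of the originating peptide plus fragment m/z.
struct ToyEntry
{
  std::uint32_t peptide_idx;
  float fragment_mz;
};

std::vector<ToyEntry> toyQuery(const std::vector<ToyEntry>& fragments,
                               const std::vector<float>& bucket_min_mz,
                               std::size_t bucket_size,
                               float mz_lo, float mz_hi,
                               const std::pair<std::size_t, std::size_t>& peptide_idx_range)
{
  std::vector<ToyEntry> hits;
  if (bucket_min_mz.empty() || bucket_size == 0) return hits;

  // The bucket before the first one whose minimum is >= mz_lo may still contain mz_lo.
  const auto it = std::lower_bound(bucket_min_mz.begin(), bucket_min_mz.end(), mz_lo);
  std::size_t bucket = (it == bucket_min_mz.begin())
                         ? 0
                         : static_cast<std::size_t>(it - bucket_min_mz.begin()) - 1;

  for (; bucket < bucket_min_mz.size() && bucket_min_mz[bucket] <= mz_hi; ++bucket)
  {
    const std::size_t start = bucket * bucket_size;
    const std::size_t stop = std::min(start + bucket_size, fragments.size());
    const auto first = fragments.begin() + static_cast<std::ptrdiff_t>(start);
    const auto last = fragments.begin() + static_cast<std::ptrdiff_t>(stop);

    // Inside a bucket: binary search to the start of the allowed peptide range.
    auto cur = std::lower_bound(first, last, peptide_idx_range.first,
                                [](const ToyEntry& e, std::size_t idx) { return e.peptide_idx < idx; });
    for (; cur != last && cur->peptide_idx < peptide_idx_range.second; ++cur)
    {
      if (cur->fragment_mz >= mz_lo && cur->fragment_mz <= mz_hi) hits.push_back(*cur);
    }
  }
  return hits;
}
```

The outer search narrows the m/z axis to a few buckets, and the inner search narrows the precursor axis, so only a small slice of the index is scanned linearly.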
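| void queryPeaks (SpectrumMatchesTopN &candidates, const MSSpectrum &spectrum, const std::pair< size_t, size_t > &candidates_range, const int16_t isotope_error, const uint16_t precursor_charge) | private |
Queries the peaks of a given experimental spectrum against a set range of potential peptides, for a given isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...).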
| [out] | candidates | The n best Spectrum matches |
| [in] | spectrum | The queried experimental spectrum |
| [in] | candidates_range | The range of precursors/peptides the peptide could potentially belong to |
| [in] | isotope_error | The applied isotope error |
| [in] | precursor_charge | The applied precursor charge |
| void querySpectrum | ( | const MSSpectrum & | spectrum, |
| SpectrumMatchesTopN & | sms | ||
| ) |
Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and iterating up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass.
| [in] | spectrum | experimental spectrum |
| [out] | sms | The n best Spectrum matches |
| AASequence reconstructModifiedSequence | ( | const Peptide & | peptide, |
| const std::vector< FASTAFile::FASTAEntry > & | fasta_entries | ||
| ) | const |
Reconstruct a fully modified AASequence from a Peptide's bitmask.
Used for result output - only called for final hits (not in the build hot path). Applies fixed modifications, then uses the bitmask to determine which variable modifications are active at which positions.
| [in] | peptide | The Peptide descriptor with mod_bitmask_ |
| [in] | fasta_entries | The FASTA database used during build() |
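Decoding such a bitmask can be sketched as follows. The mapping "bit i set means slot i is active" is an assumption made for this illustration, and `toyActiveSlots` is a hypothetical helper; only the 32-slot limit (a `uint32_t` bitmask) comes from the documentation of `MAX_MOD_SLOTS`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of all slots whose bit is set in the mask.
// Assumption: bit i of the mask corresponds to the i-th modification slot.
std::vector<std::size_t> toyActiveSlots(std::uint32_t mod_bitmask, std::size_t slot_count)
{
  assert(slot_count <= 32);  // a uint32_t can address at most 32 slots
  std::vector<std::size_t> active;
  for (std::size_t i = 0; i < slot_count; ++i)
  {
    if (mod_bitmask & (std::uint32_t{1} << i)) active.push_back(i);
  }
  return active;
}
```

With a deterministic slot ordering, the same sequence always produces the same slot list, so a peptide only needs to store the mask rather than the modifications themselves.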
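| void searchDifferentPrecursorRanges (const MSSpectrum &spectrum, float precursor_mass, SpectrumMatchesTopN &sms, uint16_t charge) | private |
In a closed search, loops over all isotope errors and, for each iteration, loops over all peaks with queryPeaks.
In an open search, applies a precursor-mass window.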
| [in] | spectrum | experimental query-spectrum |
| [in] | precursor_mass | The mass of the precursor (mz * charge) |
| [out] | sms | The Top m SpectrumMatches |
| [in] | charge | Applied charge |
| void trimHits (SpectrumMatchesTopN &init_hits) const | private |
Places the k largest elements at the front of the input array. The elements are not sorted, neither within the k largest nor among the rest.
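This kind of partial ordering is what `std::nth_element` provides. The sketch below applies it to plain integer scores; `toyPartitionTopK` is a hypothetical helper, and whether `trimHits` uses `std::nth_element` internally is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Moves the k largest scores to the front; neither part is sorted afterwards.
void toyPartitionTopK(std::vector<int>& scores, std::size_t k)
{
  if (k >= scores.size()) return;  // nothing to partition
  std::nth_element(scores.begin(),
                   scores.begin() + static_cast<std::ptrdiff_t>(k),
                   scores.end(),
                   std::greater<int>());
}
```

Compared with a full sort, this runs in linear time on average, which matters when only the top few candidates are kept.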
| void updateMembers_ () override | override protected virtual |
This method is used to update extra member variables at the end of the setParameters() method.
Also call it at the end of the derived classes' copy constructor and assignment operator.
The default implementation is empty.
Reimplemented from DefaultParamHandler.
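| Member | Access | Description |
|---|---|---|
| bool add_a_ions_ | private | |
| bool add_b_ions_ | private | |
| bool add_c_ions_ | private | |
| bool add_x_ions_ | private | |
| bool add_y_ions_ | private | |
| bool add_z_ions_ | private | |
| std::vector< float > bucket_min_mz_ | protected | vector of the smallest fragment mz of each bucket |
| size_t bucketsize_ | protected | number of fragments per outer node |
| std::string digestion_enzyme_ | private | |
| EnzymaticDigestion::Specificity enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL} | private | 'full' (default), 'semi' (semi-tryptic), or 'none' (e.g. immunopeptidomics) |
| std::vector< Fragment > fi_fragments_ | protected | vector of all theoretical fragments (b- and y- ions) |
| std::vector< Peptide > fi_peptides_ | protected | vector of all (digested) peptides |
| double fixed_cterm_delta_ {0.0} | protected | Fixed C-terminal mod delta (0 if none) |
| const ResidueModification * fixed_cterm_mod_ptr_ {nullptr} | protected | |
| std::array< double, 128 > fixed_mod_deltas_ {} | protected | Per-AA fixed modification delta mass (0.0 if no fixed mod applies) |
| std::array< const ResidueModification *, 128 > fixed_mod_ptrs_ {} | protected | Per-AA fixed modification pointer (nullptr if none) |
| double fixed_nterm_delta_ {0.0} | protected | Fixed N-terminal mod delta (0 if none) |
| const ResidueModification * fixed_nterm_mod_ptr_ {nullptr} | protected | |
| float fragment_max_mz_ | protected | largest fragment mz |
| float fragment_min_mz_ | protected | smallest fragment mz |
| float fragment_mz_tolerance_ | protected | |
| bool fragment_mz_tolerance_unit_ppm_ {true} | protected | |
| IonOffsets ion_offsets_ | static protected | |
| bool is_build_ {false} | protected | true, if the database has been populated with fragments |
| std::once_flag mass_table_once_flag_ | static protected | |
| uint16_t max_fragment_charge_ | private | The maximal possible charge of the fragments. |
| int16_t max_isotope_error_ | private | Maximal possible isotope error (both only used for closed search) |
| constexpr size_t MAX_MOD_SLOTS = 32 | static constexpr protected | max variable mod slots per peptide (uint32_t bitmask) |
| uint16_t max_precursor_charge_ | private | maximal possible precursor charge |
| uint32_t max_processed_hits_ | private | The number of PSMs that will be used; the rest are filtered out. |
| size_t max_variable_mods_per_peptide_ | private | |
| size_t min_ion_index_ {0} | protected | skip ions below this index (0=all, 2=skip b1/b2/y1/y2) |
| int16_t min_isotope_error_ | private | Minimal possible isotope error. |
| uint16_t min_matched_peaks_ | private | PSMs with fewer hits are discarded. |
| uint16_t min_precursor_charge_ | private | minimal possible precursor charge (usually 1) |
| size_t missed_cleavages_ | private | number of missed cleavages |
| bool mod_tables_initialized_ {false} | protected | |
| StringList modifications_fixed_ | private | Modifications that are applied to all peptides. |
| StringList modifications_variable_ | private | Variable modifications; all possible combinations are created. |
| size_t peptide_max_length_ | private | |
| float peptide_max_mass_ | private | |
| size_t peptide_min_length_ | private | |
| float peptide_min_mass_ | private | |
| double precursor_mass_tolerance_lower_ {20.0} | protected | positive magnitude, effective lower bound is -lower |
| bool precursor_mass_tolerance_unit_ppm_ {true} | protected | |
| double precursor_mass_tolerance_upper_ {20.0} | protected | positive magnitude, effective upper bound is +upper |
| std::array< double, 128 > residue_mass_table_ | static protected | Precomputed residue mass lookup table: ASCII char -> internal monoisotopic mass (Da). Indexed by single-letter amino acid code (e.g., 'A'=65). Entries for non-AA chars are 0. |
| std::vector< VarModEntry > variable_cterm_mods_ | protected | Pure C-terminal variable mods (not residue-specific) |
| std::array< std::vector< VarModEntry >, 128 > variable_mod_table_ {} | protected | Per-AA variable modification table: for each ASCII char, list of possible variable mods. |
| std::vector< VarModEntry > variable_nterm_mods_ | protected | Pure N-terminal variable mods (not residue-specific) |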