OpenMS
FragmentIndex Class Reference

Generates, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from the FASTA input. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The FI has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides. More...

#include <OpenMS/ANALYSIS/ID/FragmentIndex.h>


Classes

struct  Fragment
 One entry in the fragment index. More...
 
struct  Hit
 
struct  IonOffsets
 Precomputed ion-type mass offsets (from Residue::getInternalTo*Ion formulas) More...
 
struct  ModSlot
 A candidate modification slot for a specific peptide. More...
 
struct  Peptide
 Compact descriptor of a peptide instance held by the FragmentIndex. More...
 
struct  SpectrumMatch
 Match between a query peak and an entry in the DB. More...
 
struct  SpectrumMatchesTopN
 container for SpectrumMatch. Also keeps count of total number of candidates and total number of matches. More...
 
struct  VarModEntry
 Entry in the per-AA variable modification lookup table. More...
 

Public Member Functions

 FragmentIndex ()
 Default constructor.
 
 ~FragmentIndex () override=default
 Default destructor.
 
bool isBuild () const
 Indicates whether the fragment index has been built.
 
const std::vector< Peptide > & getPeptides () const
 Returns a reference to the internal peptide container.
 
Size getNumFragments () const noexcept
 Number of theoretical fragments stored in the index (0 before build()).
 
void build (const std::vector< FASTAFile::FASTAEntry > &fasta_entries)
 Builds the fragment index data structure (FID) from the given FASTA entries. First, all fragments are sorted by their own mass and placed in buckets. The minimum fragment mass of each bucket is stored, and the fragments within each bucket are then sorted by their originating precursor mass.
 
void clear ()
 Deletes the fragment index and sets is_build_ to false.
 
std::pair< size_t, size_t > getPeptidesInMassWindow (float precursor_mass, const std::pair< float, float > &window) const
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const std::string &name)
 Constructor with name that is displayed in error messages.
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.
 
virtual ~DefaultParamHandler ()
 Destructor.
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.
 
void setParameters (const Param &param)
 Sets the parameters.
 
const Param & getParameters () const
 Non-mutable access to the parameters.
 
const Param & getDefaults () const
 Non-mutable access to the default parameters.
 
const std::string & getName () const
 Non-mutable access to the name.
 
void setName (const std::string &name)
 Mutable access to the name.
 
const std::vector< std::string > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Static Public Member Functions

static bool isOpenSearchMode (double lower_magnitude, double upper_magnitude, bool unit_ppm) noexcept
 
- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="")
 Writes all parameters to meta values.
 

SNES (Speedy Non-specific Enzyme Search) bit encoding

When the index is built in SNES mode (isSnesMode), a Peptide entry represents a mother peptide — the longest peptide anchored at one terminus of the protein. Bit 31 of Peptide::mod_bitmask_ distinguishes Single-N mothers (N-terminus anchored, C-terminus free, b-ion series indexed) from Single-C mothers (C-terminus anchored, N-terminus free, y-ion series indexed). SNES v1 does not enumerate variable modifications on mothers, so the lower 31 bits are currently unused (reserved for variable-mod-on-mother support in a future version).

When the index is built in the default (non-SNES) mode, mod_bitmask_ uses all 32 bits as variable-modification slots (existing semantics) and these accessors are not consulted.

Rationale for stealing a single bit from the bitmask instead of adding a field to Peptide: zero runtime overhead and zero size impact for the common full-specificity trypsin case — the bit is only interpreted when the index dispatcher has already branched on is_snes_mode_.
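
The bit-31 convention described above can be sketched as follows. This is a minimal illustration, not the class's code: the constant names kSnesKindBitMask and kSnesSlotMask and the helper tagMother are hypothetical stand-ins, and the two predicates only mirror the documented behaviour of isSingleCMother and isSingleNMother.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for SNES_KIND_BIT_MASK and SNES_SLOT_MASK.
constexpr std::uint32_t kSnesKindBitMask = 1u << 31;        // bit 31; set = Single-C mother
constexpr std::uint32_t kSnesSlotMask = ~kSnesKindBitMask;  // bits 0..30

// True if bit 31 is set (C-terminus anchored mother).
constexpr bool isSingleCMother(std::uint32_t mod_bitmask) noexcept
{
  return (mod_bitmask & kSnesKindBitMask) != 0;
}

// True if bit 31 is clear (N-terminus anchored mother).
constexpr bool isSingleNMother(std::uint32_t mod_bitmask) noexcept
{
  return (mod_bitmask & kSnesKindBitMask) == 0;
}

// Hypothetical helper: set or clear bit 31 while leaving the lower 31 bits untouched.
constexpr std::uint32_t tagMother(std::uint32_t mod_bitmask, bool single_c) noexcept
{
  return single_c ? (mod_bitmask | kSnesKindBitMask) : (mod_bitmask & kSnesSlotMask);
}

static_assert(isSingleCMother(tagMother(0u, true)), "bit 31 set means Single-C");
static_assert(isSingleNMother(tagMother(0u, false)), "bit 31 clear means Single-N");
```

Because the kind is stored in an otherwise unused bit, a mother needs no extra field; the predicates are only meaningful when the index was built in SNES mode.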

enum class  SnesAnchor { NONE , PROT_NTERM , PROT_CTERM }
 
static constexpr uint32_t SNES_KIND_BIT_MASK = 1u << 31
 bit 31; set = Single-C mother
 
static constexpr uint32_t SNES_SLOT_MASK = ~SNES_KIND_BIT_MASK
 bits 0..30 in SNES mode
 
bool is_build_ {false}
 true, if the database has been populated with fragments
 
std::array< double, 128 > fixed_mod_deltas_ {}
 Per-AA fixed modification delta mass (0.0 if no fixed mod applies)
 
std::array< const ResidueModification *, 128 > fixed_mod_ptrs_ {}
 Per-AA fixed modification pointer (nullptr if none)
 
double fixed_nterm_delta_ {0.0}
 Fixed N-terminal mod delta (0 if none)
 
double fixed_cterm_delta_ {0.0}
 Fixed C-terminal mod delta (0 if none)
 
const ResidueModification * fixed_nterm_mod_ptr_ {nullptr}
 
const ResidueModification * fixed_cterm_mod_ptr_ {nullptr}
 
std::array< std::vector< VarModEntry >, 128 > variable_mod_table_ {}
 Per-AA variable modification table: for each ASCII char, list of possible variable mods.
 
std::vector< VarModEntry > variable_nterm_mods_
 Pure N-terminal variable mods (not residue-specific)
 
std::vector< VarModEntry > variable_cterm_mods_
 Pure C-terminal variable mods (not residue-specific)
 
bool mod_tables_initialized_ {false}
 
bool is_snes_mode_ {false}
 
bool snes_enabled_ {false}
 
std::vector< double > snes_sigma_delta_set_
 
std::vector< double > snes_sigma_delta_set_with_prot_nterm_
 
std::vector< double > snes_sigma_delta_set_with_prot_cterm_
 
std::vector< Peptide > fi_peptides_
 vector of all (digested) peptides
 
std::vector< Fragment > fi_fragments_
 vector of all theoretical fragments (b- and y- ions)
 
std::vector< uint32_t > protein_lengths_
 
float fragment_min_mz_
 smallest fragment mz
 
float fragment_max_mz_
 largest fragment mz
 
size_t min_ion_index_ {0}
 skip ions below this index (0=all, 2=skip b1/b2/y1/y2)
 
size_t bucketsize_
 number of fragments per outer node
 
std::vector< float > bucket_min_mz_
 vector of the smallest fragment mz of each bucket
 
double precursor_mass_tolerance_lower_ {20.0}
 positive magnitude, effective lower bound is -lower
 
double precursor_mass_tolerance_upper_ {20.0}
 positive magnitude, effective upper bound is +upper
 
bool precursor_mass_tolerance_unit_ppm_ {true}
 
float fragment_mz_tolerance_
 
bool fragment_mz_tolerance_unit_ppm_ {true}
 
static constexpr size_t MAX_MOD_SLOTS = 32
 max variable mod slots per peptide (uint32_t bitmask)
 
static std::array< double, 128 > residue_mass_table_
 
static std::once_flag mass_table_once_flag_
 
static IonOffsets ion_offsets_
 
bool add_b_ions_
 
bool add_y_ions_
 
bool add_a_ions_
 
bool add_c_ions_
 
bool add_x_ions_
 
bool add_z_ions_
 
std::string digestion_enzyme_
 
EnzymaticDigestion::Specificity enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL}
 'full' (default), 'semi' (semi-tryptic), or 'none' (e.g. immunopeptidomics)
 
size_t missed_cleavages_
 number of missed cleavages
 
float peptide_min_mass_
 
float peptide_max_mass_
 
size_t peptide_min_length_
 
size_t peptide_max_length_
 
StringList modifications_fixed_
 Modifications that are applied to all peptides.
 
StringList modifications_variable_
 Variable modifications; all possible combinations are created.
 
size_t max_variable_mods_per_peptide_
 
uint16_t min_matched_peaks_
 PSMs with fewer hits are discarded.
 
int16_t min_isotope_error_
 Minimal possible isotope error.
 
int16_t max_isotope_error_
 Maximal possible isotope error (both only used for closed search)
 
uint16_t min_precursor_charge_
 minimal possible precursor charge (usually 1)
 
uint16_t max_precursor_charge_
 maximal possible precursor charge
 
uint16_t max_fragment_charge_
 The maximal possible charge of the fragments.
 
uint32_t max_processed_hits_
 The number of PSMs that will be used; the rest are filtered out.
 
bool isSnesMode () const noexcept
 
std::vector< Hit > query (const Peak1D &peak, const std::pair< size_t, size_t > &peptide_idx_range, uint16_t peak_charge)
 Queries one peak.
 
void querySpectrum (const MSSpectrum &spectrum, SpectrumMatchesTopN &sms)
 Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and going up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass.
 
void querySpectrum (const MSSpectrum &spectrum, const std::vector< FASTAFile::FASTAEntry > &fasta_entries, SpectrumMatchesTopN &sms)
 Query a spectrum against the fragment index with FASTA context.
 
AASequence reconstructModifiedSequence (const Peptide &peptide, const std::vector< FASTAFile::FASTAEntry > &fasta_entries) const
 Reconstruct a fully modified AASequence from a Peptide's bitmask.
 
int realizeSNESLength (const Peptide &mother, const std::vector< FASTAFile::FASTAEntry > &fasta_entries, double target_mh_plus, double tolerance_lower_magnitude, double tolerance_upper_magnitude, bool tolerance_ppm) const
 Find the realized sub-peptide length of a SNES mother that best matches the observed precursor mass.
 
AASequence reconstructRealizedSubSequence (const Peptide &mother, const std::vector< FASTAFile::FASTAEntry > &fasta_entries, size_t realized_length, uint32_t subset_bitmask=0) const
 
static bool isSingleCMother (uint32_t mod_bitmask) noexcept
 
static bool isSingleNMother (uint32_t mod_bitmask) noexcept
 
void updateMembers_ () override
 This method is used to update extra member variables at the end of the setParameters() method.
 
void generatePeptides (const std::vector< FASTAFile::FASTAEntry > &fasta_entries)
 Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped. If it is set to true, the digestion enzyme can be set in the parameters. Additionally introduces fixed and variable modifications for restrictive PSM search.
 
void generateSNESMothers_ (const std::vector< FASTAFile::FASTAEntry > &fasta_entries)
 SNES-mode peptide enumeration: emit Single-N + Single-C mother peptides.
 
void initModificationTables_ ()
 
size_t buildModSlots_ (const char *sequence, size_t seq_len, ModSlot *out_slots, bool is_protein_nterm=false, bool is_protein_cterm=false) const
 
std::vector< double > computeSnesSigmaDeltaSet_ (bool include_prot_nterm_mods, bool include_prot_cterm_mods) const
 
void generateFragmentsLightweight_ (std::vector< Fragment > &fragments, const char *sequence, size_t seq_len, UInt32 peptide_idx, double n_term_mod_mass, double c_term_mod_mass, const double *residue_mod_masses) const
 
void generateFragmentsForSeries_ (std::vector< Fragment > &fragments, const char *sequence, size_t seq_len, UInt32 peptide_idx, double n_term_mod_mass, double c_term_mod_mass, const double *residue_mod_masses, bool add_b, bool add_a, bool add_c, bool add_y, bool add_x, bool add_z) const
 
static void initResidueMassTable_ ()
 
void querySpectrumSNES_ (const MSSpectrum &spectrum, const std::vector< FASTAFile::FASTAEntry > &fasta_entries, SpectrumMatchesTopN &sms)
 SNES-mode spectrum query (MetaMorpheus-style: byte-count + b-ion filter).
 
void queryPeaks (SpectrumMatchesTopN &candidates, const MSSpectrum &spectrum, const std::pair< size_t, size_t > &candidates_range, const int16_t isotope_error, const uint16_t precursor_charge)
 Queries peaks for a given experimental spectrum with a set range of potential peptides, isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...).
 
void searchDifferentPrecursorRanges (const MSSpectrum &spectrum, float precursor_mass, SpectrumMatchesTopN &sms, uint16_t charge)
 In a closed search, loops over all isotope errors. In each iteration, all peaks are queried with queryPeaks.
 
void trimHits (SpectrumMatchesTopN &init_hits) const
 Places the k largest elements at the front of the input array. Neither the k largest elements nor the remaining elements are sorted among themselves.
 
bool isOpenSearchMode_ () const noexcept
 Instance delegate — same rule, reads the member bounds.
 
std::pair< float, float > computeMassWindow_ (float precursor_mass) const
 

Additional Inherited Members

- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!
 
std::vector< std::string > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!
 
std::string error_name_
 Name that is displayed in error messages during the parameter checking.
 
bool check_defaults_
 If this member is set to false, no parameter checking is done.
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 

Detailed Description

Generates, from a set of FASTA files, a 2D data structure that stores the theoretical masses of all b and y ions of all peptides generated from the FASTA input. The data structure is built such that on one axis the fragments are sorted by their own mass and on the other axis by the mass of their precursor/protein. The FI has two modes: bottom-up and top-down. In the latter, digestion is skipped and the fragments refer directly to the mass of the proteins instead of digested peptides.


Class Documentation

◆ OpenMS::FragmentIndex::IonOffsets

struct OpenMS::FragmentIndex::IonOffsets

Precomputed ion-type mass offsets (from Residue::getInternalTo*Ion formulas)

Class Members
double a_offset {0.0}
double b_offset {0.0}
double c_offset {0.0}
double x_offset {0.0}
double y_offset {0.0}
double z_offset {0.0}

◆ OpenMS::FragmentIndex::SpectrumMatch

struct OpenMS::FragmentIndex::SpectrumMatch

Match between a query peak and an entry in the DB.

Class Members
int16_t isotope_error_ {} The isotope_error used for the performed search.
uint32_t num_matched_ {} Number of peaks-fragment hits.
size_t peptide_idx_ {} The idx this struct belongs to.
uint16_t precursor_charge_ {} The precursor_charge used for the performed search.
float sigma_delta_ {} SNES v1.1: Σ of variable-mod deltas for this match. 0 in non-SNES / unmodified SNES.
uint32_t subset_bitmask_ {} SNES v1.1: active slots in the slot list returned by buildModSlots_. 0 = unmodified. Ignored in non-SNES mode.

◆ OpenMS::FragmentIndex::VarModEntry

struct OpenMS::FragmentIndex::VarModEntry

Entry in the per-AA variable modification lookup table.

Class Members
double delta_mass mass delta from this modification
const ResidueModification * mod_ptr pointer to the modification (for AASequence reconstruction)
TermSpecificity term_spec where this mod can be applied

Member Enumeration Documentation

◆ SnesAnchor

enum class SnesAnchor
strong

SNES v1.1: constrain bin-walk hits to mothers with a specific protein anchor. Used to gate walks that enumerate PROTEIN_N_TERM / PROTEIN_C_TERM variable mods.

Enumerator
NONE 

no anchor restriction (baseline walks)

PROT_NTERM 

mother must have sequence_.first == 0

PROT_CTERM 

mother must have sequence_.first + sequence_.second == protein length

Constructor & Destructor Documentation

◆ FragmentIndex()

Default constructor.

Initializes an empty FragmentIndex. Call build() before using any query functions. After clear(), the index returns to this unbuilt state.

Thread-safety: constructing the object is thread-safe as long as the instance is not shared across threads before initialization completes.

◆ ~FragmentIndex()

~FragmentIndex ( )
override default

Default destructor.

Releases owned memory. If the index was built, all internal buffers and fragment buckets are freed. No exceptions are thrown.

Member Function Documentation

◆ build()

void build ( const std::vector< FASTAFile::FASTAEntry > &  fasta_entries)

Builds the fragment index data structure (FID) from the given FASTA entries. First, all fragments are sorted by their own mass and placed in buckets. The minimum fragment mass of each bucket is stored, and the fragments within each bucket are then sorted by their originating precursor mass.

Parameters
[in] fasta_entries  The FASTA entries used to build the index.
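
The two-level ordering that build() describes can be sketched as below. This is an illustration under stated assumptions, not the OpenMS implementation: the struct Frag and the function bucketize are hypothetical, and the sketch assumes that peptides are already ordered by precursor mass, so that sorting a bucket by peptide index is equivalent to sorting it by precursor mass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced fragment record: fragment m/z plus owning peptide index.
struct Frag
{
  float mz;
  std::size_t peptide_idx;
};

// Sorts fragments by m/z, cuts them into buckets of bucket_size, records the
// minimum m/z of each bucket, then re-sorts each bucket by peptide index.
std::vector<float> bucketize(std::vector<Frag>& frags, std::size_t bucket_size)
{
  assert(bucket_size > 0);
  std::sort(frags.begin(), frags.end(),
            [](const Frag& a, const Frag& b) { return a.mz < b.mz; });

  std::vector<float> bucket_min_mz;
  for (std::size_t start = 0; start < frags.size(); start += bucket_size)
  {
    const std::size_t end = std::min(start + bucket_size, frags.size());
    bucket_min_mz.push_back(frags[start].mz);  // smallest m/z in this bucket
    const auto first = frags.begin() + static_cast<std::ptrdiff_t>(start);
    const auto last = frags.begin() + static_cast<std::ptrdiff_t>(end);
    std::sort(first, last,
              [](const Frag& a, const Frag& b) { return a.peptide_idx < b.peptide_idx; });
  }
  return bucket_min_mz;
}
```

A query can then binary-search the per-bucket minima to find the buckets covering a fragment m/z, and binary-search inside each bucket for the peptide index range of interest.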

◆ buildModSlots_()

size_t buildModSlots_ ( const char *  sequence,
size_t  seq_len,
ModSlot *  out_slots,
bool  is_protein_nterm = false,
bool  is_protein_cterm = false 
) const
protected

Scan a peptide sequence to find all variable modification slots. Returns the number of slots written to out_slots (at most MAX_MOD_SLOTS). Deterministic ordering: N-term pure-terminal mods, then left-to-right residue mods (ANYWHERE + position-specific terminal), then C-term pure-terminal mods.

Parameters
sequence  raw amino acid character array
seq_len  length of the sequence
out_slots  output array for modification slots (must have space for MAX_MOD_SLOTS entries)
is_protein_nterm  true if this peptide starts at protein position 0
is_protein_cterm  true if this peptide ends at the last protein residue

◆ clear()

void clear ( )

Deletes the fragment index and sets is_build_ to false.

◆ computeMassWindow_()

std::pair< float, float > computeMassWindow_ ( float  precursor_mass) const
private

Compute the signed mass window {lo, hi} around a precursor_mass, converting ppm → Da if the unit is ppm. lo is negative (or zero), hi is positive (or zero). This is the only place where positive member magnitudes become signed offsets.
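
A minimal sketch of this conversion follows. The free function computeMassWindow and its explicit tolerance parameters are hypothetical (the member function reads the tolerances from the class), and the sketch assumes that a ppm tolerance is taken relative to precursor_mass.

```cpp
#include <cassert>
#include <utility>

// Turns positive tolerance magnitudes into a signed {lo, hi} window in Da.
// Assumption: ppm tolerances are relative to precursor_mass.
std::pair<float, float> computeMassWindow(float precursor_mass,
                                          double lower_magnitude,
                                          double upper_magnitude,
                                          bool unit_ppm)
{
  assert(lower_magnitude >= 0.0 && upper_magnitude >= 0.0);
  const double scale = unit_ppm ? precursor_mass * 1e-6 : 1.0;  // ppm -> Da factor
  return {static_cast<float>(-lower_magnitude * scale),
          static_cast<float>(upper_magnitude * scale)};
}
```

For a precursor of 1000 Da with 20 ppm below and 10 ppm above, this yields a window of roughly {-0.02, +0.01} Da.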

◆ computeSnesSigmaDeltaSet_()

std::vector< double > computeSnesSigmaDeltaSet_ ( bool  include_prot_nterm_mods,
bool  include_prot_cterm_mods 
) const
protected

Enumerate distinct Σ values achievable by any subset of configured variable mods with popcount ≤ max_variable_mods_per_peptide_. Configuration-global; does not consider per-peptide residue inventory. Per-peptide applicability is enforced at query-time subset enumeration.

Parameters
include_prot_nterm_mods  include mods with PROTEIN_N_TERM specificity
include_prot_cterm_mods  include mods with PROTEIN_C_TERM specificity
Returns
sorted ascending distinct Σ values; always includes 0.0
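
One way such a set could be enumerated is sketched below. The function sigmaDeltaSet is hypothetical, and the sketch assumes that the same configured modification may be counted more than once (at different residues) up to the limit; whether the actual implementation allows repeats, and how it treats nearly equal floating-point sums, is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Enumerates distinct sums of at most max_mods modification deltas.
// Assumption: a delta may be used repeatedly; sums are compared exactly.
std::vector<double> sigmaDeltaSet(const std::vector<double>& mod_deltas, std::size_t max_mods)
{
  std::set<double> all_sums{0.0};  // the unmodified case is always present
  std::set<double> frontier{0.0};  // sums reachable with exactly k-1 mods
  for (std::size_t k = 1; k <= max_mods; ++k)
  {
    std::set<double> next;
    for (double s : frontier)
    {
      for (double d : mod_deltas)
      {
        next.insert(s + d);
      }
    }
    all_sums.insert(next.begin(), next.end());
    frontier = std::move(next);
  }
  // std::set iterates in ascending order, so the result is already sorted.
  return std::vector<double>(all_sums.begin(), all_sums.end());
}
```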

◆ generateFragmentsForSeries_()

void generateFragmentsForSeries_ ( std::vector< Fragment > &  fragments,
const char *  sequence,
size_t  seq_len,
UInt32  peptide_idx,
double  n_term_mod_mass,
double  c_term_mod_mass,
const double *  residue_mod_masses,
bool  add_b,
bool  add_a,
bool  add_c,
bool  add_y,
bool  add_x,
bool  add_z 
) const
protected

Fragment generation with explicit per-call ion-series selection.

Called by the SNES mother path to restrict a Single-N mother to b-ions and a Single-C mother to y-ions regardless of the class-level add_*_ions_ flags. generateFragmentsLightweight_ forwards to this function after packing the class flags; both share a single implementation.

Parameters
[out] fragments  Output vector to append Fragment entries to
[in] sequence  Raw amino acid string (no modifications)
[in] seq_len  Length of sequence
[in] peptide_idx  Index of this peptide in fi_peptides_
[in] n_term_mod_mass  Mass delta from N-terminal modification (0 if none)
[in] c_term_mod_mass  Mass delta from C-terminal modification (0 if none)
[in] residue_mod_masses  Per-residue modification mass deltas (nullptr if none; array of seq_len doubles)
[in] add_b  Emit b-ions (prefix).
[in] add_a  Emit a-ions (prefix).
[in] add_c  Emit c-ions (prefix).
[in] add_y  Emit y-ions (suffix).
[in] add_x  Emit x-ions (suffix).
[in] add_z  Emit z-ions (suffix).

◆ generateFragmentsLightweight_()

void generateFragmentsLightweight_ ( std::vector< Fragment > &  fragments,
const char *  sequence,
size_t  seq_len,
UInt32  peptide_idx,
double  n_term_mod_mass,
double  c_term_mod_mass,
const double *  residue_mod_masses 
) const
protected

Lightweight fragment generation: compute b/y ion m/z directly from amino acid chars. Bypasses AASequence::fromString and TheoreticalSpectrumGenerator. Uses the class-level add_b_ions_ / add_y_ions_ / ... flags for the ion series selection. See generateFragmentsForSeries_ for the explicit-flag variant used by the SNES mother path.

Parameters
[out] fragments  Output vector to append Fragment entries to
[in] sequence  Raw amino acid string (no modifications)
[in] seq_len  Length of sequence
[in] peptide_idx  Index of this peptide in fi_peptides_
[in] n_term_mod_mass  Mass delta from N-terminal modification (0 if none)
[in] c_term_mod_mass  Mass delta from C-terminal modification (0 if none)
[in] residue_mod_masses  Per-residue modification mass deltas (nullptr if none; array of seq_len doubles)
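
The idea of computing ion m/z directly from amino acid characters can be sketched with running prefix and suffix sums. This is not the class's code: residueMass, IonSeries and singlyChargedBY are hypothetical, the residue table covers only three amino acids for illustration, modifications are omitted, and only singly charged b and y ions are produced using the standard relations b = prefix + proton and y = suffix + water + proton.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr double kProton = 1.007276466;  // proton mass in Da
constexpr double kWater = 18.010564684;  // monoisotopic H2O mass in Da

// Hypothetical mini lookup table (monoisotopic residue masses); 0.0 for unknown chars.
double residueMass(char aa)
{
  switch (aa)
  {
    case 'G': return 57.021464;
    case 'A': return 71.037114;
    case 'S': return 87.032028;
    default: return 0.0;
  }
}

struct IonSeries
{
  std::vector<double> b;  // b1, b2, ...
  std::vector<double> y;  // y1, y2, ...
};

// Singly charged b/y m/z values from an unmodified sequence.
IonSeries singlyChargedBY(const std::string& seq)
{
  IonSeries ions;
  double prefix = 0.0;
  for (std::size_t i = 0; i + 1 < seq.size(); ++i)  // b-ions: all proper prefixes
  {
    prefix += residueMass(seq[i]);
    ions.b.push_back(prefix + kProton);
  }
  double suffix = 0.0;
  for (std::size_t i = seq.size(); i-- > 1;)  // y-ions: all proper suffixes
  {
    suffix += residueMass(seq[i]);
    ions.y.push_back(suffix + kWater + kProton);
  }
  return ions;
}
```

Each fragment costs one addition, which is the kind of saving the lightweight path aims for by bypassing full sequence objects.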

◆ generatePeptides()

void generatePeptides ( const std::vector< FASTAFile::FASTAEntry > &  fasta_entries)
protected

Generates all peptides from the given FASTA entries. If bottom-up is set to false, digestion is skipped. If it is set to true, the digestion enzyme can be set in the parameters. Additionally introduces fixed and variable modifications for restrictive PSM search.

Parameters
[in] fasta_entries

◆ generateSNESMothers_()

void generateSNESMothers_ ( const std::vector< FASTAFile::FASTAEntry > &  fasta_entries)
protected

SNES-mode peptide enumeration: emit Single-N + Single-C mother peptides.

Called instead of the usual enzymatic-digestion path when is_snes_mode_ is true. For each protein, emits:

  • one Single-N mother at every anchor position i in [0, L - min_length], whose sequence spans residues [i, i + min(max_length, L - i)). Tagged by clearing bit SNES_KIND_BIT_MASK in the mod_bitmask; indexed with b-ion series only.
  • one Single-C mother at every anchor position j in [min_length - 1, L - 1], whose sequence spans residues [j - min(max_length, j + 1) + 1, j + 1). Tagged by setting bit SNES_KIND_BIT_MASK; indexed with y-ion series only.

All sub-peptides of the mother (down to the configured min_length) can be realized from a single mother record, which is what gives SNES its memory and speed win over naïve O(L^2) non-specific enumeration.

v1 restriction: variable modifications are disabled in SNES mode. A warning is emitted at build time if any variable modification is configured. Fixed modifications (both residue-specific and terminal) are fully supported.

Parameters
[in] fasta_entries  Protein database (same semantics as generatePeptides).
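
The anchor and span formulas above can be checked with a small enumeration sketch. MotherSpan and enumerateMothers are hypothetical names; the sketch only computes the residue spans for one protein of length L and does not build any fragments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: half-open span [start, start + length) within one protein.
struct MotherSpan
{
  std::size_t start;
  std::size_t length;
  bool single_c;  // false = Single-N mother, true = Single-C mother
};

// Enumerates Single-N mothers first, then Single-C mothers, using the span
// formulas from the description above.
std::vector<MotherSpan> enumerateMothers(std::size_t protein_length,
                                         std::size_t min_length,
                                         std::size_t max_length)
{
  assert(min_length >= 1 && min_length <= max_length);
  std::vector<MotherSpan> mothers;
  if (protein_length < min_length) return mothers;  // protein too short

  // Single-N: anchor i in [0, L - min_length], span [i, i + min(max_length, L - i)).
  for (std::size_t i = 0; i + min_length <= protein_length; ++i)
  {
    const std::size_t len = std::min(max_length, protein_length - i);
    mothers.push_back({i, len, false});
  }
  // Single-C: anchor j in [min_length - 1, L - 1], span ending at j + 1.
  for (std::size_t j = min_length - 1; j < protein_length; ++j)
  {
    const std::size_t len = std::min(max_length, j + 1);
    mothers.push_back({j + 1 - len, len, true});
  }
  return mothers;
}
```

For a protein of length 5 with min_length 2 and max_length 3 this produces four Single-N and four Single-C mothers, i.e. a count linear in L.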

◆ getNumFragments()

Size getNumFragments ( ) const
inline noexcept

Number of theoretical fragments stored in the index (0 before build()).

◆ getPeptides()

const std::vector< Peptide > & getPeptides ( ) const

Returns a reference to the internal peptide container.

Provides read-only access to all peptides currently held by the index, typically populated during build().

Returns
const reference to the internal std::vector of Peptide.

Preconditions: The vector may be empty if build() has not been called yet. Thread-safety: read-only view; safe to access concurrently as long as no thread mutates the index (e.g., build()/clear()).

◆ getPeptidesInMassWindow()

std::pair< size_t, size_t > getPeptidesInMassWindow ( float  precursor_mass,
const std::pair< float, float > &  window 
) const

Return the [begin_idx, end_idx) peptide index range such that fi_peptides_[i].precursor_mz_ ∈ [precursor_mass + window.first, precursor_mass + window.second] for all i in the returned range.

Parameters
[in] precursor_mass  The mono-charged precursor mass (M+H).
[in] window  Signed absolute offsets around the precursor mass. By convention window.first is <= 0 and window.second is >= 0 (produced by computeMassWindow_). A reversed window trivially returns an empty range; no diagnostic is emitted. No hidden tolerance is added.
Returns
[begin_idx, end_idx) half-open index range into fi_peptides_.
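
A half-open index range of this kind can be found with two binary searches. The sketch below is an assumption-laden illustration: peptidesInMassWindow is a hypothetical free function, and it takes a plain vector of precursor masses that is assumed to be sorted in ascending order, standing in for fi_peptides_.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns [begin_idx, end_idx) such that every mass in the range lies in
// [precursor_mass + window.first, precursor_mass + window.second].
std::pair<std::size_t, std::size_t> peptidesInMassWindow(const std::vector<float>& sorted_masses,
                                                         float precursor_mass,
                                                         const std::pair<float, float>& window)
{
  assert(std::is_sorted(sorted_masses.begin(), sorted_masses.end()));
  const float lo = precursor_mass + window.first;
  const float hi = precursor_mass + window.second;
  if (lo > hi) return {0, 0};  // reversed window: empty range, no diagnostic

  const auto first = std::lower_bound(sorted_masses.begin(), sorted_masses.end(), lo);
  const auto last = std::upper_bound(first, sorted_masses.end(), hi);  // hi is inclusive
  return {static_cast<std::size_t>(first - sorted_masses.begin()),
          static_cast<std::size_t>(last - sorted_masses.begin())};
}
```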

◆ initModificationTables_()

void initModificationTables_ ( )
protected

Build per-AA modification lookup tables from modifications_fixed_ and modifications_variable_. Called once at the start of generatePeptides().

◆ initResidueMassTable_()

static void initResidueMassTable_ ( )
static protected

◆ isBuild()

bool isBuild ( ) const

Indicates whether the fragment index has been built.

Returns
true if build() has completed successfully and the index is ready for queries; false otherwise (e.g., after construction or after clear()).

Thread-safety: read-only and can be called concurrently with other read-only methods. Must not race with build()/clear() on the same instance.

◆ isOpenSearchMode()

static bool isOpenSearchMode ( double  lower_magnitude,
double  upper_magnitude,
bool  unit_ppm 
)
inline static noexcept

Shared auto-detection: open-search iff max(lower, upper) > threshold (1000 ppm or 1 Da). Strict >: exactly 1000 ppm stays closed. This is the single source of truth for the open-search auto-detection rule and is reused by ProSEAlgorithm and the TOPP tool.
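
The rule as stated can be written in a few lines. This sketch is a free function reproducing the documented rule; it is not copied from the OpenMS source.

```cpp
#include <algorithm>
#include <cassert>

// Open search iff the larger tolerance magnitude strictly exceeds the threshold
// (1000 when the unit is ppm, 1 when the unit is Da).
constexpr bool isOpenSearchMode(double lower_magnitude, double upper_magnitude, bool unit_ppm) noexcept
{
  const double threshold = unit_ppm ? 1000.0 : 1.0;
  return std::max(lower_magnitude, upper_magnitude) > threshold;  // strict >
}
```

With this rule a symmetric 20 ppm window and a window of exactly 1000 ppm are both closed searches, while an asymmetric window such as 20 ppm below and 1000.5 ppm above counts as open.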

◆ isOpenSearchMode_()

bool isOpenSearchMode_ ( ) const
inline private noexcept

Instance delegate — same rule, reads the member bounds.

◆ isSingleCMother()

static bool isSingleCMother ( uint32_t  mod_bitmask)
inline static noexcept
Returns
true iff the mother described by mod_bitmask is a Single-C (C-anchored) mother. Only meaningful for peptides from an SNES-built index.

◆ isSingleNMother()

static bool isSingleNMother ( uint32_t  mod_bitmask)
inline static noexcept
Returns
true iff the mother described by mod_bitmask is a Single-N (N-anchored) mother.

◆ isSnesMode()

bool isSnesMode ( ) const
inline noexcept
Returns
true if the index was built in SNES mode. SNES activates when both snes_enabled is set to true and peptide:enzyme_specificity is none. snes_enabled defaults to false in v1 (opt-in), so specific/semi-specific searches and non-specific searches without the flag produce the same fragment index as the pre-SNES code path.

◆ query()

std::vector< Hit > query ( const Peak1D &  peak,
const std::pair< size_t, size_t > &  peptide_idx_range,
uint16_t  peak_charge 
)

Queries one peak.

Parameters
[in] peak  The queried peak
[in] peptide_idx_range  The range of precursors/peptides the peptide could potentially belong to
[in] peak_charge  The charge of the peak; used to calculate the mass from the m/z
Returns
a vector of Hits (matching peptide_idx_range and fragment_mz_) containing the index of each matched peptide and the mass of the hit

◆ queryPeaks()

void queryPeaks ( SpectrumMatchesTopN &  candidates,
const MSSpectrum &  spectrum,
const std::pair< size_t, size_t > &  candidates_range,
const int16_t  isotope_error,
const uint16_t  precursor_charge 
)
private

Queries peaks for a given experimental spectrum with a set range of potential peptides, isotope error and precursor charge. Hits are transferred into a PSM list. Technically an adapter between query(...) and openSearch(...)/searchDifferentPrecursorRanges(...).

Parameters
[out] candidates  The n best Spectrum matches
[in] spectrum  The queried experimental spectrum
[in] candidates_range  The range of precursors/peptides the peptide could potentially belong to
[in] isotope_error  The applied isotope error
[in] precursor_charge  The applied precursor charge

◆ querySpectrum() [1/2]

void querySpectrum ( const MSSpectrum &  spectrum,
const std::vector< FASTAFile::FASTAEntry > &  fasta_entries,
SpectrumMatchesTopN &  sms 
)

Query a spectrum against the fragment index with FASTA context.

Required when FragmentIndex is in SNES mode with variable modifications; the FASTA is needed to realize sub-peptide sequences and apply variable mods. Non-SNES and SNES-without-var-mods paths ignore the fasta_entries argument.

Parameters
[in] spectrum  Experimental spectrum with a single precursor.
[in] fasta_entries  The FASTA database passed to build().
[out] sms  Accumulated candidate matches.

◆ querySpectrum() [2/2]

void querySpectrum ( const MSSpectrum &  spectrum,
SpectrumMatchesTopN &  sms 
)

Queries one complete experimental spectrum against the database. Loops over all precursor charges, starting at min_precursor_charge and going up to max_precursor_charge. All peaks are queried once per precursor charge, each time with the corresponding precursor mass.

Parameters
[in] spectrum  experimental spectrum
[out] sms  The n best Spectrum matches

◆ querySpectrumSNES_()

void querySpectrumSNES_ ( const MSSpectrum &  spectrum,
const std::vector< FASTAFile::FASTAEntry > &  fasta_entries,
SpectrumMatchesTopN &  sms 
)
private

SNES-mode spectrum query (MetaMorpheus-style: byte-count + b-ion filter).

Implements the Rolfs/Smith 2020 inverted-index search strategy as executed in MetaMorpheus's NonSpecificEnzymeSearchEngine. Replaces the pre-SNES searchDifferentPrecursorRanges flow with a two-phase design:

  1. Byte-count pass: for every experimental peak, walk the fragment-mz buckets within fragment tolerance and increment a per-thread byte score table indexed by global mother peptide id. No precursor-mass filter at this stage — every mother that has a fragment matching any peak is counted. This is the critical departure from the v1 design, which pre-filtered mothers by mother_mass >= P - tol and admitted the top half of the index as candidates.
  2. Candidate collection: for each (precursor charge, isotope error), walk fragment buckets at the target m/z that a realized sub-peptide's terminal ion would occupy:
    • Single-N mother / b-ion index: target m/z = (M+H)+ − water (relation M_sub+H+ = b_k + water, so the mother's b_k ion falls at M_obs+H+ − water when the realized length matches).
    • Single-C mother / y-ion index: target m/z = (M+H)+ directly (M_sub+H+ = y_k exactly).
    Every mother with a fragment in the target bin that has the correct kind (Single-N vs Single-C) and whose byte-score meets fragment:min_matched_ions is emitted as a candidate.

This design produces a candidate set sized like a single fragment-bin lookup (dozens, not thousands), matching MetaMorpheus's algorithmic scalability. The byte table is reused across calls via thread_local.

Only called when isSnesMode is true; otherwise the pre-SNES searchDifferentPrecursorRanges path is used.

Parameters
[in] spectrum  Experimental spectrum with a single precursor.
[in] fasta_entries  Source database passed to build(); required for realizing sub-peptides and applying variable mods inside the SNES v1.1 subset-enumeration post-pass.
[out] sms  Accumulated candidate matches, ordered by insertion (caller runs full-score and top-N selection downstream).

◆ realizeSNESLength()

int realizeSNESLength ( const Peptide &  mother,
const std::vector< FASTAFile::FASTAEntry > &  fasta_entries,
double  target_mh_plus,
double  tolerance_lower_magnitude,
double  tolerance_upper_magnitude,
bool  tolerance_ppm 
) const

Find the realized sub-peptide length of a SNES mother that best matches the observed precursor mass.

Scans realizable lengths k in [peptide_min_length_, mother.sequence_.second] from the appropriate terminus (left-to-right for Single-N mothers, right-to-left for Single-C mothers), computing the cumulative residue mass plus fixed modifications. Returns the length whose realized (M+H)+ mass is closest to target_mh_plus within the given tolerance, or -1 if no length satisfies the tolerance.

Must only be called in SNES mode; returns -1 immediately otherwise.

Parameters
[in] mother  A Peptide representing a Single-N or Single-C mother.
[in] fasta_entries  Source database (same one passed to build()).
[in] target_mh_plus  Observed (M+H)+ mass after isotope-error correction.
[in] tolerance_lower_magnitude  Positive tolerance magnitude on the low side (realized_mass - target >= -tolerance_lower_magnitude).
[in] tolerance_upper_magnitude  Positive tolerance magnitude on the high side (realized_mass - target <= +tolerance_upper_magnitude).
[in] tolerance_ppm  True if the magnitudes are in ppm.
Returns
Realized length, or -1 on failure.
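
The scan described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenMS code: `residueMass` and `realizeLength` are hypothetical helpers, the mother is passed as a plain string, only a handful of residues are covered, fixed modifications are omitted, and ppm tolerances are assumed to be taken relative to the target mass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a residue mass table (monoisotopic, Da).
inline double residueMass(char aa)
{
  switch (aa)
  {
    case 'G': return 57.021464;
    case 'A': return 71.037114;
    case 'S': return 87.032028;
    case 'P': return 97.052764;
    case 'V': return 99.068414;
    default:  return 0.0;
  }
}

// Returns the realized length whose (M+H)+ is closest to target_mh_plus
// within [-tol_lower, +tol_upper], or -1 if no length qualifies.
int realizeLength(const std::string& mother, bool single_n, std::size_t min_length,
                  double target_mh_plus, double tol_lower, double tol_upper, bool tol_ppm)
{
  constexpr double water = 18.010565;
  constexpr double proton = 1.007276;
  // Assumption: ppm magnitudes are converted relative to the target mass.
  const double lower = tol_ppm ? target_mh_plus * tol_lower * 1e-6 : tol_lower;
  const double upper = tol_ppm ? target_mh_plus * tol_upper * 1e-6 : tol_upper;

  int best = -1;
  double best_abs = 0.0;
  double cumulative = 0.0;
  for (std::size_t k = 1; k <= mother.size(); ++k)
  {
    // Single-N grows left-to-right, Single-C grows right-to-left.
    const char aa = single_n ? mother[k - 1] : mother[mother.size() - k];
    cumulative += residueMass(aa);
    if (k < min_length) continue;
    const double diff = cumulative + water + proton - target_mh_plus;
    if (diff < -lower || diff > upper) continue;
    if (best == -1 || std::fabs(diff) < best_abs)
    {
      best = static_cast<int>(k);
      best_abs = std::fabs(diff);
    }
  }
  return best;
}
```

Keeping the two tolerance magnitudes separate allows an asymmetric window: a realized mass that is 0.09 Da below the target is rejected with a 0.05 Da lower magnitude but accepted with a 0.2 Da lower magnitude.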

◆ reconstructModifiedSequence()

AASequence reconstructModifiedSequence ( const Peptide peptide,
const std::vector< FASTAFile::FASTAEntry > &  fasta_entries 
) const

Reconstruct a fully modified AASequence from a Peptide's bitmask.

Used for result output - only called for final hits (not in the build hot path). Applies fixed modifications, then uses the bitmask to determine which variable modifications are active at which positions.

Parameters
[in] peptide  The Peptide descriptor with mod_bitmask_
[in] fasta_entries  The FASTA database used during build()
Returns
The fully modified AASequence
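
Decoding a modification bitmask could look roughly like the sketch below. It assumes that bit i of the mask corresponds to the i-th candidate slot of the peptide; that mapping, the `SlotSketch` type and the `activeSlots` function are assumptions made for illustration and are not taken from the OpenMS sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot descriptor: a position in the peptide and a mass delta.
struct SlotSketch
{
  std::size_t position;
  double delta_mass;
};

// Returns the slots whose bit is set in the mask (assumption: bit i <-> slots[i]).
std::vector<SlotSketch> activeSlots(const std::vector<SlotSketch>& slots, std::uint32_t bitmask)
{
  std::vector<SlotSketch> active;
  for (std::size_t i = 0; i < slots.size() && i < 32; ++i)
  {
    if (bitmask & (std::uint32_t{1} << i))
    {
      active.push_back(slots[i]);
    }
  }
  return active;
}
```

With three candidate slots and a mask of 5 (bits 0 and 2 set), the first and third slots are reported as active.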

◆ reconstructRealizedSubSequence()

AASequence reconstructRealizedSubSequence ( const Peptide mother,
const std::vector< FASTAFile::FASTAEntry > &  fasta_entries,
size_t  realized_length,
uint32_t  subset_bitmask = 0 
) const

Reconstruct a realized SNES sub-peptide as an AASequence.

Parameters
mother  the SNES mother Peptide entry
fasta_entries  the FASTA entries used to build the index
realized_length  the length of the realized sub-peptide (from realizeSNESLength)
subset_bitmask  SNES v1.1: active slots from buildModSlots_(seq_ptr, realized_length, ...) to apply as variable modifications. 0 = unmodified (backward compatible).
Returns
AASequence representing the realized sub-peptide with any variable mods from subset_bitmask applied

◆ searchDifferentPrecursorRanges()

void searchDifferentPrecursorRanges ( const MSSpectrum spectrum,
float  precursor_mass,
SpectrumMatchesTopN sms,
uint16_t  charge 
)
private

In a closed search, loops over all isotope errors and, in each iteration, over all peaks with queryPeaks.

In an open search, applies a precursor-mass window.

Parameters
[in] spectrum  Experimental query spectrum
[in] precursor_mass  The mass of the precursor (mz * charge)
[out] sms  The top m SpectrumMatches
[in] charge  Applied charge

◆ trimHits()

void trimHits ( SpectrumMatchesTopN init_hits) const
private

Places the k largest elements at the front of the input array. Neither the k largest elements nor the remaining elements are sorted among themselves.
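
A partial selection of this kind is commonly done with `std::nth_element`. The sketch below shows the technique on plain integer scores; `keepTopK` is a hypothetical helper, and whether trimHits() uses `std::nth_element` internally is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Moves the k largest scores to the front; order inside both parts is unspecified.
void keepTopK(std::vector<int>& scores, std::size_t k)
{
  if (k >= scores.size()) return;  // nothing to trim
  std::nth_element(scores.begin(),
                   scores.begin() + static_cast<std::ptrdiff_t>(k),
                   scores.end(),
                   std::greater<int>());
}
```

Because neither part is sorted, a caller that needs a ranked list still has to sort the first k elements afterwards.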

◆ updateMembers_()

void updateMembers_ ( )
override protected virtual

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ add_a_ions_

bool add_a_ions_
private

◆ add_b_ions_

bool add_b_ions_
private

◆ add_c_ions_

bool add_c_ions_
private

◆ add_x_ions_

bool add_x_ions_
private

◆ add_y_ions_

bool add_y_ions_
private

◆ add_z_ions_

bool add_z_ions_
private

◆ bucket_min_mz_

std::vector<float> bucket_min_mz_
protected

vector of the smallest fragment mz of each bucket

◆ bucketsize_

size_t bucketsize_
protected

number of fragments per outer node
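
Given the per-bucket minimum m/z values, the bucket that may contain a query m/z can be located with a binary search. This is a minimal sketch assuming the minima are stored in ascending order; `findBucket` is a hypothetical helper, and a real query would additionally have to widen the lookup by the fragment tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the last bucket whose minimum m/z is <= query_mz
// (0 if the query lies below every bucket). Assumes a non-empty, sorted input.
std::size_t findBucket(const std::vector<float>& bucket_min_mz, float query_mz)
{
  auto it = std::upper_bound(bucket_min_mz.begin(), bucket_min_mz.end(), query_mz);
  if (it == bucket_min_mz.begin()) return 0;
  return static_cast<std::size_t>(it - bucket_min_mz.begin()) - 1;
}
```

With bucket minima 100, 200 and 300, a query of 250 resolves to bucket 1 and a query of 300 resolves to bucket 2.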

◆ digestion_enzyme_

std::string digestion_enzyme_
private

◆ enzyme_specificity_

'full' (default), 'semi' (semi-tryptic), or 'none' (e.g. immunopeptidomics)

◆ fi_fragments_

std::vector<Fragment> fi_fragments_
protected

vector of all theoretical fragments (b- and y-ions)

◆ fi_peptides_

std::vector<Peptide> fi_peptides_
protected

vector of all (digested) peptides

◆ fixed_cterm_delta_

double fixed_cterm_delta_ {0.0}
protected

Fixed C-terminal mod delta (0 if none)

◆ fixed_cterm_mod_ptr_

const ResidueModification* fixed_cterm_mod_ptr_ {nullptr}
protected

◆ fixed_mod_deltas_

std::array<double, 128> fixed_mod_deltas_ {}
protected

Per-AA fixed modification delta mass (0.0 if no fixed mod applies)

◆ fixed_mod_ptrs_

std::array<const ResidueModification*, 128> fixed_mod_ptrs_ {}
protected

Per-AA fixed modification pointer (nullptr if none)

◆ fixed_nterm_delta_

double fixed_nterm_delta_ {0.0}
protected

Fixed N-terminal mod delta (0 if none)

◆ fixed_nterm_mod_ptr_

const ResidueModification* fixed_nterm_mod_ptr_ {nullptr}
protected

◆ fragment_max_mz_

float fragment_max_mz_
protected

largest fragment mz

◆ fragment_min_mz_

float fragment_min_mz_
protected

smallest fragment mz

◆ fragment_mz_tolerance_

float fragment_mz_tolerance_
protected

◆ fragment_mz_tolerance_unit_ppm_

bool fragment_mz_tolerance_unit_ppm_ {true}
protected

◆ ion_offsets_

IonOffsets ion_offsets_
static protected

◆ is_build_

bool is_build_ {false}
protected

true, if the database has been populated with fragments

◆ is_snes_mode_

bool is_snes_mode_ {false}
protected

SNES mode state. Set in updateMembers_ from the snes_enabled parameter. When true, generatePeptides dispatches to generateSNESMothers_ and the fragment-index query layer switches to the one-sided lookup. When false, no SNES code path is active and the index behaves identically to the original precursor-window-based implementation.

◆ mass_table_once_flag_

std::once_flag mass_table_once_flag_
static protected

◆ max_fragment_charge_

uint16_t max_fragment_charge_
private

The maximal possible charge of the fragments.

◆ max_isotope_error_

int16_t max_isotope_error_
private

Maximal possible isotope error (the minimal and maximal isotope errors are only used for closed search)

◆ MAX_MOD_SLOTS

constexpr size_t MAX_MOD_SLOTS = 32
static constexpr protected

max variable mod slots per peptide (uint32_t bitmask)

◆ max_precursor_charge_

uint16_t max_precursor_charge_
private

maximal possible precursor charge

◆ max_processed_hits_

uint32_t max_processed_hits_
private

The number of PSMs that will be used. The rest are filtered out.

◆ max_variable_mods_per_peptide_

size_t max_variable_mods_per_peptide_
private

◆ min_ion_index_

size_t min_ion_index_ {0}
protected

skip ions below this index (0=all, 2=skip b1/b2/y1/y2)

◆ min_isotope_error_

int16_t min_isotope_error_
private

Minimal possible isotope error.

◆ min_matched_peaks_

uint16_t min_matched_peaks_
private

PSMs with fewer hits are discarded.

◆ min_precursor_charge_

uint16_t min_precursor_charge_
private

minimal possible precursor charge (usually 1)

◆ missed_cleavages_

size_t missed_cleavages_
private

number of missed cleavages

◆ mod_tables_initialized_

bool mod_tables_initialized_ {false}
protected

◆ modifications_fixed_

StringList modifications_fixed_
private

Modifications that are on all peptides.

◆ modifications_variable_

StringList modifications_variable_
private

Variable modifications; all possible combinations are created.

◆ peptide_max_length_

size_t peptide_max_length_
private

◆ peptide_max_mass_

float peptide_max_mass_
private

◆ peptide_min_length_

size_t peptide_min_length_
private

◆ peptide_min_mass_

float peptide_min_mass_
private

◆ precursor_mass_tolerance_lower_

double precursor_mass_tolerance_lower_ {20.0}
protected

positive magnitude, effective lower bound is -lower

◆ precursor_mass_tolerance_unit_ppm_

bool precursor_mass_tolerance_unit_ppm_ {true}
protected

◆ precursor_mass_tolerance_upper_

double precursor_mass_tolerance_upper_ {20.0}
protected

positive magnitude, effective upper bound is +upper

◆ protein_lengths_

std::vector<uint32_t> protein_lengths_
protected

Protein lengths indexed by protein_idx, populated at build() time. Used by SNES v1.1 to gate PROTEIN_C_TERM variable-mod bin walks.

◆ residue_mass_table_

std::array<double, 128> residue_mass_table_
static protected

Precomputed residue mass lookup table: ASCII char -> internal monoisotopic mass (Da). Indexed by single-letter amino acid code (e.g., 'A'=65). Entries for non-AA chars are 0.
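
An ASCII-indexed table of this shape can be sketched as below. Only five residues are filled in, and `makeResidueTable` and `residueSum` are hypothetical helpers; the sketch uses a function-local static for one-time initialization, which is an assumption and may differ from how OpenMS populates its table.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>

static_assert('A' == 65, "sketch assumes an ASCII execution character set");

// Builds a 128-entry table: ASCII code -> monoisotopic residue mass (Da).
// Characters without an entry keep the value 0.
std::array<double, 128> makeResidueTable()
{
  std::array<double, 128> t{};
  t['G'] = 57.021464;
  t['A'] = 71.037114;
  t['S'] = 87.032028;
  t['P'] = 97.052764;
  t['V'] = 99.068414;
  return t;
}

// Sums residue masses with one array lookup per character.
double residueSum(const std::string& seq)
{
  static const std::array<double, 128> table = makeResidueTable();
  double sum = 0.0;
  for (unsigned char c : seq)
  {
    if (c < table.size()) sum += table[c];
  }
  return sum;
}
```

The lookup avoids any map or branch per residue, which is the usual motivation for such a table in a hot loop.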

◆ snes_enabled_

bool snes_enabled_ {false}
protected

User-facing SNES opt-in switch (parameter "snes_enabled"). Only takes effect when the configured enzyme specificity is SPEC_NONE; specific and semi-specific searches ignore it. Exposed as a separate member so the parameter can be set/queried independently of the derived is_snes_mode_ state (which captures the combined decision specificity && snes_enabled).

◆ SNES_KIND_BIT_MASK

constexpr uint32_t SNES_KIND_BIT_MASK = 1u << 31
static constexpr

bit 31; set = Single-C mother

◆ snes_sigma_delta_set_

std::vector<double> snes_sigma_delta_set_
protected

SNES v1.1: precomputed distinct Σ_delta values for bin-walk targets. Baseline set excludes protein-term-only variable mods.

◆ snes_sigma_delta_set_with_prot_cterm_

std::vector<double> snes_sigma_delta_set_with_prot_cterm_
protected

SNES v1.1: Σ values including PROTEIN_C_TERM-only variable mods. Used only for Single-C mothers anchored at the protein C-terminus.

◆ snes_sigma_delta_set_with_prot_nterm_

std::vector<double> snes_sigma_delta_set_with_prot_nterm_
protected

SNES v1.1: Σ values including PROTEIN_N_TERM-only variable mods. Used only for Single-N mothers anchored at protein position 0.

◆ SNES_SLOT_MASK

constexpr uint32_t SNES_SLOT_MASK = ~SNES_KIND_BIT_MASK
static constexpr

bits 0..30 in SNES mode
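
The two masks split a 32-bit word into a kind flag (bit 31) and 31 slot bits. The sketch below illustrates that layout with hypothetical helpers (`pack`, `isSingleC`, `slotBits`); which member actually carries the packed word is an assumption not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the documented constants, renamed to mark them as sketch code.
constexpr std::uint32_t kKindBit = 1u << 31;   // set = Single-C mother
constexpr std::uint32_t kSlotMask = ~kKindBit; // bits 0..30

// Hypothetical helpers for packing and unpacking the word.
constexpr std::uint32_t pack(bool single_c, std::uint32_t slots)
{
  return (slots & kSlotMask) | (single_c ? kKindBit : 0u);
}
constexpr bool isSingleC(std::uint32_t packed) { return (packed & kKindBit) != 0; }
constexpr std::uint32_t slotBits(std::uint32_t packed) { return packed & kSlotMask; }
```

Masking the slot bits on the way in prevents a stray bit 31 in the slot value from flipping the mother kind.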

◆ variable_cterm_mods_

std::vector<VarModEntry> variable_cterm_mods_
protected

Pure C-terminal variable mods (not residue-specific)

◆ variable_mod_table_

std::array<std::vector<VarModEntry>, 128> variable_mod_table_ {}
protected

Per-AA variable modification table: for each ASCII char, list of possible variable mods.

◆ variable_nterm_mods_

std::vector<VarModEntry> variable_nterm_mods_
protected

Pure N-terminal variable mods (not residue-specific)