OpenMS
FLASHHelperClasses::PrecalculatedAveragine Class Reference

Pre-binned averagine isotope-distribution cache used by FLASHDeconv for fast cosine-similarity scoring. More...

#include <OpenMS/ANALYSIS/TOPDOWN/FLASHHelperClasses.h>


Public Member Functions

 PrecalculatedAveragine ()=default
 Default-construct an empty cache (no bins populated). The mass-based accessors are not safe to call until the object has been assigned from an instance built with the parameterised constructor.
 
 PrecalculatedAveragine (double min_mass, double max_mass, double delta, CoarseIsotopePatternGenerator &generator, bool use_RNA_averagine, double decoy_iso_distance=-1)
 Precompute averagine isotope distributions across min_mass .. max_mass in steps of delta.
 
 PrecalculatedAveragine (const PrecalculatedAveragine &)=default
 Copy constructor.
 
 PrecalculatedAveragine (PrecalculatedAveragine &&other) noexcept=default
 Move constructor.
 
PrecalculatedAveragine & operator= (const PrecalculatedAveragine &pc)=default
 Copy assignment.
 
PrecalculatedAveragine & operator= (PrecalculatedAveragine &&pc) noexcept=default
 Move assignment.
 
 ~PrecalculatedAveragine ()=default
 Destructor.
 
IsotopeDistribution get (double mass) const
 Return the cached trimmed + L2-normalised isotope distribution for the bin containing mass.
 
size_t getMaxIsotopeIndex () const
 Return the externally-set isotope-index cap.
 
void setMaxIsotopeIndex (int index)
 Set the isotope-index cap consulted by external callers. Does not influence the cached distributions or other accessors.
 
Size getLeftCountFromApex (double mass) const
 Return the number of significant isotopes to the left of the apex for the bin containing mass.
 
Size getRightCountFromApex (double mass) const
 Return the number of significant isotopes to the right of the apex for the bin containing mass.
 
Size getApexIndex (double mass) const
 Return the index of the most-abundant isotope inside the trimmed distribution for the bin containing mass.
 
Size getLastIndex (double mass) const
 Return apex_index + right_count_from_apex for the bin containing mass — the index of the rightmost significant isotope.
 
double getAverageMassDelta (double mass) const
 Return (average_mass - monoisotopic_mass) for the bin's trimmed distribution.
 
double getMostAbundantMassDelta (double mass) const
 Return (most_abundant_mass - monoisotopic_mass) for the bin's trimmed distribution.
 
double getSNRMultiplicationFactor (double mass) const
 Return the SNR multiplication factor for the bin containing mass.
 

Private Member Functions

Size massToIndex_ (double mass) const
 Convert mass to a bin index. Clamps to [0, isotopes_.size()-1] for out-of-range inputs (no throw).
 

Private Attributes

std::vector< IsotopeDistribution > isotopes_
 Per-bin isotope distribution, stored after trimming and L2 normalisation in the constructor.
 
std::vector< double > norms_
 Per-bin L2 norm, intended to accompany isotopes_ for fast cosine scoring. Vestigial: it is not consumed by the public API in this header.
 
std::vector< double > average_mono_mass_difference_
 Per-bin difference between the average mass and the monoisotopic mass of the trimmed distribution.
 
std::vector< double > abundant_mono_mass_difference_
 Per-bin difference between the most-abundant-isotope mass and the monoisotopic mass.
 
std::vector< double > snr_mul_factor_
 Per-bin SNR multiplication factor: (sum-of-normalised-intensities)^2 of the trimmed distribution.
 
std::vector< int > left_count_from_apex_
 Per-bin count of significant isotopes on the left side of the apex (running maximum across bins; see the PrecalculatedAveragine parameterised constructor for the unusual running-max semantics).
 
std::vector< int > right_count_from_apex_
 Per-bin count of significant isotopes on the right side of the apex (running maximum across bins; same caveat as left_count_from_apex_).
 
std::vector< Size > apex_index_
 Per-bin index of the most-abundant isotope inside the trimmed distribution.
 
Size max_isotope_index_
 Cap on the isotope index used externally by FLASHDeconv. NOT initialised by the constructor — call setMaxIsotopeIndex before reading via getMaxIsotopeIndex.
 
double mass_interval_
 Bin step size (the delta passed to the constructor).
 
double min_mass_
 Lower bound of the cached mass range (the min_mass passed to the constructor).
 

Detailed Description

Pre-binned averagine isotope-distribution cache used by FLASHDeconv for fast cosine-similarity scoring.

Builds a table keyed by a mass bin index — one IsotopeDistribution plus its derived quantities (apex index, left/right counts of significant isotopes, average / most-abundant mass offsets, an SNR multiplication factor) per mass bin from min_mass to max_mass at step delta. The constructor performs all the per-mass work up front; the accessors are O(1) table lookups via massToIndex_(mass) (rounded to the nearest bin).

Bin lookup silently clamps: any mass below min_mass maps to bin 0 and any mass above max_mass maps to the last bin, so the accessors never throw on out-of-range input.
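
The round-then-clamp lookup described above can be sketched as a free function. This is a minimal illustration, not the OpenMS implementation: the name massToIndexSketch, the explicit n_bins argument, and the use of min_mass as the origin of the bin grid are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the private massToIndex_ helper. The real
// implementation may compute the offset differently.
std::size_t massToIndexSketch(double mass, double min_mass, double delta, std::size_t n_bins)
{
  assert(n_bins > 0 && delta > 0.0);
  // Masses below min_mass collapse onto bin 0.
  const double offset = std::max(0.0, mass - min_mass);
  // Round to the nearest bin, then clamp to the last valid index.
  const double nearest = std::round(offset / delta);
  const double last = static_cast<double>(n_bins - 1);
  return static_cast<std::size_t>(std::min(nearest, last));
}
```

With min_mass = 100, delta = 10 and five bins, a query at 50 resolves to bin 0 and a query at 1e6 resolves to bin 4. Neither call signals that the input was out of range, so callers that care must check the range themselves.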

Constructor & Destructor Documentation

◆ PrecalculatedAveragine() [1/4]

Default-construct an empty cache (no bins populated). The mass-based accessors are not safe to call until the object has been assigned from an instance built with the parameterised constructor.

◆ PrecalculatedAveragine() [2/4]

PrecalculatedAveragine ( double  min_mass,
double  max_mass,
double  delta,
CoarseIsotopePatternGenerator &  generator,
bool  use_RNA_averagine,
double  decoy_iso_distance = -1 
)

Precompute averagine isotope distributions across min_mass .. max_mass in steps of delta.

For each bin mass m = i*delta in the range, calls CoarseIsotopePatternGenerator::estimateFromPeptideMonoWeight(m) (or estimateFromRNAMonoWeight(m) when use_RNA_averagine is true) and applies the following processing:

  • Trimming: repeatedly drops the less intense of the two end peaks (left or right) for as long as the cumulative trimmed-out power stays below 0.0001 of the total power and more than 2 peaks remain. Trimmed peaks have their intensity set to zero; they are not removed from the distribution.
  • L2 normalisation: divides every remaining peak's intensity by sqrt(total_pwr) of the kept set.
  • Decoy mode: when decoy_iso_distance is > 0, every peak's m/z is multiplied by decoy_iso_distance before trimming, then the distribution is re-sorted by mass and renormalised. This produces a "stretched" pattern whose inter-isotope spacing is decoy_iso_distance times the natural spacing.
  • Left/right counts: each count is clamped to a floor of 2 and then accumulated as a running maximum across bins, so every bin stores the largest count seen up to and including that bin, and the stored counts never decrease with mass.
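
The trimming and L2-normalisation steps in the list above can be sketched on a plain intensity vector. This is an illustration under stated assumptions, not the constructor's code: trimAndNormalizeSketch and its parameter names are hypothetical, power is taken to mean squared intensity, and ties between the two end peaks are resolved by dropping the right one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: zeroes low-power end peaks in place, then divides all
// intensities by the L2 norm of the kept set. Returns the number of zeroed peaks.
std::size_t trimAndNormalizeSketch(std::vector<double>& intensity,
                                   double max_trimmed_fraction = 1e-4,
                                   std::size_t min_kept = 2)
{
  assert(!intensity.empty());
  double total_power = 0.0;
  for (double v : intensity) total_power += v * v;
  assert(total_power > 0.0);

  std::size_t left = 0;
  std::size_t right = intensity.size() - 1;
  double trimmed_power = 0.0;
  std::size_t trimmed = 0;

  while (intensity.size() - trimmed > min_kept && left < right)
  {
    // Candidate for removal: the less intense of the two end peaks.
    const bool drop_left = intensity[left] < intensity[right];
    const std::size_t victim = drop_left ? left : right;
    const double power = intensity[victim] * intensity[victim];
    // Stop before the discarded power reaches the allowed fraction.
    if (trimmed_power + power >= max_trimmed_fraction * total_power) break;
    trimmed_power += power;
    intensity[victim] = 0.0; // zeroed, not erased
    ++trimmed;
    if (drop_left) ++left; else --right;
  }

  // L2 normalisation over the kept peaks (zeroed peaks stay zero).
  const double kept_norm = std::sqrt(total_power - trimmed_power);
  for (double& v : intensity) v /= kept_norm;
  return trimmed;
}
```

Zeroing rather than erasing keeps every peak at its original position, so an apex index computed before trimming still refers to the same peak afterwards.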

Per bin, the constructor records the trimmed distribution, the apex index, the running-max left/right counts, the average-mono and most-abundant-mono mass deltas, and the SNR multiplication factor (sum-of-normalised-intensities)^2.

Parameters
[in]      min_mass            Lower bound of the cached mass range; bins are built starting at the first multiple of delta that is >= min_mass.
[in]      max_mass            Upper bound of the cached mass range; bin generation stops at the first multiple of delta that exceeds max_mass.
[in]      delta               Bin step size.
[in,out]  generator           Isotope-pattern generator used to compute each per-mass distribution.
[in]      use_RNA_averagine   If true, use the RNA-mass averagine; otherwise use peptide averagine.
[in]      decoy_iso_distance  > 0 enables decoy mode (per-peak m/z multiplied by this value); <= 0 disables it. Default -1.

◆ PrecalculatedAveragine() [3/4]

Copy constructor.

◆ PrecalculatedAveragine() [4/4]

PrecalculatedAveragine ( PrecalculatedAveragine &&  other)
default noexcept

Move constructor.

◆ ~PrecalculatedAveragine()

~PrecalculatedAveragine ( )
default

Destructor.

Member Function Documentation

◆ get()

IsotopeDistribution get ( double  mass) const

Return the cached trimmed + L2-normalised isotope distribution for the bin containing mass.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Cached isotope distribution for the resolved bin.
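
Because the cached distribution is already L2-normalised, a cosine score against observed intensities needs only a dot product and the norm of the observed vector. The following is a minimal sketch of that idea and not FLASHDeconv's scoring code; cosineAgainstUnitTemplate is a hypothetical name, and the two vectors are assumed to be aligned peak by peak.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cosine similarity between observed intensities and a
// template whose sum of squared intensities is already 1.
double cosineAgainstUnitTemplate(const std::vector<double>& observed,
                                 const std::vector<double>& unit_template)
{
  assert(observed.size() == unit_template.size());
  double dot = 0.0;
  double observed_power = 0.0;
  for (std::size_t i = 0; i < observed.size(); ++i)
  {
    dot += observed[i] * unit_template[i];
    observed_power += observed[i] * observed[i];
  }
  // An all-zero observation has no direction; report zero similarity.
  if (observed_power <= 0.0) return 0.0;
  // The template norm is 1, so only the observed norm appears in the denominator.
  return dot / std::sqrt(observed_power);
}
```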

◆ getApexIndex()

Size getApexIndex ( double  mass) const

Return the index of the most-abundant isotope inside the trimmed distribution for the bin containing mass.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Position of the apex peak inside the trimmed distribution.

◆ getAverageMassDelta()

double getAverageMassDelta ( double  mass) const

Return (average_mass - monoisotopic_mass) for the bin's trimmed distribution.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Mass difference between the trimmed distribution's average mass and its monoisotopic mass.
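
As an illustration of how such mass offsets can be derived from a peak list, the sketch below computes both the average-mass offset and the most-abundant-mass offset. It rests on assumptions not stated in this reference: the Peak and MassDeltas types are hypothetical, the average mass is taken as the intensity-weighted mean, and the first peak is treated as the monoisotopic peak.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak { double mass; double intensity; };          // hypothetical peak type
struct MassDeltas { double average; double most_abundant; };

// Assumes peaks are sorted by mass and peaks.front() is the monoisotopic peak.
MassDeltas massDeltasSketch(const std::vector<Peak>& peaks)
{
  assert(!peaks.empty());
  double weighted_mass = 0.0;
  double total_intensity = 0.0;
  std::size_t apex = 0;
  for (std::size_t i = 0; i < peaks.size(); ++i)
  {
    weighted_mass += peaks[i].mass * peaks[i].intensity;
    total_intensity += peaks[i].intensity;
    // Keep the first peak with the strictly largest intensity as the apex.
    if (peaks[i].intensity > peaks[apex].intensity) apex = i;
  }
  assert(total_intensity > 0.0);
  const double mono = peaks.front().mass;
  return {weighted_mass / total_intensity - mono, peaks[apex].mass - mono};
}
```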

◆ getLastIndex()

Size getLastIndex ( double  mass) const

Return apex_index + right_count_from_apex for the bin containing mass — the index of the rightmost significant isotope.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Index of the rightmost significant isotope.

◆ getLeftCountFromApex()

Size getLeftCountFromApex ( double  mass) const

Return the number of significant isotopes to the left of the apex for the bin containing mass.

Note
The stored value is a running maximum accumulated across bins during construction, so it never decreases with mass (see the parameterised constructor's left/right-count semantics).
Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Running-max left count of significant isotopes.
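
The running-maximum semantics can be sketched as a single pass over per-bin raw counts. This is only an illustration of the behaviour described in the note; runningMaxCounts and its parameters are hypothetical names, not part of the class.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: applies a floor of min_count and replaces each raw
// per-bin count with the maximum over all bins up to and including it.
std::vector<int> runningMaxCounts(const std::vector<int>& raw_counts, int min_count = 2)
{
  assert(min_count >= 0);
  std::vector<int> out;
  out.reserve(raw_counts.size());
  // Starting the running value at min_count applies the floor to every bin.
  int running = min_count;
  for (int count : raw_counts)
  {
    running = std::max(running, count);
    out.push_back(running);
  }
  return out;
}
```

Raw counts of 1, 3, 2, 5, 4 become 2, 3, 3, 5, 5 under this scheme: a bin whose own pattern is narrow still reports the width of the widest earlier bin.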

◆ getMaxIsotopeIndex()

size_t getMaxIsotopeIndex ( ) const

Return the externally-set isotope-index cap.

Warning
The constructor does not initialise max_isotope_index_; reading this before setMaxIsotopeIndex has been called returns an uninitialised value.
Returns
The most recent value passed to setMaxIsotopeIndex.

◆ getMostAbundantMassDelta()

double getMostAbundantMassDelta ( double  mass) const

Return (most_abundant_mass - monoisotopic_mass) for the bin's trimmed distribution.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Mass difference between the trimmed distribution's most-abundant-isotope mass and its monoisotopic mass.

◆ getRightCountFromApex()

Size getRightCountFromApex ( double  mass) const

Return the number of significant isotopes to the right of the apex for the bin containing mass.

Note
Same running-maximum caveat as getLeftCountFromApex.
Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Running-max right count of significant isotopes.

◆ getSNRMultiplicationFactor()

double getSNRMultiplicationFactor ( double  mass) const

Return the SNR multiplication factor for the bin containing mass.

(sum-of-normalised-intensities)^2 of the trimmed distribution. Used by FLASHDeconv for fast SNR computation.

Parameters
[in]  mass  Input mass to query. Out-of-range values are silently clamped to [min_mass, max_mass].
Returns
Squared sum of normalised peak intensities for the bin.
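
The formula above is small enough to sketch directly. The function name snrMultiplicationFactorSketch is hypothetical, and the input is assumed to hold the L2-normalised, non-negative intensities of one bin.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: squared sum of already-normalised peak intensities.
double snrMultiplicationFactorSketch(const std::vector<double>& unit_intensity)
{
  assert(!unit_intensity.empty());
  double sum = 0.0;
  for (double v : unit_intensity) sum += v;
  return sum * sum;
}
```

For a non-negative unit vector the result lies between 1 (all power in a single peak) and the number of peaks (power spread evenly), so the factor grows as the isotope pattern broadens.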

◆ massToIndex_()

Size massToIndex_ ( double  mass) const
private

Convert mass to a bin index. Clamps to [0, isotopes_.size()-1] for out-of-range inputs (no throw).

◆ operator=() [1/2]

PrecalculatedAveragine & operator= ( const PrecalculatedAveragine &  pc)
default

Copy assignment.

◆ operator=() [2/2]

PrecalculatedAveragine & operator= ( PrecalculatedAveragine &&  pc)
default noexcept

Move assignment.

◆ setMaxIsotopeIndex()

void setMaxIsotopeIndex ( int  index)

Set the isotope-index cap consulted by external callers. Does not influence the cached distributions or other accessors.

Parameters
[in]  index  New isotope-index cap value.

Member Data Documentation

◆ abundant_mono_mass_difference_

std::vector<double> abundant_mono_mass_difference_
private

Per-bin difference between the most-abundant-isotope mass and the monoisotopic mass.

◆ apex_index_

std::vector<Size> apex_index_
private

Per-bin index of the most-abundant isotope inside the trimmed distribution.

◆ average_mono_mass_difference_

std::vector<double> average_mono_mass_difference_
private

Per-bin difference between the average mass and the monoisotopic mass of the trimmed distribution.

◆ isotopes_

std::vector<IsotopeDistribution> isotopes_
private

Per-bin isotope distribution, stored after trimming and L2 normalisation in the constructor.

◆ left_count_from_apex_

std::vector<int> left_count_from_apex_
private

Per-bin count of significant isotopes on the left side of the apex (running maximum across bins; see the PrecalculatedAveragine parameterised constructor for the unusual running-max semantics).

◆ mass_interval_

double mass_interval_
private

Bin step size (the delta passed to the constructor).

◆ max_isotope_index_

Size max_isotope_index_
private

Cap on the isotope index used externally by FLASHDeconv. NOT initialised by the constructor — call setMaxIsotopeIndex before reading via getMaxIsotopeIndex.

◆ min_mass_

double min_mass_
private

Lower bound of the cached mass range (the min_mass passed to the constructor).

◆ norms_

std::vector<double> norms_
private

Per-bin L2 norm, intended to accompany isotopes_ for fast cosine scoring. Vestigial: it is not consumed by the public API in this header.

◆ right_count_from_apex_

std::vector<int> right_count_from_apex_
private

Per-bin count of significant isotopes on the right side of the apex (running maximum across bins; same caveat as left_count_from_apex_).

◆ snr_mul_factor_

std::vector<double> snr_mul_factor_
private

Per-bin SNR multiplication factor: (sum-of-normalised-intensities)^2 of the trimmed distribution.