OpenMS
Biosaur2Algorithm Class Reference

Implementation of the Biosaur2 feature detection workflow for LC-MS1 data. More...

#include <OpenMS/FEATUREFINDER/Biosaur2Algorithm.h>

Inheritance diagram for Biosaur2Algorithm:
[legend]
Collaboration diagram for Biosaur2Algorithm:
[legend]

Classes

struct  FastHillEntry
 Lightweight index entry for fast m/z-based hill lookup. More...
 
struct  Hill
 Representation of a single hill (continuous m/z trace across adjacent scans). More...
 
struct  IsotopeCandidate
 Candidate isotope peak that can be associated with a monoisotopic hill. More...
 
struct  PatternCandidate
 Internal representation of a candidate isotope pattern. More...
 
struct  PeptideFeature
 Aggregated properties of a detected peptide feature. More...
 

Public Member Functions

 Biosaur2Algorithm ()
 Default constructor. More...
 
void setMSData (const MSExperiment &ms_data)
 Set the MS data used for feature detection (copy version) More...
 
void setMSData (MSExperiment &&ms_data)
 Set the MS data used for feature detection (move version) More...
 
MSExperimentgetMSData ()
 Get non-const reference to MS data. More...
 
const MSExperimentgetMSData () const
 Get const reference to MS data. More...
 
void run (FeatureMap &feature_map)
 Execute the Biosaur2 workflow on the stored MS1 experiment (simplified interface) More...
 
void run (FeatureMap &feature_map, std::vector< Hill > &hills, std::vector< PeptideFeature > &peptide_features)
 Execute the Biosaur2 workflow on the stored MS1 experiment. More...
 
void writeTSV (const std::vector< PeptideFeature > &features, const String &filename) const
 Export detected peptide features to a Biosaur2-compatible TSV file. More...
 
void writeHills (const std::vector< Hill > &hills, const String &filename) const
 Export the detected hills as TSV for diagnostic purposes. More...
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages. More...
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor. More...
 
virtual ~DefaultParamHandler ()
 Destructor. More...
 
DefaultParamHandleroperator= (const DefaultParamHandler &rhs)
 Assignment operator. More...
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator. More...
 
void setParameters (const Param &param)
 Sets the parameters. More...
 
const ParamgetParameters () const
 Non-mutable access to the parameters. More...
 
const ParamgetDefaults () const
 Non-mutable access to the default parameters. More...
 
const StringgetName () const
 Non-mutable access to the name. More...
 
void setName (const String &name)
 Mutable access to the name. More...
 
const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections. More...
 

Protected Member Functions

void updateMembers_ () override
 Update internal member variables from parameters (called automatically when parameters change) More...
 
- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor. More...
 

Private Member Functions

Internal helper methods
double calculatePPM_ (double mz1, double mz2) const
 Calculate the mass accuracy (ppm) between two m/z values. More...
 
double calculateMedian_ (const std::vector< double > &values) const
 Calculate the median of a vector of values. More...
 
double cosineCorrelation1D_ (const std::vector< double > &v1, const std::vector< double > &v2) const
 Compute a cosine correlation between two 1D intensity vectors. More...
 
std::pair< double, SizecheckingCosCorrelationForCarbon_ (const std::vector< double > &theor_full, const std::vector< double > &exp_full, double thresh) const
 Check cosine correlation for averagine-based isotope intensities. More...
 
std::pair< std::vector< double >, SizecomputeAveragine_ (double neutral_mass, double apex_intensity) const
 Compute averagine-based theoretical isotope intensities. More...
 
std::vector< double > meanFilter_ (const std::vector< double > &data, Size window) const
 Apply a mean filter (moving average) to smooth data. More...
 
std::pair< double, double > calibrateMass_ (const std::vector< double > &mass_errors, double bin_width=0.05) const
 Estimate mass calibration parameters from a distribution of mass errors. More...
 
double computeHillMzStep_ (const MSExperiment &exp, double htol_ppm, double min_intensity, double min_mz, double max_mz) const
 Determine m/z binning step for hill detection. More...
 
void processTOF_ (MSExperiment &exp) const
 Apply TOF-specific intensity filtering. More...
 
void centroidProfileSpectra_ (MSExperiment &exp) const
 Centroid profile spectra using PeakPickerHiRes. More...
 
void centroidPASEFData_ (MSExperiment &exp, double mz_step, double pasef_tolerance) const
 Centroid PASEF/TIMS spectra in joint m/z-ion mobility space. More...
 
std::vector< HilldetectHills_ (const MSExperiment &exp, double htol_ppm, double min_intensity, double min_mz, double max_mz, bool use_im, std::vector< double > *hill_mass_diffs=nullptr) const
 Detect hills (continuous m/z traces) in the MS experiment. More...
 
void linkScanToHills_ (const MSSpectrum &spectrum, Size scan_idx, double htol_ppm, double min_intensity, double min_mz, double max_mz, double mz_step, bool use_im_global, std::vector< Hill > &hills, Size &hill_idx_counter, std::vector< Size > &prev_peak_to_hill, const MSSpectrum *&prev_spectrum_ptr, std::map< int, std::vector< int >> &prev_fast_dict, std::vector< int > &prev_im_bins, std::vector< double > *hill_mass_diffs) const
 Link peaks in a single scan to existing hills or start new hills. More...
 
std::vector< HillprocessHills_ (const std::vector< Hill > &hills, Size min_length) const
 Filter and process hills by applying length constraints and computing summary statistics. More...
 
std::vector< HillsplitHills_ (const std::vector< Hill > &hills, double hvf, Size min_length) const
 Split hills at valley positions to separate co-eluting species. More...
 
Size checkIsotopeValleySplit_ (const std::vector< IsotopeCandidate > &isotopes, const std::vector< Hill > &hills, double ivf) const
 Evaluate whether isotope pattern should be truncated at valley positions. More...
 
std::map< int, std::pair< double, double > > performInitialIsotopeCalibration_ (const std::vector< Hill > &hills, double itol_ppm, int min_charge, int max_charge, bool enable_isotope_calib) const
 Perform an initial mass calibration for isotope spacings based on raw hills. More...
 
double buildFastMzLookup_ (const std::vector< Hill > &hills, bool use_im, std::map< int, std::vector< FastHillEntry >> &hills_mz_fast, std::vector< int > &hill_im_bins) const
 Build fast m/z and optional ion-mobility lookup structures for hills. More...
 
std::vector< PatternCandidategenerateIsotopeCandidates_ (const std::vector< Hill > &hills, double itol_ppm, int min_charge, int max_charge, double ivf, double mz_step, const std::map< int, std::vector< FastHillEntry >> &hills_mz_fast, const std::map< Size, Size > &hill_idx_to_index, const std::vector< int > &hill_im_bins, bool use_im) const
 Generate initial isotope pattern candidates for all monoisotopic hills. More...
 
std::vector< PatternCandidateapplyRtFiltering_ (const std::vector< PatternCandidate > &candidates, const std::vector< Hill > &hills, const std::map< Size, Size > &hill_idx_to_index) const
 Apply RT-apex based filtering to isotope pattern candidates. More...
 
std::map< int, std::pair< double, double > > refineIsotopeCalibration_ (const std::vector< PatternCandidate > &candidates, double itol_ppm, bool enable_isotope_calib) const
 Refine isotope mass calibration based on initial pattern candidates. More...
 
std::vector< PatternCandidatefilterByCalibration_ (const std::vector< PatternCandidate > &candidates, const std::vector< Hill > &hills, const std::map< Size, Size > &hill_idx_to_index, const std::map< int, std::pair< double, double >> &isotope_calib_map_ready, bool enable_isotope_calib) const
 Filter isotope pattern candidates using refined calibration and cosine checks. More...
 
std::vector< PeptideFeatureselectNonOverlappingPatterns_ (const std::vector< PatternCandidate > &filtered_ready, const std::vector< Hill > &hills, bool negative_mode, int iuse, double itol_ppm) const
 Greedily select non-overlapping isotope patterns and assemble peptide features. More...
 
std::vector< PeptideFeaturedetectIsotopePatterns_ (std::vector< Hill > &hills, double itol_ppm, int min_charge, int max_charge, bool negative_mode, double ivf, int iuse, bool enable_isotope_calib, bool use_im) const
 Detect isotope patterns and assemble peptide features. More...
 
FeatureMap convertToFeatureMap_ (const std::vector< PeptideFeature > &features, const std::vector< Hill > &hills) const
 Convert peptide features to OpenMS FeatureMap format. More...
 
void debugCheckIsotopeConsistency_ (const char *stage_label, double mono_mz_center, double mono_rt_apex, Size mono_hill_idx, int charge, double itol_ppm, const Hill &iso_hill, Size isotope_number) const
 Debug helper to log obviously inconsistent isotope assignments. More...
 
double cosineCorrelation_ (const std::vector< double > &intensities1, const std::vector< Size > &scans1, const std::vector< double > &intensities2, const std::vector< Size > &scans2) const
 Compute cosine correlation between two intensity traces. More...
 
bool shouldThrowForMissingIM_ (const MSSpectrum &spectrum) const
 Check if missing ion mobility data should be treated as an error. More...
 
void processFAIMSGroup_ (double faims_cv, MSExperiment &group_exp, double original_paseftol, std::vector< Hill > &hills_out, std::vector< PeptideFeature > &features_out)
 Process a single FAIMS compensation voltage group. More...
 

Private Attributes

Member variables
MSExperiment ms_data_
 Input LC-MS data. More...
 
double mini_
 Minimum intensity threshold. More...
 
double minmz_
 Minimum m/z value. More...
 
double maxmz_
 Maximum m/z value. More...
 
double htol_
 Mass tolerance in ppm for hill detection. More...
 
double itol_
 Mass tolerance in ppm for isotope pattern detection. More...
 
double hvf_
 Hill valley factor for splitting hills at valleys. More...
 
double ivf_
 Isotope valley factor for splitting isotope patterns. More...
 
Size minlh_
 Minimum number of scans required for a hill. More...
 
int cmin_
 Minimum charge state to consider. More...
 
int cmax_
 Maximum charge state to consider. More...
 
double pasefmini_
 Minimum intensity for PASEF/TIMS clusters after centroiding. More...
 
Size pasefminlh_
 Minimum number of points per PASEF/TIMS cluster. More...
 
int iuse_
 Number of isotopes for intensity calculation (0=mono, -1=all, N=mono+N) More...
 
bool negative_mode_
 Whether to use negative ion mode. More...
 
bool tof_mode_
 Whether to enable TOF-specific intensity filtering. More...
 
bool profile_mode_
 Whether to centroid profile data using PeakPickerHiRes. More...
 
bool use_hill_calib_
 Whether to use automatic hill mass tolerance calibration. More...
 
bool ignore_iso_calib_
 Whether to disable automatic isotope mass calibration. More...
 
double paseftol_
 Ion mobility tolerance for PASEF/TIMS data (0=disable) More...
 
double hrttol_
 Maximum RT difference between monoisotopic and isotope apex (0=disable) More...
 
String convex_hull_mode_
 Representation of feature convex hulls ("mass_traces" vs. "bounding_box") More...
 
bool faims_merge_features_
 Whether to merge features at different FAIMS CV values representing the same analyte. More...
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values. More...
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters. More...
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes! More...
 
std::vector< Stringsubsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes! More...
 
String error_name_
 Name that is displayed in error messages during the parameter checking. More...
 
bool check_defaults_
 If this member is set to false no checking if parameters in done;. More...
 
bool warn_empty_defaults_
 If this member is set to false no warning is emitted when defaults are empty;. More...
 

Detailed Description

Implementation of the Biosaur2 feature detection workflow for LC-MS1 data.

This algorithm is a C++ reimplementation of the Biosaur2 feature detection method, originally developed in Python. It processes centroided LC-MS1 data (with optional profile mode support) to detect peptide features by:

  1. Building retention time-contiguous signal traces called "hills" by linking peaks across consecutive scans
  2. Splitting hills at valley points to separate co-eluting species
  3. Detecting and annotating isotopic patterns based on expected mass differences and intensity correlations
  4. Calculating comprehensive feature properties including m/z, RT, intensity, and charge state

The algorithm supports several advanced features:

  • FAIMS compensation voltage grouping for FAIMS-enabled instruments
  • Ion mobility-aware processing for PASEF/TIMS data
  • TOF-specific intensity filtering
  • Automatic mass calibration for improved accuracy
  • Profile mode centroiding via PeakPickerHiRes

The implementation closely mirrors the reference Python implementation to ensure reproducible results. All core parameters are exposed through the parameter system and can be configured via INI files or programmatically. For detailed parameter descriptions and usage examples, see FeatureFinderLFQ.

Reference: Abdrakhimov, et al. Biosaur: An open-source Python software for liquid chromatography-mass spectrometry peptide feature detection with ion mobility support. Rapid Communications in Mass Spectrometry, 2022. https://doi.org/10.1002/rcm.9045


Class Documentation

◆ OpenMS::Biosaur2Algorithm::FastHillEntry

struct OpenMS::Biosaur2Algorithm::FastHillEntry

Lightweight index entry for fast m/z-based hill lookup.

Stores the hill index in the main Hill vector together with the first and last scan index for quick RT overlap checks when assembling isotope patterns.

Collaboration diagram for Biosaur2Algorithm::FastHillEntry:
[legend]
Class Members
Size first_scan
Size hill_index
Size last_scan

◆ OpenMS::Biosaur2Algorithm::Hill

struct OpenMS::Biosaur2Algorithm::Hill

Representation of a single hill (continuous m/z trace across adjacent scans).

A hill represents a contiguous series of peaks at similar m/z values across multiple consecutive scans. Hills are the fundamental building blocks for feature detection. Each hill stores the raw peak data (indices, m/z, intensity, RT) along with optional ion mobility information for PASEF/TIMS data. Summary statistics (intensity-weighted mean m/z, apex RT/intensity, etc.) are precomputed for efficient downstream processing.

Collaboration diagram for Biosaur2Algorithm::Hill:
[legend]
Class Members
double drift_time_median Median drift time (-1 if not available)
vector< double > drift_times Drift time values (TIMS data), empty if not available.
Size hill_idx Unique identifier for this hill.
vector< double > intensities Intensity values of peaks in this hill.
double intensity_apex Maximum intensity value in the hill.
double intensity_sum Sum of all intensities in the hill.
vector< double > ion_mobilities Ion mobility values, empty if not available.
double ion_mobility_median Intensity-weighted mean ion mobility (median fallback; -1 if not available)
Size length Number of points/peaks in this hill.
vector< double > mz_values m/z values of peaks in this hill
double mz_weighted_mean Intensity-weighted mean m/z (hill center)
vector< Size > peak_indices Indices of peaks within each spectrum.
double rt_apex Retention time at maximum intensity.
double rt_end Retention time of last peak.
double rt_start Retention time of first peak.
vector< double > rt_values Retention time values corresponding to each peak.
vector< Size > scan_indices Indices of spectra containing peaks of this hill.

◆ OpenMS::Biosaur2Algorithm::IsotopeCandidate

struct OpenMS::Biosaur2Algorithm::IsotopeCandidate

Candidate isotope peak that can be associated with a monoisotopic hill.

During isotope pattern detection, candidate hills are evaluated for their suitability as isotope peaks of a monoisotopic hill. This structure stores the association along with quality metrics that quantify how well the candidate matches the expected isotope pattern in terms of mass accuracy and intensity correlation.

Collaboration diagram for Biosaur2Algorithm::IsotopeCandidate:
[legend]
Class Members
double cos_corr Cosine correlation between monoisotopic and isotope intensity traces.
Size hill_idx Index of the hill that represents this isotope peak.
Size isotope_number Ordinal isotope number (0=monoisotopic, 1=first isotope, etc.)
double mass_diff_ppm Mass difference to expected isotope position in ppm.

◆ OpenMS::Biosaur2Algorithm::PatternCandidate

struct OpenMS::Biosaur2Algorithm::PatternCandidate

Internal representation of a candidate isotope pattern.

Encapsulates a monoisotopic hill together with a set of isotope candidates, the associated charge state and quality metrics used during pattern refinement.

Collaboration diagram for Biosaur2Algorithm::PatternCandidate:
[legend]
Class Members
int charge Charge state of the pattern.
double cos_cor_isotopes Cosine correlation in isotope-intensity space.
vector< IsotopeCandidate > isotopes Associated isotope candidates.
Size mono_index Index of the monoisotopic hill in the hills vector.
double mono_mz Monoisotopic m/z.
Size n_scans Number of scans contributing to the monoisotopic hill.

◆ OpenMS::Biosaur2Algorithm::PeptideFeature

struct OpenMS::Biosaur2Algorithm::PeptideFeature

Aggregated properties of a detected peptide feature.

A peptide feature represents a complete isotope pattern detected across multiple scans. It aggregates information from the monoisotopic hill and its associated isotope hills, providing comprehensive characterization including m/z, retention time, charge state, and intensity metrics. For ion mobility data, drift time and ion mobility values are also included.

Collaboration diagram for Biosaur2Algorithm::PeptideFeature:
[legend]
Class Members
int charge Detected charge state.
double drift_time Median drift time (-1 if not available)
double intensity_apex Maximum intensity across the feature.
double intensity_sum Sum of intensities (based on iuse parameter)
double ion_mobility Intensity-weighted mean ion mobility (median fallback; -1 if not available)
vector< IsotopeCandidate > isotopes List of associated isotope peaks.
double mass_calib Calibrated neutral mass of the feature in Da (neutral_mass = mz * charge ± charge * proton_mass; + for negative mode, − for positive mode)
Size mono_hill_idx Index of the monoisotopic hill.
double mz Monoisotopic m/z value.
Size n_isotopes Number of detected isotope peaks.
Size n_scans Number of scans contributing to the monoisotopic hill.
double rt_apex Retention time at maximum intensity.
double rt_end Retention time of last detection.
double rt_start Retention time of first detection.

Constructor & Destructor Documentation

◆ Biosaur2Algorithm()

Default constructor.

Initializes all algorithm parameters with their default values according to the Biosaur2 specification.

Member Function Documentation

◆ applyRtFiltering_()

std::vector<PatternCandidate> applyRtFiltering_ ( const std::vector< PatternCandidate > &  candidates,
const std::vector< Hill > &  hills,
const std::map< Size, Size > &  hill_idx_to_index 
) const
private

Apply RT-apex based filtering to isotope pattern candidates.

Discards isotope hills whose apex RT deviates by more than hrttol_ seconds from the monoisotopic hill apex.

Parameters
candidatesInput candidates (unchanged)
hillsAll hills
hill_idx_to_indexLookup from hill index to position in hills
Returns
RT-filtered candidates (identical to input if RT gating is disabled)

◆ buildFastMzLookup_()

double buildFastMzLookup_ ( const std::vector< Hill > &  hills,
bool  use_im,
std::map< int, std::vector< FastHillEntry >> &  hills_mz_fast,
std::vector< int > &  hill_im_bins 
) const
private

Build fast m/z and optional ion-mobility lookup structures for hills.

Populates a binned m/z lookup and per-hill ion-mobility bins that accelerate subsequent isotope candidate searches.

Parameters
hillsInput hills
use_imWhether ion mobility is used for gating
hills_mz_fastOutput map from m/z bin to hills overlapping that bin
hill_im_binsOutput ion-mobility bin per hill (0 if IM is not used or not available)
Returns
m/z step size used for binning (0 if no binning is possible)

◆ calculateMedian_()

double calculateMedian_ ( const std::vector< double > &  values) const
private

Calculate the median of a vector of values.

Parameters
valuesInput values
Returns
Median value, or 0.0 if input is empty

◆ calculatePPM_()

double calculatePPM_ ( double  mz1,
double  mz2 
) const
private

Calculate the mass accuracy (ppm) between two m/z values.

Parameters
mz1First m/z value
mz2Second m/z value (reference)
Returns
Mass difference in parts per million (ppm)

◆ calibrateMass_()

std::pair<double, double> calibrateMass_ ( const std::vector< double > &  mass_errors,
double  bin_width = 0.05 
) const
private

Estimate mass calibration parameters from a distribution of mass errors.

Creates a histogram of mass errors and identifies the peak to determine systematic shift and spread.

Parameters
mass_errorsCollection of mass errors (in ppm)
bin_widthWidth of histogram bins (default: 0.05 ppm)
Returns
Pair of (shift, sigma) where shift is the systematic error and sigma is the spread

◆ centroidPASEFData_()

void centroidPASEFData_ ( MSExperiment exp,
double  mz_step,
double  pasef_tolerance 
) const
private

Centroid PASEF/TIMS spectra in joint m/z-ion mobility space.

Performs 2D clustering of peaks in the m/z and ion mobility dimensions to reduce data complexity for ion mobility-enabled instruments.

Parameters
expMS experiment to be centroided (modified in place)
mz_stepm/z binning width for clustering
pasef_toleranceIon mobility accuracy (in IM units) used to cluster peaks in the IM dimension

◆ centroidProfileSpectra_()

void centroidProfileSpectra_ ( MSExperiment exp) const
private

Centroid profile spectra using PeakPickerHiRes.

Parameters
expMS experiment to be centroided (modified in place)

◆ checkingCosCorrelationForCarbon_()

std::pair<double, Size> checkingCosCorrelationForCarbon_ ( const std::vector< double > &  theor_full,
const std::vector< double > &  exp_full,
double  thresh 
) const
private

Check cosine correlation for averagine-based isotope intensities.

Returns the best correlation and the optimal truncation position in the experimental vector given a minimum explained averagine fraction and correlation threshold.

◆ checkIsotopeValleySplit_()

Size checkIsotopeValleySplit_ ( const std::vector< IsotopeCandidate > &  isotopes,
const std::vector< Hill > &  hills,
double  ivf 
) const
private

Evaluate whether isotope pattern should be truncated at valley positions.

Checks if there are significant valleys within the isotope traces that suggest the pattern should be cut short.

Parameters
isotopesIsotope candidates to evaluate
hillsAll available hills
ivfIsotope valley factor threshold
Returns
Recommended number of isotopes to retain (may be shorter than input)

◆ computeAveragine_()

std::pair<std::vector<double>, Size> computeAveragine_ ( double  neutral_mass,
double  apex_intensity 
) const
private

Compute averagine-based theoretical isotope intensities.

Uses a C-only binomial model on a 100 Da neutral-mass grid and rescales the resulting probabilities to the provided monoisotopic apex intensity. Returns the theoretical intensities and the index of the expected maximum-intensity isotope.

◆ computeHillMzStep_()

double computeHillMzStep_ ( const MSExperiment exp,
double  htol_ppm,
double  min_intensity,
double  min_mz,
double  max_mz 
) const
private

Determine m/z binning step for hill detection.

Performs a pre-scan over the experiment to find the maximum m/z value after basic filtering and derives the m/z bin width used for fast hill linking.

◆ convertToFeatureMap_()

FeatureMap convertToFeatureMap_ ( const std::vector< PeptideFeature > &  features,
const std::vector< Hill > &  hills 
) const
private

Convert peptide features to OpenMS FeatureMap format.

Transfers all feature properties (m/z, RT, intensity, charge, etc.) to OpenMS Feature objects and constructs convex hulls from the contributing hills. The representation of convex hulls (full mass-trace hulls vs. a single RT–m/z bounding box) is controlled via the convex_hulls parameter.

Parameters
featuresInput peptide features
hillsAll hills (needed to construct convex hulls)
Returns
FeatureMap containing the converted features

◆ cosineCorrelation1D_()

double cosineCorrelation1D_ ( const std::vector< double > &  v1,
const std::vector< double > &  v2 
) const
private

Compute a cosine correlation between two 1D intensity vectors.

Used internally for comparing theoretical and experimental isotope intensities.

◆ cosineCorrelation_()

double cosineCorrelation_ ( const std::vector< double > &  intensities1,
const std::vector< Size > &  scans1,
const std::vector< double > &  intensities2,
const std::vector< Size > &  scans2 
) const
private

Compute cosine correlation between two intensity traces.

Evaluates similarity of intensity profiles between two hills by computing the cosine of the angle between their intensity vectors (considering only overlapping scan indices).

Parameters
intensities1First intensity trace
scans1Scan indices for first trace
intensities2Second intensity trace
scans2Scan indices for second trace
Returns
Cosine correlation coefficient [0, 1], where 1 indicates perfect correlation

◆ debugCheckIsotopeConsistency_()

void debugCheckIsotopeConsistency_ ( const char *  stage_label,
double  mono_mz_center,
double  mono_rt_apex,
Size  mono_hill_idx,
int  charge,
double  itol_ppm,
const Hill iso_hill,
Size  isotope_number 
) const
private

Debug helper to log obviously inconsistent isotope assignments.

Emits warnings when an isotope hill is far away in RT or m/z from the corresponding monoisotopic hill. Intended for diagnosing pathological isotope linking behaviour in complex data (e.g. FAIMS).

Parameters
stage_labelText label indicating the call site (e.g. "detectIsotopePatterns_" or "convertToFeatureMap_")
mono_mz_centerIntensity-weighted mean m/z of the monoisotopic hill
mono_rt_apexRT apex of the monoisotopic hill
mono_hill_idxHill index of the monoisotopic hill
chargeCharge state of the feature
itol_ppmIsotope mass tolerance used at the call site (ppm)
iso_hillIsotope hill to check
isotope_numberOrdinal isotope index (1=first isotope, ...)

◆ detectHills_()

std::vector<Hill> detectHills_ ( const MSExperiment exp,
double  htol_ppm,
double  min_intensity,
double  min_mz,
double  max_mz,
bool  use_im,
std::vector< double > *  hill_mass_diffs = nullptr 
) const
private

Detect hills (continuous m/z traces) in the MS experiment.

Scans through the experiment and groups peaks with similar m/z values across consecutive scans into hills. Optionally collects mass differences for subsequent calibration.

Parameters
expInput MS experiment
htol_ppmMass tolerance in ppm for linking peaks into hills
min_intensityMinimum intensity threshold for peaks
min_mzMinimum m/z value to consider
max_mzMaximum m/z value to consider
use_imWhether to use ion-mobility information during hill linking (2D m/z–IM hills)
hill_mass_diffsOptional output container for mass differences (for calibration)
Returns
Vector of detected hills

◆ detectIsotopePatterns_()

std::vector<PeptideFeature> detectIsotopePatterns_ ( std::vector< Hill > &  hills,
double  itol_ppm,
int  min_charge,
int  max_charge,
bool  negative_mode,
double  ivf,
int  iuse,
bool  enable_isotope_calib,
bool  use_im 
) const
private

Detect isotope patterns and assemble peptide features.

For each candidate monoisotopic hill, searches for matching isotope peaks based on expected mass differences and charge states. Evaluates isotope candidates using cosine correlation and mass accuracy, then assembles complete peptide features.

Parameters
hillsInput hills (will be modified to mark used hills)
itol_ppmMass tolerance in ppm for isotope matching
min_chargeMinimum charge state to consider
max_chargeMaximum charge state to consider
negative_modeWhether to use negative ion mode (affects mass calculations)
ivfIsotope valley factor for splitting patterns
iuseNumber of isotopes to use for intensity calculation (0=mono only, -1=all, etc.)
enable_isotope_calibWhether to apply automatic mass calibration for isotopes
use_imWhether to use ion-mobility information when scoring and grouping isotope patterns
Returns
Vector of detected peptide features

◆ filterByCalibration_()

std::vector<PatternCandidate> filterByCalibration_ ( const std::vector< PatternCandidate > &  candidates,
const std::vector< Hill > &  hills,
const std::map< Size, Size > &  hill_idx_to_index,
const std::map< int, std::pair< double, double >> &  isotope_calib_map_ready,
bool  enable_isotope_calib 
) const
private

Filter isotope pattern candidates using refined calibration and cosine checks.

Applies mass error windows based on the refined calibration and recomputes the isotope-intensity cosine correlation, truncating patterns as necessary.

Parameters
candidatesInput pattern candidates
hillsAll hills
hill_idx_to_indexLookup from hill index to position in hills
isotope_calib_map_readyPer-isotope calibration parameters
enable_isotope_calibWhether isotope calibration is enabled
Returns
Filtered pattern candidates suitable for final greedy selection

◆ generateIsotopeCandidates_()

std::vector<PatternCandidate> generateIsotopeCandidates_ ( const std::vector< Hill > &  hills,
double  itol_ppm,
int  min_charge,
int  max_charge,
double  ivf,
double  mz_step,
const std::map< int, std::vector< FastHillEntry >> &  hills_mz_fast,
const std::map< Size, Size > &  hill_idx_to_index,
const std::vector< int > &  hill_im_bins,
bool  use_im 
) const
private

Generate initial isotope pattern candidates for all monoisotopic hills.

For each potential monoisotopic hill and charge state, searches the fast m/z lookup for matching isotope hills, evaluates RT profile correlation and averagine agreement, and returns all viable pattern candidates.

Parameters
hillsInput hills
itol_ppmIsotope mass tolerance in ppm
min_chargeMinimum charge state to consider
max_chargeMaximum charge state to consider
ivfIsotope valley factor controlling truncation at valleys
mz_stepm/z step size returned by buildFastMzLookup_
hills_mz_fastFast m/z lookup map
hill_idx_to_indexLookup from hill index to position in hills
hill_im_binsIon-mobility bins per hill
use_imWhether to use ion-mobility gating
Returns
Vector of initial isotope pattern candidates

◆ getMSData() [1/2]

MSExperiment& getMSData ( )

Get non-const reference to MS data.

Returns
Reference to the internal MS experiment data

◆ getMSData() [2/2]

const MSExperiment& getMSData ( ) const

Get const reference to MS data.

Returns
Const reference to the internal MS experiment data

◆ linkScanToHills_()

void linkScanToHills_ ( const MSSpectrum spectrum,
Size  scan_idx,
double  htol_ppm,
double  min_intensity,
double  min_mz,
double  max_mz,
double  mz_step,
bool  use_im_global,
std::vector< Hill > &  hills,
Size hill_idx_counter,
std::vector< Size > &  prev_peak_to_hill,
const MSSpectrum *&  prev_spectrum_ptr,
std::map< int, std::vector< int >> &  prev_fast_dict,
std::vector< int > &  prev_im_bins,
std::vector< double > *  hill_mass_diffs 
) const
private

Link peaks in a single scan to existing hills or start new hills.

This method implements the core hill-linking logic for one spectrum, updating the running hill list and the state that is carried across scans (previous fast m/z dictionary, ion-mobility bins, and peak-to-hill assignments).

◆ meanFilter_()

std::vector<double> meanFilter_ ( const std::vector< double > &  data,
Size  window 
) const
private

Apply a mean filter (moving average) to smooth data.

Uses zero-padding at boundaries to match NumPy's 'same' convolution mode.

Parameters
dataInput data to be smoothed
windowHalf-width of the filter kernel (total kernel size = 2*window + 1)
Returns
Filtered data of the same length as input

◆ performInitialIsotopeCalibration_()

std::map<int, std::pair<double, double> > performInitialIsotopeCalibration_ ( const std::vector< Hill > &  hills,
double  itol_ppm,
int  min_charge,
int  max_charge,
bool  enable_isotope_calib 
) const
private

Perform an initial mass calibration for isotope spacings based on raw hills.

Scans all hills for regularly spaced C13 isotope peaks across a range of charge states and derives per-isotope shift/sigma estimates (in ppm). The results are primarily diagnostic and mirror the behaviour of the reference Biosaur2 code.

Parameters
hillsInput hills sorted by m/z
itol_ppmIsotope mass tolerance in ppm
min_chargeMinimum charge state to consider
max_chargeMaximum charge state to consider
enable_isotope_calibWhether isotope calibration is enabled
Returns
Map from isotope index (1..9) to (shift, sigma) in ppm

◆ processFAIMSGroup_()

void processFAIMSGroup_ ( double  faims_cv,
MSExperiment group_exp,
double  original_paseftol,
std::vector< Hill > &  hills_out,
std::vector< PeptideFeature > &  features_out 
)
private

Process a single FAIMS compensation voltage group.

Performs the complete feature detection pipeline (optional PASEF centroiding, hill detection, isotope pattern detection) for a single FAIMS CV group or non-FAIMS data.

Parameters
faims_cvFAIMS compensation voltage for this group (NaN for non-FAIMS)
group_expMS experiment for this group (modified in place)
original_paseftolOriginal PASEF tolerance setting
hills_outOutput container for hills detected in this group
features_outOutput container for peptide features detected in this group

◆ processHills_()

std::vector<Hill> processHills_ ( const std::vector< Hill > &  hills,
Size  min_length 
) const
private

Filter and process hills by applying length constraints and computing summary statistics.

Parameters
hillsInput hills
min_lengthMinimum number of scans required for a hill to be retained
Returns
Filtered hills with computed statistics (median m/z, apex RT/intensity, etc.)

◆ processTOF_()

void processTOF_ ( MSExperiment exp) const
private

Apply TOF-specific intensity filtering.

For TOF instruments, applies a specialized filtering step to reduce noise by considering local intensity distributions.

Parameters
expMS experiment to be filtered (modified in place)

◆ refineIsotopeCalibration_()

std::map<int, std::pair<double, double> > refineIsotopeCalibration_ ( const std::vector< PatternCandidate > &  candidates,
double  itol_ppm,
bool  enable_isotope_calib 
) const
private

Refine isotope mass calibration based on initial pattern candidates.

Aggregates mass errors from all candidates and derives per-isotope shift/sigma estimates to be used for subsequent filtering.

Parameters
candidatesInitial (optionally RT-filtered) pattern candidates
itol_ppmIsotope mass tolerance in ppm
enable_isotope_calibWhether isotope calibration is enabled
Returns
Map from isotope index (1..9) to (shift, sigma) in ppm

◆ run() [1/2]

void run ( FeatureMap feature_map)

Execute the Biosaur2 workflow on the stored MS1 experiment (simplified interface)

This is a convenience overload that discards the intermediate hills and peptide feature vectors. Use this when you only need the final FeatureMap output.

Parameters
feature_mapOutput FeatureMap that will receive the detected features and meta information
Note
The input MS data must be set via setMSData() before calling this method

◆ run() [2/2]

void run ( FeatureMap feature_map,
std::vector< Hill > &  hills,
std::vector< PeptideFeature > &  peptide_features 
)

Execute the Biosaur2 workflow on the stored MS1 experiment.

This is the main processing method that performs the complete feature detection pipeline:

  1. Filters input data to MS1 spectra only
  2. Optionally centroids profile data
  3. Applies TOF-specific intensity filtering if enabled
  4. Groups spectra by FAIMS compensation voltage if applicable
  5. Detects hills (continuous m/z traces across scans)
  6. Splits hills at valley points
  7. Detects isotope patterns and assembles peptide features
  8. Converts features to OpenMS FeatureMap format
Parameters
feature_mapOutput FeatureMap that receives the detected features and meta information
hillsOutput container for all detected hills that survived filtering steps. Useful for diagnostics and quality control.
peptide_featuresOutput container storing intermediate peptide feature representations before conversion to FeatureMap entries
Note
The input MS data must be set via setMSData() before calling this method
All spectra with MS level != 1 will be removed from the internal MS data
If profile_mode is enabled, spectra will be centroided using PeakPickerHiRes

◆ selectNonOverlappingPatterns_()

std::vector<PeptideFeature> selectNonOverlappingPatterns_ ( const std::vector< PatternCandidate > &  filtered_ready,
const std::vector< Hill > &  hills,
bool  negative_mode,
int  iuse,
double  itol_ppm 
) const
private

Greedily select non-overlapping isotope patterns and assemble peptide features.

Sorts candidates by pattern length and quality, resolves conflicts between overlapping hills, performs final averagine/cosine checks and converts surviving patterns into PeptideFeature entries.

Parameters
filtered_readyPattern candidates after calibration-based filtering
hillsAll hills
negative_modeWhether negative ion mode is enabled
iuseNumber of isotopes to use for intensity calculation
itol_ppmIsotope mass tolerance in ppm (for debug sanity checks)
Returns
Final list of peptide features with non-overlapping isotope patterns

◆ setMSData() [1/2]

void setMSData ( const MSExperiment ms_data)

Set the MS data used for feature detection (copy version)

Parameters
ms_dataInput MS experiment containing centroided or profile MS1 spectra

◆ setMSData() [2/2]

void setMSData ( MSExperiment &&  ms_data)

Set the MS data used for feature detection (move version)

Parameters
ms_dataInput MS experiment containing centroided or profile MS1 spectra (will be moved)

◆ shouldThrowForMissingIM_()

bool shouldThrowForMissingIM_ ( const MSSpectrum spectrum) const
private

Check if missing ion mobility data should be treated as an error.

The Python reference implementation gracefully degrades when ion mobility arrays are missing. This method implements the same behavior by returning false to allow processing without IM data.

Parameters
spectrumSpectrum to check
Returns
Currently always returns false (missing IM is not an error)

◆ splitHills_()

std::vector<Hill> splitHills_ ( const std::vector< Hill > &  hills,
double  hvf,
Size  min_length 
) const
private

Split hills at valley positions to separate co-eluting species.

Identifies local minima in the intensity profile and splits hills where the valley factor criterion is satisfied, effectively separating features that were initially grouped together.

Parameters
hillsInput hills to be split
hvfHill valley factor threshold (ratio of valley to neighboring peaks)
min_lengthMinimum length required for split segments to be retained
Returns
Hills after splitting, with valley-separated segments as independent hills

◆ updateMembers_()

void updateMembers_ ( )
overrideprotectedvirtual

Update internal member variables from parameters (called automatically when parameters change)

Reimplemented from DefaultParamHandler.

◆ writeHills()

void writeHills ( const std::vector< Hill > &  hills,
const String filename 
) const

Export the detected hills as TSV for diagnostic purposes.

This method writes detailed information about each detected hill to a TSV file, which is useful for debugging, quality control, and understanding the feature detection process.

Parameters
hillsHills to export (typically obtained from run())
filenameDestination file path for the TSV output

◆ writeTSV()

void writeTSV ( const std::vector< PeptideFeature > &  features,
const String filename 
) const

Export detected peptide features to a Biosaur2-compatible TSV file.

The TSV file format matches the output of the reference Python implementation, allowing for easy comparison and downstream processing with Biosaur2-compatible tools.

Parameters
featuresPeptide features to export (typically obtained from run())
filenameDestination file path for the TSV output

Member Data Documentation

◆ cmax_

int cmax_
private

Maximum charge state to consider.

◆ cmin_

int cmin_
private

Minimum charge state to consider.

◆ convex_hull_mode_

String convex_hull_mode_
private

Representation of feature convex hulls ("mass_traces" vs. "bounding_box")

◆ faims_merge_features_

bool faims_merge_features_
private

Whether to merge features at different FAIMS CV values representing the same analyte.

◆ hrttol_

double hrttol_
private

Maximum RT difference between monoisotopic and isotope apex (0=disable)

◆ htol_

double htol_
private

Mass tolerance in ppm for hill detection.

◆ hvf_

double hvf_
private

Hill valley factor for splitting hills at valleys.

◆ ignore_iso_calib_

bool ignore_iso_calib_
private

Whether to disable automatic isotope mass calibration.

◆ itol_

double itol_
private

Mass tolerance in ppm for isotope pattern detection.

◆ iuse_

int iuse_
private

Number of isotopes for intensity calculation (0=mono, -1=all, N=mono+N)

◆ ivf_

double ivf_
private

Isotope valley factor for splitting isotope patterns.

◆ maxmz_

double maxmz_
private

Maximum m/z value.

◆ mini_

double mini_
private

Minimum intensity threshold.

◆ minlh_

Size minlh_
private

Minimum number of scans required for a hill.

◆ minmz_

double minmz_
private

Minimum m/z value.

◆ ms_data_

MSExperiment ms_data_
private

Input LC-MS data.

◆ negative_mode_

bool negative_mode_
private

Whether to use negative ion mode.

◆ pasefmini_

double pasefmini_
private

Minimum intensity for PASEF/TIMS clusters after centroiding.

◆ pasefminlh_

Size pasefminlh_
private

Minimum number of points per PASEF/TIMS cluster.

◆ paseftol_

double paseftol_
private

Ion mobility tolerance for PASEF/TIMS data (0=disable)

◆ profile_mode_

bool profile_mode_
private

Whether to centroid profile data using PeakPickerHiRes.

◆ tof_mode_

bool tof_mode_
private

Whether to enable TOF-specific intensity filtering.

◆ use_hill_calib_

bool use_hill_calib_
private

Whether to use automatic hill mass tolerance calibration.