OpenMS
ProSEAlgorithm.h File Reference

Classes

class  ProSEAlgorithm
 Fragment-index-based peptide database search algorithm (experimental).
 
struct  ProSEAlgorithm::RunStatistics
 Per-run identification statistics for the end-of-search report.
 
struct  ProSEAlgorithm::SharedSearchStats
 Configuration, database and fragment-index facts shared across all input files of one ProSE invocation.
 
struct  ProSEAlgorithm::SearchResult
 Comprehensive search result including modification analysis.
 
struct  ProSEAlgorithm::MultiFileSearchResult
 Multi-file search result bundle.
 
struct  ProSEAlgorithm::SearchContext
 Prepared per-database state shared across multiple spectrum files.
 
struct  ProSEAlgorithm::AnnotatedHit_
 Slim hit structure, used because storing all scored candidates in PeptideHit objects takes too much space.
 
struct  ProSEAlgorithm::DecoyStrategy_
 Resolved decoy handling for one concrete input database.
 
struct  ProSEAlgorithm::CalibrationResult_
 Result of a calibration pass.
 

Namespaces

namespace  OpenMS
 Main OpenMS namespace.
 

Class Documentation

◆ OpenMS::ProSEAlgorithm::RunStatistics

struct OpenMS::ProSEAlgorithm::RunStatistics

Per-run identification statistics for the end-of-search report.

Populated by collectRunStatistics_() after a single spectrum file has been searched (post-FDR). A few fields are instead captured at well-defined points during search() (target/decoy counts pre-FDR, achieved q-value, timing). All counts refer to one input file. Cross-file/shared facts (database, fragment index, configuration) live in SharedSearchStats instead.

Class Members
double achieved_psm_fdr = -1.0 max retained q-value after FDR (<0 = n/a)
map< Int, Size > charge_histogram precursor charge -> PSM count
Size decoy_psms = 0 decoy PSMs in the final IDs (after FDR, if applied)
bool fdr_applied = false true if PSM-level FDR filtering ran
double frag_err_mad = 0.0
double frag_err_median = 0.0
double frag_err_recommended = 0.0
bool frag_tol_valid = false true if fragment-error estimate present
double hyperscore_max = 0.0
double hyperscore_median = 0.0
double hyperscore_min = 0.0
string input_file spectrum file this run searched (basename or path)
Size matched_spectra = 0 spectra with >=1 retained PSM in the final IDs (after FDR, if applied)
map< Size, Size > missed_cleavage_histogram missed cleavages -> PSM count
Size ms2_spectra = 0 number of MS2 spectra in the input
double prec_err_mad = 0.0
double prec_err_median = 0.0
double prec_err_recommended = 0.0
bool prec_tol_valid = false true if precursor-error estimate present
bool score_stats_valid = false true if the hyperscore_* fields are meaningful
double seconds_calibration = 0.0 calibration pass wall time (0 if disabled)
double seconds_fdr = 0.0 FDR filtering wall time (0 if not applied)
double seconds_search = 0.0 scoring + post-processing wall time
Size target_psms = 0 target PSMs in the final IDs (after FDR, if applied)
Size unique_peptides = 0 distinct peptide sequences among top hits
Size unique_proteins = 0 distinct protein accessions among top hits
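
The validity flags and the negative sentinel above imply that a consumer should check a flag before reading the fields it guards. The following is a minimal sketch of that pattern, not OpenMS code: RunStatsView is a hypothetical stand-in that mirrors only a few of the documented fields and defaults, and summarize() is a hypothetical helper.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in mirroring a subset of the documented RunStatistics
// fields and defaults. This is NOT the OpenMS type itself.
struct RunStatsView
{
  double achieved_psm_fdr = -1.0;  // <0 means "not available"
  bool fdr_applied = false;        // true if PSM-level FDR filtering ran
  bool score_stats_valid = false;  // guards hyperscore_median
  double hyperscore_median = 0.0;
  std::size_t target_psms = 0;
  std::size_t decoy_psms = 0;
};

// Hypothetical helper: builds a one-line summary and reads a guarded field
// only when its flag (or sentinel) says the value is meaningful.
std::string summarize(const RunStatsView& s)
{
  std::ostringstream out;
  out << "targets=" << s.target_psms << " decoys=" << s.decoy_psms;
  if (s.fdr_applied && s.achieved_psm_fdr >= 0.0)
  {
    out << " q<=" << s.achieved_psm_fdr;
  }
  else
  {
    out << " q=n/a";
  }
  if (s.score_stats_valid)
  {
    out << " median_hyperscore=" << s.hyperscore_median;
  }
  return out.str();
}
```

A default-constructed RunStatsView therefore reports the q-value as n/a and omits the hyperscore, rather than printing the placeholder defaults as if they were measurements.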

◆ OpenMS::ProSEAlgorithm::SharedSearchStats

struct OpenMS::ProSEAlgorithm::SharedSearchStats

Configuration, database and fragment-index facts shared across all input files of one ProSE invocation.

Computed once (the fragment index is built once and reused), so these costs/counts must NOT be summed per file. Populated by the multi-file searchWithModificationAnalysis() overloads.

Class Members
bool calibration_enabled = false
bool chunked = false
string database_file FASTA path (empty for in-memory db)
Size db_decoy_proteins = 0 decoy entries in the searched (augmented) db
Size db_target_proteins = 0 target entries in the searched (augmented) db
string decoy_mode "generated" | "external" | "none (target-only)"
string enzyme
vector< string > fixed_mods
double fragment_tol = 0.0
string fragment_tol_unit
Size indexed_fragments = 0 theoretical fragments in the index (summed over chunks)
Size indexed_peptides = 0 peptides in the fragment index (summed over chunks)
vector< string > ion_series
Int max_charge = 0
Int min_charge = 0
Size missed_cleavages = 0
bool open_search = false
double precursor_tol_lower = 0.0
string precursor_tol_unit
double precursor_tol_upper = 0.0
double protein_fdr_threshold = 0.0
double psm_fdr_threshold = 0.0
double seconds_index_build = 0.0 decoy generation + fragment index build wall time
double seconds_total = 0.0 whole-search wall time (set by the caller)
bool snes_mode = false
vector< string > variable_mods
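
Because the shared costs are incurred once per invocation, a report that totals wall time should add seconds_index_build once and sum only the per-file timings. The sketch below illustrates this with hypothetical stand-in structs (SharedTiming, PerFileTiming) and a hypothetical helper accountedSeconds(). It assumes the per-file timings do not themselves include the index build, which this reference does not state explicitly.

```cpp
#include <vector>

// Hypothetical stand-ins mirroring only the timing fields documented for
// SharedSearchStats and RunStatistics. These are NOT the OpenMS types.
struct SharedTiming
{
  double seconds_index_build = 0.0;  // decoy generation + index build, paid once
};

struct PerFileTiming
{
  double seconds_calibration = 0.0;
  double seconds_search = 0.0;
  double seconds_fdr = 0.0;
};

// Hypothetical helper: wall time accounted for by the documented fields.
// The shared cost is added exactly once, never once per file.
double accountedSeconds(const SharedTiming& shared,
                        const std::vector<PerFileTiming>& runs)
{
  double total = shared.seconds_index_build;
  for (const PerFileTiming& r : runs)
  {
    total += r.seconds_calibration + r.seconds_search + r.seconds_fdr;
  }
  return total;
}
```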

◆ OpenMS::ProSEAlgorithm::SearchResult

struct OpenMS::ProSEAlgorithm::SearchResult

Comprehensive search result including modification analysis.

This structure contains all outputs from an open search including:

  • Standard protein and peptide identifications
  • Delta mass statistics table (histogram of mass shifts)
  • PTM statistics table (mapped modifications with residue analysis)
  • Per-run identification statistics for the end-of-search report
Class Members
ExitCodes exit_code = ExitCodes::EXECUTION_OK
bool is_open_search = false
OpenSearchAnalysisResult modification_analysis
PeptideIdentificationList peptide_ids
vector< ProteinIdentification > protein_ids
RunStatistics stats

◆ OpenMS::ProSEAlgorithm::MultiFileSearchResult

struct OpenMS::ProSEAlgorithm::MultiFileSearchResult

Multi-file search result bundle.

Returned by the file-list searchWithModificationAnalysis() overloads. Holds one SearchResult per input file (in per_file, in input order) and a single aggregate result whose peptide_ids are the concatenation of all per-file PSMs and whose modification_analysis is computed once on the pooled set of PSMs.

Special cases for aggregate:

  • When the input list contains exactly one file, aggregate is left nearly empty (only is_open_search and exit_code are set), because a single-file pooled aggregate would just duplicate per_file[0] and re-run modification analysis on the same PSMs. Callers should use per_file[0] as the result in this case.
  • When every per-file run failed, aggregate.exit_code is set to the first non-OK per-file exit code (so callers can inspect it without walking the per_file vector).

The aggregate's protein_ids template is taken from the first successful per-file result (search parameters are identical across files by construction), with the primary MS run path overwritten to list every input file.

Class Members
SearchResult aggregate
bool decoy_is_prefix = true Position of decoy_string (true = prefix, false = suffix).
string decoy_string

Effective decoy marker resolved from the shared database, for a caller-side merged-PSM protein-FDR step (e.g. the ProSE TOPP tool's -out_merged path). Empty when the search was target-only (decoys=ignore).

bool have_decoys = false True when the searched databases contained decoys (FDR possible).
vector< SearchResult > per_file
SharedSearchStats shared

Configuration / database / fragment-index facts shared across all input files (the index is built once and reused), for the end-of-search report.
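
The single-file special case means callers cannot read aggregate unconditionally. The sketch below shows one way to pick the result to report. ResultSketch, MultiResultSketch and effectiveResult() are hypothetical stand-ins that mirror only the per_file/aggregate layout described above, not the OpenMS types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins mirroring only the documented layout of
// SearchResult and MultiFileSearchResult. These are NOT the OpenMS types.
struct ResultSketch
{
  std::size_t n_psms = 0;
};

struct MultiResultSketch
{
  std::vector<ResultSketch> per_file;  // one entry per input file, input order
  ResultSketch aggregate;              // pooled result (nearly empty for one file)
};

// Hypothetical helper: returns the result a caller should report, or
// nullptr when there were no input files at all.
const ResultSketch* effectiveResult(const MultiResultSketch& r)
{
  if (r.per_file.empty())
  {
    return nullptr;
  }
  if (r.per_file.size() == 1)
  {
    return &r.per_file.front();  // aggregate is left nearly empty in this case
  }
  return &r.aggregate;
}
```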

◆ OpenMS::ProSEAlgorithm::SearchContext

struct OpenMS::ProSEAlgorithm::SearchContext

Prepared per-database state shared across multiple spectrum files.

Holds the (decoy-augmented) protein database and the built FragmentIndex so that searching N spectrum files against the same FASTA pays the index build cost only once. Construct via prepareContext() and pass to the context-taking search() overload.

Class Members
vector< FASTAEntry > db
bool decoy_is_prefix = true Position of decoy_string (true = prefix, false = suffix).
string decoy_string

Effective decoy marker carried by db (auto-detected for external decoys, or the configured prefix for internally generated ones). Empty when the search is target-only. Feeds PeptideIndexing and the protein-level FDR so they recognise the same decoys that were searched.

FragmentIndex fragment_index
bool have_decoys = false

True when db contains decoy entries (generated or external), i.e. target-decoy FDR is possible.

bool release_fragment_index_after_scoring = false

When true, the context-taking search() overload releases fragment_index (via clear()) after scoreSpectraAgainstIndex_ returns, reclaiming its heap footprint before the PeptideIndexing Aho-Corasick pass, which is otherwise the RSS high-water mark on large databases. Set by single-use callers (e.g. the internal search(spectra, fasta_db, ...) wrapper, which creates a throw-away ctx). The default (false) preserves the reuse contract for callers that built ctx via prepareContext() and want to run multiple search() calls against it.
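
The reuse contract and the release flag can be illustrated with a toy model. Everything in the sketch below is a hypothetical stand-in: the real signatures of prepareContext() and search() are not shown in this reference, so the functions here (prepareContextSketch, searchSketch) only mimic the documented behaviour of building the index once, scoring against it, and optionally clearing it after scoring.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the fragment index: just a vector of values.
struct IndexSketch
{
  std::vector<double> fragments;
  void clear()
  {
    fragments.clear();
    fragments.shrink_to_fit();  // request release of the heap storage
  }
};

// Hypothetical stand-in for SearchContext (only the two relevant members).
struct ContextSketch
{
  IndexSketch fragment_index;
  bool release_fragment_index_after_scoring = false;
};

// Hypothetical: pays the (toy) index build cost once.
ContextSketch prepareContextSketch(std::size_t n_fragments)
{
  ContextSketch ctx;
  ctx.fragment_index.fragments.assign(n_fragments, 0.0);
  return ctx;
}

// Hypothetical: "scores" against the index, then releases it if requested,
// so that later post-processing would run with the index memory reclaimed.
std::size_t searchSketch(ContextSketch& ctx)
{
  const std::size_t scored_against = ctx.fragment_index.fragments.size();
  if (ctx.release_fragment_index_after_scoring)
  {
    ctx.fragment_index.clear();
  }
  return scored_against;
}
```

With the flag left at false, repeated searchSketch() calls see the same index. Once the flag is set, the first call still scores against the full index, but the context is no longer usable for further searches.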

◆ OpenMS::ProSEAlgorithm::DecoyStrategy_

struct OpenMS::ProSEAlgorithm::DecoyStrategy_

Resolved decoy handling for one concrete input database.

Produced by resolveDecoyStrategy_() and consumed by buildDecoyAugmentedDB_() and the downstream PeptideIndexing / FDR steps, so the same decoys that are searched are also the ones scored.

Class Members
string decoy_string effective marker for PeptideIndexing + protein FDR
bool generate {false} reverse target proteins to synthesise decoys
bool have_decoys {false} searched DB will contain decoys (FDR possible)
bool is_prefix {true} position of decoy_string
bool strip_existing {false} drop pre-existing decoy entries before searching
bool strip_is_prefix {true} position of strip_string
string strip_string marker of pre-existing decoys to strip
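
To make the generate, decoy_string and is_prefix fields concrete, the sketch below reverses a target sequence and attaches a marker as prefix or suffix, then recognises decoys by the same marker. EntrySketch, makeReversedDecoy() and isDecoy() are hypothetical and the "DECOY_" marker in the usage note is only an example value. Plain whole-sequence reversal is an assumption based on the field description; the actual decoy generation in ProSEAlgorithm may differ in detail.

```cpp
#include <string>

// Hypothetical stand-in for a FASTA entry (identifier + sequence only).
struct EntrySketch
{
  std::string identifier;
  std::string sequence;
};

// Hypothetical: builds a decoy by reversing the whole target sequence and
// marking the identifier with decoy_string as prefix or suffix.
EntrySketch makeReversedDecoy(const EntrySketch& target,
                              const std::string& decoy_string,
                              bool is_prefix)
{
  EntrySketch decoy;
  decoy.identifier = is_prefix ? decoy_string + target.identifier
                               : target.identifier + decoy_string;
  decoy.sequence.assign(target.sequence.rbegin(), target.sequence.rend());
  return decoy;
}

// Hypothetical: true if identifier carries the marker at the given position.
// An empty marker (target-only search) never matches.
bool isDecoy(const std::string& identifier,
             const std::string& marker,
             bool is_prefix)
{
  if (marker.empty() || identifier.size() < marker.size())
  {
    return false;
  }
  const std::size_t pos = is_prefix ? 0 : identifier.size() - marker.size();
  return identifier.compare(pos, marker.size(), marker) == 0;
}
```

For example, makeReversedDecoy({"P1", "PEPTIDEK"}, "DECOY_", true) yields the identifier "DECOY_P1" with the sequence "KEDITPEP", and isDecoy() recognises it with the same marker and position. Using one marker for both generation and recognition is what keeps the searched decoys and the FDR decoys in agreement.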

◆ OpenMS::ProSEAlgorithm::CalibrationResult_

struct OpenMS::ProSEAlgorithm::CalibrationResult_

Result of a calibration pass.

Holds the estimated precursor and fragment tolerances computed from confident PSMs during the calibration pass. When success is false, the tolerance values are undefined and should not be used.

Class Members
double cal_lower {0} calibrated lower magnitude (valid iff !extreme_bias && success)
double cal_upper {0} calibrated upper magnitude (valid iff !extreme_bias && success)
bool extreme_bias {false} |shift| >= spread — writeback skipped (test observability)
double fragment_shift {0} reserved for future fragment m/z shift correction
double fragment_tolerance {0} estimated fragment tolerance (same unit as configured)
double precursor_shift {0} signed median of precursor errors (calibration bias)
double precursor_spread {0} median(|e - shift|) + 3 * MAD(|e - shift|)
bool success {false} true if enough PSMs were found for reliable estimation
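
The formulas in the member descriptions can be worked through in code. The sketch below computes precursor_shift, precursor_spread and extreme_bias from a list of signed precursor errors. It rests on several assumptions: MAD is read as the median absolute deviation of the values |e - shift| around their own median, min_psms is a hypothetical threshold standing in for "enough PSMs", and success is set independently of extreme_bias. CalibrationSketch and estimateCalibration() are hypothetical names, not the OpenMS implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty vector (taken by value so it can be sorted locally).
double medianOf(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Hypothetical stand-in mirroring a subset of CalibrationResult_.
struct CalibrationSketch
{
  double precursor_shift = 0.0;
  double precursor_spread = 0.0;
  bool extreme_bias = false;
  bool success = false;
};

// Hypothetical: estimates shift and spread from signed precursor errors.
CalibrationSketch estimateCalibration(const std::vector<double>& errors,
                                      std::size_t min_psms)
{
  CalibrationSketch r;
  if (errors.empty() || errors.size() < min_psms)
  {
    return r;  // success stays false; other values must not be used
  }
  r.precursor_shift = medianOf(errors);  // signed median of the errors

  std::vector<double> dev;  // |e - shift|
  for (double e : errors)
  {
    dev.push_back(std::fabs(e - r.precursor_shift));
  }
  const double med_dev = medianOf(dev);

  std::vector<double> dev_of_dev;  // deviations of |e - shift| from their median
  for (double d : dev)
  {
    dev_of_dev.push_back(std::fabs(d - med_dev));
  }
  const double mad = medianOf(dev_of_dev);

  r.precursor_spread = med_dev + 3.0 * mad;
  r.extreme_bias = std::fabs(r.precursor_shift) >= r.precursor_spread;
  r.success = true;
  return r;
}
```

For the errors {1, 2, 3, 4, 5} this gives a shift of 3, deviations {2, 1, 0, 1, 2} with median 1 and MAD 1, hence a spread of 1 + 3 * 1 = 4. Since |3| < 4, extreme_bias is false. For {10, 11, 12} the shift is 11 and the spread is 1, so extreme_bias is true.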