OpenMS
NuXLModificationsGenerator Class Reference

Enumerator of precursor-adduct masses for NuXL cross-link searches.

#include <OpenMS/ANALYSIS/NUXL/NuXLModificationsGenerator.h>

Static Public Member Functions

static NuXLModificationMassesResult initModificationMassesNA (const StringList &target_nucleotides, const StringList &nt_groups, const std::set< char > &can_xl, const StringList &mappings, const StringList &modifications, std::string sequence_restriction="", bool cysteine_adduct=false, Int max_length=4)
 Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration.
 

Static Private Member Functions

static bool notInSeq (const std::string &res_seq, const std::string &query)
 Test whether query is not present as a sorted-window permutation of res_seq.
 
static void generateTargetSequences (const std::string &res_seq, Size param_pos, const std::map< char, std::vector< char > > &map_source2target, StringList &target_sequences)
 Recursively expand res_seq into every target-substituted variant according to map_source2target.
 

Detailed Description

Enumerator of precursor-adduct masses for NuXL cross-link searches.

Builds the table of empirical formulas + monoisotopic masses + disambiguating nucleotide compositions that NuXL searches against. Driven by:

  • the list of target nucleotides with their monophosphate empirical formulas (e.g. "U=C9H13N2O9P" entries in target_nucleotides),
  • the list of source→target mappings used to express modifications (e.g. "C->T" entries in mappings),
  • the list of "nucleotide:+formula-formula" modifications applied to each target nucleotide (e.g. "U:+H2O-H2O" entries in modifications),
  • and an optional sequence_restriction limiting which oligomer compositions survive (an adduct is kept only if its nucleotide string matches, in any order, some contiguous window of the same length in the restriction sequence).

When the optional cysteine_adduct flag is true, a hardcoded DTT-derived C4H8S2O2 entry (commonly referred to as the 152 modification) is added to the result.

Member Function Documentation

◆ generateTargetSequences()

static void generateTargetSequences ( const std::string &  res_seq,
Size  param_pos,
const std::map< char, std::vector< char > > &  map_source2target,
StringList &  target_sequences 
)
static private

Recursively expand res_seq into every target-substituted variant according to map_source2target.

Starting at param_pos, walks res_seq character by character. Whenever the current character is a key in map_source2target with more than one candidate target, recurses once per candidate that differs from the current character; the unchanged path falls through. After the recursive walk, the resulting sequence is appended to target_sequences only when every position is either a non-mapped character or a character that is simultaneously a source and a valid target nucleotide (i.e. no pure source nucleotide is left). target_sequences is appended to; existing entries are preserved.
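The recursion described above can be sketched outside OpenMS as follows. This is not the library implementation: expandTargets and SourceToTargets are hypothetical names, std::vector<std::string> stands in for StringList, and the sketch assumes that a recursive call is made for every candidate target that differs from the current character, however many candidates the key has.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using SourceToTargets = std::map<char, std::vector<char>>;

// Hypothetical stand-in for the private member; a sketch of the documented
// behaviour, not the OpenMS code.
void expandTargets(const std::string& seq, std::size_t pos,
                   const SourceToTargets& source2target,
                   std::vector<std::string>& out)
{
  for (; pos < seq.size(); ++pos)
  {
    auto it = source2target.find(seq[pos]);
    if (it == source2target.end()) continue;  // non-mapped character
    for (char target : it->second)
    {
      if (target == seq[pos]) continue;       // unchanged path falls through
      std::string substituted = seq;
      substituted[pos] = target;
      expandTargets(substituted, pos + 1, source2target, out);
    }
  }
  // Keep seq only if no pure source nucleotide is left: every character is
  // either unmapped or listed among its own targets.
  const bool valid = std::all_of(seq.begin(), seq.end(), [&](char c) -> bool {
    auto it = source2target.find(c);
    if (it == source2target.end()) return true;
    const std::vector<char>& targets = it->second;
    return std::find(targets.begin(), targets.end(), c) != targets.end();
  });
  if (valid) out.push_back(seq);  // appended; existing entries are untouched
}
```

With a map in which 'C' may stay 'C' or become 'T', expanding "CC" from position 0 yields the four variants "CC", "CT", "TC" and "TT". If 'C' maps only to 'T', expanding "CA" yields just "TA", because "CA" still contains a pure source nucleotide.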

◆ initModificationMassesNA()

static NuXLModificationMassesResult initModificationMassesNA ( const StringList &  target_nucleotides,
const StringList &  nt_groups,
const std::set< char > &  can_xl,
const StringList &  mappings,
const StringList &  modifications,
std::string  sequence_restriction = "",
bool  cysteine_adduct = false,
Int  max_length = 4 
)
static

Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration.

Iterates source-nucleotide combinations up to max_length, applies the per-nucleotide modifications, and, when sequence_restriction is set, keeps only those compositions that match, in any order, a contiguous window of the same length in sequence_restriction. When sequence_restriction is empty, every length-k combination of source nucleotides for k in [1, max_length] is considered.
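To give a feel for the size of the unrestricted enumeration, the following standalone sketch lists every oligomer of length 1 to max_length over a source alphabet. It assumes that "combination" means every ordered string; if the library deduplicates permutations, the real count is smaller. enumerateOligos is a hypothetical name and not an OpenMS function.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration: all strings of length 1..max_length over
// `alphabet`, shortest first. Assumes ordered strings are enumerated.
std::vector<std::string> enumerateOligos(const std::string& alphabet, int max_length)
{
  std::vector<std::string> all;
  std::vector<std::string> current(1, std::string());  // start from the empty prefix
  for (int k = 1; k <= max_length; ++k)
  {
    std::vector<std::string> next;
    for (const std::string& prefix : current)
    {
      for (char nt : alphabet)
      {
        next.push_back(prefix + nt);  // extend each length-(k-1) string by one nucleotide
      }
    }
    all.insert(all.end(), next.begin(), next.end());
    current = std::move(next);
  }
  return all;
}
```

Under this reading, a four-letter alphabet with the default max_length of 4 gives 4 + 16 + 64 + 256 = 340 candidate sequences before any cross-link or restriction filtering.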

Parameters
[in]  target_nucleotides  Entries in the form "N=Empirical_formula" giving the monophosphate formula for each target nucleotide (e.g. "U=C9H13N2O9P"). The entries are split on '='.
[in]  nt_groups  Group identifiers consulted during the combinatorial expansion (forwarded verbatim into the per-modification application loop).
[in]  can_xl  Set of nucleotides eligible to carry a cross-link; compositions without at least one cross-linkable nucleotide are dropped.
[in]  mappings  Source→target nucleotide mappings in the form "X->Y" (split on "->"). Used to expand source-nucleotide sequences into target sequences and to derive the source alphabet for the combinatorial enumeration.
[in]  modifications  Per-nucleotide modification descriptors in the form "N:+formula-formula" (e.g. "U:+H2O-H2O"). The second character must be ':'; otherwise the call throws.
[in]  sequence_restriction  Optional NA reference sequence; when set, only oligo compositions that appear as a length-matched windowed permutation of the reference survive. Default empty (no restriction → all combinations up to max_length).
[in]  cysteine_adduct  If true, insert the hardcoded DTT-derived C4H8S2O2 adduct (the 152 modification) into the result.
[in]  max_length  Maximum oligomer length to enumerate when sequence_restriction is empty. Default 4.
Returns
Populated NuXLModificationMassesResult.
Exceptions
OpenMS::Exception::MissingInformation  when a modifications entry does not follow the "N:+formula-formula" format (specifically when the second character of the entry is not ':').
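The three entry formats can be illustrated with a small standalone parser. This is a sketch, not the OpenMS parsing code: the function names are hypothetical, single-character nucleotide codes are assumed, and std::runtime_error stands in for OpenMS::Exception::MissingInformation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// "U=C9H13N2O9P" -> {'U', "C9H13N2O9P"}; split on '='.
std::pair<char, std::string> parseTargetNucleotide(const std::string& entry)
{
  const std::size_t eq = entry.find('=');
  if (eq != 1) throw std::runtime_error("expected N=formula: " + entry);
  return std::make_pair(entry[0], entry.substr(eq + 1));
}

// "C->T" -> {'C', 'T'}; split on "->".
std::pair<char, char> parseMapping(const std::string& entry)
{
  const std::size_t arrow = entry.find("->");
  if (arrow != 1 || entry.size() != 4) throw std::runtime_error("expected X->Y: " + entry);
  return std::make_pair(entry[0], entry[3]);
}

// "U:+H2O-H2O" -> {'U', "+H2O-H2O"}; the second character must be ':'.
std::pair<char, std::string> parseModification(const std::string& entry)
{
  if (entry.size() < 2 || entry[1] != ':')
  {
    // Stand-in for OpenMS::Exception::MissingInformation.
    throw std::runtime_error("expected N:+formula-formula: " + entry);
  }
  return std::make_pair(entry[0], entry.substr(2));
}
```

Only the ':' check on modifications entries is documented as throwing; the stricter checks on the other two formats are assumptions made to keep the sketch safe.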

◆ notInSeq()

static bool notInSeq ( const std::string &  res_seq,
const std::string &  query 
)
static private

Test whether query is not present as a sorted-window permutation of res_seq.

Returns true iff no contiguous window of length query.size() in res_seq has the same multi-set of characters as query. The window comparison sorts both sides first, so the test is permutation-insensitive. Empty query short-circuits to false (treated as always present).
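The window test can be sketched as a standalone function. notInSeqSketch is a hypothetical name; the code follows the description above rather than the OpenMS source.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical re-statement of the documented behaviour: true iff no
// contiguous window of res_seq is a permutation of query.
bool notInSeqSketch(const std::string& res_seq, std::string query)
{
  if (query.empty()) return false;  // empty query is treated as always present
  std::sort(query.begin(), query.end());
  for (std::size_t i = 0; i + query.size() <= res_seq.size(); ++i)
  {
    std::string window = res_seq.substr(i, query.size());
    std::sort(window.begin(), window.end());  // compare as multi-sets
    if (window == query) return false;        // found a matching window
  }
  return true;  // also reached when query is longer than res_seq
}
```

For res_seq "ACGU", the query "GC" is found (the window "CG" is a permutation of it), whereas "AG" is not, because 'A' and 'G' never share a window of length two.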