Enumerator of precursor-adduct masses for NuXL cross-link searches.

#include <OpenMS/ANALYSIS/NUXL/NuXLModificationsGenerator.h>

Static Public Member Functions
static NuXLModificationMassesResult	initModificationMassesNA (const StringList &target_nucleotides, const StringList &nt_groups, const std::set< char > &can_xl, const StringList &mappings, const StringList &modifications, std::string sequence_restriction="", bool cysteine_adduct=false, Int max_length=4)
	Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration.

Static Private Member Functions
static bool	notInSeq (const std::string &res_seq, const std::string &query)
	Test whether `query` is not present as a sorted-window permutation of `res_seq`.

static void	generateTargetSequences (const std::string &res_seq, Size param_pos, const std::map< char, std::vector< char > > &map_source2target, StringList &target_sequences)
	Recursively expand `res_seq` into every target-substituted variant according to `map_source2target`.

Detailed Description

Enumerator of precursor-adduct masses for NuXL cross-link searches.

Builds the table of empirical formulas, monoisotopic masses, and disambiguating nucleotide compositions that NuXL searches against. The table is driven by:

the list of target nucleotides with their monophosphate empirical formulas (e.g. "U=C9H13N2O9P" entries in target_nucleotides),
the list of source→target mappings used to express modifications (e.g. "C->T" entries in mappings),
the list of "nucleotide:+formula-formula" modifications applied to each target nucleotide (e.g. "U:+H2O-H2O" entries in modifications),
and an optional sequence_restriction limiting which oligomer compositions survive (only adducts whose nucleotide composition matches some contiguous, equal-length window of the restriction sequence, in any order, are kept).

The optional cysteine_adduct flag adds a hardcoded DTT-derived C4H8S2O2 entry (commonly referred to as the 152 modification) to the result.
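To make the formula entries concrete, the following C++ sketch splits a target_nucleotides entry on '=' and computes the monoisotopic mass of the resulting formula. It is a hypothetical stand-alone illustration, not the OpenMS code path: monoisotopicMass and splitTargetEntry are invented names, only the elements C, H, N, O, P, and S are supported, and single-character nucleotide codes are assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper, not part of OpenMS: monoisotopic mass of a simple
// formula such as "C4H8S2O2" (element symbol followed by an optional count).
double monoisotopicMass(const std::string& formula)
{
  // Monoisotopic masses (Da) of the most abundant isotope of each element.
  static const std::map<std::string, double> kMass = {
    {"C", 12.0}, {"H", 1.00782503207}, {"N", 14.0030740048},
    {"O", 15.99491461956}, {"P", 30.97376163}, {"S", 31.97207100}};

  double total = 0.0;
  std::size_t i = 0;
  while (i < formula.size())
  {
    // Element symbol: one uppercase letter, optionally one lowercase letter.
    if (!std::isupper(static_cast<unsigned char>(formula[i])))
      throw std::invalid_argument("unexpected character in formula");
    std::string symbol(1, formula[i++]);
    if (i < formula.size() && std::islower(static_cast<unsigned char>(formula[i])))
      symbol += formula[i++];

    // Optional element count; a missing count means 1.
    const std::size_t digits_start = i;
    int count = 0;
    while (i < formula.size() && std::isdigit(static_cast<unsigned char>(formula[i])))
      count = count * 10 + (formula[i++] - '0');
    if (i == digits_start) count = 1;

    total += kMass.at(symbol) * count;  // throws std::out_of_range for unknown elements
  }
  return total;
}

// Split a target_nucleotides entry such as "U=C9H13N2O9P" on '='.
std::pair<char, std::string> splitTargetEntry(const std::string& entry)
{
  const std::size_t eq = entry.find('=');
  if (eq != 1) throw std::invalid_argument("expected \"N=formula\"");
  return {entry[0], entry.substr(eq + 1)};
}
```

With these standard element masses, C4H8S2O2 comes out at about 151.9966 Da (nominal mass 152, consistent with the "152" name), and the UMP formula C9H13N2O9P at about 324.0359 Da.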

Member Function Documentation

◆ generateTargetSequences()

static void generateTargetSequences	(	const std::string &	res_seq,
		Size	param_pos,
		const std::map< char, std::vector< char > > &	map_source2target,
		StringList &	target_sequences
	)

static private

Recursively expand res_seq into every target-substituted variant according to map_source2target.

Starting at param_pos, walks res_seq character by character. Whenever the current character is a key in map_source2target with more than one candidate target, the function recurses once per candidate that differs from the current character, while the current path continues with that character unchanged.

After the walk, the current sequence is appended to target_sequences only if every position holds either a non-mapped character or a character that is simultaneously a source and a valid target nucleotide (i.e. no pure source nucleotide is left). target_sequences is appended to; existing entries are preserved.
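A minimal C++ sketch of this expansion, written from the description above rather than from the OpenMS source, follows. expandTargets and isAllowedInTarget are hypothetical names. Two assumptions are labeled in the code: the sketch branches on every candidate that differs from the current character (which also covers single-candidate mappings such as X->U), and a mapped character counts as "both source and target" when it appears in its own candidate list.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using SourceToTarget = std::map<char, std::vector<char>>;

// Assumption: a mapped character may remain in a target sequence only if it
// appears in its own candidate list (e.g. 'C' in C->{C,T}).
bool isAllowedInTarget(char c, const SourceToTarget& m)
{
  auto it = m.find(c);
  if (it == m.end()) return true;  // not a source nucleotide at all
  return std::find(it->second.begin(), it->second.end(), c) != it->second.end();
}

// Hypothetical sketch of the recursive source-to-target expansion.
void expandTargets(const std::string& seq, std::size_t pos,
                   const SourceToTarget& m, std::vector<std::string>& out)
{
  for (std::size_t i = pos; i < seq.size(); ++i)
  {
    auto it = m.find(seq[i]);
    if (it == m.end()) continue;
    // Assumption: branch once per candidate that differs from seq[i];
    // the unchanged character simply continues along this loop.
    for (char t : it->second)
    {
      if (t == seq[i]) continue;
      std::string alt = seq;
      alt[i] = t;
      expandTargets(alt, i + 1, m, out);
    }
  }
  // Keep this sequence only if no pure source nucleotide remains.
  if (std::all_of(seq.begin(), seq.end(),
                  [&](char c) { return isAllowedInTarget(c, m); }))
  {
    out.push_back(seq);
  }
}
```

Because each branch only rewrites positions to the right of the one it just changed, every variant is produced exactly once, assuming no candidate list contains duplicates. For "CC" with C->{C,T}, the output holds CC, CT, TC, and TT.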

◆ initModificationMassesNA()

static NuXLModificationMassesResult initModificationMassesNA	(	const StringList &	target_nucleotides,
		const StringList &	nt_groups,
		const std::set< char > &	can_xl,
		const StringList &	mappings,
		const StringList &	modifications,
		std::string	sequence_restriction = "",
		bool	cysteine_adduct = false,
		Int	max_length = 4
	)

static

Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration.

Iterates source-nucleotide combinations up to max_length, applies the per-nucleotide modifications, and (optionally) keeps only those compositions that match some equal-length window of sequence_restriction up to permutation. When sequence_restriction is empty, every length-k combination of source nucleotides for k in [1, max_length] is considered.
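The unrestricted enumeration can be pictured with the C++ sketch below. It assumes that a "combination" is an unordered composition (a multiset), since the adduct formula depends only on which nucleotides are present, and it applies the can_xl rule (compositions without a cross-linkable nucleotide are dropped); modification application and the sequence_restriction filter are omitted. compositions and enumerateCompositions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch, not the OpenMS implementation: emit every composition
// (multiset) of length 1..max_length over `alphabet`, written with letters in
// alphabet order, that contains at least one nucleotide from `can_xl`.
void enumerateCompositions(const std::string& alphabet, std::size_t start,
                           std::string& current, std::size_t max_length,
                           const std::set<char>& can_xl,
                           std::vector<std::string>& out)
{
  if (!current.empty())
  {
    for (char c : current)
    {
      if (can_xl.count(c)) { out.push_back(current); break; }
    }
  }
  if (current.size() == max_length) return;
  // Only append letters at or after `start`, so each multiset appears once.
  for (std::size_t i = start; i < alphabet.size(); ++i)
  {
    current.push_back(alphabet[i]);
    enumerateCompositions(alphabet, i, current, max_length, can_xl, out);
    current.pop_back();
  }
}

std::vector<std::string> compositions(const std::string& alphabet,
                                      std::size_t max_length,
                                      const std::set<char>& can_xl)
{
  std::vector<std::string> out;
  std::string current;
  enumerateCompositions(alphabet, 0, current, max_length, can_xl, out);
  return out;
}
```

For the alphabet ACGU and max_length = 4 this yields 4 + 10 + 20 + 35 = 69 compositions when every nucleotide is cross-linkable; with can_xl = {U} and max_length = 2 only U, UU, AU, CU, and GU remain.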

Parameters

[in]	target_nucleotides	Entries in the form `"N=Empirical_formula"` giving the monophosphate formula for each target nucleotide (e.g. `"U=C9H13N2O9P"`). The entries are split on `'='`.
[in]	nt_groups	Group identifiers consulted during the combinatorial expansion (forwarded verbatim into the per-modification application loop).
[in]	can_xl	Set of nucleotides eligible to carry a cross-link; compositions without at least one cross-linkable nucleotide are dropped.
[in]	mappings	Source→target nucleotide mappings in the form `"X->Y"` (split on `"->"`). Used to expand source-nucleotide sequences into target sequences and to derive the source alphabet for the combinatorial enumeration.
[in]	modifications	Per-nucleotide modification descriptors in the form `"N:+formula-formula"` (e.g. `"U:+H2O-H2O"`). The second character must be `':'`; otherwise the call throws.
[in]	sequence_restriction	Optional NA reference sequence; when set, only oligomer compositions that match some equal-length window of the reference up to permutation survive. Default empty (no restriction → all combinations up to `max_length`).
[in]	cysteine_adduct	If `true`, insert the hardcoded DTT-derived `C4H8S2O2` adduct (the 152 modification) into the result.
[in]	max_length	Maximum oligomer length to enumerate when `sequence_restriction` is empty. Default `4`.

Returns: Populated NuXLModificationMassesResult.

Exceptions

OpenMS::Exception::MissingInformation when a modifications entry does not follow the "N:+formula-formula" format (specifically when the second character of the entry is not ':').
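The format check behind this exception can be sketched as follows. parseModification and ModificationDescriptor are hypothetical names, std::invalid_argument stands in for OpenMS::Exception::MissingInformation, and the additional rejection of a formula that does not start with '+' or '-' is an assumption of the sketch rather than documented behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed form of a "N:+formula-formula" descriptor.
struct ModificationDescriptor
{
  char nucleotide;
  std::vector<std::pair<char, std::string>> terms;  // ('+' or '-', formula)
};

ModificationDescriptor parseModification(const std::string& entry)
{
  // Documented rule: the second character must be ':'.
  if (entry.size() < 2 || entry[1] != ':')
    throw std::invalid_argument("expected \"N:+formula-formula\": " + entry);

  ModificationDescriptor d{entry[0], {}};
  for (std::size_t i = 2; i < entry.size(); ++i)
  {
    const char c = entry[i];
    if (c == '+' || c == '-')
      d.terms.push_back({c, ""});   // start a new signed term
    else if (!d.terms.empty())
      d.terms.back().second += c;   // extend the current formula
    else                            // assumption: a sign is required first
      throw std::invalid_argument("formula must start with '+' or '-': " + entry);
  }
  return d;
}
```

For `"U:+H2O-H2O"` this yields nucleotide U with the terms (+, H2O) and (-, H2O), while an entry such as `"U+H2O"` fails the second-character check.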

◆ notInSeq()

static bool notInSeq	(	const std::string &	res_seq,
		const std::string &	query
	)

static private

Test whether query is not present as a sorted-window permutation of res_seq.

Returns true iff no contiguous window of length query.size() in res_seq has the same multi-set of characters as query. The window comparison sorts both sides first, so the test is permutation-insensitive. Empty query short-circuits to false (treated as always present).
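The documented semantics translate into a short C++ sketch; notInSeqSketch is a hypothetical stand-alone name, not the private OpenMS member.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Sketch of the documented behavior: true iff no window of res_seq with
// length query.size() contains the same multiset of characters as query.
bool notInSeqSketch(const std::string& res_seq, const std::string& query)
{
  if (query.empty()) return false;  // empty query counts as present
  std::string sorted_query = query;
  std::sort(sorted_query.begin(), sorted_query.end());
  // If query is longer than res_seq, the loop body never runs.
  for (std::size_t i = 0; i + query.size() <= res_seq.size(); ++i)
  {
    std::string window = res_seq.substr(i, query.size());
    std::sort(window.begin(), window.end());
    if (window == sorted_query) return false;  // matching window found
  }
  return true;
}
```

Sorting each window costs O(n·k log k) for a reference of length n and a query of length k, which is negligible for the short oligomers enumerated here (max_length defaults to 4).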
