OpenMS
Enumerator of precursor-adduct masses for NuXL cross-link searches.
#include <OpenMS/ANALYSIS/NUXL/NuXLModificationsGenerator.h>
Static Public Member Functions
| static NuXLModificationMassesResult | initModificationMassesNA (const StringList &target_nucleotides, const StringList &nt_groups, const std::set< char > &can_xl, const StringList &mappings, const StringList &modifications, std::string sequence_restriction="", bool cysteine_adduct=false, Int max_length=4) |
| | Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration. |
Static Private Member Functions
| static bool | notInSeq (const std::string &res_seq, const std::string &query) |
| | Test whether query is not present as a sorted-window permutation of res_seq. |
| static void | generateTargetSequences (const std::string &res_seq, Size param_pos, const std::map< char, std::vector< char > > &map_source2target, StringList &target_sequences) |
| | Recursively expand res_seq into every target-substituted variant according to map_source2target. |
Enumerator of precursor-adduct masses for NuXL cross-link searches.
Builds the table of empirical formulas, monoisotopic masses, and disambiguating nucleotide compositions that NuXL searches against. The enumeration is driven by:
- the target nucleotides and their monophosphate formulas (e.g. "U=C9H13N2O9P" entries in target_nucleotides),
- the source-to-target nucleotide mappings (e.g. "C->T" entries in mappings),
- "nucleotide:+formula-formula" modifications applied to each target nucleotide (e.g. "U:+H2O-H2O" entries in modifications),
- an optional sequence_restriction limiting which oligomer compositions survive (only adducts whose nucleotide string matches an equal-length window of the restriction sequence, ignoring order within the window, are kept).

The optional cysteine_adduct flag adds a hardcoded DTT-derived C4H8S2O2 entry (commonly referred to as the 152 modification) to the result.
generateTargetSequences()
static private
Recursively expand res_seq into every target-substituted variant according to map_source2target.
Starting at param_pos, walks res_seq character by character. Whenever the current character is a key in map_source2target with more than one candidate target, recurses once per candidate that differs from the current character; the unchanged path falls through. After the recursive walk, the resulting sequence is appended to target_sequences only when every position is either a non-mapped character or a character that is simultaneously a source and a valid target nucleotide (i.e. no pure source nucleotide is left). target_sequences is appended to; existing entries are preserved.
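The recursion described here can be sketched in plain standard C++. This is an illustration under stated assumptions and not the OpenMS implementation: the name expandTargetSequences is hypothetical, std::vector<std::string> stands in for StringList, and the sketch recurses for every candidate that differs from the current character regardless of how many candidates the key has.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for generateTargetSequences().
void expandTargetSequences(const std::string& seq, std::size_t pos,
                           const std::map<char, std::vector<char>>& source2target,
                           std::vector<std::string>& out)
{
  for (; pos < seq.size(); ++pos)
  {
    const auto it = source2target.find(seq[pos]);
    if (it == source2target.end()) continue;  // character is not a source nucleotide

    for (char target : it->second)
    {
      if (target == seq[pos]) continue;  // unchanged path: handled by falling through
      std::string variant = seq;
      variant[pos] = target;
      expandTargetSequences(variant, pos + 1, source2target, out);
    }
  }

  // Keep seq only if no pure source nucleotide is left: every mapped
  // character must also be listed among its own targets.
  for (char c : seq)
  {
    const auto it = source2target.find(c);
    if (it == source2target.end()) continue;
    const bool also_target =
        std::find(it->second.begin(), it->second.end(), c) != it->second.end();
    if (!also_target) return;
  }
  out.push_back(seq);  // appended; earlier entries in out are untouched
}
```

For a map where 'C' may stay 'C' or become 'T', the input "CC" yields "TT", "TC", "CT", and "CC". If 'T' maps only to 'U', the input "AT" yields only "AU", because "AT" still contains a pure source nucleotide.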
initModificationMassesNA()
static
Build the full set of precursor adducts (formula + mass + nucleotide composition) for the given configuration.
Iterates over source-nucleotide combinations up to max_length, applies the per-nucleotide modifications, and, when sequence_restriction is set, keeps only those compositions that match an equal-length window of sequence_restriction regardless of nucleotide order. When sequence_restriction is empty, every length-k combination of source nucleotides for k in [1, max_length] is considered.
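To give a feel for the size of the unrestricted search space, here is a minimal sketch of the enumeration step. It assumes that "combination" means an order-insensitive multiset of nucleotides (repeats allowed) and that compositions without a cross-linkable nucleotide are filtered out at this stage. Both function names are hypothetical, and modifications and mass calculation are left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: append all multisets of size `remaining`, drawn from
// alphabet[first..], to `out`. Each multiset is emitted in alphabet order.
static void extendComposition(const std::string& alphabet, std::size_t first,
                              std::size_t remaining, std::string& current,
                              std::vector<std::string>& out)
{
  if (remaining == 0)
  {
    out.push_back(current);
    return;
  }
  for (std::size_t i = first; i < alphabet.size(); ++i)
  {
    current.push_back(alphabet[i]);
    // Recurse with i (not i + 1) so the same nucleotide may repeat.
    extendComposition(alphabet, i, remaining - 1, current, out);
    current.pop_back();
  }
}

// Hypothetical sketch: all compositions of length 1..max_length that contain
// at least one nucleotide from can_xl.
std::vector<std::string> enumerateCompositions(const std::string& alphabet,
                                               std::size_t max_length,
                                               const std::set<char>& can_xl)
{
  std::vector<std::string> all;
  std::string current;
  for (std::size_t k = 1; k <= max_length; ++k)
  {
    extendComposition(alphabet, 0, k, current, all);
  }

  std::vector<std::string> kept;
  for (const std::string& composition : all)
  {
    for (char c : composition)
    {
      if (can_xl.count(c) != 0)
      {
        kept.push_back(composition);
        break;
      }
    }
  }
  return kept;
}
```

Under these assumptions, a four-letter alphabet with max_length 4 and every nucleotide cross-linkable gives 4 + 10 + 20 + 35 = 69 compositions before any modifications are applied.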
Parameters
| [in] | target_nucleotides | Entries in the form "N=Empirical_formula" giving the monophosphate formula for each target nucleotide (e.g. "U=C9H13N2O9P"). The entries are split on '='. |
| [in] | nt_groups | Group identifiers consulted during the combinatorial expansion (forwarded verbatim into the per-modification application loop). |
| [in] | can_xl | Set of nucleotides eligible to carry a cross-link; compositions without at least one cross-linkable nucleotide are dropped. |
| [in] | mappings | Source→target nucleotide mappings in the form "X->Y" (split on "->"). Used to expand source-nucleotide sequences into target sequences and to derive the source alphabet for the combinatorial enumeration. |
| [in] | modifications | Per-nucleotide modification descriptors in the form "N:+formula-formula" (e.g. "U:+H2O-H2O"). The second character must be ':'; otherwise the call throws OpenMS::Exception::MissingInformation. |
| [in] | sequence_restriction | Optional NA reference sequence; when set, only oligo compositions that appear as a length-matched windowed permutation of the reference survive. Default empty (no restriction → all combinations up to max_length). |
| [in] | cysteine_adduct | If true, insert the hardcoded DTT-derived C4H8S2O2 adduct (the 152 modification) into the result. |
| [in] | max_length | Maximum oligomer length to enumerate when sequence_restriction is empty. Default 4. |
Exceptions
| OpenMS::Exception::MissingInformation | when a modifications entry does not follow the "N:+formula-formula" format (specifically when the second character of the entry is not ':'). |
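The three string formats in the table above can be illustrated with a small standalone parser. This is a sketch and not the OpenMS code: splitOnce, ParsedModification, and parseModification are hypothetical names, and std::invalid_argument stands in for OpenMS::Exception::MissingInformation so the example compiles without OpenMS.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: split "U=C9H13N2O9P" on "=" or "C->T" on "->".
std::pair<std::string, std::string> splitOnce(const std::string& entry,
                                              const std::string& separator)
{
  const std::size_t pos = entry.find(separator);
  if (pos == std::string::npos)
  {
    throw std::invalid_argument("separator '" + separator + "' missing in: " + entry);
  }
  return {entry.substr(0, pos), entry.substr(pos + separator.size())};
}

// Hypothetical result type for one "N:+formula-formula" entry.
struct ParsedModification
{
  char nucleotide;              // e.g. 'U'
  std::string formula_changes;  // e.g. "+H2O-H2O", left unparsed in this sketch
};

// Mirrors the documented check: the second character must be ':'.
ParsedModification parseModification(const std::string& entry)
{
  if (entry.size() < 2 || entry[1] != ':')
  {
    throw std::invalid_argument("expected 'N:+formula-formula', got: " + entry);
  }
  return ParsedModification{entry[0], entry.substr(2)};
}
```

Here splitOnce("U=C9H13N2O9P", "=") gives the pair ("U", "C9H13N2O9P"), splitOnce("C->T", "->") gives ("C", "T"), and parseModification("U+H2O") throws because its second character is '+'.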
notInSeq()
static private
Test whether query is not present as a sorted-window permutation of res_seq.
Returns true iff no contiguous window of length query.size() in res_seq has the same multiset of characters as query. The window comparison sorts both sides first, so the test is insensitive to the order of characters within a window. An empty query short-circuits to false (it is treated as always present).
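The window test can be sketched with the standard library alone. The function name notInSequence is hypothetical and this is not the OpenMS implementation. Returning true when query is longer than res_seq is an assumption that follows from the "no window" wording rather than documented behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for notInSeq(): true if no window of res_seq with
// length query.size() is a permutation of query.
bool notInSequence(const std::string& res_seq, std::string query)
{
  if (query.empty()) return false;                  // empty query: always present
  if (query.size() > res_seq.size()) return true;   // assumption: no window fits

  std::sort(query.begin(), query.end());
  for (std::size_t start = 0; start + query.size() <= res_seq.size(); ++start)
  {
    std::string window = res_seq.substr(start, query.size());
    std::sort(window.begin(), window.end());
    if (window == query) return false;  // found a permuted match
  }
  return true;
}
```

For res_seq "ACGU", the query "GC" is found (the window "CG" is a permutation of it), so the function returns false. The query "AG" matches no two-character window and returns true. Sorting every window costs O(n · m log m) for a sequence of length n and a query of length m, which is modest for short oligomer queries.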