OpenMS
ProForma Class Reference

ProForma v2 peptidoform notation parser and data structures. More...

#include <OpenMS/CHEMISTRY/ProForma.h>

Classes

struct  AdductIon
 Adduct ion specification for charge state. More...
 
struct  AmbiguousRegion
 Ambiguous amino acid region. More...
 
struct  ConversionIssue
 Description of a conversion issue from Peptidoform to AASequence. More...
 
struct  CrossLinkGroup
 Cross-link group connecting sites across chains. More...
 
struct  CvAccession
 Controlled vocabulary accession for a modification. More...
 
struct  FormulaTag
 Chemical formula with optional charge. More...
 
struct  GlobalModification
 Global modification applied to specific locations. More...
 
struct  GlycanComposition
 Glycan composition specification. More...
 
struct  InfoTag
 Info tag for arbitrary text annotations. More...
 
struct  IsotopeReplacement
 Isotope replacement for stable isotope labeling. More...
 
struct  Label
 Label for cross-links, branches, or ambiguous grouping. More...
 
struct  LabileModification
 Labile modification that may be lost during fragmentation. More...
 
struct  MassDelta
 Mass delta modification with optional source hint. More...
 
struct  Modification
 A modification with one or more alternative tags. More...
 
struct  ModifiedRange
 Modified sequence range with shared modifications. More...
 
struct  NamedMod
 Named modification with optional CV prefix hint. More...
 
class  ParseError
 Structured parse error with context for ProForma parsing. More...
 
struct  Peptidoform
 A single peptidoform (one peptide chain) More...
 
struct  PeptidoformIon
 A peptidoform ion (one or more chains with optional charge) More...
 
struct  PositionConstraint
 Position constraint specifying allowed residues for a modification. More...
 
struct  SequenceElement
 A single amino acid with its modifications. More...
 
struct  UnlocalisedMod
 Unlocalised modification with optional occurrence count. More...
 

Public Types

enum class  ConversionPolicy { FAIL_ON_LOSS , DROP_UNLOCALISED , BEST_EFFORT }
 Conversion policy for transforming Peptidoform to AASequence. More...
 
enum class  ConversionIssueType {
  UNRESOLVED_MOD , UNLOCALISED_MOD , LABILE_MOD , GLOBAL_MOD ,
  AMBIGUOUS_MOD , AMBIGUOUS_REGION , MODIFIED_RANGE , CROSS_LINK ,
  MULTIPLE_CHAINS , ALTERNATIVE_MODS , UNSUPPORTED_FEATURE
}
 Issue type for AASequence conversion problems. More...
 
enum class  WriteMode { LOSSLESS , CANONICAL }
 Write mode for ProForma string serialization. More...
 
enum class  CvDatabase {
  UNIMOD , MOD , RESID , XLMOD ,
  GNO
}
 Controlled vocabulary database prefix for modification accessions. More...
 
enum class  ErrorCode {
  UNEXPECTED_CHARACTER , UNCLOSED_BRACKET , UNMATCHED_BRACKET , INVALID_CV_PREFIX ,
  INVALID_CV_ACCESSION , INVALID_AMINO_ACID , INVALID_MASS_VALUE , INVALID_FORMULA ,
  UNKNOWN_MONOSACCHARIDE , DANGLING_CROSSLINK_LABEL , EMPTY_SEQUENCE , INVALID_CHARGE ,
  INVALID_OCCURRENCE_SPECIFIER , UNEXPECTED_END_OF_INPUT , INTERNAL_ERROR
}
 Error codes for programmatic handling of ProForma parse errors. More...
 
using ModificationTag = std::variant< CvAccession, NamedMod, MassDelta, FormulaTag, GlycanComposition, InfoTag, PositionConstraint >
 Variant type representing any modification tag content.
 
using SequenceSection = std::variant< SequenceElement, AmbiguousRegion, ModifiedRange >
 Variant type representing a section of the sequence.
 
using GlobalModEntry = std::variant< IsotopeReplacement, GlobalModification >
 Variant type for global modification entries.
 
using ChargeState = std::variant< int, std::vector< AdductIon > >
 Charge state specification.
 

Static Public Member Functions

static Peptidoform parse (const String &input)
 Parse a ProForma string into a Peptidoform AST.
 
static PeptidoformIon parseIon (const String &input)
 Parse a ProForma string into a PeptidoformIon AST.
 
static String toString (const Peptidoform &pf, WriteMode mode=WriteMode::LOSSLESS)
 Convert a Peptidoform AST back to ProForma string notation.
 
static String toString (const PeptidoformIon &pfi, WriteMode mode=WriteMode::LOSSLESS)
 Convert a PeptidoformIon AST back to ProForma string notation.
 
static String toJSON (const Peptidoform &pf)
 Convert a Peptidoform to JSON string representation.
 
static Peptidoform peptidoformFromJSON (const String &json_str)
 Construct a Peptidoform from JSON string.
 
static String toJSON (const PeptidoformIon &pfi)
 Convert a PeptidoformIon to JSON string representation.
 
static PeptidoformIon peptidoformIonFromJSON (const String &json_str)
 Construct a PeptidoformIon from JSON string.
 
static void resolveModifications (Peptidoform &pf)
 Resolve all modifications in a Peptidoform using ModificationsDB.
 
static AASequence toAASequence (const Peptidoform &pf, ConversionPolicy policy=ConversionPolicy::FAIL_ON_LOSS)
 Convert a Peptidoform to an OpenMS AASequence.
 
static Peptidoform fromAASequence (const AASequence &seq)
 Create a Peptidoform from an OpenMS AASequence.
 
static bool isRepresentableAsAASequence (const Peptidoform &pf)
 Check if a Peptidoform can be fully represented as an AASequence.
 
static std::vector< ConversionIssue > getAASequenceConversionIssues (const Peptidoform &pf)
 Get a list of all issues that would arise during AASequence conversion.
 
static bool canCalculateMass (const Peptidoform &pf)
 Check if mass can be calculated for a Peptidoform.
 
static bool canCalculateMass (const PeptidoformIon &pfi)
 Check if mass can be calculated for a PeptidoformIon.
 
static std::vector< ConversionIssue > getMassCalculationIssues (const Peptidoform &pf)
 Get issues preventing mass calculation for a Peptidoform.
 
static std::vector< ConversionIssue > getMassCalculationIssues (const PeptidoformIon &pfi)
 Get issues preventing mass calculation for a PeptidoformIon.
 
static double getMonoWeight (const Peptidoform &pf)
 Calculate monoisotopic mass of a Peptidoform.
 
static double getMonoWeight (const PeptidoformIon &pfi)
 Calculate monoisotopic mass of a PeptidoformIon.
 
static double getMZ (const PeptidoformIon &pfi)
 Calculate m/z for a PeptidoformIon at its specified charge state.
 
static double getMZ (const Peptidoform &pf, int charge)
 Calculate m/z for a Peptidoform at a given charge state.
 
static std::optional< double > tryGetMonoWeight (const Peptidoform &pf)
 Try to calculate monoisotopic mass of a Peptidoform (non-throwing)
 
static std::optional< double > tryGetMonoWeight (const Peptidoform &pf, std::vector< ConversionIssue > &issues_out)
 Try to calculate monoisotopic mass with diagnostic information.
 
static std::optional< double > tryGetMonoWeight (const PeptidoformIon &pfi)
 Try to calculate monoisotopic mass of a PeptidoformIon (non-throwing)
 
static std::optional< double > tryGetMonoWeight (const PeptidoformIon &pfi, std::vector< ConversionIssue > &issues_out)
 Try to calculate monoisotopic mass of PeptidoformIon with diagnostics.
 
static std::optional< double > tryGetMZ (const Peptidoform &pf, int charge)
 Try to calculate m/z for a Peptidoform (non-throwing)
 
static std::optional< double > tryGetMZ (const Peptidoform &pf, int charge, std::vector< ConversionIssue > &issues_out)
 Try to calculate m/z for a Peptidoform with diagnostics.
 
static std::optional< double > tryGetMZ (const PeptidoformIon &pfi)
 Try to calculate m/z for a PeptidoformIon (non-throwing)
 
static std::optional< double > tryGetMZ (const PeptidoformIon &pfi, std::vector< ConversionIssue > &issues_out)
 Try to calculate m/z for a PeptidoformIon with diagnostics.
 
static bool canGenerateSpectrum (const Peptidoform &pf)
 Check if a theoretical spectrum can be generated for a Peptidoform.
 
static bool canGenerateSpectrum (const PeptidoformIon &pfi)
 Check if a theoretical spectrum can be generated for a PeptidoformIon.
 
static std::vector< ConversionIssue > getSpectrumGenerationIssues (const Peptidoform &pf)
 Get issues preventing spectrum generation for a Peptidoform.
 
static std::vector< ConversionIssue > getSpectrumGenerationIssues (const PeptidoformIon &pfi)
 Get issues preventing spectrum generation for a PeptidoformIon.
 
static MSSpectrum generateSpectrum (const Peptidoform &pf, int min_charge=1, int max_charge=1, const std::string &ion_types="by", bool add_losses=false, bool add_metainfo=true)
 Generate a theoretical MS/MS spectrum for a Peptidoform.
 
static MSSpectrum generateSpectrum (const PeptidoformIon &pfi, int min_charge=1, int max_charge=1, const std::string &ion_types="by", bool add_losses=false, bool add_metainfo=true)
 Generate a theoretical MS/MS spectrum for a PeptidoformIon.
 
static const char * errorCodeToString (ErrorCode code)
 Convert error code to human-readable string.
 

Detailed Description

ProForma v2 peptidoform notation parser and data structures.

This class provides parsing, serialization, and conversion functionality for the ProForma v2 peptidoform notation standard. It contains nested types that form the Abstract Syntax Tree (AST) representation of parsed ProForma strings.

Usage example:

String input = "EM[UNIMOD:35]K";
ProForma::Peptidoform pf = ProForma::parse(input);
// pf now contains the parsed AST

// Convert back to string
String output = ProForma::toString(pf);

// Convert to AASequence
AASequence seq = ProForma::toAASequence(pf);

Class Documentation

◆ OpenMS::ProForma::AdductIon

struct OpenMS::ProForma::AdductIon

Adduct ion specification for charge state.

Represents an adduct ion contributing to the charge state of a peptidoform ion. Multiple adducts can combine to give the total charge.

ProForma notation: Na:z+1 in /[Na:z+1,H:z+1]

Collaboration diagram for ProForma::AdductIon:
[legend]
Class Members
int charge The charge contribution of this adduct.
String formula The adduct formula (e.g., "Na", "H", "K")
optional< int > occurrence Optional occurrence count from ^N suffix.

◆ OpenMS::ProForma::AmbiguousRegion

struct OpenMS::ProForma::AmbiguousRegion

Ambiguous amino acid region.

Represents a region where the amino acid sequence is uncertain. ProForma notation: (?DQ) means either D or Q at this position.

Collaboration diagram for ProForma::AmbiguousRegion:
[legend]
Class Members
vector< SequenceElement > elements The ambiguous amino acid possibilities.

◆ OpenMS::ProForma::ConversionIssue

struct OpenMS::ProForma::ConversionIssue

Description of a conversion issue from Peptidoform to AASequence.

Records problems encountered when attempting to convert a ProForma Peptidoform to an OpenMS AASequence representation.

Collaboration diagram for ProForma::ConversionIssue:
[legend]
Class Members
String description Human-readable description.
size_t position Position in sequence (SIZE_MAX if not position-specific)
ConversionIssueType type The type of issue.

◆ OpenMS::ProForma::CrossLinkGroup

struct OpenMS::ProForma::CrossLinkGroup

Cross-link group connecting sites across chains.

Groups together all sites that share a cross-link label. Each site is identified by its chain index and position within that chain.

Derived during parsing from matching #XL labels.

Collaboration diagram for ProForma::CrossLinkGroup:
[legend]
Class Members
String label The cross-link label (e.g., XL1)
vector< pair< size_t, size_t > > sites (chain_index, site_index) pairs

◆ OpenMS::ProForma::CvAccession

struct OpenMS::ProForma::CvAccession

Controlled vocabulary accession for a modification.

Represents a modification specified by a CV accession number, e.g., UNIMOD:35 for Oxidation. The accession string contains only the identifier portion (e.g., "35" for UNIMOD:35).

Collaboration diagram for ProForma::CvAccession:
[legend]
Class Members
String accession The accession identifier (e.g., "35" for UNIMOD:35, full string for GNO)
CvDatabase database The source database (UNIMOD, MOD, RESID, XLMOD, or GNO)
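
The split between database prefix and identifier described above can be illustrated with a minimal, self-contained sketch. It does not use the OpenMS headers: the `CvDatabase` and `CvAccession` types below are stand-ins that only mirror the documented members, and `splitCvAccession` is a hypothetical helper, not part of the library. The sketch keeps the text after the first colon for every prefix, which may differ from how the library stores GNO accessions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Stand-ins mirroring the documented enum and struct; not the OpenMS types.
enum class CvDatabase { UNIMOD, MOD, RESID, XLMOD, GNO };

struct CvAccession
{
  CvDatabase database;
  std::string accession;
};

// Hypothetical helper: split "PREFIX:ID" at the first colon.
// Returns std::nullopt for a missing colon, an empty identifier, or an unknown prefix.
std::optional<CvAccession> splitCvAccession(const std::string& tag)
{
  const std::size_t colon = tag.find(':');
  if (colon == std::string::npos || colon + 1 >= tag.size()) return std::nullopt;

  const std::string prefix = tag.substr(0, colon);
  CvDatabase db;
  if (prefix == "UNIMOD") db = CvDatabase::UNIMOD;
  else if (prefix == "MOD") db = CvDatabase::MOD;
  else if (prefix == "RESID") db = CvDatabase::RESID;
  else if (prefix == "XLMOD") db = CvDatabase::XLMOD;
  else if (prefix == "GNO") db = CvDatabase::GNO;
  else return std::nullopt;

  // Keep only the identifier portion, e.g. "35" for "UNIMOD:35".
  return CvAccession{db, tag.substr(colon + 1)};
}
```

Returning `std::optional` keeps the sketch non-throwing; the real parser reports such failures through ParseError with codes like INVALID_CV_PREFIX.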

◆ OpenMS::ProForma::FormulaTag

struct OpenMS::ProForma::FormulaTag

Chemical formula with optional charge.

Represents a modification specified by chemical formula. The optional charge is specified via the :z+N suffix in ProForma (e.g., Formula:C12H20O2:z+2).

Collaboration diagram for ProForma::FormulaTag:
[legend]
Class Members
optional< int > charge Optional charge from :z+N suffix.
String formula_string The chemical formula string (e.g., "C12H20O2")

◆ OpenMS::ProForma::GlobalModification

struct OpenMS::ProForma::GlobalModification

Global modification applied to specific locations.

A global modification applies the same modification to all occurrences of specified residues or termini throughout the peptide.

ProForma notation: <[TMT6plex]@K,N-term>

Collaboration diagram for ProForma::GlobalModification:
[legend]
Class Members
vector< String > locations Target locations ("K", "N-term", "C-term:K", etc.)
Modification modification The modification to apply.

◆ OpenMS::ProForma::GlycanComposition

struct OpenMS::ProForma::GlycanComposition

Glycan composition specification.

Represents a glycan modification as a composition of monosaccharides. Each component can be either a named monosaccharide (e.g., "Hex", "HexNAc") or a custom formula specification.

Example: Glycan:HexNAc1Hex2 -> [(HexNAc, 1), (Hex, 2)]

Collaboration diagram for ProForma::GlycanComposition:
[legend]
Class Members
typedef variant< String, FormulaTag > Monosaccharide A monosaccharide component: either a name (String) or a custom formula (FormulaTag)
Class Members
vector< pair< Monosaccharide, int > > components List of (monosaccharide, count) pairs.

◆ OpenMS::ProForma::InfoTag

struct OpenMS::ProForma::InfoTag

Info tag for arbitrary text annotations.

Represents an INFO: tag in ProForma that carries arbitrary text metadata about a modification or site. Example: INFO:provenance_data

Collaboration diagram for ProForma::InfoTag:
[legend]
Class Members
String text The info text content.

◆ OpenMS::ProForma::IsotopeReplacement

struct OpenMS::ProForma::IsotopeReplacement

Isotope replacement for stable isotope labeling.

Represents global replacement of an element with a specific isotope, used for stable isotope labeling experiments.

ProForma notation: <13C> or <15N> or <D>

Collaboration diagram for ProForma::IsotopeReplacement:
[legend]
Class Members
String isotope The isotope specification (e.g., "13C", "15N", "D")

◆ OpenMS::ProForma::LabileModification

struct OpenMS::ProForma::LabileModification

Labile modification that may be lost during fragmentation.

Labile modifications are typically lost during ionization or fragmentation and thus may not be observed in MS/MS spectra.

ProForma notation: {Glycan:Hex}PEPTIDE

Collaboration diagram for ProForma::LabileModification:
[legend]
Class Members
Modification modification The labile modification.

◆ OpenMS::ProForma::Modification

struct OpenMS::ProForma::Modification

A modification with one or more alternative tags.

In ProForma, a modification can have multiple alternatives separated by |, representing uncertainty about the exact modification. Each alternative consists of a tag and an optional label.

Example: K[Phospho|+79.97] has two alternatives

The resolved_mod field is populated by resolveModifications() and points to the ResidueModification in ModificationsDB (for the first/primary alternative).

Collaboration diagram for ProForma::Modification:
[legend]
Class Members
vector< pair< ModificationTag, optional< Label > > > alternatives Each alternative is a (tag, optional_label) pair.
const ResidueModification * resolved_mod = nullptr Resolved modification pointer (populated by resolveModifications()); points to the ResidueModification for the first alternative, if found.
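
As a rough illustration of the `|` separator, the following self-contained sketch splits the text between the square brackets into its alternatives. `splitAlternatives` is a hypothetical helper written for this example only; it ignores labels and any escaping rules, and it is not how the OpenMS parser is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the bracket body at each '|' into alternatives.
// A body without '|' yields a single alternative.
std::vector<std::string> splitAlternatives(const std::string& body)
{
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true)
  {
    const std::size_t bar = body.find('|', start);
    if (bar == std::string::npos)
    {
      parts.push_back(body.substr(start)); // last (or only) alternative
      break;
    }
    parts.push_back(body.substr(start, bar - start));
    start = bar + 1;
  }
  return parts;
}
```

For the body of K[Phospho|+79.97] this yields the two strings "Phospho" and "+79.97", which would then be classified as individual tags.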

◆ OpenMS::ProForma::ModifiedRange

struct OpenMS::ProForma::ModifiedRange

Modified sequence range with shared modifications.

Represents a subsequence where one or more modifications apply to the entire range, but the exact position is uncertain.

ProForma notation: (EOSFORMS)[+19.0523] means +19.0523 applies somewhere in EOSFORMS

Collaboration diagram for ProForma::ModifiedRange:
[legend]
Class Members
vector< SequenceElement > elements The amino acids in the range.
vector< Modification > modifications Modifications applying to the entire range.

◆ OpenMS::ProForma::NamedMod

struct OpenMS::ProForma::NamedMod

Named modification with optional CV prefix hint.

Represents a modification specified by name, optionally with a CV prefix hint to disambiguate which database to search (e.g., "U:Oxidation" for UniMod, "M:Oxidation" for PSI-MOD).

Collaboration diagram for ProForma::NamedMod:
[legend]
Class Members
optional< CvDatabase > cv_hint Optional CV prefix hint (U, M, R, X, G)
String name The modification name (e.g., "Oxidation", "Phospho")

◆ OpenMS::ProForma::Peptidoform

struct OpenMS::ProForma::Peptidoform

A single peptidoform (one peptide chain)

Represents a complete peptide chain including:

  • Optional name identifier (from v2.1 extension)
  • Global modifications (<13C>, <[TMT6plex]@K>)
  • Unlocalised modifications ([Phospho]?)
  • Labile modifications ({Glycan:Hex})
  • N-terminal modifications ([Acetyl]-)
  • The amino acid sequence with modifications
  • C-terminal modifications (-[Amidated])
Collaboration diagram for ProForma::Peptidoform:
[legend]
Class Members
vector< Modification > c_term_mods C-terminal modifications: -[Amidated].
optional< ChargeState > charge Optional per-chain charge (for chimeric spectra)
vector< GlobalModEntry > global_mods Global modifications: <13C>, <[TMT6plex]@K>
vector< LabileModification > labile_mods Labile modifications: {Glycan:Hex}.
vector< Modification > n_term_mods N-terminal modifications: [Acetyl]-.
optional< String > name Optional name from (>name) v2.1 extension.
vector< SequenceSection > sequence The sequence with modifications.
vector< UnlocalisedMod > unlocalised_mods Unlocalised modifications: [Phospho]?

◆ OpenMS::ProForma::PeptidoformIon

struct OpenMS::ProForma::PeptidoformIon

A peptidoform ion (one or more chains with optional charge)

Represents one or more peptide chains that form a single ion species. Multiple chains can be present in cross-linked or multi-chain entities.

ProForma notation: chains are separated by // (cross-linked) or + (chimeric)

Collaboration diagram for ProForma::PeptidoformIon:
[legend]
Class Members
vector< Peptidoform > chains One or more peptide chains (separated by // or + in ProForma)
optional< ChargeState > charge Optional charge state specification.
bool is_chimeric = false True if chains are chimeric (+ separator), false if cross-linked (//)
optional< String > name Optional name from (>>name) v2.1 extension.

◆ OpenMS::ProForma::PositionConstraint

struct OpenMS::ProForma::PositionConstraint

Position constraint specifying allowed residues for a modification.

Represents a Position: tag in ProForma that constrains where a modification can be localized. This is typically used as an alternative to a modification to indicate its possible sites.

Example: [Oxidation|Position:M] means Oxidation can only occur at M residues

Collaboration diagram for ProForma::PositionConstraint:
[legend]
Class Members
bool c_term = false True if modification can be at C-terminus.
bool n_term = false True if modification can be at N-terminus.
vector< char > residues List of allowed amino acid residues.

◆ OpenMS::ProForma::SequenceElement

struct OpenMS::ProForma::SequenceElement

A single amino acid with its modifications.

Represents one position in the peptide sequence: the amino acid residue and zero or more modifications attached to it.

Collaboration diagram for ProForma::SequenceElement:
[legend]
Class Members
char amino_acid Single-letter amino acid code (A-Z)
vector< Modification > modifications Modifications at this position.

◆ OpenMS::ProForma::UnlocalisedMod

struct OpenMS::ProForma::UnlocalisedMod

Unlocalised modification with optional occurrence count.

Represents a modification that is known to exist on the peptide but whose exact position is unknown. The occurrence specifies how many instances of this modification are present.

ProForma notation: [Phospho]?PEPTIDE or [Phospho]^2?PEPTIDE

Collaboration diagram for ProForma::UnlocalisedMod:
[legend]
Class Members
vector< Modification > modifications The unlocalised modification(s)
optional< int > occurrence Optional occurrence count from ^N suffix.

Member Typedef Documentation

◆ ChargeState

using ChargeState = std::variant< int, std::vector<AdductIon> >

Charge state specification.

The charge state can be specified as either:

  • A simple integer charge (/2, /+2, /-1)
  • A list of adduct ions (/[Na:z+1,H:z+1])
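
A variant like this is typically consumed with std::visit. The sketch below is self-contained and uses a stand-in `AdductIon` struct that mirrors the documented members rather than the OpenMS type. `totalCharge` is a hypothetical helper, and summing charge times occurrence (defaulting to 1) is an assumption about how adducts combine, not documented library behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Stand-in mirroring the documented AdductIon members; not the OpenMS type.
struct AdductIon
{
  std::string formula;
  int charge = 0;
  std::optional<int> occurrence;
};

using ChargeState = std::variant<int, std::vector<AdductIon>>;

// Hypothetical helper: reduce either alternative to a single net charge.
int totalCharge(const ChargeState& cs)
{
  return std::visit([](const auto& value) -> int
  {
    using T = std::decay_t<decltype(value)>;
    if constexpr (std::is_same_v<T, int>)
    {
      return value; // simple integer charge, e.g. /2
    }
    else
    {
      int sum = 0;
      for (const AdductIon& ion : value)
      {
        // Assumption: a missing occurrence count means one copy of the adduct.
        sum += ion.charge * ion.occurrence.value_or(1);
      }
      return sum;
    }
  }, cs);
}
```

Under these assumptions, /[Na:z+1,H:z+1] reduces to a net charge of 2, the same as /2.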

◆ GlobalModEntry

Variant type for global modification entries.

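A GlobalModEntry can be either:

  • IsotopeReplacement: Global isotope replacement (e.g., <13C>)
  • GlobalModification: Modification applied to specific locations (e.g., <[TMT6plex]@K>)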

◆ ModificationTag

Variant type representing any modification tag content.

A ModificationTag can be one of:

  • CvAccession: CV database accession (e.g., UNIMOD:35)
  • NamedMod: Named modification with optional CV hint (e.g., Oxidation, U:Oxidation)
  • MassDelta: Mass difference with optional source (e.g., +15.9949, Obs:+79.978)
  • FormulaTag: Chemical formula (e.g., Formula:C12H20O2)
  • GlycanComposition: Glycan composition (e.g., Glycan:HexNAc1Hex2)
  • InfoTag: Info annotation (e.g., INFO:comment)
  • PositionConstraint: Allowed residue positions (e.g., Position:MKC)

◆ SequenceSection

Variant type representing a section of the sequence.

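A SequenceSection can be:

  • SequenceElement: A single amino acid with its modifications
  • AmbiguousRegion: Ambiguous amino acid region (e.g., (?DQ))
  • ModifiedRange: Modified sequence range with shared modifications (e.g., (EOSFORMS)[+19.0523])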

Member Enumeration Documentation

◆ ConversionIssueType

enum class ConversionIssueType
strong

Issue type for AASequence conversion problems.

Enumerator
UNRESOLVED_MOD 

Modification could not be found in ModificationsDB.

UNLOCALISED_MOD 

Modification has no specific position.

LABILE_MOD 

Labile modification (lost during fragmentation)

GLOBAL_MOD 

Global modification (applies to multiple sites)

AMBIGUOUS_MOD 

Ambiguously localized modification.

AMBIGUOUS_REGION 

Ambiguous amino acid region.

MODIFIED_RANGE 

Modified range (position uncertain)

CROSS_LINK 

Cross-link between chains.

MULTIPLE_CHAINS 

Multiple peptide chains.

ALTERNATIVE_MODS 

Multiple alternative modifications (|)

UNSUPPORTED_FEATURE 

Other unsupported ProForma feature.

◆ ConversionPolicy

enum class ConversionPolicy
strong

Conversion policy for transforming Peptidoform to AASequence.

Controls how the conversion handles modifications that cannot be directly represented in AASequence (e.g., unlocalised, labile, or ambiguous modifications).

Enumerator
FAIL_ON_LOSS 

Fail if any modification cannot be fully represented.

DROP_UNLOCALISED 

Drop unlocalised, labile, and global modifications.

BEST_EFFORT 

Try to convert as much as possible, skip unsupported.
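
One way to read the three policies is as a rule that decides whether a given issue aborts the conversion. The sketch below is self-contained and uses stand-in enums with the documented enumerator names. `isFatal` is a hypothetical helper, and the choice that DROP_UNLOCALISED treats every other issue type as fatal is an assumption made for illustration; the source does not state it.

```cpp
#include <cassert>

// Stand-ins with the documented enumerator names; not the OpenMS types.
enum class ConversionPolicy { FAIL_ON_LOSS, DROP_UNLOCALISED, BEST_EFFORT };

enum class ConversionIssueType
{
  UNRESOLVED_MOD, UNLOCALISED_MOD, LABILE_MOD, GLOBAL_MOD,
  AMBIGUOUS_MOD, AMBIGUOUS_REGION, MODIFIED_RANGE, CROSS_LINK,
  MULTIPLE_CHAINS, ALTERNATIVE_MODS, UNSUPPORTED_FEATURE
};

// Hypothetical helper: would this issue abort the conversion under the policy?
bool isFatal(ConversionIssueType type, ConversionPolicy policy)
{
  switch (policy)
  {
    case ConversionPolicy::FAIL_ON_LOSS:
      return true; // any loss of information fails
    case ConversionPolicy::DROP_UNLOCALISED:
      // Documented: unlocalised, labile, and global modifications are dropped.
      // Assumption: every other issue type still fails.
      return !(type == ConversionIssueType::UNLOCALISED_MOD ||
               type == ConversionIssueType::LABILE_MOD ||
               type == ConversionIssueType::GLOBAL_MOD);
    case ConversionPolicy::BEST_EFFORT:
      return false; // unsupported parts are skipped
  }
  return true; // unreachable for valid enumerators
}
```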

◆ CvDatabase

enum class CvDatabase
strong

Controlled vocabulary database prefix for modification accessions.

Identifies the source database for a modification accession in ProForma notation. Examples: UNIMOD:35, MOD:00046, XLMOD:02001, GNO:G59626AS

Enumerator
UNIMOD 

UniMod database (https://www.unimod.org/)

MOD 

PSI-MOD ontology (https://www.ebi.ac.uk/ols/ontologies/mod)

RESID 

RESID database.

XLMOD 

Cross-linking modifications ontology.

GNO 

Glycan naming ontology.

◆ ErrorCode

enum class ErrorCode
strong

Error codes for programmatic handling of ProForma parse errors.

These error codes provide machine-readable categorization of parsing failures, enabling downstream code to handle specific error types appropriately.

Enumerator
UNEXPECTED_CHARACTER 

Unexpected character encountered during parsing.

UNCLOSED_BRACKET 

Opening bracket without matching close bracket.

UNMATCHED_BRACKET 

Closing bracket without matching open bracket.

INVALID_CV_PREFIX 

Invalid controlled vocabulary prefix (e.g., not UNIMOD, MOD, etc.)

INVALID_CV_ACCESSION 

Invalid CV accession number format.

INVALID_AMINO_ACID 

Invalid amino acid one-letter code.

INVALID_MASS_VALUE 

Invalid mass value format or value.

INVALID_FORMULA 

Invalid chemical formula.

UNKNOWN_MONOSACCHARIDE 

Unknown monosaccharide abbreviation.

DANGLING_CROSSLINK_LABEL 

Crosslink label without a matching partner.

EMPTY_SEQUENCE 

Empty sequence provided.

INVALID_CHARGE 

Invalid charge state specification.

INVALID_OCCURRENCE_SPECIFIER 

Invalid occurrence specifier (e.g., ^2)

UNEXPECTED_END_OF_INPUT 

Unexpected end of input string.

INTERNAL_ERROR 

Internal parser error (should not occur)

◆ WriteMode

enum class WriteMode
strong

Write mode for ProForma string serialization.

Controls whether the output preserves original formatting (LOSSLESS) or produces a normalized, deterministic output (CANONICAL).

Enumerator
LOSSLESS 

Preserve original spelling/formatting where possible (e.g., mass delta text)

CANONICAL 

Normalized output: uppercase CV prefixes, sorted mods, 4 decimal places for masses.
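
The "4 decimal places for masses" rule can be sketched with standard formatting. This self-contained example is not the library's serializer: `formatMassDelta` is a hypothetical helper, and always writing an explicit sign is an assumption based on ProForma mass deltas such as +15.9949.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render a mass delta with an explicit sign and 4 decimals.
std::string formatMassDelta(double delta)
{
  char buffer[64];
  // "%+.4f" forces a leading '+' or '-' and rounds to four decimal places.
  std::snprintf(buffer, sizeof(buffer), "%+.4f", delta);
  return std::string(buffer);
}
```

With this formatting, a mass delta written as +79.97 in the input would be emitted as +79.9700, whereas LOSSLESS mode is documented to keep the original text where possible.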

Member Function Documentation

◆ canCalculateMass() [1/2]

static bool canCalculateMass ( const Peptidoform & pf)
static

Check if mass can be calculated for a Peptidoform.

Returns true if all components have known masses:

  • All amino acids are standard residues
  • All modifications are either resolved or have explicit mass deltas
  • No ambiguous regions with different possible amino acids
Parameters
[in]  pf  The Peptidoform to check (modifications will be resolved if needed)
Returns
True if mass calculation is possible

◆ canCalculateMass() [2/2]

static bool canCalculateMass ( const PeptidoformIon & pfi)
static

Check if mass can be calculated for a PeptidoformIon.

Returns true if mass can be calculated for all chains. Cross-links are handled correctly (cross-linker mass counted once).

Parameters
[in]  pfi  The PeptidoformIon to check
Returns
True if mass calculation is possible

◆ canGenerateSpectrum() [1/2]

static bool canGenerateSpectrum ( const Peptidoform & pf)
static

Check if a theoretical spectrum can be generated for a Peptidoform.

Returns true if the Peptidoform can be converted to AASequence and fragmented. Use getSpectrumGenerationIssues() to get detailed diagnostics.

Parameters
[in]  pf  The Peptidoform to check
Returns
True if spectrum generation is possible

◆ canGenerateSpectrum() [2/2]

static bool canGenerateSpectrum ( const PeptidoformIon & pfi)
static

Check if a theoretical spectrum can be generated for a PeptidoformIon.

Returns true if the PeptidoformIon can be fragmented. For cross-linked peptides, both chains must be convertible. Chimeric spectra are not supported.

Parameters
[in]  pfi  The PeptidoformIon to check
Returns
True if spectrum generation is possible

◆ errorCodeToString()

static const char * errorCodeToString ( ErrorCode  code)
static

Convert error code to human-readable string.

Parameters
[in]  code  The error code to convert.
Returns
A human-readable string describing the error code.

◆ fromAASequence()

static Peptidoform fromAASequence ( const AASequence & seq)
static

Create a Peptidoform from an OpenMS AASequence.

Converts an AASequence with modifications to ProForma notation. Uses CV accessions (UNIMOD) where available, otherwise named modifications.

Parameters
[in]  seq  The AASequence to convert
Returns
The equivalent Peptidoform AST

◆ generateSpectrum() [1/2]

static MSSpectrum generateSpectrum ( const Peptidoform & pf,
int  min_charge = 1,
int  max_charge = 1,
const std::string &  ion_types = "by",
bool  add_losses = false,
bool  add_metainfo = true 
)
static

Generate a theoretical MS/MS spectrum for a Peptidoform.

Converts the Peptidoform to AASequence and uses TheoreticalSpectrumGenerator to generate fragment ions based on the specified ion types.

Parameters
[in]  pf  The Peptidoform to fragment
[in]  min_charge  Minimum fragment ion charge state
[in]  max_charge  Maximum fragment ion charge state
[in]  ion_types  String specifying which ion types to generate: 'a','b','c','x','y','z' for ion series, 'M' for precursor peaks, 'I' for immonium ions. Example: "by" for b/y ions, "byM" for b/y + precursor
[in]  add_losses  If true, include neutral loss peaks (H2O, NH3)
[in]  add_metainfo  If true, include ion annotations in spectrum
Returns
MSSpectrum with theoretical peaks
Exceptions
Exception::ConversionError  if spectrum generation fails
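
The ion_types argument is a string of single-character codes. The self-contained sketch below shows one way such a string could be validated against the documented characters. `parseIonTypes` is a hypothetical helper, and rejecting unknown characters with std::invalid_argument is an assumption; the source does not say how the library treats them.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper: collect the requested ion type codes from a string
// such as "by" or "byM", rejecting characters outside the documented set.
std::set<char> parseIonTypes(const std::string& ion_types)
{
  static const std::string allowed = "abcxyzMI"; // series, precursor, immonium
  std::set<char> result;
  for (char c : ion_types)
  {
    if (allowed.find(c) == std::string::npos)
    {
      throw std::invalid_argument(std::string("unknown ion type: ") + c);
    }
    result.insert(c); // duplicates collapse in the set
  }
  return result;
}
```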

◆ generateSpectrum() [2/2]

static MSSpectrum generateSpectrum ( const PeptidoformIon & pfi,
int  min_charge = 1,
int  max_charge = 1,
const std::string &  ion_types = "by",
bool  add_losses = false,
bool  add_metainfo = true 
)
static

Generate a theoretical MS/MS spectrum for a PeptidoformIon.

For single-chain peptides, uses TheoreticalSpectrumGenerator. For cross-linked peptides (// separator), uses TheoreticalSpectrumGeneratorXLMS. Chimeric spectra are not supported.

Parameters
[in]  pfi  The PeptidoformIon to fragment
[in]  min_charge  Minimum fragment ion charge state
[in]  max_charge  Maximum fragment ion charge state
[in]  ion_types  String specifying which ion types to generate: 'a','b','c','x','y','z' for ion series, 'M' for precursor peaks, 'I' for immonium ions. Example: "by" for b/y ions, "abyM" for a/b/y + precursor
[in]  add_losses  If true, include neutral loss peaks
[in]  add_metainfo  If true, include ion annotations
Returns
MSSpectrum with theoretical peaks
Exceptions
Exception::ConversionError  if spectrum generation fails

◆ getAASequenceConversionIssues()

static std::vector< ConversionIssue > getAASequenceConversionIssues ( const Peptidoform & pf)
static

Get a list of all issues that would arise during AASequence conversion.

Returns detailed information about every aspect of the Peptidoform that cannot be represented in an AASequence.

Parameters
[in]  pf  The Peptidoform to analyze
Returns
Vector of conversion issues (empty if fully convertible)

◆ getMassCalculationIssues() [1/2]

static std::vector< ConversionIssue > getMassCalculationIssues ( const Peptidoform & pf)
static

Get issues preventing mass calculation for a Peptidoform.

Returns detailed information about components that prevent mass calculation.

Parameters
[in]  pf  The Peptidoform to analyze
Returns
Vector of issues (empty if mass can be calculated)

◆ getMassCalculationIssues() [2/2]

static std::vector< ConversionIssue > getMassCalculationIssues ( const PeptidoformIon & pfi)
static

Get issues preventing mass calculation for a PeptidoformIon.

Returns detailed information about components that prevent mass calculation across all chains.

Parameters
[in]  pfi  The PeptidoformIon to analyze
Returns
Vector of issues (empty if mass can be calculated)

◆ getMonoWeight() [1/2]

static double getMonoWeight ( const Peptidoform & pf)
static

Calculate monoisotopic mass of a Peptidoform.

Calculates the neutral monoisotopic mass including:

  • Amino acid residue masses
  • Terminal H2O mass
  • All modification mass deltas
  • Unlocalised and labile modifications (included in total)
  • Global modifications applied to matching residues
Parameters
[in]  pf  The Peptidoform to calculate mass for
Returns
Monoisotopic mass in Daltons
Exceptions
Exception::InvalidValue  if mass cannot be calculated (use canCalculateMass() first)
Note
Modifications are resolved automatically if not already resolved
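
The first three contributions listed above (residue masses, terminal H2O, modification mass deltas) can be illustrated with a small self-contained calculation. This sketch does not call OpenMS: `Residue` and `monoWeight` are hypothetical names, the residue table covers only five amino acids, and each modification is given directly as a mass delta rather than resolved through ModificationsDB.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: one residue plus the summed mass delta of its modifications.
struct Residue
{
  char amino_acid;
  double mod_delta;
};

// Hypothetical helper: neutral monoisotopic mass = residues + H2O + mass deltas.
double monoWeight(const std::vector<Residue>& sequence)
{
  // Monoisotopic residue masses in Da (small illustrative subset).
  static const std::map<char, double> residue_mass = {
    {'G', 57.02146}, {'A', 71.03711}, {'E', 129.04259},
    {'K', 128.09496}, {'M', 131.04049}};
  const double water = 18.010565; // terminal H and OH

  double total = water;
  for (const Residue& r : sequence)
  {
    const auto it = residue_mass.find(r.amino_acid);
    if (it == residue_mass.end())
    {
      throw std::invalid_argument("residue not in the illustrative table");
    }
    total += it->second + r.mod_delta;
  }
  return total;
}
```

For E, M carrying a +15.9949 delta, and K, the sum is 129.04259 + 131.04049 + 128.09496 + 18.010565 + 15.9949, which is about 422.1835 Da.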

◆ getMonoWeight() [2/2]

static double getMonoWeight ( const PeptidoformIon & pfi)
static

Calculate monoisotopic mass of a PeptidoformIon.

For cross-linked peptides, calculates the combined mass of all chains. Cross-linker masses are counted only once per cross-link group.

Parameters
[in]  pfi  The PeptidoformIon to calculate mass for
Returns
Monoisotopic mass in Daltons
Exceptions
Exception::InvalidValue  if mass cannot be calculated
Exception::InvalidValue  if pfi is chimeric (use getMonoWeight on individual chains)

◆ getMZ() [1/2]

static double getMZ ( const Peptidoform & pf,
int  charge 
)
static

Calculate m/z for a Peptidoform at a given charge state.

Parameters
[in]  pf  The Peptidoform to calculate m/z for
[in]  charge  The charge state (must be non-zero)
Returns
m/z value
Exceptions
Exception::InvalidValue  if the mass cannot be calculated or charge is zero
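
As a rough sketch of how a neutral mass relates to m/z, the snippet below assumes the charge carrier is a proton, so that m/z = (M + z * m_proton) / |z|. This page does not state which formula OpenMS applies, so treat the formula, the constant, and the hypothetical `sketchMZ` helper as assumptions; adduct ions are ignored here.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Proton mass in Da; assumed to be the only charge carrier in this sketch.
constexpr double kProtonMass = 1.007276466;

// Hypothetical helper (not an OpenMS function).
// Positive charges add protons, negative charges remove them.
// Returns std::nullopt for a zero charge instead of throwing.
std::optional<double> sketchMZ(double neutral_mass, int charge)
{
  if (charge == 0) return std::nullopt;
  return (neutral_mass + charge * kProtonMass) / std::abs(charge);
}
```

For a neutral mass of 799.359945 Da (PEPTIDE) and a charge of 2, this gives an m/z of about 400.6872.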

◆ getMZ() [2/2]

static double getMZ ( const PeptidoformIon pfi)
static

Calculate m/z for a PeptidoformIon at its specified charge state.

Uses the charge state from the PeptidoformIon if present.

Parameters
[in]  pfi  The PeptidoformIon with charge state
Returns
m/z value
Exceptions
Exception::InvalidValue  if the mass cannot be calculated or pfi has no charge state

◆ getSpectrumGenerationIssues() [1/2]

static std::vector< ConversionIssue > getSpectrumGenerationIssues ( const Peptidoform pf)
static

Get issues preventing spectrum generation for a Peptidoform.

Parameters
[in]  pf  The Peptidoform to analyze
Returns
Vector of issues (empty if spectrum can be generated)

◆ getSpectrumGenerationIssues() [2/2]

static std::vector< ConversionIssue > getSpectrumGenerationIssues ( const PeptidoformIon pfi)
static

Get issues preventing spectrum generation for a PeptidoformIon.

Parameters
[in]  pfi  The PeptidoformIon to analyze
Returns
Vector of issues (empty if spectrum can be generated)

◆ isRepresentableAsAASequence()

static bool isRepresentableAsAASequence ( const Peptidoform pf)
static

Check if a Peptidoform can be fully represented as an AASequence.

Returns true if all modifications can be resolved and there are no unsupported features (ambiguous regions, cross-links, etc.)

Parameters
[in]  pf  The Peptidoform to check
Returns
True if conversion is possible without issues

◆ parse()

static Peptidoform parse ( const String input)
static

Parse a ProForma string into a Peptidoform AST.

This is the main entry point for parsing simple peptidoforms without charge state information.

Parameters
[in]  input  The ProForma string to parse
Returns
The parsed Peptidoform AST
Exceptions
ParseError  if the input is invalid
Note
For peptidoforms with charge state (e.g., "PEPTIDE/2"), use parseIon()

◆ parseIon()

static PeptidoformIon parseIon ( const String input)
static

Parse a ProForma string into a PeptidoformIon AST.

This entry point handles the full ProForma notation including:

  • Multiple chains (separated by //)
  • Charge state specification (/2, /+2, /-1)
  • Adduct ion specification (/[Na:z+1,H:z+1])
Parameters
[in]  input  The ProForma string to parse
Returns
The parsed PeptidoformIon AST
Exceptions
ParseError  if the input is invalid
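
To make the charge-state suffix concrete, here is a minimal standalone sketch that separates a trailing integer charge (`/2`, `/+2`, `/-1`) from the rest of the string. It is not the OpenMS parser: `splitCharge` is a hypothetical helper, it skips `//` chain separators and anything inside brackets, and it deliberately leaves adduct suffixes such as `/[Na:z+1,H:z+1]` unhandled.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical helper (not an OpenMS function): split "BODY/z" into BODY and z.
// If no integer charge suffix is found, the input is returned unchanged
// together with std::nullopt.
std::pair<std::string, std::optional<int>> splitCharge(const std::string& input)
{
  int depth = 0; // nesting level of [], (), {}
  std::size_t slash = std::string::npos;
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    const char c = input[i];
    if (c == '[' || c == '(' || c == '{') ++depth;
    else if (c == ']' || c == ')' || c == '}') --depth;
    else if (c == '/' && depth == 0)
    {
      // A slash next to another slash belongs to a "//" chain separator.
      const bool doubled = (i + 1 < input.size() && input[i + 1] == '/') ||
                           (i > 0 && input[i - 1] == '/');
      if (!doubled) slash = i; // remember the last single slash
    }
  }
  if (slash == std::string::npos) return {input, std::nullopt};

  std::string digits = input.substr(slash + 1);
  if (!digits.empty() && digits[0] == '+')
  {
    digits.erase(0, 1); // std::from_chars does not accept a leading '+'
    if (digits.empty() || digits[0] == '-') return {input, std::nullopt};
  }
  int charge = 0;
  const char* first = digits.data();
  const char* last = first + digits.size();
  const auto [ptr, ec] = std::from_chars(first, last, charge);
  if (ec != std::errc() || ptr != last) return {input, std::nullopt};
  return {input.substr(0, slash), charge};
}
```

Calling `splitCharge("PEPTIDE/2")` yields the body `PEPTIDE` with charge 2, while a string without a suffix, or with an adduct suffix, comes back unchanged with no charge.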

◆ peptidoformFromJSON()

static Peptidoform peptidoformFromJSON ( const String json_str)
static

Construct a Peptidoform from JSON string.

Deserializes a JSON string back into a Peptidoform AST.

Parameters
[in]  json_str  JSON string representation of a Peptidoform
Returns
The deserialized Peptidoform
Exceptions
Exception::ParseError  if the JSON is malformed or missing required fields

◆ peptidoformIonFromJSON()

static PeptidoformIon peptidoformIonFromJSON ( const String json_str)
static

Construct a PeptidoformIon from JSON string.

Deserializes a JSON string back into a PeptidoformIon AST.

Parameters
[in]  json_str  JSON string representation of a PeptidoformIon
Returns
The deserialized PeptidoformIon
Exceptions
Exception::ParseError  if the JSON is malformed or missing required fields

◆ resolveModifications()

static void resolveModifications ( Peptidoform & pf)
static

Resolve all modifications in a Peptidoform using ModificationsDB.

Looks up each modification tag (CV accession, named mod, mass delta) in ModificationsDB and stores the resolved ResidueModification pointer.

Parameters
[in,out]  pf  The Peptidoform to resolve (modified in place)
Note
Modifications that cannot be resolved will have resolved_mod = nullptr

◆ toAASequence()

static AASequence toAASequence ( const Peptidoform pf,
ConversionPolicy  policy = ConversionPolicy::FAIL_ON_LOSS 
)
static

Convert a Peptidoform to an OpenMS AASequence.

Parameters
[in]  pf  The Peptidoform to convert
[in]  policy  How to handle unconvertible modifications
Returns
The equivalent AASequence
Exceptions
Exception::ConversionError  if the policy is strict (such as the default ConversionPolicy::FAIL_ON_LOSS) and the conversion is not possible
Note
If resolveModifications() has not been called beforehand, modifications are resolved automatically

◆ toJSON() [1/2]

static String toJSON ( const Peptidoform pf)
static

Convert a Peptidoform to JSON string representation.

Serializes the complete Peptidoform AST including all modifications, terminal modifications, global modifications, and labels.

Parameters
[in]  pf  The Peptidoform to serialize
Returns
JSON string representation of the Peptidoform

◆ toJSON() [2/2]

static String toJSON ( const PeptidoformIon pfi)
static

Convert a PeptidoformIon to JSON string representation.

Serializes the complete PeptidoformIon AST including all chains, charge state, adduct ions, and cross-link groups.

Parameters
[in]  pfi  The PeptidoformIon to serialize
Returns
JSON string representation of the PeptidoformIon

◆ toString() [1/2]

static String toString ( const Peptidoform pf,
WriteMode  mode = WriteMode::LOSSLESS 
)
static

Convert a Peptidoform AST back to ProForma string notation.

Parameters
[in]  pf  The Peptidoform to convert
[in]  mode  Write mode: LOSSLESS preserves original formatting, CANONICAL produces normalized output
Returns
The ProForma string representation

◆ toString() [2/2]

static String toString ( const PeptidoformIon pfi,
WriteMode  mode = WriteMode::LOSSLESS 
)
static

Convert a PeptidoformIon AST back to ProForma string notation.

Parameters
[in]  pfi  The PeptidoformIon to convert
[in]  mode  Write mode: LOSSLESS preserves original formatting, CANONICAL produces normalized output
Returns
The ProForma string representation

◆ tryGetMonoWeight() [1/4]

static std::optional< double > tryGetMonoWeight ( const Peptidoform pf)
static

Try to calculate monoisotopic mass of a Peptidoform (non-throwing)

Single-pass calculation that resolves modifications and calculates mass. More efficient than calling canCalculateMass() followed by getMonoWeight().

Parameters
[in]  pf  The Peptidoform to calculate mass for
Returns
Monoisotopic mass in Daltons, or std::nullopt if calculation not possible

◆ tryGetMonoWeight() [2/4]

static std::optional< double > tryGetMonoWeight ( const Peptidoform pf,
std::vector< ConversionIssue > &  issues_out 
)
static

Try to calculate monoisotopic mass with diagnostic information.

Single-pass calculation that also collects any issues preventing calculation.

Parameters
[in]  pf  The Peptidoform to calculate mass for
[out]  issues_out  Vector to receive any issues (cleared first)
Returns
Monoisotopic mass in Daltons, or std::nullopt if calculation not possible
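
The non-throwing pattern described here (an optional result plus an output vector that is cleared first and filled in the same pass) can be sketched without OpenMS. `Issue` is a hypothetical stand-in for ConversionIssue, whose real fields are not shown on this page, and `tryMass` with its three-residue table is invented for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for ConversionIssue; the real struct may look different.
struct Issue
{
  std::size_t position;
  std::string message;
};

// Single pass: accumulate the mass and collect issues at the same time.
// issues_out is cleared first, so stale entries from earlier calls never leak through.
std::optional<double> tryMass(const std::string& sequence, std::vector<Issue>& issues_out)
{
  issues_out.clear();
  double mass = 18.010565; // H2O for the termini
  for (std::size_t i = 0; i < sequence.size(); ++i)
  {
    switch (sequence[i])
    {
      case 'G': mass += 57.02146; break;
      case 'A': mass += 71.03711; break;
      case 'S': mass += 87.03203; break;
      default:
        issues_out.push_back({i, "unsupported residue '" + std::string(1, sequence[i]) + "'"});
    }
  }
  if (!issues_out.empty()) return std::nullopt;
  return mass;
}
```

A caller checks the returned optional and, only when it is empty, inspects `issues_out` to report why the calculation failed.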

◆ tryGetMonoWeight() [3/4]

static std::optional< double > tryGetMonoWeight ( const PeptidoformIon pfi)
static

Try to calculate monoisotopic mass of a PeptidoformIon (non-throwing)

Parameters
[in]  pfi  The PeptidoformIon to calculate mass for
Returns
Monoisotopic mass in Daltons, or std::nullopt if calculation not possible
Note
Returns std::nullopt for chimeric spectra (calculate per-chain instead)

◆ tryGetMonoWeight() [4/4]

static std::optional< double > tryGetMonoWeight ( const PeptidoformIon pfi,
std::vector< ConversionIssue > &  issues_out 
)
static

Try to calculate monoisotopic mass of PeptidoformIon with diagnostics.

Parameters
[in]  pfi  The PeptidoformIon to calculate mass for
[out]  issues_out  Vector to receive any issues (cleared first)
Returns
Monoisotopic mass in Daltons, or std::nullopt if calculation not possible

◆ tryGetMZ() [1/4]

static std::optional< double > tryGetMZ ( const Peptidoform pf,
int  charge 
)
static

Try to calculate m/z for a Peptidoform (non-throwing)

Parameters
[in]  pf  The Peptidoform to calculate m/z for
[in]  charge  The charge state (must be non-zero)
Returns
m/z value, or std::nullopt if calculation not possible or charge is zero

◆ tryGetMZ() [2/4]

static std::optional< double > tryGetMZ ( const Peptidoform pf,
int  charge,
std::vector< ConversionIssue > &  issues_out 
)
static

Try to calculate m/z for a Peptidoform with diagnostics.

Parameters
[in]  pf  The Peptidoform to calculate m/z for
[in]  charge  The charge state (must be non-zero)
[out]  issues_out  Vector to receive any issues (cleared first)
Returns
m/z value, or std::nullopt if calculation not possible or charge is zero

◆ tryGetMZ() [3/4]

static std::optional< double > tryGetMZ ( const PeptidoformIon pfi)
static

Try to calculate m/z for a PeptidoformIon (non-throwing)

Parameters
[in]  pfi  The PeptidoformIon with charge state
Returns
m/z value, or std::nullopt if calculation not possible or no charge

◆ tryGetMZ() [4/4]

static std::optional< double > tryGetMZ ( const PeptidoformIon pfi,
std::vector< ConversionIssue > &  issues_out 
)
static

Try to calculate m/z for a PeptidoformIon with diagnostics.

Parameters
[in]  pfi  The PeptidoformIon with charge state
[out]  issues_out  Vector to receive any issues (cleared first)
Returns
m/z value, or std::nullopt if calculation not possible or no charge