OpenMS
PEFFEntry Struct Reference

Represents a single entry in a PEFF file with all annotations.

#include <OpenMS/FORMAT/PEFFFile.h>


Public Member Functions

 PEFFEntry ()=default
 
 PEFFEntry (const PEFFEntry &rhs)=default
 
 PEFFEntry (PEFFEntry &&rhs) noexcept=default
 
PEFFEntry & operator= (const PEFFEntry &rhs)=default
 
PEFFEntry & operator= (PEFFEntry &&rhs) noexcept=default
 
bool operator== (const PEFFEntry &rhs) const
 
FASTAFile::FASTAEntry toFASTAEntry () const
 Convert to a FASTAFile::FASTAEntry (loses PEFF-specific annotations)
 
AASequence getSequence () const
 Get the base AASequence for this entry (unmodified sequence).
 
AASequence getModifiedSequence () const
 Get an AASequence with all annotated modifications applied.
 
void getVariantSequences (std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, bool include_complex=false) const
 Get all variant sequences (each variant applied individually).
 
AASequence getProcessedSequence (const String &region_accession="PEFF:0001021") const
 Get processed sequence (e.g., mature protein without signal peptide).
 
void digestWithVariants (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_variants=true, bool include_modifications=false) const
 Generate all variant and/or modification peptides by digesting with a given protease.
 
void generatePeptides (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, const std::vector< std::string > &fixed_mods={}, const std::vector< std::string > &variable_mods={}, Size max_variable_mods_per_peptide=2, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_peff_variants=true, bool include_peff_modifications=true) const
 Generate peptides with PEFF annotations and optional sample handling modifications.
 

Static Public Member Functions

static PEFFEntry fromFASTAEntry (const FASTAFile::FASTAEntry &fasta)
 Create a PEFFEntry from a FASTAEntry (basic fields only)
 

Public Attributes

String prefix
 Database prefix from description line (e.g., "sp" from ">sp:P12345")
 
String identifier
 
String sequence
 
std::vector< String > protein_names
 \PName - may have multiple names
 
String gene_name
 \GName
 
Int ncbi_tax_id {0}
 \NcbiTaxId or \OX
 
String taxonomy_name
 \TaxName
 
Size sequence_length {0}
 \Length
 
String sequence_version
 \SV
 
String entry_version
 \EV
 
Int protein_existence {0}
 \PE (1-5)
 
String db_unique_id
 \DbUniqueId
 
String entry_id
 \ID (e.g., NPM_HUMAN)
 
std::vector< String > alt_accessions
 \AltAC - alternative accessions
 
std::vector< PEFFModification > modifications
 
std::vector< PEFFVariantSimple > simple_variants
 
std::vector< PEFFVariantComplex > complex_variants
 
std::vector< PEFFProcessedRegion > processed_regions
 
std::vector< PEFFDisulfideBond > disulfide_bonds
 \DisulfideBond
 
std::vector< String > proteoforms
 ProForma notation.
 
std::map< String, String > custom_annotations
 

Static Private Member Functions

static std::vector< std::pair< String, AASequence > > enumeratePEFFModifications_ (const AASequence &peptide, const std::vector< std::pair< Size, const PEFFModification * > > &peff_mods, const String &base_description)
 Apply PEFF modifications at specific positions to a peptide.
 

Detailed Description

Represents a single entry in a PEFF file with all annotations.

Each entry corresponds to one description line and sequence in the PEFF file. The description line format per the PEFF spec is:

>Prefix:DbUniqueId \key=value \key=value ...

Where Prefix is the database prefix defined in the header block and DbUniqueId is the unique identifier within that database. The identifier field stores the full "Prefix:DbUniqueId" string, and the prefix field stores just the prefix portion.
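The split between identifier, prefix, and the \key=value annotations can be illustrated with a small standalone sketch. This is not the OpenMS parser: ParsedHeader and parseDescriptionLine are hypothetical names, plain std::string stands in for String, and the sketch assumes no annotation value contains a space followed by a backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the fields a PEFFEntry takes from a description line.
struct ParsedHeader
{
  std::string prefix;       // e.g. "sp"
  std::string db_unique_id; // e.g. "P12345"
  std::string identifier;   // full "Prefix:DbUniqueId"
  std::map<std::string, std::string> annotations; // key -> raw value
};

// Split ">Prefix:DbUniqueId \key=value \key=value ..." into its parts.
ParsedHeader parseDescriptionLine(const std::string& line)
{
  assert(!line.empty() && line[0] == '>');
  ParsedHeader out;

  // The identifier runs from after '>' to the first space (or end of line).
  const std::size_t id_end = line.find(' ');
  out.identifier = line.substr(1, id_end == std::string::npos ? std::string::npos : id_end - 1);

  const std::size_t colon = out.identifier.find(':');
  if (colon != std::string::npos)
  {
    out.prefix = out.identifier.substr(0, colon);
    out.db_unique_id = out.identifier.substr(colon + 1);
  }

  // Each annotation starts with a backslash and runs to the next " \" or end of line.
  std::size_t pos = line.find('\\');
  while (pos != std::string::npos)
  {
    const std::size_t next = line.find(" \\", pos);
    const std::string item =
      line.substr(pos + 1, next == std::string::npos ? std::string::npos : next - pos - 1);
    const std::size_t eq = item.find('=');
    if (eq != std::string::npos)
    {
      out.annotations[item.substr(0, eq)] = item.substr(eq + 1);
    }
    pos = (next == std::string::npos) ? next : next + 1; // jump to the next backslash
  }
  return out;
}
```

For the line ">sp:P12345 \GName=NPM1", the sketch stores "sp:P12345" as the identifier and "sp" as the prefix, matching the field description above.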

Constructor & Destructor Documentation

◆ PEFFEntry() [1/3]

PEFFEntry ( )
default

◆ PEFFEntry() [2/3]

PEFFEntry ( const PEFFEntry & rhs)
default

◆ PEFFEntry() [3/3]

PEFFEntry ( PEFFEntry &&  rhs)
default noexcept

Member Function Documentation

◆ digestWithVariants()

void digestWithVariants ( const ProteaseDigestion & digestor,
std::vector< std::string > &  descriptions,
std::vector< AASequence > &  sequences,
Size  min_length = 6,
Size  max_length = 40,
bool  include_reference = true,
bool  include_variants = true,
bool  include_modifications = false 
) const

Generate all variant and/or modification peptides by digesting with a given protease.

This method performs enzymatic digestion on the reference sequence and then generates all combinations of simple variants and/or modifications within each peptide.

Parameters
digestor - The protease digestion object (must be configured with enzyme, missed cleavages, etc.)
descriptions - Output vector for peptide descriptions (empty for reference peptides)
sequences - Output vector for peptide sequences
min_length - Minimum peptide length to include (default: 6)
max_length - Maximum peptide length to include (default: 40, 0 = no limit)
include_reference - If true, include reference peptides (default: true)
include_variants - If true, generate variant combinations (default: true)
include_modifications - If true, generate modification combinations (default: false)
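The interaction of min_length and max_length, including the "0 = no limit" convention, can be sketched as follows. This is a standalone illustration rather than OpenMS code: filterByLength is a hypothetical helper, peptides are plain std::string values, and treating both bounds as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keep only peptides whose length lies in [min_length, max_length].
// max_length == 0 disables the upper bound ("0 = no limit").
// Inclusive bounds are an assumption of this sketch.
std::vector<std::string> filterByLength(const std::vector<std::string>& peptides,
                                        std::size_t min_length = 6,
                                        std::size_t max_length = 40)
{
  assert(max_length == 0 || min_length <= max_length);
  std::vector<std::string> kept;
  for (const std::string& p : peptides)
  {
    if (p.size() < min_length) continue;                    // too short
    if (max_length != 0 && p.size() > max_length) continue; // too long, limit active
    kept.push_back(p);
  }
  return kept;
}
```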

◆ enumeratePEFFModifications_()

static std::vector< std::pair< String, AASequence > > enumeratePEFFModifications_ ( const AASequence & peptide,
const std::vector< std::pair< Size, const PEFFModification * > > &  peff_mods,
const String & base_description
)
static private

Apply PEFF modifications at specific positions to a peptide.

Helper method that generates all 2^n combinations of PEFF modifications for a given peptide, where n is the number of PEFF modifications within the peptide's range.

Parameters
peptide - The base peptide sequence
peff_mods - PEFF modifications with positions relative to the peptide (0-based)
base_description - Base description to prepend to modification descriptions
Returns
Vector of pairs: (description, modified AASequence)
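The 2^n enumeration can be pictured as a bitmask loop in which bit i decides whether the i-th modification is applied. The sketch below is a standalone illustration, not the OpenMS implementation: SiteMod and enumerateSubsets are hypothetical names, the peptide is a plain std::string, modified residues are rendered as "S(Phospho)", and the description format is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a positioned PEFF modification.
struct SiteMod
{
  std::size_t position; // 0-based index into the peptide
  std::string name;     // e.g. "Phospho"
};

// Enumerate all 2^n subsets of mods; bit i of mask selects mods[i].
// Returns (description, rendered peptide) pairs, starting with the unmodified peptide.
std::vector<std::pair<std::string, std::string>> enumerateSubsets(
  const std::string& peptide, const std::vector<SiteMod>& mods, const std::string& base_description)
{
  assert(mods.size() < 20); // keep 2^n manageable in this sketch
  std::vector<std::pair<std::string, std::string>> out;
  const std::size_t n_subsets = std::size_t{1} << mods.size();
  for (std::size_t mask = 0; mask < n_subsets; ++mask)
  {
    std::vector<std::string> label(peptide.size());
    std::string description = base_description;
    for (std::size_t i = 0; i < mods.size(); ++i)
    {
      if ((mask & (std::size_t{1} << i)) == 0) continue; // mods[i] not in this subset
      assert(mods[i].position < peptide.size());
      label[mods[i].position] = "(" + mods[i].name + ")";
      description += " " + mods[i].name + "@" + std::to_string(mods[i].position);
    }
    std::string rendered;
    for (std::size_t i = 0; i < peptide.size(); ++i)
    {
      rendered += peptide[i];
      rendered += label[i];
    }
    out.emplace_back(description, rendered);
  }
  return out;
}
```

Because the output grows exponentially with n, a peptide carrying many annotated sites yields a large result; whether the real helper caps n is not stated here.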

◆ fromFASTAEntry()

static PEFFEntry fromFASTAEntry ( const FASTAFile::FASTAEntry & fasta)
static

Create a PEFFEntry from a FASTAEntry (basic fields only)

◆ generatePeptides()

void generatePeptides ( const ProteaseDigestion & digestor,
std::vector< std::string > &  descriptions,
std::vector< AASequence > &  sequences,
const std::vector< std::string > &  fixed_mods = {},
const std::vector< std::string > &  variable_mods = {},
Size  max_variable_mods_per_peptide = 2,
Size  min_length = 6,
Size  max_length = 40,
bool  include_reference = true,
bool  include_peff_variants = true,
bool  include_peff_modifications = true 
) const

Generate peptides with PEFF annotations and optional sample handling modifications.

Combines enzymatic digestion, PEFF variants/modifications, and sample handling mods.

Parameters
digestor - The protease digestion object (configured with enzyme, missed cleavages)
descriptions - Output vector for peptide descriptions
sequences - Output vector for peptide sequences
fixed_mods - Fixed modifications (e.g., {"Carbamidomethyl (C)"})
variable_mods - Variable modifications (e.g., {"Oxidation (M)"})
max_variable_mods_per_peptide - Maximum variable mods per peptide (default: 2)
min_length - Minimum peptide length (default: 6)
max_length - Maximum peptide length (default: 40, 0 = no limit)
include_reference - Include reference peptides (default: true)
include_peff_variants - Enumerate PEFF variants (default: true)
include_peff_modifications - Enumerate PEFF modifications (default: true)
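The effect of max_variable_mods_per_peptide can be sketched as a bounded subset enumeration over the candidate sites of a peptide. This is a standalone illustration with a hypothetical helper name (boundedSiteSubsets), not the OpenMS algorithm; including the empty subset (no variable modification) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Enumerate every subset of candidate variable-modification sites that uses
// at most max_mods sites. The empty subset is included (an assumption).
std::vector<std::vector<std::size_t>> boundedSiteSubsets(const std::vector<std::size_t>& sites,
                                                         std::size_t max_mods = 2)
{
  assert(sites.size() < 20); // keep 2^n manageable in this sketch
  std::vector<std::vector<std::size_t>> out;
  const std::size_t n_masks = std::size_t{1} << sites.size();
  for (std::size_t mask = 0; mask < n_masks; ++mask)
  {
    std::vector<std::size_t> chosen;
    for (std::size_t i = 0; i < sites.size(); ++i)
    {
      if (mask & (std::size_t{1} << i)) chosen.push_back(sites[i]);
    }
    if (chosen.size() <= max_mods) out.push_back(chosen); // enforce the per-peptide cap
  }
  return out;
}
```

With three candidate sites and the default cap of 2, the sketch yields 1 + 3 + 3 = 7 subsets instead of all 8.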

◆ getModifiedSequence()

AASequence getModifiedSequence ( ) const

Get an AASequence with all annotated modifications applied.

Uses the modifications vector to apply modifications to the sequence. Modifications with unknown positions (position == 0) are skipped. Modifications that cannot be resolved are logged as warnings and skipped.

Returns
AASequence with modifications applied
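The skip rules described above can be sketched in isolation. The code below is not the OpenMS implementation: AnnotatedMod and applyKnownMods are hypothetical names, the sequence is a plain std::string, positions are assumed to be 1-based with 0 meaning unknown, and an out-of-range position stands in for a modification that cannot be resolved.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for an annotated modification.
struct AnnotatedMod
{
  std::size_t position; // assumed 1-based; 0 = position unknown
  std::string name;     // e.g. "Phospho"
};

// Render the sequence with each placeable modification written after its residue.
std::string applyKnownMods(const std::string& sequence, const std::vector<AnnotatedMod>& mods)
{
  std::vector<std::string> suffix(sequence.size());
  for (const AnnotatedMod& m : mods)
  {
    if (m.position == 0) continue; // unknown position: skipped
    if (m.position > sequence.size())
    {
      // Stand-in for "cannot be resolved": warn and skip.
      std::cerr << "warning: cannot place " << m.name << " at position " << m.position << '\n';
      continue;
    }
    suffix[m.position - 1] = "(" + m.name + ")";
  }
  std::string out;
  for (std::size_t i = 0; i < sequence.size(); ++i)
  {
    out += sequence[i];
    out += suffix[i];
  }
  return out;
}
```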

◆ getProcessedSequence()

AASequence getProcessedSequence ( const String & region_accession = "PEFF:0001021") const

Get processed sequence (e.g., mature protein without signal peptide).

Applies the first processed region of the given type to extract the processed sequence segment.

Parameters
region_accession - PEFF CV accession for the region type (e.g., "PEFF:0001021" for signal peptide)
Returns
Processed AASequence, or empty if region not found
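The "first matching region, else empty" behavior can be sketched with plain strings. Region and firstRegion are hypothetical names, not OpenMS types, and the sketch assumes 1-based inclusive start and end coordinates; a malformed range is treated like a missing region.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a processed-region annotation.
struct Region
{
  std::size_t start;     // assumed 1-based, inclusive
  std::size_t end;       // assumed 1-based, inclusive
  std::string accession; // CV accession of the region type
};

// Return the segment covered by the first region with the given accession,
// or an empty string if there is none (or its range is malformed).
std::string firstRegion(const std::string& sequence,
                        const std::vector<Region>& regions,
                        const std::string& accession)
{
  for (const Region& r : regions)
  {
    if (r.accession != accession) continue;
    if (r.start < 1 || r.end < r.start || r.end > sequence.size()) return "";
    return sequence.substr(r.start - 1, r.end - r.start + 1);
  }
  return "";
}
```

Later regions of the same type are ignored in this sketch, mirroring the "first processed region" wording above.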

◆ getSequence()

AASequence getSequence ( ) const

Get the base AASequence for this entry (unmodified sequence).

Returns
AASequence representing the protein sequence

◆ getVariantSequences()

void getVariantSequences ( std::vector< std::string > &  descriptions,
std::vector< AASequence > &  sequences,
bool  include_complex = false 
) const

Get all variant sequences (each variant applied individually).

Parameters
descriptions - Output vector for variant descriptions
sequences - Output vector for variant sequences
include_complex - If true, also include complex variants (default: false)
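"Each variant applied individually" means that n variants yield n sequences, each differing from the reference at a single site, rather than every combination. The sketch below illustrates this for single-residue substitutions only. Substitution and variantSequences are hypothetical names, positions are assumed to be 1-based, and the "K2R" description format is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a simple (single-residue) variant.
struct Substitution
{
  std::size_t position; // assumed 1-based
  char new_residue;
};

// Apply each substitution on its own copy of the reference sequence.
// The two output vectors are filled in parallel, one entry per variant.
void variantSequences(const std::string& reference,
                      const std::vector<Substitution>& variants,
                      std::vector<std::string>& descriptions,
                      std::vector<std::string>& sequences)
{
  for (const Substitution& v : variants)
  {
    assert(v.position >= 1 && v.position <= reference.size());
    std::string variant = reference; // fresh copy: variants are never combined
    const char old_residue = variant[v.position - 1];
    variant[v.position - 1] = v.new_residue;
    descriptions.push_back(std::string(1, old_residue) + std::to_string(v.position) + v.new_residue);
    sequences.push_back(variant);
  }
}
```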

◆ operator=() [1/2]

PEFFEntry & operator= ( const PEFFEntry & rhs)
default

◆ operator=() [2/2]

PEFFEntry & operator= ( PEFFEntry &&  rhs)
default noexcept

◆ operator==()

◆ toFASTAEntry()

FASTAFile::FASTAEntry toFASTAEntry ( ) const

Convert to a FASTAFile::FASTAEntry (loses PEFF-specific annotations)

Member Data Documentation

◆ alt_accessions

std::vector<String> alt_accessions

\AltAC - alternative accessions

Referenced by PEFFEntry::operator==().

◆ complex_variants

std::vector<PEFFVariantComplex> complex_variants

Referenced by PEFFEntry::operator==().

◆ custom_annotations

std::map<String, String> custom_annotations

Referenced by PEFFEntry::operator==().

◆ db_unique_id

String db_unique_id

\DbUniqueId

Referenced by PEFFEntry::operator==().

◆ disulfide_bonds

std::vector<PEFFDisulfideBond> disulfide_bonds

\DisulfideBond

Referenced by PEFFEntry::operator==().

◆ entry_id

String entry_id

\ID (e.g., NPM_HUMAN)

Referenced by PEFFEntry::operator==().

◆ entry_version

String entry_version

\EV

Referenced by PEFFEntry::operator==().

◆ gene_name

String gene_name

\GName

Referenced by PEFFEntry::operator==().

◆ identifier

String identifier

Referenced by PEFFEntry::operator==().

◆ modifications

std::vector<PEFFModification> modifications

Referenced by PEFFEntry::operator==().

◆ ncbi_tax_id

Int ncbi_tax_id {0}

\NcbiTaxId or \OX

Referenced by PEFFEntry::operator==().

◆ prefix

String prefix

Database prefix from description line (e.g., "sp" from ">sp:P12345")

Referenced by PEFFEntry::operator==().

◆ processed_regions

std::vector<PEFFProcessedRegion> processed_regions

Referenced by PEFFEntry::operator==().

◆ protein_existence

Int protein_existence {0}

\PE (1-5)

Referenced by PEFFEntry::operator==().

◆ protein_names

std::vector<String> protein_names

\PName - may have multiple names

Referenced by PEFFEntry::operator==().

◆ proteoforms

std::vector<String> proteoforms

ProForma notation.

Referenced by PEFFEntry::operator==().

◆ sequence

String sequence

Referenced by PEFFEntry::operator==().

◆ sequence_length

Size sequence_length {0}

\Length

Referenced by PEFFEntry::operator==().

◆ sequence_version

String sequence_version

\SV

Referenced by PEFFEntry::operator==().

◆ simple_variants

std::vector<PEFFVariantSimple> simple_variants

Referenced by PEFFEntry::operator==().

◆ taxonomy_name

String taxonomy_name

\TaxName

Referenced by PEFFEntry::operator==().