OpenMS
Represents a single entry in a PEFF file with all annotations.
#include <OpenMS/FORMAT/PEFFFile.h>
Public Member Functions | |
| PEFFEntry ()=default | |
| PEFFEntry (const PEFFEntry &rhs)=default | |
| PEFFEntry (PEFFEntry &&rhs) noexcept=default | |
| PEFFEntry & | operator= (const PEFFEntry &rhs)=default |
| PEFFEntry & | operator= (PEFFEntry &&rhs) noexcept=default |
| bool | operator== (const PEFFEntry &rhs) const |
| FASTAFile::FASTAEntry | toFASTAEntry () const |
| Convert to a FASTAFile::FASTAEntry (loses PEFF-specific annotations) | |
| AASequence | getSequence () const |
| Get the base AASequence for this entry (unmodified sequence). | |
| AASequence | getModifiedSequence () const |
| Get an AASequence with all annotated modifications applied. | |
| void | getVariantSequences (std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, bool include_complex=false) const |
| Get all variant sequences (each variant applied individually). | |
| AASequence | getProcessedSequence (const String &region_accession="PEFF:0001021") const |
| Get processed sequence (e.g., mature protein without signal peptide). | |
| void | digestWithVariants (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_variants=true, bool include_modifications=false) const |
| Generate all variant and/or modification peptides by digesting with a given protease. | |
| void | generatePeptides (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, const std::vector< std::string > &fixed_mods={}, const std::vector< std::string > &variable_mods={}, Size max_variable_mods_per_peptide=2, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_peff_variants=true, bool include_peff_modifications=true) const |
| Generate peptides with PEFF annotations and optional sample handling modifications. | |
Static Public Member Functions | |
| static PEFFEntry | fromFASTAEntry (const FASTAFile::FASTAEntry &fasta) |
| Create a PEFFEntry from a FASTAEntry (basic fields only) | |
Public Attributes | |
| String | prefix |
| Database prefix from description line (e.g., "sp" from ">sp:P12345") | |
| String | identifier |
| String | sequence |
| std::vector< String > | protein_names |
| \PName - may have multiple names | |
| String | gene_name |
| \GName | |
| Int | ncbi_tax_id {0} |
| \NcbiTaxId or \OX | |
| String | taxonomy_name |
| \TaxName | |
| Size | sequence_length {0} |
| \Length | |
| String | sequence_version |
| \SV | |
| String | entry_version |
| \EV | |
| Int | protein_existence {0} |
| \PE (1-5) | |
| String | db_unique_id |
| \DbUniqueId | |
| String | entry_id |
| \ID (e.g., NPM_HUMAN) | |
| std::vector< String > | alt_accessions |
| \AltAC - alternative accessions | |
| std::vector< PEFFModification > | modifications |
| std::vector< PEFFVariantSimple > | simple_variants |
| std::vector< PEFFVariantComplex > | complex_variants |
| std::vector< PEFFProcessedRegion > | processed_regions |
| std::vector< PEFFDisulfideBond > | disulfide_bonds |
| \DisulfideBond | |
| std::vector< String > | proteoforms |
| ProForma notation. | |
| std::map< String, String > | custom_annotations |
Static Private Member Functions | |
| static std::vector< std::pair< String, AASequence > > | enumeratePEFFModifications_ (const AASequence &peptide, const std::vector< std::pair< Size, const PEFFModification * > > &peff_mods, const String &base_description) |
| Apply PEFF modifications at specific positions to a peptide. | |
Represents a single entry in a PEFF file with all annotations.
Each entry corresponds to one description line and sequence in the PEFF file. The description line format per the PEFF spec is:
>Prefix:DbUniqueId \key=value \key=value ...
Where Prefix is the database prefix defined in the header block and DbUniqueId is the unique identifier within that database. The identifier field stores the full "Prefix:DbUniqueId" string, and the prefix field stores just the prefix portion.
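To make the description-line layout concrete, here is a minimal standalone sketch of splitting such a line into prefix, identifier, and key/value annotations. It does not use OpenMS: `ParsedHeader` and `parseDescriptionLine` are hypothetical names, and the sketch assumes annotations are separated by a space followed by a backslash and that values never contain that separator.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the parsed header fields; not the OpenMS PEFFEntry type.
struct ParsedHeader
{
  std::string prefix;                              // e.g. "sp"
  std::string identifier;                          // full "Prefix:DbUniqueId", e.g. "sp:P12345"
  std::map<std::string, std::string> annotations;  // key -> raw value, without the leading backslash
};

// Split ">Prefix:DbUniqueId \key=value \key=value ..." into its parts.
// Assumption: annotations are separated by " \" and values do not contain " \".
ParsedHeader parseDescriptionLine(const std::string& line)
{
  ParsedHeader out;
  const std::size_t start = (!line.empty() && line[0] == '>') ? 1 : 0;

  // The identifier runs up to the first space (or to the end of the line).
  const std::size_t id_end = line.find(' ', start);
  out.identifier = line.substr(start, id_end == std::string::npos ? std::string::npos : id_end - start);

  // The prefix is everything before the first colon of the identifier.
  const std::size_t colon = out.identifier.find(':');
  if (colon != std::string::npos) out.prefix = out.identifier.substr(0, colon);

  // Walk over the " \key=value" items.
  std::size_t pos = line.find(" \\", start);
  while (pos != std::string::npos)
  {
    const std::size_t key_start = pos + 2;
    const std::size_t next = line.find(" \\", key_start);
    const std::string item =
        line.substr(key_start, next == std::string::npos ? std::string::npos : next - key_start);
    const std::size_t eq = item.find('=');
    if (eq != std::string::npos) out.annotations[item.substr(0, eq)] = item.substr(eq + 1);
    pos = next;
  }
  return out;
}
```

For the line `>sp:P12345 \ID=NPM_HUMAN`, this sketch yields the prefix `sp`, the identifier `sp:P12345`, and one annotation mapping `ID` to `NPM_HUMAN`.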
|
default |
| void digestWithVariants (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_variants=true, bool include_modifications=false) const |
Generate all variant and/or modification peptides by digesting with a given protease.
This method performs enzymatic digestion on the reference sequence and then generates all combinations of simple variants and/or modifications within each peptide.
| digestor | The protease digestion object (must be configured with enzyme, missed cleavages, etc.) |
| descriptions | Output vector for peptide descriptions (empty for reference peptides) |
| sequences | Output vector for peptide sequences |
| min_length | Minimum peptide length to include (default: 6) |
| max_length | Maximum peptide length to include (default: 40, 0 = no limit) |
| include_reference | If true, include reference peptides (default: true) |
| include_variants | If true, generate variant combinations (default: true) |
| include_modifications | If true, generate modification combinations (default: false) |
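The length parameters follow a "0 means no upper limit" convention. A minimal sketch of such a filter, independent of OpenMS, could look like the following. `lengthAccepted` is a hypothetical helper, and treating both bounds as inclusive is an assumption that the documentation above does not state.

```cpp
#include <cstddef>

// Hypothetical helper: decide whether a peptide of the given length passes the filter.
// Assumption: both bounds are inclusive; max_length == 0 disables the upper bound.
bool lengthAccepted(std::size_t length, std::size_t min_length = 6, std::size_t max_length = 40)
{
  if (length < min_length) return false;
  return max_length == 0 || length <= max_length;
}
```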
|
static private |
Apply PEFF modifications at specific positions to a peptide.
Helper method that generates all 2^n combinations of PEFF modifications for a given peptide, where n is the number of PEFF modifications within the peptide's range.
| peptide | The base peptide sequence |
| peff_mods | PEFF modifications with positions relative to the peptide (0-based) |
| base_description | Base description to prepend to modification descriptions |
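The 2^n enumeration described above can be sketched with a bitmask over the modification sites. This is a standalone illustration, not the OpenMS implementation: `ModSite`, `enumerateModCombinations`, the bracket notation, and the description format are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified modification site: 0-based position in the peptide plus a label.
struct ModSite
{
  std::size_t position;
  std::string name;
};

// Enumerate every subset of `sites`, giving 2^n results including the unmodified peptide.
// Each result pairs a description with the peptide in an illustrative bracket notation.
// Assumptions: sites are sorted by position, positions are distinct and inside the
// peptide, and n is small (well below 64).
std::vector<std::pair<std::string, std::string>> enumerateModCombinations(
    const std::string& peptide, const std::vector<ModSite>& sites, const std::string& base_description)
{
  std::vector<std::pair<std::string, std::string>> results;
  const std::size_t n = sites.size();
  for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask)
  {
    std::string description = base_description;
    std::string annotated;
    std::size_t next_site = 0;
    for (std::size_t i = 0; i < peptide.size(); ++i)
    {
      annotated += peptide[i];
      if (next_site < n && sites[next_site].position == i)
      {
        // Bit `next_site` of the mask decides whether this site is modified.
        if (mask & (std::size_t{1} << next_site))
        {
          annotated += "[" + sites[next_site].name + "]";
          description += ";" + sites[next_site].name + "@" + std::to_string(i);
        }
        ++next_site;
      }
    }
    results.emplace_back(description, annotated);
  }
  return results;
}
```

Because the result count doubles with every site, a caller would likely want to cap n before enumerating; the sketch leaves that policy out.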
|
static |
Create a PEFFEntry from a FASTAEntry (basic fields only)
| void generatePeptides (const ProteaseDigestion &digestor, std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, const std::vector< std::string > &fixed_mods={}, const std::vector< std::string > &variable_mods={}, Size max_variable_mods_per_peptide=2, Size min_length=6, Size max_length=40, bool include_reference=true, bool include_peff_variants=true, bool include_peff_modifications=true) const |
Generate peptides with PEFF annotations and optional sample handling modifications.
Combines enzymatic digestion, PEFF variants/modifications, and sample handling mods.
| digestor | The protease digestion object (configured with enzyme, missed cleavages) |
| descriptions | Output vector for peptide descriptions |
| sequences | Output vector for peptide sequences |
| fixed_mods | Fixed modifications (e.g., {"Carbamidomethyl (C)"}) |
| variable_mods | Variable modifications (e.g., {"Oxidation (M)"}) |
| max_variable_mods_per_peptide | Maximum variable mods per peptide (default: 2) |
| min_length | Minimum peptide length (default: 6) |
| max_length | Maximum peptide length (default: 40, 0 = no limit) |
| include_reference | Include reference peptides (default: true) |
| include_peff_variants | Enumerate PEFF variants (default: true) |
| include_peff_modifications | Enumerate PEFF modifications (default: true) |
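A cap such as `max_variable_mods_per_peptide` limits how many candidate sites are modified at once. One way to picture this, as a standalone sketch rather than the OpenMS code, is to enumerate all subsets of site indices with at most `max_mods` elements. `extendSubsets` and `siteSubsets` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Recursive helper: record the current subset, then try adding each later site index.
void extendSubsets(std::size_t first, std::size_t n, std::size_t max_mods,
                   std::vector<std::size_t>& current,
                   std::vector<std::vector<std::size_t>>& out)
{
  out.push_back(current);
  if (current.size() == max_mods) return;  // cap reached, do not grow further
  for (std::size_t i = first; i < n; ++i)
  {
    current.push_back(i);
    extendSubsets(i + 1, n, max_mods, current, out);
    current.pop_back();
  }
}

// All subsets of the site indices [0, n) that contain at most max_mods elements.
// The empty subset (the unmodified peptide) is always the first entry.
std::vector<std::vector<std::size_t>> siteSubsets(std::size_t n, std::size_t max_mods)
{
  std::vector<std::vector<std::size_t>> out;
  std::vector<std::size_t> current;
  extendSubsets(0, n, max_mods, current, out);
  return out;
}
```

With three candidate sites and a cap of two, this produces 1 + 3 + 3 = 7 subsets instead of the 8 an uncapped enumeration would give; the saving grows quickly with the number of sites.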
| AASequence getModifiedSequence | ( | ) | const |
Get an AASequence with all annotated modifications applied.
Uses the modifications vector to apply modifications to the sequence. Modifications with unknown positions (position == 0) are skipped. Modifications that cannot be resolved are logged as warnings and skipped.
| AASequence getProcessedSequence | ( | const String & | region_accession = "PEFF:0001021" | ) | const |
Get processed sequence (e.g., mature protein without signal peptide).
Applies the first processed region of the given type to extract the processed sequence segment.
| region_accession | PEFF CV accession for the region type (e.g., "PEFF:0001021" for signal peptide) |
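The "first region of the given type" behaviour can be illustrated with a small standalone sketch. It is not the OpenMS implementation: `ProcessedRegion` and `extractFirstRegion` are hypothetical, positions are assumed to be 1-based and inclusive, and returning the full sequence when no region matches is an assumption of this example only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified processed region: 1-based inclusive range plus a CV accession.
struct ProcessedRegion
{
  std::size_t start;
  std::size_t end;
  std::string accession;
};

// Return the segment covered by the first valid region with the requested accession.
// Assumption: if no such region exists, the unprocessed sequence is returned unchanged.
std::string extractFirstRegion(const std::string& sequence,
                               const std::vector<ProcessedRegion>& regions,
                               const std::string& accession)
{
  for (const ProcessedRegion& r : regions)
  {
    if (r.accession != accession) continue;
    // Skip regions with unknown or out-of-range coordinates.
    if (r.start == 0 || r.start > r.end || r.end > sequence.size()) continue;
    return sequence.substr(r.start - 1, r.end - r.start + 1);
  }
  return sequence;
}
```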
| AASequence getSequence | ( | ) | const |
Get the base AASequence for this entry (unmodified sequence).
| void getVariantSequences (std::vector< std::string > &descriptions, std::vector< AASequence > &sequences, bool include_complex=false) const |
Get all variant sequences (each variant applied individually).
| descriptions | Output vector for variant descriptions |
| sequences | Output vector for variant sequences |
| include_complex | If true, also include complex variants (default: false) |
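"Each variant applied individually" means that every output sequence differs from the reference by exactly one variant. A standalone sketch of that idea for single-residue substitutions follows. It does not use OpenMS types: `SimpleVariant` and `applyVariantsIndividually` are hypothetical, positions are assumed to be 1-based, and the description format (original residue, position, new residue) is illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified substitution: 1-based position and the replacement residue.
struct SimpleVariant
{
  std::size_t position;
  char residue;
};

// Apply each variant on its own to a copy of the reference sequence.
// The two output vectors are filled in parallel, one entry per applied variant.
// Variants with position 0 or beyond the sequence end are skipped.
void applyVariantsIndividually(const std::string& sequence,
                               const std::vector<SimpleVariant>& variants,
                               std::vector<std::string>& descriptions,
                               std::vector<std::string>& sequences)
{
  for (const SimpleVariant& v : variants)
  {
    if (v.position == 0 || v.position > sequence.size()) continue;
    std::string variant_seq = sequence;
    const char original = variant_seq[v.position - 1];
    variant_seq[v.position - 1] = v.residue;
    descriptions.push_back(std::string(1, original) + std::to_string(v.position) + v.residue);
    sequences.push_back(variant_seq);
  }
}
```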
|
inline |
References PEFFEntry::alt_accessions, PEFFEntry::complex_variants, PEFFEntry::custom_annotations, PEFFEntry::db_unique_id, PEFFEntry::disulfide_bonds, PEFFEntry::entry_id, PEFFEntry::entry_version, PEFFEntry::gene_name, PEFFEntry::identifier, PEFFEntry::modifications, PEFFEntry::ncbi_tax_id, PEFFEntry::prefix, PEFFEntry::processed_regions, PEFFEntry::protein_existence, PEFFEntry::protein_names, PEFFEntry::proteoforms, PEFFEntry::sequence, PEFFEntry::sequence_length, PEFFEntry::sequence_version, PEFFEntry::simple_variants, and PEFFEntry::taxonomy_name.
| FASTAFile::FASTAEntry toFASTAEntry | ( | ) | const |
Convert to a FASTAFile::FASTAEntry (loses PEFF-specific annotations)
| std::vector<String> alt_accessions |
\AltAC - alternative accessions
Referenced by PEFFEntry::operator==().
| std::vector<PEFFVariantComplex> complex_variants |
Referenced by PEFFEntry::operator==().
| std::map<String, String> custom_annotations |
Referenced by PEFFEntry::operator==().
| String db_unique_id |
\DbUniqueId
Referenced by PEFFEntry::operator==().
| std::vector<PEFFDisulfideBond> disulfide_bonds |
\DisulfideBond
Referenced by PEFFEntry::operator==().
| String entry_id |
\ID (e.g., NPM_HUMAN)
Referenced by PEFFEntry::operator==().
| String entry_version |
\EV
Referenced by PEFFEntry::operator==().
| String gene_name |
\GName
Referenced by PEFFEntry::operator==().
| String identifier |
Referenced by PEFFEntry::operator==().
| std::vector<PEFFModification> modifications |
Referenced by PEFFEntry::operator==().
| Int ncbi_tax_id {0} |
\NcbiTaxId or \OX
Referenced by PEFFEntry::operator==().
| String prefix |
Database prefix from description line (e.g., "sp" from ">sp:P12345")
Referenced by PEFFEntry::operator==().
| std::vector<PEFFProcessedRegion> processed_regions |
Referenced by PEFFEntry::operator==().
| Int protein_existence {0} |
\PE (1-5)
Referenced by PEFFEntry::operator==().
| std::vector<String> protein_names |
\PName - may have multiple names
Referenced by PEFFEntry::operator==().
| std::vector<String> proteoforms |
ProForma notation.
Referenced by PEFFEntry::operator==().
| String sequence |
Referenced by PEFFEntry::operator==().
| Size sequence_length {0} |
\Length
Referenced by PEFFEntry::operator==().
| String sequence_version |
\SV
Referenced by PEFFEntry::operator==().
| std::vector<PEFFVariantSimple> simple_variants |
Referenced by PEFFEntry::operator==().
| String taxonomy_name |
\TaxName
Referenced by PEFFEntry::operator==().