OpenMS
Utility class for handling Universal Spectrum Identifiers (USI).
#include <OpenMS/METADATA/USI.h>
Public Types | |
| enum class | IndexType { SCAN , INDEX , NATIVEID } |
| Supported index types in USI. | |
Public Member Functions | |
Constructors and Destructor | |
| USI () | |
| Default constructor (creates an empty/invalid USI) | |
| USI (const String &collection, const String &ms_run, IndexType index_type, const String &index, const String &interpretation="") | |
| Construct a USI from its components. | |
| USI (const String &usi_string) | |
| Construct a USI from its string representation. | |
| USI (const USI &)=default | |
| Copy constructor. | |
| USI (USI &&) noexcept=default | |
| Move constructor. | |
| ~USI ()=default | |
| Destructor. | |
Assignment operators | |
| USI & | operator= (const USI &)=default |
| Copy assignment operator. | |
| USI & | operator= (USI &&) noexcept=default |
| Move assignment operator. | |
Comparison operators | |
| bool | operator== (const USI &rhs) const |
| Equality operator. | |
| bool | operator!= (const USI &rhs) const |
| Inequality operator. | |
| bool | operator< (const USI &rhs) const |
| Less-than operator (for sorting/containers) | |
Accessors | |
| const String & | getCollection () const |
| Returns the collection (dataset identifier or library name) | |
| void | setCollection (const String &collection) |
| Sets the collection (dataset identifier or library name) | |
| const String & | getMSRun () const |
| Returns the MS run file name. | |
| void | setMSRun (const String &ms_run) |
| Sets the MS run file name. | |
| IndexType | getIndexType () const |
| Returns the index type. | |
| void | setIndexType (IndexType index_type) |
| Sets the index type. | |
| const String & | getIndex () const |
| Returns the spectrum index as a string. | |
| void | setIndex (const String &index) |
| Sets the spectrum index. | |
| const String & | getInterpretation () const |
| Returns the interpretation (peptide sequence/charge) | |
| void | setInterpretation (const String &interpretation) |
| Sets the interpretation (peptide sequence/charge) | |
| bool | hasInterpretation () const |
| Returns true if an interpretation is present. | |
String conversion | |
| String | toString () const |
| Converts the USI to its string representation. | |
| bool | fromString (const String &usi_string) |
| Parse a USI string and populate this object. | |
Static Public Member Functions | |
Static utility methods | |
| static USI | createFromScanNumber (const String &dataset_id, const String &filename, int scan_number, const String &interpretation="") |
| Create a USI from spectrum metadata. | |
| static USI | createFromNativeID (const String &dataset_id, const String &filename, const String &native_id) |
| Create a USI from a native spectrum identifier. | |
| static std::optional< int > | extractScanNumberFromNativeID (const String &native_id) |
| Extract scan number from a native spectrum ID. | |
| static String | indexTypeToString (IndexType index_type) |
| Get the index type string representation. | |
| static IndexType | indexTypeFromString (const String &type_string) |
| Parse index type from string. | |
| static String | extractBasename (const String &filepath) |
| Extract basename from a file path (removes directory path). | |
CV term information | |
| String | collection_ |
| Dataset identifier or library name. | |
| String | ms_run_ |
| MS run file name. | |
| IndexType | index_type_ |
| Type of spectrum index. | |
| String | index_ |
| Spectrum index value. | |
| String | interpretation_ |
| Optional peptide interpretation. | |
| static const String | USI_PREFIX |
| Prefix for all USIs. | |
| static const String | CV_ACCESSION |
| CV accession for USI. | |
| static const String | CV_NAME |
| CV name for USI. | |
| static const String & | getCVAccession () |
| Returns the PSI-MS CV accession for USI (MS:1003063) | |
| static const String & | getCVName () |
| Returns the PSI-MS CV name for USI. | |
Validity checking | |
| bool | isValid () const |
| Check if this USI is valid (has all required components). | |
| static bool | isValidUSI (const String &usi_string) |
| Check if a string is a valid USI format. | |
| static std::optional< USI > | tryParse (const String &usi_string) |
| Try to parse a USI string (non-throwing). | |
Utility class for handling Universal Spectrum Identifiers (USI).
The Universal Spectrum Identifier (USI) is a standardized identifier format defined by the HUPO-PSI (Human Proteome Organization - Proteomics Standards Initiative) for uniquely referencing mass spectrometry spectra across public repositories. The USI format enables unambiguous identification of spectra in ProteomeXchange partner repositories (PRIDE, PeptideAtlas, MassIVE, jPOST, iProX) and spectral libraries.
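USI format:
mzspec:<collection>:<ms_run>:<index_type>:<index>[:<interpretation>]
Where:
| mzspec | Prefix for all USIs |
| collection | ProteomeXchange dataset identifier or spectral library name |
| ms_run | Name of the MS run file |
| index_type | Type of spectrum index (scan, index, or nativeId) |
| index | Numeric identifier of the spectrum |
| interpretation | Optional peptide interpretation (sequence/charge) |
Example USIs:
mzspec:PXD000561:Adult_Frontalcortex.mzML:scan:12345
mzspec:PXD000561:Adult_Frontalcortex.mzML:scan:12345:PEPTIDEK/2
mzspec:PXD000561:sample.mzML:scan:1234:PEPT[Phospho]IDEK/2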
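As a rough illustration of how a USI string breaks into its components, the following standalone C++17 sketch splits one on its separators. It does not use or reproduce the OpenMS implementation: UsiParts and splitUsi are hypothetical names, and the sketch assumes that neither the collection nor the MS run name contains a colon.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical plain-data stand-in for the parsed components (not the OpenMS class).
struct UsiParts
{
  std::string collection;
  std::string ms_run;
  std::string index_type;
  std::string index;
  std::string interpretation; // empty when absent
};

// Split "mzspec:<collection>:<ms_run>:<index_type>:<index>[:<interpretation>]".
// Only the colons that delimit the four mandatory fields act as separators,
// so an interpretation that itself contains ':' is kept intact.
std::optional<UsiParts> splitUsi(const std::string& usi)
{
  const std::string prefix = "mzspec:";
  if (usi.compare(0, prefix.size(), prefix) != 0) return std::nullopt;

  std::vector<std::string> fields;
  std::size_t start = prefix.size();
  while (fields.size() < 4)
  {
    const std::size_t colon = usi.find(':', start);
    if (colon == std::string::npos) break;
    fields.push_back(usi.substr(start, colon - start));
    start = colon + 1;
  }
  // The remainder is either the last mandatory field or the interpretation.
  fields.push_back(usi.substr(start));

  if (fields.size() < 4) return std::nullopt;
  for (std::size_t i = 0; i < 4; ++i)
  {
    if (fields[i].empty()) return std::nullopt;
  }

  UsiParts parts{fields[0], fields[1], fields[2], fields[3], ""};
  if (fields.size() == 5) parts.interpretation = fields[4];
  return parts;
}
```

Calling splitUsi on the second example above would yield an index_type of "scan", an index of "12345", and an interpretation of "PEPTIDEK/2"; a string without the mzspec prefix yields an empty optional.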
| enum class IndexType |
Supported index types in USI.
| Enumerator | |
|---|---|
| SCAN | scan number (most common) |
| INDEX | zero-based spectrum index |
| NATIVEID | native spectrum identifier from the instrument |
| USI | ( | const String & collection, const String & ms_run, IndexType index_type, const String & index, const String & interpretation = "" | ) |
Construct a USI from its components.
Parameters
| collection | ProteomeXchange dataset identifier or spectral library name |
| ms_run | Name of the MS run file |
| index_type | Type of spectrum index (scan, index, or nativeId) |
| index | Numeric identifier of the spectrum |
| interpretation | Optional peptide interpretation (sequence/charge) |
| USI | ( | const String & | usi_string | ) |
Construct a USI from its string representation.
Parameters
| usi_string | Complete USI string to parse |
Exceptions
| Exception::ParseError | if the USI string is malformed |
| ~USI | ( | ) | =default |
Destructor.
| static USI createFromScanNumber | ( | const String & dataset_id, const String & filename, int scan_number, const String & interpretation = "" | ) |
Create a USI from spectrum metadata.
This is a convenience method for creating a USI from commonly available spectrum information.
Parameters
| dataset_id | ProteomeXchange dataset identifier (e.g., "PXD000561") |
| filename | MS file name (e.g., "sample.mzML") |
| scan_number | Scan number |
| interpretation | Optional ProForma interpretation (e.g., "PEPTIDEK/2") |
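To make the mapping from spectrum metadata to a USI string concrete, here is a minimal standalone sketch that assembles a scan-based USI from the same four inputs. makeScanUsi is a hypothetical helper, not an OpenMS function, and the sketch assumes the file name is already a bare MS run name.

```cpp
#include <string>

// Hypothetical helper (not part of OpenMS): assemble a scan-based USI string.
std::string makeScanUsi(const std::string& dataset_id,
                        const std::string& filename,
                        int scan_number,
                        const std::string& interpretation = "")
{
  std::string usi = "mzspec:" + dataset_id + ":" + filename + ":scan:" + std::to_string(scan_number);
  // The interpretation is optional and is appended only when present.
  if (!interpretation.empty())
  {
    usi += ":" + interpretation;
  }
  return usi;
}
```

With the example values from the parameter table, makeScanUsi("PXD000561", "sample.mzML", 1234, "PEPTIDEK/2") returns mzspec:PXD000561:sample.mzML:scan:1234:PEPTIDEK/2.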
| static String extractBasename | ( | const String & | filepath | ) |
Extract basename from a file path (removes directory path).
This is useful when converting full file paths to MS run names for USI.
Example: "/path/to/sample.mzML" -> "sample.mzML"
Example: "file:///C:/data/sample.mzML" -> "sample.mzML"
Parameters
| filepath | Full file path or URI |
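The two examples can be reproduced with a very small standalone function. This is only a sketch of the idea, not the OpenMS code: basenameOf is a hypothetical name, and treating both '/' and '\' as separators is an assumption.

```cpp
#include <string>

// Hypothetical stand-in: return everything after the last '/' or '\'.
std::string basenameOf(const std::string& filepath)
{
  const std::size_t pos = filepath.find_last_of("/\\");
  // No separator means the input is already a bare file name.
  return pos == std::string::npos ? filepath : filepath.substr(pos + 1);
}
```

Because a file URI also ends in a '/'-separated file name, the same rule covers both a plain path and a URI.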
| static std::optional< int > extractScanNumberFromNativeID | ( | const String & | native_id | ) |
Extract scan number from a native spectrum ID.
Attempts to extract a numeric scan number from various native ID formats. This is useful when converting native IDs to USI format.
Parameters
| native_id | Native spectrum identifier |
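One way such an extraction could work is sketched below for a single native ID style, the "scan=N" token used in Thermo-style IDs such as "controllerType=0 controllerNumber=1 scan=1234". Which formats the OpenMS method actually recognizes is not stated here, so the pattern and the name scanFromNativeId are assumptions.

```cpp
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: find a "scan=<digits>" token and return the number.
std::optional<int> scanFromNativeId(const std::string& native_id)
{
  static const std::regex scan_pattern(R"(scan=(\d+))");
  std::smatch match;
  if (!std::regex_search(native_id, match, scan_pattern)) return std::nullopt;
  try
  {
    return std::stoi(match[1].str());
  }
  catch (const std::out_of_range&)
  {
    // The digits do not fit into an int.
    return std::nullopt;
  }
}
```

Returning std::optional lets the caller distinguish "no scan number found" from a valid scan number without a sentinel value.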
| bool fromString | ( | const String & | usi_string | ) |
Parse a USI string and populate this object.
| const String & getCollection | ( | ) | const |
Returns the collection (dataset identifier or library name)
| static const String & getCVAccession | ( | ) |
Returns the PSI-MS CV accession for USI (MS:1003063)
| const String & getIndex | ( | ) | const |
Returns the spectrum index as a string.
| IndexType getIndexType | ( | ) | const |
Returns the index type.
| const String & getInterpretation | ( | ) | const |
Returns the interpretation (peptide sequence/charge)
| const String & getMSRun | ( | ) | const |
Returns the MS run file name.
| bool hasInterpretation | ( | ) | const |
Returns true if an interpretation is present.
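| static IndexType indexTypeFromString | ( | const String & | type_string | ) |
Parse index type from string.
Parameters
| type_string | String representation of index type |
Exceptions
| Exception::InvalidValue | if the string is not a valid index type |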
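The conversion between the enumerators and their string forms can be sketched as a pair of standalone functions. The names indexTypeName and parseIndexType are hypothetical, std::invalid_argument stands in for Exception::InvalidValue, and the case-sensitive matching of "scan", "index", and "nativeId" is an assumption about the real behavior.

```cpp
#include <stdexcept>
#include <string>

// Local copy of the enumerators so the sketch compiles without OpenMS.
enum class IndexType { SCAN, INDEX, NATIVEID };

// Hypothetical counterpart of indexTypeToString.
std::string indexTypeName(IndexType index_type)
{
  switch (index_type)
  {
    case IndexType::SCAN: return "scan";
    case IndexType::INDEX: return "index";
    case IndexType::NATIVEID: return "nativeId";
  }
  throw std::invalid_argument("unknown IndexType value");
}

// Hypothetical counterpart of indexTypeFromString; throws on unknown input.
IndexType parseIndexType(const std::string& type_string)
{
  if (type_string == "scan") return IndexType::SCAN;
  if (type_string == "index") return IndexType::INDEX;
  if (type_string == "nativeId") return IndexType::NATIVEID;
  throw std::invalid_argument("not a valid index type: " + type_string);
}
```

The two functions are inverses for the three supported values, so a round trip through the string form returns the original enumerator.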
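| bool isValid | ( | ) | const |
Check if this USI is valid (has all required components).
| static bool isValidUSI | ( | const String & | usi_string | ) |
Check if a string is a valid USI format.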
| bool operator!= | ( | const USI & | rhs | ) | const |
Inequality operator.
| bool operator< | ( | const USI & | rhs | ) | const |
Less-than operator (for sorting/containers)
| bool operator== | ( | const USI & | rhs | ) | const |
Equality operator.
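A common way to give a multi-field value type consistent equality and ordering is to compare tuples of its members. The sketch below shows that pattern on a hypothetical UsiKey struct. Whether the OpenMS operators compare the fields in this order, or use this technique at all, is an assumption.

```cpp
#include <string>
#include <tuple>

// Hypothetical value type with the same five components as a USI.
struct UsiKey
{
  std::string collection;
  std::string ms_run;
  int index_type;
  std::string index;
  std::string interpretation;
};

// Tie the members into a tuple of references for lexicographic comparison.
inline auto asTuple(const UsiKey& key)
{
  return std::tie(key.collection, key.ms_run, key.index_type, key.index, key.interpretation);
}

inline bool operator==(const UsiKey& lhs, const UsiKey& rhs) { return asTuple(lhs) == asTuple(rhs); }
inline bool operator!=(const UsiKey& lhs, const UsiKey& rhs) { return !(lhs == rhs); }
inline bool operator<(const UsiKey& lhs, const UsiKey& rhs) { return asTuple(lhs) < asTuple(rhs); }
```

An ordering defined this way is a strict weak ordering, which is what std::set, std::map, and std::sort require. Note that the index is compared as a string here, so "10" sorts before "2".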
| void setCollection | ( | const String & | collection | ) |
Sets the collection (dataset identifier or library name)
| void setIndex | ( | const String & | index | ) |
Sets the spectrum index.
| void setIndexType | ( | IndexType | index_type | ) |
Sets the index type.
| void setInterpretation | ( | const String & | interpretation | ) |
Sets the interpretation (peptide sequence/charge)
| void setMSRun | ( | const String & | ms_run | ) |
Sets the MS run file name.
| static std::optional< USI > tryParse | ( | const String & | usi_string | ) |
Try to parse a USI string (non-throwing).
This is the preferred method when you want to validate and use a USI in a single operation, avoiding double-parsing.
Parameters
| usi_string | String to parse |
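The non-throwing pattern can be sketched with a toy type whose constructor throws on malformed input and a wrapper that turns the failure into an empty std::optional. ToyUsi and tryParseToy are hypothetical, the prefix check is deliberately simplistic, and the real method may not rely on exceptions internally.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Toy stand-in for a class whose string constructor throws on malformed input.
class ToyUsi
{
public:
  explicit ToyUsi(const std::string& text) : text_(text)
  {
    // Simplified validity rule for the sketch: only the prefix is checked.
    if (text.rfind("mzspec:", 0) != 0)
    {
      throw std::invalid_argument("missing mzspec: prefix");
    }
  }

  const std::string& str() const { return text_; }

private:
  std::string text_;
};

// Parse once; report failure as an empty optional instead of an exception.
std::optional<ToyUsi> tryParseToy(const std::string& text)
{
  try
  {
    return ToyUsi(text);
  }
  catch (const std::invalid_argument&)
  {
    return std::nullopt;
  }
}
```

The caller checks the optional and then uses the parsed object directly, so the input is parsed once rather than once for validation and again for construction.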
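| String collection_ | protected |
Dataset identifier or library name.
| String index_ | protected |
Spectrum index value.
| IndexType index_type_ | protected |
Type of spectrum index.
| String interpretation_ | protected |
Optional peptide interpretation.
| String ms_run_ | protected |
MS run file name.
| static const String USI_PREFIX | protected |
Prefix for all USIs.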