OpenMS
USI Class Reference

Utility class for handling Universal Spectrum Identifiers (USI). More...

#include <OpenMS/METADATA/USI.h>


Public Types

enum class  IndexType { SCAN , INDEX , NATIVEID }
 Supported index types in USI. More...
 

Public Member Functions

Constructors and Destructor
 USI ()
 Default constructor (creates an empty/invalid USI)
 
 USI (const String &collection, const String &ms_run, IndexType index_type, const String &index, const String &interpretation="")
 Construct a USI from its components.
 
 USI (const String &usi_string)
 Construct a USI from its string representation.
 
 USI (const USI &)=default
 Copy constructor.
 
 USI (USI &&) noexcept=default
 Move constructor.
 
 ~USI ()=default
 Destructor.
 
Assignment operators
USI & operator= (const USI &)=default
 Copy assignment operator.
 
USI & operator= (USI &&) noexcept=default
 Move assignment operator.
 
Comparison operators
bool operator== (const USI &rhs) const
 Equality operator.
 
bool operator!= (const USI &rhs) const
 Inequality operator.
 
bool operator< (const USI &rhs) const
 Less-than operator (for sorting/containers)
 
Accessors
const String & getCollection () const
 Returns the collection (dataset identifier or library name)
 
void setCollection (const String &collection)
 Sets the collection (dataset identifier or library name)
 
const String & getMSRun () const
 Returns the MS run file name.
 
void setMSRun (const String &ms_run)
 Sets the MS run file name.
 
IndexType getIndexType () const
 Returns the index type.
 
void setIndexType (IndexType index_type)
 Sets the index type.
 
const String & getIndex () const
 Returns the spectrum index as a string.
 
void setIndex (const String &index)
 Sets the spectrum index.
 
const String & getInterpretation () const
 Returns the interpretation (peptide sequence/charge)
 
void setInterpretation (const String &interpretation)
 Sets the interpretation (peptide sequence/charge)
 
bool hasInterpretation () const
 Returns true if an interpretation is present.
 
String conversion
String toString () const
 Converts the USI to its string representation.
 
bool fromString (const String &usi_string)
 Parse a USI string and populate this object.
 

Static Public Member Functions

Static utility methods
static USI createFromScanNumber (const String &dataset_id, const String &filename, int scan_number, const String &interpretation="")
 Create a USI from spectrum metadata.
 
static USI createFromNativeID (const String &dataset_id, const String &filename, const String &native_id)
 Create a USI from a native spectrum identifier.
 
static std::optional< int > extractScanNumberFromNativeID (const String &native_id)
 Extract scan number from a native spectrum ID.
 
static String indexTypeToString (IndexType index_type)
 Get the index type string representation.
 
static IndexType indexTypeFromString (const String &type_string)
 Parse index type from string.
 
static String extractBasename (const String &filepath)
 Extract basename from a file path (removes directory path).
 

CV term information

String collection_
 Dataset identifier or library name.
 
String ms_run_
 MS run file name.
 
IndexType index_type_
 Type of spectrum index.
 
String index_
 Spectrum index value.
 
String interpretation_
 Optional peptide interpretation.
 
static const String USI_PREFIX
 Prefix for all USIs.
 
static const String CV_ACCESSION
 CV accession for USI.
 
static const String CV_NAME
 CV name for USI.
 
static const String & getCVAccession ()
 Returns the PSI-MS CV accession for USI (MS:1003063)
 
static const String & getCVName ()
 Returns the PSI-MS CV name for USI.
 

Validity checking

bool isValid () const
 Check if this USI is valid (has all required components).
 
static bool isValidUSI (const String &usi_string)
 Check if a string is a valid USI format.
 
static std::optional< USI > tryParse (const String &usi_string)
 Try to parse a USI string (non-throwing).
 

Detailed Description

Utility class for handling Universal Spectrum Identifiers (USI).

The Universal Spectrum Identifier (USI) is a standardized identifier format defined by the HUPO-PSI (Human Proteome Organization - Proteomics Standards Initiative) for uniquely referencing mass spectrometry spectra across public repositories. The USI format enables unambiguous identification of spectra in ProteomeXchange partner repositories (PRIDE, PeptideAtlas, MassIVE, jPOST, iProX) and spectral libraries.

USI Format
The general format of a USI is:
mzspec:<collection>:<ms_run>:<index_type>:<index>[:<interpretation>]

Where:

  • collection: ProteomeXchange dataset identifier (e.g., PXD000561) or spectral library name
  • ms_run: Name of the MS run file (e.g., sample1.mzML)
  • index_type: Type of spectrum index: "scan", "index", or "nativeId"
  • index: Numeric identifier of the spectrum
  • interpretation: Optional peptide interpretation in ProForma format (sequence/charge)
Examples
  • Basic spectrum reference: mzspec:PXD000561:Adult_Frontalcortex.mzML:scan:12345
  • With peptide interpretation: mzspec:PXD000561:Adult_Frontalcortex.mzML:scan:12345:PEPTIDEK/2
  • With modification: mzspec:PXD000561:sample.mzML:scan:1234:PEPT[Phospho]IDEK/2
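The format above can be illustrated with a small standalone parser. This is a minimal sketch, not the OpenMS implementation: UsiParts and splitUsi are hypothetical names, and it assumes that collection, ms_run, and index contain no colons, so only the first five colons act as separators and everything after the fifth is taken as the interpretation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical plain struct mirroring the five USI components described above.
struct UsiParts
{
  std::string collection;
  std::string ms_run;
  std::string index_type;
  std::string index;
  std::string interpretation; // empty if absent
};

// Splits "mzspec:<collection>:<ms_run>:<index_type>:<index>[:<interpretation>]".
// Returns std::nullopt if the prefix is missing or a required field is
// missing or empty.
std::optional<UsiParts> splitUsi(const std::string& usi)
{
  const std::string prefix = "mzspec:";
  if (usi.compare(0, prefix.size(), prefix) != 0) return std::nullopt;

  // Collect collection, ms_run and index_type, each terminated by a colon.
  std::vector<std::string> fields;
  std::size_t start = prefix.size();
  for (int i = 0; i < 3; ++i)
  {
    const std::size_t colon = usi.find(':', start);
    if (colon == std::string::npos) return std::nullopt;
    fields.push_back(usi.substr(start, colon - start));
    start = colon + 1;
  }

  UsiParts parts;
  parts.collection = fields[0];
  parts.ms_run = fields[1];
  parts.index_type = fields[2];

  // The index runs to the next colon (if any); the rest is the interpretation.
  const std::size_t colon = usi.find(':', start);
  if (colon == std::string::npos)
  {
    parts.index = usi.substr(start);
  }
  else
  {
    parts.index = usi.substr(start, colon - start);
    parts.interpretation = usi.substr(colon + 1);
  }

  if (parts.collection.empty() || parts.ms_run.empty() ||
      parts.index_type.empty() || parts.index.empty())
  {
    return std::nullopt;
  }
  return parts;
}
```

Applied to the second example above, splitUsi would yield the collection PXD000561, the run Adult_Frontalcortex.mzML, the index type scan, the index 12345, and the interpretation PEPTIDEK/2.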
PSI-MS CV Term
USI is defined in the PSI-MS ontology as MS:1003063 (universal spectrum identifier).
See also
SpectrumLookup, SpectrumSettings, PeptideIdentification
https://www.psidev.info/usi (USI specification)

Member Enumeration Documentation

◆ IndexType

enum class IndexType
strong

Supported index types in USI.

Enumerator
SCAN 

scan number (most common)

INDEX 

zero-based spectrum index

NATIVEID 

native spectrum identifier from the instrument

Constructor & Destructor Documentation

◆ USI() [1/5]

USI ( )

Default constructor (creates an empty/invalid USI)

◆ USI() [2/5]

USI ( const String & collection,
const String & ms_run,
IndexType  index_type,
const String & index,
const String & interpretation = "" 
)

Construct a USI from its components.

Parameters
collection      ProteomeXchange dataset identifier or spectral library name
ms_run          Name of the MS run file
index_type      Type of spectrum index (scan, index, or nativeId)
index           Numeric identifier of the spectrum
interpretation  Optional peptide interpretation (sequence/charge)

◆ USI() [3/5]

USI ( const String & usi_string)
explicit

Construct a USI from its string representation.

Parameters
usi_string  Complete USI string to parse
Exceptions
Exception::ParseError  if the USI string is malformed

◆ USI() [4/5]

USI ( const USI & )
default

Copy constructor.

◆ USI() [5/5]

USI ( USI &&  )
default noexcept

Move constructor.

◆ ~USI()

~USI ( )
default

Destructor.

Member Function Documentation

◆ createFromNativeID()

static USI createFromNativeID ( const String & dataset_id,
const String & filename,
const String & native_id 
)
static

Create a USI from a native spectrum identifier.

Parameters
dataset_id  ProteomeXchange dataset identifier
filename    MS file name
native_id   Native spectrum identifier from the instrument
Returns
Constructed USI object

◆ createFromScanNumber()

static USI createFromScanNumber ( const String & dataset_id,
const String & filename,
int  scan_number,
const String & interpretation = "" 
)
static

Create a USI from spectrum metadata.

This is a convenience method for creating a USI from commonly available spectrum information.

Parameters
dataset_id      ProteomeXchange dataset identifier (e.g., "PXD000561")
filename        MS file name (e.g., "sample.mzML")
scan_number     Scan number
interpretation  Optional ProForma interpretation (e.g., "PEPTIDEK/2")
Returns
Constructed USI object
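As an illustration of what such a convenience method produces, the following standalone sketch assembles the USI string directly from the same four inputs. makeScanUsi is a hypothetical helper written for this example, not part of OpenMS; it assumes the "mzspec:" prefix and field order given under USI Format.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "mzspec:<dataset_id>:<filename>:scan:<scan_number>"
// and appends ":<interpretation>" only when an interpretation is given.
std::string makeScanUsi(const std::string& dataset_id,
                        const std::string& filename,
                        int scan_number,
                        const std::string& interpretation = "")
{
  std::string usi = "mzspec:" + dataset_id + ":" + filename + ":scan:" +
                    std::to_string(scan_number);
  if (!interpretation.empty())
  {
    usi += ":" + interpretation;
  }
  return usi;
}
```

With the documented example values, makeScanUsi("PXD000561", "sample.mzML", 1234, "PEPTIDEK/2") gives mzspec:PXD000561:sample.mzML:scan:1234:PEPTIDEK/2.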

◆ extractBasename()

static String extractBasename ( const String & filepath)
static

Extract basename from a file path (removes directory path).

This is useful when converting full file paths to MS run names for USI.
Example: "/path/to/sample.mzML" -> "sample.mzML"
Example: "file:///C:/data/sample.mzML" -> "sample.mzML"

Parameters
filepath  Full file path or URI
Returns
Basename without path component
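The two documented examples can be reproduced by cutting the input after its last path separator. The sketch below is one simple way to do that and is not taken from the OpenMS sources; basenameOf is a hypothetical name, and treating the backslash as a separator is an assumption made here for Windows-style paths.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns everything after the last '/' or '\\'.
// If no separator is present, the input is returned unchanged.
std::string basenameOf(const std::string& filepath)
{
  const std::size_t pos = filepath.find_last_of("/\\");
  return pos == std::string::npos ? filepath : filepath.substr(pos + 1);
}
```

Because a file URI also uses '/' between its segments, the same rule covers "file:///C:/data/sample.mzML" without any URI-specific handling.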

◆ extractScanNumberFromNativeID()

static std::optional< int > extractScanNumberFromNativeID ( const String & native_id)
static

Extract scan number from a native spectrum ID.

Attempts to extract a numeric scan number from various native ID formats. This is useful when converting native IDs to USI format.

Parameters
native_id  Native spectrum identifier
Returns
Scan number if extraction succeeded, std::nullopt otherwise
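This page does not list which native ID formats are recognized. As a hedged illustration of the std::optional return contract, the sketch below handles only one common shape, a native ID containing a "scan=<number>" token such as "controllerType=0 controllerNumber=1 scan=1234". scanNumberFrom is a hypothetical name, and the choice of format is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: looks for "scan=" followed by decimal digits.
// Returns std::nullopt if the token is absent, has no digits, or the
// number does not fit into an int.
std::optional<int> scanNumberFrom(const std::string& native_id)
{
  const std::string key = "scan=";
  const std::size_t pos = native_id.find(key);
  if (pos == std::string::npos) return std::nullopt;

  const std::size_t first = pos + key.size();
  std::size_t last = first;
  while (last < native_id.size() &&
         std::isdigit(static_cast<unsigned char>(native_id[last])))
  {
    ++last;
  }
  if (last == first) return std::nullopt; // "scan=" without digits

  try
  {
    return std::stoi(native_id.substr(first, last - first));
  }
  catch (const std::out_of_range&)
  {
    return std::nullopt; // value too large for int
  }
}
```

Callers can then test the result directly, for example `if (auto scan = scanNumberFrom(id)) { ... }`, and fall back to a nativeId-based USI when no scan number is found.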

◆ fromString()

bool fromString ( const String & usi_string)

Parse a USI string and populate this object.

Parameters
usi_string  USI string to parse
Returns
True if parsing succeeded, false otherwise

◆ getCollection()

const String & getCollection ( ) const

Returns the collection (dataset identifier or library name)

◆ getCVAccession()

static const String & getCVAccession ( )
static

Returns the PSI-MS CV accession for USI (MS:1003063)

◆ getCVName()

static const String & getCVName ( )
static

Returns the PSI-MS CV name for USI.

◆ getIndex()

const String & getIndex ( ) const

Returns the spectrum index as a string.

◆ getIndexType()

IndexType getIndexType ( ) const

Returns the index type.

◆ getInterpretation()

const String & getInterpretation ( ) const

Returns the interpretation (peptide sequence/charge)

◆ getMSRun()

const String & getMSRun ( ) const

Returns the MS run file name.

◆ hasInterpretation()

bool hasInterpretation ( ) const

Returns true if an interpretation is present.

◆ indexTypeFromString()

static IndexType indexTypeFromString ( const String & type_string)
static

Parse index type from string.

Parameters
type_string  String representation of index type
Returns
Index type enum value
Exceptions
Exception::InvalidValue  if the string is not a valid index type

◆ indexTypeToString()

static String indexTypeToString ( IndexType  index_type)
static

Get the index type string representation.

Parameters
index_type  Index type enum value
Returns
String representation ("scan", "index", or "nativeId")
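The round trip between the enum and its three string forms can be sketched without OpenMS as follows. This is an illustration only: toText and fromText are hypothetical names, std::invalid_argument stands in for Exception::InvalidValue, and exact, case-sensitive matching is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local copy of the enumeration documented above.
enum class IndexType { SCAN, INDEX, NATIVEID };

// Maps each enumerator to its USI string form.
std::string toText(IndexType type)
{
  switch (type)
  {
    case IndexType::SCAN: return "scan";
    case IndexType::INDEX: return "index";
    case IndexType::NATIVEID: return "nativeId";
  }
  throw std::invalid_argument("unknown IndexType value");
}

// Inverse mapping; throws for any string other than the three above.
IndexType fromText(const std::string& text)
{
  if (text == "scan") return IndexType::SCAN;
  if (text == "index") return IndexType::INDEX;
  if (text == "nativeId") return IndexType::NATIVEID;
  throw std::invalid_argument("not a valid index type: " + text);
}
```

Keeping both directions next to each other makes it easy to check that every enumerator survives a toText/fromText round trip.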

◆ isValid()

bool isValid ( ) const

Check if this USI is valid (has all required components).

A valid USI must have non-empty collection, ms_run, and index fields.

Returns
True if the USI is valid, false otherwise
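The rule stated above reduces to three non-empty checks. A minimal standalone sketch, using a hypothetical UsiFields struct in place of the real class and covering nothing beyond the stated rule:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the string fields of a USI.
struct UsiFields
{
  std::string collection;
  std::string ms_run;
  std::string index;
  std::string interpretation; // optional, not required for validity
};

// True only if collection, ms_run and index are all non-empty.
bool hasRequiredFields(const UsiFields& fields)
{
  return !fields.collection.empty() && !fields.ms_run.empty() &&
         !fields.index.empty();
}
```

Under this rule a default-constructed UsiFields is not valid, which matches the description of the default constructor as creating an empty/invalid USI.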

◆ isValidUSI()

static bool isValidUSI ( const String & usi_string)
static

Check if a string is a valid USI format.

Validates that the string follows the USI specification format.

Parameters
usi_string  String to validate
Returns
True if the string is a valid USI, false otherwise

◆ operator!=()

bool operator!= ( const USI & rhs) const

Inequality operator.

◆ operator<()

bool operator< ( const USI & rhs) const

Less-than operator (for sorting/containers)
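A less-than operator is what allows a type to serve as a key in ordered containers such as std::set or std::map. This page does not state which ordering is used, so the sketch below shows only one common choice, a field-by-field lexicographic comparison via std::tie, on a hypothetical UsiKey struct.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical key type holding a subset of USI fields.
struct UsiKey
{
  std::string collection;
  std::string ms_run;
  std::string index;

  // Compares collection first, then ms_run, then index (all as strings).
  bool operator<(const UsiKey& rhs) const
  {
    return std::tie(collection, ms_run, index) <
           std::tie(rhs.collection, rhs.ms_run, rhs.index);
  }
};
```

Note that in this sketch the index is compared as text, so "10" sorts before "2"; a numeric ordering would require converting the index first.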

◆ operator=() [1/2]

USI & operator= ( const USI & )
default

Copy assignment operator.

◆ operator=() [2/2]

USI & operator= ( USI &&  )
default noexcept

Move assignment operator.

◆ operator==()

bool operator== ( const USI & rhs) const

Equality operator.

◆ setCollection()

void setCollection ( const String & collection)

Sets the collection (dataset identifier or library name)

◆ setIndex()

void setIndex ( const String & index)

Sets the spectrum index.

◆ setIndexType()

void setIndexType ( IndexType  index_type)

Sets the index type.

◆ setInterpretation()

void setInterpretation ( const String & interpretation)

Sets the interpretation (peptide sequence/charge)

◆ setMSRun()

void setMSRun ( const String & ms_run)

Sets the MS run file name.

◆ toString()

String toString ( ) const

Converts the USI to its string representation.

Returns
Complete USI string

◆ tryParse()

static std::optional< USI > tryParse ( const String & usi_string)
static

Try to parse a USI string (non-throwing).

This is the preferred method when you want to validate and use a USI in a single operation, avoiding double-parsing.

Parameters
usi_string  String to parse
Returns
Parsed USI if valid, std::nullopt otherwise
// Prefer this pattern:
if (auto usi = USI::tryParse(str)) {
  use(*usi);
}
// Over this (which parses twice):
if (USI::isValidUSI(str)) {
  USI usi(str);
  use(usi);
}

Member Data Documentation

◆ collection_

String collection_
protected

Dataset identifier or library name.

◆ CV_ACCESSION

const String CV_ACCESSION
static protected

CV accession for USI.

◆ CV_NAME

const String CV_NAME
static protected

CV name for USI.

◆ index_

String index_
protected

Spectrum index value.

◆ index_type_

IndexType index_type_
protected

Type of spectrum index.

◆ interpretation_

String interpretation_
protected

Optional peptide interpretation.

◆ ms_run_

String ms_run_
protected

MS run file name.

◆ USI_PREFIX

const String USI_PREFIX
static protected

Prefix for all USIs.