
MSP files management

Sadjad F Baygi edited this page Nov 8, 2022 · 16 revisions

IDSL.FSA was designed to manage MSP-format mass spectrometry files with different structures. IDSL.FSA provides various tools to manage .msp files, a number of which are summarized below:

msp2FSdb

The msp2FSdb module generates organized Fragmentation Spectra DataBase (FSDB) libraries for data parsing.

msp2FSdb(path = getwd(), MSPfile_vector = "", massIntegrationWindow = 0, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)

path: address of .msp file

MSPfile_vector: a vector of .msp file names

massIntegrationWindow: Mass accuracy in Da

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio: Noise removal threshold relative to the base peak, applied when measuring the entropy similarity score (0-1)

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
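A call might look like the following minimal sketch. The two file names are hypothetical placeholders, and every other argument is simply the default from the signature above. The call is guarded so that it runs only when IDSL.FSA is installed and the input files exist in the chosen directory.

```r
# Hypothetical .msp file names; replace them with files that exist in `path`.
fsdb_args <- list(
  path = getwd(),
  MSPfile_vector = c("library_pos.msp", "library_neg.msp"),
  massIntegrationWindow = 0,
  allowedNominalMass = FALSE,
  allowedWeightedSpectralEntropy = TRUE,
  noiseRemovalRatio = 0.01,
  number_processing_threads = 1
)

# Full paths of the input files, used only for the existence check below.
msp_paths <- file.path(fsdb_args$path, fsdb_args$MSPfile_vector)

# Run only when the package is installed and every input file is present.
if (requireNamespace("IDSL.FSA", quietly = TRUE) && all(file.exists(msp_paths))) {
  do.call(IDSL.FSA::msp2FSdb, fsdb_args)
}
```

Keeping the arguments in a named list makes it easy to reuse the same settings for several libraries.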

mgf2msp

The mgf2msp module converts Mascot generic format files (.mgf) into NIST mass spectra format (.msp). The module is fast: it requires <2 sec on a single thread for .mgf files with ~5,000 fragmentation blocks.

mgf2msp(path = getwd(), MGFfileName = "")

path: Location of the original .mgf file

MGFfileName: Name of the mgf file with its extension

The converted files are stored in the same directory with .msp extensions.

mspSplitterPosNeg

In many instances, public .msp libraries include both positive and negative fragmentation data in one .msp file. IDSL.FSA therefore provides a module, mspSplitterPosNeg, to separate positive and negative MSP blocks for rapid and efficient annotation. The module is easy to use:

mspSplitterPosNeg(path = getwd(), MSPfile = "", number_processing_threads = 1)

path: Location of the original .msp file

MSPfile: Name of the .msp file with its extension

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments

The separated MSP blocks are stored in the same directory with "_Pos" and "_Neg" suffixes.
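The idea of separating blocks by polarity can be illustrated with a small base R sketch. It is not the mspSplitterPosNeg implementation: the toy data, the assumption that blocks are separated by blank lines, the "Ion_mode" field name, and the helper block_polarity are all chosen for illustration only, and real libraries may record polarity differently.

```r
# Toy MSP content: blocks separated by blank lines.
# The "Ion_mode" field name and its values are assumptions for this sketch.
msp_lines <- c(
  "Name: compound A", "Ion_mode: Positive", "Num Peaks: 1", "100.0 999",
  "",
  "Name: compound B", "Ion_mode: Negative", "Num Peaks: 1", "200.0 999",
  "",
  "Name: compound C", "Ion_mode: Positive", "Num Peaks: 1", "300.0 999"
)

# Give every line a block index; each blank line starts a new block.
is_blank <- msp_lines == ""
block_id <- cumsum(is_blank) + 1
blocks <- split(msp_lines[!is_blank], block_id[!is_blank])

# Read the polarity of one block from its (assumed) Ion_mode field.
block_polarity <- function(block) {
  mode_line <- grep("^Ion_mode:", block, value = TRUE, ignore.case = TRUE)
  if (length(mode_line) == 0) return(NA_character_)
  if (grepl("pos", mode_line[1], ignore.case = TRUE)) "Pos" else "Neg"
}

polarity <- vapply(blocks, block_polarity, character(1))

# Blocks without a recognizable polarity (NA) end up in neither group.
pos_blocks <- blocks[which(polarity == "Pos")]
neg_blocks <- blocks[which(polarity == "Neg")]
```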

FSdb2precursorType

The FSdb2precursorType module detects potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. The function matches only the first 14 InChIKey letters and may therefore return multiple potential precursor types.

FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)

InChIKeyVector: A vector of InChIKey values. The vector may contain whole InChIKey strings or only the first 14 InChIKey letters.

libFSdb: An FSDB produced by the IDSL.FSA package, that is, an MSP library reference file converted with the msp2FSdb module.

tableIndicator: c("Frequency", "PrecursorMZ"). Whether to show the frequency or the median of PrecursorMZ values in the output dataframe for each precursor type.

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments

The output is a matrix of frequencies for each InChIKey in the FSDB. The matrix column headers represent precursor types.
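The effect of matching on only the first 14 letters can be sketched in base R. The records below are hypothetical placeholder strings with the InChIKey layout (14, 10, and 1 characters), not real InChIKeys, and the table is built with base R rather than with FSdb2precursorType itself.

```r
# Hypothetical FSDB-like records: full-length keys with their precursor types.
key_a1 <- paste0(strrep("A", 14), "-", strrep("B", 10), "-N")
key_a2 <- paste0(strrep("A", 14), "-", strrep("C", 10), "-N")
key_d  <- paste0(strrep("D", 14), "-", strrep("E", 10), "-N")

records <- data.frame(
  InChIKey = c(key_a1, key_a2, key_a1, key_d),
  PrecursorType = c("[M+H]+", "[M+Na]+", "[M+H]+", "[M-H]-"),
  stringsAsFactors = FALSE
)

# Keep only the first 14 characters (the connectivity block of the key).
records$InChIKey14 <- substr(records$InChIKey, 1, 14)

# Count how often each precursor type occurs for each 14-character key.
frequency <- table(records$InChIKey14, records$PrecursorType)
print(frequency)
```

Because key_a1 and key_a2 share their first 14 characters, their precursor types are pooled into one row, which is why a single query can come back with several candidate precursor types.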

FSA_msp2Cytoscape

The FSA_msp2Cytoscape module performs pairwise MSP block analysis to create Cytoscape network files. This function is especially useful for finding related peaks in an analysis.

FSA_msp2Cytoscape(path = getwd(), MSPfile = "", mspVariableVector = NULL, mspNodeID = NULL,
massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)

path: address of .msp file

MSPfile: name of .msp file

mspVariableVector: a vector of MSP variables

mspNodeID: MSP Node ID, which is the ID required for the 'specsim' ID generation

massError: Mass accuracy in Da

RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time matching. This option is helpful for finding co-occurring compounds.

minEntropySimilarity: Minimum entropy similarity score

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio: Noise removal threshold relative to the base peak, applied when measuring the entropy similarity score (0-1)

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
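To give a feel for how noiseRemovalRatio and minEntropySimilarity interact, here is a minimal base R sketch of the unweighted spectral entropy similarity as it is commonly defined. It is not the IDSL.FSA implementation: the function names are hypothetical, the weighting controlled by allowedWeightedSpectralEntropy is omitted, and the two spectra are assumed to be already aligned on the same m/z bins, so matching by massError is skipped.

```r
# Shannon entropy of a spectrum given as an intensity vector.
spectral_entropy <- function(intensity) {
  p <- intensity[intensity > 0] / sum(intensity)
  -sum(p * log(p))
}

# Unweighted entropy similarity of two spectra whose intensities are already
# aligned on the same m/z bins (0 means no peak in that bin).
entropy_similarity <- function(a, b, noiseRemovalRatio = 0.01) {
  # Drop peaks below the given fraction of each spectrum's base peak.
  a[a < noiseRemovalRatio * max(a)] <- 0
  b[b < noiseRemovalRatio * max(b)] <- 0
  # Normalize each spectrum to unit total intensity, then mix them 1:1.
  a <- a / sum(a)
  b <- b / sum(b)
  mixed <- (a + b) / 2
  s_mixed <- spectral_entropy(mixed)
  1 - (2 * s_mixed - spectral_entropy(a) - spectral_entropy(b)) / log(4)
}

spectrum_a <- c(100, 50, 0, 0.5)  # the 0.5 peak is below 1% of the base peak
spectrum_b <- c(0, 0, 80, 20)

sim_same <- entropy_similarity(spectrum_a, spectrum_a)  # identical spectra
sim_diff <- entropy_similarity(spectrum_a, spectrum_b)  # no shared peaks

# A pair would pass the filter only if its score reaches the threshold.
minEntropySimilarity <- 0.75
c(same = sim_same >= minEntropySimilarity, diff = sim_diff >= minEntropySimilarity)
```

Under this definition, identical spectra score 1 and spectra with no shared peaks score 0, so only the first pair clears the 0.75 threshold.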

FSA_uniqueMSPblockTagger

The FSA_uniqueMSPblockTagger module performs pairwise MSP block analysis to remove similar MSP blocks in an .msp file.

FSA_uniqueMSPblockTagger(path = getwd(), MSPfile = "", aggregateBy = "Name", massError = 0.01,
RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)

path: address of .msp file

MSPfile: name of .msp file

aggregateBy: the MSP variable by which the MSP blocks are aggregated

massError: Mass accuracy in Da

RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time matching. This option is helpful for finding co-occurring compounds.

minEntropySimilarity: Minimum entropy similarity score

noiseRemovalRatio: Noise removal threshold relative to the base peak, applied when measuring the entropy similarity score (0-1)

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
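The role of RTtolerance in a pairwise comparison can be sketched in base R as follows. The helper rt_match and the retention times are hypothetical and only illustrate the documented behavior: a numeric tolerance in minutes restricts matches, and NA disables the retention time check.

```r
# TRUE where two retention times (min) agree within the tolerance;
# an NA tolerance disables the retention time filter entirely.
rt_match <- function(rt1, rt2, RTtolerance = NA) {
  if (is.na(RTtolerance)) {
    return(rep(TRUE, length(rt1)))
  }
  abs(rt1 - rt2) <= RTtolerance
}

# Hypothetical retention times (min) of three MSP blocks.
rt <- c(5.02, 5.10, 6.50)

# Compare every pair of blocks, with and without the retention time filter.
pairs <- expand.grid(i = seq_along(rt), j = seq_along(rt))
pairs <- pairs[pairs$i < pairs$j, ]
pairs$match_tol <- rt_match(rt[pairs$i], rt[pairs$j], RTtolerance = 0.1)
pairs$match_na  <- rt_match(rt[pairs$i], rt[pairs$j], RTtolerance = NA)
print(pairs)
```

With a tolerance of 0.1 min only the first two blocks remain candidates for the similarity comparison, whereas NA keeps all three pairs.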
