MSP files management
IDSL.FSA was designed to manage MSP format mass spectrometry files with different structures. IDSL.FSA provides various tools to manage .msp files, a number of which are summarized below:
The msp2FSdb module can generate organized Fragmentation Spectra DataBase (FSDB) libraries for data parsing.
msp2FSdb(path = getwd(), MSPfile_vector = "", massIntegrationWindow = 0, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path: address of .msp file
MSPfile_vector: a vector of .msp file names
massIntegrationWindow: Mass accuracy in Da
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
noiseRemovalRatio: Noise removal level relative to the base peak, applied when measuring the entropy similarity score (0-1)
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
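To make the noiseRemovalRatio parameter more concrete, the following base R sketch removes peaks below a fraction of the base peak and then computes a plain Shannon spectral entropy from the remaining normalized intensities. The helper names remove_noise and spectral_entropy are hypothetical and this is not the internal IDSL.FSA code; the weighting controlled by allowedWeightedSpectralEntropy is not shown.

```r
# Hypothetical helper: keep peaks whose intensity is at least
# `ratio` times the base peak (the most intense peak).
remove_noise <- function(mz, intensity, ratio = 0.01) {
  keep <- intensity >= ratio * max(intensity)
  list(mz = mz[keep], intensity = intensity[keep])
}

# Hypothetical helper: Shannon entropy of the normalized intensities.
# Assumes all intensities are strictly positive.
spectral_entropy <- function(intensity) {
  p <- intensity / sum(intensity)
  -sum(p * log(p))
}

# Illustrative peak list (made-up values)
mz <- c(85.03, 127.04, 145.05, 163.06)
intensity <- c(5, 400, 1000, 600)

clean <- remove_noise(mz, intensity, ratio = 0.01)
clean$mz                           # the peak at 0.5% of the base peak is dropped
spectral_entropy(clean$intensity)  # entropy of the three remaining peaks
```

With ratio = 0.01, any peak below 1% of the base peak intensity is treated as noise before the entropy is calculated.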
The mgf2msp module can convert Mascot generic format files (.mgf) into NIST mass spectra format (.msp). The mgf2msp module is fast, requiring <2 sec for .mgf files with ~5,000 fragmentation blocks on a single thread.
mgf2msp(path = getwd(), MGFfileName = "")
path: Location of the original .mgf file
MGFfileName: Name of the mgf file with its extension
The converted files are stored in the same directory with .msp extensions.
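As a rough illustration of what such a conversion involves, the base R sketch below turns the lines of one MGF block into the lines of one MSP block. It assumes the common MGF keys TITLE and PEPMASS and the common MSP fields Name, PrecursorMZ and Num Peaks; the helper mgf_block_to_msp is hypothetical and is not how mgf2msp itself is implemented.

```r
# Hypothetical helper: convert the lines of one MGF block
# (BEGIN IONS ... END IONS) into the lines of one MSP block.
mgf_block_to_msp <- function(block) {
  # Header lines look like KEY=value; peak lines start with a digit.
  header <- grep("^[A-Za-z_]+=", block, value = TRUE)
  peaks <- grep("^[0-9]", block, value = TRUE)
  key <- sub("=.*$", "", header)
  value <- sub("^[^=]*=", "", header)
  names(value) <- toupper(key)

  # PEPMASS may carry an intensity after the m/z; keep the m/z only.
  precursor <- strsplit(value[["PEPMASS"]], "[ \t]+")[[1]][1]

  c(paste0("Name: ", value[["TITLE"]]),
    paste0("PrecursorMZ: ", precursor),
    paste0("Num Peaks: ", length(peaks)),
    peaks,
    "")
}

# Illustrative MGF block (made-up values)
mgf <- c("BEGIN IONS",
         "TITLE=example_spectrum",
         "PEPMASS=181.0707 12000",
         "CHARGE=1+",
         "85.0284 50",
         "127.0390 400",
         "163.0601 1000",
         "END IONS")

msp <- mgf_block_to_msp(mgf)
writeLines(msp)
```

A real converter would also need to carry over the remaining header fields and handle blocks with missing keys.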
In many instances, public .msp libraries include both positive and negative fragmentation data in one .msp file. Therefore, IDSL.FSA provides a module, mspSplitterPosNeg, to separate positive and negative MSP blocks for rapid and efficient annotation. This module is easy to use:
mspSplitterPosNeg(path = getwd(), MSPfile = "", number_processing_threads = 1)
path: Location of the original .msp file
MSPfile: Name of the .msp file with its extension
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
The separated MSP blocks are stored in the same directory with "_Pos" and "_Neg" suffixes.
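The idea of splitting by polarity can be sketched in base R as follows. This sketch assumes that MSP blocks are separated by blank lines and that the polarity is stored in an "Ion_mode:" field starting with P or N; both are assumptions about the input file, and the helper split_msp_by_polarity is hypothetical rather than part of IDSL.FSA.

```r
# Hypothetical helper: split MSP lines into blocks and group them by polarity.
split_msp_by_polarity <- function(lines) {
  # A new block starts after each blank line.
  block_id <- cumsum(lines == "") + 1
  not_blank <- lines != ""
  blocks <- split(lines[not_blank], block_id[not_blank])

  # Assumption: the polarity is stored in an "Ion_mode:" field.
  polarity <- vapply(blocks, function(b) {
    field <- grep("^Ion_mode:", b, ignore.case = TRUE, value = TRUE)
    if (length(field) == 0) return(NA_character_)
    toupper(substr(trimws(sub("^[^:]*:", "", field[1])), 1, 1))
  }, character(1))

  list(Pos = unname(blocks[which(polarity == "P")]),
       Neg = unname(blocks[which(polarity == "N")]))
}

# Illustrative MSP content (made-up values)
msp <- c("Name: compound A", "Ion_mode: Positive", "Num Peaks: 1", "100.1 999", "",
         "Name: compound B", "Ion_mode: Negative", "Num Peaks: 1", "98.9 999", "",
         "Name: compound C", "Ion_mode: P", "Num Peaks: 1", "150.2 999")

parts <- split_msp_by_polarity(msp)
length(parts$Pos)  # two positive blocks
length(parts$Neg)  # one negative block
```

Blocks without a recognizable polarity field end up in neither group in this sketch.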
The FSdb2precursorType module can detect potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. This function searches only the first 14 InChIKey letters and may therefore return multiple potential precursor types.
FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)
InChIKeyVector: A vector of InChIKey values. Each value may be a whole InChIKey string or only the first 14 InChIKey letters.
libFSdb: An FSDB reference library, converted from an MSP library using the msp2FSdb module of the IDSL.FSA package.
tableIndicator: c("Frequency", "PrecursorMZ"). To show frequency or a median of PrecursorMZ values in the output dataframe for each precursor type.
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
Output: a matrix of frequencies (or median PrecursorMZ values, depending on tableIndicator) for each InChIKey in the FSDB. The matrix column headers represent precursor types.
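Matching on the first 14 InChIKey letters and tabulating precursor types can be sketched in base R as shown below. The data frame lib is a hypothetical stand-in for the relevant FSDB fields, and the InChIKey strings are placeholders rather than real compounds; this only illustrates the matching idea, not the internals of FSdb2precursorType.

```r
# Hypothetical stand-in for the InChIKey and precursor type fields of an FSDB.
lib <- data.frame(
  InChIKey = c("ABCDEFGHIJKLMN-OPQRSTUVWX-N",
               "ABCDEFGHIJKLMN-XWVUTSRQPO-N",
               "ABCDEFGHIJKLMN-OPQRSTUVWX-N",
               "NMLKJIHGFEDCBA-OPQRSTUVWX-N"),
  PrecursorType = c("[M+H]+", "[M+Na]+", "[M+H]+", "[M-H]-"),
  stringsAsFactors = FALSE
)

# Queries may be whole InChIKeys or only the first 14 letters.
query <- c("ABCDEFGHIJKLMN-OPQRSTUVWX-N", "NMLKJIHGFEDCBA")

# Compare only the first 14 characters of each key.
key14 <- substr(lib$InChIKey, 1, 14)
query14 <- substr(query, 1, 14)
hit <- key14 %in% query14

# Rows are 14-letter keys, columns are precursor types.
freq <- table(key14[hit], lib$PrecursorType[hit])
freq
```

The first query also picks up the entry whose key differs after the 14th letter, which is why a single InChIKey can map to several precursor types.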
The FSA_msp2Cytoscape module performs pairwise MSP block analysis to create Cytoscape network files. This function is especially useful for finding related peaks in an analysis.
FSA_msp2Cytoscape(path = getwd(), MSPfile = "", mspVariableVector = NULL, mspNodeID = NULL,
massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path: address of .msp file
MSPfile: name of .msp file
mspVariableVector: a vector of MSP variables
mspNodeID: the MSP node ID, which is required for the 'specsim' ID generation
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time matching. This option is helpful for finding co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
noiseRemovalRatio: Noise removal level relative to the base peak, applied when measuring the entropy similarity score (0-1)
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
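The way minEntropySimilarity and RTtolerance could act together on pairwise comparisons can be sketched in base R as follows. The similarity matrix, retention times and the helper build_edges are hypothetical; the sketch starts from precomputed scores and does not reproduce how FSA_msp2Cytoscape calculates entropy similarity or writes its output files.

```r
# Hypothetical inputs: a symmetric similarity matrix and retention times (min).
node_id <- c("n1", "n2", "n3")
rt <- c(5.02, 5.05, 9.80)
sim <- matrix(c(1.00, 0.90, 0.80,
                0.90, 1.00, 0.40,
                0.80, 0.40, 1.00),
              nrow = 3, byrow = TRUE, dimnames = list(node_id, node_id))

# Hypothetical helper: keep pairs that pass the similarity threshold and,
# unless RTtolerance is NA, also agree in retention time.
build_edges <- function(sim, rt, minEntropySimilarity = 0.75, RTtolerance = NA) {
  pairs <- t(combn(nrow(sim), 2))   # every unordered pair once
  score <- sim[pairs]               # matrix indexing by (row, column) pairs
  keep <- score >= minEntropySimilarity
  if (!is.na(RTtolerance)) {
    keep <- keep & abs(rt[pairs[, 1]] - rt[pairs[, 2]]) <= RTtolerance
  }
  data.frame(source = rownames(sim)[pairs[keep, 1]],
             target = rownames(sim)[pairs[keep, 2]],
             similarity = score[keep],
             stringsAsFactors = FALSE)
}

edges_any_rt <- build_edges(sim, rt)                     # retention time ignored
edges_same_rt <- build_edges(sim, rt, RTtolerance = 0.1) # retention time must agree
edges_any_rt
edges_same_rt
```

An edge table like this can be saved with write.csv and imported into Cytoscape as a network table.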
The FSA_uniqueMSPblockTagger module performs pairwise MSP block analysis to remove similar MSP blocks in an .msp file.
FSA_uniqueMSPblockTagger(path = getwd(), MSPfile = "", aggregateBy = "Name", massError = 0.01,
RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)
path: address of .msp file
MSPfile: name of .msp file
aggregateBy: the MSP variable by which to aggregate the MSP blocks
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time matching. This option is helpful for finding co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
noiseRemovalRatio: Noise removal level relative to the base peak, applied when measuring the entropy similarity score (0-1)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.
number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
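One possible way to tag unique blocks from pairwise scores is sketched below in base R. It assumes that aggregateBy restricts comparisons to blocks sharing the same value of that variable, which is an interpretation of the parameter rather than documented behavior; the helper tag_unique and the similarity values are hypothetical.

```r
# Hypothetical helper: a block is tagged as unique unless an earlier unique
# block from the same group is at least `minEntropySimilarity` similar to it.
tag_unique <- function(sim, group, minEntropySimilarity = 0.75) {
  n <- nrow(sim)
  unique_tag <- rep(TRUE, n)
  for (i in seq_len(n)) {
    earlier <- which(unique_tag[seq_len(i - 1)])
    same_group <- earlier[group[earlier] == group[i]]
    if (any(sim[i, same_group] >= minEntropySimilarity)) {
      unique_tag[i] <- FALSE
    }
  }
  unique_tag
}

# Illustrative inputs (made-up values): four blocks, grouped by name.
group <- c("A", "A", "A", "B")
sim <- matrix(c(1.00, 0.95, 0.30, 0.99,
                0.95, 1.00, 0.20, 0.90,
                0.30, 0.20, 1.00, 0.10,
                0.99, 0.90, 0.10, 1.00),
              nrow = 4, byrow = TRUE)

tags <- tag_unique(sim, group)
tags  # block 2 duplicates block 1; block 4 is in another group and stays unique
```

The greedy order matters in this sketch: the first block in each group of similar spectra is the one that is kept.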