MSP files management
IDSL.FSA was designed to manage MSP-format mass spectrometry files with different structures. IDSL.FSA provides various tools to manage MSP files, a number of which are summarized below:
The mgf2msp module converts Mascot generic format files (.mgf) into the NIST mass spectra format (.msp). It is fast, requiring <2 sec on a single thread for mgf files with ~5,000 fragmentation blocks.
mgf2msp(path = getwd(), MGFfileName = "")
path: Location of the original mgf file
MGFfileName: Name of the mgf file with its extension
The converted files are stored in the same directory with an .msp extension.
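To make the MGF-to-MSP mapping concrete, here is a minimal base-R sketch; it is not the mgf2msp implementation. The function name mgf_to_msp_lines is hypothetical, and the sketch assumes each MGF block sits between BEGIN IONS and END IONS, carries TITLE and PEPMASS fields, and lists peaks as whitespace-separated m/z and intensity pairs. Only Name, PrecursorMZ, and Num Peaks are written to each MSP block.

```r
# Hypothetical sketch of an MGF-to-MSP conversion (not the mgf2msp source).
# Assumes BEGIN IONS / END IONS blocks with TITLE and PEPMASS fields.
mgf_to_msp_lines <- function(mgf_lines) {
  field_or <- function(x, default) if (is.null(x)) default else x
  out <- character(0)
  header <- list()
  peaks <- character(0)
  in_block <- FALSE
  for (line in trimws(mgf_lines)) {
    if (line == "BEGIN IONS") {
      in_block <- TRUE
      header <- list()
      peaks <- character(0)
    } else if (line == "END IONS") {
      in_block <- FALSE
      # PEPMASS may hold "m/z intensity"; keep only the m/z value
      precursor <- strsplit(field_or(header[["PEPMASS"]], "NA"), "\\s+")[[1]][1]
      out <- c(out,
               paste0("Name: ", field_or(header[["TITLE"]], "Unknown")),
               paste0("PrecursorMZ: ", precursor),
               paste0("Num Peaks: ", length(peaks)),
               peaks,
               "")  # blank line separates MSP blocks
    } else if (in_block && grepl("=", line, fixed = TRUE)) {
      # header line such as TITLE=... or PEPMASS=...
      header[[sub("=.*$", "", line)]] <- sub("^[^=]*=", "", line)
    } else if (in_block && nzchar(line)) {
      # peak line: m/z and intensity separated by whitespace
      fields <- strsplit(line, "\\s+")[[1]]
      peaks <- c(peaks, paste(fields[1], fields[2]))
    }
  }
  out
}

mgf <- c("BEGIN IONS", "TITLE=Compound_1", "PEPMASS=195.0877 1000", "CHARGE=1+",
         "138.0662 100", "110.0713 45", "END IONS")
msp <- mgf_to_msp_lines(mgf)
writeLines(msp)
```

The result could then be saved next to the input with writeLines(msp, sub("\\.mgf$", ".msp", mgf_file)), where mgf_file is a hypothetical path variable, mirroring the .msp naming described above.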
In many instances, public MSP libraries include both positive and negative fragmentation data in one MSP file. Therefore, IDSL.FSA provides a module, mspSpiltterPosNeg, to separate positive and negative MSP blocks for rapid and efficient annotation. This module is easy to use:
mspSpiltterPosNeg(path = getwd(), mspFileName = "", number_processing_threads = 1)
path: Location of the original msp file
mspFileName: Name of the msp file with its extension
number_processing_threads: Number of parallel processing threads (supported on both Windows and Linux)
The isolated MSP blocks are stored in the same directory with "POS_" and "NEG_" prefixes.
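The polarity split can be sketched as follows. This is a hypothetical helper (split_msp_pos_neg), not the mspSpiltterPosNeg code; it assumes every block starts with a Name: line and records polarity in an Ion_mode (or IonMode) field whose value begins with P or N. Blocks without that field are simply skipped here, which the real module may handle differently.

```r
# Hypothetical sketch of splitting an MSP library by polarity (not the
# mspSpiltterPosNeg source). Assumes each block starts with "Name:" and
# has an "Ion_mode:"/"IonMode:" field whose value starts with P or N.
split_msp_pos_neg <- function(msp_lines) {
  starts <- grep("^Name:", msp_lines, ignore.case = TRUE)
  ends <- c(starts[-1] - 1, length(msp_lines))
  pos <- character(0)
  neg <- character(0)
  for (i in seq_along(starts)) {
    block <- msp_lines[starts[i]:ends[i]]
    mode_line <- grep("^Ion_?mode:", block, ignore.case = TRUE, value = TRUE)
    # first letter of the field value: "P"/"Positive" -> "P", "N"/"Negative" -> "N"
    mode <- toupper(substr(trimws(sub("^[^:]*:", "", mode_line[1])), 1, 1))
    if (identical(mode, "P")) {
      pos <- c(pos, block)
    } else if (identical(mode, "N")) {
      neg <- c(neg, block)
    }  # blocks without a recognizable ion mode are skipped
  }
  list(POS = pos, NEG = neg)
}

msp <- c("Name: A", "Ion_mode: P", "Num Peaks: 1", "100.0 999", "",
         "Name: B", "Ion_mode: N", "Num Peaks: 1", "120.0 999", "")
parts <- split_msp_pos_neg(msp)
# e.g. writeLines(parts$POS, "POS_library.msp"); writeLines(parts$NEG, "NEG_library.msp")
```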
The FSdb2precursorType module detects potential ionization pathways (precursor types) for compounds using a vector of InChIKey values from an FSDB. Because this function only matches the first 14 InChIKey characters (the connectivity block), it may return multiple potential precursor types.
FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)
InChIKeyVector: A vector of InChIKey values. This vector may contain full InChIKey strings or only their first 14 characters.
libFSdb: An MSP library reference file converted into an FSDB using the msp2FSdb module of the IDSL.FSA package.
tableIndicator: c("Frequency", "PrecursorMZ"). Select "Frequency" to report the frequency, or "PrecursorMZ" to report the median PrecursorMZ value, of each precursor type in the output table.
number_processing_threads: Number of parallel processing threads (supported on both Windows and Linux)
The output is a matrix of frequencies (or median PrecursorMZ values, depending on tableIndicator) for each InChIKey in the FSDB. The matrix column headers represent precursor types.
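The first-14-character matching behind FSdb2precursorType can be illustrated with a toy table. The sketch below assumes a plain data frame with InChIKey, PrecursorType, and PrecursorMZ columns standing in for the FSDB (the real libFSdb object is structured differently), and the function name precursor_type_table is hypothetical. The two toy keys sharing the AAAAAAAAAAAAAA block show how entries differing only after the first 14 characters are pooled into one row, which is why several precursor types can be reported.

```r
# Hypothetical sketch: tabulate precursor types per InChIKey using only the
# first 14 characters (connectivity block). `lib` is a toy data frame standing
# in for an FSDB; the real libFSdb object is structured differently.
precursor_type_table <- function(inchikeys, lib,
                                 table_indicator = c("Frequency", "PrecursorMZ")) {
  table_indicator <- match.arg(table_indicator)
  query14 <- substr(inchikeys, 1, 14)
  lib14 <- substr(lib$InChIKey, 1, 14)
  types <- sort(unique(lib$PrecursorType))
  out <- matrix(NA_real_, nrow = length(inchikeys), ncol = length(types),
                dimnames = list(inchikeys, types))
  for (i in seq_along(query14)) {
    hits <- lib[which(lib14 == query14[i]), , drop = FALSE]
    for (ptype in types) {
      sel <- hits$PrecursorType == ptype
      if (table_indicator == "Frequency") {
        out[i, ptype] <- sum(sel)                       # count of library entries
      } else if (any(sel)) {
        out[i, ptype] <- median(hits$PrecursorMZ[sel])  # median precursor m/z
      }
    }
  }
  out
}

lib <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "AAAAAAAAAAAAAA-XXXXXXXXXX-N",
               "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),
  PrecursorType = c("[M+H]+", "[M+H]+", "[M+Na]+", "[M-H]-"),
  PrecursorMZ = c(195.0877, 195.0880, 217.0696, 179.0350),
  stringsAsFactors = FALSE
)
query <- c("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB")
freq <- precursor_type_table(query, lib)
mz <- precursor_type_table(query, lib, "PrecursorMZ")
```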
The FSA_msp2Cytoscape module performs pairwise analysis of MSP blocks to create Cytoscape network files. This function is especially useful for finding related peaks in an analysis.
FSA_msp2Cytoscape(path, MSPfile, mspVariableVector = NULL, mspNodeID = NULL,
massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)
path: Location of the msp file
MSPfile: Name of the msp file
mspVariableVector: A vector of msp variables
mspNodeID: MSP node ID, which is required for generating the 'specsim' IDs
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match msp blocks. Set to NA to ignore retention time matching. This option is very helpful for finding co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
noiseRemovalRatio: Noise removal threshold relative to the base peak, applied before measuring the entropy similarity score (in percent)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Select TRUE to use weighted spectral entropy when measuring the entropy similarity score.
number_processing_threads: Number of parallel processing threads (supported on both Windows and Linux)
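For intuition about minEntropySimilarity, noiseRemovalRatio, and allowedWeightedSpectralEntropy, the following sketch implements the spectral entropy similarity described by Li et al. (2021, Nature Methods) and uses it to build a Cytoscape-style edge list. It is an assumption that FSA_msp2Cytoscape uses an equivalent formulation. All function names are hypothetical, peak matching is a simple greedy nearest-neighbour pairing within mass_error, and noise_ratio is treated as a fraction of the base peak, which may not match the package's unit.

```r
# Hypothetical sketch of spectral entropy similarity (after Li et al., 2021)
# and a pairwise edge list; not the FSA_msp2Cytoscape source.
spectral_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Drop noise peaks, normalize to sum 1, optionally apply entropy weighting.
# noise_ratio is treated here as a fraction of the base peak (an assumption).
prepare_spectrum <- function(mz, int, noise_ratio = 0.01, weighted = TRUE) {
  keep <- int >= noise_ratio * max(int)
  mz <- mz[keep]
  int <- int[keep] / sum(int[keep])
  if (weighted) {
    s <- spectral_entropy(int)
    if (s < 3) {
      int <- int^(0.25 + 0.25 * s)  # low-entropy spectra get flattened
      int <- int / sum(int)
    }
  }
  list(mz = mz, int = int)
}

entropy_similarity <- function(a, b, mass_error = 0.01) {
  used_b <- rep(FALSE, length(b$mz))
  mixed <- numeric(0)
  for (i in seq_along(a$mz)) {
    # greedy pairing with the closest unused peak in b within mass_error
    d <- abs(b$mz - a$mz[i])
    d[used_b] <- Inf
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= mass_error) {
      used_b[j] <- TRUE
      mixed <- c(mixed, (a$int[i] + b$int[j]) / 2)
    } else {
      mixed <- c(mixed, a$int[i] / 2)
    }
  }
  mixed <- c(mixed, b$int[!used_b] / 2)  # unmatched peaks of b
  1 - (2 * spectral_entropy(mixed) - spectral_entropy(a$int) -
         spectral_entropy(b$int)) / log(4)
}

build_similarity_edges <- function(spectra, rt = NULL, rt_tol = NA,
                                   mass_error = 0.01, min_sim = 0.75) {
  edges <- data.frame(source = integer(0), target = integer(0),
                      similarity = numeric(0))
  n <- length(spectra)
  if (n < 2) return(edges)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (!is.na(rt_tol) && abs(rt[i] - rt[j]) > rt_tol) next  # RT filter
      s <- entropy_similarity(spectra[[i]], spectra[[j]], mass_error)
      if (s >= min_sim) {
        edges <- rbind(edges, data.frame(source = i, target = j, similarity = s))
      }
    }
  }
  edges
}

s1 <- prepare_spectrum(c(100.05, 150.10, 200.20), c(100, 50, 10))
s2 <- prepare_spectrum(c(100.051, 150.102, 200.199), c(90, 60, 8))
s3 <- prepare_spectrum(c(300.00, 310.00), c(100, 80))
edges <- build_similarity_edges(list(s1, s2, s3), rt = c(5.1, 5.2, 9.0),
                                rt_tol = 0.5)
```

Identical spectra score 1 and spectra with no shared peaks score 0. The edges data frame could be written with write.table() and imported into Cytoscape as a network from a table file, using source and target as the node columns.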
The FSA_uniqueMSPblockTagger module performs pairwise analysis of MSP blocks, aggregated by the aggregateBy variable, to tag unique MSP blocks.
FSA_uniqueMSPblockTagger(path, MSPfile, aggregateBy = "Name", massError = 0.01, RTtolerance = NA,
minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)
path: Location of the msp file
MSPfile: Name of the msp file
aggregateBy: The msp variable used to aggregate MSP blocks (e.g., "Name")
massError: Mass accuracy in Da
RTtolerance: Retention time tolerance (min) to match msp blocks. Set to NA to ignore retention time matching. This option is very helpful for finding co-occurring compounds.
minEntropySimilarity: Minimum entropy similarity score
noiseRemovalRatio: Noise removal threshold relative to the base peak, applied before measuring the entropy similarity score (in percent)
allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.
allowedWeightedSpectralEntropy: c(TRUE, FALSE). Select TRUE to use weighted spectral entropy when measuring the entropy similarity score.
number_processing_threads: Number of parallel processing threads (supported on both Windows and Linux)
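Assuming, from its name and the aggregateBy parameter, that FSA_uniqueMSPblockTagger marks redundant blocks within each aggregation group, the tagging step might look like the sketch below. The function tag_unique_blocks is hypothetical and takes a precomputed similarity matrix (for example, entropy similarity between blocks) so that it stays independent of the similarity metric. A block is tagged non-unique when it matches an earlier unique block of the same group within the retention time tolerance.

```r
# Hypothetical sketch of tagging unique MSP blocks within aggregation groups;
# not the FSA_uniqueMSPblockTagger source. `sim` is a precomputed symmetric
# similarity matrix (e.g. entropy similarity between blocks).
tag_unique_blocks <- function(groups, rt, sim, rt_tol = NA, min_sim = 0.75) {
  n <- length(groups)
  representative <- seq_len(n)  # each block starts as its own representative
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      if (representative[j] != j) next   # compare only with unique blocks
      if (groups[j] != groups[i]) next   # same aggregateBy value only
      if (!is.na(rt_tol) && abs(rt[i] - rt[j]) > rt_tol) next
      if (sim[i, j] >= min_sim) {
        representative[i] <- j           # block i duplicates block j
        break
      }
    }
  }
  data.frame(group = groups, representative = representative,
             unique = representative == seq_len(n))
}

groups <- c("A", "A", "A", "B")
rt <- c(5.0, 5.1, 8.0, 5.0)
sim <- matrix(c(1.0, 0.9, 0.9, 0.9,
                0.9, 1.0, 0.2, 0.9,
                0.9, 0.2, 1.0, 0.9,
                0.9, 0.9, 0.9, 1.0), nrow = 4, byrow = TRUE)
tags <- tag_unique_blocks(groups, rt, sim, rt_tol = 0.5)
```

Here block 2 is folded into block 1 (same group, similar spectrum, close retention time), block 3 stays unique because its retention time is outside the tolerance, and block 4 stays unique because it belongs to a different group.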