| Title: | Fragmentation Spectra Analysis (FSA) |
|---|---|
| Description: | The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>. |
| Authors: | Sadjad Fakouri-Baygi [aut] |
| Maintainer: | Dinesh Barupal <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2 |
| Built: | 2026-05-12 06:12:33 UTC |
| Source: | https://github.com/idslme/idsl.fsa |
This module annotates fragmentation spectra from .MSP files.
fragmentation_spectra_annotator(path, MSPfile = "", libFSdb, libFSdbIDlist, targetedPrecursorType = NA, ratio2basePeak4nSpectraMarkers = 0, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, roundingDigitPrefiltering = 1, minMatchedNumPeaks = 1, massError = 0, maxNEME = 0, minIonRangeDifference = 0, minCosineSimilarity, minEntropySimilarity, minRatioMatchedNspectraMarkers, spectralEntropyDeviationPrefiltering, massErrorPrecursor = NA, RTtolerance = NA, exportSpectraParameters = NULL, number_processing_threads = 1)
path |
Address of .msp file(s) |
MSPfile |
name of the .msp file |
libFSdb |
A converted .msp library reference file using the 'msp2FSdb' module which is an FSDB produced by the IDSL.FSA package. |
libFSdbIDlist |
Ion markers object from the FSDB reference |
targetedPrecursorType |
A vector of targeted precursor types |
ratio2basePeak4nSpectraMarkers |
Ratio of peaks in fragmentation spectra to the basepeak to calculate minimum qualified number of matched abundant peaks |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation. |
noiseRemovalRatio |
noise removal ratio ([0 - 1]) relative to the basepeak to measure entropy similarity score. |
roundingDigitPrefiltering |
Level of pre-filtering |
minMatchedNumPeaks |
Minimum matched number of peaks |
massError |
Mass accuracy in Da |
maxNEME |
Maximum value for Normalized Euclidean Mass Error (NEME) in mDa |
minIonRangeDifference |
Minimum distance (Da) between lowest and highest matched m/z to prevent matching only isotopic envelopes |
minCosineSimilarity |
Minimum cosine similarity score |
minEntropySimilarity |
Minimum entropy similarity score |
minRatioMatchedNspectraMarkers |
Minimum percentage of abundant library peaks that must be detected |
spectralEntropyDeviationPrefiltering |
Spectral entropy deviation for pre-filtering |
massErrorPrecursor |
Mass accuracy (Da) to find precursor m/z in .msp files |
RTtolerance |
Retention time tolerance (min) |
exportSpectraParameters |
Parameters for export MS/MS match figures |
number_processing_threads |
Number of processing threads for multi-threaded processing |
A dataframe of matched spectra
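The cosine (dot product) score thresholded by 'minCosineSimilarity' can be illustrated with a minimal base R sketch. The helper `cosine_similarity_sketch` is hypothetical and is not part of IDSL.FSA; it assumes peaks are aligned on rounded (nominal) m/z values, which may differ from the package's own matching by 'massError'.

```r
# Hypothetical helper (not an IDSL.FSA function): cosine similarity of two
# spectra after aligning peaks on rounded (nominal) m/z values.
cosine_similarity_sketch <- function(peakA, peakB) {
  mzA <- round(peakA[, 1])
  mzB <- round(peakB[, 1])
  mz <- sort(unique(c(mzA, mzB)))
  # Aligned intensity vectors; sum() over an empty selection gives 0
  vA <- vapply(mz, function(m) sum(peakA[mzA == m, 2]), numeric(1))
  vB <- vapply(mz, function(m) sum(peakB[mzB == m, 2]), numeric(1))
  sum(vA * vB) / sqrt(sum(vA^2) * sum(vB^2))
}

specA <- cbind(c(100, 150, 200), c(10, 50, 100))
specB <- cbind(c(300, 350), c(20, 80))
score_same <- cosine_similarity_sketch(specA, specA)     # identical spectra
score_disjoint <- cosine_similarity_sketch(specA, specB) # no shared peaks
```

Identical spectra score 1 and spectra without shared peaks score 0, so a 'minCosineSimilarity' value between these bounds controls how strict the match must be.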
This module is to optimize the 'indexVec' variable by removing elements that have redundant 'idVec' numbers.
FSA_aggregate(idVec, variableVec, indexVec, targetVar)
idVec |
a vector of id numbers. Repeated id numbers are allowed |
variableVec |
a vector of variable of the interest such as RT, m/z, etc. |
indexVec |
a vector of indices |
targetVar |
the targeted value in 'variableVec' |
a clean indexVec after removing redundant 'idVec'.
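One plausible reading of this aggregation can be sketched in base R. The helper `aggregate_sketch` is hypothetical; it assumes that 'idVec', 'variableVec', and 'indexVec' are aligned element by element and that, for each repeated id, the element whose variable is closest to 'targetVar' is kept. The actual selection rule in IDSL.FSA may differ.

```r
# Hypothetical helper (not an IDSL.FSA function): keep one index per id,
# choosing the element whose variable value is closest to targetVar.
aggregate_sketch <- function(idVec, variableVec, indexVec, targetVar) {
  positionsById <- split(seq_along(idVec), idVec)
  keep <- vapply(positionsById, function(pos) {
    pos[which.min(abs(variableVec[pos] - targetVar))]
  }, integer(1))
  indexVec[sort(unname(keep))]
}

idVec <- c(1, 1, 2, 3, 3)
variableVec <- c(5.0, 5.3, 6.0, 4.0, 5.1) # e.g. retention times
indexVec <- c(10, 11, 12, 13, 14)
cleanIndex <- aggregate_sketch(idVec, variableVec, indexVec, targetVar = 5.2)
```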
This function places annotations on the spectra plots with a reasonable distance between them to avoid overlapping annotations.
FSA_annotation_text_repel(FSAspectra, nGridX, nGridY)
FSAspectra |
FSAspectra |
nGridX |
number of grids on the x-axis |
nGridY |
number of grids on the y-axis |
labels
A module to create directories after removing any existing directory with the same name to prevent data interference.
FSA_dir.create(folder, allowedUnlink = FALSE)
folder |
folder |
allowedUnlink |
allowedUnlink |
when the original folder was deleted and recreated successfully, 'TRUE' is returned by this function.
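The delete-then-recreate behavior described above can be approximated with base R file utilities. The helper `dir_create_sketch` is hypothetical and only mirrors the documented arguments; it is an assumption that the package function behaves the same way.

```r
# Hypothetical helper (not an IDSL.FSA function): optionally remove an
# existing folder, then create it and report whether it exists.
dir_create_sketch <- function(folder, allowedUnlink = FALSE) {
  if (allowedUnlink && dir.exists(folder)) {
    unlink(folder, recursive = TRUE) # drop old contents
  }
  if (!dir.exists(folder)) {
    dir.create(folder, recursive = TRUE)
  }
  dir.exists(folder)
}

demoFolder <- file.path(tempdir(), "fsa_dir_sketch")
dir_create_sketch(demoFolder)
file.create(file.path(demoFolder, "old_result.txt"))
recreated <- dir_create_sketch(demoFolder, allowedUnlink = TRUE)
remainingFiles <- list.files(demoFolder)
```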
This function processes the spreadsheet of the 'FSDB' tab to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.
FSA_FSdb_xlsxAnalyzer(spreadsheet)
spreadsheet |
FSA spreadsheet |
This function returns the FSDB parameters to feed the 'FSdb_file_generator' function.
This function loads .Rdata files into a variable.
FSA_loadRdata(fileName)
fileName |
is an '.Rdata' file. |
The object stored in the '.Rdata' file, returned so that it can be assigned to a new variable name.
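Loading an '.Rdata' file into a variable of your choice can be sketched with base R as follows. The helper `load_rdata_sketch` is hypothetical and assumes that the file stores a single object; it is not the package implementation.

```r
# Hypothetical helper (not an IDSL.FSA function): load an .Rdata file into
# a private environment and return the first stored object.
load_rdata_sketch <- function(fileName) {
  env <- new.env()
  objectNames <- load(fileName, envir = env) # names of restored objects
  get(objectNames[1], envir = env)
}

tmpFile <- tempfile(fileext = ".Rdata")
originalObject <- list(a = 1, b = "two")
save(originalObject, file = tmpFile)
renamedObject <- load_rdata_sketch(tmpFile)
```

Because the object is loaded into a separate environment, the caller's workspace is not overwritten by whatever name was used when the file was saved.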
Locate indices of the pattern in the string
FSA_locate_regex(string, pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
string |
a string as character |
pattern |
a pattern to screen |
ignore.case |
ignore.case |
perl |
perl |
fixed |
fixed |
useBytes |
useBytes |
This function returns 'NULL' when no matches are detected for the pattern.
A 2-column matrix of location indices. The first and second columns represent start and end positions, respectively.
    pattern <- "Cl"
    string <- "NaCl.5HCl"
    Location_Cl <- FSA_locate_regex(string, pattern)
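The shape of the returned matrix can be reproduced with base R's `gregexpr`. The helper `locate_sketch` below is hypothetical and is only meant to show the start/end layout and the 'NULL' result for no match; it is an assumption that the package function is built the same way.

```r
# Hypothetical helper (not an IDSL.FSA function): start/end positions of
# every match of 'pattern' in 'string', or NULL when nothing matches.
locate_sketch <- function(string, pattern, ...) {
  hit <- gregexpr(pattern, string, ...)[[1]]
  if (hit[1] == -1) {
    return(NULL)
  }
  start <- as.integer(hit)
  cbind(start = start, end = start + attr(hit, "match.length") - 1L)
}

loc <- locate_sketch("NaCl.5HCl", "Cl")
# loc has two rows: (3, 4) and (8, 9)
noHit <- locate_sketch("NaCl.5HCl", "Br")
```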
FSA_logRecorder
FSA_logRecorder(messageQuote, allowedPrinting = TRUE)
messageQuote |
messageQuote |
allowedPrinting |
allowedPrinting |
a line of communication messages is exported to the console and the log .txt file.
FSA_message
FSA_message(messageQuote, failedMessage = TRUE)
messageQuote |
messageQuote |
failedMessage |
failedMessage |
a line of communication messages is exported to the console.
This function arranges the parameters for the annotation process
FSA_msp_annotator(PARAM_SPEC, libFSdb, address_input_msp, output_path, allowedVerbose = TRUE)
PARAM_SPEC |
a parameter object derived from the 'FSA_SpectraSimilarity_xlsxAnalyzer' module. |
libFSdb |
a converted .msp library reference files (FSDB) using the 'msp2FSdb' module |
address_input_msp |
address of the .msp files |
output_path |
output path |
allowedVerbose |
c(TRUE, FALSE). A 'TRUE' allowedVerbose provides messages about the flow of the function. |
A dataframe of matched annotated spectra stored in the output directory.
This function generates necessary files from pairwise MSP blocks analysis to create Cytoscape networks.
FSA_msp2Cytoscape(path, MSPfile = "", mspVariableVector = NULL, mspNodeID = NULL, massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path |
address of .msp file or an FSDB |
MSPfile |
name of .msp file |
mspVariableVector |
a vector of msp variables |
mspNodeID |
msp Node ID which is the ID that is required for the ‘specsim’ ID generation |
massError |
Mass accuracy in Da |
RTtolerance |
Retention time tolerance (min) to match msp blocks. Select NA to ignore retention time matching. Setting a tolerance is helpful for finding co-occurring compounds. |
minEntropySimilarity |
Minimum entropy similarity score |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to measure entropy similarity score. |
noiseRemovalRatio |
noise removal ratio relative to the basepeak to measure entropy similarity score (in percent) |
number_processing_threads |
Number of processing threads for multi-threaded processing |
node_attributes_dataFrame |
node_attributes dataframe. A string to store using the 'write.table' function of R after a tab separation. |
edge_dataFrame |
edge dataframe. A string to store using the 'write.table' function of R after a tab separation. |
correlation_network |
correlation_network dataframe. A string to store using the 'write.table' function of R after a tab separation. |
FSDB |
Fragmentation spectra database (FSDB) object |
exclusionMSPnoideid |
A vector of MSP node ids which can be excluded to create a library of unique MSP blocks. |
filteredNetworkSIF |
A filtered network in the cytoscape SIF format that does not have redundant MSP blocks within a RT window. |
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T., (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 13(11), 2498-2504, doi:10.1101/gr.1239303
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    mspFileName <- "Kynurenine_Kynurenic_acid.msp"
    ##
    listCytoscape <- FSA_msp2Cytoscape(path = path_extdata, MSPfile = mspFileName, mspVariableVector = c("Name", "Collision_energy"), mspNodeID = NULL, massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0, noiseRemovalRatio = 0, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)
    ##
    FSDB <- listCytoscape[["FSDB"]]
    ##
    temp_wd <- tempdir() # just a temporary folder to save results
    ##
    write.table(listCytoscape[["node_attributes_dataFrame"]], paste0(temp_wd, "/node_attributes_dataFrame.txt"), quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
    ##
    write.table(listCytoscape[["correlation_network"]], paste0(temp_wd, "/correlation_network.sif"), quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
    ##
    write.table(listCytoscape[["edge_dataFrame"]], paste0(temp_wd, "/edge_dataFrame.txt"), quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
plot FSdb to Spectra
FSA_plotFSdb2Spectra(path, allowedUnlink = TRUE, annexName = "", FSdb, selectedFSdbIDs = NULL, number_processing_threads = 1, allowedVerbose = TRUE)
path |
Address of .msp file(s) |
allowedUnlink |
allowedUnlink |
annexName |
annexName |
FSdb |
FSdb |
selectedFSdbIDs |
selected FSdb IDs. When 'NULL', all FSDB blocks are plotted. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
allowedVerbose |
c(TRUE, FALSE). A 'TRUE' allowedVerbose provides messages about the flow of the function. |
spectra_figure object
This module ensures that the 'aggregate' function of R returns a list type of data.
FSA_R.aggregate(FSAvec)
FSAvec |
a vector of data |
listIDFSAvec
This function generates spectra markers
FSA_spectra_marker_generator(FSdb, ratio2basePeak4nSpectraMarkers = 0, aggregationLevel = NA)
FSdb |
FSdb object from the 'msp2FSdb' module |
ratio2basePeak4nSpectraMarkers |
Ratio of peaks in fragmentation spectra to the basepeak to calculate minimum qualified number of matched abundant peaks |
aggregationLevel |
c(NA, 0, 1, 2, 3). When 'NA', this function returns a matrix for the spectra markers. When integer numbers are used, the ion marker masses are grouped by a rounding digit equal to this number. |
spectraMarkerMass |
a grouped or a matrix of ion marker masses corresponding to FSdb ids |
nSpectraMarkers |
number of spectra markers for each FSdb id |
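The role of 'ratio2basePeak4nSpectraMarkers' can be pictured with a small base R sketch. The helper `spectra_marker_sketch` is hypothetical; it assumes that a peak counts as a spectra marker when its intensity is at least the given ratio of the base peak intensity, which is an interpretation of the argument description rather than the confirmed package logic.

```r
# Hypothetical helper (not an IDSL.FSA function): select peaks whose
# intensity is at least 'ratio2basePeak' times the base peak intensity.
spectra_marker_sketch <- function(FragmentList, ratio2basePeak = 0) {
  intensity <- FragmentList[, 2]
  isMarker <- intensity >= ratio2basePeak * max(intensity)
  list(spectraMarkerMass = FragmentList[isMarker, 1],
       nSpectraMarkers = sum(isMarker))
}

fragments <- cbind(c(50, 75, 100, 125), c(5, 40, 100, 60))
markers <- spectra_marker_sketch(fragments, ratio2basePeak = 0.5)
```

With a ratio of 0.5, only the peaks at m/z 100 and 125 qualify in this toy spectrum; a ratio of 0 keeps every peak.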
This function processes the spreadsheet of the 'SpectraSimilarity' tab to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.
FSA_SpectraSimilarity_xlsxAnalyzer(spreadsheet)
spreadsheet |
FSA spreadsheet |
This function returns the FSA SpectraSimilarity parameters to feed the 'FSA_msp_annotator' module.
This function removes similar MSP blocks. This function aggregates MSP blocks based on the 'Name' values.
FSA_uniqueMSPblockTagger(path, MSPfile = "", aggregateBy = "Name", massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE, number_processing_threads = 1)
path |
Address of .msp file or an FSDB |
MSPfile |
name of .msp file |
aggregateBy |
a variable to aggregate the MSP blocks based on |
massError |
Mass accuracy in Da |
RTtolerance |
Retention time tolerance (min) to match msp blocks. Select NA to ignore retention time match. |
minEntropySimilarity |
Minimum entropy similarity score |
noiseRemovalRatio |
noise removal ratio relative to the basepeak to measure entropy similarity score (in percent) |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to measure entropy similarity score. |
plotSpectra |
c(TRUE, FALSE) |
number_processing_threads |
Number of processing threads for multi-threaded processing |
A list of similar MSP blocks is returned, and the subsetted .msp and FSDB files are saved in the 'path' directory.
FSA_uniqueMSPblockTaggerUntargeted
FSA_uniqueMSPblockTaggerUntargeted(path, MSPfile_vector, minCSAdetectionFrequency = 20, minEntropySimilarity = 0.75, massError = 0.01, massErrorPrecursor = 0.01, RTtolerance = 0.1, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE, number_processing_threads = 1)
path |
Address of .msp file(s) |
MSPfile_vector |
A vector of names of .msp files or one .msp file name. |
minCSAdetectionFrequency |
minimum CSA detection frequency |
minEntropySimilarity |
minimum EntropySimilarity |
massError |
Mass accuracy in Da |
massErrorPrecursor |
Mass accuracy (Da) to find precursor m/z in .msp files |
RTtolerance |
Retention time tolerance (min) |
noiseRemovalRatio |
noise removal ratio ([0 - 1]) relative to the basepeak to measure entropy similarity score. |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation. |
plotSpectra |
c(TRUE, FALSE) |
number_processing_threads |
Number of processing threads for multi-threaded processing |
uniqueMSPvariants
This function executes the FSA workflow.
FSA_workflow(spreadsheet)
spreadsheet |
FSA spreadsheet |
This function organizes the FSA file processing for better performance using the template spreadsheet.
This function processes the spreadsheet of the FSA parameters to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.
FSA_xlsxAnalyzer(spreadsheet)
spreadsheet |
FSA spreadsheet |
This function returns the FSA parameters to feed the FSA_workflow function.
This function generates FSDB objects
FSdb_file_generator(PARAM_FSdb, output_path = NULL)
PARAM_FSdb |
'PARAM_FSdb' parameters obtained by the 'FSA_FSdb_xlsxAnalyzer' function. |
output_path |
output_path |
An FSDB object
FSdb subsetter
FSdb_subsetter(FSdb, inclusionIDs = NULL, exclusionIDs = NULL)
FSdb |
FSdb |
inclusionIDs |
inclusionIDs |
exclusionIDs |
exclusionIDs |
subsetted FSdb
This function converts FSDB R objects into .msp standard files.
FSdb2msp(path, FSdbFileName = "", UnweightMSP = FALSE, number_processing_threads = 1)
path |
address of .msp file(s) |
FSdbFileName |
name of the FSDB library file, including the '.Rdata' extension |
UnweightMSP |
to unweight fragmentation patterns |
number_processing_threads |
Number of processing threads for multi-threaded processing |
The .msp file is stored in the same folder
FSdb2PeakXcolSubsetter
FSdb2PeakXcolSubsetter(FSdb_address, peak_alignment_folder, metavariable = "idsl.ipa_collective_peakids", number_processing_threads = 1)
FSdb_address |
FSdb_address |
peak_alignment_folder |
peak_alignment_folder |
metavariable |
metavariable |
number_processing_threads |
Number of processing threads for multi-threaded processing |
peakXcol |
peakXcol |
peak_height |
peak_height |
peak_area |
peak_area |
peak_R13C |
peak_R13C |
This function finds potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. It searches only on the first 14 InChIKey letters and may therefore return multiple potential precursor types.
FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)
InChIKeyVector |
A vector of InChIKey values. This value may contain whole InChIKey strings or first 14 InChIKey letters. |
libFSdb |
A converted MSP library reference file using the 'msp2FSdb' module which is an FSDB produced by the IDSL.FSA package. |
tableIndicator |
c("Frequency", "PrecursorMZ"). To show frequency or a median of 'PrecursorMZ' values in the output dataframe for each precursor type. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
A matrix of frequency for each InChIKey in the FSDB. The matrix column headers represent precursor types.
    address_input_msp <- system.file("extdata", package = "IDSL.FSA")
    MSPfile_vector <- c("Kynurenine_Kynurenic_acid.msp")
    libFSdb <- msp2FSdb(path = address_input_msp, MSPfile_vector)
    ##
    InChIKeyVector <- c("HCZHHEIFKROPDY-UHFFFAOYSA-N", "YGPSJZOEDVAXAB-QMMMGPOBSA-N")
    precursor_type_table <- FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)
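The idea of matching on the first 14 InChIKey letters and tabulating precursor types can be illustrated with base R alone. In this sketch the precursor types assigned to each library entry are made-up illustration data, and the tabulation with `table` is an assumption about the output layout, not the package implementation.

```r
# Illustration data: three hypothetical library entries. The InChIKeys come
# from the example above; the precursor types are invented for this sketch.
libraryInChIKey <- c("HCZHHEIFKROPDY-UHFFFAOYSA-N",
                     "HCZHHEIFKROPDY-UHFFFAOYSA-N",
                     "YGPSJZOEDVAXAB-QMMMGPOBSA-N")
libraryPrecursorType <- c("[M+H]+", "[M-H]-", "[M+H]+")

# The first 14 letters encode the molecular skeleton (connectivity block)
firstBlock <- substr(libraryInChIKey, 1, 14)

# Frequency of each precursor type per 14-letter block
frequencyTable <- table(firstBlock, libraryPrecursorType)
```

Rows of `frequencyTable` correspond to InChIKey blocks and columns to precursor types, which mirrors the matrix layout described for the 'Frequency' indicator.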
This function converts .mgf (Mascot generic format) files into the .msp (mass spectra) format.
mgf2msp(path, MGFfile = "")
path |
address of the .mgf file. |
MGFfile |
name of the file with the .mgf extension. |
The .msp files are saved in the same location.
    temp_wd <- tempdir() # just a temporary folder
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    MGFfile <- "Training_000.mgf"
    file.copy(from = paste0(path_extdata, "/", MGFfile), to = temp_wd)
    mgf2msp(path = temp_wd, MGFfile)
This function converts .msp (mass spectra format) files into a readable R object.
msp2FSdb(path, MSPfile_vector = "", massIntegrationWindow = 0, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path |
Address of .msp file(s) |
MSPfile_vector |
A vector of names of .msp files or one .msp file name. |
massIntegrationWindow |
Mass window in Da to integrate adjacent peaks in the fragmentation spectra |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation. |
noiseRemovalRatio |
noise removal ratio ([0 - 1]) relative to the basepeak to measure entropy similarity score. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
logFSdb |
Parameters used to create the FSDB object |
PrecursorMZ |
A vector of precursor m/z values |
Precursor Type |
A vector of precursor adduct types |
Retention Time |
A vector of retention time values |
Num Peaks |
A vector of num peaks values indicating number of ions for each fragment spectra |
Spectral Entropy |
A vector of spectral entropy values |
FragmentList |
A list of fragment ions |
MSPLibraryParameters |
A dataframe of tabulated headers and their values for each msp block |
This function was designed not only for fast computation but also to standardize .msp files that were generated with inconsistent settings.
Li, Y., Kind, T., Folz, J., Vaniya, A., Mehta, S.S. and Fiehn, O. (2021). Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nature methods, 18(12), 1524-1531, doi:10.1038/s41592-021-01331-z
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    MSPfile <- c("Kynurenine_Kynurenic_acid.msp")
    sampleFSdb <- msp2FSdb(path = path_extdata, MSPfile)
This function creates an aligned table from the spectra in the .msp file
msp2TrainingMatrix(path, MSPfile = "", minDetectionFreq = 1, selectedFSdbIDs = NULL, dimension = "wide", massAccuracy = 0.01, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)
path |
Address of .msp file or an FSDB |
MSPfile |
A .msp file name or FSDB in .Rdata format |
minDetectionFreq |
A minimum detection frequency for an ion across the entire spectra |
selectedFSdbIDs |
selected MSP block/FSDB IDs to limit the screening to specific ion blocks |
dimension |
c("wide", "long"). *wide* or *long* alignment matrix output |
massAccuracy |
A mass accuracy (Da) |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation. |
noiseRemovalRatio |
noise removal ratio ([0 - 1]) relative to the basepeak to measure entropy similarity score. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
An FSDB file (.Rdata) and an aligned spectra table (.csv) are stored in the same directory.
    temp_wd <- tempdir() # just a temporary folder
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    MSPfile <- "Kynurenine_Kynurenic_acid.msp"
    file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
    msp2TrainingMatrix(path = temp_wd, MSPfile, minDetectionFreq = 1)
This function separates the positive and negative MSP blocks.
mspPosNegSplitter(path, MSPfile = "", number_processing_threads = 1)
path |
address of the .msp file. |
MSPfile |
name of the file with the .msp extension. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
The .msp files are saved in the same location with '_Neg.msp' and '_Pos.msp' extensions.
    temp_wd <- tempdir() # just a temporary folder
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    MSPfile <- "Kynurenine_Kynurenic_acid.msp"
    file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
    mspPosNegSplitter(temp_wd, MSPfile)
This function plots spectra figures from FSdb objects generated using the 'msp2FSdb' function.
plotFSdb2SpectraCore(FSdb, index)
FSdb |
FSdb |
index |
index |
spectra_figure object
    ## To create the FSdb object
    temp_wd <- tempdir() # just a temporary folder
    path_extdata <- system.file("extdata", package = "IDSL.FSA")
    MSPfile <- c("Kynurenine_Kynurenic_acid.msp")
    file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
    FSdb <- msp2FSdb(path = temp_wd, MSPfile)
    ## To plot spectra
    index <- 1
    plotFSdb2SpectraCore(FSdb, index)
This function creates 1:1 mixed AB spectra for spectral entropy calculation
spectra_1A1B_mixer(PEAK_A, PEAK_B, massError = 0, allowedNominalMass = FALSE)
PEAK_A |
A matrix (m/z, int) of fragmentation spectra |
PEAK_B |
A matrix (m/z, int) of fragmentation spectra |
massError |
Mass accuracy in Da |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
A matrix of 1:1 mixing spectra. First and second columns represent intensity-weighted average m/z and cumulated intensity, respectively.
This function integrates individual m/z peaks from multiple chromatogram scans (spectra) into summed m/z peaks using a mass accuracy or nominal masses.
spectra_integrator(stackedSpectra, massError = 0, allowedNominalMass = FALSE)
stackedSpectra |
A matrix of two columns of the stacked spectra. First and second columns should represent m/z and intensity, respectively. |
massError |
Mass accuracy in Da |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
A matrix of integrated spectra. First and second columns represent intensity-weighted average m/z and cumulated intensity, respectively.
    data(stackedSpectra)
    massError <- 0.005 # Da
    Integrated_spectra <- spectra_integrator(stackedSpectra[, 1:2], massError)
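The integration into intensity-weighted average m/z and cumulated intensity can be sketched in base R. The helper `integrate_sketch` is hypothetical; it assumes that sorted peaks are chained into one group whenever the gap to the previous m/z does not exceed 'massError', which may differ from the grouping rule used inside IDSL.FSA.

```r
# Hypothetical helper (not an IDSL.FSA function): merge stacked peaks into
# groups and report intensity-weighted m/z plus summed intensity per group.
integrate_sketch <- function(stackedSpectra, massError = 0,
                             allowedNominalMass = FALSE) {
  ord <- order(stackedSpectra[, 1])
  mz <- stackedSpectra[ord, 1]
  intensity <- stackedSpectra[ord, 2]
  if (allowedNominalMass) {
    group <- round(mz) # one group per nominal mass
  } else {
    # start a new group when the gap to the previous peak exceeds massError
    group <- cumsum(c(TRUE, diff(mz) > massError))
  }
  intensitySum <- tapply(intensity, group, sum)
  mzWeighted <- tapply(mz * intensity, group, sum) / intensitySum
  cbind(mz = as.numeric(mzWeighted), int = as.numeric(intensitySum))
}

stacked <- cbind(c(100.001, 100.003, 150.000, 100.002), c(10, 30, 5, 20))
integrated <- integrate_sketch(stacked, massError = 0.005)
```

In this toy input the three peaks near m/z 100 collapse into one row with a summed intensity of 60, while the peak at m/z 150 stays on its own.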
This function can detect m/z peaks that are related to each other across selected spectra lists.
spectra_ion_filter(spectraList, indexSpectraList = length(spectraList), massError, minPercentageDetectedScans = 10, rsdCutoff = 0, pearsonRHOthreshold = NA)
spectraList |
a list of matrices of m/z and intensity values for each chromatogram scan |
indexSpectraList |
a vector of spectra indices for the analysis. This vector should have at least 3 elements to run this function. |
massError |
required mass error for m/z values |
rsdCutoff |
Relative standard deviations (in percent) to remove constant peaks (usually noisy peaks) |
minPercentageDetectedScans |
Minimum percentage of detected scans for an m/z peak |
pearsonRHOthreshold |
A threshold for pairwise Pearson's correlation coefficient across the selected spectra lists. This feature is recommended to find co-occurring peaks within a chromatographic peak. This feature may be used to eliminate instrument noises from MS2 data channels within an MS1 chromatographic peak for DDA analysis. |
A matrix of m/z and cumulated intensities across the 'indexSpectraList' spectra
This module calculates spectral entropy for a fragmentation pattern using a method described by the reference paper.
spectral_entropy_calculator(FragmentList, allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01)
FragmentList |
A matrix (m/z, int) of fragmentation pattern after intensity adjustment |
allowedWeightedSpectralEntropy |
c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation. |
noiseRemovalRatio |
noise removal ratio ([0 - 1]) relative to the base peak, applied before measuring the entropy similarity score. |
spectralEntropy |
spectral entropy |
NumPeaks |
number of peaks in the fragmentation pattern |
FragmentList |
A two-column matrix after intensity normalization relative to the sum of intensities and, when selected, entropy weight transformation. |
Noise removal on intensities should be performed before passing the fragmentation pattern to this function.
Li, Y., Kind, T., Folz, J., Vaniya, A., Mehta, S.S. and Fiehn, O. (2021). Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nature methods, 18(12), 1524-1531, doi:10.1038/s41592-021-01331-z
FragmentList <- cbind(seq(50, 600, length.out = 10), seq(10, 90, length.out = 10))
SE <- spectral_entropy_calculator(FragmentList)
print(SE[[1]])
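For orientation, the entropy calculation can be sketched in base R. This is a minimal illustration based on the definitions in Li et al. (2021), not the package's own code. The function name `spectral_entropy_sketch` and its `weighted` argument are hypothetical, intensities are assumed to be strictly positive, and noise removal is not modeled.

```r
# Sketch only: Shannon entropy of a fragmentation pattern with optional weighting
spectral_entropy_sketch <- function(fragments, weighted = TRUE) {
  p <- fragments[, 2] / sum(fragments[, 2])  # normalize intensities to sum to 1
  S <- -sum(p * log(p))                      # spectral entropy (natural log)
  if (weighted && S < 3) {
    w <- 0.25 + 0.25 * S                     # weight used for low-entropy spectra
    p <- p^w / sum(p^w)                      # transform, then re-normalize
    S <- -sum(p * log(p))                    # entropy of the weighted spectrum
  }
  list(spectralEntropy = S,
       NumPeaks = nrow(fragments),
       FragmentList = cbind(fragments[, 1], p))
}

# Four equally intense peaks give the maximum entropy for four peaks, log(4)
uniform <- cbind(c(100, 150, 200, 250), c(25, 25, 25, 25))
res_uniform <- spectral_entropy_sketch(uniform)

# A single peak carries no uncertainty, so its entropy is 0
res_single <- spectral_entropy_sketch(cbind(100, 50))
```

The returned list mirrors the three documented values so that elements can be accessed by position, as in the example above.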
This module measures the spectral entropy similarity between the 'PEAK_A' and 'PEAK_B' fragment spectra using the method described in the reference paper.
spectral_entropy_similarity_score(PEAK_A, S_PEAK_A, PEAK_B, S_PEAK_B, massError, allowedNominalMass = FALSE)
PEAK_A |
A matrix (m/z, int) of fragmentation spectra |
S_PEAK_A |
Spectral entropy of PEAK_A |
PEAK_B |
A matrix (m/z, int) of fragmentation spectra |
S_PEAK_B |
Spectral entropy of PEAK_B |
massError |
Mass accuracy in Da |
allowedNominalMass |
c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis. |
spectral entropy similarity between 0 - 1
Li, Y., Kind, T., Folz, J., Vaniya, A., Mehta, S.S. and Fiehn, O. (2021). Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nature methods, 18(12), 1524-1531, doi:10.1038/s41592-021-01331-z
allowedWeightedSpectralEntropy <- TRUE
##
A <- cbind(seq(50, 160, length.out = 10), seq(10, 90, length.out = 10))
sA <- spectral_entropy_calculator(A, allowedWeightedSpectralEntropy)
S_PEAK_A <- sA[[1]]
PEAK_A <- sA[[3]]
##
B <- cbind(seq(50, 160, length.out = 10), seq(50, 60, length.out = 10))
sB <- spectral_entropy_calculator(B, allowedWeightedSpectralEntropy)
S_PEAK_B <- sB[[1]]
PEAK_B <- sB[[3]]
##
massError <- 0.01 # Da (illustrative value)
allowedNominalMass <- TRUE
entropyScore <- spectral_entropy_similarity_score(PEAK_A, S_PEAK_A, PEAK_B, S_PEAK_B, massError = massError, allowedNominalMass = allowedNominalMass)
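The similarity score itself can be sketched as follows, using the formula from Li et al. (2021): one minus the entropy increase of the 1:1 merged spectrum, scaled by log(4). This is an illustration under simplifying assumptions, not the package's code. The names `entropy` and `entropy_similarity_sketch` are hypothetical, peaks in B are matched to the nearest peak in A within 'massError', and no entropy weighting is applied.

```r
# Shannon entropy of a vector of normalized intensities
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Sketch only: entropy similarity of two (m/z, int) matrices
entropy_similarity_sketch <- function(A, B, massError) {
  a <- A[, 2] / sum(A[, 2])
  b <- B[, 2] / sum(B[, 2])
  mixA <- a / 2                 # contribution of A to the 1:1 merged spectrum
  extra <- numeric(0)           # peaks of B without a partner in A
  for (j in seq_len(nrow(B))) {
    d <- abs(A[, 1] - B[j, 1])
    i <- which.min(d)
    if (d[i] <= massError) {
      mixA[i] <- mixA[i] + b[j] / 2
    } else {
      extra <- c(extra, b[j] / 2)
    }
  }
  S_AB <- entropy(c(mixA, extra))
  1 - (2 * S_AB - entropy(a) - entropy(b)) / log(4)
}

A <- cbind(c(100, 150, 200), c(10, 50, 40))
B_disjoint <- cbind(c(110, 160), c(30, 70))

score_identical <- entropy_similarity_sketch(A, A, massError = 0.01)
score_disjoint <- entropy_similarity_sketch(A, B_disjoint, massError = 0.01)
```

Identical spectra score 1 and spectra with no shared peaks score 0, which matches the documented 0 - 1 range.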
A dataset to test the 'spectra_integrator' function.
data("stackedSpectra")
mz |
a numeric vector of m/z values |
int |
a numeric vector of intensities |
scan_number |
a numeric vector of chromatogram scan numbers |
The 'scan_number' column is not necessary to test the 'spectra_integrator' function.
data(stackedSpectra)
This function sorts 84 elements in the periodic table for molecular formula deconvolution.
UFSA_element_sorter()
A string vector of elements
Elements <- UFSA_element_sorter()
This function converts a molecular formula into a numerical vector.
UFSA_formula_vector_generator(molecular_formula, Elements, LElements = length(Elements), allowedRedundantElements = FALSE)
molecular_formula |
molecular formula |
Elements |
a string vector of elements. This value must be derived from the 'UFSA_element_sorter' function. |
LElements |
number of elements. To speed up loop calculations, consider calculating the number of elements outside of the loop. |
allowedRedundantElements |
'TRUE' should be used to deconvolute molecular formulas with redundant elements (e.g. CO2CH3O), and 'FALSE' (default value) should be used to skip such complex molecular formulas. |
a numerical vector for the molecular formula. This function returns a vector of -Inf values when the molecular formula has elements not listed in the 'Elements' string vector.
molecular_formula <- "C12H2Br5Cl3O"
Elements <- UFSA_element_sorter()
mol_vec <- UFSA_formula_vector_generator(molecular_formula, Elements)
##
regenerated_molecular_formula <- UFSA_hill_molecular_formula_printer(MolVecMat = mol_vec, Elements = Elements)
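The conversion from a formula string to a numerical vector can be sketched in base R as shown here. This is only an illustration of the idea, not the package's parser. The function name `formula_to_vector_sketch` and the short `Elements` vector are hypothetical (the real vector comes from 'UFSA_element_sorter'), and redundant elements are always summed rather than controlled by a flag.

```r
# Sketch only: count atoms per element in the order given by 'Elements'
formula_to_vector_sketch <- function(molecular_formula, Elements) {
  # An element symbol is a capital letter, an optional lowercase letter, and an optional count
  tokens <- regmatches(molecular_formula,
                       gregexpr("[A-Z][a-z]?[0-9]*", molecular_formula))[[1]]
  symbols <- sub("[0-9]*$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"                 # a missing count means one atom
  counts <- as.numeric(counts)
  # Mirror the documented behavior for unknown elements
  if (!all(symbols %in% Elements)) {
    return(rep(-Inf, length(Elements)))
  }
  vec <- setNames(numeric(length(Elements)), Elements)
  for (k in seq_along(symbols)) {
    vec[symbols[k]] <- vec[symbols[k]] + counts[k]
  }
  vec
}

Elements <- c("C", "H", "O", "N", "Br", "Cl")  # hypothetical short element list
v_known <- formula_to_vector_sketch("C12H2Br5Cl3O", Elements)
v_redundant <- formula_to_vector_sketch("CO2CH3O", Elements)
v_unknown <- formula_to_vector_sketch("C2H6Xx", Elements)
```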
This function produces molecular formulas in the Hill notation system from a matrix of numerical vectors.
UFSA_hill_molecular_formula_printer(MolVecMat, Elements, LElements = length(Elements))
MolVecMat |
A matrix of numerical vectors of molecular formulas in each row. |
Elements |
A vector string of the used elements. |
LElements |
number of elements |
A vector of molecular formulas
Elements <- c("C", "H", "O", "N", "Br", "Cl")
MoleFormVec1 <- c(2, 6, 1, 0, 0, 0) # C2H6O
MoleFormVec2 <- c(8, 10, 2, 4, 0, 0) # C8H10N4O2
MoleFormVec3 <- c(12, 2, 1, 0, 5, 3) # C12H2Br5Cl3O
MolVecMat <- rbind(MoleFormVec1, MoleFormVec2, MoleFormVec3)
H_MolF <- UFSA_hill_molecular_formula_printer(MolVecMat, Elements)
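As a reminder of what the Hill system implies, the ordering rule can be sketched in a few lines of base R: carbon first, hydrogen second, and the remaining elements alphabetically, or all elements alphabetically when no carbon is present. The function name `hill_formula_sketch` is hypothetical and this is not the package's implementation.

```r
# Sketch only: print one numerical vector as a Hill-notation formula
hill_formula_sketch <- function(mol_vec, Elements) {
  present <- Elements[mol_vec > 0]
  if ("C" %in% present) {
    rest <- sort(setdiff(present, c("C", "H")), method = "radix")
    ordered <- c("C", intersect("H", present), rest)
  } else {
    ordered <- sort(present, method = "radix")
  }
  counts <- mol_vec[match(ordered, Elements)]
  # A count of 1 is omitted, as in conventional formula notation
  paste0(ordered, ifelse(counts == 1, "", counts), collapse = "")
}

Elements <- c("C", "H", "O", "N", "Br", "Cl")
f1 <- hill_formula_sketch(c(2, 6, 1, 0, 0, 0), Elements)
f2 <- hill_formula_sketch(c(8, 10, 2, 4, 0, 0), Elements)
f3 <- hill_formula_sketch(c(12, 2, 1, 0, 5, 3), Elements)
f4 <- hill_formula_sketch(c(0, 2, 1, 0, 0, 0), Elements)
```

Applied row by row with `apply(MolVecMat, 1, hill_formula_sketch, Elements = Elements)`, the sketch yields one formula string per row of a matrix.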
This function deconvolutes ionization pathways into a coefficient and a numerical vector to simplify the prediction of ionization pathways.
UFSA_ionization_pathway_deconvoluter(IonPathways, Elements, LElements = length(Elements))
IonPathways |
A vector of ionization pathways. Pathways should be like [Coeff*M+ADD1-DED1+...] where "Coeff" should be an integer between 1-9 and ADD1 and DED1 may be ionization pathways. ex: 'IonPathways <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+CO2-H2O+Na-KO2+HCl-NH4]-")' |
Elements |
A vector string of the used elements |
LElements |
Counts of elements |
A list of adduct calculation values for each ionization pathway.
Elements <- UFSA_element_sorter()
IonPathways <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+CO2-H2O+Na-KO2+HCl-NH4]-")
Ion_DC <- UFSA_ionization_pathway_deconvoluter(IonPathways, Elements)
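The idea of splitting a pathway into a coefficient and a net elemental change can be sketched in base R. This is an illustration, not the package's implementation, and the structure of the list it returns is an assumption. The function name `ion_pathway_sketch` is hypothetical, and the charge after the closing bracket is ignored.

```r
# Sketch only: split "[Coeff*M+ADD-DED...]" into a coefficient and net atom counts
ion_pathway_sketch <- function(pathway) {
  inner <- sub("^\\[(.*)\\].*$", "\\1", pathway)      # text between the brackets
  coeff <- sub("^([0-9]?)M.*$", "\\1", inner)         # optional digit before 'M'
  coeff <- if (coeff == "") 1 else as.numeric(coeff)
  # Each signed term after 'M', e.g. "+CO2" or "-H2O"
  terms <- regmatches(inner, gregexpr("[+-][A-Za-z0-9]+", inner))[[1]]
  net <- numeric(0)
  for (term in terms) {
    sgn <- if (substr(term, 1, 1) == "+") 1 else -1
    atoms <- regmatches(term, gregexpr("[A-Z][a-z]?[0-9]*", term))[[1]]
    for (atom in atoms) {
      el <- sub("[0-9]*$", "", atom)
      n <- sub("^[A-Za-z]+", "", atom)
      n <- if (n == "") 1 else as.numeric(n)
      # Accumulate the signed count; unseen elements start from zero
      net[el] <- sum(net[el], sgn * n, na.rm = TRUE)
    }
  }
  list(coefficient = coeff, adduct = net)
}

r_complex <- ion_pathway_sketch("[3M+CO2-H2O+Na-KO2+HCl-NH4]-")
r_dimer <- ion_pathway_sketch("[2M-Cl]-")
r_plain <- ion_pathway_sketch("[M]+")
```

For the longest pathway the net change is one carbon gained, five hydrogens lost, and one oxygen lost, alongside the Na, K, Cl, and N terms.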
Precursor type corrector from MSP files. This function initially attempts to standardize the precursor types to be consistent with the 'ionization_pathway_deconvoluter' module of the IDSL.SUFA package.
UFSA_precursorType_corrector(precursorType, ionMode = NULL)
precursorType |
a vector of precursor types to be corrected (e.g. "[M+H]+") |
ionMode |
ionization mode (default: NULL) |
The corrected precursor type(s)
uncorrectedPrecursorType <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+COO-H2O+Na-KO2+HCl-NH4]-")
precursorType <- UFSA_precursorType_corrector(uncorrectedPrecursorType, ionMode = NULL)
This function creates .msp files from an organized spreadsheet of fragmentation data.
xlsx2msp(path, xlsxFileName = "", number_processing_threads = 1)
path |
address of the spreadsheet |
xlsxFileName |
name of the file with the .xlsx extension. |
number_processing_threads |
Number of processing threads for multi-threaded processing |
The .msp files are saved in the same location.
The spreadsheet should have exactly one column for each of the following headers (case-sensitive): c('ID', 'mz_fragment', 'int_fragment', 'Name')
temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
xlsxFileName <- "PFAS_MSe.xlsx"
file.copy(from = paste0(path_extdata, "/", xlsxFileName), to = temp_wd)
xlsx2msp(temp_wd, xlsxFileName)
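To make the expected input more concrete, the sketch below builds a small table with the four required headers and turns it into generic .msp-style text blocks. Several things here are assumptions rather than documented behavior: the long layout with one row per fragment, the grouping of rows by 'ID', the compound names and values, and the exact fields written. The output of 'xlsx2msp' may differ.

```r
# Hypothetical table: one row per fragment, grouped into spectra by 'ID'
fragments <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  mz_fragment = c(68.99, 118.99, 79.96, 98.96, 168.99),
  int_fragment = c(100, 45, 100, 30, 12),
  Name = c("compound_A", "compound_A", "compound_B", "compound_B", "compound_B")
)

# Build one generic .msp-style block per spectrum
msp_blocks <- vapply(split(fragments, fragments$ID), function(x) {
  paste(c(
    paste0("Name: ", x$Name[1]),
    paste0("Num Peaks: ", nrow(x)),
    paste(x$mz_fragment, x$int_fragment),  # one "m/z intensity" line per fragment
    ""                                     # blank line closes the block
  ), collapse = "\n")
}, character(1))

msp_text <- paste(msp_blocks, collapse = "\n")
```

Calling `cat(msp_text)` prints the blocks, and `writeLines(msp_text, file.path(tempdir(), "sketch.msp"))` would save them to a temporary file.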