Package 'IDSL.FSA' reference manual

Title:	Fragmentation Spectra Analysis (FSA)
Description:	The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>.
Authors:	Sadjad Fakouri-Baygi [aut] , Dinesh Barupal [cre, aut]
Maintainer:	Dinesh Barupal <[email protected]>
License:	MIT + file LICENSE
Version:	1.2
Built:	2026-05-12 06:12:33 UTC
Source:	https://github.com/idslme/idsl.fsa

Fragmentation Spectra Annotator

Description

This module annotates fragmentation spectra from .MSP files.

Usage

fragmentation_spectra_annotator(path, MSPfile = "", libFSdb,
libFSdbIDlist, targetedPrecursorType = NA, ratio2basePeak4nSpectraMarkers = 0,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01, roundingDigitPrefiltering = 1, minMatchedNumPeaks = 1,
massError = 0, maxNEME = 0, minIonRangeDifference = 0, minCosineSimilarity,
minEntropySimilarity, minRatioMatchedNspectraMarkers,
spectralEntropyDeviationPrefiltering, massErrorPrecursor = NA, RTtolerance = NA,
exportSpectraParameters = NULL, number_processing_threads = 1)

Arguments

path

Address of .msp file(s)

MSPfile

name of the .msp file

libFSdb

An FSDB object produced by the IDSL.FSA package, that is, a .msp library reference file converted using the 'msp2FSdb' module.

libFSdbIDlist

Ion markers object from the FSDB reference

targetedPrecursorType

A vector of targeted precursor types

ratio2basePeak4nSpectraMarkers

Intensity ratio of peaks in the fragmentation spectra relative to the base peak, used to calculate the minimum qualified number of matched abundant peaks

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation.

noiseRemovalRatio

Noise removal ratio (in the range [0, 1]) relative to the base peak, applied before measuring the entropy similarity score.

roundingDigitPrefiltering

Level of pre-filtering

minMatchedNumPeaks

Minimum matched number of peaks

massError

Mass accuracy in Da

maxNEME

Maximum value for Normalized Euclidean Mass Error (NEME) in mDa

minIonRangeDifference

Minimum distance (Da) between lowest and highest matched m/z to prevent matching only isotopic envelopes

minCosineSimilarity

Minimum cosine similarity score

minEntropySimilarity

Minimum entropy similarity score

minRatioMatchedNspectraMarkers

Minimum percentage of abundant library peaks that must be detected

spectralEntropyDeviationPrefiltering

Spectral entropy deviation for pre-filtering

massErrorPrecursor

Mass accuracy (Da) to find precursor m/z in .msp files

RTtolerance

Retention time tolerance (min)

exportSpectraParameters

Parameters for export MS/MS match figures

number_processing_threads

Number of processing threads for multi-threaded processing

Value

A dataframe of matched spectra
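
The cosine (dot product) score that 'minCosineSimilarity' thresholds can be illustrated in base R. This is a minimal sketch, assuming peaks are matched greedily within 'massError'; the helper 'cosine_similarity_sketch' is hypothetical and not part of 'IDSL.FSA', and the peak matching used by the package may differ.

```r
# Hypothetical helper (not part of 'IDSL.FSA'): cosine similarity between two
# spectra given as two-column matrices (m/z, intensity).
cosine_similarity_sketch <- function(peakA, peakB, massError = 0.01) {
  matchedB <- rep(0, nrow(peakA))    # intensity of B aligned to each A peak
  usedB <- rep(FALSE, nrow(peakB))   # each B peak is matched at most once
  for (i in seq_len(nrow(peakA))) {
    d <- abs(peakB[, 1] - peakA[i, 1])
    d[usedB] <- Inf
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= massError) {
      matchedB[i] <- peakB[j, 2]
      usedB[j] <- TRUE
    }
  }
  # Unmatched peaks contribute to the norms but not to the dot product
  sum(peakA[, 2] * matchedB) / sqrt(sum(peakA[, 2]^2) * sum(peakB[, 2]^2))
}

# Made-up spectra for illustration only
A <- cbind(c(100.000, 150.000, 200.000), c(10, 50, 100))
B <- cbind(c(100.002, 150.001, 250.000), c(10, 50, 100))
score_AA <- cosine_similarity_sketch(A, A)   # identical spectra
score_AB <- cosine_similarity_sketch(A, B)   # base peaks do not match
```

In this sketch, two of the three peaks match but the base peaks do not, so the score drops well below 1; a threshold such as 'minCosineSimilarity' would reject this pair.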

aggregation method for FSA

Description

This module optimizes the 'indexVec' variable by removing elements that have redundant 'idVec' numbers.

Usage

FSA_aggregate(idVec, variableVec, indexVec, targetVar)

Arguments

idVec

a vector of id numbers. Repeated id numbers are allowed

variableVec

a vector of the variable of interest, such as RT or m/z

indexVec

a vector of indices

targetVar

the targeted value in 'variableVec'

Value

a clean indexVec after removing redundant 'idVec'.
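
One plausible reading of this module is sketched below in base R. It assumes that, for each repeated id, the retained index is the one whose 'variableVec' value is closest to 'targetVar'; this selection rule and the helper name 'aggregate_sketch' are assumptions, not the package source.

```r
# Hypothetical sketch (not the 'IDSL.FSA' implementation): for each repeated id,
# keep only the index whose variable value is closest to the target value.
aggregate_sketch <- function(idVec, variableVec, indexVec, targetVar) {
  ids <- idVec[indexVec]
  distance <- abs(variableVec[indexVec] - targetVar)
  keep <- vapply(split(seq_along(indexVec), ids), function(k) {
    k[which.min(distance[k])]
  }, integer(1))
  indexVec[sort(unname(keep))]
}

idVec <- c(1, 1, 2, 3, 3)
variableVec <- c(5.10, 5.02, 4.80, 5.30, 4.99)   # e.g. retention times (min)
cleanIndex <- aggregate_sketch(idVec, variableVec, indexVec = 1:5, targetVar = 5)
```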

FSA annotation text repel

Description

This function places annotations on the spectra plots at a reasonable distance from each other to avoid overlapping annotations.

Usage

FSA_annotation_text_repel(FSAspectra, nGridX, nGridY)

Arguments

FSAspectra

FSAspectra

nGridX

number of grids on the x-axis

nGridY

number of grids on the y-axis

Value

labels

FSA_dir.create

Description

A module to create directories after removing the existing directory with the same name to prevent data interferences.

Usage

FSA_dir.create(folder, allowedUnlink = FALSE)

Arguments

folder

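Path of the folder to create

allowedUnlink

c(TRUE, FALSE). When 'TRUE', an existing folder with the same name is removed before it is recreated.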

Value

when the original folder was deleted and recreated successfully, 'TRUE' is returned by this function.
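
The "remove, then recreate" behaviour described above can be approximated with base R file utilities. This is a minimal sketch under that assumption; 'dir_create_sketch' is a hypothetical name and the package function may handle errors differently.

```r
# Hypothetical base-R sketch of "delete, then recreate" directory handling.
dir_create_sketch <- function(folder, allowedUnlink = FALSE) {
  if (allowedUnlink && dir.exists(folder)) {
    unlink(folder, recursive = TRUE)   # remove stale results first
  }
  if (!dir.exists(folder)) {
    dir.create(folder, recursive = TRUE)
  }
  dir.exists(folder)
}

folder <- file.path(tempdir(), "FSA_sketch_output")
created <- dir_create_sketch(folder)
writeLines("old result", file.path(folder, "old.txt"))
recreated <- dir_create_sketch(folder, allowedUnlink = TRUE)
remainingFiles <- list.files(folder)   # empty: the old file was removed
```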

FSA FSdb xlsx Analyzer

Description

This function processes the spreadsheet of the 'FSDB' tab to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.

Usage

FSA_FSdb_xlsxAnalyzer(spreadsheet)

Arguments

spreadsheet

FSA spreadsheet

Value

This function returns the FSDB parameters to feed the 'FSdb_file_generator' function.

FSA loadRdata

Description

This function loads .Rdata files into a variable.

Usage

FSA_loadRdata(fileName)

Arguments

fileName

Name of an '.Rdata' file.

Value

The object stored in the '.Rdata' file, which can be assigned to a new variable name.
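
Base R 'load' restores objects under the names they were saved with, which is why a wrapper is useful. The sketch below shows one common way to return the loaded object instead; 'load_rdata_sketch' is a hypothetical name, and it assumes the file holds a single object.

```r
# Hypothetical base-R sketch: load() restores objects under their saved names,
# so load into a private environment and return the object itself.
load_rdata_sketch <- function(fileName) {
  env <- new.env()
  objectName <- load(fileName, envir = env)   # load() returns the object names
  get(objectName[1], envir = env)
}

fileName <- file.path(tempdir(), "sketch.Rdata")
originalName <- c(a = 1, b = 2)
save(originalName, file = fileName)
newName <- load_rdata_sketch(fileName)   # same content, new variable name
```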

FSA Locate regex

Description

Locate indices of the pattern in the string

Usage

FSA_locate_regex(string, pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE,
useBytes = FALSE)

Arguments

string

a string as character

pattern

a pattern to screen

ignore.case

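Logical. If 'TRUE', case is ignored during matching, as for the same argument of the base R 'gregexpr' function.

perl

Logical. If 'TRUE', Perl-compatible regular expressions are used.

fixed

Logical. If 'TRUE', 'pattern' is matched as a literal string rather than as a regular expression.

useBytes

Logical. If 'TRUE', matching is done byte-by-byte rather than character-by-character.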

Details

This function returns 'NULL' when no matches are detected for the pattern.

Value

A 2-column matrix of location indices. The first and second columns represent start and end positions, respectively.

Examples

pattern <- "Cl"
string <- "NaCl.5HCl"
Location_Cl <- FSA_locate_regex(string, pattern)
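
The expected shape of the result can be reproduced with base R 'gregexpr'. This is a minimal sketch, not the package source; 'locate_regex_sketch' is a hypothetical name.

```r
# Hypothetical base-R equivalent (not the package source): start/end positions
# of every match as a 2-column matrix, or NULL when nothing matches.
locate_regex_sketch <- function(string, pattern, ...) {
  m <- gregexpr(pattern, string, ...)[[1]]
  if (m[1] == -1) {
    return(NULL)
  }
  start <- as.integer(m)
  cbind(start, end = start + attr(m, "match.length") - 1L)
}

loc <- locate_regex_sketch("NaCl.5HCl", "Cl")       # rows: (3, 4) and (8, 9)
noMatch <- locate_regex_sketch("NaCl.5HCl", "Br")   # NULL
```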

FSA logRecorder

Description

This function records a communication message in the console and in the log .txt file.

Usage

FSA_logRecorder(messageQuote, allowedPrinting = TRUE)

Arguments

messageQuote

messageQuote

allowedPrinting

allowedPrinting

Value

a line of communication messages is exported to the console and the log .txt file.

FSA message

Description

This function prints a communication message to the console.

Usage

FSA_message(messageQuote, failedMessage= TRUE)

Arguments

messageQuote

messageQuote

failedMessage

failedMessage

Value

a line of communication messages is exported to the console.

FSA msp annotator

Description

This function arranges the parameters for the annotation process

Usage

FSA_msp_annotator(PARAM_SPEC, libFSdb, address_input_msp, output_path,
allowedVerbose = TRUE)

Arguments

PARAM_SPEC

Parameters derived from the 'FSA_SpectraSimilarity_xlsxAnalyzer' module.

libFSdb

A .msp library reference file converted into an FSDB using the 'msp2FSdb' module

address_input_msp

address of the .msp files

output_path

output path

allowedVerbose

c(TRUE, FALSE). A 'TRUE' allowedVerbose provides messages about the flow of the function.

Value

A dataframe of matched annotated spectra stored in the output directory.

FSA Cytoscape Files Generator

Description

This function generates necessary files from pairwise MSP blocks analysis to create Cytoscape networks.

Usage

FSA_msp2Cytoscape(path, MSPfile = "", mspVariableVector = NULL,
mspNodeID = NULL, massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01, number_processing_threads = 1)

Arguments

path

address of .msp file or an FSDB

MSPfile

name of .msp file

mspVariableVector

a vector of msp variables

mspNodeID

msp Node ID which is the ID that is required for the ‘specsim’ ID generation

massError

Mass accuracy in Da

RTtolerance

Retention time tolerance (min) to match msp blocks. Select NA to ignore retention time matching. This option is helpful for finding co-occurring compounds.

minEntropySimilarity

Minimum entropy similarity score

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio

noise removal ratio relative to the basepeak to measure entropy similarity score (in percent)

number_processing_threads

Number of processing threads for multi-threaded processing

Value

node_attributes_dataFrame

node_attributes dataframe, which can be saved as a tab-separated file using the 'write.table' function of R.

edge_dataFrame

edge dataframe, which can be saved as a tab-separated file using the 'write.table' function of R.

correlation_network

correlation_network dataframe, which can be saved as a tab-separated file using the 'write.table' function of R.

FSDB

Fragmentation spectra database (FSDB) object

exclusionMSPnoideid

A vector of MSP node ids which can be excluded to create a library of unique MSP blocks.

filteredNetworkSIF

A filtered network in the Cytoscape SIF format that does not have redundant MSP blocks within an RT window.

References

Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T., (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 13(11), 2498-2504, doi:10.1101/gr.1239303

Examples


path_extdata <- system.file("extdata", package = "IDSL.FSA")
mspFileName <- "Kynurenine_Kynurenic_acid.msp"
##
listCytoscape <- FSA_msp2Cytoscape(path = path_extdata,
MSPfile = mspFileName, mspVariableVector = c("Name", "Collision_energy"),
mspNodeID = NULL, massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0,
noiseRemovalRatio = 0, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, number_processing_threads = 1)
##
FSDB <- listCytoscape[["FSDB"]]
##
temp_wd <- tempdir() # just a temporary folder to save results
##
write.table(listCytoscape[["node_attributes_dataFrame"]], paste0(temp_wd,
"/node_attributes_dataFrame.txt"), quote = FALSE, sep = "\t", row.names = FALSE,
col.names = FALSE)
##
write.table(listCytoscape[["correlation_network"]], paste0(temp_wd,
"/correlation_network.sif"), quote = FALSE, sep = "\t", row.names = FALSE,
col.names = FALSE)
##
write.table(listCytoscape[["edge_dataFrame"]], paste0(temp_wd,
"/edge_dataFrame.txt"), quote = FALSE, sep = "\t", row.names = FALSE,
col.names = FALSE)
##

plot FSdb to Spectra

Description

plot FSdb to Spectra

Usage

FSA_plotFSdb2Spectra(path, allowedUnlink = TRUE, annexName = "", FSdb,
selectedFSdbIDs = NULL, number_processing_threads = 1, allowedVerbose = TRUE)

Arguments

path

Address of .msp file(s)

allowedUnlink

allowedUnlink

annexName

annexName

FSdb

FSdb

selectedFSdbIDs

Selected FSdb IDs. When 'NULL', all FSDB blocks are plotted.

number_processing_threads

Number of processing threads for multi-threaded processing

allowedVerbose

c(TRUE, FALSE). A 'TRUE' allowedVerbose provides messages about the flow of the function.

Value

spectra_figure object

aggregate function for IDSL.FSA

Description

This module ensures that the 'aggregate' function of R returns a list type of data.

Usage

FSA_R.aggregate(FSAvec)

Arguments

FSAvec

a vector of data

Value

listIDFSAvec

FSA Spectra Marker Generator

Description

This function generates spectra markers

Usage

FSA_spectra_marker_generator(FSdb, ratio2basePeak4nSpectraMarkers = 0,
aggregationLevel = NA)

Arguments

FSdb

FSdb object from the 'msp2FSdb' module

ratio2basePeak4nSpectraMarkers

Intensity ratio of peaks in the fragmentation spectra relative to the base peak, used to calculate the minimum qualified number of matched abundant peaks

aggregationLevel

c(NA, 0, 1, 2, 3). When 'NA', this function returns a matrix for the spectra markers. When integer numbers are used, the ion marker masses are grouped by a rounding digit equal to this number.

Value

spectraMarkerMass

a grouped or a matrix of ion marker masses corresponding to FSdb ids

nSpectraMarkers

number of spectra markers for each FSdb id
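
The idea of spectra markers can be sketched for a single spectrum in base R. The sketch assumes that a marker is any peak whose intensity relative to the base peak reaches the chosen ratio and that 'aggregationLevel' acts as a number of rounding digits; both are readings of the argument descriptions above, and 'spectra_marker_sketch' is a hypothetical name. The peak values are made up.

```r
# Hypothetical sketch (not the package source): marker ions are peaks whose
# intensity relative to the base peak reaches the chosen ratio.
spectra_marker_sketch <- function(fragmentList, ratio2basePeak = 0,
                                  aggregationLevel = NA) {
  basePeak <- max(fragmentList[, 2])
  markerMass <- fragmentList[fragmentList[, 2] / basePeak >= ratio2basePeak, 1]
  if (!is.na(aggregationLevel)) {
    # Group marker masses by rounding to the requested number of digits
    markerMass <- unique(round(markerMass, digits = aggregationLevel))
  }
  list(spectraMarkerMass = markerMass, nSpectraMarkers = length(markerMass))
}

# Made-up spectrum (m/z, intensity) for illustration only
fragmentList <- cbind(c(94.0651, 136.0757, 146.0600, 192.0655),
                      c(5, 40, 100, 12))
markers <- spectra_marker_sketch(fragmentList, ratio2basePeak = 0.1,
                                 aggregationLevel = 1)
```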

FSA SpectraSimilarity xlsx Analyzer

Description

This function processes the spreadsheet of the 'SpectraSimilarity' tab to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.

Usage

FSA_SpectraSimilarity_xlsxAnalyzer(spreadsheet)

Arguments

spreadsheet

FSA spreadsheet

Value

This function returns the FSA SpectraSimilarity parameters to feed the 'FSA_msp_annotator' module.

FSA Unique MSP Block Tagger

Description

This function removes similar MSP blocks after aggregating them by the 'aggregateBy' variable ('Name' by default).

Usage

FSA_uniqueMSPblockTagger(path, MSPfile = "", aggregateBy = "Name",
massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75,
noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE,
number_processing_threads = 1)

Arguments

path

Address of .msp file or an FSDB

MSPfile

name of .msp file

aggregateBy

The MSP variable by which the MSP blocks are aggregated

massError

Mass accuracy in Da

RTtolerance

Retention time tolerance (min) to match msp blocks. Select NA to ignore retention time match.

minEntropySimilarity

Minimum entropy similarity score

noiseRemovalRatio

noise removal ratio relative to the basepeak to measure entropy similarity score (in percent)

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

plotSpectra

c(TRUE, FALSE)

number_processing_threads

Number of processing threads for multi-threaded processing

Value

A list of similar MSP blocks is returned, and the subsetted .msp and FSDB files are saved in the 'path' directory.

FSA_uniqueMSPblockTaggerUntargeted

Description

FSA_uniqueMSPblockTaggerUntargeted

Usage

FSA_uniqueMSPblockTaggerUntargeted(path, MSPfile_vector,
minCSAdetectionFrequency = 20, minEntropySimilarity = 0.75, massError = 0.01,
massErrorPrecursor = 0.01, RTtolerance = 0.1, noiseRemovalRatio = 0.01,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
plotSpectra = FALSE, number_processing_threads = 1)

Arguments

path

Address of .msp file(s)

MSPfile_vector

A vector of names of .msp files or one .msp file name.

minCSAdetectionFrequency

minimum CSA detection frequency

minEntropySimilarity

Minimum entropy similarity score

massError

Mass accuracy in Da

massErrorPrecursor

Mass accuracy (Da) to find precursor m/z in .msp files

RTtolerance

Retention time tolerance (min)

noiseRemovalRatio

Noise removal ratio (in the range [0, 1]) relative to the base peak, applied before measuring the entropy similarity score.

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation.

plotSpectra

c(TRUE, FALSE)

number_processing_threads

Number of processing threads for multi-threaded processing

Value

uniqueMSPvariants

FSA workflow

Description

This function executes the FSA workflow.

Usage

FSA_workflow(spreadsheet)

Arguments

spreadsheet

FSA spreadsheet

Value

This function organizes the FSA file processing for better performance using the template spreadsheet.

FSA xlsx Analyzer

Description

This function processes the spreadsheet of the FSA parameters to ensure the parameter inputs are consistent with the requirements of the IDSL.FSA pipeline.

Usage

FSA_xlsxAnalyzer(spreadsheet)

Arguments

spreadsheet

FSA spreadsheet

Value

This function returns the FSA parameters to feed the FSA_workflow function.

FSdb file generator

Description

This function generates FSDB objects

Usage

FSdb_file_generator(PARAM_FSdb, output_path = NULL)

Arguments

PARAM_FSdb

'PARAM_FSdb' parameters obtained by the 'FSA_FSdb_xlsxAnalyzer' function.

output_path

output_path

Value

An FSDB object

FSdb subsetter

Description

FSdb subsetter

Usage

FSdb_subsetter(FSdb, inclusionIDs = NULL, exclusionIDs = NULL)

Arguments

FSdb

FSdb

inclusionIDs

inclusionIDs

exclusionIDs

exclusionIDs

Value

subsetted FSdb

Fragmentation Spectra DataBase (FSDB) to MSP

Description

This function converts FSDB R objects into .msp standard files.

Usage

FSdb2msp(path, FSdbFileName = "", UnweightMSP = FALSE,
number_processing_threads = 1)

Arguments

path

address of .msp file(s)

FSdbFileName

Name of the FSDB library file, including the '.Rdata' extension

UnweightMSP

to unweight fragmentation patterns

number_processing_threads

Number of processing threads for multi-threaded processing

Value

The .msp file is stored in the same folder

FSdb2PeakXcolSubsetter

Description

FSdb2PeakXcolSubsetter

Usage

FSdb2PeakXcolSubsetter(FSdb_address, peak_alignment_folder,
metavariable = "idsl.ipa_collective_peakids", number_processing_threads = 1)

Arguments

FSdb_address

FSdb_address

peak_alignment_folder

peak_alignment_folder

metavariable

metavariable

number_processing_threads

Number of processing threads for multi-threaded processing

Value

peakXcol

peakXcol

peak_height

peak_height

peak_area

peak_area

peak_R13C

peak_R13C

Precursor Types from Fragmentation Spectra DataBase (FSDB)

Description

This function finds potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. It searches only the first 14 InChIKey letters and may therefore return multiple potential precursor types.

Usage

FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency",
number_processing_threads = 1)

Arguments

InChIKeyVector

A vector of InChIKey values. The vector may contain whole InChIKey strings or only their first 14 letters.

libFSdb

An FSDB object produced by the IDSL.FSA package, that is, an MSP library reference file converted using the 'msp2FSdb' module.

tableIndicator

c("Frequency", "PrecursorMZ"). To show frequency or a median of 'PrecursorMZ' values in the output dataframe for each precursor type.

number_processing_threads

Number of processing threads for multi-threaded processing

Value

A matrix of frequency for each InChIKey in the FSDB. The matrix column headers represent precursor types.

Examples

address_input_msp <- system.file("extdata", package = "IDSL.FSA")
MSPfile_vector <- c("Kynurenine_Kynurenic_acid.msp")
libFSdb <- msp2FSdb(path = address_input_msp, MSPfile_vector)
##
InChIKeyVector <- c("HCZHHEIFKROPDY-UHFFFAOYSA-N", "YGPSJZOEDVAXAB-QMMMGPOBSA-N")
precursor_type_table <- FSdb2precursorType(InChIKeyVector, libFSdb,
tableIndicator = "Frequency", number_processing_threads = 1)
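
The matching on the first 14 InChIKey letters (the connectivity block of an InChIKey) can be sketched with base R 'substr' and 'table'. The library rows and InChIKey strings below are made-up placeholders, and the data frame layout is an assumption rather than the structure of an FSDB object.

```r
# Hypothetical sketch with made-up library rows (not data from the package).
library_sketch <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
               "AAAAAAAAAAAAAA-QMMMGPOBSA-N", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),
  PrecursorType = c("[M+H]+", "[M+H]+", "[M-H]-", "[M+Na]+")
)
# Compare only the first 14 letters of each InChIKey
library_sketch$key14 <- substr(library_sketch$InChIKey, 1, 14)
query14 <- substr(c("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB"), 1, 14)
hits <- library_sketch[library_sketch$key14 %in% query14, ]
# Rows: 14-letter keys; columns: precursor types; cells: frequency
frequencyTable <- table(hits$key14, hits$PrecursorType)
```

Because the stereochemistry block is ignored, entries that differ only after the first hyphen are counted under the same row, which is why several precursor types can appear for one query.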

MGF to MSP

Description

This function converts .mgf (Mascot generic format) files into the .msp (mass spectra) format.

Usage

mgf2msp(path, MGFfile = "")

Arguments

path

address of the .mgf file.

MGFfile

name of the file with the .mgf extension.

Value

The .msp files are saved in the same location.

Examples


temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
MGFfile <- "Training_000.mgf"
file.copy(from = paste0(path_extdata, "/", MGFfile), to = temp_wd)
mgf2msp(path = temp_wd, MGFfile)
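
The conversion idea can be sketched for a single ion block in base R. The field mapping used here (TITLE to Name, PEPMASS to PrecursorMZ) is an assumption for illustration and may not match the headers written by 'mgf2msp'; the block content is made up.

```r
# Hypothetical sketch for a single MGF ion block (not the package source).
mgfBlock <- c("BEGIN IONS", "TITLE=example_spectrum", "PEPMASS=209.0921",
              "CHARGE=1+", "94.0651 5", "146.0600 100", "END IONS")
body <- mgfBlock[!mgfBlock %in% c("BEGIN IONS", "END IONS")]
isHeader <- grepl("=", body, fixed = TRUE)   # header lines are KEY=value
headers <- body[isHeader]
key <- sub("=.*", "", headers)
value <- sub("^[^=]*=", "", headers)
peaks <- body[!isHeader]                     # remaining lines are m/z intensity
mspBlock <- c(paste0("Name: ", value[key == "TITLE"]),
              paste0("PrecursorMZ: ", value[key == "PEPMASS"]),
              paste0("Num Peaks: ", length(peaks)),
              peaks, "")                     # a blank line closes the MSP block
```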

msp to Fragmentation Spectra DataBase (FSDB)

Description

This function converts .msp (mass spectra format) files into a readable R object.

Usage

msp2FSdb(path, MSPfile_vector = "", massIntegrationWindow = 0,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01, number_processing_threads = 1)

Arguments

path

Address of .msp file(s)

MSPfile_vector

A vector of names of .msp files or one .msp file name.

massIntegrationWindow

Mass window in Da to integrate adjacent peaks in the fragmentation spectra

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation.

noiseRemovalRatio

Noise removal ratio (in the range [0, 1]) relative to the base peak, applied before measuring the entropy similarity score.

number_processing_threads

Number of processing threads for multi-threaded processing

Value

logFSdb

Parameters used to create the FSDB object

PrecursorMZ

A vector of precursor m/z values

Precursor Type

A vector of precursor adduct types

Retention Time

A vector of retention time values

Num Peaks

A vector of 'Num Peaks' values indicating the number of ions in each fragmentation spectrum

Spectral Entropy

A vector of spectral entropy values

FragmentList

A list of fragment ions

MSPLibraryParameters

A dataframe of tabulated headers and their values for each msp block

Note

This function was designed not only to achieve the fastest computational speed but also to standardize .msp files that were generated with inconsistent settings.

References

Li, Y., Kind, T., Folz, J., Vaniya, A., Mehta, S.S. and Fiehn, O. (2021). Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nature methods, 18(12), 1524-1531, doi:10.1038/s41592-021-01331-z

Examples

path_extdata <- system.file("extdata", package = "IDSL.FSA")
MSPfile <- c("Kynurenine_Kynurenic_acid.msp")
sampleFSdb <- msp2FSdb(path = path_extdata, MSPfile)
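
The spectral entropy stored for each msp block can be illustrated with a short base R sketch that follows the definition in the reference above (Shannon entropy of normalized intensities, with the weight transformation applied when the entropy is below 3). The helper name 'spectral_entropy_sketch' is hypothetical, and the package implementation may differ in its details.

```r
# Hypothetical sketch based on Li et al. (2021); not the package source.
spectral_entropy_sketch <- function(intensity, weighted = TRUE,
                                    noiseRemovalRatio = 0.01) {
  # Drop peaks below the noise threshold relative to the base peak
  intensity <- intensity[intensity > 0 &
                           intensity >= noiseRemovalRatio * max(intensity)]
  p <- intensity / sum(intensity)
  S <- -sum(p * log(p))
  if (weighted && S < 3) {
    w <- 0.25 + 0.25 * S      # weight grows with the unweighted entropy
    p <- p^w / sum(p^w)       # lifts low abundant signals
    S <- -sum(p * log(p))
  }
  S
}

S_uniform <- spectral_entropy_sketch(c(1, 1, 1, 1), weighted = FALSE)  # log(4)
S_single <- spectral_entropy_sketch(c(100, 0.5))   # 0.5 is below 1% of 100
S_weighted <- spectral_entropy_sketch(c(100, 10))
S_plain <- spectral_entropy_sketch(c(100, 10), weighted = FALSE)
```

The weighted value is larger than the unweighted one for the two-peak spectrum because the transformation raises the relative contribution of the minor peak.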

msp to Training Matrix

Description

This function creates an aligned table from the spectra in the .msp file

Usage

msp2TrainingMatrix(path, MSPfile = "", minDetectionFreq = 1,
selectedFSdbIDs = NULL, dimension = "wide", massAccuracy = 0.01,
allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01, number_processing_threads = 1)

Arguments

path

Address of .msp file or an FSDB

MSPfile

A .msp file name or FSDB in .Rdata format

minDetectionFreq

A minimum detection frequency for an ion across the entire spectra

selectedFSdbIDs

selected MSP block/FSDB IDs to limit the screening to specific ion blocks

dimension

c("wide", "long"). *wide* or *long* alignment matrix output

massAccuracy

A mass accuracy (Da)

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation.

noiseRemovalRatio

Noise removal ratio (in the range [0, 1]) relative to the base peak, applied before measuring the entropy similarity score.

number_processing_threads

Number of processing threads for multi-threaded processing

Value

An FSDB file (.Rdata) and an aligned spectra table (.csv) are stored in the same directory.

Examples


temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
MSPfile <- "Kynurenine_Kynurenic_acid.msp"
file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
msp2TrainingMatrix(path = temp_wd, MSPfile, minDetectionFreq = 1)

MSP Pos/Neg Splitter

Description

This function separates the positive and negative MSP blocks.

Usage

mspPosNegSplitter(path, MSPfile = "", number_processing_threads = 1)

Arguments

path

address of the .msp file.

MSPfile

name of the file with the .msp extension.

number_processing_threads

Number of processing threads for multi-threaded processing

Value

The .msp files are saved in the same location with '_Neg.msp' and '_Pos.msp' extensions.

Examples


temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
MSPfile <- "Kynurenine_Kynurenic_acid.msp"
file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
mspPosNegSplitter(temp_wd, MSPfile)
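
A polarity split of MSP blocks can be sketched in base R. This sketch assumes that blocks are separated by blank lines and that polarity is read from the trailing charge sign of a 'Precursor_type' field; the criterion actually used by 'mspPosNegSplitter' is not stated here and may differ. The block content is made up.

```r
# Hypothetical sketch (not the package source): classify MSP blocks by the
# trailing charge sign of their 'Precursor_type' field.
mspLines <- c("Name: compound A", "Precursor_type: [M+H]+", "Num Peaks: 1",
              "146.0600 100", "",
              "Name: compound B", "Precursor_type: [M-H]-", "Num Peaks: 1",
              "144.0455 100", "")
isBlank <- mspLines == ""
blockID <- cumsum(isBlank) - isBlank + 1   # a blank line ends its own block
blocks <- split(mspLines, blockID)
polarity <- vapply(blocks, function(block) {
  precursorType <- grep("^Precursor_type:", block, value = TRUE)
  if (length(precursorType) == 0) {
    return(NA_character_)
  }
  if (endsWith(precursorType[1], "+")) {
    "Pos"
  } else if (endsWith(precursorType[1], "-")) {
    "Neg"
  } else {
    NA_character_
  }
}, character(1))
posBlocks <- unlist(blocks[which(polarity == "Pos")], use.names = FALSE)
negBlocks <- unlist(blocks[which(polarity == "Neg")], use.names = FALSE)
```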

plot spectra from FSdb core

Description

This function plots spectra figures from FSdb objects generated using the 'msp2FSdb' function.

Usage

plotFSdb2SpectraCore(FSdb, index)

Arguments

FSdb

FSdb

index

index

Value

spectra_figure object

Examples


## To create the FSdb object
temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
MSPfile <- c("Kynurenine_Kynurenic_acid.msp")
file.copy(from = paste0(path_extdata, "/", MSPfile), to = temp_wd)
FSdb <- msp2FSdb(path = temp_wd, MSPfile)
## To plot spectra
index <- 1
plotFSdb2SpectraCore(FSdb, index)

Mixer 1:1 spectra A and B

Description

This function creates 1:1 mixed AB spectra for spectral entropy calculation

Usage

spectra_1A1B_mixer(PEAK_A, PEAK_B, massError = 0, allowedNominalMass = FALSE)

Arguments

PEAK_A

A matrix (m/z, int) of fragmentation spectra

PEAK_B

A matrix (m/z, int) of fragmentation spectra

massError

Mass accuracy in Da

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

Value

A matrix of 1:1 mixing spectra. First and second columns represent intensity-weighted average m/z and cumulated intensity, respectively.
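
The role of the 1:1 mixed spectrum in the entropy similarity score can be sketched in base R, following the unweighted definition in Li et al. (2021): similarity = 1 - (2 S_AB - S_A - S_B) / ln(4). To keep the sketch short, peaks are merged on nominal (rounded) m/z rather than within 'massError'; all function names are hypothetical and the spectra are made up.

```r
# Hypothetical sketch based on Li et al. (2021); not the package source.
# Peaks are merged on nominal (rounded) m/z purely to keep the example short.
entropy_sketch <- function(intensity) {
  p <- intensity / sum(intensity)
  -sum(p * log(p))
}
mix_1A1B_sketch <- function(peakA, peakB) {
  # Normalise each spectrum to unit total intensity so the mix is 1:1
  stacked <- rbind(cbind(peakA[, 1], peakA[, 2] / sum(peakA[, 2])),
                   cbind(peakB[, 1], peakB[, 2] / sum(peakB[, 2])))
  nominal <- round(stacked[, 1])
  intensity <- tapply(stacked[, 2], nominal, sum)
  # Intensity-weighted average m/z of the merged peaks
  mz <- tapply(stacked[, 1] * stacked[, 2], nominal, sum) / intensity
  cbind(mz = as.numeric(mz), int = as.numeric(intensity))
}
entropy_similarity_sketch <- function(peakA, peakB) {
  peakAB <- mix_1A1B_sketch(peakA, peakB)
  1 - (2 * entropy_sketch(peakAB[, 2]) - entropy_sketch(peakA[, 2]) -
         entropy_sketch(peakB[, 2])) / log(4)
}

A <- cbind(c(100.01, 200.02), c(50, 50))
B <- cbind(c(300.03, 400.04), c(50, 50))
similarity_AA <- entropy_similarity_sketch(A, A)   # identical spectra
similarity_AB <- entropy_similarity_sketch(A, B)   # no shared peaks
```

Identical spectra leave the entropy of the mix unchanged and score 1, while spectra without shared peaks raise it by ln(2) and score 0.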

Spectra Integrator

Description

This function integrates individual m/z peaks from multiple chromatogram scans (spectra) into summed m/z peaks using a mass accuracy or nominal masses.

Usage

spectra_integrator(stackedSpectra, massError = 0, allowedNominalMass = FALSE)

Arguments

stackedSpectra

A matrix of two columns of the stacked spectra. First and second columns should represent m/z and intensity, respectively.

massError

Mass accuracy in Da

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

Value

A matrix of integrated spectra. First and second columns represent intensity-weighted average m/z and cumulated intensity, respectively.

Examples

data(stackedSpectra)
massError <- 0.005 # Da
Integrated_spectra <- spectra_integrator(stackedSpectra[, 1:2], massError)
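
The integration step can be approximated in base R as follows. This sketch assumes a simple grouping rule, in which sorted peaks join the same group while the gap to the previous peak stays within 'massError'; the grouping rule of the package may differ, 'integrate_sketch' is a hypothetical name, and the peak values are made up.

```r
# Hypothetical sketch (not the package source): peaks are sorted by m/z and a
# new group starts whenever the gap to the previous peak exceeds 'massError'.
integrate_sketch <- function(stackedSpectra, massError = 0,
                             allowedNominalMass = FALSE) {
  s <- stackedSpectra[order(stackedSpectra[, 1]), , drop = FALSE]
  if (allowedNominalMass) {
    group <- round(s[, 1])
  } else {
    group <- cumsum(c(TRUE, diff(s[, 1]) > massError))
  }
  intensity <- tapply(s[, 2], group, sum)
  # Intensity-weighted average m/z of each group
  mz <- tapply(s[, 1] * s[, 2], group, sum) / intensity
  cbind(mz = as.numeric(mz), int = as.numeric(intensity))
}

# Three scans stacked on top of each other (made-up values)
stacked <- cbind(c(146.0600, 146.0620, 94.0651, 146.0580, 94.0655),
                 c(100, 300, 10, 100, 30))
integrated <- integrate_sketch(stacked, massError = 0.005)
```

The five stacked peaks collapse into two integrated peaks, each reported with its intensity-weighted average m/z and summed intensity.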

Spectra Ion Filter

Description

This function can detect m/z peaks that are related to each other across selected spectra lists.

Usage

spectra_ion_filter(spectraList, indexSpectraList = length(spectraList), massError,
minPercentageDetectedScans = 10, rsdCutoff = 0, pearsonRHOthreshold = NA)

Arguments

spectraList

a list of matrices of m/z and intensity values for each chromatogram scan

indexSpectraList

a vector of spectra indices for the analysis. This vector should have at least 3 elements to run this function.

massError

required mass error for m/z values

rsdCutoff

Relative standard deviations (in percent) to remove constant peaks (usually noisy peaks)

minPercentageDetectedScans

Minimum percentage of detected scans for an m/z peak

pearsonRHOthreshold

A threshold for pairwise Pearson's correlation coefficient across the selected spectra lists. This feature is recommended to find co-occurring peaks within a chromatographic peak. This feature may be used to eliminate instrument noises from MS2 data channels within an MS1 chromatographic peak for DDA analysis.

Value

A matrix of m/z and cumulated intensities across the 'indexSpectraList' spectra
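
To make the roles of 'minPercentageDetectedScans' and 'rsdCutoff' concrete, the following base R sketch applies both filters to a toy list of scans. It is an assumption-laden illustration rather than the package code: m/z values are matched exactly instead of within 'massError', peaks with an RSD at or below the cutoff are assumed to be the ones removed, and the Pearson correlation step is not reproduced.

```r
# Toy scans: each matrix holds (m/z, intensity) for one chromatogram scan
spectraList <- list(
  cbind(c(100, 200, 300), c(10, 50, 7)),
  cbind(c(100, 200), c(20, 50)),
  cbind(c(100, 200), c(40, 50)),
  cbind(c(100, 200), c(20, 50))
)
minPercentageDetectedScans <- 75
rsdCutoff <- 5 # percent

# Stack all scans; exact m/z matching keeps the toy example short (assumption)
stacked <- do.call(rbind, spectraList)
mzKey <- factor(stacked[, 1])
nScans <- length(spectraList)

# Percentage of scans in which each m/z was detected
percentDetected <- as.numeric(table(mzKey)) / nScans * 100
# Relative standard deviation (percent) of intensities across scans
rsd <- as.numeric(tapply(stacked[, 2], mzKey, function(x) sd(x) / mean(x) * 100))
sumInt <- as.numeric(tapply(stacked[, 2], mzKey, sum))
mz <- as.numeric(levels(mzKey))

# Keep peaks seen in enough scans whose intensity is not constant
keep <- percentDetected >= minPercentageDetectedScans & !is.na(rsd) & rsd > rsdCutoff
filtered <- cbind(mz = mz[keep], int = sumInt[keep])
print(filtered)
```

In this toy case only m/z 100 survives: m/z 200 has constant intensity and m/z 300 appears in a single scan.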

Spectral Entropy Calculator

Description

This module calculates spectral entropy for a fragmentation pattern using a method described by the reference paper.

Usage

spectral_entropy_calculator(FragmentList, allowedWeightedSpectralEntropy = TRUE,
noiseRemovalRatio = 0.01)

Arguments

FragmentList

A matrix (m/z, int) of fragmentation pattern after intensity adjustment

allowedWeightedSpectralEntropy

c(TRUE, FALSE). Weighted entropy to transform low abundant signals prior to calculating entropy similarity score. Please see the reference for details on weight transformation.

noiseRemovalRatio

Noise removal ratio (between 0 and 1) relative to the base peak, used when measuring the entropy similarity score.

Value

spectralEntropy

spectral entropy

NumPeaks

number of peaks in the fragmentation pattern

FragmentList

A two-column matrix after intensity normalization relative to the sum of intensities and, when selected, the entropy weight transformation.

Note

Noise removal on intensities should be performed before the fragmentation pattern is passed to this function.

References

Examples

FragmentList <- cbind(seq(50, 600, length.out = 10), seq(10, 90, length.out = 10))
SE <- spectral_entropy_calculator(FragmentList)
print(SE[[1]])
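
For readers who want to see the quantity being computed, here is a minimal base R sketch of spectral entropy as the Shannon entropy of normalized intensities. 'spectral_entropy_sketch' is a hypothetical helper, not the package function. The weighting step assumes the commonly described rule of raising intensities to the power w = 0.25 + 0.25 * S when S < 3; the transformation used by the package may differ.

```r
# Hypothetical helper for illustration only; not part of IDSL.FSA.
spectral_entropy_sketch <- function(FragmentList, weighted = TRUE) {
  # Drop zero intensities so that log() is defined
  FragmentList <- FragmentList[FragmentList[, 2] > 0, , drop = FALSE]
  # Normalize intensities to sum to 1
  p <- FragmentList[, 2] / sum(FragmentList[, 2])
  # Shannon entropy with the natural logarithm
  S <- -sum(p * log(p))
  if (weighted && S < 3) {
    # Assumed weight transformation for low-entropy spectra
    w <- 0.25 + 0.25 * S
    p <- p^w / sum(p^w)
    S <- -sum(p * log(p))
  }
  list(spectralEntropy = S, NumPeaks = length(p),
       FragmentList = cbind(FragmentList[, 1], p))
}

# Four equally intense peaks: the entropy equals log(4) with or without weighting
uniform <- cbind(c(50, 100, 150, 200), rep(25, 4))
SE_sketch <- spectral_entropy_sketch(uniform)
print(SE_sketch$spectralEntropy)
```

A spectrum with a single peak has an entropy of 0, and the entropy grows as intensity is spread more evenly over more peaks.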

Spectral Entropy Similarity Score

Description

This module measures similarity of spectral entropies between 'PEAK_A' and 'PEAK_B' fragment spectra using a method described by the reference paper.

Usage

spectral_entropy_similarity_score(PEAK_A, S_PEAK_A, PEAK_B, S_PEAK_B, massError,
allowedNominalMass = FALSE)

Arguments

PEAK_A

A matrix (m/z, int) of fragmentation spectra

S_PEAK_A

Spectral entropy of PEAK_A

PEAK_B

A matrix (m/z, int) of fragmentation spectra

S_PEAK_B

Spectral entropy of PEAK_B

massError

Mass accuracy in Da

allowedNominalMass

c(TRUE, FALSE). Select 'TRUE' only for nominal mass analysis.

Value

Spectral entropy similarity score between 0 and 1.

References

Examples

allowedWeightedSpectralEntropy <- TRUE
## Spectrum A: normalized fragment list and its spectral entropy
A <- cbind(seq(50, 160, length.out = 10), seq(10, 90, length.out = 10))
sA <- spectral_entropy_calculator(A, allowedWeightedSpectralEntropy)
S_PEAK_A <- sA[[1]]
PEAK_A <- sA[[3]]
## Spectrum B: computed from B rather than A
B <- cbind(seq(50, 160, length.out = 10), seq(50, 60, length.out = 10))
sB <- spectral_entropy_calculator(B, allowedWeightedSpectralEntropy)
S_PEAK_B <- sB[[1]]
PEAK_B <- sB[[3]]
## Name the last two arguments so that 'massError' is not filled positionally
massError <- 0
allowedNominalMass <- TRUE
entropyScore <- spectral_entropy_similarity_score(PEAK_A, S_PEAK_A, PEAK_B,
S_PEAK_B, massError = massError, allowedNominalMass = allowedNominalMass)
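
The score can be understood through a short base R sketch. It assumes the usual definition of entropy similarity, 1 - (2 * S_AB - S_A - S_B) / log(4), where S_AB is the entropy of the 1:1 mixture of the two normalized spectra. The helpers 'entropy_sketch' and 'entropy_similarity_sketch' are hypothetical, peaks are matched on nominal (rounded) m/z only, and no weight transformation is applied, so the numbers need not agree with the package output.

```r
# Hypothetical helpers for illustration only; not part of IDSL.FSA.
entropy_sketch <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

entropy_similarity_sketch <- function(PEAK_A, PEAK_B) {
  # Normalize each spectrum so that its intensities sum to 1
  pA <- PEAK_A[, 2] / sum(PEAK_A[, 2])
  pB <- PEAK_B[, 2] / sum(PEAK_B[, 2])
  # 1:1 mixture: halve both spectra and sum intensities sharing a nominal m/z
  # (assumes nominal m/z values are unique within each spectrum)
  mzAB <- c(round(PEAK_A[, 1]), round(PEAK_B[, 1]))
  pAB <- tapply(c(pA, pB) / 2, mzAB, sum)
  1 - (2 * entropy_sketch(pAB) - entropy_sketch(pA) - entropy_sketch(pB)) / log(4)
}

specA <- cbind(c(100, 150, 200), c(10, 60, 30))
specB <- cbind(c(300, 350), c(40, 60))
scoreSame <- entropy_similarity_sketch(specA, specA)     # identical spectra
scoreDisjoint <- entropy_similarity_sketch(specA, specB) # no shared peaks
print(c(scoreSame, scoreDisjoint))
```

Identical spectra score 1 because mixing changes nothing, while spectra without shared peaks score 0 because mixing raises the entropy by exactly log(2).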

Example for a stacked spectra

Description

A dataset to test the 'spectra_integrator' function.

Usage

data("stackedSpectra")

Format

mz: a numeric vector of m/z values
int: a numeric vector of intensities
scan_number: a numeric vector of chromatogram scan numbers

Details

The 'scan_number' column is not necessary to test the 'spectra_integrator' function.

Examples

data(stackedSpectra)

Element Sorter

Description

This function sorts 84 elements in the periodic table for molecular formula deconvolution.

Usage

UFSA_element_sorter()

Value

A string vector of elements

Examples

Elements <- UFSA_element_sorter()

Molecular Formula Vector Generator

Description

This function converts a molecular formula into a numerical vector.

Usage

UFSA_formula_vector_generator(molecular_formula, Elements, LElements = length(Elements),
allowedRedundantElements = FALSE)

Arguments

molecular_formula

molecular formula

Elements

a string vector of elements. This value must be derived from the 'UFSA_element_sorter' function.

LElements

number of elements. To speed up loop calculations, consider calculating the number of elements outside of the loop.

allowedRedundantElements

'TRUE' should be used to deconvolute molecular formulas with redundant elements (e.g. CO2CH3O), and 'FALSE' (the default) should be used to skip such complex molecular formulas.

Value

a numerical vector for the molecular formula. This function returns a vector of -Inf values when the molecular formula has elements not listed in the 'Elements' string vector.

Examples

molecular_formula <- "C12H2Br5Cl3O"
Elements <- UFSA_element_sorter()
mol_vec <- UFSA_formula_vector_generator(molecular_formula, Elements)
##
## 'MolVecMat' comes first in the documented usage; rbind() gives a one-row matrix
regenerated_molecular_formula <- UFSA_hill_molecular_formula_printer(rbind(mol_vec), Elements)
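
The conversion from a formula string to a count vector can be sketched in base R as follows. 'formula_vector_sketch' is a hypothetical helper and not the package implementation. It assumes a short element list for readability, sums repeated elements (similar in spirit to 'allowedRedundantElements = TRUE'), and returns -Inf values for unknown elements as described under Value.

```r
# Hypothetical helper for illustration only; not part of IDSL.FSA.
formula_vector_sketch <- function(molecular_formula, Elements) {
  # Tokens: an uppercase letter, an optional lowercase letter, an optional count
  tokens <- regmatches(molecular_formula,
                       gregexpr("[A-Z][a-z]?[0-9]*", molecular_formula))[[1]]
  symbols <- sub("[0-9]*$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1" # a missing count means one atom
  counts <- as.numeric(counts)
  # Unknown elements yield a vector of -Inf values
  if (!all(symbols %in% Elements)) {
    return(rep(-Inf, length(Elements)))
  }
  vec <- numeric(length(Elements))
  for (i in seq_along(symbols)) {
    j <- match(symbols[i], Elements)
    vec[j] <- vec[j] + counts[i] # repeated elements are summed
  }
  vec
}

# Assumed short element list; the package uses the output of UFSA_element_sorter()
toyElements <- c("C", "H", "O", "N", "Br", "Cl")
toyVec <- formula_vector_sketch("C12H2Br5Cl3O", toyElements)
print(toyVec)
```

Each position of the result holds the atom count of the element at the same position in the element vector, which is why the same 'Elements' vector must be used when converting back to a formula.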

Print Hill Molecular Formula

Description

This function produces molecular formulas in the Hill notation system from a matrix of numerical vectors.

Usage

UFSA_hill_molecular_formula_printer(MolVecMat, Elements, LElements = length(Elements))

Arguments

MolVecMat

A matrix of numerical vectors of molecular formulas in each row.

Elements

A string vector of the elements used.

LElements

Number of elements.

Value

A vector of molecular formulas

Examples

Elements <- c("C", "H", "O", "N", "Br", "Cl")
MoleFormVec1 <- c(2, 6, 1, 0, 0, 0) # C2H6O
MoleFormVec2 <- c(8, 10, 2, 4, 0 ,0) # C8H10N4O2
MoleFormVec3 <- c(12, 2, 1, 0, 5, 3) # C12H2Br5Cl3O
MolVecMat <- rbind(MoleFormVec1, MoleFormVec2, MoleFormVec3)
H_MolF <- UFSA_hill_molecular_formula_printer(MolVecMat, Elements)
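
The Hill ordering itself is simple to sketch in base R: when carbon is present, carbon comes first, hydrogen second, and the remaining elements follow alphabetically; without carbon, all elements are alphabetical. 'hill_printer_sketch' is a hypothetical helper written for illustration and is not the package implementation.

```r
# Hypothetical helper for illustration only; not part of IDSL.FSA.
hill_printer_sketch <- function(MolVecMat, Elements) {
  # Accept a single vector or a matrix with one formula per row
  MolVecMat <- matrix(MolVecMat, ncol = length(Elements))
  apply(MolVecMat, 1, function(v) {
    present <- Elements[v > 0]
    if ("C" %in% present) {
      # Carbon first, hydrogen second, the rest alphabetically
      ordered <- c("C", intersect("H", present),
                   sort(setdiff(present, c("C", "H")), method = "radix"))
    } else {
      # Without carbon, every element is sorted alphabetically
      ordered <- sort(present, method = "radix")
    }
    n <- v[match(ordered, Elements)]
    # A count of 1 is not printed
    paste0(ordered, ifelse(n == 1, "", n), collapse = "")
  })
}

toyElements <- c("C", "H", "O", "N", "Br", "Cl")
toyMat <- rbind(c(2, 6, 1, 0, 0, 0),
                c(8, 10, 2, 4, 0, 0),
                c(12, 2, 1, 0, 5, 3),
                c(0, 2, 1, 0, 0, 0))
hillFormulas <- hill_printer_sketch(toyMat, toyElements)
print(hillFormulas)
```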

Ionization Pathway Deconvoluter

Description

This function deconvolutes ionization pathways into a coefficient and a numerical vector to simplify the prediction of ionization pathways.

Usage

UFSA_ionization_pathway_deconvoluter(IonPathways, Elements, LElements = length(Elements))

Arguments

IonPathways

A vector of ionization pathways. Pathways should follow the pattern [Coeff*M+ADD1-DED1+...], where "Coeff" is an integer between 1 and 9 and ADD1 and DED1 are the added and deducted terms, e.g. 'IonPathways <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+CO2-H2O+Na-KO2+HCl-NH4]-")'

Elements

A vector string of the used elements

LElements

Counts of elements

Value

A list of adduct calculation values for each ionization pathway.

Examples

Elements <- UFSA_element_sorter()
IonPathways <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+CO2-H2O+Na-KO2+HCl-NH4]-")
Ion_DC <- UFSA_ionization_pathway_deconvoluter(IonPathways, Elements)
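
As a rough illustration of the deconvolution, the base R sketch below splits one pathway string into its coefficient and its added and removed terms. 'parse_ion_pathway_sketch' is a hypothetical helper, not the package function; turning the terms into a net numerical vector over 'Elements' is left out, and the structure of the list returned by the package is not reproduced here.

```r
# Hypothetical helper for illustration only; not part of IDSL.FSA.
parse_ion_pathway_sketch <- function(IonPathway) {
  # Text inside the square brackets, e.g. "3M+CO2-H2O"
  inner <- sub("^\\[(.*)\\].*$", "\\1", IonPathway)
  # Optional single-digit coefficient in front of M
  coeff <- sub("^([0-9]?)M.*$", "\\1", inner)
  coeff <- if (coeff == "") 1 else as.numeric(coeff)
  # Everything after M is a sequence of signed terms
  rest <- sub("^[0-9]?M", "", inner)
  terms <- regmatches(rest, gregexpr("[+-][A-Za-z0-9]+", rest))[[1]]
  list(coeff = coeff,
       added = sub("^\\+", "", terms[startsWith(terms, "+")]),
       removed = sub("^-", "", terms[startsWith(terms, "-")]))
}

parsed <- parse_ion_pathway_sketch("[3M+CO2-H2O+Na-KO2+HCl-NH4]-")
str(parsed)
```

Subtracting the element counts of the removed terms from those of the added terms would give a net numerical vector for the pathway.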

UFA Precursor Type Corrector

Description

This function corrects precursor types read from MSP files. It first attempts to standardize the precursor types so that they are consistent with the 'ionization_pathway_deconvoluter' module of the IDSL.SUFA package.

Usage

UFSA_precursorType_corrector(precursorType, ionMode = NULL)

Arguments

precursorType

precursorType

ionMode

ionMode

Value

correctedPrecursorType

Examples

uncorrectedPrecursorType <- c("[M]+", "[M+H]+", "[2M-Cl]-", "[3M+COO-H2O+Na-KO2+HCl-NH4]-")
precursorType <- UFSA_precursorType_corrector(uncorrectedPrecursorType, ionMode = NULL)

xlsx to MSP

Description

This function creates .msp files from an organized spreadsheet of fragmentation data.

Usage

xlsx2msp(path, xlsxFileName = "", number_processing_threads = 1)

Arguments

path

address of the spreadsheet

xlsxFileName

name of the file with the .xlsx extension.

number_processing_threads

Number of processing threads for multi-threaded processing

Value

The .msp files are saved in the same location.

Note

The spreadsheet should have only one column for the following headers (case-sensitive): c('ID', 'mz_fragment', 'int_fragment', 'Name')

Examples


temp_wd <- tempdir() # just a temporary folder
path_extdata <- system.file("extdata", package = "IDSL.FSA")
xlsxFileName <- "PFAS_MSe.xlsx"
file.copy(from = paste0(path_extdata, "/", xlsxFileName), to = temp_wd)
xlsx2msp(temp_wd, xlsxFileName)
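
To show what such a conversion produces, the base R sketch below writes a minimal .msp file from a long-format table with the four required headers. It does not read a spreadsheet and is not the package implementation: the table values are made up, only the 'Name' and 'Num Peaks' header lines are written, and any other headers that 'xlsx2msp' may export are not reproduced.

```r
# Hypothetical long-format table with toy values: one row per fragment ion
fragments <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Name = c("Compound A", "Compound A", "Compound A", "Compound B", "Compound B"),
  mz_fragment = c(94.03, 136.04, 209.09, 144.04, 190.05),
  int_fragment = c(100, 45, 12, 100, 30)
)

# One MSP block per ID: header lines, peak count, then "m/z intensity" rows
msp_blocks <- lapply(split(fragments, fragments$ID), function(block) {
  c(paste0("Name: ", block$Name[1]),
    paste0("Num Peaks: ", nrow(block)),
    paste(block$mz_fragment, block$int_fragment),
    "") # a blank line separates MSP blocks
})
msp_lines <- unlist(msp_blocks, use.names = FALSE)

# Write the blocks to a temporary file and show them
msp_file <- file.path(tempdir(), "sketch.msp")
writeLines(msp_lines, msp_file)
cat(msp_lines, sep = "\n")
```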

