
Parameters

philippa1812 edited this page Jul 9, 2024 · 18 revisions


Each workflow has parameters that can be adjusted in a configuration file or in the graphical user interface (GUI) of msiFlow. The data subfolder of every workflow directory contains an example config.yaml.

Important note: All workflows share the parameter data, which defines the path to the input files. This parameter is intended only for the local version of msiFlow. In the Docker version, the data path is specified in the command that launches the graphical user interface or in the command that executes a workflow via the command-line interface. Therefore, do not change this parameter in the Docker version!

IF Segmentation Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| threshold_algorithm | categorical | otsu | select a thresholding algorithm for segmentation from the following: otsu, yen, isodata, mean, minimum, triangle |
| gauss_sigma | numeric | 1 | sigma of Gaussian filter (increase for more smoothing) |
| min_size | numeric | 10 | min. object size in pixels in the segmentation |
| img_channels_to_segment | string | 1 | comma-separated list of image channels to segment in the TIF stacks |
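To illustrate what threshold_algorithm: otsu does, here is a minimal pure-Python sketch of Otsu's method on 8-bit intensities. It is not msiFlow's code, and msiFlow's own implementation may differ; it only shows how Otsu picks the threshold that maximises the between-class variance of the intensity histogram. Gaussian smoothing (gauss_sigma) and small-object removal (min_size) are left out.

```python
def otsu_threshold(values, levels=256):
    """Return the Otsu threshold t for integer intensities in [0, levels).

    Pixels with intensity > t are treated as foreground.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the background class (<= t)
    sum0 = 0  # intensity sum of the background class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def segment(values, threshold):
    """Binary mask: True where the intensity is above the threshold."""
    return [v > threshold for v in values]


# Usage: a bimodal "image" flattened to a list of pixel intensities
pixels = [10] * 50 + [12] * 30 + [200] * 40 + [205] * 20
t = otsu_threshold(pixels)
mask = segment(pixels, t)
```

Any threshold between the two intensity modes separates them equally well; this sketch returns the lowest such value.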

Molecular Heterogeneity Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| dot_size | numeric | 5 | dot size of scatter plots |
| metric | categorical | cosine | select a UMAP distance metric from the following: euclidean, chebyshev, cosine, correlation |
| n_neighbors | numeric | 3 | size of the local neighborhood UMAP will look at |
| min_dist | numeric | 0.0 | min. distance apart that points are allowed to be in the low-dim. UMAP representation |
| use_model | boolean | False | set to True to use a pre-trained UMAP model |
| min_samples | numeric | 30 | min. number of neighbors to a core point in HDBSCAN clustering |
| min_cluster_size | numeric | 500 | min. size of an HDBSCAN cluster |
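The metric parameter chooses how UMAP measures dissimilarity between pixel spectra. The sketch below writes out the four listed metrics in plain Python for two spectra of equal length, following their standard definitions (correlation distance is 1 minus the Pearson correlation). It is illustrative only and not msiFlow's or UMAP's code. Note that cosine and correlation distance ignore the overall intensity scale, so two spectra that differ only by a constant factor count as identical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    # Largest absolute difference in any single m/z channel
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cosine similarity; invariant to positive scaling of either spectrum
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def correlation(a, b):
    # 1 - Pearson correlation = cosine distance of the mean-centred vectors
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])


METRICS = {"euclidean": euclidean, "chebyshev": chebyshev,
           "cosine": cosine, "correlation": correlation}

# Usage: the same spectral profile at two different total intensities
spec1 = [1.0, 2.0, 3.0]
spec2 = [2.0, 4.0, 6.0]
for name, fn in METRICS.items():
    print(name, round(fn(spec1, spec2), 4))
```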

Molecular Signatures Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| model | categorical | XGBoost | select a classification model from the following: XGBoost, LGBoost, AdaBoost, CatBoost, GBoost, RandomForest |
| img_channels | string | '' | comma-separated list of image channels to classify in the TIF stacks (leave this string empty if the image channels are provided as individual TIF files) |
| multiclass | boolean | True | set to True to perform multi-class classification if you have multiple image channels/classes |
| class_balancing_method | categorical | weights | if your classes are unevenly distributed, select a method to tackle class imbalance from the following: smote, undersample, oversample, weights; otherwise select standard to not use any class imbalance method |
| num_top_feat | numeric | 10 | number of top features of the classification model to plot |
| save_ion_imgs | boolean | False | set to True to save the ion images of the top features |
| save_umap_imgs | boolean | True | set to True to save the UMAP images of the top features (requires umap_data.csv in the input folder) |
| n_folds | numeric | 0 | number of folds for stratified k-fold cross-validation (time-consuming); set to 0 to skip cross-validation |
| annotate | boolean | True | set to True to annotate molecules (requires annotation.csv in the input folder) |
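With class_balancing_method: weights, classes are reweighted instead of resampled. A common scheme, the "balanced" heuristic used for example by scikit-learn, gives each class the weight n_samples / (n_classes × n_in_class). The plain-Python sketch below computes these weights; whether msiFlow uses exactly this formula is an assumption.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * count(class)).

    Rare classes get weights > 1 and frequent classes weights < 1, so every
    class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def sample_weights(labels):
    """Per-pixel weights derived from the class weights."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]


# Usage: 90 pixels of class "a" and 10 pixels of class "b"
labels = ["a"] * 90 + ["b"] * 10
print(balanced_class_weights(labels))
```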

MSI IF Registration Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| radius | numeric | 100 | radius of the rolling-ball background subtraction |
| sigma | numeric | 1 | sigma of Gaussian filter (increase for more smoothing) |
| lower_perc | percentage | 0.0 | lower percentile of intensity values, below which values are suppressed for contrast enhancement |
| upper_perc | percentage | 99.9 | upper percentile of intensity values, above which values are suppressed for contrast enhancement |
| af_chan | numeric | 1 | channel number of the autofluorescence image in the image stack (the stack is generated in alphabetical order) |
| mask_val_chan | numeric | 2 | slice number of the TIF stack containing the mask used to validate the registration result (requires a mask for MSI in the input folder) |
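lower_perc and upper_perc read as percentile-based contrast stretching: intensities below the lower percentile and above the upper percentile are clipped, and the remaining range is rescaled. The plain-Python sketch below assumes this interpretation and uses linear interpolation between ranks (NumPy's default percentile method). Rolling-ball background subtraction (radius) and Gaussian smoothing (sigma) are not shown, and msiFlow's actual code may differ.

```python
def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    rank = p * (len(s) - 1) / 100
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    frac = rank - lo
    return s[lo] + (s[hi] - s[lo]) * frac


def stretch_contrast(values, lower_perc=0.0, upper_perc=99.9):
    """Clip to [P_lower, P_upper] and rescale linearly to [0, 1]."""
    lo = percentile(values, lower_perc)
    hi = percentile(values, upper_perc)
    if hi == lo:
        return [0.0 for _ in values]  # flat image: nothing to stretch
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]


# Usage: one very bright pixel would otherwise compress all other values
img = list(range(100)) + [10000]
out = stretch_contrast(img, lower_perc=0.0, upper_perc=99.0)
print(out[0], out[50], out[-1])
```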

MSI Preprocessing Workflow

General

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| matrix_removal | boolean | True | set to True to apply matrix removal |
| peak_filtering | boolean | True | set to True to apply peak filtering |
| norm | boolean | True | set to True to apply normalisation |
| outlier_removal | boolean | False | set to True to apply outlier removal |
| deisotoping | boolean | True | set to True to apply deisotoping |

Smoothing and Peak Picking

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| snr | numeric | 3 | signal-to-noise threshold for peaks |
| smooth | binary | 1 | set to 1 to enable or to 0 to disable Savitzky-Golay smoothing |
| window | numeric | 11 | length of the Savitzky-Golay filter window |
| order | numeric | 3 | order of the Savitzky-Golay filter polynomial |
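With smooth: 1, each spectrum is smoothed with a Savitzky-Golay filter: a polynomial of degree order is least-squares fitted to the window points around each sample, and the fitted centre value replaces the sample. The pure-Python sketch below derives the filter weights for a given window and order and applies them to the interior of a spectrum. Leaving the edge points unchanged is a simplification of this sketch (SciPy's savgol_filter, for instance, offers several edge modes); it is not msiFlow's implementation.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def savgol_coeffs(window=11, order=3):
    """Weights that return the centre value of a least-squares polynomial fit."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than order")
    half = window // 2
    xs = range(-half, half + 1)
    k = order + 1
    # Normal-equation matrix A^T A of the Vandermonde matrix A: sum of x^(i+j)
    ata = [[float(sum(x ** (i + j) for x in xs)) for j in range(k)] for i in range(k)]
    # Row 0 of (A^T A)^-1 (the matrix is symmetric), then multiplied by A^T
    row0 = _solve(ata, [1.0] + [0.0] * order)
    return [sum(row0[j] * x ** j for j in range(k)) for x in xs]


def savgol_smooth(signal, window=11, order=3):
    """Smooth interior points; the first/last window//2 points are kept as-is."""
    c = savgol_coeffs(window, order)
    half = window // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(w * signal[i - half + j] for j, w in enumerate(c))
    return out


# Usage: an order-3 filter reproduces a cubic exactly at interior points
cubic = [0.01 * x ** 3 - x for x in range(30)]
smoothed = savgol_smooth(cubic, window=11, order=3)
```

A larger window smooths more strongly, while a higher order preserves narrow peaks better.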

Peak Alignment

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| num_pixel_percentage | percentage | 100 | percentage of pixels to consider for building the common m/z vector (decrease this value if you have low RAM) |
| mz_resolution | numeric | 0.005 | bin size in Da of the histogram used to build the common m/z vector |
| pixel_percentage | percentage | 3 | min. percentage of m/z to form the common m/z vector |
| max_shift | numeric | 0.01 | max. shift in Da by which peaks may be shifted |
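The peak alignment parameters suggest a histogram-based approach: peaks are binned with width mz_resolution, bins that occur in enough pixels form the common m/z vector, and each pixel's peaks are then moved to the nearest reference m/z if it lies within max_shift. The plain-Python sketch below implements that reading. The exact rule behind pixel_percentage is an assumption here (a bin is kept if it occurs in at least that percentage of pixels), pixel subsampling via num_pixel_percentage is omitted, and unmatched peaks are simply kept unchanged.

```python
import math
from bisect import bisect_left


def common_mz_vector(pixels, mz_resolution=0.005, pixel_percentage=3.0):
    """Bin peak m/z values and keep bins found in enough pixels.

    pixels: list of peak m/z lists, one list per pixel.
    Returns the sorted mean m/z of each retained bin.
    """
    bins = {}  # bin index -> (set of pixel ids, list of m/z values)
    for pid, mzs in enumerate(pixels):
        for mz in mzs:
            idx = math.floor(mz / mz_resolution)
            pix, vals = bins.setdefault(idx, (set(), []))
            pix.add(pid)
            vals.append(mz)
    min_pixels = pixel_percentage / 100 * len(pixels)
    return sorted(sum(vals) / len(vals)
                  for pix, vals in bins.values() if len(pix) >= min_pixels)


def align_peaks(mzs, reference, max_shift=0.01):
    """Move each peak to its nearest reference m/z if within max_shift (Da)."""
    aligned = []
    for mz in mzs:
        i = bisect_left(reference, mz)
        candidates = reference[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda r: abs(r - mz), default=None)
        if nearest is not None and abs(nearest - mz) <= max_shift:
            aligned.append(nearest)
        else:
            aligned.append(mz)  # no reference close enough: keep as-is
    return aligned


# Usage: the ~100.0015 peak occurs in all three pixels, the others only once
pixels = [[100.001, 200.0], [100.002, 250.0], [100.0015]]
ref = common_mz_vector(pixels, mz_resolution=0.005, pixel_percentage=50.0)
aligned = align_peaks([100.004, 150.0], ref, max_shift=0.01)
```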

Matrix Removal

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| clustering | boolean | True | set to True to use clustering for matrix/off-tissue identification |
| dim_reduction | categorical | umap | select a method to reduce the spectra from the following: umap, t-sne, pca |
| n_components | numeric | 2 | number of components of the dim. reduction |
| metric | categorical | cosine | select a UMAP distance metric from the following: euclidean, chebyshev, cosine, correlation |
| n_neighbors | numeric | 100 | size of the local neighborhood UMAP will look at |
| min_dist | numeric | 0.0 | min. distance apart that points are allowed to be in the low-dim. representation (UMAP) |
| cluster_algorithm | categorical | hdbscan | select an algorithm to cluster the low-dim. representation from the following: hierarchical, k-means, gaussian_mixture, hdbscan |
| min_cluster_size | numeric | 100 | min. size of an HDBSCAN cluster |
| min_samples | numeric | 500 | min. number of neighbors to a core point |
| matrix_corr_thr | percentage | 0.7 | Spearman correlation threshold: clusters that reach this correlation with the initial off-tissue cluster are combined into an extended matrix cluster |
| pixel_perc_thr | percentage | 30 | pixel percentage threshold of clusters to extend the off-tissue cluster |
| matrix_postproc | boolean | True | set to True for post-processing of the matrix/off-tissue image |
| pixel_removal | boolean | True | set to True to remove matrix/off-tissue pixels from each dataset |
| matrix_subtraction | boolean | False | set to True to subtract the mean matrix spectrum from each pixel spectrum |
| num_matrix_peaks | numeric | 0 | number of top matrix peaks to remove |
| matrix_peak_removal | boolean | False | set to True to remove matrix peaks |
| matrix_mzs | list | '' | list of known matrix m/z values |
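matrix_corr_thr compares clusters with the initial off-tissue cluster using Spearman's rank correlation. A minimal plain-Python sketch, assuming each cluster is represented by its mean spectrum (an assumption, not confirmed here): intensities are ranked (ties share the average rank), Pearson's correlation of the ranks gives Spearman's rho, and clusters reaching the threshold are returned.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


def matrix_like_clusters(mean_spectra, off_tissue_id, matrix_corr_thr=0.7):
    """IDs of clusters whose mean spectrum has a Spearman correlation of at
    least matrix_corr_thr with the off-tissue cluster's mean spectrum."""
    ref = mean_spectra[off_tissue_id]
    return [cid for cid, spec in mean_spectra.items()
            if cid != off_tissue_id and spearman(spec, ref) >= matrix_corr_thr]


# Usage: cluster 1 resembles the off-tissue cluster 0, cluster 2 does not
mean_spectra = {0: [9, 7, 1, 0.5], 1: [8, 6, 2, 0.4], 2: [0.1, 1, 8, 9]}
print(matrix_like_clusters(mean_spectra, off_tissue_id=0))
```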

Intra-Normalisation

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| method | categorical | mfc | select a method from the following: mfc, sum, mean |

Inter-Normalisation

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| method | categorical | mfc | select a method from the following: mfc, sum, mean |
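Both normalisation steps offer mfc, sum and mean. The plain-Python sketch below shows common definitions, assuming that mfc stands for median fold change normalisation (dividing a spectrum by the median of its intensity ratios to a reference spectrum), while sum divides by the total intensity and mean by the mean intensity. How msiFlow chooses the reference spectrum and handles zeros is not covered here; both are assumptions of the sketch.

```python
from statistics import median


def normalize(spectrum, method="mfc", reference=None):
    """Return the spectrum divided by a method-specific scaling factor."""
    if method == "sum":
        factor = sum(spectrum)
    elif method == "mean":
        factor = sum(spectrum) / len(spectrum)
    elif method == "mfc":
        if reference is None:
            raise ValueError("mfc needs a reference spectrum")
        # Fold changes against the reference, skipping channels where
        # either intensity is zero (ratio undefined or uninformative)
        ratios = [x / r for x, r in zip(spectrum, reference) if x > 0 and r > 0]
        factor = median(ratios)
    else:
        raise ValueError(f"unknown method: {method}")
    return [x / factor for x in spectrum]


# Usage: a pixel equal to twice the reference, except for one changed peak
reference = [1.0, 4.0, 2.0, 8.0, 0.0]
pixel = [2.0, 8.0, 4.0, 30.0, 0.0]
print(normalize(pixel, "mfc", reference))
```

Because the median ignores a few strongly changed peaks, mfc recovers the factor of 2 here, whereas sum and mean would be pulled up by the single large peak.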

Peak Filtering

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| sum | categorical | max | select a method to summarize the spatial coherence over all samples from the following: min, mean, max |
| thr | numeric | 500 | spatial coherence threshold used to filter peaks |
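Peak filtering needs a spatial coherence score per peak and sample, which is not reproduced here. Given such scores, sum decides how the per-sample values are collapsed into one value per peak, and thr decides which peaks remain. A minimal sketch, assuming that peaks whose summarised score reaches thr are kept:

```python
from statistics import mean

SUMMARIES = {"min": min, "mean": mean, "max": max}


def filter_peaks(coherence, summary="max", thr=500):
    """coherence: dict peak m/z -> list of per-sample spatial coherence scores.

    Returns the m/z values whose summarised score is >= thr.
    """
    agg = SUMMARIES[summary]
    return sorted(mz for mz, scores in coherence.items() if agg(scores) >= thr)


scores = {
    301.2: [800, 650, 720],  # coherent in every sample
    455.3: [900, 120, 100],  # coherent in one sample only
    512.8: [90, 110, 60],    # noise-like in all samples
}
print(filter_peaks(scores, "max"))
print(filter_peaks(scores, "min"))
```

With max, a peak survives if it is spatially coherent in at least one sample; with min, it must be coherent in all samples.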

Sample Outlier Detection

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| n_neighbors | numeric | 10 | size of the local neighborhood UMAP will look at |
| min_cluster_size | numeric | 5000 | min. size of an HDBSCAN cluster |
| min_samples | numeric | 1000 | min. number of neighbors to a core point |
| cluster_thr | percentage | 70 | percentage of a cluster's pixels that must belong to one sample for the cluster to be considered a sample-specific cluster (SSC) |
| sample_thr | percentage | 70 | percentage of a sample's pixels that must be SSC pixels for the sample to be considered an outlier |
| remove_scc | boolean | True | set to True to remove SSC pixels |
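cluster_thr and sample_thr work together: a cluster counts as sample-specific (SSC) if at least cluster_thr percent of its pixels come from one sample, and a sample counts as an outlier if at least sample_thr percent of its pixels fall into SSCs. The sketch below applies these two rules to given cluster labels; the UMAP/HDBSCAN step that produces the labels is omitted, and treating both thresholds as inclusive is an assumption.

```python
from collections import Counter, defaultdict


def find_outliers(pixel_samples, pixel_clusters, cluster_thr=70, sample_thr=70):
    """pixel_samples[i] / pixel_clusters[i]: sample ID and cluster ID of pixel i.

    Returns (set of SSC cluster IDs, set of outlier sample IDs).
    """
    by_cluster = defaultdict(Counter)
    for s, c in zip(pixel_samples, pixel_clusters):
        by_cluster[c][s] += 1

    # Rule 1: one sample dominates the cluster -> sample-specific cluster
    sscs = set()
    for c, counts in by_cluster.items():
        if 100 * max(counts.values()) / sum(counts.values()) >= cluster_thr:
            sscs.add(c)

    # Rule 2: most of a sample's pixels lie in SSCs -> outlier sample
    pixels_per_sample = Counter(pixel_samples)
    ssc_pixels = Counter(s for s, c in zip(pixel_samples, pixel_clusters) if c in sscs)
    outliers = {s for s, n in pixels_per_sample.items()
                if 100 * ssc_pixels[s] / n >= sample_thr}
    return sscs, outliers


# Usage: cluster 0 is shared by all samples, cluster 1 is almost only sample C
samples = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["C"] * 18 + ["A"] * 1
clusters = [0] * 10 + [0] * 10 + [0] * 2 + [1] * 18 + [1] * 1
print(find_outliers(samples, clusters))
```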

De-isotoping

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| tolerance | numeric | 0.01 | tolerance used to match isotopic peaks |
| min_isotopes | numeric | 2 | min. number of isotopic peaks |
| max_isotopes | numeric | 6 | max. number of isotopic peaks |
| openMS | boolean | True | set to True to use the OpenMS routine |
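De-isotoping groups a monoisotopic peak with the isotopic peaks that follow it at a spacing of about 1.003355 Da (the 13C-12C mass difference, for singly charged ions) and keeps only the monoisotopic peak. The pure-Python sketch below handles charge 1 only and ignores isotope intensity patterns, so it is far cruder than the OpenMS routine enabled by openMS: True. How tolerance, min_isotopes and max_isotopes are applied here (counts include the monoisotopic peak) is an assumption based on their descriptions.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C in Da


def deisotope(mzs, tolerance=0.01, min_isotopes=2, max_isotopes=6):
    """Return the peak list with isotopic peaks removed (charge 1 only).

    An envelope is a run of peaks spaced by ~C13_SPACING, starting at a
    candidate monoisotopic peak and holding at most max_isotopes peaks.
    If it holds at least min_isotopes peaks, all but the first are dropped.
    """
    peaks = sorted(mzs)
    removed = set()
    kept = []
    for i, mono in enumerate(peaks):
        if i in removed:
            continue
        envelope = [i]
        for n in range(1, max_isotopes):
            target = mono + n * C13_SPACING
            match = next((j for j in range(i + 1, len(peaks))
                          if j not in removed and abs(peaks[j] - target) <= tolerance),
                         None)
            if match is None:
                break  # a gap in the isotope series ends the envelope
            envelope.append(match)
        if len(envelope) >= min_isotopes:
            removed.update(envelope[1:])
        kept.append(mono)
    return kept


# Usage: 100.0 has two isotopic peaks; 150.0 and 151.5 are unrelated peaks
print(deisotope([100.0, 101.003355, 102.00671, 150.0, 151.5]))
```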

MSI Segmentation Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| multi_sample | boolean | False | set to True to perform multi-sample segmentation or to False to segment each sample individually |
| dot_size | numeric | 1 | dot size of scatter plots |
| dim_reduction method | categorical | umap | select a method for dimensionality reduction from the following: pca, t-sne, umap; or set to '' to skip dimensionality reduction |
| n_components | numeric | 2 | number of components to reduce the data to |
| metric | categorical | cosine | select a UMAP distance metric from the following: euclidean, chebyshev, cosine, correlation |
| n_neighbors | numeric | 15 | size of the local neighborhood UMAP will look at |
| min_dist | numeric | 0.0 | min. distance apart that points are allowed to be in the low-dim. UMAP representation |
| clustering method | categorical | k-means | select a clustering algorithm from the following: hierarchical, k-means, gaussian_mixture, hdbscan, SA, where SA performs spatial k-means (only applicable in single-sample segmentation) |
| n_clusters | numeric | 3 | number of clusters (applicable for k-means, gaussian_mixture, hierarchical) |
| min_cluster_size | numeric | 100 | min. size of an HDBSCAN cluster |
| min_samples | numeric | 5 | min. number of neighbors to a core point in HDBSCAN clustering |
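With clustering method: k-means, the (optionally dimensionality-reduced) pixel spectra are partitioned into n_clusters groups. Below is a plain-Python sketch of Lloyd's k-means on 2-D embedding coordinates. It seeds with the first n_clusters points so the result is reproducible; library implementations such as scikit-learn's KMeans use smarter seeding (k-means++) and optionally several restarts, and msiFlow's exact settings are not shown here.

```python
import math


def kmeans(points, n_clusters=3, max_iter=100):
    """Lloyd's algorithm. points: list of coordinate tuples.

    Returns (labels, centroids). Seeds with the first n_clusters points.
    """
    centroids = [list(p) for p in points[:n_clusters]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean)
        new_labels = [min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Usage: three well-separated groups of UMAP coordinates
points = [(0, 0), (10, 10), (20, 0), (0.5, 0.2), (10.3, 9.8),
          (19.7, 0.4), (0.1, 0.6), (9.9, 10.2), (20.2, -0.1)]
labels, centroids = kmeans(points, n_clusters=3)
print(labels)
```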

Region Group Analysis Workflow

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| annotate | boolean | True | set to True to annotate molecules (requires annotation.csv in the input folder) |
| method | categorical | mean | select a method from the following to summarize the spectra in defined regions: mean, median |
| fold_change_thr | numeric | 1.0 | absolute log2 fold change threshold to define regulated molecules (value between 0.0 and 1.0) |
| infected_grp | string | UPEC | name of the infected group (must appear in the file names) |
| control_grp | string | control | name of the control group (must appear in the file names) |
| save_ion_imgs | boolean | True | set to True to save ion images of regulated molecules |
| row_norm | boolean | True | set to True to perform row normalization for the heatmap |
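The group names are matched against the file names, and regulated molecules follow from a log2 fold change between the infected and control groups. A minimal plain-Python sketch, assuming the region intensities have already been collected per group and that a molecule counts as regulated when |log2(infected / control)| reaches fold_change_thr. The sample names are hypothetical, and skipping zero intensities is an assumption of this sketch.

```python
import math
from statistics import mean, median

SUMMARY = {"mean": mean, "median": median}


def group_files(filenames, infected_grp="UPEC", control_grp="control"):
    """Assign files to groups by the group name contained in the file name."""
    infected = [f for f in filenames if infected_grp in f]
    control = [f for f in filenames if control_grp in f]
    return infected, control


def regulated(infected, control, method="mean", fold_change_thr=1.0):
    """infected/control: dict molecule -> list of per-region intensities.

    Returns dict molecule -> log2 fold change for regulated molecules.
    """
    agg = SUMMARY[method]
    result = {}
    for mol in infected:
        a, b = agg(infected[mol]), agg(control[mol])
        if a <= 0 or b <= 0:
            continue  # log2 fold change undefined for zero intensities
        lfc = math.log2(a / b)
        if abs(lfc) >= fold_change_thr:
            result[mol] = lfc
    return result


# Usage with hypothetical sample names and intensities
files = ["UPEC_01", "control_01", "UPEC_02"]
inf_files, ctrl_files = group_files(files)
inf_int = {"m1": [4.0, 4.0], "m2": [1.1, 0.9], "m3": [0.5, 0.5]}
ctrl_int = {"m1": [1.0, 1.0], "m2": [1.0, 1.0], "m3": [2.0, 2.0]}
res = regulated(inf_int, ctrl_int, method="mean", fold_change_thr=1.0)
print(res)
```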