Running Banksy on large Xenium Dataset #39

Open
Alwash-317 opened this issue Sep 18, 2024 · 2 comments

@Alwash-317

Hi,

I’m working with an integrated Xenium dataset consisting of 12 samples, totaling approximately 5.4 million cells. After pre-processing the individual Xenium samples, I merged them into a single Seurat object for downstream analysis. However, I’m encountering issues when trying to run BANKSY due to the large size of the dataset. The R script is as follows:

```r
file_paths   <- c("path_1", "path_2", ..., "path_12")
sample_names <- c("sample_1", "sample_2", ..., "sample_12")

seu_list <- list()

# Read each sample and store its spatial centroids as cell metadata
for (i in seq_along(file_paths)) {
  seu <- readRDS(file_paths[i])
  coords <- seu[[paste0("fov_", sample_names[i])]]$centroids@coords
  seu$sdimx <- coords[, 1]
  seu$sdimy <- coords[, 2]
  seu_list[[i]] <- seu
}

# Merge all samples into a single Seurat object
merged_seu <- Reduce(merge, seu_list)
merged_seu <- JoinLayers(merged_seu)
DefaultAssay(merged_seu) <- "Xenium"

# Run BANKSY on the merged object, computing neighbours per sample (group = 'Sample_ID')
merged_seu <- RunBanksy(
  merged_seu,
  lambda = 0.8,
  assay = 'Xenium',
  slot = 'data',
  features = 'all',
  group = 'Sample_ID',
  dimx = 'sdimx',
  dimy = 'sdimy',
  split.scale = TRUE,
  k_geom = 15
)
```

It crashes at the RunBanksy step with the following error:

```
Error in [.data.table(knn_df, , abs(gcm[, to, drop = FALSE] %*% (weight *  : 
  negative length vectors are not allowed
Calls: RunBanksy ... mapply -> <Anonymous> -> <Anonymous> -> [ -> [.data.table
In addition: Warning message:
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 19.3 GiB
Execution halted.
```

I attempted to allocate more memory for the script (up to 800 GB) and monitored memory usage, which did not exceed this limit at the time of the crash. I also used the future package with `options(future.globals.maxSize = 256 * 1024^3)`, but the issue persists.

Given the size of the dataset, are there any computationally less intensive approaches or optimizations you would recommend for running BANKSY on such large datasets? Any suggestions to handle memory usage more efficiently or alternative strategies would be greatly appreciated.

Thank you for your help!

@vipulsinghal02
Collaborator

Hi Alwash, have you tried using highly variable genes (e.g. the 2,000 HVGs Seurat selects by default)? Another optimization is to first reduce to 2,000 HVGs and then further reduce to 100 PCs; this 100-PC by 5.4-million-cell matrix becomes your new feature-by-cell ("gene"-by-cell) matrix, and you run the rest of the pipeline on it as usual.
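
A minimal sketch of that reduction, assuming standard Seurat calls (the assay name `pca_assay` is illustrative, and the RunBanksy arguments simply mirror the script above; adjust to your setup):

```r
# Sketch only: reduce the merged object to 2,000 HVGs and 100 PCs, then run
# BANKSY on the PC-by-cell matrix instead of the full gene-by-cell matrix.
merged_seu <- NormalizeData(merged_seu)
merged_seu <- FindVariableFeatures(merged_seu, nfeatures = 2000)
merged_seu <- ScaleData(merged_seu)
merged_seu <- RunPCA(merged_seu, npcs = 100)

# Wrap the 100 PC embeddings as a new "feature"-by-cell assay
# (the assay name "pca_assay" is illustrative)
pc_mat <- t(Embeddings(merged_seu, reduction = "pca"))
merged_seu[["pca_assay"]] <- CreateAssayObject(data = pc_mat)

merged_seu <- RunBanksy(
  merged_seu,
  lambda = 0.8,
  assay = "pca_assay",
  slot = "data",
  features = "all",
  group = "Sample_ID",
  dimx = "sdimx",
  dimy = "sdimy",
  k_geom = 15
)
```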

Reducing to HVGs and then PCs like this should greatly shrink the matrix BANKSY has to work with and allow the processing to complete. Another idea is to use the BPCells package for on-disk matrices (which Seurat supports; see their pages/vignettes).
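
A rough sketch of the BPCells route, following the general pattern in the BPCells/Seurat documentation (the directory path and the `Xenium_disk` assay name are illustrative; check the BPCells vignette for the exact conversion calls):

```r
# Sketch only: back the counts with an on-disk BPCells matrix so the merged
# object does not hold the full count matrix in memory.
library(BPCells)

counts_mem <- LayerData(merged_seu, assay = "Xenium", layer = "counts")

# Write the counts to a BPCells directory on disk, then reopen them lazily
write_matrix_dir(mat = counts_mem, dir = "xenium_counts_bpcells")
counts_disk <- open_matrix_dir(dir = "xenium_counts_bpcells")

# Store the on-disk matrix in a new assay and continue the pipeline from it
merged_seu[["Xenium_disk"]] <- CreateAssay5Object(counts = counts_disk)
DefaultAssay(merged_seu) <- "Xenium_disk"
merged_seu <- NormalizeData(merged_seu)
```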

Best,
Vipul

vipulsinghal02 reopened this Oct 2, 2024
@vipulsinghal02
Collaborator

Also:

  1. Construct the BANKSY matrix separately for each sample, merge them, and run PCA on the merged matrix (see the sketch after this list).
  2. See this: Potential inconsistency between R and Python Versions Banksy_py#12 (comment)
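
For point 1, a rough sketch of the per-sample approach (this reuses `seu_list` from the original script, assumes each sample is already normalized, and assumes `BANKSY` is the assay name RunBanksy creates by default; the PCA settings are placeholders):

```r
# Sketch only: run BANKSY on each sample separately so the neighbour
# computation never sees all 5.4M cells at once, then merge and run PCA
# on the combined BANKSY matrix.
banksy_list <- lapply(seu_list, function(seu) {
  RunBanksy(
    seu,
    lambda = 0.8,
    assay = "Xenium",
    slot = "data",
    features = "all",
    dimx = "sdimx",
    dimy = "sdimy",
    k_geom = 15
  )
})

merged_banksy <- Reduce(merge, banksy_list)
merged_banksy <- JoinLayers(merged_banksy)   # join per-sample layers if split (Seurat v5)
DefaultAssay(merged_banksy) <- "BANKSY"      # assumed default assay name from RunBanksy

merged_banksy <- ScaleData(merged_banksy, features = rownames(merged_banksy))
merged_banksy <- RunPCA(merged_banksy, features = rownames(merged_banksy), npcs = 30)
```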

Let me know how it goes!
Best,
Vipul
