Updates #19

Status: Closed
34 commits
ebfd4a0  update to R 4.3.3 (alexvpickering, Mar 15, 2024)
b5446a4  update MASS (alexvpickering, Mar 15, 2024)
e9b6d51  update init lock (alexvpickering, Mar 15, 2024)
65aec07  add build dep (alexvpickering, Mar 16, 2024)
bf86bf4  roxygenize (alexvpickering, Mar 16, 2024)
ba11799  update to R 4.4.0 (alexvpickering, May 14, 2024)
714f938  fix python venv stuff (alexvpickering, May 14, 2024)
f6a559d  Merge branch 'master' into update-r-ver (alexvpickering, May 14, 2024)
ffbbaea  update CRAN packages (alexvpickering, May 15, 2024)
18d61ce  update init (alexvpickering, May 15, 2024)
286f50b  update safe packages (alexvpickering, May 15, 2024)
0cd937a  use v3 for dev (alexvpickering, May 15, 2024)
a3da836  reorder (alexvpickering, May 15, 2024)
987cb31  update tests (alexvpickering, May 15, 2024)
f148922  fix tests (alexvpickering, May 15, 2024)
e35d3ed  exclude SeuratWrappers from restore_fast (alexvpickering, May 15, 2024)
1d426f5  fix ghost package need (alexvpickering, May 16, 2024)
46be054  try workaround for licensing (alexvpickering, May 16, 2024)
94b9e02  ghost functions for RhpcBLASctl (alexvpickering, May 16, 2024)
6566b0d  update snaps (alexvpickering, May 17, 2024)
33fbac1  implement Seurat v5 (alexvpickering, May 17, 2024)
3856fc8  update comment (alexvpickering, May 17, 2024)
8ab6b8a  update init lockfile (alexvpickering, May 17, 2024)
3d63603  fixing tests (alexvpickering, May 21, 2024)
0615493  ensure layers joined before/after integration (alexvpickering, May 21, 2024)
80c7955  fix regression (alexvpickering, May 22, 2024)
76ea8b1  add test for regression (alexvpickering, May 22, 2024)
9c4e864  move helpers to seperate file (alexvpickering, May 22, 2024)
cf8ded6  more explicit (alexvpickering, May 22, 2024)
9a1c83d  print tests if fail (alexvpickering, May 22, 2024)
7ceedc4  fix test (alexvpickering, May 22, 2024)
4cec13d  Merge pull request #363 from hms-dbmi-cellenics/update-r-ver (alexvpickering, May 23, 2024)
8dfb965  fix test_local fallback (alexvpickering, May 23, 2024)
0d23e05  Merge pull request #368 from hms-dbmi-cellenics/fix-covr (alexvpickering, May 23, 2024)
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -255,7 +255,7 @@ jobs:
docker run \
--entrypoint /bin/bash \
-v $(pwd)/covr:/covr $IMAGE_NAME \
-c "R -e 'cov <- covr::package_coverage(); covr::to_cobertura(cov, \"/covr/coverage.xml\")'"
-c "R -e 'tryCatch({cov <- covr::package_coverage(); covr::to_cobertura(cov, \"/covr/coverage.xml\")}, error = function(e) {testthat::test_local(); stop()})'"

env:
IMAGE_NAME: ${{ format('{0}/{1}:{2}-{3}', steps.login-ecr.outputs.registry, steps.ref.outputs.repo-name, steps.ref.outputs.image-tag, matrix.project) }}
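The new entrypoint command wraps coverage collection in tryCatch() so that, when covr::package_coverage() errors, the plain testthat results are printed to the CI log before the job is failed. A minimal sketch of the same fallback pattern run from the package root (illustrative, not the exact Docker invocation):

tryCatch({
  cov <- covr::package_coverage()
  covr::to_cobertura(cov, "coverage.xml")
}, error = function(e) {
  testthat::test_local()  # surface which tests actually failed
  stop(e)                 # re-raise so the CI job still goes red
})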
1 change: 1 addition & 0 deletions pipeline-runner/.Rprofile
@@ -1,2 +1,3 @@
source("renv/activate.R")
options(Seurat.object.assay.version = "v5")
pkgload::load_all(attach_testthat = TRUE)
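Setting options(Seurat.object.assay.version = "v5") in .Rprofile makes assays created during local development use the v5 Assay5 class, matching what the pipeline now produces. A quick, illustrative check on toy counts (not pipeline data):

options(Seurat.object.assay.version = "v5")
counts <- matrix(rpois(200, lambda = 2), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
obj <- SeuratObject::CreateSeuratObject(counts)
class(obj[["RNA"]])  # "Assay5"; with the option set to "v3" this would be "Assay"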
12 changes: 6 additions & 6 deletions pipeline-runner/DESCRIPTION
@@ -12,10 +12,12 @@ License: GPL-3.0
Depends:
R (>= 4.2.0)
Imports:
batchelor (== 1.12.1),
batchelor,
covr,
magrittr,
data.table
data.table,
hdf5r,
glmGamPoi
Suggests:
diffviewer,
devtools,
@@ -24,11 +26,9 @@ Suggests:
roxygen2,
styler,
testthat (>= 3.0.0),
usethis,
glmGamPoi,
hdf5r
usethis
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
10 changes: 5 additions & 5 deletions pipeline-runner/Dockerfile
@@ -1,6 +1,6 @@
# Create builder step
# pull official base image and use it as builder
FROM rocker/r-ver:4.2.2 AS builder
FROM rocker/r-ver:4.4.0 AS builder
WORKDIR /src/pipeline-runner

# install required debian packages to install R packages
@@ -20,7 +20,7 @@ RUN echo ".libPaths(c('$RENV_LIB', .libPaths()))" >> $(R RHOME)/etc/Rprofile.sit

# install renv to install required R packages
RUN R -q -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))" && \
R -q -e "remotes::install_github('rstudio/[email protected].5')" && \
R -q -e "remotes::install_github('rstudio/[email protected].7')" && \
R -q -e "renv::init(bare = TRUE, settings = list(use.cache = FALSE))"

# fast pre-restore with pkgbuild
@@ -56,7 +56,7 @@ RUN Rscript check_package_licenses.R
# ---------------------------------------------------
# COMMON MINIMAL BUILD
# ---------------------------------------------------
FROM rocker/r-ver:4.2.2 AS common
FROM rocker/r-ver:4.4.0 AS common
WORKDIR /src/pipeline-runner

# get source code and R packages
@@ -71,8 +71,8 @@ RUN echo ".libPaths(c('$RENV_LIB', .libPaths()))" >> $(R RHOME)/etc/Rprofile.sit

# install python packages in virtualenv
ENV WORKON_HOME=/src/.virtualenvs
RUN R -q -e "reticulate::virtualenv_create('r-reticulate')" && \
R -q -e "reticulate::virtualenv_install('r-reticulate', c('geosketch==1.2', 'scanorama==1.7.3'), pip_options='--no-cache-dir')"
RUN R -q -e "reticulate::virtualenv_create('r-reticulate', python='$(which python3)')" && \
R -q -e "reticulate::virtualenv_install('r-reticulate', c('geosketch==1.2', 'scanorama==1.7.3'), pip_options='--no-cache-dir')"

# ---------------------------------------------------
# PRODUCTION BUILD
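Besides the base-image bump to R 4.4.0 and renv 1.0.7, the virtualenv is now created with an explicit python = '$(which python3)', which pins reticulate to the image's system interpreter rather than whatever it would otherwise discover. A hedged sketch of the equivalent setup from an R session (paths are illustrative):

Sys.setenv(WORKON_HOME = "/src/.virtualenvs")  # where reticulate looks for named virtualenvs
reticulate::virtualenv_create("r-reticulate", python = Sys.which("python3"))
reticulate::virtualenv_install(
  "r-reticulate",
  c("geosketch==1.2", "scanorama==1.7.3"),
  pip_options = "--no-cache-dir"
)
reticulate::use_virtualenv("r-reticulate", required = TRUE)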
4 changes: 2 additions & 2 deletions pipeline-runner/R/gem2s-6-construct_qc_config.R
@@ -151,10 +151,10 @@ get_embedding_config <- function(scdata_list, config) {
config$embeddingSettings$methodSettings$tsne <- list(
perplexity = min(
default_perplexity,
min(vapply(scdata_list, ncol, integer(1))) / 100),
min(vapply(scdata_list, ncol, numeric(1))) / 100),
learningRate = max(
default_learning_rate,
min(vapply(scdata_list, ncol, integer(1))) / 12)
min(vapply(scdata_list, ncol, numeric(1))) / 12)
)

return(config)
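The template change from integer(1) to numeric(1) matters because vapply() type-checks every result against the template: if ncol() ever returns a double, integer(1) aborts, while numeric(1) accepts both. A toy illustration of that strictness (made-up list, not pipeline objects):

sizes <- list(a = 1:3, b = 1:5)
vapply(sizes, length, integer(1))                      # fine: length() returns an integer
vapply(sizes, function(x) length(x) / 1, integer(1))   # error: values must be type 'integer'
vapply(sizes, function(x) length(x) / 1, numeric(1))   # fine: doubles match numeric(1)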
4 changes: 2 additions & 2 deletions pipeline-runner/R/handle_data.R
@@ -44,7 +44,7 @@ load_processed_scdata <- function(s3, pipeline_config, experiment_id) {
# get_nnzero will return how many non-zero counts the count matrix has
# it is used to order samples according to their size
get_nnzero <- function (x) {
return(length(x@assays[["RNA"]]@counts@i))
return(length(x@assays[["RNA"]]$counts@i))
}

order_by_size <- function(scdata_list) {
@@ -619,7 +619,7 @@ get_s3_rds <- function(bucket, key, aws_config) {

conn <- gzcon(rawConnection(body))
object <- readRDS(conn)

close(conn)

return(object)
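Two small changes here: counts are read through the $counts layer accessor (which works for both Assay and Assay5 objects), and the gzcon()/rawConnection() pair opened for the S3 download is now closed after readRDS() so repeated reads do not leak connections. On the counting itself, a dgCMatrix stores one entry in its @i slot per non-zero value, so its length is the non-zero count; a quick illustrative check:

m <- Matrix::rsparsematrix(1000, 200, density = 0.05)
length(m@i) == Matrix::nnzero(m)  # TRUE, assuming no explicitly stored zeros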
2 changes: 1 addition & 1 deletion pipeline-runner/R/qc-5-filter_doublets.R
@@ -28,7 +28,7 @@ filter_doublets <- function(scdata_list, config, sample_id, cells_id, task_name

if ("recomputeDoubletScore" %in% names(config)) {
if (config$recomputeDoubletScore) {
scores <- get_doublet_scores(sample_data@assays$RNA@counts, technology = config$sampleTechnology)
scores <- get_doublet_scores(sample_data@assays$RNA$counts, technology = config$sampleTechnology)
sample_data <- add_dblscore(sample_data, scores)
# update doublet scores in original scdata
scdata_list[[sample_id]] <- add_dblscore(scdata_list[[sample_id]], scores)
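Same accessor change as above: Seurat v5's Assay5 keeps expression matrices in layers, so the v4 slot path @assays$RNA@counts no longer exists, while $counts works for both assay classes. An illustrative check on a toy object:

counts <- matrix(rpois(200, 1), nrow = 20,
                 dimnames = list(paste0("g", 1:20), paste0("c", 1:10)))
obj <- SeuratObject::CreateSeuratObject(counts)
dim(obj[["RNA"]]$counts)  # layer accessor: works for Assay (v3/v4) and Assay5
# obj[["RNA"]]@counts     # v4 slot access: errors on an Assay5 object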
2 changes: 1 addition & 1 deletion pipeline-runner/R/qc-6-integrate_scdata-fastmnn.R
@@ -7,7 +7,7 @@ run_fastmnn <- function(scdata_list, config, cells_id) {
}

# calculate as many PCs for the PCA as possible, ideally 50, unless few cells
npcs_for_pca <- min(vapply(scdata_list, ncol, integer(1)) - 1, 50)
npcs_for_pca <- min(vapply(scdata_list, ncol, numeric(1)) - 1, 50)
npcs <- config$dimensionalityReduction$numPCs

# use the min of what the user wants and what can be calculated
6 changes: 3 additions & 3 deletions pipeline-runner/R/qc-6-integrate_scdata-harmony.R
@@ -7,7 +7,7 @@ run_harmony <- function(scdata_list, config, cells_id) {
}

# calculate as many PCs for the PCA as possible, ideally 50, unless few cells
npcs_for_pca <- min(vapply(scdata_list, ncol, integer(1)) - 1, 50)
npcs_for_pca <- min(vapply(scdata_list, ncol, numeric(1)) - 1, 50)
npcs <- config$dimensionalityReduction$numPCs

# use the min of what the user wants and what can be calculated
@@ -68,7 +68,7 @@ run_harmony <- function(scdata_list, config, cells_id) {
harmony::RunHarmony(
scdata,
group.by.vars = "samples",
reduction = "pca_for_harmony",
reduction.use = "pca_for_harmony",
dims.use = 1:npcs
)
}
@@ -115,7 +115,7 @@ RunGeosketchHarmony <- function(scdata,
harmony::RunHarmony(
geosketch_list$sketch,
group.by.vars = "samples",
reduction = reduction,
reduction.use = reduction,
dims.use = 1:npcs,
)
scdata_sketch_integrated@misc[["active.reduction"]] <- "harmony"
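The reduction and dims arguments are now passed as reduction.use and dims.use because newer harmony releases (1.0 and later) renamed the Seurat method's argument from reduction to reduction.use. A quick way to confirm which name your installed harmony expects (assumes harmony is installed):

library(harmony)
fn <- getS3method("RunHarmony", "Seurat")
c("reduction", "reduction.use") %in% names(formals(fn))
# FALSE TRUE on harmony >= 1.0, hence the change in the calls above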
2 changes: 1 addition & 1 deletion pipeline-runner/R/qc-6-integrate_scdata-seuratv4.R
@@ -27,7 +27,7 @@ run_seuratv4 <- function(scdata_list, config, cells_id) {
use_geosketch <- "downsampling" %in% names(config) && config$downsampling$method == "geosketch"

# calculate as many PCs for the PCA as possible, ideally 50, unless few cells
npcs_for_pca <- min(vapply(scdata_list, ncol, integer(1)) - 1, 50)
npcs_for_pca <- min(vapply(scdata_list, ncol, numeric(1)) - 1, 50)
# use the min of what the user wants and what can be calculated
npcs <- min(config$dimensionalityReduction$numPCs, npcs_for_pca)

2 changes: 1 addition & 1 deletion pipeline-runner/R/qc-6-integrate_scdata-unisample.R
@@ -7,7 +7,7 @@ run_unisample <- function(scdata_list, config, cells_id) {
if (grepl("lognorm", normalization, ignore.case = TRUE)) normalization <- "LogNormalize"

# calculate as many PCs for the PCA as possible, ideally 50, unless few cells
npcs_for_pca <- min(vapply(scdata_list, ncol, integer(1)) - 1, 50)
npcs_for_pca <- min(vapply(scdata_list, ncol, numeric(1)) - 1, 50)
npcs <- config$dimensionalityReduction$numPCs

# use the min nPCs of what the user wants and what can be calculated
6 changes: 6 additions & 0 deletions pipeline-runner/R/qc-6-integrate_scdata.R
@@ -41,6 +41,9 @@ integrate_scdata <- function(scdata_list, config, sample_id, cells_id, task_name
integration_function <- get(paste0("run_", method))
scdata_integrated <- integration_function(scdata_list, config, cells_id)

if (methods::is(scdata_integrated[['RNA']], 'Assay5'))
scdata_integrated[['RNA']] <- SeuratObject::JoinLayers(scdata_integrated[['RNA']])

message("Finished data integration")

# Update config numPCs with estimated or user provided nPCs
@@ -79,6 +82,9 @@ create_scdata <- function(scdata_list, cells_id, merge_data = FALSE) {
merged_scdatas <- merge_scdata_list(scdata_list, merge_data)
merged_scdatas <- add_metadata(merged_scdatas, scdata_list)

if (methods::is(merged_scdatas[['RNA']], 'Assay5'))
merged_scdatas[['RNA']] <- SeuratObject::JoinLayers(merged_scdatas[['RNA']])

return(merged_scdatas)
}

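In Seurat v5, merging objects leaves one counts layer per sample (and integration can add more), while several downstream steps expect a single joined layer, hence the JoinLayers() calls after both merging and integration. A toy illustration of the split-then-join behaviour (illustrative, assumes SeuratObject >= 5):

options(Seurat.object.assay.version = "v5")
counts <- matrix(rpois(200, 1), nrow = 20,
                 dimnames = list(paste0("g", 1:20), paste0("c", 1:10)))
a <- SeuratObject::CreateSeuratObject(counts, project = "s1")
b <- SeuratObject::CreateSeuratObject(counts, project = "s2")
m <- merge(a, b)
SeuratObject::Layers(m[["RNA"]])  # two counts layers, e.g. "counts.1" and "counts.2"
m[["RNA"]] <- SeuratObject::JoinLayers(m[["RNA"]])
SeuratObject::Layers(m[["RNA"]])  # back to a single "counts" layer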
27 changes: 16 additions & 11 deletions pipeline-runner/R/qc-7-embed_and_cluster.R
@@ -36,12 +36,12 @@ run_clustering <- function(scdata, config, ignore_ssl_cert) {
#' @export
#'
embed_and_cluster <- function(
scdata,
config,
sample_id,
cells_id,
task_name = "configureEmbedding",
ignore_ssl_cert = FALSE
scdata,
config,
sample_id,
cells_id,
task_name = "configureEmbedding",
ignore_ssl_cert = FALSE
) {

if (config$clustering_should_run == TRUE) {
@@ -80,10 +80,15 @@ format_cluster_cellsets <- function(cell_sets,
name = paste0(clustering_method, " clusters")) {
message("Formatting cluster cellsets.")

# careful with capital l on type for the key.
# needed for leiden clustering to work
cell_sets_key <- ifelse(
clustering_method %in% c('louvain', 'leiden'),
'louvain',
clustering_method)

cell_sets_object <-
list(
key = "louvain",
key = cell_sets_key,
name = name,
rootNode = TRUE,
type = "cellSets",
@@ -147,9 +152,9 @@ replace_cell_class_through_api <- function(cell_class_object, api_url, experimen
httr_query <- paste0("$[?(@.key == \"", cell_class_key, "\")]")

body <- list(list(
"$match" = list(query = httr_query, value = list("$remove" = TRUE))
),
list("$prepend" = cell_class_object))
"$match" = list(query = httr_query, value = list("$remove" = TRUE))
),
list("$prepend" = cell_class_object))

patch_cell_sets(api_url, experiment_id, body, auth_JWT, ignore_ssl_cert)
}
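The cellsets key is now derived from the clustering method, with leiden results stored under the existing louvain key (per the inline comment, needed for leiden clustering to work); any other method keeps its own name. The mapping in isolation, using a hypothetical helper name:

cell_sets_key_for <- function(clustering_method) {
  ifelse(clustering_method %in% c("louvain", "leiden"), "louvain", clustering_method)
}
cell_sets_key_for(c("louvain", "leiden", "kmeans"))  # "louvain" "louvain" "kmeans"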
4 changes: 2 additions & 2 deletions pipeline-runner/R/qc-helpers.R
@@ -76,7 +76,7 @@ subset_safe <- function(scdata, cells) {
if (length(cells) > 0) {
return(subset(scdata, cells = cells))
} else {
return(subset(scdata, cells = colnames(scdata)[1]))
return(subset(scdata, cells = colnames(scdata)[1:2]))
}
}

@@ -136,7 +136,7 @@ calc_filter_stats <- function(scdata) {
}

# number of counts per gene
ncount <- Matrix::rowSums(scdata[["RNA"]]@counts)
ncount <- Matrix::rowSums(scdata[["RNA"]]$counts)

list(
num_cells = ncol(scdata),
37 changes: 24 additions & 13 deletions pipeline-runner/R/seurat-2-load_seurat.R
@@ -54,7 +54,13 @@ reconstruct_seurat <- function(dataset_fpath) {
# get counts
tryCatch({
SeuratObject::DefaultAssay(user_scdata) <- 'RNA'
counts <- user_scdata[['RNA']]@counts

# if V5 object, ensure layers are rejoined
if (methods::is(user_scdata[['RNA']], 'Assay5'))
user_scdata[['RNA']] <- SeuratObject::JoinLayers(user_scdata[['RNA']])


counts <- user_scdata[['RNA']]$counts
test_user_sparse_mat(counts)
rns <- row.names(counts)
check_type_is_safe(rns)
@@ -84,14 +90,20 @@

# add logcounts
tryCatch({
logcounts <- user_scdata[['RNA']]@data
test_user_sparse_mat(logcounts)

layers <- SeuratObject::Layers(user_scdata, assay = 'RNA')
if ('data' %in% layers) {
logcounts <- user_scdata[['RNA']]$data
test_user_sparse_mat(logcounts)
} else {
logcounts <- Seurat::NormalizeData(user_scdata[['RNA']]$counts)
}

# shouldn't be raw counts
suspect.counts <- max(logcounts) > 100
if (suspect.counts) logcounts <- Seurat::NormalizeData(logcounts)

scdata[['RNA']]@data <- logcounts
scdata[['RNA']]$data <- logcounts
},
error = function(e) {
message(e$message)
@@ -134,10 +146,9 @@
check_type_is_safe(red_name)
red_match <- grep("umap|tsne", red_name, value = TRUE)

if (length(red_match) > 0 && !(red_match %in% c("umap", "tsne"))) {
if (length(red_match) && !(red_match %in% c("umap", "tsne"))) {
is_umap <- grepl("umap", red_match)
is_tsne <- grepl("tsne", red_match)
new_red_name <- ifelse(is_umap, "umap", ifelse(is_tsne, "tsne", NA))
new_red_name <- ifelse(is_umap, "umap", "tsne")

message("Found reduction name ", red_match," containing ", new_red_name)
user_scdata <- update_reduction_name(user_scdata, red_name, new_red_name)
@@ -147,12 +158,12 @@

stopifnot(red_name %in% c('umap', 'tsne'))

embedding <- user_scdata@reductions[[red_name]]@cell.embeddings
test_user_df(embedding)
red <- SeuratObject::CreateDimReducObject(
embeddings = embedding,
assay = 'RNA'
)
red <- user_scdata@reductions[[red_name]]
test_user_df(red@cell.embeddings)
test_user_df(red@feature.loadings)
test_user_df(red@feature.loadings.projected)
red@assay.used <- 'RNA'

scdata@reductions[[red_name]] <- red
},
error = function(e) {
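Uploaded objects are now handled more defensively: v5 objects have their layers rejoined, logcounts are taken from the data layer only when it exists (falling back to Seurat::NormalizeData() on the counts), and the user's dimensionality reduction is copied as a whole DimReduc object rather than rebuilt from the embeddings alone. A toy sketch of the layer check (hypothetical object, not a user upload):

options(Seurat.object.assay.version = "v5")
counts <- matrix(rpois(200, 2), nrow = 20,
                 dimnames = list(paste0("g", 1:20), paste0("c", 1:10)))
obj <- SeuratObject::CreateSeuratObject(counts)
if ("data" %in% SeuratObject::Layers(obj, assay = "RNA")) {
  logcounts <- obj[["RNA"]]$data
} else {
  logcounts <- Seurat::NormalizeData(obj[["RNA"]]$counts)  # no data layer: normalize the counts
}
range(logcounts)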
8 changes: 5 additions & 3 deletions pipeline-runner/R/seurat-3-upload_seurat_to_aws.R
@@ -161,8 +161,10 @@ find_cluster_columns <- function(scdata) {
}

make_vals_numeric <- function(vals) {
vals <- as.character(vals)
as.numeric(factor(vals, levels = unique(vals)))
suppressWarnings({
vals <- as.character(vals)
as.numeric(factor(vals, levels = unique(vals)))
})
}

test_groups_equal <- function(vals1, vals2) {
@@ -183,7 +185,7 @@ add_samples_to_input <- function(scdata, input) {
change_sample_names_to_ids <- function(scdata, input) {
sample_ids <- input$sampleIds
names(sample_ids) <- input$sampleNames
scdata$samples <- sample_ids[scdata$samples]
scdata$samples <- unname(sample_ids[scdata$samples])
return(scdata)
}

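Two defensive tweaks: make_vals_numeric() now suppresses conversion warnings, and change_sample_names_to_ids() strips names from the looked-up IDs. The unname() matters because subsetting a named vector keeps the names of the matched elements, so the samples column would otherwise carry the old sample names alongside the new IDs. A small illustration with made-up IDs:

sample_ids <- c(WT = "id-aaa", KO = "id-bbb")
samples <- c("WT", "WT", "KO")
sample_ids[samples]          # values still named "WT" "WT" "KO"
unname(sample_ids[samples])  # "id-aaa" "id-aaa" "id-bbb", no names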
2 changes: 1 addition & 1 deletion pipeline-runner/R/subset-1-subset_seurat.R
@@ -132,7 +132,7 @@ load_parent_experiment_data <- function(input, pipeline_config) {
#'
diet_scdata <- function(scdata) {
lean_scdata <- Seurat::CreateSeuratObject(
counts = scdata@assays$RNA@counts,
counts = scdata@assays$RNA$counts,
meta.data = scdata@meta.data,
min.cells = 0,
min.features = 0
5 changes: 3 additions & 2 deletions pipeline-runner/data-raw/update_R_version_deps.R
@@ -6,7 +6,7 @@ renv::upgrade(reload = TRUE)

# start with clean renv package cache
# only keep renv
renv_lib <- grep('renv', .libPaths(), value = TRUE)
renv_lib <- grep('renv/sandbox', .libPaths(), value = TRUE)[1]

del_pkgs <- list.files(renv_lib)
del_pkgs <- del_pkgs[del_pkgs != 'renv']
@@ -60,9 +60,10 @@ while (!done) {
done <- TRUE

}, error = function(e) {
# browser()
message(e$message)
message('Updating failed package')
failed.pkg <- gsub("^install of package '(.+?)' failed .+?$", "\\1", e$message)
failed.pkg <- gsub("^Error installing package '(.+?)':.+?$", "\\1", e$message)
updated <<- c(updated, failed.pkg)

renv::install(failed.pkg, prompt = FALSE)
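renv 1.x reports installation failures with different wording, so the regex that recovers the failing package name from e$message was updated to match the new "Error installing package ..." format. A quick check of the extraction on an example message (wording is illustrative; the exact renv text can vary by version):

msg <- "Error installing package 'igraph': compilation failed"
gsub("^Error installing package '(.+?)':.+?$", "\\1", msg)  # "igraph"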
3 changes: 3 additions & 0 deletions pipeline-runner/init.R
@@ -5,6 +5,9 @@ library(tryCatchLog)
library(magrittr)
library(uuid)

# v5 is the default but making explicit
options(Seurat.object.assay.version = "v5")

# increase maxSize from the default of 500MB to 32GB
options(future.globals.maxSize = 32 * 1024 * 1024^2)

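init.R sets the same assay-version option at pipeline start-up (the inline comment notes v5 is already the default, so this just makes it explicit). For reference, the future.globals.maxSize value in the surrounding context works out as:

32 * 1024 * 1024^2  # 34359738368 bytes, i.e. 32 GiB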