exclude scDblFinder columns from Seurat uploads #353

Merged on Jan 23, 2024 (3 commits)
15 changes: 8 additions & 7 deletions pipeline-runner/R/seurat-3-upload_seurat_to_aws.R
@@ -78,17 +78,18 @@ upload_seurat_to_aws <- function(input, pipeline_config, prev_out) {
 }

 find_cluster_columns <- function(scdata) {
+  meta <- scdata@meta.data

   # exclude all group columns, including duplicates
-  group_cols <- find_group_columns(scdata@meta.data, remove.dups = FALSE)
-  exclude_cols <- c(group_cols, 'samples')
+  group_cols <- find_group_columns(meta, remove.dups = FALSE)
+  group_cols <- c(group_cols, 'samples')
+  scdblfinder_cols <- grep('^scDblFinder', colnames(meta), value = TRUE)
Member

why not add it to exclude_cols? don't really feel strongly about it though. whatever you think looks best is fine by me.

Contributor Author @alexvpickering, Jan 23, 2024

That's what I did initially as well, but it interferes with how exclude_cols is used further down (checking whether a column is the same as samples or another group column). I'll rename exclude_cols to make this more obvious.


   # order meta to indicate preference for louvain clusters
-  meta <- scdata@meta.data
   louvain_cols <- c('louvain', 'active.ident', 'seurat_clusters')
   meta <- meta |> dplyr::relocate(dplyr::any_of(louvain_cols))

-  check_cols <- setdiff(colnames(meta), exclude_cols)
+  check_cols <- setdiff(colnames(meta), c(scdblfinder_cols, group_cols))

   cluster_cols <- c()
   for (check_col in check_cols) {
@@ -111,9 +112,9 @@ find_cluster_columns <- function(scdata) {

     # skip if col is same as samples or group column
     is_sample_col <- FALSE
-    for (exclude_col in exclude_cols) {
-      exclude_vals <- meta[[exclude_col]]
-      if (test_groups_equal(check_vals, exclude_vals)) {
+    for (group_col in group_cols) {
+      group_vals <- meta[[group_col]]
+      if (test_groups_equal(check_vals, group_vals)) {
         is_sample_col <- TRUE
         break
       }
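
The review exchange above turns on two different kinds of exclusion: dropping columns by name (`scdblfinder_cols`, `group_cols`) and dropping columns whose values duplicate a group column. The following base R sketch illustrates one plausible reading of why the two lists are kept separate. The data frame is made up, and `same_partition` is a simplified, hypothetical stand-in for `test_groups_equal`, whose real implementation is not shown in this diff.

```r
# Hypothetical metadata standing in for scdata@meta.data (assumption)
meta <- data.frame(
  samples             = rep(c("A", "B"), each = 4),
  seurat_clusters     = rep(c("0", "1", "2", "3"), times = 2),
  scDblFinder.cluster = rep(c("0", "1", "2", "3"), times = 2),
  scDblFinder.class   = rep(c("singlet", "doublet"), times = 4),
  sample_copy         = rep(c("x", "y"), each = 4),
  stringsAsFactors    = FALSE
)

# Simplified stand-in for test_groups_equal (assumption): two columns match
# when they split the cells into the same groups, whatever the labels are
same_partition <- function(x, y) {
  tab <- table(x, y)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Helper (hypothetical): drop candidates that duplicate any reference column
drop_copies <- function(check_cols, ref_cols, meta) {
  is_copy <- vapply(check_cols, function(col) {
    any(vapply(ref_cols, function(ref) {
      same_partition(meta[[col]], meta[[ref]])
    }, logical(1)))
  }, logical(1))
  check_cols[!is_copy]
}

group_cols <- "samples"
scdblfinder_cols <- grep("^scDblFinder", colnames(meta), value = TRUE)

# Name-based exclusion uses both lists
check_cols <- setdiff(colnames(meta), c(scdblfinder_cols, group_cols))

# Value-based exclusion compares against group columns only
cluster_cols <- drop_copies(check_cols, group_cols, meta)

# If the scDblFinder columns were merged into the comparison list, a real
# cluster column with the same values as scDblFinder.cluster would be lost
cluster_cols_merged <- drop_copies(check_cols, c(group_cols, scdblfinder_cols), meta)

print(cluster_cols)
print(cluster_cols_merged)
```

With the lists kept separate, `seurat_clusters` survives even though `scDblFinder.cluster` holds the same values. With a merged list it would be discarded along with `sample_copy`, which matches the scenario the new test sets up by copying `RNA_snn_res.0.8` into `scDblFinder.cluster`.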
@@ -285,3 +285,19 @@ test_that("find_cluster_columns puts 'louvain' column first if exists", {
   cluster_cols <- find_cluster_columns(scdata)
   expect_equal(cluster_cols[1], 'louvain')
 })
+
+
+test_that("find_cluster_columns omits columns that start with scDblFinder", {
+
+  expected_cols <- c('RNA_snn_res.0.8', 'letter.idents', 'groups', 'RNA_snn_res.1')
+
+  scdata <- mock_scdata()
+  sample_names <- c('A', 'B', 'C', 'D')
+  samples <- rep(sample_names, each = ncol(scdata)/4)
+  scdata$samples <- samples
+
+  scdata$scDblFinder.cluster <- scdata$RNA_snn_res.0.8
+
+  cluster_cols <- find_cluster_columns(scdata)
+  expect_setequal(cluster_cols, expected_cols)
+})
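
The test above depends on `mock_scdata()`, which is not part of this diff. As a standalone check of the name filter alone, the sketch below runs the same `grep('^scDblFinder', ...)` call on a made-up vector of column names. All names other than `samples` and `scDblFinder.cluster` are hypothetical.

```r
# Made-up column names (assumption), chosen to probe the pattern's edges
cols <- c(
  "samples",
  "scDblFinder.cluster",
  "scDblFinder.score",
  "custom_scDblFinder",  # prefix appears mid-name, so the ^ anchor skips it
  "scdblfinder.class"    # different case, so the case-sensitive match skips it
)

matched <- grep("^scDblFinder", cols, value = TRUE)
print(matched)

# With no matches, value = TRUE yields an empty character vector,
# so a later setdiff() leaves the column names untouched
no_match <- grep("^scDblFinder", c("samples", "louvain"), value = TRUE)
remaining <- setdiff(c("samples", "louvain"), no_match)
print(remaining)
```

The pattern is anchored and case-sensitive, so only names that begin with the exact `scDblFinder` prefix are excluded. Whether other spellings can occur in uploaded Seurat objects is not addressed in this PR.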