Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix access to layer data for DietSeurat in V5. #8197

Merged
merged 2 commits into from
Dec 17, 2024

Conversation

ddiez
Copy link
Contributor

@ddiez ddiez commented Dec 16, 2023

This fixes #8054 which is caused by a change in satijalab/seurat-object@d6aa9af that changed the behavior of brakets.

@ddiez
Copy link
Contributor Author

ddiez commented May 11, 2024

I want to nudge this PR given that it has been a long time since submitted, several patches to Seurat V5 have been released and recently version 5.1.0 came out without a fix, and people are still having problems as indicated by all the imaginative ways used to overcome this in #8054. Tagging @mojaveazure since the issue was caused by a change in SeuratObject. This shows the problem with the latest version of Seurat. Using DietSeurat on pbmc_small dataset drops the dimensionality reduction objects but all the data objects remain.

> pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, tsne

> DietSeurat(pbmc_small)
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data

Specifying which layers to remove also does not work:

> DietSeurat(pbmc_small, layers="counts")
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data

Internally DietSeurat tries to remove the layers using object[[assay]][[lyr]] <- NULL but this does not work and there is no error, the request is silently ignored:

> pbmc_small[["RNA"]][["scale.data"]] <- NULL
> pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, tsne

The right way to accomplish this given the changes in SeuratObject is to use single brakets:

> pbmc_small[["RNA"]]["scale.data"] <- NULL
> pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features, 20 variable features)
 2 layers present: counts, data
 2 dimensional reductions calculated: pca, tsne

I hope this helps implementing a fix into the main tree. Perhaps there is some reason I cannot see that prevents this fix from working. In that case I apologize. Just wanted to get this from my mind if possible!

@AlbertoFabbri93
Copy link

Is there anything preventing this fix from being merged?

@dcollins15
Copy link
Contributor

dcollins15 commented Dec 13, 2024

@ddiez thank you for the bug fix and apologies for the long delay in addressing this PR! With CI checks now properly configured, we aim to be much more responsive to pull requests going forward.

As soon as #9544 is approved and merged, checks should pass, and this PR will be next in line for merging. 👍

Before that (but after #9544 is merged), there are a couple of housekeeping items to take care of:

  1. Rebase ddiez:fix_DietSeuratV5 onto satijalab:develop
  2. Bump the package version (5.1.0.9008 -> 5.1.0.9009)

Additionally, a simple unit test for DietSeurat would be a great addition. If you have time, please consider adding one (e.g., in tests/testthat/test_diet_seurat.R). If not, no worries—I can add it myself soon. 🤓

One other thought—I have an inkling that we may need to consider the value of options(Seurat.object.assay.version) options("Seurat.object.assay.brackets") for operations like this, though I hope I'm wrong.

The PR's one-year anniversary is Monday—let's aim to celebrate by merging it in! 👶 🎂 🥳 Thanks again for your contribution and patience.

@ddiez
Copy link
Contributor Author

ddiez commented Dec 15, 2024

Thanks @dcollins15 for the update and happy to know it is in the process of being merged. Next week I am busy until Thursday and won't be able to do anything about this. I can do it from Thursday or Friday if by then #9544 has been merged (not yet at the moment it seems). I can take care of the changes, I think. One question, for the rebase, is it enough if I use the "sync fork" button on top of my fork of the repository?

I may try to add a unit test too. I have to take a look first at a few of existing examples first to try to follow the same style as much as possible.

Regarding version, I may need to check again the code in the PR, as I cannot remember much about it, at the moment 😥But I remember seeing some cases before in which doing things depending on the version of the object was necessary. I see there is also the option to use Version(x) to obtain a more fine-grained version number. Would not that be preferred to reading the value in options?

@ddiez
Copy link
Contributor Author

ddiez commented Dec 16, 2024

Oh, for the rebase maybe I can use Update with rebase button at the bottom of this PR?

@dcollins15
Copy link
Contributor

Thanks for the prompt reply @ddiez!

Oh, for the rebase maybe I can use Update with rebase button at the bottom of this PR?

That's the one 🚀

My original message had a typo in it, I meant options("Seurat.object.assay.brackets") not options(Seurat.object.assay.version). The former value controls the behavior of single brackets for Assay and Assay5 instances such that:

library(Seurat)


data(pbmc_small)
test_case <- pbmc_small

options(Seurat.object.assay.brackets = "v5")
# Returns the assay's `counts` layer.
test_case[["RNA"]]["counts"]

options(Seurat.object.assay.brackets = "v3")
# Returns the specified gene values from the assay's default layer.
test_case[["RNA"]]["MS4A1"]

Weirdly, the option does not affect the behavior of single brackets when used as a setter:

options(Seurat.object.assay.brackets = "v5")
# Drops the assay's `counts` layer.
test_case[["RNA"]]["counts"] <- NULL

test_case <- pbmc_small
options(Seurat.object.assay.brackets = "v3")
# Still drops the assay's `counts` layer.
test_case[["RNA"]]["counts"] <- NULL

Which is all to say, your fix works as expected 🚀

I'm most of the way through writing a comprehensive set of unit tests for DietSeurat. Unfortunately, the process has unearthed an additional bug—it is not currently possible to drop the data layer from a v3 Assay instance. The issue is reproduced in the test script below:

library(testthat)
library(Seurat)


# A test fixture to generate a random counts matrix. Column names are prefixed
# with "Cell" and then feature a zero-padded index, e.g. "Cell05". Row names
# are formatted similarly except with a "Gene" prefix, e.g. "Gene08".
get_random_counts <- function(
    n_cells = 10,
    n_genes = 10,
    min_count = 0,
    max_count = 100
) {
  # Generate a random matrix with values between `min_count` and `max_count`.
  raw_counts <- matrix(
    sample(min_count:max_count, n_genes * n_cells, replace = TRUE),
    nrow = n_genes,
    ncol = n_cells
  )
  # Set the appropriate column and row names.
  colnames(raw_counts) <- sprintf("Cell%0*d", nchar(n_cells), seq_len(n_cells))
  rownames(raw_counts) <- sprintf("Cell%0*d", nchar(n_genes), seq_len(n_genes))
  # Convert the counts to a `dgCMatrix`.
  counts <- as.sparse(as.matrix(raw_counts))
  
  return(counts)
}

test_that("`DietSeurat` can drop the `data` layer from an `Assay` instance", {
  counts <- get_random_counts()
  
  assay <- CreateAssayObject(counts)
  assay@data <- NormalizeData(counts)
  assay@scale.data <- ScaleData(assay@data)
  test_case <- CreateSeuratObject(assay)
  
  # Try to retain just the `counts` layer.
  # ToDo: Determine why this is failing.
  layer_names <- "counts"
  result <- DietSeurat(test_case, layers = layer_names)
  expect_equal(Layers(result), layer_names)
  
  # Try to drop the `data` layer.
  # ToDo: Determine why this is failing.
  layer_names <- c("counts", "scale.data")
  result <- DietSeurat(test_case, layers = )
  expect_equal(Layers(result), layer_names)

})

Running the script yields:

── Failure (test_diet_seurat.R:37:3): `DietSeurat` can drop the `data` layer from an `Assay` instance ── 
Layers(result) not equal to `layer_names`.
Lengths differ: 2 is not 1

── Failure (test_diet_seurat.R:43:3): `DietSeurat` can drop the `data` layer from an `Assay` instance ── 
Layers(result) not equal to `layer_names`.
Lengths differ: 3 is not 2
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 0 ]

That is all to say, I'll need to follow up with another fix anyway, so I can introduce unit tests then 👌 As soon as I'm unblocked on #9544 we can shepherd this one on home 🐑 (I think I have commit access to your branch, in which case I'm happy to take care of the actual button pressing)

Copy link
Contributor

@dcollins15 dcollins15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you don't mind that I went ahead and took care of those final two ToDos—I was pretty keen on finishing this PR up on its anniversary 🥳

Thanks again for the fix! I'll tag you in any follow-up PRs I open, in case you're interested 🙂

@dcollins15 dcollins15 merged commit af3cbc2 into satijalab:develop Dec 17, 2024
2 checks passed
@ddiez
Copy link
Contributor Author

ddiez commented Jan 7, 2025

@dcollins15 thanks a lot for taking care of this and sorry for the late reply. I would be happy to get tagged here in case of additional changes needed.

Regarding v3 objects, I reached the same conclusion when briefly testing this previously. I may be wrong in my understanding, but I think that this is by design (?). In v3 objects, even when you create an object with CreateSeuratObject and pass the matrix of counts, the data slot is populated with the raw counts. The data slot is then modified upon applying NormalizeData. When v5 objects came to be they added more flexibility by allowing, in principle, arbitrary layers. And when you create a v5 object (or more precisely a v5 assay object) the data layer is not created by default. It is only created when running NormalizeData. So perhaps it was never intended for data slots to be removed in v3 objects.

I am not sure I understand the need for Seurat.object.assay.brackets option. I thought the code to handle the behavior depends on the implementation in SeuratObject package. And it was the change in that package that caused the problem here. So, any changes that depended on whether v3 or v5 objects are used, would have to be handled there. But perhaps this only indicates my shallow understanding of the code internals.

@AlbertoFabbri93
Copy link

@dcollins15 I think you forgot to mention this fix in the release notes for Seurat 5.2.0

dcollins15 added a commit that referenced this pull request Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants