Concern regarding method use for Xenium data #203

LPotter21 · 2024-12-11T00:25:51Z

Hello all,

We have recently begun working with 10x Xenium data and have been comparing normalization methods for our pipeline. We have noticed oddities in how SCTransform behaves on these data compared with traditional scRNA-seq data. These observations make us doubt the appropriateness of SCTransform for Xenium data, so we wanted to reach out and ask for your opinion.

SCTransform adds a few columns to the Seurat object metadata, including nCount_SCT. According to our understanding, nCount_SCT represents the total "normalized counts" for each cell, and contrasts nicely with the raw counts (nCount_RNA for scRNA-seq, and nCount_Xenium for 10x Xenium).

Plotting the raw counts (nCount_RNA) vs the nCount_SCT allows for a high-level comparison of how the model transformed the counts across cells.
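As a minimal sketch of the comparison described above, the per-cell totals can be computed by summing each cell's column in the raw and corrected gene-by-cell matrices and pairing the results for a scatter plot. The function name `per_cell_totals` and the toy matrices are hypothetical and purely illustrative; they are not taken from Seurat or from any real dataset.

```python
def per_cell_totals(matrix):
    """Sum counts over genes for each cell.

    `matrix` is a list of gene rows; each row holds one value per cell.
    """
    if not matrix:
        return []
    n_cells = len(matrix[0])
    return [sum(row[cell] for row in matrix) for cell in range(n_cells)]


# Toy gene-by-cell matrices (3 genes x 3 cells), invented for illustration.
raw_counts = [
    [3, 0, 10],
    [1, 2, 5],
    [0, 0, 7],
]
corrected_counts = [
    [3, 1, 4],
    [1, 2, 2],
    [0, 0, 3],
]

raw_totals = per_cell_totals(raw_counts)              # analogous to nCount_RNA / nCount_Xenium
corrected_totals = per_cell_totals(corrected_counts)  # analogous to nCount_SCT

# One (raw, corrected) pair per cell: the points of the scatter plot.
pairs = list(zip(raw_totals, corrected_totals))
```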

Using scRNA-seq data from your vignette (replicated as well using our own scRNA-seq experiments) yields a pattern similar to this:

However, using Xenium data, also from your vignette, we see a stratified set of distributions.

This issue is even stronger within our own data, with some samples showing more distinct separation within nCount_SCT.

The concern becomes even more apparent in spatial plots.

There is a grid-like pattern within the physical image data post-SCTransformation, seemingly associated with the different "strata" in the SCT counts seen above. We see similar and stronger patterns within our own data following the same methodology.

Given the patterning, this clearly cannot represent biological variation, so we hope you can provide some insight into whether this behavior is expected and, if so, why.

Lastly, when looking into the counts for specific genes, we saw that genes with zero counts were also given non-zero values following SCTransform. While this makes sense conceptually for scRNA-seq, we are unsure whether such count abundance estimates are appropriate for Xenium, which is an imaging-based, in situ hybridization technology.

Please let us know your thoughts on this as well. Thank you


saketkc · 2024-12-22T18:03:13Z

Hi @LPotter21, thanks for the question. nCount_SCT is the sum total of corrected counts after SCT normalization.
When we calculate the corrected counts, we ‘reverse’ the regression model. In short, using a) the Pearson residuals and b) the regularized model estimates, we reverse the regression model to estimate the per-cell, per-gene counts. To do this, you also need to tell the model the sequencing depth of the cell. Since the goal is to obtain values where the sequencing depth has been accounted for, we use the median UMI count as a reasonable estimate of depth (i.e. the corrected counts are calculated per gene per cell assuming all cells have been sequenced to median depth, with no constraints on the final corrected sequencing depth). This is a reasonable estimate, as we observed higher TPR (controlling for the same FDR) in downstream DE analysis (which we also show in the v2 paper).
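To make the ‘reversal’ concrete, here is a minimal Python sketch for a single gene. It assumes a negative binomial model with a log link and log10 depth as the only covariate; the parameter values (`beta0`, `beta1`, `theta`) and the function names are hypothetical. It illustrates the idea described above and is not the sctransform implementation, which includes further steps such as residual clipping.

```python
import math


def nb_mean(beta0, beta1, depth):
    """Expected count under the assumed model: log(mu) = beta0 + beta1 * log10(depth)."""
    return math.exp(beta0 + beta1 * math.log10(depth))


def nb_variance(mu, theta):
    """Negative binomial variance for mean mu and dispersion theta."""
    return mu + mu ** 2 / theta


def corrected_count(count, depth, median_depth, beta0, beta1, theta):
    """Reverse the regression: keep the Pearson residual, swap in the median depth."""
    # Pearson residual at the cell's observed depth.
    mu_obs = nb_mean(beta0, beta1, depth)
    residual = (count - mu_obs) / math.sqrt(nb_variance(mu_obs, theta))

    # Rebuild a count as if the cell had been sequenced to the median depth.
    mu_med = nb_mean(beta0, beta1, median_depth)
    value = mu_med + residual * math.sqrt(nb_variance(mu_med, theta))

    # Round to an integer and floor at zero.
    return max(0, round(value))


# Hypothetical parameters: expected count is 0.01 * depth, dispersion theta = 10.
BETA0, BETA1, THETA = math.log(0.01), math.log(10), 10.0
MEDIAN_DEPTH = 200

at_median = corrected_count(5, 200, MEDIAN_DEPTH, BETA0, BETA1, THETA)
zero_low_depth = corrected_count(0, 50, MEDIAN_DEPTH, BETA0, BETA1, THETA)
zero_high_depth = corrected_count(0, 800, MEDIAN_DEPTH, BETA0, BETA1, THETA)
```

Under these assumed parameters, a cell already at the median depth keeps its observed count. A zero count in a cell well below the median depth comes back as a positive value, because its residual is only mildly negative relative to the larger median-depth mean. A zero count in a cell well above the median depth stays at zero.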

The goal after normalization is to compare one gene across cells, not to compare total counts per cell. What is your rationale for using nCount_SCT for comparison?
