the count matrices don't contain counts #133

Open
jkobject opened this issue Jan 31, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@jkobject

Hello,

I see that the documentation for the benchmark datasets says the "counts" layer is taken from these datasets. However, when I look at this layer, the values are floats rather than ints, which to me means they are not counts.

The tool I want to benchmark only takes count matrices.

How should I get the count data?

Version information

No response

@jkobject jkobject added the bug Something isn't working label Jan 31, 2024
@jkobject
Author

jkobject commented Feb 5, 2024

To verify that they don't contain counts, run:

import scanpy as sc

# Download (if needed) and load the lung atlas benchmark dataset
adata = sc.read(
    "data/lung_atlas.h5ad",
    backup_url="https://figshare.com/ndownloader/files/24539942",
)
# If the layer held only integer counts, the sum would be a whole number
adata.layers['counts'].sum()

The same applies to the pancreas dataset.
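A more direct check than the sum is to look for stored entries with a fractional part — a minimal sketch, assuming adata is loaded as above and the layer is either a SciPy sparse matrix or a dense array:

import numpy as np
import scipy.sparse as sp

# Check the stored values themselves rather than the dtype
counts = adata.layers["counts"]
vals = counts.data if sp.issparse(counts) else np.asarray(counts).ravel()
n_frac = int(np.count_nonzero(vals % 1))
print(f"{n_frac} of {vals.size} stored entries have a fractional part")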

@adamgayoso
Member

Just having a float dtype does not imply that they are not count data. Most datasets are stored in a float32 format.

For that particular dataset, I would encourage you to read the original scib paper methods section.

If you're using a tool like scVI, it would technically work on data with decimals (e.g., 1.03). The question is whether the non-integer data are meant to represent count data. For example, pseudoaligners can provide probabilistic count values.
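If the decimals came only from probabilistic quantification (e.g., pseudoaligner estimates) and not from depth normalization, rounding to the nearest integer is one possible workaround for a count-only tool — a minimal sketch under that assumption, with adata loaded as above:

import numpy as np
import scipy.sparse as sp

# Only meaningful if the decimals represent probabilistic counts,
# not RPKM/TPM or otherwise depth-normalized values
counts = adata.layers["counts"].copy()
if sp.issparse(counts):
    counts.data = np.rint(counts.data)
else:
    counts = np.rint(counts)
adata.layers["counts_rounded"] = counts  # hypothetical layer name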

@jkobject
Copy link
Author

jkobject commented Feb 5, 2024

Hello Adam,

Thanks for the reply. I understand that even raw counts are often stored as float32, but here I see that some of the datasets included in this combined dataset have values that are not raw counts (i.e., values with decimals).

I have not worked with probabilistic raw counts before. Are you saying that this is the reason why most of the 10x samples have decimal values?

Reading the methods section, it says that some datasets were unavailable as raw counts, so RPKM or TPM values were used instead. Does that mean the counts layer also contains normalized data?

I am not sure how to proceed if the data is depth-normalized. I am working with my own model, which assumes that the counts are true counts.
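One way to narrow this down would be to check each source study separately — a sketch, assuming the combined object carries a per-study column in adata.obs (the column name "batch" used here is a guess and may differ):

import numpy as np
import scipy.sparse as sp

batch_key = "batch"  # assumed column identifying the source study
counts = adata.layers["counts"]
for batch in adata.obs[batch_key].unique():
    mask = (adata.obs[batch_key] == batch).to_numpy()
    sub = counts[mask]
    vals = sub.data if sp.issparse(sub) else np.asarray(sub).ravel()
    n_frac = int(np.count_nonzero(vals % 1))
    print(f"{batch}: {n_frac} of {vals.size} stored entries are non-integer")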
