
Why do we need to ensure the data is normalized to 10,000 counts per cell for correctness? #138

Open
HelloWorldLTY opened this issue Oct 10, 2024 · 1 comment


@HelloWorldLTY

Hi, thanks for your great work! I notice that CellTypist requires the input data to be normalized to 10,000 counts per cell, but when computing the loss, CellTypist does not seem to make any assumption about the input data. Is the 10,000 value only a result of empirical experiments? Can I directly input PCA coordinates for prediction?

Moreover, starting from line 309 in celltypist/classifier.py, it seems that only adata[0:1000] is considered. How was 1000 chosen? Thanks a lot.
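For context, the preprocessing the question refers to is the standard single-cell convention of scaling each cell to 10,000 total counts and then log1p-transforming (what `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` does in scanpy). A minimal NumPy sketch, with `normalize_to_10k` as a hypothetical helper name:

```python
import numpy as np

def normalize_to_10k(counts):
    """Scale each cell (row) to 10,000 total counts, then log1p.

    Equivalent in spirit to scanpy's normalize_total(target_sum=1e4)
    followed by log1p; this is the scale CellTypist expects as input.
    """
    counts = np.asarray(counts, dtype=float)
    per_cell = counts.sum(axis=1, keepdims=True)
    per_cell[per_cell == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / per_cell * 1e4)

raw = np.array([[5, 0, 15],
                [2, 2, 6]])
norm = normalize_to_10k(raw)
# undoing the log1p, each row sums back to 10,000
```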

@ChuanXu1
Collaborator

@HelloWorldLTY, CellTypist records the scaling parameters (mean and SD) for each gene in the training dataset, and then re-applies them to the query data. 10,000 is an arbitrary choice, but CellTypist has to make sure that these parameters, which are derived from 10,000-based data, are applied to data on the same 10,000-based scale.
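The point above can be sketched as follows: the per-gene mean/SD are fitted once on the training data and then re-applied unchanged to the query, so both datasets must be on the same 10,000-count log1p scale for the standardization to be meaningful. This is a hypothetical illustration, not CellTypist's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.poisson(3.0, size=(200, 50)).astype(float)  # stand-in training matrix

# Fit per-gene scaling parameters on the TRAINING data only.
mean = train.mean(axis=0)
sd = train.std(axis=0)
sd[sd == 0] = 1.0  # guard against constant genes

def apply_training_scaling(query):
    """Standardize query genes with the *training* mean/SD, not the query's own.

    If the query were normalized to a different target sum than the
    training data, these stored parameters would no longer match it.
    """
    return (query - mean) / sd

scaled_train = apply_training_scaling(train)
# the training data itself becomes ~zero-mean, unit-SD per gene
```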

For the second question, slicing adata[0:1000] is just a quick way to check whether the data contain negative values, without scanning the full matrix.
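The idea of that shortcut can be sketched like this: inspect only the first 1,000 cells as a cheap heuristic for "does this look like non-negative (count-derived) data". `looks_like_raw_counts` is a hypothetical helper, not part of CellTypist's API:

```python
import numpy as np
from scipy import sparse

def looks_like_raw_counts(X, n_check=1000):
    """Cheap sanity check in the spirit of adata[0:1000]:

    inspect only the first n_check cells for negative values rather
    than the whole matrix. Not exhaustive, but fast and usually enough
    to catch scaled/z-scored input by mistake.
    """
    head = X[:n_check]
    data = head.data if sparse.issparse(head) else np.asarray(head)
    return bool((data >= 0).all())

X = sparse.random(5000, 100, density=0.05, random_state=0, format="csr")
# values from sparse.random lie in [0, 1), so the check passes;
# negating the matrix makes it fail
```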
