
Why do we need to ensure the data is normalized to 10,000 counts per cell for correctness? #138

Open
HelloWorldLTY opened this issue Oct 10, 2024 · 1 comment


@HelloWorldLTY

Hi, thanks for your great work! I notice that CellTypist requires the input data to be normalized to 10,000 counts per cell, but when computing the loss, CellTypist does not seem to make any assumption about the input data. Is the 10,000 value only a result of empirical experiments? Can I directly input PCA coordinates for prediction?

Moreover, starting from line 309 in celltypist/classifier.py, it seems that only adata[0:1000] is considered. How was 1000 chosen? Thanks a lot.
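For context, the preprocessing the question refers to is the standard single-cell convention of scaling each cell to 10,000 total counts and then log1p-transforming (what `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` does in scanpy). A minimal NumPy sketch, with `normalize_to_10k` as a hypothetical helper name:

```python
import numpy as np

def normalize_to_10k(counts):
    """Scale each cell (row) to 10,000 total counts, then log1p.

    Equivalent in spirit to scanpy's normalize_total(target_sum=1e4)
    followed by log1p; this is the scale CellTypist expects as input.
    """
    counts = np.asarray(counts, dtype=float)
    per_cell = counts.sum(axis=1, keepdims=True)
    per_cell[per_cell == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / per_cell * 1e4)

raw = np.array([[5, 0, 15],
                [2, 2, 6]])
norm = normalize_to_10k(raw)
# undoing the log1p, each row sums back to 10,000
```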

@ChuanXu1
Collaborator

@HelloWorldLTY, CellTypist records the scaling parameters (mean and SD) for each gene in the training dataset, and then re-applies them to the query data. 10,000 is an arbitrary choice, but CellTypist has to make sure that these parameters, which are derived from 10,000-based data, are applied to data on the same 10,000-based scale.
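The point above can be sketched as follows: the per-gene mean/SD are fitted once on the training data and then re-applied unchanged to the query, so both datasets must be on the same 10,000-count log1p scale for the standardization to be meaningful. This is a hypothetical illustration, not CellTypist's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.poisson(3.0, size=(200, 50)).astype(float)  # stand-in training matrix

# Fit per-gene scaling parameters on the TRAINING data only.
mean = train.mean(axis=0)
sd = train.std(axis=0)
sd[sd == 0] = 1.0  # guard against constant genes

def apply_training_scaling(query):
    """Standardize query genes with the *training* mean/SD, not the query's own.

    If the query were normalized to a different target sum than the
    training data, these stored parameters would no longer match it.
    """
    return (query - mean) / sd

scaled_train = apply_training_scaling(train)
# the training data itself becomes ~zero-mean, unit-SD per gene
```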

For the second question, slicing adata[0:1000] is just a quick way to check whether the data contain negative values, without scanning the full matrix.
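The idea of that shortcut can be sketched like this: inspect only the first 1,000 cells as a cheap heuristic for "does this look like non-negative (count-derived) data". `looks_like_raw_counts` is a hypothetical helper, not part of CellTypist's API:

```python
import numpy as np
from scipy import sparse

def looks_like_raw_counts(X, n_check=1000):
    """Cheap sanity check in the spirit of adata[0:1000]:

    inspect only the first n_check cells for negative values rather
    than the whole matrix. Not exhaustive, but fast and usually enough
    to catch scaled/z-scored input by mistake.
    """
    head = X[:n_check]
    data = head.data if sparse.issparse(head) else np.asarray(head)
    return bool((data >= 0).all())

X = sparse.random(5000, 100, density=0.05, random_state=0, format="csr")
# values from sparse.random lie in [0, 1), so the check passes;
# negating the matrix makes it fail
```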
