celltypist before/after batch correction #119

Open
malonzm1 opened this issue May 2, 2024 · 10 comments

@malonzm1

malonzm1 commented May 2, 2024

Hi,

I perform batch correction using scVI, but I run the celltypist prediction before batch correction. Is it better to run celltypist after batch correction, or does it not matter?

Good day.

@ChuanXu1
Collaborator

ChuanXu1 commented May 3, 2024

@malonzm1, predicted_labels depends only on the gene expression matrix, but majority_voting will be influenced by the neighborhood graph if that graph is constructed from the scVI latent space.
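
To make the distinction concrete, here is a simplified, hypothetical sketch of majority voting in plain Python. It is not CellTypist's actual code, which over-clusters the data and offers extra options such as a minimum-proportion threshold. The per-cell predictions stay fixed and only the cluster assignments change, as they would if the neighborhood graph were built from a different representation such as the scVI latent space. The function name majority_vote and the toy labels are made up for illustration.

```python
from collections import Counter


def majority_vote(predicted_labels, clusters):
    """Relabel each cell with the most common predicted label in its cluster.

    predicted_labels: per-cell labels from the classifier (expression only).
    clusters: per-cell cluster ids (these depend on the neighborhood graph).
    """
    if len(predicted_labels) != len(clusters):
        raise ValueError("predicted_labels and clusters must have the same length")
    # Count the predicted labels inside each cluster.
    counts = {}
    for label, cluster in zip(predicted_labels, clusters):
        counts.setdefault(cluster, Counter())[label] += 1
    # most_common(1) picks the dominant label; ties go to the label seen first.
    winner = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    return [winner[c] for c in clusters]


predicted = ["T cell", "T cell", "B cell", "B cell", "B cell", "T cell"]

# Same per-cell predictions, two different clusterings (e.g. a graph built
# from uncorrected PCA vs. one built from an scVI latent space).
clusters_a = [0, 0, 0, 1, 1, 1]
clusters_b = [0, 0, 1, 1, 1, 0]

print(majority_vote(predicted, clusters_a))
print(majority_vote(predicted, clusters_b))
```

With identical per-cell predictions, the two clusterings give different majority-voting labels, which is why only majority_voting is sensitive to how the graph was built.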

@malonzm1
Author

malonzm1 commented May 3, 2024

Thanks!

@malonzm1 malonzm1 closed this as completed May 3, 2024
@malonzm1
Author

malonzm1 commented May 8, 2024

Is majority_voting more reliable if celltypist is run after batch correction?

@malonzm1 malonzm1 reopened this May 8, 2024
@ChuanXu1
Collaborator

ChuanXu1 commented May 8, 2024

@malonzm1, it depends, but majority_voting is usually more readable.
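
One way to judge whether majority voting is trustworthy for a given clustering (a heuristic, not something stated in this thread) is to check how dominant the top predicted label is within each cluster. The sketch below is hypothetical plain Python. The min_prop name mirrors CellTypist's annotate parameter of that name, which, to my understanding, assigns "Heterogeneous" to subclusters whose dominant label falls below the threshold; the 0.75 value used here is arbitrary.

```python
from collections import Counter


def cluster_purity(predicted_labels, clusters):
    """Return {cluster: (dominant_label, fraction_of_cells_with_it)}."""
    groups = {}
    for label, cluster in zip(predicted_labels, clusters):
        groups.setdefault(cluster, Counter())[label] += 1
    result = {}
    for cluster, cnt in groups.items():
        label, n = cnt.most_common(1)[0]
        result[cluster] = (label, n / sum(cnt.values()))
    return result


def flag_mixed(purity, min_prop=0.75):
    """Clusters whose dominant label is below min_prop (hypothetical threshold)."""
    return sorted(c for c, (_, frac) in purity.items() if frac < min_prop)


predicted = ["T cell", "T cell", "T cell", "B cell",
             "B cell", "T cell", "NK cell", "B cell"]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]

purity = cluster_purity(predicted, clusters)
print(purity)              # {0: ('T cell', 0.75), 1: ('B cell', 0.5)}
print(flag_mixed(purity))  # [1]
```

A mixed cluster such as cluster 1 is where majority voting overrides many per-cell predictions, so its label deserves a closer look.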

@smallsmalltown

@ChuanXu1 Based on what you've described, it seems that batch effects will not affect predicted_labels, but they can influence the majority_voting results? Also, after applying Harmony to remove batch effects, my data triggered the warning "Invalid expression matrix in .X, expect log1p normalized expression to 10000 counts per cell; will use .raw.X instead."

@ChuanXu1
Collaborator

@smallsmalltown, as far as I remember, Harmony does not change the expression values; it only produces a corrected latent space. To predict your data using CellTypist, you need to provide normalized gene expression (log1p-normalized to 10000 counts per cell) in either .X or .raw.X.
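
For reference, the expected input can be produced and sanity-checked with a short NumPy sketch, assuming cells are rows of a dense counts matrix. It scales each cell to 10,000 total counts and applies log1p, which is what the warning asks for; in Scanpy the usual equivalent is sc.pp.normalize_total(adata, target_sum=1e4) followed by sc.pp.log1p(adata). The function names are hypothetical, and looks_log1p_normalized is a heuristic, not CellTypist's exact validation logic.

```python
import numpy as np


def log1p_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for cells with no counts.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)


def looks_log1p_normalized(X, target_sum=1e4, tol=1.0):
    """Heuristic: undoing log1p should give rows summing to ~target_sum."""
    row_sums = np.expm1(np.asarray(X, dtype=float)).sum(axis=1)
    return bool(np.all(np.abs(row_sums - target_sum) <= tol))


raw = np.array([[1, 0, 3], [10, 5, 5]])
X = log1p_normalize(raw)

print(looks_log1p_normalized(raw))  # False
print(looks_log1p_normalized(X))    # True
```

A Harmony embedding is not an expression matrix, so it cannot satisfy this check; .X (or .raw.X) still has to hold the normalized genes-by-cells values.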

@Flu09

Flu09 commented Aug 8, 2024

@ChuanXu1 Can you explain more about the latent space idea and Harmony? If I integrate using Harmony in R, convert my object to h5ad, and then provide celltypist with its normalized .X, which would be better: predicted_labels or majority voting? Will celltypist use the latent space of the samples at all?

@ChuanXu1
Collaborator

ChuanXu1 commented Aug 8, 2024

@Flu09, celltypist does not use the latent space to predict cell types; that is, predicted_labels is independent of the latent space. The majority_voting result, however, may be affected by the latent space, because majority voting relies on the clustering, which is influenced by the latent space.
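
The point can be illustrated with a toy k-nearest-neighbour graph in NumPy: the same cells get different neighbours depending on which representation the graph is built from, while a classifier that only reads the expression matrix would be unaffected. The coordinates below are invented. In a Scanpy workflow the graph would typically be built with sc.pp.neighbors(adata, use_rep=...), where the embedding key (for example "X_harmony" after converting from R) is an assumption that depends on how the object was converted. My understanding is that CellTypist reuses a neighborhood graph already present in the object for its over-clustering, but that is worth verifying in its documentation.

```python
import numpy as np


def knn_indices(rep, k=2):
    """Indices of each cell's k nearest neighbours (Euclidean), excluding itself."""
    rep = np.asarray(rep, dtype=float)
    # Pairwise distance matrix, shape (n_cells, n_cells).
    d = np.linalg.norm(rep[:, None, :] - rep[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]


# Toy data: 4 cells of 2 types (A = cells 0, 2; B = cells 1, 3),
# measured in 2 batches (cells 0, 1 in batch 1; cells 2, 3 in batch 2).
# Uncorrected space: the batch shift along the second axis dominates.
uncorrected = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 10.0], [1.0, 10.0]])
# Hypothetical corrected space: batch shift removed, cell types separate.
corrected = np.array([[0.0, 0.0], [5.0, 0.0], [0.1, 0.0], [5.1, 0.0]])

print(knn_indices(uncorrected, k=1).ravel())  # [1 0 3 2]: same batch, other type
print(knn_indices(corrected, k=1).ravel())    # [2 3 0 1]: same type, other batch
```

Clustering on the first graph would group cells by batch, and on the second by cell type, so majority voting inherits whichever structure the chosen representation encodes.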

@Flu09

Flu09 commented Aug 17, 2024

I see, thank you. But what if I combine two studies and notice that the overall counts in one study are lower than in the other? Should the celltypist annotation be done on each study separately?

@ChuanXu1
Collaborator

@Flu09, it's safer to do this for each dataset separately to ensure sufficient gene overlap between your data and the model used.
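
A quick way to see why this matters is to compute, per dataset, the fraction of the model's genes that the data contain. The sketch below is hypothetical plain Python with made-up gene lists; in practice the model genes would come from the CellTypist model and the dataset genes from each study's var_names. Merging studies often keeps only shared genes (for example, anndata.concat defaults to an inner join), which can lower the overlap for the better-covered study.

```python
def gene_overlap(dataset_genes, model_genes):
    """Fraction of the model's genes that are present in the dataset."""
    model = set(model_genes)
    if not model:
        raise ValueError("model_genes is empty")
    return len(model & set(dataset_genes)) / len(model)


# Hypothetical gene lists for illustration only.
model_genes = ["CD3E", "CD19", "MS4A1", "NKG7", "LYZ"]
study_1 = ["CD3E", "CD19", "MS4A1", "NKG7", "LYZ", "GAPDH"]
study_2 = ["CD3E", "CD19", "GAPDH"]

# Annotating each study on its own keeps each study's full overlap.
print(gene_overlap(study_1, model_genes))  # 1.0
print(gene_overlap(study_2, model_genes))  # 0.4

# Restricting a merged object to shared genes can only lower the overlap.
shared = set(study_1) & set(study_2)
print(gene_overlap(shared, model_genes))   # 0.4
```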

4 participants