[Label projection] scANVI task sees test cells during training #771
Comments
scANVI doesn't actually see the test set labels. As implemented, it's a semi-supervised method; see:
I agree, it doesn't see the labels, but it sees the cells. Does my concern make sense?
There are two modes of operation for the label projection task, I guess:
I assumed this task covered only the first type, but it doesn't have to. However, I'd like to draw a clear distinction between the two types, and I think the current scANVI method implementation falls into the second category.
I agree we should separate these things. For example, de novo integration with scanorama on all data and then training a classifier on training embeddings would fall into (2) in this case.
As far as I'm concerned the models have to be able to use the expression data for the test cells to predict their labels. Whether you use these for semi-supervised or purely supervised is up to you -- I can't imagine a setting where you wouldn't be able to do it semi-supervised.
Agree with @scottgigante-immunai; I actually think this is fine. The benchmark is an evaluation of how these tools would be used from the user's perspective. When using scANVI alone for new data annotation, you need to train on both train+test to annotate a new dataset. The only alternative I can think of is using a forward pass on the most proximal batch, which is an unrealistic modus operandi just to avoid seeing the test batch in training.
In scArches+scANVI, the dataset is split into train/test, and only the train part is used to train the scVI/scANVI model: https://github.com/openproblems-bio/openproblems/blob/main/openproblems/tasks/label_projection/methods/scvi_tools.py#L102
In contrast, the scANVI method trains on all cells: https://github.com/openproblems-bio/openproblems/blob/main/openproblems/tasks/label_projection/methods/scvi_tools.py#L68
If nobody has objections, I will make the scANVI method train only on the train part of the data, and then get the latent dimensions/predictions for the test part.
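To make the difference concrete, here is a toy sketch of the two setups being discussed. It is not scANVI or scvi-tools code: per-gene centering stands in for the unsupervised embedding step, a nearest-centroid rule stands in for the classifier, and every name here (`project_labels`, `use_test_cells`, the example data) is hypothetical. The only point is which cells each fitting step gets to see.

```python
from statistics import mean


def fit_centering(cells):
    """Per-gene mean over the given cells (toy stand-in for fitting an embedding)."""
    n_genes = len(cells[0])
    return [mean(cell[g] for cell in cells) for g in range(n_genes)]


def embed(cells, offsets):
    """Apply the fitted centering to a list of cells."""
    return [[x - o for x, o in zip(cell, offsets)] for cell in cells]


def fit_centroids(embedded, labels):
    """Mean embedding per label (toy stand-in for the supervised part)."""
    groups = {}
    for vec, lab in zip(embedded, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in groups.items()}


def predict(embedded, centroids):
    """Assign each cell the label of its nearest centroid."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(vec, centroids[lab])) for vec in embedded]


def project_labels(train_x, train_y, test_x, use_test_cells):
    # use_test_cells=False: every fitting step sees train cells only.
    # use_test_cells=True: the unsupervised step also sees test expression,
    # but test labels are never passed in either case.
    fit_cells = train_x + test_x if use_test_cells else train_x
    offsets = fit_centering(fit_cells)
    centroids = fit_centroids(embed(train_x, offsets), train_y)
    return predict(embed(test_x, offsets), centroids)


# Made-up expression values for two genes.
train_x = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
train_y = ["B cell", "B cell", "T cell", "T cell"]
test_x = [[0.1, 0.2], [4.8, 5.0]]

train_only = project_labels(train_x, train_y, test_x, use_test_cells=False)
all_cells = project_labels(train_x, train_y, test_x, use_test_cells=True)
```

In this toy the two calls give the same predictions, because centering does not change nearest-centroid distances. With a learned latent space the fitted model would generally differ, which is why it seems worth stating explicitly which setup a method uses.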
cc @LuckyMD @adamgayoso