multiple models #112

Open
anke-king opened this issue Mar 22, 2024 · 7 comments

Comments

@anke-king

I would like to train CellTypist on two different datasets. Should I merge the two datasets and train one model, or train two models and run the annotation twice?

@ChuanXu1
Collaborator

@anke-king, if you train them separately, you will get two independent models. If you want to combine them for training, you have to unify their annotations so that cell type names are consistent. Both approaches are feasible (I personally prefer the former, as it's quicker and makes it easy to check the consistency of predictions from the two datasets).
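If you do combine the datasets, the annotation-unification step could look roughly like the pandas sketch below. This is not a CellTypist API: the column name `cell_type`, the label strings and the `LABEL_MAP` dictionary are hypothetical placeholders for your own metadata (in practice the tables would be each dataset's `.obs`).

```python
import pandas as pd

# Hypothetical per-cell metadata for two training datasets
# (stand-ins for each AnnData's .obs table).
obs_a = pd.DataFrame({"cell_type": ["T cell", "B cell", "Monocyte"]},
                     index=["a1", "a2", "a3"])
obs_b = pd.DataFrame({"cell_type": ["T-cells", "B cells", "NK cell"]},
                     index=["b1", "b2", "b3"])

# Hypothetical mapping from dataset-specific spellings to one shared vocabulary.
LABEL_MAP = {"T-cells": "T cell", "B cells": "B cell"}

def harmonize(obs: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Return a copy of obs with labels renamed; unmapped labels are kept as-is."""
    out = obs.copy()
    out["cell_type"] = out["cell_type"].replace(mapping)
    return out

# Concatenate after harmonizing; 'dataset' records which source each cell came from.
combined = pd.concat([
    harmonize(obs_a, LABEL_MAP).assign(dataset="A"),
    harmonize(obs_b, LABEL_MAP).assign(dataset="B"),
])
print(sorted(combined["cell_type"].unique()))
# ['B cell', 'Monocyte', 'NK cell', 'T cell']
```

The harmonized column would then serve as the label column (the `labels` argument) when training a single model with `celltypist.train`.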

@anke-king
Author

Thank you for your reply!
Just for clarification: I have one dataset with cell types for training and a second dataset with cell types that are not in the first dataset. In my target dataset (which I want to annotate with my custom model) I expect to see cell types from both datasets.
So if I do the former, should I run the annotation twice and select the cell type based on the confidence score, or how would I get a consensus annotation?

Thanks!!

@ChuanXu1
Collaborator

@anke-king, if the cell types in the first and second training datasets are totally different, you can combine them and train a single model. As for the confidence scores, they are not comparable across two different models; so if you use two models, you need to inspect the predictions separately (celltypist.dotplot will be useful most of the time) and judge based on your own knowledge.
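Before merging, a quick check that the two training sets really use disjoint cell-type names can be done in plain Python. A minimal sketch; the label lists are hypothetical stand-ins for each dataset's label column.

```python
def label_overlap(labels_a, labels_b):
    """Return the cell-type names present in both training sets, sorted."""
    return sorted(set(labels_a) & set(labels_b))

# Hypothetical labels from the two training datasets.
train_a = ["T cell", "B cell", "T cell", "Monocyte"]
train_b = ["Hepatocyte", "Cholangiocyte", "Hepatocyte"]

shared = label_overlap(train_a, train_b)
if shared:
    # Same name in both sets: confirm it denotes the same population before merging.
    print("Shared labels:", shared)
else:
    print("No shared label names")
```

Note that disjoint names do not guarantee disjoint biology: the same population may appear under two different names, so the label lists are still worth reviewing by eye.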

@ManuelSokolov

Hello! After following the recommended approach, how do you recommend plotting the UMAP? In my particular case I have two datasets that should contain the same three cell types, but at 24 hours and 72 hours.
My current pipeline is:

  1. Read the 24h dataset --> normalize --> classify with CellTypist
  2. Read the 72h dataset --> normalize --> classify with CellTypist
  3. Combine the normalized 24h and 72h data and apply `sc.pp.combat` using key `'dataset'` (the `dataset` variable is either 24h or 72h)
  4. Inspect the combined UMAP
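Step 3 relies on every cell carrying a `dataset` value and keeping its CellTypist predictions after the merge. The pandas sketch below shows that bookkeeping on per-cell tables; the column names `predicted_labels` and `conf_score` mirror what CellTypist typically writes to `.obs`, and all values and barcodes are hypothetical. With AnnData objects, `anndata.concat(..., label="dataset", keys=["24h", "72h"], index_unique="-")` plays the same role.

```python
import pandas as pd

# Hypothetical CellTypist output per timepoint, one row per cell.
pred_24h = pd.DataFrame({"predicted_labels": ["Type1", "Type2", "Type1"],
                         "conf_score": [0.91, 0.78, 0.66]},
                        index=["cell1", "cell2", "cell3"])
pred_72h = pd.DataFrame({"predicted_labels": ["Type2", "Type3"],
                         "conf_score": [0.88, 0.95]},
                        index=["cell1", "cell2"])

# Stack the two tables; the outer index level records the timepoint.
combined = pd.concat({"24h": pred_24h, "72h": pred_72h}, names=["dataset", "cell"])
combined = combined.reset_index()

# Barcodes can repeat across datasets, so build a unique per-cell name.
combined["obs_name"] = combined["cell"] + "-" + combined["dataset"]
print(combined)
```

The resulting `dataset` column is what a batch-correction call such as `sc.pp.combat(adata, key="dataset")` would use, while the prediction columns travel with each cell into the integrated object.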

@ChuanXu1
Collaborator

@ManuelSokolov, you can try different integration methods for these two datasets and see how the CellTypist predictions are overlaid on the UMAP.

@ManuelSokolov

@ChuanXu1 if I understand correctly you mean:

  1. Integrate the 24h and 72h data with the reference by dataset (using an integration method) into object X
  2. After integrating, extract the reference from X and use it for training
  3. Extract 24h from X --> classify with CellTypist
  4. Extract 72h from X --> classify with CellTypist

@ChuanXu1
Collaborator

@ManuelSokolov, the first step is independent of the remaining three. You can annotate your data using CellTypist and add the prediction columns to .obs. After that, you can integrate your datasets by trying different methods (Harmony, scVI, etc.).
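Alongside coloring the integrated UMAP by prediction and by timepoint (for example `sc.pl.umap(adata, color=["predicted_labels", "dataset"])`), a cross-tabulation of the `.obs` prediction columns gives a numeric view of how cell types distribute across the two timepoints. A minimal pandas sketch; the table below is a hypothetical stand-in for the combined `.obs`.

```python
import pandas as pd

# Hypothetical combined .obs after annotating each timepoint separately
# and concatenating; 'dataset' marks the timepoint.
obs = pd.DataFrame({
    "dataset": ["24h", "24h", "24h", "72h", "72h", "72h"],
    "predicted_labels": ["Type1", "Type2", "Type1", "Type2", "Type3", "Type3"],
})

# Number of cells of each predicted type per timepoint.
counts = pd.crosstab(obs["predicted_labels"], obs["dataset"])

# Proportions within each timepoint, so datasets of different size compare fairly.
props = pd.crosstab(obs["predicted_labels"], obs["dataset"], normalize="columns")
print(counts)
print(props.round(2))
```

Since the predictions are computed before integration, this table does not change whichever integration method you try; only the UMAP layout does.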


3 participants