multiple models #112

anke-king · 2024-03-22T15:53:02Z

I would like to train CellTypist on two different datasets. Should I merge the two datasets and train the model once, or train two models and run the annotation twice?

ChuanXu1 · 2024-03-25T22:59:56Z

@anke-king, if you train them separately, you will get two independent models. If you want to combine them for training, you have to unify their annotations so that the cell type names are consistent. Both approaches are feasible (I personally prefer the former, as it's quicker and makes it easy to check the consistency of predictions from the two datasets).
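A minimal sketch of the unification step for the combined route, assuming the labels of each dataset are available as a pandas Series (for example an `.obs` column). The label values and the `name_map` dictionary are hypothetical; the real mapping has to come from your own curation of the two annotations. The combined labels, kept in the same order as the stacked cells, would then be what gets passed as `labels` to `celltypist.train`.

```python
import pandas as pd

# Hypothetical cell type labels from two training datasets
labels_a = pd.Series(["T cell", "B cell", "NK cell"], name="cell_type")
labels_b = pd.Series(["T-cells", "B_cells", "Monocyte"], name="cell_type")

# Hypothetical mapping from dataset-B names onto the naming scheme of dataset A
name_map = {"T-cells": "T cell", "B_cells": "B cell"}


def harmonize(labels: pd.Series, mapping: dict) -> pd.Series:
    """Rename labels via `mapping`; names not in the mapping are kept unchanged."""
    return labels.replace(mapping)


labels_b_unified = harmonize(labels_b, name_map)

# Stack the label vectors in the same order the expression matrices would be stacked
combined = pd.concat([labels_a, labels_b_unified], ignore_index=True)

# Names present in only one dataset, worth a manual check before training
only_a = set(labels_a) - set(labels_b_unified)
only_b = set(labels_b_unified) - set(labels_a)
print(sorted(combined.unique()))
print("only in A:", sorted(only_a), "only in B:", sorted(only_b))
```

Any name that shows up in only one dataset is either a genuinely distinct cell type or a spelling variant that still needs an entry in the mapping.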

anke-king · 2024-03-26T12:16:14Z

Thank you for your reply!
Just for clarification: I have one dataset with cell types for training and a second dataset with cell types that are not in the first dataset. In my target dataset (which I want to annotate with my custom model), I expect to see cell types from both datasets.
So if I do the former, should I run the annotation twice and select the cell type based on the confidence score, or how would I get a consensus annotation?

Thanks!!

ChuanXu1 · 2024-03-30T11:22:43Z

@anke-king, if the cell types in the first and second training datasets are completely different, you can combine them and train a single model. The confidence scores are not comparable across two different models, so if you use two models, you need to inspect the results separately (`celltypist.dotplot` is useful in most cases) and judge based on your knowledge.
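A rough sketch of both cases, using plain pandas. For the single-model case, it first confirms that the two training label sets really do not overlap. For the two-model case, it tabulates each model's predictions against clusters of the target data on their own, without comparing confidence scores across models. `celltypist.dotplot` gives a richer version of this comparison (predictions against a reference column such as clusters); the crosstab here is only a stand-in, and all labels and column names (`cluster`, `pred_model_1`, `pred_model_2`) are hypothetical.

```python
import pandas as pd


def shared_labels(labels_1, labels_2):
    """Cell type names present in both training datasets (empty if disjoint)."""
    return set(labels_1) & set(labels_2)


# Hypothetical training labels for the two datasets
train_1 = ["T cell", "B cell", "T cell"]
train_2 = ["Fibroblast", "Endothelial"]
overlap = shared_labels(train_1, train_2)
print("shared names:", sorted(overlap))  # empty -> labels can be merged as-is

# Two-model case: hypothetical predictions for the same target cells plus a cluster column
obs = pd.DataFrame({
    "cluster": ["0", "0", "1", "1", "2"],
    "pred_model_1": ["T cell", "T cell", "B cell", "T cell", "B cell"],
    "pred_model_2": ["Fibroblast", "Fibroblast", "Fibroblast", "Endothelial", "Endothelial"],
})

# Inspect each model separately: fraction of each cluster assigned to each label.
# Confidence scores are deliberately not compared between the two models.
for col in ["pred_model_1", "pred_model_2"]:
    print(pd.crosstab(obs["cluster"], obs[col], normalize="index"))
```

Deciding which model's label fits a given cluster is then a judgement call based on marker genes and prior knowledge.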

ManuelSokolov · 2024-07-26T11:42:29Z

Hello! After following the suggestion above, how would you recommend plotting the UMAP? In my case, I have two datasets that should contain the same three cell types, but at 24 hours and 72 hours.
My current pipeline is:
```
read 24h dataset --> normalize --> classify with CellTypist
read 72h dataset --> normalize --> classify with CellTypist
combine the normalized 24h and 72h data and apply sc.pp.combat using key 'dataset' (the dataset variable is 24h or 72h)
view the combined UMAP
```

ChuanXu1 · 2024-07-28T21:02:34Z

@ManuelSokolov, you can try different integration methods for these two datasets and see how the CellTypist predictions are overlaid on the UMAP.

ManuelSokolov · 2024-07-28T21:09:01Z

@ChuanXu1, if I understand correctly, you mean:

1. Integrate the 24h and 72h datasets with the reference by dataset (using an integration method) into object X
2. After integration, extract the reference from X and use it for training
3. Extract the 24h data from X --> classify with CellTypist
4. Extract the 72h data from X --> classify with CellTypist

ChuanXu1 · 2024-07-28T21:24:59Z

@ManuelSokolov, the first step is independent of the remaining three. You can annotate your data using CellTypist and add the prediction columns to `.obs`. After that, you should integrate your datasets by trying different methods (Harmony, scVI, etc.).
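A minimal sketch of the "annotate first, integrate afterwards" order, with pandas DataFrames standing in for each dataset's `.obs`. In an AnnData workflow, `celltypist.annotate(...).to_adata()` writes `predicted_labels` and `conf_score` into `.obs` by default, and concatenating with `anndata.concat(..., label="dataset", index_unique="-")` would carry those columns into the combined object before running Harmony, scVI, etc. The cell IDs and labels below are hypothetical.

```python
import pandas as pd

# Stand-ins for the .obs tables after annotating each dataset separately
# (hypothetical cell IDs and labels; in practice these come from CellTypist)
obs_24h = pd.DataFrame(
    {"predicted_labels": ["Type 1", "Type 2", "Type 3"], "conf_score": [0.9, 0.8, 0.7]},
    index=["cell_a", "cell_b", "cell_c"],
)
obs_72h = pd.DataFrame(
    {"predicted_labels": ["Type 1", "Type 1", "Type 3"], "conf_score": [0.95, 0.6, 0.85]},
    index=["cell_a", "cell_d", "cell_e"],
)

# Stack both tables with a 'dataset' key column (24h or 72h)
combined = pd.concat([obs_24h.assign(dataset="24h"), obs_72h.assign(dataset="72h")])

# Make cell IDs unique as "<cell>-<dataset>", since barcodes can repeat across datasets
combined.index = [f"{cell}-{ds}" for cell, ds in zip(combined.index, combined["dataset"])]

# The predictions travel with each cell, so after integration they can be coloured
# on the UMAP; here we only check the label composition per time point
print(pd.crosstab(combined["dataset"], combined["predicted_labels"]))
```

Because the labels are assigned before integration, trying different integration methods changes only the embedding, not the predictions you overlay on it.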
