Can not detect a neighborhood graph, will construct one before the over-clustering #111

Open
Sirin24 opened this issue Mar 20, 2024 · 1 comment


@Sirin24

Sirin24 commented Mar 20, 2024

Sorry for the very basic questions. I was following this tutorial with a custom reference: https://colab.research.google.com/github/Teichlab/celltypist/blob/main/docs/notebook/celltypist_tutorial_cv.ipynb#scrollTo=therapeutic-mixture

  1. predictions = celltypist.annotate(adata, model = new_model, majority_voting = True)
    I received the following warning:
    Can not detect a neighborhood graph, will construct one before the over-clustering
    and it has now been running for a long time. What did I need to do other than normalizing and log-transforming? Also, after it finished, I compared the resulting UMAP of the reference with the one from standard Scanpy preprocessing without CellTypist, and the UMAPs are different. What is going on under the hood?

  2. Can the labels argument take multiple labels? The reference has a major_celltype annotation and a more specific_celltype annotation, so how can I transfer both?
    new_model = celltypist.train(adata, labels = 'major_celltype', n_jobs = 10, feature_selection = True)

  3. What does n_jobs mean?

  4. Regarding this part: "Overall, we suggest the users to perform their own feature selection before training to alleviate the training burden." I already have a list of markers for each cell type. How do I use it? Can CellTypist be used with a marker list rather than a reference?

  5. My last question is about the dot plot of original labels versus predicted or majority_voting labels. What does it mean to have a few blue dots? Does it mean the model is weak at predicting these cell types?

@ChuanXu1
Collaborator

@Sirin24,

  1. CellTypist performs an over-clustering step as part of majority voting. If a neighborhood graph is already in place (i.e., from sc.pp.neighbors), CellTypist will use it; otherwise, it runs a standard Scanpy protocol to construct one, and this step is what causes the long runtime. You can set majority_voting = False to skip majority voting, or supply your own neighborhood graph, calculated in advance, for the over-clustering.
  2. You need to train two models separately.
  3. Number of CPUs used for the one-vs-rest logistic regression training (each cell type takes one CPU).
  4. You can take the union of the markers from each cell type and supply the resulting gene list for training. A similar issue is discussed in #107 (Running the CellTypist training function celltypist.train on a subset of genes).
  5. Blue means a probability of <0.5. For example, if you use a blood reference to predict brain cells, all microglia (100%, big dot in the dot plot) in the brain will be assigned to macrophages in the blood as this is the best guess; however, the probability will be low.
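
Points 1, 2 and 4 above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the maintainers' recipe: the helper names (union_markers, add_neighbor_graph, train_per_label) are hypothetical, the PCA and neighbor settings are illustrative choices rather than CellTypist's internal defaults, and passing check_expression = False when training on a gene subset is an assumption based on the subset no longer summing to 10,000 counts per cell. The label columns major_celltype and specific_celltype are taken from the question.

```python
from typing import Dict, Iterable, List


def union_markers(markers_by_type: Dict[str, Iterable[str]],
                  available_genes: Iterable[str]) -> List[str]:
    """Union per-cell-type marker lists, keeping only genes present in the data."""
    wanted = {gene for genes in markers_by_type.values() for gene in genes}
    # Follow the dataset's gene order so the result is deterministic.
    return [gene for gene in available_genes if gene in wanted]


def add_neighbor_graph(adata):
    """Build a neighborhood graph up front so annotate() can reuse it (point 1)."""
    import scanpy as sc  # imported here because it is a heavy dependency

    # Flags variable genes in adata.var; adata.X stays log-normalized.
    sc.pp.highly_variable_genes(adata)
    # Illustrative settings (assumption), not necessarily what CellTypist runs.
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=50)
    return adata


def train_per_label(reference,
                    label_columns=("major_celltype", "specific_celltype"),
                    genes=None):
    """Train one model per annotation column (point 2), optionally on a gene list (point 4)."""
    import celltypist

    if genes is not None:
        # Restrict the reference to the user-supplied genes before training.
        reference = reference[:, list(genes)].copy()
    models = {}
    for column in label_columns:
        models[column] = celltypist.train(
            reference,
            labels=column,
            n_jobs=10,
            # Skip built-in feature selection when a gene list is supplied.
            feature_selection=genes is None,
            # Assumption: a gene subset fails the 10,000-count check, so disable it.
            check_expression=genes is None,
        )
    return models
```

With these helpers, a query object would go through add_neighbor_graph(adata) once, and then celltypist.annotate(adata, model = models['major_celltype'], majority_voting = True) and the same call with models['specific_celltype'] would each reuse the existing graph instead of rebuilding it.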
