
Cell Typist Providing different results between iterations #126

Open
ManuelSokolov opened this issue Jul 26, 2024 · 6 comments

Comments

@ManuelSokolov

ManuelSokolov commented Jul 26, 2024

Hi! I am doing label transfer from a reference dataset and classifying two query sets that should contain exactly the same cell types. I noticed that, across several iterations, the classifications differ each time.

import scanpy as sc
import pandas as pd
import celltypist

reference = sc.read_h5ad("data/combined_ref.h5ad")
query1 = sc.read_h5ad("querys/unnorm_sc_C32-24h.h5ad")
query2 = sc.read_h5ad("querys/unnorm_sc_C32-72h.h5ad")

# Normalise to 10,000 counts per cell and log1p-transform, as CellTypist expects
sc.pp.normalize_total(query1, target_sum=1e4)
sc.pp.log1p(query1)

sc.pp.normalize_total(query2, target_sum=1e4)
sc.pp.log1p(query2)

sc.pp.normalize_total(reference, target_sum=1e4)
sc.pp.log1p(reference)

predictions24h = pd.DataFrame()
predictions72h = pd.DataFrame()
predictions24h['id'] = list(query1.obs_names)
predictions72h['id'] = list(query2.obs_names)

features = []

for i in range(25):
    print(f"iteration {i}")
    # Retrain the model from scratch in every iteration
    model2 = celltypist.train(reference, labels='CellClass', n_jobs=10, feature_selection=True)
    if i == 0:
        features = model2.features
    # Keep only the features selected in every iteration so far
    extracted = model2.features
    features = list(set(extracted) & set(features))
    prediction_query1 = celltypist.annotate(query1, model=model2, majority_voting=True)
    prediction_query2 = celltypist.annotate(query2, model=model2, majority_voting=True)
    adata2_query1 = prediction_query1.to_adata()
    adata2_query2 = prediction_query2.to_adata()
    predictions24h[f'run{i}'] = list(prediction_query1.predicted_labels.majority_voting)
    predictions72h[f'run{i}'] = list(prediction_query2.predicted_labels.majority_voting)

As you can see in the next plot, I plotted, for each sample (rows), the percentage of predicted cell types across the 25 iterations (e.g. the first sample in the graph was classified as radial glia 40% of the time and as glioblast 60% of the time).
[Screenshot 2024-07-26, 12:46:56: per-sample percentages of predicted cell types across the 25 iterations]
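(For reference, percentages like these can be computed from the predictions24h DataFrame built in the loop above; a minimal sketch, assuming matplotlib is installed:)

import matplotlib.pyplot as plt

run_cols = [c for c in predictions24h.columns if c.startswith('run')]

# For each cell, the fraction of the 25 iterations assigning each label
label_fractions = (
    predictions24h[run_cols]
    .apply(lambda row: row.value_counts(normalize=True), axis=1)
    .fillna(0)
)

# Stacked bars: one bar per cell, coloured by predicted cell type
label_fractions.plot(kind='bar', stacked=True, width=1.0)
plt.ylabel('fraction of iterations')
plt.xlabel('cell')
plt.show()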
Is this behaviour expected/documented for CellTypist? What is recommended in this case?

Best Regards,

Manuel

@ChuanXu1
Collaborator

@ManuelSokolov, the training process involves various sources of randomness. For example, the first round of training uses SGD, which shuffles the data before each epoch starts and therefore introduces randomness. If you want a more stable model, a better approach is to increase the number of iterations during training (e.g., max_iter = 2000), at the cost of a longer runtime.
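A minimal sketch of that suggestion, reusing the training call from the original post (2000 is just the example value above, not a tuned optimum):

model2 = celltypist.train(
    reference,
    labels='CellClass',
    n_jobs=10,
    feature_selection=True,
    max_iter=2000,  # more training iterations for a more converged, stabler model
)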

@ManuelSokolov
Author

ManuelSokolov commented Jul 28, 2024

@ChuanXu1 thank you for your response. The use_SGD flag is set to False by default, so that source of randomness should not exist. Is there any other reason that could be driving this randomness? Disabling feature selection during training seems to have removed the randomness from the model.
Also, my goal, in addition to stability, is to obtain correct results: a model that classifies wrongly with high confidence scores is not helpful in this case (the UMAP below shows the result of one iteration).
[Screenshot 2024-07-28, 22:23:03: UMAP of predictions from a single iteration with feature selection]
If I disable feature selection, the result is always the same:
[Screenshot 2024-07-28, 22:24:14: UMAP of predictions with feature selection disabled]
However, since the results with and without feature selection seem to be completely different, I am not sure whether I can trust the model. Can you please comment on this?

@ChuanXu1
Collaborator

@ManuelSokolov, the first round of training always uses SGD. use_SGD = False (the default) only applies to the second round of training, after feature selection.
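Conceptually, the two rounds can be sketched with scikit-learn as follows; this is an illustration of the idea only, not CellTypist's actual code, and the toy data, gene count and top_genes value are made up for the example:

import numpy as np
from sklearn.linear_model import SGDClassifier, LogisticRegression

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 2000))              # toy cells x genes matrix
y = rng.choice(['radial glia', 'glioblast'], 500)   # toy cell labels

# Round 1: SGD-based logistic regression on all genes.
# SGD shuffles the data before each epoch, so this step is stochastic.
round1 = SGDClassifier(loss='log_loss')
round1.fit(X, y)

# Feature selection: keep the genes with the largest absolute coefficients.
top_genes = 300  # illustrative number of genes to keep
selected = np.argsort(-np.abs(round1.coef_).max(axis=0))[:top_genes]

# Round 2: refit on the selected genes only. This is the step that
# use_SGD=False (the default) runs as a plain logistic regression.
round2 = LogisticRegression(max_iter=1000)
round2.fit(X[:, selected], y)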

@ManuelSokolov
Author

ManuelSokolov commented Jul 28, 2024

Sorry @ChuanXu1, you seem to have responded before I edited my reply. Disabling feature selection seems to have stabilized the results, but it is difficult to know which result is right or wrong; please see the message above.

@ChuanXu1
Collaborator

@ManuelSokolov, it is usually recommended to use feature selection to speed up the run and increase the accuracy.

@ManuelSokolov
Author

ManuelSokolov commented Jul 28, 2024

In this case it seems to be reducing accuracy by producing different results across iterations. I also looked into the annotate method: it performs standard scaling before classification, and this option cannot be turned off. What is your recommendation given this example?
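As a rough way to quantify that instability, the predictions24h DataFrame from my first post can be summarised as the fraction of iterations agreeing with each cell's most frequent label (a sketch using only pandas):

run_cols = [c for c in predictions24h.columns if c.startswith('run')]

# Most frequent label per cell across the 25 runs, and per-cell agreement with it
modal_label = predictions24h[run_cols].mode(axis=1)[0]
agreement = predictions24h[run_cols].eq(modal_label, axis=0).mean(axis=1)

print(f"mean per-cell agreement: {agreement.mean():.2%}")
print(f"cells with a unanimous call: {(agreement == 1).mean():.2%}")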
