CellTypist providing different results between iterations #126
Comments
@ManuelSokolov, the training process involves several sources of randomness. For example, the first round of training uses SGD, which shuffles the data before each epoch and therefore introduces randomness. If you want a stable model, a better approach is to increase the number of training iterations (e.g., max_iter = 2000), at the cost of a longer runtime.
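To illustrate the point about shuffling, here is a minimal, standard-library-only sketch. It is not CellTypist's code: `sgd_logistic`, the toy data, and the learning rate are all hypothetical and chosen only for illustration. It fits a one-feature logistic regression with SGD and reshuffles the rows before every epoch, so the fitted weights depend on the shuffle seed.

```python
import math
import random

def sgd_logistic(data, seed, epochs=5, lr=0.1):
    """Fit a 1-feature logistic regression with plain SGD.

    Hypothetical helper for illustration; not part of CellTypist.
    The rows are reshuffled before each epoch, so the update order
    (and therefore the final weights) depends on `seed`.
    """
    rng = random.Random(seed)
    rows = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)  # source of run-to-run randomness
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Toy, partly overlapping data: (feature, label)
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]

same_a = sgd_logistic(data, seed=0)
same_b = sgd_logistic(data, seed=0)   # same seed, same shuffle order
others = {sgd_logistic(data, seed=s) for s in range(5)}
```

With a fixed seed the two fits are identical, while different seeds generally give slightly different weights after only a few epochs. This is the kind of variation that can flip the predicted label for borderline cells.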
@ChuanXu1 thank you for your response. The SGD flag is set to False by default, so this randomness should not exist. Is there any other reason that could be driving it? Disabling feature selection during training seems to have removed the randomness in the model.
@ManuelSokolov, the first round of training always uses SGD.
Sorry @ChuanXu1, you seem to have responded before I edited my response. Disabling feature selection seems to have stabilized the results; however, it is difficult to know which result is right or wrong. Please see the message above.
@ManuelSokolov, it is usually recommended to use feature selection to speed up the run and increase the accuracy.
In this case it seems to be reducing accuracy by producing different results across iterations. I also looked into the annotate method: it applies standard scaling before classification, and this option cannot be set to False. What is your recommendation given this example?
Hi! I am doing label transfer from a reference dataset and classifying two query sets that should contain exactly the same cell types. I noticed that, when running several iterations, the classifications differ from one iteration to the next.
As you can see in the next plot, I plotted for each sample (rows) the percentages of predicted cell types (e.g., for the first sample in the graph, across the 25 iterations it was classified as radial glia 40% of the time and as glioblast 60% of the time).
Is this behaviour expected/documented for CellTypist? What is recommended in this case?
Best Regards,
Manuel
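For readers who want to reproduce the per-sample percentages described above, here is a minimal sketch. It assumes the predicted labels from each iteration have already been collected as plain Python lists; `label_frequencies` and the example labels are hypothetical and not part of CellTypist.

```python
from collections import Counter

def label_frequencies(runs):
    """Return, per cell, the fraction of runs that assigned each label.

    `runs[i][j]` is the label predicted for cell j in run i
    (hypothetical layout; every run must cover the same cells).
    """
    n_runs = len(runs)
    return [
        {label: count / n_runs for label, count in Counter(labels).items()}
        for labels in zip(*runs)  # regroup: one tuple of labels per cell
    ]

# 25 mock runs over two cells: the first cell flips between two labels,
# the second is always assigned the same label.
runs = [["radial glia", "neuron"]] * 10 + [["glioblast", "neuron"]] * 15

freqs = label_frequencies(runs)
# Cells that received more than one distinct label across runs
unstable = [i for i, f in enumerate(freqs) if len(f) > 1]
```

Listing the unstable cells this way makes it easier to check whether the disagreement is confined to closely related cell types or spread across the whole query set.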