Leiden Clustering produces too many clusters #3749
Comments
@Intron7 The Leiden algorithm works as follows: the nodes in a graph first move to form temporary clusters that maximize the modularity (similar to the modularity optimization phase in the Louvain algorithm). Afterwards, the nodes within each such temporary cluster are checked to see whether they are strongly connected (according to the Leiden algorithm). If they are not, the nodes in that temporary cluster are not merged to form aggregated clusters. One can change the thresholds so that relatively weakly connected nodes keep forming Leiden clusters. |
I know that Leiden and Louvain produce different clusterings. But when the number of Leiden clusters jumps from 50 to over 1000 between 23.04 and 23.06, I think there might be some unintentional changes. In my testing it looks like the issue has to do with |
Thank you for sharing nice plots. Could you try to run with smaller resolution? |
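To make the modularity objective and the resolution suggestion above concrete, here is a minimal pure-Python sketch. It is not cuGraph code: the function name `modularity`, the toy graph, and the two candidate partitions are illustrative assumptions. It uses the standard generalized modularity Q = Σ_c [e_c / m − γ (K_c / 2m)²] for an undirected, unweighted graph.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Generalized modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict node -> community id.
    Illustrative helper only, not part of cuGraph.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of community c
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (assumed for illustration): two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
singletons = {n: n for n in range(6)}

for gamma in (1.0, 3.0):
    print(gamma,
          modularity(edges, two_clusters, gamma),
          modularity(edges, singletons, gamma))
```

On this toy graph the two-triangle partition scores higher at resolution 1.0, while at resolution 3.0 the all-singletons partition scores higher than it. That is the general reason a smaller resolution tends to yield fewer, larger clusters; it does not by itself explain a jump between library versions.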
Thank you for sharing additional plots. Is your dataset public? If so, we would like to run it on our end to figure out a bit more about it. |
https://github.com/Intron7/rapids_singlecell/blob/main/notebooks/demo_gpu-seuratv3.ipynb It's this notebook. I can also create the anndata object needed (with all the preprocessing done), upload it to Google Drive, and provide the link. |
It would be great if you could kindly upload the data (the input to the Leiden algorithm) to Google Drive. I would like to run it locally. |
I also have the same issue with my own, separate data. A solution for Leiden clustering would be super helpful! |
@johnhickey22 - which version of cugraph are you using? |
@ChuckHastings - I am using 23.06.02 - let me know if you need anything else. |
@Intron7 |
You can already see this behaviour with
The CPU version of Leiden gives me 9 clusters. The bug still exists in RAPIDS 23.08. |
Any update here - was it solved in another thread? Just wanted to check in, in case I missed something. Would love to integrate this into my analysis pipeline and just waiting for this clustering issue to be solved. Thanks! |
Hi, |
Hi, |
- Normalization factor was missing in the equation to decide if a node and a refined community is strongly connected inside their Louvain community. This PR adds that factor.
- Disable random moves in the refinement phase. We plan to expose a flag to enable/disable random moves in a future PR.
- Adds new function to flatten Leiden dendrogram as dendrogram flattening process needs additional info to unroll hierarchical leiden clustering

Closes #3850
Closes #3749

Authors:
- Naim (https://github.com/naimnv)
- Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Seunghwa Kang (https://github.com/seunghwak)
- Brad Rees (https://github.com/BradReesWork)

URL: #3990
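As a rough illustration of why a missing normalization factor could produce far too many clusters, the sketch below implements the well-connectedness test from the Leiden paper in its modularity form: a node v is considered for merging within community C only if E(v, C − v) ≥ γ · k_v · (K_C − k_v) / (2m). This is not the cuGraph implementation. The function name, its parameters, and the assumption that 1/(2m) is the factor the PR restored are all hypothetical.

```python
def is_well_connected(edge_weight_to_rest, node_degree, community_degree,
                      total_edge_weight, resolution=1.0, normalize=True):
    """Leiden-style well-connectedness test (illustrative sketch only).

    edge_weight_to_rest: weight of edges from the node to the rest of its community
    node_degree:         weighted degree k_v of the node
    community_degree:    total weighted degree K_C of the community (includes the node)
    total_edge_weight:   m, total edge weight of the graph
    normalize:           hypothetical switch; False mimics dropping the 1/(2m) factor
    """
    threshold = resolution * node_degree * (community_degree - node_degree)
    if normalize:
        threshold /= 2.0 * total_edge_weight
    return edge_weight_to_rest >= threshold


# Made-up numbers: a degree-10 node with 8 edges into a community of total
# degree 200, in a graph with 1000 edges.
print(is_well_connected(8, 10, 200, 1000))                   # threshold 0.95
print(is_well_connected(8, 10, 200, 1000, normalize=False))  # threshold 1900
```

With the normalization the node clearly passes the test; without it the threshold is about three orders of magnitude too large, so almost no node would qualify for merging and most would remain in their own refined communities. That is consistent with the cluster counts reported in this issue, but it is only a plausible reading of the fix, not a confirmed one.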
Looks good to me now |
@Intron7 Thank you for reporting. |
Version
23.06 - 23.06.02
Which installation method(s) does this occur on?
Conda
Describe the bug.
Leiden clustering produces over 1000 clusters. This is in contrast to 23.04, where I got around 30-40 for the same test dataset. Louvain clustering gives me 23 clusters.
Minimum reproducible example
Relevant log output
No response
Environment details
Other/Misc.
This happens on 4 tested systems with both 3090 and A100s