Switch from leiden to leidenbase #6792

alanocallaghan · 2022-12-16T20:01:47Z

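The Leiden implementation provided by leiden is absurdly slow, far slower than the overhead of calling into Python through reticulate would explain. This switches to the (mostly) equivalent leidenbase. Tests pass locally for me; see also the demo below comparing three Leiden implementations on a 1,500-vertex test graph, where leiden::leiden() takes about 21 s per run versus roughly 40 to 65 ms for leidenbase and leidenAlg.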

See #6754

library("leidenbase")
library("leidenAlg")
library("leiden")
library("igraph")
library("microbenchmark")

fpath <- system.file('testdata', 'igraph_n1500_edgelist.txt.gz', package='leidenbase')
zfp <- gzfile(fpath)
igraph <- read_graph(file = zfp, format='edgelist')

microbenchmark(
    leidenAlg::leiden.community(igraph),
    leidenbase::leiden_find_partition(igraph),
    leiden::leiden(igraph),
    times = 10
)
# Unit: milliseconds
#                           expr         min          lq        mean      median          uq         max neval cld
#       leiden.community(igraph)    52.98505    55.16530    63.16430    55.95267    75.38524    80.10432    10  a
#  leiden_find_partition(igraph)    29.22238    30.54163    39.15174    37.57693    40.30238    56.91094    10  a
#                 leiden(igraph) 17140.52418 20671.12398 21172.11517 21128.94100 22268.67866 24597.89747    10   b


# Run both implementations with the same partition type, resolution and seed
r1 <- leiden_find_partition(igraph,
    partition_type = "RBConfigurationVertexPartition",
    seed = 1234,
    resolution_parameter = 1
)
r2 <- leiden(igraph,
    partition_type = "RBConfigurationVertexPartition",
    resolution_parameter = 1,
    seed = 1234
)
table(r1$membership, r2)
#       1   2   3   4   5   6   7   8   9
#   1 295   0   0   0   0   0   0   0   0
#   2   0 209   0   0   0   0   0   0   0
#   3   0   0 201   0   0   0   0   0   0
#   4   0   0   0 191   0   0   0   0   0
#   5   0   0   0   0 175   0   0   0   0
#   6   0   0   0   0   0 170   0   0   0
#   7   0   0   0   0   0   0  93   0   0
#   8   0   0   0   0   0   0   0  85   0
#   9   0   0   0   0   0   0   0   0  81
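The contingency table above has exactly one non-zero entry per row and per column, which means leidenbase and leiden returned the same partition up to a relabelling of the cluster IDs. As a minimal sketch (the helper name `same_partition` is hypothetical and not part of leidenbase, leiden or Seurat), the same check can be done programmatically in base R:

```r
# Hypothetical helper: TRUE when two membership vectors describe the same
# partition, differing at most by a relabelling of the cluster IDs.
same_partition <- function(a, b) {
  stopifnot(length(a) == length(b))
  # Cross-tabulate the two labelings; a pure relabelling gives a table with
  # exactly one non-zero cell in every row and in every column.
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy checks: a relabelled partition vs. a genuinely different one
same_partition(c(1, 1, 2, 2, 3), c(3, 3, 1, 1, 2))  # TRUE
same_partition(c(1, 1, 2, 2), c(1, 2, 1, 2))        # FALSE
```

With the objects from the demo, `same_partition(r1$membership, r2)` is the one-line version of the table shown above.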

szhorvat · 2022-12-20T07:56:59Z

If you are looking to reduce overhead to a minimum, check whether you rely on any feature not already available in igraph::cluster_leiden(), and if not, just use that function. The various "partition type" settings can be reproduced by setting the vertex_weights values appropriately. See the general form of the objective function in the documentation, where $n$ represents the vertex weights, and compare it with the objective functions of the various "partition types". For example, setting $n_i = k_i / \sqrt{2m}$, where $k_i$ is the vertex degree/strength and $m$ is the edge count/total edge weight, yields the modularity objective function (i.e. RBConfigurationVertexPartition).
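A minimal sketch of this suggestion, assuming igraph's R interface (`cluster_leiden()` with its `objective_function`, `vertex_weights` and `n_iterations` arguments, as in igraph 1.3 and later). igraph's CPM objective is $Q = \frac{1}{2m}\sum_{ij}(A_{ij} - \gamma n_i n_j)\,\delta(\sigma_i, \sigma_j)$, so plugging in $n_i = k_i/\sqrt{2m}$ turns the penalty term into $\gamma k_i k_j / 2m$, i.e. modularity with resolution $\gamma$. The helper name `leiden_modularity_via_cpm` is hypothetical, Zachary's karate club graph stands in for a real SNN graph, and the output has not been compared against leidenalg's RBConfigurationVertexPartition.

```r
library(igraph)

# Hypothetical helper (not part of Seurat or igraph): optimise modularity with
# igraph's native Leiden by expressing it as a CPM objective with
# vertex weights n_i = k_i / sqrt(2m).
leiden_modularity_via_cpm <- function(graph, seed = 1234) {
  k <- strength(graph)      # degree, or strength if edges carry a "weight" attribute
  two_m <- sum(k)           # the strengths sum to 2m (twice the total edge weight)
  n <- k / sqrt(two_m)      # vertex weights that turn CPM into modularity
  set.seed(seed)            # cluster_leiden() draws from R's RNG
  # The resolution is left at its default of 1; the argument is named
  # `resolution` in recent igraph releases and `resolution_parameter` in older ones.
  cluster_leiden(
    graph,
    objective_function = "CPM",
    vertex_weights = n,
    n_iterations = -1       # iterate until the membership stops changing
  )
}

g <- make_graph("Zachary")
res <- leiden_modularity_via_cpm(g)
table(membership(res))
modularity(g, membership(res))
```

Because the partition is still scored with ordinary modularity, `modularity(g, membership(res))` gives a quick sanity check on the result.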

alanocallaghan · 2023-02-20T11:11:40Z

It's a bit out of my wheelhouse to go through and reimplement the different objective functions, to be honest; I just thought this was a useful contribution that doesn't increase the dependency count.

alanocallaghan · 2024-02-27T17:23:17Z

Pinging @saketkc @samuel-marsh as people who seem to be merging code:

Can you have a look at this? Feel free to say you're not interested and close it, or that you want more info/more tests showing equivalence. Either is fine.

samuel-marsh · 2024-02-27T17:30:31Z

I'll second @alanocallaghan's ping: it would be great to have this included!

I’m not a member of the dev team, so my actions are limited here, but I'm adding @Gesmira from the Seurat team.

Best,
Sam

alanocallaghan · 2024-05-31T10:08:40Z

Just think of the aggregate CPU cycles wasted across every Seurat analysis for no gain.

alanocallaghan · 2024-07-12T10:58:47Z

Bumping again to say:

> Feel free to say you're not interested and close it, or that you want more info/more tests showing equivalence. Either is fine.

samuel-marsh · 2024-12-18T14:54:38Z

Hi Seurat Team,

I just thought I would bump this to potentially get new eyes on this in the new year.

It would be really great to have this incorporated, as the Python pass-through is still very slow compared to R-native implementations, and having a faster Leiden implementation would help larger workflows significantly.

@dcollins15 hope you don’t mind me tagging you here, as you’re the one I’ve seen doing most of the merging/code updates lately.

Thanks as always!
Sam

dcollins15 · 2024-12-18T21:58:54Z

Thanks for the poke @samuel-marsh!

> @dcollins15 hope you don’t mind me tagging you here, as you’re the one I’ve seen doing most of the merging/code updates lately.

Please feel free to continue tagging me on things you think are important 👌

As you can probably tell from the recent flurry of activity, I'm planning to submit a CRAN release for v5.2.0 in the next couple of days. Given that this change shouldn't affect any results, I think this is a no-brainer to include. ATM I'm working on merging in #8271—once it lands this change will be next to go 🚀

I'll try to do a quick sanity check of the clustering results before I start rebasing/pushing up documentation updates. The main ToDos are:

- Rebase alanocallaghan:master onto satijalab:develop and resolve conflicts.
- Update the docstring for RunLeiden to drop references to leidenalg and reticulate.
- Regenerate roxygen2 documentation.
- Update changelog.
- Bump version.

@alanocallaghan I'm happy to take care of these updates since you've given us the necessary permissions 🙌

I'll do my best to rebase carefully, but in future it would save us from having to do any Git-jitsu if you could avoid using branch names that conflict with the ones in the main repository (e.g. master, develop); see #8294.


dcollins15 · 2024-12-19T19:20:29Z

@alanocallaghan this turned out to be quite a bit more involved to get running than I initially realized but I think everything is working as expected now 🚀

In addition to the items I laid out above, I also ended up:

- Dropping the check for the leidenalg Python package from RunLeiden.
- Deprecating the method parameter for RunLeiden and FindClusters.
- Fixing up the docstrings for RunLeiden and FindClusters.
- Adding a smoke test for FindClusters.

If @samuel-marsh could give this a quick sanity check that would be fantastic—specifically the way I'm deprecating the method parameter. The smoke test isn't particularly comprehensive but I think it's enough to make us reasonably sure this won't introduce any real regressions—any extra checks you can quickly run are always appreciated 🙏
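The deprecation pattern discussed here can be sketched with the lifecycle package: the deprecated argument defaults to NULL instead of `lifecycle::deprecated()`, and a soft-deprecation warning is raised only when a caller actually supplies it. This is not Seurat's code; `run_leiden_sketch()` is a hypothetical stand-in for RunLeiden(), and the version string is an assumption. One plausible reason for preferring the NULL default is that `deprecated()` evaluates to a missing argument, which code that records every argument of a call (such as command logging) may not handle.

```r
library(lifecycle)

# Hypothetical stand-in for a function like RunLeiden(); NOT Seurat's implementation.
# `method` defaults to NULL rather than lifecycle::deprecated(), so the
# argument always holds an ordinary R value.
run_leiden_sketch <- function(graph, method = NULL) {
  if (!is.null(method)) {
    # Soft deprecation: warns when called directly (e.g. from the global
    # environment) and stays quiet for indirect callers by default.
    deprecate_soft(
      when = "5.2.0",  # assumed version, for illustration only
      what = "run_leiden_sketch(method)",
      details = "The Leiden step no longer depends on `method`."
    )
  }
  # The real clustering call would go here; the sketch just reports the backend.
  "leidenbase"
}

run_leiden_sketch(graph = NULL)                     # no warning
run_leiden_sketch(graph = NULL, method = "igraph")  # signals a deprecation warning
```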

dcollins15

Going to go ahead and merge this now so I can continue on with the release 🚀 Please do let me know if you spot any issues

alanocallaghan · 2024-12-19T23:56:22Z

Awesome, thanks! Sorry it was more work than expected. I'd of course have been happy to do the small extras, but I usually default to changelog and documentation changes being done by the package authors for consistency.

Hope the next release goes smoothly for yous!

mauritsunkel mentioned this pull request Apr 24, 2023

How to use leidenbase instead of Python based 'leiden algorithm' implementation? #7212

Closed

dcollins15 force-pushed the develop branch from a87fd5f to 41d19a8 November 28, 2023 20:43

dcollins15 self-requested a review March 1, 2024 20:32

alanocallaghan and others added 8 commits December 18, 2024 20:29

Switch from leiden to leidenbase

708973f

Add leidenbase to DESCRIPTION

0d9c2a3

Remove import

e07f4ae

Fixup leidenbase::leiden_find_partition call

73d2e9f

Drop py_module_available check for leidenalg

4ed1b19

Deprecate the method parameter for RunLeiden

8298ead

Avoid random.seed <= 0 in RunLeiden

80c9f59

Tidy RunLeiden

8cec034

dcollins15 force-pushed the master branch from 0b8c675 to e07f4ae December 19, 2024 17:18

dcollins15 added 8 commits December 19, 2024 13:27

Update docstring for RunLeiden

f3e31a6

Deprecate method parameter for FindClusters

a08b238

Update default method value for FindCluster.Seurat

e9452c5

LogSeuratCommand doesn't like deprecated(), use NULL instead

Add smoke test for FindClusters

deb8e36

Expand FindClusters smoke test with spot checks

375c188

Update docs

e516544

Update changelog

6c0184e

Bump version

7b0d53d

dcollins15 approved these changes Dec 19, 2024


dcollins15 merged commit 6b1c25a into satijalab:develop Dec 19, 2024
2 checks passed
