Skip to content
This repository has been archived by the owner on Jun 23, 2023. It is now read-only.

Improved performance of function lp #188

Merged
merged 5 commits into from
Mar 6, 2021
Merged

Conversation

Inazuma110
Copy link
Contributor

@Inazuma110 Inazuma110 commented Mar 5, 2021

Improved performance of function lp

Improved the performance of function lp.
In this function, the following two factors had a significant impact on the performance.

The first is to retrieve the adjacent vertex each time for each vertex.

linked = filter(s2 -> has_interaction(N,s1,s2), species(N; dims=2))

When the number of vertices is N, the computational complexity of this operation is O(N). In the original code, this operation is performed on all vertices, so the computational complexity is O(N^2), which is very slow.
In this fix, adjacent vertices are now precomputed and retained.

The other is the format of the counts variable.

counts = StatsBase.counts(labels)

Before the fix, the function Stats.counts was used to counting the occurrences of labels.
However, if the neighboring labels are far apart in value, there will be many zeroes in the array. (For example, applying the Stats.counts function to [1, 100] would result in [1, 0, ... , 0, 1], resulting in 98 useless elements.)
This information is not only useless but also computationally expensive when knowing the most occurring labels.
In this fix, the information of non-adjacent labels is not retained.

(Also, in the Stats.counts function, if the label value does not start from 1, the return array will start from the smallest label value. (For example. If the label is [2], the Stats.count function will return [1] instead of [0, 1].) This was causing a bug, which has also been fixed.)

I tried to benchmark before and after the change.

Before

julia> N = convert(BipartiteNetwork, web_of_life("M_PL_015"))
666×131 (String) bipartite ecological network (L: 2933 - Bool)

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  13.05 MiB
  allocs estimate:  55064
  --------------
  minimum time:     940.893 ms (0.00% GC)
  median time:      2.017 s (0.00% GC)
  mean time:        2.298 s (0.11% GC)
  maximum time:     3.937 s (0.19% GC)
  --------------
  samples:          3
  evals/sample:     1

After

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  12.42 MiB
  allocs estimate:  165736
  --------------
  minimum time:     27.190 ms (0.00% GC)
  median time:      39.667 ms (0.00% GC)
  mean time:        43.406 ms (2.12% GC)
  maximum time:     98.638 ms (0.00% GC)
  --------------
  samples:          116
  evals/sample:     1

Related issues

Checklist

  • The code is covered by unit tests
    • Additional cases or module in test/...
    • Relevant lines in test/runtests.jl
  • The code is documented
    • Docstrings written for every function
    • If needed, additional lines in docs/src/...
  • All contributors have added their name and affiliation to .zenodo.json

I did not make any changes to the tests and document, because I did not change the function behavior.

Pinging

Pinging @tpoisot

@tpoisot
Copy link
Member

tpoisot commented Mar 6, 2021

Really nice -- I made a few changes internally to use the functions already in the package, it barely changes the performance (a little less memory is needed). I'll merge - thanks for the great contributions 🎉 !

@tpoisot tpoisot merged commit 307674b into PoisotLab:main Mar 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants