Improved performance of function lp #188

Inazuma110 · 2021-03-05T09:12:52Z

Improved performance of function lp

Improved the performance of function lp.
In this function, the following two factors had a significant impact on the performance.

The first is to retrieve the adjacent vertex each time for each vertex.

linked = filter(s2 -> has_interaction(N,s1,s2), species(N; dims=2))

When the number of vertices is N, the computational complexity of this operation is O(N). In the original code, this operation is performed on all vertices, so the computational complexity is O(N^2), which is very slow.
In this fix, adjacent vertices are now precomputed and retained.

The other is the format of the counts variable.

counts = StatsBase.counts(labels)

Before the fix, the function Stats.counts was used to counting the occurrences of labels.
However, if the neighboring labels are far apart in value, there will be many zeroes in the array. (For example, applying the Stats.counts function to [1, 100] would result in [1, 0, ... , 0, 1], resulting in 98 useless elements.)
This information is not only useless but also computationally expensive when knowing the most occurring labels.
In this fix, the information of non-adjacent labels is not retained.

(Also, in the Stats.counts function, if the label value does not start from 1, the return array will start from the smallest label value. (For example. If the label is [2], the Stats.count function will return [1] instead of [0, 1].) This was causing a bug, which has also been fixed.)

I tried to benchmark before and after the change.

Before

julia> N = convert(BipartiteNetwork, web_of_life("M_PL_015"))
666×131 (String) bipartite ecological network (L: 2933 - Bool)

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  13.05 MiB
  allocs estimate:  55064
  --------------
  minimum time:     940.893 ms (0.00% GC)
  median time:      2.017 s (0.00% GC)
  mean time:        2.298 s (0.11% GC)
  maximum time:     3.937 s (0.19% GC)
  --------------
  samples:          3
  evals/sample:     1

After

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  12.42 MiB
  allocs estimate:  165736
  --------------
  minimum time:     27.190 ms (0.00% GC)
  median time:      39.667 ms (0.00% GC)
  mean time:        43.406 ms (2.12% GC)
  maximum time:     98.638 ms (0.00% GC)
  --------------
  samples:          116
  evals/sample:     1

Related issues

Checklist

The code is covered by unit tests
- Additional cases or module in test/...
- Relevant lines in test/runtests.jl
The code is documented
- Docstrings written for every function
- If needed, additional lines in docs/src/...
All contributors have added their name and affiliation to .zenodo.json

I did not make any changes to the tests and document, because I did not change the function behavior.

Pinging

Pinging @tpoisot

tpoisot · 2021-03-06T02:14:34Z

Really nice -- I made a few changes internally to use the functions already in the package, it barely changes the performance (a little less memory is needed). I'll merge - thanks for the great contributions 🎉 !

Inazuma110 and others added 3 commits March 5, 2021 16:51

🐎 Preprocessing with adjacent vertices

8457bc6

🐎 Change the way counts are created.

e788e99

🎨 replace the first loop

6d54eb7

tpoisot added 2 commits March 5, 2021 21:36

🐛 fix double docstring

b8baddd

📦 increase version number

4278496

tpoisot merged commit 307674b into PoisotLab:main Mar 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved performance of function lp #188

Improved performance of function lp #188

Inazuma110 commented Mar 5, 2021 •

edited

Loading

tpoisot commented Mar 6, 2021

Improved performance of function lp #188

Improved performance of function lp #188

Conversation

Inazuma110 commented Mar 5, 2021 • edited Loading

Improved performance of function lp

Related issues

Checklist

Pinging

tpoisot commented Mar 6, 2021

Inazuma110 commented Mar 5, 2021 •

edited

Loading