This repository has been archived by the owner on Jun 23, 2023. It is now read-only.
Improved performance of function lp
Improved the performance of function lp.
In this function, the following two factors had a significant impact on the performance.
The first is that the adjacent vertices were looked up again every time a vertex was processed.
When the number of vertices is N, a single lookup costs O(N). The original code performed this lookup for every vertex, so the total cost was O(N^2), which is very slow.
In this fix, adjacent vertices are now precomputed and retained.
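The precomputation described above can be sketched as follows. This is a minimal Python illustration, not the Julia code from this PR; the `build_neighbors` helper and the toy edge list are hypothetical names and data introduced only for the example, and an undirected graph is assumed.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Build the adjacency list once: vertex -> list of adjacent vertices."""
    neighbors = defaultdict(list)
    for u, v in edges:
        # Assumption: edges are undirected, so record both directions.
        neighbors[u].append(v)
        neighbors[v].append(u)
    return dict(neighbors)

# Hypothetical toy edge list, not data from the PR.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
neighbors = build_neighbors(edges)

# Later passes read neighbors[v] directly instead of rescanning the graph.
for v in sorted(neighbors):
    print(v, neighbors[v])
```

Building the table costs one pass over the edges, after which each per-vertex lookup is a dictionary access rather than a scan over all vertices.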
The other is the format of the counts variable.
Before the fix, the function Stats.counts was used to count the occurrences of labels.
However, if the neighboring labels are far apart in value, there will be many zeroes in the array. (For example, applying the Stats.counts function to [1, 100] would result in [1, 0, ... , 0, 1], resulting in 98 useless elements.)
These zero entries are not only useless but also make it more expensive to find the most frequent label, because the whole array has to be scanned.
With this fix, counts are kept only for labels that actually occur among the adjacent vertices.
(Also, if the label values do not start from 1, the array returned by Stats.counts starts at the smallest label value. For example, if the labels are [2], Stats.counts returns [1] instead of [0, 1]. This was causing a bug, which has also been fixed.)
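To make the two counting issues concrete, here is a small Python sketch. It is not the Julia code from this PR: `dense_counts` is a hypothetical helper that only emulates the dense, range-based counting described above, and `most_common_label` shows one possible sparse alternative that counts only the labels that occur.

```python
from collections import Counter

def dense_counts(labels):
    """Emulate a dense count over the range min(labels)..max(labels)."""
    lo, hi = min(labels), max(labels)
    out = [0] * (hi - lo + 1)
    for x in labels:
        out[x - lo] += 1  # index is relative to the smallest label
    return out

def most_common_label(labels):
    """Count only labels that occur; return (label, count) of the most frequent."""
    return Counter(labels).most_common(1)[0]

dense = dense_counts([1, 100])   # 100 entries, 98 of them zero
offset = dense_counts([2])       # a single entry: index 0 stands for label 2
label, count = most_common_label([7, 100, 7])
print(len(dense), offset, label, count)
```

With the sparse version, the stored key is the label itself, so there is no index offset to account for and no zero entries to scan. How ties between equally frequent labels should be broken is left out of this sketch.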
I benchmarked the function before and after the change.
Before
After
Related issues
Checklist
test/...
test/runtests.jl
docs/src/...
.zenodo.json
I did not make any changes to the tests or the documentation, because I did not change the behavior of the function.
Pinging
Pinging @tpoisot