-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse vector/matrix: add fast implementation of find_next and find_prev (fixed) #23317
Conversation
Before this commit, find_next() will just use the default implementation of looping over each element. When find_next is called without a function filter as first argument, we *know* that semantics are to find elements x satisfying x != 0, so for sparse matrices/vectors, we may only loop over the stored elements. Some care must be taken for stored zero values; that's the reason for the indirection of _sparse_find_next (which only finds the next stored element) and the actual find_next (which does actual non-zero checks).
For next time, it is fine to leave incomplete PR's open :) |
base/sparse/sparsematrix.jl
Outdated
end | ||
row, col = ind2sub(m, i) | ||
lo, hi = m.colptr[col], m.colptr[col+1] | ||
n = searchsortedfirst(@view(m.rowval[lo:hi-1]), row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can specify start and stop index to searchsortedfirst
instead of using a view:
Lines 160 to 172 in a945af3
function searchsortedfirst(v::AbstractVector, x, lo::Int, hi::Int, o::Ordering) | |
lo = lo-1 | |
hi = hi+1 | |
@inbounds while lo < hi-1 | |
m = (lo+hi)>>>1 | |
if lt(o, v[m], x) | |
lo = m | |
else | |
hi = m | |
end | |
end | |
return hi | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooh good find, thanks. Pushing an update.
base/sparse/sparsematrix.jl
Outdated
@@ -1332,6 +1332,42 @@ function findnz(S::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti} | |||
return (I, J, V) | |||
end | |||
|
|||
function _sparse_findnext(m::SparseMatrixCSC, i::Int) | |||
if i > length(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are slightly under-indented, should be 4 spaces rather than 3
test/sparse/sparse.jl
Outdated
@@ -1899,3 +1899,26 @@ end | |||
@test isfinite.(cov_sparse) == isfinite.(cov_dense) | |||
end | |||
end | |||
|
|||
@testset "sparse findprev/findnext operations" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to include some sparse test arrays with stored zeros
(Use git blame -w for finding the non-whitespace edits to this code.)
That AppVeyor failure doesn't seem to be related at all; the code that's failing is tfile = tempname()
io = open(tfile, "w")
ccall(:jl_dump_compiles, Void, (Ptr{Void},), io.handle)
eval(expand(Main, :(for i in 1:10 end)))
ccall(:jl_dump_compiles, Void, (Ptr{Void},), C_NULL)
close(io)
tstats = stat(tfile)
tempty = tstats.size == 0
rm(tfile)
@test tempty == true My guess would be that it's a race condition between |
Just opened #23365 for that. |
Given the resolution of #23365, this should be good to go! |
Needs a rebase. |
@KristofferC you got it :) let's see what CI says. |
Seems like there is still a conflict here. |
Oops, merged |
This fixes a few merge conflicts resulting from other additions to the sparse codebase.
…e explicit Since we now need explicit predicates [1], this optimization only works if we know that the predicate is a function that is false for zero values. As suggested in that pull request, we could find out by calling `f(zero(eltype(array)))` and hoping that `f` is pure, but I like being a bit more conservative and only applying this optimization only to the case where we *know* `f` is equal to `!iszero`. For clarity, this commit also renames the helper method _sparse_findnext() to _sparse_findnextnz(), because now that the predicate-less version doesn't exist anymore, the `nz` part isn't implicit anymore either. [1]: JuliaLang#23812
026e479
to
fe4b76e
Compare
Updated this branch to align with the new semantics for @fredrikekre I accidentally merged from current master without first pulling your prior merge from 30th of September. That merge commit got lost in the process. As far as I can tell, no work of yours got lost through that (it was a clean merge) but if I missed anything, let me know and I'll recover. |
CI failures are libgit related and almost surely unrelated to the work in this branch. Should be good to merge! |
@Sacha0 wanna take a look? :) |
I have blocked some time this evening for reviews, and will try to work this pull request in then :). Best! |
base/sparse/abstractsparse.jl
Outdated
end | ||
end | ||
return j | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the following? :)
function findnext(f::typeof(!iszero), v::AbstractSparseArray, i::Int)
j = _sparse_findnextnz(v, i)
while j != 0 && iszero(v[j])
j = _sparse_findnextnz(v, j+1)
end
return j
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! In the interest of expediency, feel free to commit+merge that if you have time.
base/sparse/abstractsparse.jl
Outdated
end | ||
end | ||
return j | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly? :)
function findprev(f::typeof(!iszero), v::AbstractSparseArray, i::Int)
j = _sparse_findprevnz(v, i)
while j != 0 && iszero(v[j])
j = _sparse_findprevnz(v, j-1)
end
return j
end
base/sparse/sparsevector.jl
Outdated
else | ||
return v.nzind[n] | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A compact alternative:
function _sparse_findnextnz(v::SparseVector, i::Int)
n = searchsortedfirst(v.nzind, i)
return n <= length(v.nzind) ? v.nzind[n] : 0
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm modulo the minor comments! :)
Thanks to @Sacha0 for the suggestion!
Just updated with your comments @Sacha0 . I hope you don't mind I skipped replacing |
base/sparse/sparsevector.jl
Outdated
function _sparse_findnextnz(v::SparseVector, i::Int) | ||
n = searchsortedfirst(v.nzind, i) | ||
if n > length(v.nzind) | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For type stability, return zero(indtype(v))
? (Likewise below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modulo the minor type stability comment above, lgtm! Thanks @tkluck
! :)
…types This mostly means returning `zero(indtype(...))` instead of `0` in the not-found case. In addition, this commit replaces a few `== 0` checks by `iszero()` to avoid unnecessary promotions. We could similarly replace `+ 1` by `+ one(...)` but that becomes cumbersome very quickly.
Thanks for the review @Sacha0 ! Pushed those updates just now. |
Just added a commit that should resolve the merge conflicts. Do you think we could merge this? |
Looks like! Perhaps @fredrikekre and I should make a brief sweep of the rebased version and then merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test failures appear related?
Looks like. Keeping a pull request open for half a year really makes |
This is needed in response to JuliaLang#24715.
Sorry this has taken so much time, it looks good to me but @Sacha0 should probably take a look and then we should merge! |
If you do not see movement within a few days on a pull request that is approved, passing CI, and not otherwise blocked, please feel welcome to bump! :) Chances are it simply fell off the collective radar and a bump would be appreciated. (Tangentially, having a bot that bumps pull requests under those conditions could work well.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for seeing this work through @tkluck! :)
…#31354)" This seems to duplicate work from JuliaLang#23317 and it causes performance degradation in the cases that one was designed for. See JuliaLang#31354 (comment) This reverts commit e0bef65.
…#31354)" This seems to duplicate work from JuliaLang#23317 and it causes performance degradation in the cases that one was designed for. See JuliaLang#31354 (comment) This reverts commit e0bef65.
…artesian coordinates (#32007) Revert "sparse findnext findprev hash performance improved (#31354)" This seems to duplicate work from #23317 and it causes performance degradation in the cases that one was designed for. See #31354 (comment) This reverts commit e0bef65. Thanks to @mbauman for spotting this issue in #32007 (comment).
…artesian coordinates (#32007) Revert "sparse findnext findprev hash performance improved (#31354)" This seems to duplicate work from #23317 and it causes performance degradation in the cases that one was designed for. See #31354 (comment) This reverts commit e0bef65. Thanks to @mbauman for spotting this issue in #32007 (comment). (cherry picked from commit ec797ef)
(Opening a new pull request instead of re-opening #23312 because I force-pushed and pull requests don't seem to like that.)
Before this commit,
find_next()
will just use the default implementation of looping over each element. When find_next is called without a function filter as first argument, we know that semantics are to find elements x satisfyingx != 0
, so for sparse matrices/vectors, we may only loop over the stored elements.Some care must be taken for stored zero values; that's the reason for the indirection of
_sparse_find_next
(which only finds the next stored element) and the actualfind_next
(which does actual non-zero checks).