Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse vector/matrix: add fast implementation of find_next and find_prev (fixed) #23317

Merged
merged 12 commits into from
Jan 6, 2018

Conversation

tkluck
Copy link
Contributor

@tkluck tkluck commented Aug 18, 2017

(Opening a new pull request instead of re-opening #23312 because I force-pushed and pull requests don't seem to like that.)

Before this commit, find_next() will just use the default implementation of looping over each element. When find_next is called without a function filter as first argument, we know that semantics are to find elements x satisfying x != 0, so for sparse matrices/vectors, we may only loop over the stored elements.

Some care must be taken for stored zero values; that's the reason for the indirection of _sparse_find_next (which only finds the next stored element) and the actual find_next (which does actual non-zero checks).

Before this commit, find_next() will just use the default implementation
of looping over each element. When find_next is called without a
function filter as first argument, we *know* that semantics are to find
elements x satisfying x != 0, so for sparse matrices/vectors, we may
only loop over the stored elements.

Some care must be taken for stored zero values; that's the reason for
the indirection of _sparse_find_next (which only finds the next stored
element) and the actual find_next (which does actual non-zero checks).
@fredrikekre
Copy link
Member

(Opening a new pull request instead of re-opening #23312 because I force-pushed and pull requests don't seem to like that.)

For next time, it is fine to leave incomplete PR's open :)

@fredrikekre fredrikekre added the sparse Sparse arrays label Aug 18, 2017
end
row, col = ind2sub(m, i)
lo, hi = m.colptr[col], m.colptr[col+1]
n = searchsortedfirst(@view(m.rowval[lo:hi-1]), row)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can specify start and stop index to searchsortedfirst instead of using a view:

julia/base/sort.jl

Lines 160 to 172 in a945af3

function searchsortedfirst(v::AbstractVector, x, lo::Int, hi::Int, o::Ordering)
lo = lo-1
hi = hi+1
@inbounds while lo < hi-1
m = (lo+hi)>>>1
if lt(o, v[m], x)
lo = m
else
hi = m
end
end
return hi
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh good find, thanks. Pushing an update.

@@ -1332,6 +1332,42 @@ function findnz(S::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
return (I, J, V)
end

function _sparse_findnext(m::SparseMatrixCSC, i::Int)
if i > length(m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are slightly under-indented, should be 4 spaces rather than 3

@@ -1899,3 +1899,26 @@ end
@test isfinite.(cov_sparse) == isfinite.(cov_dense)
end
end

@testset "sparse findprev/findnext operations" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to include some sparse test arrays with stored zeros

(Use git blame -w for finding the non-whitespace edits to
this code.)
@tkluck
Copy link
Contributor Author

tkluck commented Aug 20, 2017

That AppVeyor failure doesn't seem to be related at all; the code that's failing is

    tfile = tempname()
    io = open(tfile, "w")
    ccall(:jl_dump_compiles, Void, (Ptr{Void},), io.handle)
    eval(expand(Main, :(for i in 1:10 end)))
    ccall(:jl_dump_compiles, Void, (Ptr{Void},), C_NULL)
    close(io)
    tstats = stat(tfile)
    tempty = tstats.size == 0
    rm(tfile)
    @test tempty == true

My guess would be that it's a race condition between close() and stat(). If so, we may be able to fix it by just calling stat(io) instead, which will call fstat (2) under the hood.

@tkluck
Copy link
Contributor Author

tkluck commented Aug 20, 2017

Just opened #23365 for that.

@tkluck
Copy link
Contributor Author

tkluck commented Aug 29, 2017

Given the resolution of #23365, this should be good to go!

@KristofferC
Copy link
Member

Needs a rebase.

@tkluck
Copy link
Contributor Author

tkluck commented Aug 29, 2017

@KristofferC you got it :) let's see what CI says.

@KristofferC
Copy link
Member

Seems like there is still a conflict here.

@tkluck
Copy link
Contributor Author

tkluck commented Aug 29, 2017

Oops, merged master instead of origin/master. Testing+pushing that now.

@fredrikekre fredrikekre requested review from Sacha0 and removed request for Sacha0 September 30, 2017 21:42
@Sacha0
Copy link
Member

Sacha0 commented Oct 1, 2017

Linking to #23812 and #23120 in case this pull request need be updated in accord. Best!

This fixes a few merge conflicts resulting from other additions to
the sparse codebase.
…e explicit

Since we now need explicit predicates [1], this optimization only works
if we know that the predicate is a function that is false for zero
values. As suggested in that pull request, we could find out by calling
`f(zero(eltype(array)))` and hoping that `f` is pure, but I like being
a bit more conservative and only applying this optimization only to the
case where we *know* `f` is equal to `!iszero`.

For clarity, this commit also renames the helper method
_sparse_findnext()  to _sparse_findnextnz(), because now that the
predicate-less version doesn't exist anymore, the `nz` part isn't
implicit anymore either.

[1]: JuliaLang#23812
@tkluck
Copy link
Contributor Author

tkluck commented Nov 3, 2017

Updated this branch to align with the new semantics for find and friends as merged to master from #23812. Very interested to hear your thoughts!

@fredrikekre I accidentally merged from current master without first pulling your prior merge from 30th of September. That merge commit got lost in the process. As far as I can tell, no work of yours got lost through that (it was a clean merge) but if I missed anything, let me know and I'll recover.

@tkluck
Copy link
Contributor Author

tkluck commented Nov 5, 2017

CI failures are libgit related and almost surely unrelated to the work in this branch. Should be good to merge!

@fredrikekre
Copy link
Member

@Sacha0 wanna take a look? :)

@Sacha0
Copy link
Member

Sacha0 commented Nov 5, 2017

@Sacha0 wanna take a look? :)

I have blocked some time this evening for reviews, and will try to work this pull request in then :). Best!

end
end
return j
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the following? :)

function findnext(f::typeof(!iszero), v::AbstractSparseArray, i::Int)
    j = _sparse_findnextnz(v, i)
    while j != 0 && iszero(v[j])
        j = _sparse_findnextnz(v, j+1)
    end
    return j
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! In the interest of expediency, feel free to commit+merge that if you have time.

end
end
return j
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly? :)

function findprev(f::typeof(!iszero), v::AbstractSparseArray, i::Int)
    j = _sparse_findprevnz(v, i)
    while j != 0 && iszero(v[j])
        j = _sparse_findprevnz(v, j-1)
    end
    return j
end

else
return v.nzind[n]
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A compact alternative:

function _sparse_findnextnz(v::SparseVector, i::Int)
    n = searchsortedfirst(v.nzind, i)
    return n <= length(v.nzind) ? v.nzind[n] : 0
end

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo the minor comments! :)

@tkluck
Copy link
Contributor Author

tkluck commented Nov 10, 2017

Just updated with your comments @Sacha0 . I hope you don't mind I skipped replacing if/else/end by ?: (which is arguably a matter of taste), but the code duplication removal is an obvious improvement in clarity. Thanks again!

function _sparse_findnextnz(v::SparseVector, i::Int)
n = searchsortedfirst(v.nzind, i)
if n > length(v.nzind)
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type stability, return zero(indtype(v))? (Likewise below.)

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo the minor type stability comment above, lgtm! Thanks @tkluck! :)

…types

This mostly means returning `zero(indtype(...))` instead of `0` in the
not-found case.  In addition, this commit replaces a few `== 0` checks
by `iszero()` to avoid unnecessary promotions. We could similarly
replace `+ 1` by `+ one(...)` but that becomes cumbersome very quickly.
@tkluck
Copy link
Contributor Author

tkluck commented Nov 14, 2017

Thanks for the review @Sacha0 ! Pushed those updates just now.

@tkluck
Copy link
Contributor Author

tkluck commented Jan 4, 2018

Just added a commit that should resolve the merge conflicts. Do you think we could merge this?

@Sacha0
Copy link
Member

Sacha0 commented Jan 4, 2018

Looks like! Perhaps @fredrikekre and I should make a brief sweep of the rebased version and then merge?

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test failures appear related?

@tkluck
Copy link
Contributor Author

tkluck commented Jan 6, 2018

Looks like. Keeping a pull request open for half a year really makes master a bit of a moving target 😕 I'm happy to fix it, but what can we do to make sure it gets merged quickly after that?

@fredrikekre
Copy link
Member

Sorry this has taken so much time, it looks good to me but @Sacha0 should probably take a look and then we should merge!

@Sacha0
Copy link
Member

Sacha0 commented Jan 6, 2018

If you do not see movement within a few days on a pull request that is approved, passing CI, and not otherwise blocked, please feel welcome to bump! :) Chances are it simply fell off the collective radar and a bump would be appreciated. (Tangentially, having a bot that bumps pull requests under those conditions could work well.)

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for seeing this work through @tkluck! :)

@Sacha0 Sacha0 merged commit 2cfa6a5 into JuliaLang:master Jan 6, 2018
tkluck added a commit to tkluck/julia that referenced this pull request May 12, 2019
…#31354)"

This seems to duplicate work from JuliaLang#23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang#31354 (comment)

This reverts commit e0bef65.
tkluck added a commit to tkluck/julia that referenced this pull request May 12, 2019
…#31354)"

This seems to duplicate work from JuliaLang#23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang#31354 (comment)

This reverts commit e0bef65.
JeffBezanson pushed a commit that referenced this pull request May 23, 2019
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
#31354 (comment)

This reverts commit e0bef65.

Thanks to @mbauman for spotting this issue in
#32007 (comment).
KristofferC pushed a commit that referenced this pull request Jul 16, 2019
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
#31354 (comment)

This reverts commit e0bef65.

Thanks to @mbauman for spotting this issue in
#32007 (comment).

(cherry picked from commit ec797ef)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants