create Accumulate iterator #25766
base: master
Conversation
Force-pushed from 537efaf to d664815
base/accumulate.jl (outdated)

```julia
rcum_promote_type(op, ::Type{Array{T,N}}) where {T,N} = Array{rcum_promote_type(op,T), N}

# accumulate_pairwise slightly slower then accumulate, but more numerically
# stable in certain situations (e.g. sums).
```
"stable" is the wrong word here. Use "accurate"
base/accumulate.jl (outdated)

```julia
# accumulate_pairwise slightly slower then accumulate, but more numerically
# stable in certain situations (e.g. sums).
# it does double the number of operations compared to accumulate,
```
"double the number of `op` calls" would be more informative.
base/accumulate.jl (outdated)

```julia
end

function cumsum!(out, v::AbstractVector, dim::Integer)
    # we dispatch on the possibility of numerical stability issues
```
Rephrase to "on the possibility of roundoff errors".
(Again, this is misusing the term "numerical stability": even naive summation is backwards stable, it is just less accurate than pairwise summation.)
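To make the accuracy point concrete, here is a minimal sketch (mine, not part of the PR; `naive_sum` is a made-up helper) comparing plain left-to-right accumulation with Base's pairwise `sum`. Both are numerically stable algorithms; the naive one simply accumulates more roundoff:

```julia
xs = fill(0.1f0, 10^7)           # ten million copies of Float32(0.1)

naive_sum(v) = foldl(+, v)       # plain left-to-right accumulation in Float32
reference = sum(Float64.(xs))    # reference value computed in double precision

abs(naive_sum(xs) - reference)   # sizeable error from the naive loop
abs(sum(xs) - reference)         # Base.sum uses pairwise summation: much smaller error
```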
base/accumulate.jl (outdated)

```julia
itrstate, accval = accstate
val, itrstate = next(acc.iter, itrstate)
if accval === uninitialized
    accval = reduce_first(acc.op, val)
```
So, this is type-unstable even for things like `Accumulate(+, itr)`?
Yes, but the new Union optimisations mean that there is no performance hit.
As I understand it, the `Union` optimizations help with cases like this, but retaining type stability will still be faster. Worth Nanosoldiering or otherwise benchmarking to be sure, though.
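For context, a small sketch (mine, not from the PR; `f` and `g` are throwaway names) of the distinction being discussed: a small `Union` return type is handled well by the 0.7 Union-splitting optimizations, but a concretely inferred type still gives the compiler the most to work with:

```julia
# f returns Union{Nothing, Int} for Int input: handled by Union splitting,
# but callers still carry a branch for the two cases.
f(x) = x > 0 ? x : nothing

# g returns a concrete Int for Int input: nothing to split at all.
g(x) = x > 0 ? x : zero(x)

Base.return_types(f, (Int,))   # inferred: Union{Nothing, Int64}
Base.return_types(g, (Int,))   # inferred: Int64
```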
In most cases, the only type instability will be in the first call to `next`, which is typically unrolled, so we should be able to avoid that.

There are type stability issues, e.g.:

```julia
julia> collect(Base.Accumulate(+, 1., Int[]))
0-element Array{Real,1}

julia> collect(Base.Accumulate(+, 1., Int[1]))
1-element Array{Float64,1}:
 2.0
```
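A guess at where the `Array{Real,1}` in the empty case comes from (an illustration, not the PR's actual code path): with no values to look at, the element type falls back to the type join of the possibilities (`Float64` from `v0`, `Int` from the iterator) rather than their numeric promotion:

```julia
julia> typejoin(Float64, Int)      # smallest common supertype: this is the Real seen above
Real

julia> promote_type(Float64, Int)  # what arithmetic promotion would have given
Float64
```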
Force-pushed from d664815 to 520ed02
This makes it possible to use the `collect` machinery for determining output types. Also moves code out of multidimensional.jl (which is a bit of an odd place for it). Still need to get slicing working.
Force-pushed from 520ed02 to 8bd8cae
Okay, this should now fix some of the type stability issues. In summary: it fixes #25506, and generally makes all the promotion machinery work without falling back on [...]. I've also changed it so that [...]. The main question is whether we want [...].
@nanosoldier

@nanosoldier

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Ah, the performance hit seems to be that [...]

@nanosoldier

@nanosoldier

👍 to the general idea here. I would have loved to review this in more detail, but haven't had the time and it's unlikely that I will have today.
base/deprecated.jl (outdated)

```
@@ -1407,6 +1407,8 @@ end

@deprecate which(s::Symbol) which(Main, s)

@deprecate accumulate!(op, dest::AbstractArray, args...) accumulate!(dest, op, args...)
```
Why `dest` first? Function arguments have highest priority.
good point.
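For reference, a quick sketch (not from the PR) of the Base convention the reviewer is pointing to: the function argument comes first, even before the destination, as in `map!`, which also keeps `do`-block syntax working:

```julia
dest = zeros(Int, 3)
map!(abs, dest, [-1, -2, -3])   # function first, then destination, then source
# dest == [1, 2, 3]

# Because the function is the first argument, do-block syntax works too:
map!(dest, [-1, -2, -3]) do x
    2x
end
# dest == [-2, -4, -6]
```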
test/arrayops.jl (outdated)

```julia
    @inferred accumulate(*, String[])
    @test accumulate(*, ['a' 'b'; 'c' 'd'], 1) == ["a" "b"; "ac" "bd"]
    @test accumulate(*, ['a' 'b'; 'c' 'd'], 2) == ["a" "ab"; "c" "cd"]
end
```
Adding

```julia
@inferred accumulate(+, 1.0, Int[])
@test eltype(accumulate(+, 1.0, Int[])) == Float64
```

would be good. (Probably not in this testset.)
base/accumulate.jl (outdated)

```julia
function accumulate!(dest, op, v0, X, dim::Integer)
    dim > 0 || throw(ArgumentError("dim must be a positive integer"))
    axes(A) == axes(B) || throw(DimensionMismatch("shape of source and destination must match"))
```
What are `A` and `B`? They should be `dest` and `X`.
I think we also need more `accumulate!` tests. Historically `accumulate!` would be implicitly called anyway by each `accumulate` test, so this was not necessary. AFAICT there is only a single `accumulate!` test now.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Force-pushed from dc20816 to c5f5d47
Looking only at the 1D case for now, I'm getting:

```julia
julia> A = rand(100);

julia> @btime Base.accumulate_pairwise(Base.add_sum, undef, $A, Base.HasShape{1}()); # this is what cumsum(A) does
  771.947 ns (9 allocations: 1.06 KiB)

julia> @btime Base._accumulate(Base.add_sum, undef, $A, Base.HasShape{1}(), nothing); # this uses the Accumulate iterator
  175.124 ns (2 allocations: 912 bytes)
```

For comparison, on master:

```julia
julia> @btime cumsum($A);
  147.019 ns (1 allocation: 896 bytes)
```

Using an element type which does not do pairwise summing:

```julia
julia> A = rand(Int, 100);

julia> @btime cumsum($A); # master
  126.825 ns (1 allocation: 896 bytes)

julia> @btime cumsum($A); # this PR
  125.654 ns (2 allocations: 912 bytes)
```

The overhead due to the [...]

So... Doesn't seem to gain us much.
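A side note on reading these numbers (usage illustration, not from the PR): the `$A` in the benchmarks is BenchmarkTools interpolation, which keeps the cost of accessing a non-constant global out of the measurement:

```julia
using BenchmarkTools

A = rand(100)
@btime cumsum(A)    # A is a non-constant global: timing includes dynamic-dispatch overhead
@btime cumsum($A)   # interpolated: times just the cumsum call itself
```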
Can we just fix #25506 in the meantime?
How about I just keep the original pairwise stuff, since that seems to be the problem? We can always change that later. The one thing to do in 0.7 is to change the small integer behaviour, since that is the main breaking change.
This is remarkable:

```julia
julia> A = rand(100);

julia> @btime cumsum($A);
  944.120 ns (9 allocations: 1.06 KiB)

julia> Base._similar_for(c::AbstractArray, ::Type{T}, itr, ::Base.HasShape) where {T} = similar(c, T, axes(itr))

julia> @btime cumsum($A);
  666.361 ns (7 allocations: 1.02 KiB)
```

That's on a different machine than above, same [...]
Although everything is type-stable and inferable, if I disable the [...]:

```julia
function accumulate_pairwise(op, v0, itr, ::Union{HasLength,HasShape{1}})
    i = start(itr)
    #if done(itr,i)
    #    return collect(Accumulate(op, v0, itr))
    #end
    v1,i = next(itr,i)
    y = reduce_first(op,v0,v1)
    Y = _similar_for(1:1, typeof(y), itr, IteratorSize(itr))
    L = linearindices(Y)
    n = length(L)
    j = first(L)
    while true
        Y[j] = y
        if done(itr,i)
            return Y
        end
        y,j,i,wider = _accumulate_pairwise!(op,Y,itr,y,j+1,i,last(L)-j,true)
        #if !wider
            return Y
        #end
        R = promote_typejoin(eltype(Y), typeof(y))
        newY = similar(Y, R)
        copyto!(newY,1,Y,1,j)
        Y = newY
    end
end
```

I'm getting

```julia
julia> @btime cumsum($A);
  197.452 ns (1 allocation: 896 bytes)
```

which is on par with master. The [...]
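For readers unfamiliar with the pattern in the function above, here is a stripped-down sketch of the same widen-as-you-go idea (my own illustration, not the PR's code; `collect_widening` is a hypothetical name): fill an array typed from the first result and, if a later value doesn't fit, allocate a wider array via `promote_typejoin` and copy over what has been computed so far:

```julia
function collect_widening(f, xs::AbstractVector)
    y = f(first(xs))
    out = Vector{typeof(y)}(undef, length(xs))
    for (i, x) in enumerate(xs)
        y = f(x)
        if !(y isa eltype(out))
            # Widen the storage to the join of the old and new element types.
            wider = Vector{Base.promote_typejoin(eltype(out), typeof(y))}(undef, length(xs))
            copyto!(wider, 1, out, 1, i - 1)
            out = wider
        end
        out[i] = y
    end
    return out
end

collect_widening(x -> x > 2 ? x / 2 : x, [1, 2, 3, 4])   # widens from Int to a joined type
```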
My theory for the effect of disabling the [...]

Yay! Unfortunately, that doesn't help the more-than-one-dimensional case. I'll look into that... @simonbyrne, ok if I push my updates to your branch?
Yes, certainly! I won't have time to look at this for a few days, but make what changes you see fit.
Ok, I could substantially reduce the setup time for the multi-dimensional case and make it type-stable (if the input permits it):

```julia
julia> A = ones(1,1);

julia> @btime cumsum($A, dims=1);
  42.639 ns (1 allocation: 96 bytes)

julia> @btime cumsum($A, dims=2);
  42.780 ns (1 allocation: 96 bytes)
```

Compare with master:

```julia
julia> @btime cumsum($A, dims=1);
  33.831 ns (1 allocation: 96 bytes)

julia> @btime cumsum($A, dims=2);
  149.753 ns (4 allocations: 144 bytes)
```

Unfortunately, the time for the actual computation is still worse than on master by more than a factor of 2:

```julia
# this PR
julia> srand(0);

julia> A = rand(100,100);

julia> @btime cumsum($A, dims=1);
  24.235 μs (2 allocations: 78.20 KiB)

julia> @btime cumsum($A, dims=2);
  22.350 μs (2 allocations: 78.20 KiB)

# master
julia> @btime cumsum($A, dims=1);
  8.637 μs (2 allocations: 78.20 KiB)

julia> @btime cumsum($A, dims=2);
  9.056 μs (5 allocations: 78.25 KiB)
```
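For orientation, a naive reference implementation (my own sketch, `cumsum_ref` is a made-up name, not the PR's code) of what `cumsum(A, dims=d)` computes for a matrix; the cost being benchmarked above is essentially this index walk plus output-type handling:

```julia
function cumsum_ref(A::AbstractMatrix, dim::Integer)
    out = similar(A)   # note: ignores the small-integer widening that Base's cumsum does
    offset = CartesianIndex(ntuple(d -> d == dim ? 1 : 0, ndims(A)))
    for I in CartesianIndices(A)   # column-major order, so the predecessor is already filled
        out[I] = I[dim] == 1 ? A[I] : out[I - offset] + A[I]
    end
    return out
end

A = rand(100, 100)
cumsum_ref(A, 1) ≈ cumsum(A, dims=1)   # true
cumsum_ref(A, 2) ≈ cumsum(A, dims=2)   # true
```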
Uh, no, this isn't it. While just iterating over [...],

```julia
function _accumulate!(op::F, dest::AbstractArray{T}, v0, X, dim, inds, st, first_in_dim, dim_delta, widen) where {F,T}
    while !done(inds, st)
        i, st = next(inds, st)
    end
    return dest
end
```

is still slower than master, although it doesn't compute anything besides incrementing the index.
I've managed to fix the performance of plain [...]
```julia
    v0::V
    iter::I
end
Accumulate(op, iter) = Accumulate(op, undef, iter) # use `undef` as a sentinel
```
This is exactly what I was afraid would happen with `undef`.
We can create another type to use as a sentinel: any suggestions for names? (I had originally used `uninitialized`, which makes slightly more sense.)
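One possible shape for such a dedicated sentinel (a sketch; the names `NoInit`, `noinit`, and `MyAccumulate` are hypothetical, not from the PR):

```julia
struct NoInit end        # private singleton used only to mark "no v0 supplied"
const noinit = NoInit()

struct MyAccumulate{O,V,I}   # hypothetical stand-in mirroring the PR's Accumulate
    op::O
    v0::V
    iter::I
end
MyAccumulate(op, iter) = MyAccumulate(op, noinit, iter)

# Dispatch and checks can then single out the "no initial value" case without
# giving `undef` (an array-construction flag) a second meaning:
hasinit(a::MyAccumulate) = !(a.v0 isa NoInit)
```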
I will repeat the request to do the small integer type change first, so we can do the rest of this any time.
For what it's worth, here are the updated numbers on my spot-checks: [...]
@nanosoldier

This doesn't need to be triaged anymore, right? Since after #26658 is merged this won't be breaking.

Any update on this?

There is now an [...]
This makes it possible to use the `collect` machinery for determining output types. Also moves code out of multidimensional.jl (which is a bit of an odd place for it).

To do:

- pairwise