Optimize top_set_bit(::BigInt) and hash(::Real) #49996
Conversation
Empirical check that `Base.top_set_bit(abs(x))` agrees with `Base.ndigits0z(x, 2)`:

```julia
for T in [UInt8, Int8]
    for i in typemin(T):typemax(T)
        for j in (i, big(i), Int(i), unsigned(Int(i)))
            Base.top_set_bit(abs(j)) == Base.ndigits0z(j, 2) || error()
        end
    end
end
```
While we're at it, is there a reason we're doing …?
I suppose the existing version could be better for types that are slow to decompose and slow to convert to integers or floating-point values. I benchmarked no difference, for switching from …, on any of:

```julia
using BenchmarkTools
@btime hash(x) setup=(x=rand(Int)//1)
@btime hash(x) setup=(x=rand(Int)//1024)
@btime hash(x) setup=(x=rand(Int)//rand(Int))
@btime hash(x) setup=(x=rand(Int128))
```
The first three won't hit it since there is a fast path for …
I tested … and got poor results.
The reason for the regression in this PR is likely that …
Force-pushed from 406574b to 356ff95.
These performance data do not indicate that we should switch from bitshift to direct conversion. It's possible that it could be good in some cases, but I'll leave that for a different PR.
base/float.jl (outdated)

```diff
 end # typemin(Int64) handled by Float64 case
-left <= 1024 && left - right <= 53 && return hash(ldexp(Float64(num),pow), h)
+left <= 1024 && left - right <= 53 && return hash(ldexp(Float64(num), den_z), h)
```
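For intuition, here is a minimal sketch (mine, not the PR's code) of why scaling by a power of two via `ldexp` is exact; the `den_z` sign convention below is my assumption:

```julia
# Sketch only: take a reduced rational num//den where den == 1 << den_z.
# ldexp scales by a power of two without rounding, so the result is exact
# whenever num itself is exactly representable as a Float64.
num, den = 5, 8
den_z = trailing_zeros(den)                        # 3, since 8 == 1 << 3
@assert ldexp(Float64(num), -den_z) == num / den   # 0.625, exactly
```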
Are we sure that with this change `num` will fit in a `Float64`?
I think so, assuming `num` and `den` are relatively prime (which we already rely on, because if `num = 5` and `den = 3` we produce a different result than if `num = 15` and `den = 9`).

`den` either has 0 trailing zeros or a nonzero number of trailing zeros.

- In the first case, `den_z = 0`, so if `ldexp(Float64(num), den_z)` is representable as a `Float64`, then by `ldexp(Float64(num), den_z) === ldexp(Float64(num), 0) === Float64(num)`, so is `Float64(num)`.
- In the second case, `den_z ≠ 0`, so `num_z = 0`, because `num` and `den` cannot both be even given that they are relatively prime. This means that every bit between 1 and `top_set_bit(x)` is significant, and we check that there are no more than 53 significant bits, so `top_set_bit(x) <= 53`, and therefore `x` is a small integer that is representable as a `Float64`.
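A quick empirical check of the coprimality argument above (a sketch of mine, not part of the PR):

```julia
# After reducing to lowest terms, num and den are coprime, so at most one
# of them is even: if den has trailing zero bits, num has none.
for _ in 1:1000
    num, den = rand(1:10^6), rand(1:10^6)
    g = gcd(num, den)
    num, den = num ÷ g, den ÷ g        # reduce to lowest terms
    @assert trailing_zeros(den) == 0 || trailing_zeros(num) == 0
end
```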
Force-pushed from cf780a0 to e4ec12c.
Force-pushed from e4ec12c to bc9b45a.
I reran all the benchmarks that have come up in this and #49998; results are old => new:

```julia
using BenchmarkTools
@btime Base.hash(x, UInt(1234)) setup=(x=Int128(rand(Int)));
# 12.262 ns => 3.958 ns
@btime Base.hash(x, UInt(1234)) setup=(x=Int128(rand(Int128)));
# 24.975 ns => 16.367 ns
@btime Base.top_set_bit(abs(x)) setup=(x=big(rand(Int)));
# 11.469 ns => 2.541 ns
@btime Base.top_set_bit(abs(x)) setup=(x=big(rand(Int128)));
# 11.512 ns => 2.583 ns
@btime hash(x) setup=(x=big(rand(Int))//big(1));
# 33.442 ns => 24.807 ns
@btime hash(x) setup=(x=big(rand(Int128))//big(1));
# 41.625 ns => 35.247 ns
@btime hash(x) setup=(x=rand(Int)//1);
# 3.708 ns => 3.750 ns
@btime hash(x) setup=(x=rand(Int)//1024);
# 3.837 ns => 3.833 ns
@btime hash(x) setup=(x=rand(Int)//rand(Int));
# 14.780 ns => 14.780 ns
@btime hash(x) setup=(x=rand(Int128));
# 23.845 ns => 16.784 ns
```
To be clear: are those benchmarks including #49998?

Those results are just this PR.

Interesting. That somehow seems to be faster than #49998, which is rather surprising to me. I guess our compiler is doing a really good job cleaning this up for …

Test failures seem real.

Oops, I hadn't noticed :)

I think this is good to go, but I do kind of want a review from @vtjnash here.

Closing since #49996 gets the same performance.

(oops, closed the wrong one)
* fixup for 49996 and add test from 50065

Co-authored-by: Lilith Hafner <[email protected]>
EDIT: this PR now does a fair bit more than #49986 did; see later comments.

Re-lands #49986, but computes `abs` first. This should be safe now because `ndigits0z(x, 2)` and `top_set_bit(abs(x))` are semantically equivalent, whereas before I was falsely assuming `x > 0`. The performance gain is now only 1.4x instead of 1.5x, which could be noise or the cost of `abs` (probably noise).
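For illustration, a small check of the claimed equivalence (my own example, not from the PR):

```julia
# Both sides count the binary digits of abs(x) (the position of the highest
# set bit), so they should agree for negative values and BigInts as well.
for x in (big(-12), big(12), big(typemax(Int)), big(1) << 100)
    @assert Base.top_set_bit(abs(x)) == Base.ndigits0z(x, 2)
end
```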