Optimize top_set_bit(::BigInt) and hash(::Real) #49996
Conversation
Empirical check that `Base.top_set_bit(abs(x))` agrees with `Base.ndigits0z(x, 2)`:

```julia
for T in [UInt8, Int8]
    for i in typemin(T):typemax(T)
        for j in (i, big(i), Int(i), unsigned(Int(i)))
            Base.top_set_bit(abs(j)) == Base.ndigits0z(j, 2) || error()
        end
    end
end
```
While we're at it, is there a reason we're doing …?
I suppose the existing version could be better for types that are slow to decompose and slow to convert to integers or floating-point values. I benchmarked no difference, for switching from …, on any of:

```julia
using BenchmarkTools
@btime hash(x) setup=(x=rand(Int)//1)
@btime hash(x) setup=(x=rand(Int)//1024)
@btime hash(x) setup=(x=rand(Int)//rand(Int))
@btime hash(x) setup=(x=rand(Int128))
```
The first three won't hit it since there is a fast path for …
I tested … and got poor results.
The reason for the regression in this PR is likely that …
Force-pushed from 406574b to 356ff95.
These performance data do not indicate that we should switch from bitshift to direct conversion. It's possible that it could be good in some cases, but I'll leave that for a different PR.
base/float.jl (outdated)

```diff
 end # typemin(Int64) handled by Float64 case
-left <= 1024 && left - right <= 53 && return hash(ldexp(Float64(num),pow), h)
+left <= 1024 && left - right <= 53 && return hash(ldexp(Float64(num), den_z), h)
```
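For intuition, here is a minimal sketch (mine, not the PR's code) of why scaling by a power of two via `ldexp` is exact; the `den_z` sign convention below is my assumption:

```julia
# Sketch only: take a reduced rational num//den where den == 1 << den_z.
# ldexp scales by a power of two without rounding, so the result is exact
# whenever num itself is exactly representable as a Float64.
num, den = 5, 8
den_z = trailing_zeros(den)                        # 3, since 8 == 1 << 3
@assert ldexp(Float64(num), -den_z) == num / den   # 0.625, exactly
```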
Are we sure that with this change `num` will fit in a `Float64`?
I think so, assuming `num` and `den` are relatively prime (which we already rely on, because if `num = 5` and `den = 3` we produce a different result than if `num = 15` and `den = 9`).

`den` either has 0 trailing zeros or a nonzero number of trailing zeros.

- In the first case, `den_z = 0`, so if `ldexp(Float64(num), den_z)` is representable as a `Float64`, then by `ldexp(Float64(num), den_z) === ldexp(Float64(num), 0) === Float64(num)`, so is `Float64(num)`.
- In the second case, `den_z ≠ 0`, so `num_z = 0`, because `num` and `den` cannot both be even given that they are relatively prime. This means that every bit between 1 and `top_set_bit(x)` is significant, and we check that there are no more than 53 significant bits, so `top_set_bit(x) <= 53`, and therefore `x` is a small integer that is representable as a `Float64`.
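A quick empirical check of the coprimality argument above (a sketch of mine, not part of the PR):

```julia
# After reducing to lowest terms, num and den are coprime, so at most one
# of them is even: if den has trailing zero bits, num has none.
for _ in 1:1000
    num, den = rand(1:10^6), rand(1:10^6)
    g = gcd(num, den)
    num, den = num ÷ g, den ÷ g        # reduce to lowest terms
    @assert trailing_zeros(den) == 0 || trailing_zeros(num) == 0
end
```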
Force-pushed from cf780a0 to e4ec12c.
Force-pushed from e4ec12c to bc9b45a.
I reran all the benchmarks that have come up in this and #49998; results are old => new:

```julia
using BenchmarkTools
@btime Base.hash(x, UInt(1234)) setup=(x=Int128(rand(Int)));
# 12.262 ns => 3.958 ns
@btime Base.hash(x, UInt(1234)) setup=(x=Int128(rand(Int128)));
# 24.975 ns => 16.367 ns
@btime Base.top_set_bit(abs(x)) setup=(x=big(rand(Int)));
# 11.469 ns => 2.541 ns
@btime Base.top_set_bit(abs(x)) setup=(x=big(rand(Int128)));
# 11.512 ns => 2.583 ns
@btime hash(x) setup=(x=big(rand(Int))//big(1));
# 33.442 ns => 24.807 ns
@btime hash(x) setup=(x=big(rand(Int128))//big(1));
# 41.625 ns => 35.247 ns
@btime hash(x) setup=(x=rand(Int)//1);
# 3.708 ns => 3.750 ns
@btime hash(x) setup=(x=rand(Int)//1024);
# 3.837 ns => 3.833 ns
@btime hash(x) setup=(x=rand(Int)//rand(Int));
# 14.780 ns => 14.780 ns
@btime hash(x) setup=(x=rand(Int128));
# 23.845 ns => 16.784 ns
```
To be clear: are those benchmarks including #49998?

Those results are just this PR.

Interesting. That somehow seems to be faster than #49998, which is rather surprising to me. I guess our compiler is doing a really good job cleaning this up for …

Test failures seem real.

Oops, I hadn't noticed :)

I think this is good to go, but I do kind of want a review from @vtjnash here.

Closing since #49996 gets the same performance.

(oops, closed the wrong one)
* fixup for 49996 and add test from 50065

Co-authored-by: Lilith Hafner <[email protected]>
EDIT: this PR now does a fair bit more than #49986 did; see later comments.

Re-lands #49986, but computes `abs` first. This should be safe now because `ndigits0z(x, 2)` and `top_set_bit(abs(x))` are semantically equivalent, whereas before I was falsely assuming `x > 0`. The performance gain is now only 1.4x instead of 1.5x, which could be noise or the cost of `abs` (probably noise).
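For illustration, a small check of the claimed equivalence (my own example, not from the PR):

```julia
# Both sides count the binary digits of abs(x) (the position of the highest
# set bit), so they should agree for negative values and BigInts as well.
for x in (big(-12), big(12), big(typemax(Int)), big(1) << 100)
    @assert Base.top_set_bit(abs(x)) == Base.ndigits0z(x, 2)
end
```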