32bit base #317
It shouldn't necessarily improve performance (the complexity for write operations should be
What exactly do these columns mean? This is not clear to me. |
Broadly speaking, I'd expect a larger branching factor to be good for reads and bad for writes. I'm surprised that not all the regressions involve building/modifying maps. |
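To make the read/write trade-off concrete, here is a minimal sketch of how a HAMT could pick a child slot from a hash, assuming 4 bits per level for a 16-way node and 5 bits per level for a 32-way node. The names `subkeyIndex`, `sparseIndex` and `maxNodeWidth` are hypothetical and are not taken from unordered-containers.

```haskell
import Data.Bits (popCount, shiftL, shiftR, (.&.))
import Data.Word (Word64)

-- | Slot (0 .. 2^bits - 1) that a hash selects at a given tree level.
-- Assumption: bits = 4 models a 16-way node, bits = 5 a 32-way node.
subkeyIndex :: Int -> Word64 -> Int -> Int
subkeyIndex bits hash level =
  fromIntegral ((hash `shiftR` (bits * level)) .&. mask)
  where
    mask = (1 `shiftL` bits) - 1

-- | Position of a slot inside a compressed child array: the number of
-- occupied slots below it in the node's bitmap.
sparseIndex :: Word64 -> Int -> Int
sparseIndex bitmap slot = popCount (bitmap .&. ((1 `shiftL` slot) - 1))

-- | Upper bound on the child pointers copied when one node is rebuilt.
maxNodeWidth :: Int -> Int
maxNodeWidth bits = 1 `shiftL` bits
```

In GHCi, `subkeyIndex 4 0xAB 1` selects slot 10 and `subkeyIndex 5 0xAB 1` selects slot 5. A wider node consumes more hash bits per level, which shortens paths for reads, but a persistent update that rebuilds a full node copies up to `maxNodeWidth 5` (32) pointers instead of 16.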
I used an R script to generate the table. |
I'd strongly suggest to compile with |
Updated table with both Lastly, I added the
Name master branch Difference PctDifference Faster
1: HashMap/alterFDelete-miss/String 32.9696290684921 42.39932566484126 9.42969659634921 28.6011607130 77.75979583
2: HashMap/alterFInsert/String 215.3235145868651 275.82584350734123 60.50232892047613 28.0983379992 78.06502532
3: HashMap/isSubmapOf/ByteString 11.5999402036724 14.22520081024892 2.62526060657648 22.6316736163 81.54500143
4: HashMap/fromListWith/short/Int 55.6070548667460 67.94833102138890 12.34127615464286 22.1937237716 81.83726374
5: HashMap/fromList/short/Int 48.6182299250794 59.06959459456349 10.45136466948412 21.4968020958 82.30669308
6: HashMap/alterDelete-miss/String 37.3981422372222 44.85334607960317 7.45520384238095 19.9346903252 83.37871197
7: HashMap/insert/String 226.1796030594048 265.25639724075393 39.07679418134919 17.2768868867 85.26829340
8: HashMap/insert/ByteString 185.3351694798413 209.08809480531747 23.75292532547618 12.8161996410 88.63975237
9: HashMap/alterInsert/Int 158.9757773168254 178.35194024710316 19.37616293027777 12.1881227803 89.13599543
10: HashMap/alterFInsert/ByteString 183.1856731308730 204.64208483416667 21.45641170329363 11.7129311133 89.51515192
11: HashMap/alterInsert/String 204.8049832366270 222.53991815833334 17.73493492170633 8.6594254893 92.03067249
12: HashMap/insert/Int 156.6099045867460 166.19426213369047 9.58435754694445 6.1198923352 94.23303944
13: HashMap/alterInsert/ByteString 214.4329253058333 222.34184867607146 7.90892337023813 3.6882971022 96.44289934
14: HashMap/alterFInsert/Int 163.5382111585318 168.70286593051586 5.16465477198411 3.1580721933 96.93860875
15: HashMap/alterFDelete/ByteString 185.6910126868254 191.27941759849205 5.58840491166665 3.0095182480 97.07840761
16: HashMap/alterDelete/Int 124.9060165442460 126.09339499865079 1.18737845440478 0.9506175021 99.05833414
17: HashMap/delete/ByteString 183.2686007890079 184.99019341785714 1.72159262884922 0.9393822081 99.06936006
18: HashMap/delete/Int 124.1260541469444 124.87457840769841 0.74852426075396 0.6030355721 99.40057915
19: HashMap/alterFDelete/Int 121.9434335053571 121.66067152765874 -0.28276197769841 -0.2318796261 100.23241856
20: HashMap/lookup/String 189.7499243107143 188.51114347603175 -1.23878083468254 -0.6528491851 100.65713931
21: HashMap/isSubmapOfNaive/ByteString 17.6835346280844 17.37498814341991 -0.30854648466450 -1.7448235952 101.77580832
22: HashMap/fromList/long/ByteString 96.4228993230952 94.48711075726190 -1.93578856583333 -2.0076025295 102.04873294
23: HashMap/map 16.8987048340729 16.44513348254329 -0.45357135152958 -2.6840598495 102.75808860
24: HashMap/fromListWith/long/ByteString 94.2372655196032 91.41055523039684 -2.82671028920635 -2.9995673937 103.09232373
25: HashMap/fromListWith/long/String 111.7402754723016 107.75569247095237 -3.98458300134921 -3.5659326814 103.69779351
26: HashMap/alterFInsert-dup/Int 131.4835226894841 126.06226711468254 -5.42125557480158 -4.1231444548 104.30045857
27: HashMap/fromList/long/String 123.0455838120635 117.13239325996032 -5.91319055210318 -4.8056910040 105.04829654
28: HashMap/fromList/long/Int 78.2291642735714 74.07452190690476 -4.15464236666667 -5.3108612437 105.60873329
29: HashMap/insert-dup/Int 131.0241765444048 123.93787064400793 -7.08630590039683 -5.4083956773 105.71762760
30: HashMap/fromList/short/String 33.4538726243254 31.55743524325397 -1.89643738107143 -5.6688127033 106.00947880
31: HashMap/fromListWith/long/Int 85.2876245838492 80.34853845690476 -4.93908612694445 -5.7910935508 106.14707650
32: HashMap/alterDelete/ByteString 193.6223339088889 182.08000879123014 -11.54232511765874 -5.9612570950 106.33915013
33: HashMap/lookup/ByteString 86.0702200210714 80.42905462166667 -5.64116539940476 -6.5541431148 107.01384024
34: HashMap/alterInsert-dup/Int 133.8595860533730 124.46328831214286 -9.39629774123017 -7.0195180026 107.54945323
35: HashMap/alterFDelete-miss/Int 68.6103543238095 62.53517027047620 -6.07518405333334 -8.8546169353 109.71482772
36: HashMap/lookup/Int 56.5716651693254 51.19610434095237 -5.37556082837302 -9.5022142486 110.49994115
37: HashMap/fromList/short/ByteString 23.6204318465152 21.23089338493867 -2.38953846157648 -10.1164046327 111.25500665
38: HashMap/intersection 28.1832533207576 25.27620454062410 -2.90704878013348 -10.3148091068 111.50112856
39: HashMap/fromListWith/short/String 27.2837814395671 24.22216661195166 -3.06161482761544 -11.2213727939 112.63972326
40: HashMap/difference 29.7513369804654 26.15295276516595 -3.59838421529943 -12.0948655775 113.75899788
41: HashMap/delete-miss/Int 68.4862471348413 59.07717965416666 -9.40906748067461 -13.7386232628 115.92673776
42: HashMap/alterDelete-miss/Int 75.5412594859127 65.02767487952381 -10.51358460638889 -13.9176718497 116.16786180
43: HashMap/alterInsert-dup/ByteString 95.5499394222619 81.68777050087301 -13.86216892138889 -14.5077736367 116.96969918
44: HashMap/insert-dup/ByteString 89.1090322516270 75.89503456515872 -13.21399768646826 -14.8290216520 117.41088566
45: HashMap/filter 12.8951041891522 10.93289972755772 -1.96220446159452 -15.2166623302 117.94770382
46: HashMap/fromListWith/short/ByteString 22.1765061313709 18.65256742211039 -3.52393870926046 -15.8904143348 118.89251292
47: HashMap/lookup-miss/Int 32.7953670787302 27.54795450480158 -5.24741257392857 -16.0004690947 119.04828387
48: HashMap/foldl' 3.8794029730199 3.24104510120498 -0.63835787181497 -16.4550544569 119.69605025
49: HashMap/alterFInsert-dup/ByteString 90.0987970607936 75.18020872607143 -14.91858833472222 -16.5580327611 119.84377084
50: HashMap/size/ByteString 3.6285647652010 3.02743852101209 -0.60112624418887 -16.5665017186 119.85593564
51: HashMap/lookup-miss/String 32.7155993463889 26.91275673910534 -5.80284260728355 -17.7372346013 121.56168045
52: HashMap/filterWithKey 6.6671753330803 5.42487430954795 -1.24230102353230 -18.6330936486 122.90008860
53: HashMap/delete-miss/String 32.8114436606349 26.68327159129870 -6.12817206933622 -18.6769351959 122.96634447
54: HashMap/insert-dup/String 132.8011339937301 106.71809397821427 -26.08304001551587 -19.6406756713 124.44106622
55: HashMap/size/Int 1.1796196262251 0.93487723845725 -0.24474238776785 -20.7475683116 126.17909365
56: HashMap/lookup-miss/ByteString 22.6591833992063 17.51742733745671 -5.14175606174964 -22.6917094547 129.35223285
57: HashMap/union 11.6413152333117 8.97725471822205 -2.66406051508963 -22.8845320455 129.67567033
58: HashMap/delete-miss/ByteString 25.7546491999820 19.79242582878067 -5.96222337120129 -23.1500857376 130.12376261
59: HashMap/alterFDelete-miss/ByteString 25.6176592040224 19.43847852450938 -6.17918067951299 -24.1207857061 131.78839677
60: HashMap/alterDelete-miss/ByteString 27.3690040863492 20.72142463432900 -6.64757945202020 -24.2887151869 132.08070666
61: HashMap/size/String 3.1031935541833 2.28104357661803 -0.82214997756530 -26.4936737980 136.04271247
62: HashMap/delete/String 291.2628381117063 208.91650081448412 -82.34633729722221 -28.2721743121 139.41590874
63: HashMap/alterDelete/String 300.6992605804762 214.01811996396827 -86.68114061650792 -28.8265227022 140.50177650
64: HashMap/alterFDelete/String 290.0470790068651 203.62735095567461 -86.41972805119050 -29.7950692512 142.44013766
65: HashMap/alterFInsert-dup/String 132.8972828945238 83.72508800170634 -49.17219489281744 -37.0001506591 158.73053832
66: HashMap/alterInsert-dup/String 196.0290256689286 101.35723343103173 -94.67179223789685 -48.2947828337 193.40408083
67: HashMap/foldr 4.5503249296860 2.15623200491741 -2.39409292476860 -52.6136696118 211.03132313
68: HashMap/isSubmapOfNaive/String 52.9287307866270 19.65998594744228 -33.26874483918471 -62.8557389243 269.22059318
69: HashMap/isSubmapOf/String 51.4585838697619 19.10838016412338 -32.35020370563851 -62.8664865468 269.29851420
70: HashMap/isSubmapOf/Int 0.0003249014255 0.00011550580476 -0.00020939562073 -64.4489695330 281.28579871
71: HashMap/isSubmapOfNaive/Int 0.0001321604130 0.00003822007067 -0.00009394034228 -71.0805453635 345.78798687
|
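The thread never spells out how the table columns are defined. The sketch below shows definitions that reproduce the first row of the table, under the assumption that `Difference` is branch minus master, `PctDifference` is that difference relative to master, and `Faster` is master divided by branch, both as percentages. The original table was produced by an R script; the `Row` type and function names here are hypothetical.

```haskell
-- | One table row. Times are in whatever unit the benchmark reported
-- (the thread does not state it).
data Row = Row { master :: Double, branch :: Double }

-- | Assumed definition of the "Difference" column.
difference :: Row -> Double
difference r = branch r - master r

-- | Assumed definition of "PctDifference": positive means the branch is slower.
pctDifference :: Row -> Double
pctDifference r = 100 * difference r / master r

-- | Assumed definition of "Faster": above 100 means the branch is faster.
faster :: Row -> Double
faster r = 100 * master r / branch r

-- | First row of the table (HashMap/alterFDelete-miss/String).
row1 :: Row
row1 = Row { master = 32.9696290684921, branch = 42.39932566484126 }
```

Under these assumptions `pctDifference row1` is about 28.60 and `faster row1` is about 77.76, matching the values printed for that row.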
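Just pushed some fixes that improve the performance of some of the benchmarks, notably insert-dup/String and insert/String. The fixes simply preserve the bang patterns in leading pattern matches on most of the go closures in the code. When I reordered the pattern matches I overlooked this and it is (rightfully) impactful on performance. I've updated the data table in my previous comment with data that reflects this change |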
I think these numbers look pretty good. I'm surprised about the rearranging
of pattern matches though. So far I had believed that GHC ignores this and
simply uses the order of the constructors from the data definition.
I'll be mostly AFK until September 10th. I can review in more detail
afterwards.
doyougnu ***@***.***> wrote on Fri, Aug 27, 2021, 22:09:
… Just pushed some fixes that improve the performance of some of the
benchmarks notably insert-dup/String and insert/String. The fixes simply
preserve the bang patterns in leading pattern matches on most of the go
closures in the code. When I reordered the pattern matches I overlooked
this and it is (rightfully) impactful on performance.
|
I tried to check this with a simplistic example of two functions that are identical except for the order of their pattern matches. GHC seems to treat these functions as identical and CSE's them:
module M where

data D
  = A [Int]
  | B
  | C Int

f d = case d of
  A xs -> sum xs
  B -> 0
  C x -> x

-- In the Core this definition is CSE'd as
-- g = f
g d = case d of
  C x -> x
  B -> 0
  A xs -> sum xs
So I either need some convincing that the reordering of pattern matches actually affects performance, or I'd like these changes to be reverted. @doyougnu If you need anything from me, please shout! :) |
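As a small runnable companion to the example above, the sketch below checks that the two orderings agree on every constructor, and contrasts them with overlapping clauses, where order does change the result. The functions `h1` and `h2` and the sample list are additions for illustration. The flags in the comment are standard GHC debugging flags for dumping Core; the file name is an assumption.

```haskell
-- To inspect the Core yourself (file name is hypothetical):
--   ghc -O -ddump-simpl -dsuppress-all -dsuppress-uniques Example.hs

data D
  = A [Int]
  | B
  | C Int

-- Disjoint constructor alternatives in two different orders.
f :: D -> Int
f d = case d of
  A xs -> sum xs
  B -> 0
  C x -> x

g :: D -> Int
g d = case d of
  C x -> x
  B -> 0
  A xs -> sum xs

-- Overlapping clauses: here the order is semantically significant.
h1 :: D -> Int
h1 B = 0
h1 _ = 1

h2 :: D -> Int
h2 _ = 1
h2 B = 0 -- unreachable; GHC reports this clause as redundant

-- | One sample value per constructor.
samples :: [D]
samples = [A [1, 2, 3], B, C 7]

-- | True when f and g return the same result on every sample.
sameOnSamples :: Bool
sameOnSamples = all (\d -> f d == g d) samples
```

Reordering only matters to the source semantics when clauses overlap, as in `h1` versus `h2`; for disjoint constructors, as in `f` and `g`, any difference would have to come from code generation rather than from meaning.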
@sjakobi Apologies for taking so long to reply. You're absolutely right regarding the CSE! I had to double check the Core to be sure: reordering the definition of the data constructors does change the pattern match order in Core and STG. I've had some success with this, but I'll open a new PR if I feel that it's worth it. Regarding the performance from my first to second version: I think this probably has more to do with the bang patterns I added on |
Data/HashMap/Internal.hs
      go h k _ t@(Leaf hy (L ky _))
          | hy == h && ky == k = Empty
          | otherwise          = t
-     go h k s t@(BitmapIndexed b ary)
+     go !h !k !s t@(BitmapIndexed b ary)
          | b .&. m == 0 = t
          | otherwise =
              let !st = A.index ary i
What's the net effect of all these bang pattern changes? Do they change compiled Core? Do they change demand signatures, unfoldings, or unfolding guidance?
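For readers less familiar with why a bang can change demand signatures at all, here is a minimal sketch, unrelated to the actual `go` loops in this PR, of two otherwise identical functions where only the bang makes the first argument strict on every path. The names `lazyGo`, `strictGo` and `throws` are hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (SomeException, evaluate, try)

-- | Never touches its first argument on the empty-list path,
-- so it is lazy in that argument.
lazyGo :: Int -> [Int] -> Int
lazyGo _ [] = 0
lazyGo k (x : xs) = k * x + lazyGo k xs

-- | Same body, but the bangs force the first argument on every path.
strictGo :: Int -> [Int] -> Int
strictGo !_ [] = 0
strictGo !k (x : xs) = k * x + strictGo k xs

-- | True when evaluating the value to WHNF throws an exception.
throws :: Int -> IO Bool
throws v = do
  r <- try (evaluate v) :: IO (Either SomeException Int)
  pure (either (const True) (const False) r)
```

`throws (lazyGo undefined [])` yields `False` while `throws (strictGo undefined [])` yields `True`. Because the strict version demands its argument on every path, the optimiser is allowed to evaluate it early and, where it decides to, pass it unboxed; whether that actually happens for a given function has to be checked in the Core.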
These do change compiled Core. I just checked insert and delete. There are positive changes with insert. The branch's insert loop fuses:
Data.HashMap.Internal.$winsert'
= \ (@ k_sGwF)
(@ v_sGwG)
(w_sGwH :: Eq k_sGwF)
(ww_sGwO :: Exts.Word#)
(w1_sGwJ :: k_sGwF)
(w2_sGwK :: v_sGwG)
(w3_sGwL :: HashMap k_sGwF v_sGwG) ->
letrec {
$s$wgo_sP6x [Occ=LoopBreaker]
:: HashMap k_sGwF v_sGwG
-> Int# -> v_sGwG -> k_sGwF -> Exts.Word# -> HashMap k_sGwF v_sGwG
[LclId, Arity=5, Str=<S,1*U><L,U><L,U><L,U><L,U>, Unf=OtherCon []]
$s$wgo_sP6x
= \ (sc_sP6w :: HashMap k_sGwF v_sGwG)
(sc1_sP6v :: Int#)
(sc2_sP6u :: v_sGwG)
(sc3_sP6t :: k_sGwF)
(sc4_sP6s :: Exts.Word#) ->
case sc_sP6w of wild_X8H {
...
}
}; } in
$s$wgo_sP6x w3_sGwL 0# w2_sGwK w1_sGwJ ww_sGwO
Note that only one wgo.... is defined in the letrec; master uses two:
Data.HashMap.Internal.$winsert'
= \ (@ k_sGhe)
(@ v_sGhf)
(w_sGhg :: Eq k_sGhe)
(ww_sGhn :: Exts.Word#)
(w1_sGhi :: k_sGhe)
(w2_sGhj :: v_sGhf)
(w3_sGhk :: HashMap k_sGhe v_sGhf) ->
letrec {
$s$wgo_sOLH [Occ=LoopBreaker]
:: Exts.Word#
-> Exts.SmallArray# (HashMap k_sGhe v_sGhf)
-> Int#
-> v_sGhf
-> k_sGhe
-> Exts.Word#
-> HashMap k_sGhe v_sGhf
[LclId,
Arity=6,
Str=<L,U><L,U><L,U><L,U><S,1*U><L,U>,
Unf=OtherCon []]
$s$wgo_sOLH
= \ (sc_sOLF :: Exts.Word#)
(sc1_sOLG :: Exts.SmallArray# (HashMap k_sGhe v_sGhf))
(sc2_sOLE :: Int#)
(sc3_sOLD :: v_sGhf)
(sc4_sOLC :: k_sGhe)
(sc5_sOLB :: Exts.Word#) ->
case sc4_sOLC of k1_XcAI { __DEFAULT ->
...
$wgo1_sGhd
= \ (ww1_sGh7 :: Exts.Word#)
(w4_sGh1 :: k_sGhe)
(w5_sGh2 :: v_sGhf)
(ww2_sGhb :: Int#)
(w6_sGh4 :: HashMap k_sGhe v_sGhf) ->
case w4_sGh1 of k1_XcAI { __DEFAULT ->
case w6_sGh4 of wild_X8X {
...
}
}; } in
$wgo1_sGhd ww_sGhn w1_sGhi w2_sGhj 0# w3_sGhk
where go_sGhd calls go_sOLH in the Collision case. This is the case that gets fused on the branch.
For whatever reason the bang on delete stops the worker/wrapper transformation:
branch:
delete'
= \ (@ k_agXw)
(@ v_agXx)
($dEq_agXz :: Eq k_agXw)
(h0_acCw :: Hash)
(k0_acCx :: k_agXw)
(m0_acCy :: HashMap k_agXw v_agXx) ->
case k0_acCx of k1_XcG7 { __DEFAULT ->
letrec {
$sgo_sOOZ [Occ=LoopBreaker]
:: HashMap k_agXw v_agXx
-> Int# -> k_agXw -> Exts.Word# -> HashMap k_agXw v_agXx
[LclId, Arity=4, Str=<S,1*U><L,U><L,U><L,U>, Unf=OtherCon []]
$sgo_sOOZ
master:
Data.HashMap.Internal.$wdelete'
= \ (@ k_sGya)
(@ v_sGyb)
(w_sGyc :: Eq k_sGya)
(ww_sGyi :: Exts.Word#)
(w1_sGye :: k_sGya)
(w2_sGyf :: HashMap k_sGya v_sGyb) ->
letrec {
$wgo1_sGy9 [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
:: Exts.Word#
-> k_sGya -> Int# -> HashMap k_sGya v_sGyb -> HashMap k_sGya v_sGyb
[LclId, Arity=4, Str=<L,U><S,1*U><L,U><S,1*U>, Unf=OtherCon []]
$wgo1_sGy9
Notice also that on master the Hash input gets unboxed due to the WW transformation.
I think that making sure delete gets worker/wrapper'd before merging is a good idea. I'm also fine with just reverting the bangs and opening another PR for any more changes.
I think it would be better to do a separate PR, so we don't confuse where each perf change comes from. Sometimes GHC doesn't W/W something because it decides to inline it always; haven't looked at whether that applies here. |
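As a reminder of what the worker/wrapper split looks like, here is a hand-written sketch of the shape GHC generates automatically: a thin wrapper that unboxes its argument and a worker that loops on the raw `Int#`. This is an illustration on a toy function, not the actual `$wdelete'` from the Core above, and the names `sumTo` and `wSumTo` are hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (-#), (<=#))

-- | Wrapper: unboxes the argument once, calls the worker, reboxes the result.
sumTo :: Int -> Int
sumTo (I# n) = I# (wSumTo n 0#)

-- | Worker: the loop runs entirely on unboxed machine integers,
-- so no Int box is allocated or inspected per iteration.
wSumTo :: Int# -> Int# -> Int#
wSumTo n acc
  | isTrue# (n <=# 0#) = acc
  | otherwise = wSumTo (n -# 1#) (acc +# n)
```

In the Core dumps above, the `ww_... :: Exts.Word#` parameter of `$wdelete'` on master plays the role of `n` here: it is the hash after the wrapper has unboxed it.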
2d0928f to af3f3ff
I believe I've made the requested changes. Please let me know if there is anything else for this PR. I'll open another for the reordering/bang changes soon. and as always thanks for the reviews! |
I'd love to see some attempt to understand the regressions. Which benchmarks get better and which get worse seem just a tad arbitrary. The |
@treeowl I tried to dig into the regression for So I ran only that benchmark through
Benchmark benchmarks: FINISH
Performance counter stats for 'cabal bench --benchmark-options=-m exact HashMap/alterDelete-miss/ByteString' (5 runs):
25,095.34 msec task-clock:u # 0.998 CPUs utilized ( +- 7.85% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
457,847 page-faults:u # 0.018 M/sec ( +- 18.54% )
96,971,467,839 cycles:u # 3.864 GHz ( +- 7.36% ) (75.02%)
4,702,198,414 stalled-cycles-frontend:u # 4.85% frontend cycles idle ( +- 8.26% ) (75.02%)
44,504,481,735 stalled-cycles-backend:u # 45.89% backend cycles idle ( +- 3.95% ) (75.02%)
121,184,360,339 instructions:u # 1.25 insn per cycle
# 0.37 stalled cycles per insn ( +- 6.84% ) (75.03%)
23,780,537,078 branches:u # 947.608 M/sec ( +- 6.70% ) (75.04%)
441,198,832 branch-misses:u # 1.86% of all branches ( +- 15.55% ) (75.03%)
60,914,366,137 L1-dcache-loads:u # 2427.318 M/sec ( +- 6.62% ) (75.03%)
1,882,327,510 L1-dcache-load-misses:u # 3.09% of all L1-dcache accesses ( +- 7.64% ) (75.02%)
<not supported> LLC-loads:u
<not supported> LLC-load-misses:u
25.15 +- 2.01 seconds time elapsed ( +- 8.01% )
and the branch
Performance counter stats for 'cabal bench --benchmark-options=-m exact HashMap/alterDelete-miss/ByteString' (5 runs):
24,234.97 msec task-clock:u # 0.998 CPUs utilized ( +- 7.76% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
454,182 page-faults:u # 0.019 M/sec ( +- 18.55% )
93,478,538,634 cycles:u # 3.857 GHz ( +- 7.31% ) (75.03%)
3,641,546,852 stalled-cycles-frontend:u # 3.90% frontend cycles idle ( +- 11.64% ) (75.01%)
43,841,865,823 stalled-cycles-backend:u # 46.90% backend cycles idle ( +- 3.16% ) (75.02%)
118,756,310,576 instructions:u # 1.27 insn per cycle
# 0.37 stalled cycles per insn ( +- 6.98% ) (75.03%)
23,388,366,084 branches:u # 965.067 M/sec ( +- 6.81% ) (75.03%)
417,847,294 branch-misses:u # 1.79% of all branches ( +- 16.32% ) (75.03%)
59,336,453,501 L1-dcache-loads:u # 2448.382 M/sec ( +- 6.76% ) (75.04%)
1,830,916,326 L1-dcache-load-misses:u # 3.09% of all L1-dcache accesses ( +- 7.85% ) (75.02%)
<not supported> LLC-loads:u
<not supported> LLC-load-misses:u
24.28 +- 1.91 seconds time elapsed ( +- 7.85% )
Notice that I think we would expect lower instruction count for 32bit base because less master:
benchmarking HashMap/alterDelete-miss/ByteString ... took 13.64 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 40.14 ms (39.76 ms .. 40.94 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 39.99 ms (39.77 ms .. 40.25 ms)
std dev 395.0 μs (258.1 μs .. 620.5 μs)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 13.44 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 39.52 ms (39.36 ms .. 39.65 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 39.88 ms (39.75 ms .. 40.15 ms)
std dev 303.3 μs (185.8 μs .. 437.5 μs)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 13.33 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 39.45 ms (39.33 ms .. 39.55 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 39.58 ms (39.52 ms .. 39.68 ms)
std dev 140.7 μs (70.19 μs .. 218.5 μs)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 13.36 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 39.35 ms (39.23 ms .. 39.52 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 39.60 ms (39.48 ms .. 39.86 ms)
std dev 288.4 μs (94.09 μs .. 476.1 μs)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 13.40 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 40.35 ms (39.63 ms .. 41.22 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 39.67 ms (39.46 ms .. 39.99 ms)
std dev 428.4 μs (259.0 μs .. 642.5 μs)
Notice that there are no heavy outliers reported and
benchmarking HashMap/alterDelete-miss/ByteString ... took 12.43 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 19.29 ms (17.82 ms .. 20.77 ms)
0.991 R² (0.976 R² .. 1.000 R²)
mean 23.13 ms (20.92 ms .. 26.49 ms)
std dev 4.634 ms (104.2 μs .. 5.815 ms)
variance introduced by outliers: 68% (severely inflated)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 12.42 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 19.44 ms (18.09 ms .. 20.94 ms)
0.991 R² (0.974 R² .. 1.000 R²)
mean 23.16 ms (20.98 ms .. 27.47 ms)
std dev 4.578 ms (165.0 μs .. 5.737 ms)
variance introduced by outliers: 68% (severely inflated)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 12.45 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 19.90 ms (18.09 ms .. 21.71 ms)
0.989 R² (0.972 R² .. 0.998 R²)
mean 23.44 ms (21.32 ms .. 26.93 ms)
std dev 4.564 ms (454.2 μs .. 5.767 ms)
variance introduced by outliers: 59% (severely inflated)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 12.43 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 19.48 ms (17.91 ms .. 21.06 ms)
0.988 R² (0.964 R² .. 1.000 R²)
mean 23.36 ms (21.24 ms .. 27.81 ms)
std dev 4.706 ms (692.9 μs .. 5.990 ms)
variance introduced by outliers: 68% (severely inflated)
Benchmark benchmarks: FINISH
Build profile: -w ghc-8.10.4 -O1
In order, the following will be built (use -v for more details):
- unordered-containers-0.2.14.0 (bench:benchmarks) (first run)
Preprocessing benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Building benchmark 'benchmarks' for unordered-containers-0.2.14.0..
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking HashMap/alterDelete-miss/ByteString ... took 12.40 s, total 56 iterations
benchmarked HashMap/alterDelete-miss/ByteString
time 19.37 ms (17.87 ms .. 20.95 ms)
0.991 R² (0.972 R² .. 1.000 R²)
mean 23.16 ms (20.93 ms .. 27.51 ms)
std dev 4.683 ms (166.7 μs .. 5.930 ms)
variance introduced by outliers: 68% (severely inflated)
Each benchmark reports severe outliers which seem to skew the measurement low because
In any case I compiled more regression statistics; these are up to date for my latest commit (32bit base no bangs no reordering):
Name master branch Difference PctDifference Faster
1: HashMap/isSubmapOfNaive/String 53.7884 65.0119 11.2235 20.86 82.74
2: HashMap/fromList/short/Int 50.3661 60.1439 9.7778 19.41 83.74
3: HashMap/fromListWith/short/Int 57.0969 66.9412 9.8443 17.24 85.29
4: HashMap/alterFDelete-miss/ByteString 26.4369 28.8611 2.4242 9.16 91.60
5: HashMap/alterInsert/Int 153.8916 167.8327 13.9410 9.05 91.69
6: HashMap/alterFInsert/ByteString 172.5733 186.9022 14.3289 8.30 92.33
7: HashMap/insert/ByteString 174.3504 188.4732 14.1227 8.10 92.51
8: HashMap/fromList/long/ByteString 98.4718 104.7043 6.2324 6.32 94.05
9: HashMap/alterInsert/ByteString 189.8528 201.0552 11.2024 5.90 94.43
10: HashMap/fromList/short/String 34.3230 36.2094 1.8863 5.49 94.79
11: HashMap/insert/Int 150.7048 156.3881 5.6833 3.77 96.37
12: HashMap/alterInsert/String 195.1582 202.1771 7.0188 3.59 96.53
13: HashMap/alterFInsert/Int 156.1919 157.4893 1.2973 0.83 99.18
14: HashMap/alterFInsert/String 210.3690 210.2779 -0.0910 -0.04 100.04
15: HashMap/fromListWith/long/String 115.6609 115.5865 -0.0743 -0.06 100.06
16: HashMap/isSubmapOf/ByteString 11.9745 11.8290 -0.1454 -1.21 101.23
17: HashMap/alterFInsert-dup/Int 129.1681 125.0134 -4.1547 -3.21 103.32
18: HashMap/alterFDelete/Int 125.4869 120.6289 -4.8580 -3.87 104.03
19: HashMap/fromList/short/ByteString 23.7682 22.8258 -0.9424 -3.96 104.13
20: HashMap/insert/String 219.4855 210.7733 -8.7121 -3.96 104.13
21: HashMap/delete/ByteString 182.3416 172.9844 -9.3571 -5.13 105.41
22: HashMap/alterDelete/Int 131.6931 123.8543 -7.8388 -5.95 106.33
23: HashMap/fromList/long/String 123.8589 115.8239 -8.0349 -6.48 106.94
24: HashMap/map 17.5870 16.4131 -1.1739 -6.67 107.15
25: HashMap/delete/Int 129.5927 120.9106 -8.6820 -6.69 107.18
26: HashMap/insert-dup/Int 131.7386 122.7031 -9.0355 -6.85 107.36
27: HashMap/alterDelete-miss/String 37.9127 35.2327 -2.6799 -7.06 107.61
28: HashMap/fromList/long/Int 80.9759 74.9517 -6.0241 -7.43 108.04
29: HashMap/fromListWith/short/String 26.2560 24.1738 -2.0821 -7.93 108.61
30: HashMap/alterFDelete/ByteString 187.7723 171.3584 -16.4139 -8.74 109.58
31: HashMap/alterDelete/ByteString 188.3785 171.5909 -16.7876 -8.91 109.78
32: HashMap/fromListWith/long/Int 86.6287 78.5805 -8.0481 -9.29 110.24
33: HashMap/alterInsert-dup/Int 133.4730 121.0694 -12.4036 -9.29 110.25
34: HashMap/fromListWith/long/ByteString 96.9519 86.5336 -10.4183 -10.74 112.04
35: HashMap/alterFDelete-miss/Int 69.7377 62.0982 -7.6395 -10.95 112.30
36: HashMap/lookup/ByteString 87.9955 78.1580 -9.8374 -11.17 112.59
37: HashMap/fromListWith/short/ByteString 20.8472 18.4706 -2.3766 -11.40 112.87
38: HashMap/lookup/String 192.2641 169.7583 -22.5058 -11.70 113.26
39: HashMap/isSubmapOfNaive/ByteString 17.8681 15.7542 -2.1138 -11.83 113.42
40: HashMap/intersection 27.9015 24.1269 -3.7745 -13.52 115.64
41: HashMap/size/ByteString 3.6008 3.1089 -0.4919 -13.66 115.82
42: HashMap/difference 29.1113 24.9238 -4.1874 -14.38 116.80
43: HashMap/alterDelete-miss/Int 78.2321 65.5912 -12.6408 -16.15 119.27
44: HashMap/insert-dup/ByteString 90.6601 75.8275 -14.8326 -16.36 119.56
45: HashMap/foldl' 3.9046 3.2459 -0.6586 -16.86 120.29
46: HashMap/lookup/Int 57.9009 47.8670 -10.0338 -17.32 120.96
47: HashMap/delete-miss/Int 69.4378 57.0186 -12.4191 -17.88 121.78
48: HashMap/alterInsert-dup/ByteString 96.6344 79.0962 -17.5382 -18.14 122.17
49: HashMap/alterFInsert-dup/ByteString 91.8934 75.0882 -16.8051 -18.28 122.38
50: HashMap/filterWithKey 6.7572 5.5201 -1.2371 -18.30 122.41
51: HashMap/lookup-miss/Int 33.2655 27.0819 -6.1836 -18.58 122.83
52: HashMap/filter 13.4468 10.7097 -2.7371 -20.35 125.56
53: HashMap/size/Int 1.1857 0.9361 -0.2495 -21.04 126.65
54: HashMap/lookup-miss/String 33.6928 26.5011 -7.1916 -21.34 127.14
55: HashMap/alterFDelete-miss/String 33.5248 26.3396 -7.1852 -21.43 127.28
56: HashMap/delete-miss/String 33.2863 26.0768 -7.2094 -21.65 127.65
57: HashMap/union 11.6485 8.9965 -2.6520 -22.76 129.48
58: HashMap/insert-dup/String 137.4933 106.1026 -31.3906 -22.83 129.59
59: HashMap/alterFInsert-dup/String 133.7493 102.6985 -31.0508 -23.21 130.23
60: HashMap/delete-miss/ByteString 26.3743 20.0588 -6.3155 -23.94 131.49
61: HashMap/lookup-miss/ByteString 22.9625 17.4523 -5.5101 -23.99 131.57
62: HashMap/size/String 3.0384 2.2222 -0.8162 -26.86 136.73
63: HashMap/alterDelete-miss/ByteString 29.0384 20.6113 -8.4271 -29.02 140.89
64: HashMap/alterDelete/String 297.3323 192.6111 -104.7211 -35.22 154.37
65: HashMap/delete/String 290.7659 184.7439 -106.0220 -36.46 157.39
66: HashMap/alterFDelete/String 295.6333 184.4920 -111.1413 -37.59 160.24
67: HashMap/alterInsert-dup/String 198.6804 94.0286 -104.6518 -52.67 211.30
68: HashMap/foldr 4.6686 2.1455 -2.5231 -54.04 217.60
69: HashMap/isSubmapOf/Int 0.0003 0.0001 -0.0001 -61.41 259.14
70: HashMap/isSubmapOf/String 51.9250 18.1427 -33.7822 -65.05 286.20
71: HashMap/isSubmapOfNaive/Int 0.0001 0.0000 -0.0001 -77.63 447.04 Apologies for the novel! |
This is generally looking promising. Can you make it pass CI, and maybe do one more run of the benchmarks to see how stable the results are? |
Needs a rebase as well. I'll try to reproduce the benchmark results. |
I've rebased on top of If this looks good then I'll push a commit that either adds a comment to the benchmark suite as requested in #317 (comment) or revert to Data
|
I'm happy with these numbers! :) I also realized that this is a purely internal change, so if it should turn out that the old 16bit base was "better" overall, we can easily revert this change.
Reverting to the smaller number seems good to me. |
Thank you, Jeffrey! :)
Please let me know if there is anything else before 0.2.15.0. I'd like to have this make it into the next release!
v0.2.15.0 has already been released – I didn't want to delay the migration to hashable-1.4 any further. I'll make another release within this month though.
I'll have a look. One question: should we add a |
@treeowl I think we can consider something like that if people are unhappy with the base 32 version. I wouldn't want to offer it upfront, since it would double the size of the API and would probably come with a tradeoff in maintainability. |
Ooh, yuck. I think the usual advice is to wait for at least an alpha release, since otherwise things can shift around. |
Thank you, @doyougnu! :) |
This PR does two things:
Regarding 1., this should improve performance since the length of a path in the tree becomes log_32(n) rather than log_16(n). Similarly, it should mean that the HAMT stays shallower for longer.
Note that the benchmark suite will not show the difference since n = 2^12 = 4096, which means the HAMT doesn't get very nested.
I've benchmarked the PR with n = 4 * (2^16) elements. This number is significant because with a 16 bit base the tree should increase in level after 2^16 elements are inserted (I believe). So this number of elements stress tests the HAMT, and thus a difference between implementations is more noticeable. Here is a table of the comparison:
Positive numbers are slowdowns and negative numbers are speedups. I'm unsure why some of the larger slowdowns occur and am hoping you might have some insight here. In any case I think the speedups are worth investing time in; for example, HashMap/foldr is 55% faster and HashMap/lookup/Int is 16% faster by these benchmarks!
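The depth argument can be checked with a back-of-the-envelope sketch. It assumes an idealised, perfectly balanced trie with uniformly distributed hashes, so real HAMT depths will vary; `levelsNeeded` is a hypothetical helper, with 4 bits per level standing for the 16-way tree and 5 bits for the 32-way tree.

```haskell
-- | Smallest depth d such that (2^bits)^d >= n, i.e. the height of a
-- perfectly balanced trie holding n keys (an idealisation).
levelsNeeded :: Int -> Integer -> Int
levelsNeeded bits n = length (takeWhile (< n) (iterate (* width) 1))
  where
    width = 2 ^ bits
```

For n = 2^12 = 4096 both `levelsNeeded 4 4096` and `levelsNeeded 5 4096` give 3, which is consistent with the default benchmark size not separating the two layouts. For n = 4 * 2^16 = 262144 the estimate is 5 levels with 16-way nodes and 4 levels with 32-way nodes.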