colexec: begin to implement flat decimal columns #57593

Closed
jordanlewis wants to merge 27 commits

jordanlewis
Member

This commit changes the representation of the Decimals column in the
colexec package to be a wrapped flat Bytes representation. Instead of
storing values of apd.Decimal in a slice (which contain heap pointers),
we now store a serialized form of apd.Decimal in a flat bytes slice,
without any heap pointers. Then, to access the apd.Decimals at runtime,
we "deserialize" them in a close to zero-copy fashion, inflating an
apd.Decimal by pointing its internal varlen Coeff field directly at the
serialized bytes.

Release note: None

@yuzefovich
Member

I fixed up most things because I was curious about the benchmark numbers. Here is what I got (note that I modified the benchmark to use decimals):

name                                                                 old time/op    new time/op    delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24             52.2µs ± 1%    50.9µs ± 3%   -2.48%  (p=0.003 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24          16.2ms ± 4%    17.3ms ± 4%   +6.70%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24       2.98ms ± 1%    3.17ms ± 1%   +6.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24          32.7µs ± 5%    31.4µs ± 7%   -3.91%  (p=0.035 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24       2.08ms ± 0%    3.03ms ± 1%  +45.60%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24    1.32ms ± 0%    2.02ms ± 0%  +52.93%  (p=0.000 n=9+10)

name                                                                 old speed      new speed      delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24           4.91MB/s ± 1%  5.03MB/s ± 3%   +2.57%  (p=0.003 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24        16.2MB/s ± 4%  15.2MB/s ± 4%   -6.25%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24     88.0MB/s ± 1%  82.8MB/s ± 1%   -5.86%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24        7.83MB/s ± 5%  8.15MB/s ± 7%   +4.14%  (p=0.030 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24      126MB/s ± 0%    87MB/s ± 1%  -31.32%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24   198MB/s ± 0%   130MB/s ± 0%  -34.61%  (p=0.000 n=9+10)

name                                                                 old alloc/op   new alloc/op   delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24              106kB ± 0%     124kB ± 0%  +17.63%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24          14.6MB ± 0%    14.3MB ± 0%   -2.65%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24       1.12MB ± 0%    1.03MB ± 0%   -8.05%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24          73.7kB ± 0%    96.1kB ± 0%  +30.38%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24        293kB ± 0%     353kB ± 0%  +20.21%  (p=0.000 n=10+6)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24     210kB ± 0%     274kB ± 0%  +30.70%  (p=0.000 n=9+8)

name                                                                 old allocs/op  new allocs/op  delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24                197 ± 0%       148 ± 0%  -24.87%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24           32.3k ± 0%     27.2k ± 0%  -15.70%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24        4.93k ± 0%     0.80k ± 0%  -83.75%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24            78.0 ± 0%      48.0 ± 0%  -38.46%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24        3.50k ± 0%     0.37k ± 0%  -89.53%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24       398 ± 0%       370 ± 0%   -7.04%  (p=0.000 n=10+10)

@jordanlewis
Member Author

Can you push a patch with your benchmark change too?

@yuzefovich
Member

Sure. I switched one place from a value to a pointer and am rerunning the benchmarks.

@yuzefovich
Member

That didn't help (these benchmarks cover only the commit that switches from a value to a pointer):

name                                                                 old time/op    new time/op    delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24             51.7µs ± 2%    53.8µs ± 2%  +3.97%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24          16.9ms ± 1%    17.6ms ± 4%  +4.29%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24       3.17ms ± 0%    3.17ms ± 1%    ~     (p=0.796 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24          30.0µs ± 3%    30.6µs ± 1%  +2.02%  (p=0.017 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24       3.04ms ± 0%    2.83ms ± 0%  -6.99%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24    2.04ms ± 0%    2.04ms ± 0%    ~     (p=0.853 n=10+10)

name                                                                 old speed      new speed      delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24           4.95MB/s ± 2%  4.76MB/s ± 2%  -3.82%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24        15.5MB/s ± 1%  14.9MB/s ± 3%  -4.09%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24     82.7MB/s ± 0%  82.7MB/s ± 1%    ~     (p=0.839 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24        8.53MB/s ± 3%  8.36MB/s ± 1%  -2.00%  (p=0.018 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24     86.2MB/s ± 0%  92.7MB/s ± 0%  +7.52%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24   129MB/s ± 0%   129MB/s ± 0%    ~     (p=0.810 n=10+10)

name                                                                 old alloc/op   new alloc/op   delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24              124kB ± 0%     124kB ± 0%  +0.13%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24          14.1MB ± 0%    14.2MB ± 0%  +1.11%  (p=0.000 n=10+8)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24       1.03MB ± 0%    1.03MB ± 0%  -0.03%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24          96.1kB ± 0%    96.1kB ± 0%    ~     (all equal)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24        353kB ± 0%     353kB ± 0%  -0.00%  (p=0.002 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24     274kB ± 0%     274kB ± 0%  +0.04%  (p=0.000 n=7+8)

name                                                                 old allocs/op  new allocs/op  delta
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-24                148 ± 0%       150 ± 0%  +1.35%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-24           27.2k ± 0%     27.2k ± 0%  +0.06%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-24          788 ± 0%       778 ± 0%  -1.27%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24            48.0 ± 0%      48.0 ± 0%    ~     (all equal)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24          366 ± 0%       366 ± 0%    ~     (all equal)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24       362 ± 0%       366 ± 0%  +1.10%  (p=0.000 n=10+10)

@jordanlewis jordanlewis force-pushed the flat-dec branch 2 times, most recently from 775cc1c to c8c9c61 Compare December 5, 2020 05:23
@jordanlewis
Member Author

This commit is a hybrid: we store a []apd.Decimal slice but put all of the backing memory into an adjacent flat Bytes struct. It still doesn't perform any better, but at least the memory is contiguous? Hmm...

name                                                                 old time/op    new time/op    delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-12          20.8µs ± 1%    31.0µs ± 2%   +48.84%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-12       1.40ms ± 1%    1.84ms ± 2%   +31.31%  (p=0.000 n=8+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-12     911µs ± 7%     919µs ± 1%      ~     (p=0.739 n=10+10)

name                                                                 old speed      new speed      delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-12        12.3MB/s ± 0%   8.3MB/s ± 2%   -32.79%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-12      187MB/s ± 1%   143MB/s ± 2%   -23.84%  (p=0.000 n=8+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-12   288MB/s ± 7%   285MB/s ± 1%      ~     (p=0.739 n=10+10)

name                                                                 old alloc/op   new alloc/op   delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-12          73.7kB ± 0%   148.5kB ± 0%  +101.60%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-12        293kB ± 0%     552kB ± 0%   +88.20%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-12     210kB ± 0%     424kB ± 0%  +102.51%  (p=0.000 n=8+8)

name                                                                 old allocs/op  new allocs/op  delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-12            78.0 ± 0%      50.0 ± 0%   -35.90%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-12        3.50k ± 0%     0.37k ± 0%   -89.45%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-12       394 ± 0%       376 ± 0%    -4.57%  (p=0.000 n=10+10)

@jordanlewis jordanlewis force-pushed the flat-dec branch 2 times, most recently from fa17595 to b9c8c12 Compare December 6, 2020 22:48
jordanlewis and others added 13 commits December 6, 2020 18:58
@jordanlewis
Member Author

The latest commit, with a completely flat representation, has the following benchmark results:

name                                                                   old time/op    new time/op    delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-24             35.9µs ± 5%    32.6µs ± 2%   -9.21%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24            41.0µs ± 2%    37.9µs ± 5%   -7.50%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-24            38.6µs ± 3%    37.4µs ± 2%   -2.94%  (p=0.001 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-24           39.3µs ± 3%    36.7µs ± 2%   -6.76%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-24           155µs ± 2%     149µs ± 1%   -3.83%  (p=0.000 n=8+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-24           155µs ± 2%     146µs ± 2%   -5.94%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-24          135µs ± 2%     126µs ± 3%   -6.27%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-24         125µs ± 3%     122µs ± 6%   -2.38%  (p=0.043 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-24        125µs ± 3%     120µs ±10%   -4.65%  (p=0.005 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24         2.12ms ± 0%    2.43ms ± 2%  +14.62%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-24         2.02ms ± 1%    2.07ms ± 1%   +2.56%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-24        1.54ms ± 1%    1.52ms ± 1%   -1.44%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-24       1.41ms ± 1%    1.43ms ± 1%   +0.86%  (p=0.002 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24      1.36ms ± 1%    1.36ms ± 1%     ~     (p=0.247 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-24       63.9ms ± 1%    77.2ms ± 1%  +20.77%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-24       61.2ms ± 1%    64.3ms ± 1%   +4.96%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-24      49.0ms ± 1%    48.8ms ± 2%     ~     (p=0.280 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-24     44.9ms ± 1%    44.6ms ± 1%     ~     (p=0.075 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-24    43.1ms ± 1%    42.7ms ± 1%   -0.91%  (p=0.004 n=10+10)

name                                                                   old speed      new speed      delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-24            223kB/s ± 6%   246kB/s ± 2%  +10.15%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24          6.25MB/s ± 2%  6.76MB/s ± 4%   +8.17%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-24          6.64MB/s ± 3%  6.84MB/s ± 2%   +3.03%  (p=0.001 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-24         6.51MB/s ± 2%  6.98MB/s ± 2%   +7.23%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-24        52.9MB/s ± 2%  55.0MB/s ± 1%   +3.97%  (p=0.000 n=8+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-24        52.9MB/s ± 2%  56.3MB/s ± 2%   +6.33%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-24       60.9MB/s ± 2%  64.9MB/s ± 3%   +6.72%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-24      65.7MB/s ± 3%  67.3MB/s ± 6%   +2.52%  (p=0.043 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-24     65.3MB/s ± 3%  68.7MB/s ±10%   +5.16%  (p=0.005 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24        123MB/s ± 0%   108MB/s ± 2%  -12.75%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-24        130MB/s ± 1%   127MB/s ± 1%   -2.50%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-24       170MB/s ± 1%   172MB/s ± 1%   +1.45%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-24      185MB/s ± 1%   184MB/s ± 1%   -0.85%  (p=0.002 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24     192MB/s ± 1%   193MB/s ± 1%     ~     (p=0.247 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-24      131MB/s ± 1%   109MB/s ± 1%  -17.20%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-24      137MB/s ± 1%   131MB/s ± 1%   -4.72%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-24     171MB/s ± 1%   172MB/s ± 2%     ~     (p=0.280 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-24    187MB/s ± 1%   188MB/s ± 1%     ~     (p=0.075 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-24   195MB/s ± 1%   196MB/s ± 1%   +0.92%  (p=0.004 n=10+10)

name                                                                   old alloc/op   new alloc/op   delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-24             70.0kB ± 0%    91.3kB ± 0%  +30.55%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24            73.7kB ± 0%    96.1kB ± 0%  +30.38%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-24            73.6kB ± 0%    96.1kB ± 0%  +30.60%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-24           73.4kB ± 0%    96.1kB ± 0%  +30.82%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-24           194kB ± 0%     249kB ± 0%  +28.63%  (p=0.000 n=10+8)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-24           189kB ± 0%     249kB ± 0%  +31.44%  (p=0.000 n=8+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-24          186kB ± 0%     249kB ± 0%  +34.20%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-24         185kB ± 0%     249kB ± 0%  +34.32%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-24        185kB ± 0%     249kB ± 0%  +34.30%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24          293kB ± 0%     353kB ± 0%  +20.21%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-24          291kB ± 0%     355kB ± 0%  +22.27%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-24         221kB ± 0%     277kB ± 0%  +25.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-24        213kB ± 0%     276kB ± 0%  +29.70%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24       210kB ± 0%     274kB ± 0%  +30.73%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-24       1.12MB ± 0%    1.15MB ± 0%   +2.46%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-24       1.15MB ± 0%    1.22MB ± 0%   +5.52%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-24      1.16MB ± 0%    1.26MB ± 0%   +8.37%  (p=0.000 n=7+8)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-24     1.12MB ± 0%    1.22MB ± 0%   +8.90%  (p=0.000 n=8+8)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-24     992kB ± 0%    1081kB ± 0%   +8.98%  (p=0.000 n=9+10)

name                                                                   old allocs/op  new allocs/op  delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-24               47.0 ± 0%      48.0 ± 0%   +2.13%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-24              78.0 ± 0%      48.0 ± 0%  -38.46%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-24              62.0 ± 0%      48.0 ± 0%  -22.58%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-24             47.0 ± 0%      48.0 ± 0%   +2.13%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-24           1.07k ± 0%     0.05k ± 0%  -95.51%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-24             558 ± 0%        50 ± 0%  -91.04%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-24           80.0 ± 0%      56.0 ± 0%  -30.00%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-24          54.0 ± 0%      52.0 ± 0%   -3.70%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-24         47.0 ± 0%      48.0 ± 0%   +2.13%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-24          3.50k ± 0%     0.37k ± 0%  -89.53%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-24          3.03k ± 0%     0.46k ± 0%  -84.96%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-24         1.51k ± 0%     0.47k ± 0%  -69.05%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-24          676 ± 0%       448 ± 0%  -33.73%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-24         398 ± 0%       372 ± 0%   -6.53%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-24        15.4k ± 0%     10.3k ± 0%  -33.21%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-24        16.4k ± 0%     12.8k ± 0%  -21.93%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-24       16.5k ± 0%     14.3k ± 0%  -13.38%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-24      15.0k ± 0%     12.9k ± 0%  -14.02%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-24     11.7k ± 0%     10.8k ± 0%   -8.41%  (p=0.000 n=10+10)

* origin/master: (113 commits)
  update external contributor hall of fame
  ccl/sqlproxyccl: idle connection timout support
  builtins: add fuzzystrmatch soundex and difference builtin functions
  sql,log: productionize the event logging
  kv: fix a snapshot error test matcher
  server: fix decomm status replica overcounting for r1
  sql: fix bug allowing FKs referencing columns with no unique constraint
  sql: fix bug preventing rollback of ALTER TABLE ADD FOREIGN KEY
  pkg/sql: implement levenshtein
  sql: populated pg_depend with table and view dependencies
  sql: use constraint name when adding a primary key constraint
  bazel: generate a few SQL files within the sandbox
  execinfrapb: include complete component ID in stats proto
  build/teamcity-support.sh: re-instate the github-post install
  kvserver: disallow racing replicate queue during tests
  ui: DB Console branding refresh
  sql/catalog/descs: return appropriate type from Get(Table|Type)ByName
  sql: add sql.trace.stmt.enable_threshold
  build: tweak the TC test runner to detect package fails
  sql: add unique constraints to table descriptor for UNIQUE WITHOUT INDEX
  ...
* origin/master:
  colexec: remove almost all usages of execgen.SLICE
  sql: initial support for virtual columns
  kv/kvserver: skip TestReplicateAfterTruncation
  optbuilder: reduce redundant building of arbiter filter expressions
  opt: build all partial index predicate expressions in TableMeta
@tbg added the X-noremind label (Bots won't notify about PRs with X-noremind) May 6, 2021
craig bot pushed a commit that referenced this pull request Jan 11, 2022
74590: colexec: integrate flat, compact decimal datums r=nvanbenschoten a=nvanbenschoten

Replaces #74369 and #57593.

This PR picks up the following changes to `cockroachdb/apd`:
- cockroachdb/apd#103
- cockroachdb/apd#104
- cockroachdb/apd#107
- cockroachdb/apd#108
- cockroachdb/apd#109
- cockroachdb/apd#110
- cockroachdb/apd#111

Release note (performance improvement): The memory representation of DECIMAL datums has been optimized to save space, avoid heap allocations, and eliminate indirection. This increases the speed of DECIMAL arithmetic and aggregation by up to 20% on large data sets.

----

At a high level, those changes implement the "compact memory representation" for Decimals described in cockroachdb/apd#102 (comment) and later implemented in cockroachdb/apd#103.

Compared to the approach on master, the approach in cockroachdb/apd#103 a) is faster, b) avoids indirection and heap allocation, and c) is smaller.

Compared to the alternate approach in cockroachdb/apd#102, the approach in cockroachdb/apd#103 is a) [faster for most operations](cockroachdb/apd#102 (comment)), b) more usable because values can be safely copied, and c) half the memory size (32 bytes per `Decimal`, vs. 64).

The memory representation of the Decimal struct in this approach looks like:
```go
type Decimal struct {
    Form     int8
    Negative bool
    Exponent int32
    Coeff    BigInt {
        _inner  *big.Int // nil when value fits in _inline
        _inline [2]uint
    }
} // sizeof = 32
```

With a two-word inline array, any coefficient that fits in a 128-bit integer on a 64-bit platform (i.e. decimals with a scale-adjusted absolute value up to 2^128 - 1) fits in `_inline`. The indirection through `_inner` is only used for values larger than this.

Before this change, the memory representation of the `Decimal` struct looked like:
```go
type Decimal struct {
    Form     int64
    Negative bool
    Exponent int32
    Coeff    big.Int {
        neg bool
        abs []big.Word {
            data uintptr ---------------. 
            len  int64                  v
            cap  int64         [uint, uint, ...] // sizeof = variable, but around cap = 4, so 32 bytes
        }
    }
} // sizeof = 48 flat bytes + variable-length heap allocated array
```

----

## Performance impact

### Speedup on TPC-DS dataset

The TPC-DS dataset is full of decimal columns, so it's a good playground to test this change. Unfortunately, the variance in the runtime performance of the TPC-DS queries themselves is high (many queries varied by 30-40% per attempt), so it was hard to get signal out of them. Instead, I imported the TPC-DS dataset with a scale factor of 10 and ran some custom aggregation queries against the largest table (`web_sales`, row count = 7,197,566):

Queries
```sql
-- q1
select sum(ws_wholesale_cost + ws_ext_list_price) from web_sales;

-- q2
select sum(2 * ws_wholesale_cost + ws_ext_list_price) - max(4 * ws_ext_ship_cost), min(ws_net_profit) from web_sales;

-- q3
select max(ws_bill_customer_sk + ws_bill_cdemo_sk + ws_bill_hdemo_sk + ws_bill_addr_sk + ws_ship_customer_sk + ws_ship_cdemo_sk + ws_ship_hdemo_sk + ws_ship_addr_sk + ws_web_page_sk + ws_web_site_sk + ws_ship_mode_sk + ws_warehouse_sk + ws_promo_sk + ws_order_number + ws_quantity + ws_wholesale_cost + ws_list_price + ws_sales_price + ws_ext_discount_amt + ws_ext_sales_price + ws_ext_wholesale_cost + ws_ext_list_price + ws_ext_tax + ws_coupon_amt + ws_ext_ship_cost + ws_net_paid + ws_net_paid_inc_tax + ws_net_paid_inc_ship + ws_net_paid_inc_ship_tax + ws_net_profit) from web_sales;
```

Here's the difference in runtime of these three queries before and after this change on an `n2-standard-4` instance:
```
name              old s/op   new s/op   delta
TPC-DS/custom/q1  7.21 ± 3%  6.59 ± 0%   -8.57%  (p=0.000 n=10+10)
TPC-DS/custom/q2  10.2 ± 0%   9.7 ± 3%   -5.42%  (p=0.000 n=10+10)
TPC-DS/custom/q3  21.9 ± 1%  17.3 ± 0%  -21.13%  (p=0.000 n=10+10)
```

### Heap allocation reduction in TPC-DS

Part of the reason for this speedup is that the change significantly reduces heap allocations, because most decimal values are now stored inline. We can see this in q3 from above. Before the change, a heap profile looks like:

<img width="1751" alt="Screen Shot 2022-01-07 at 7 12 49 PM" src="https://user-images.githubusercontent.com/5438456/148625159-9ceb470a-0742-4f75-a533-530d9944143c.png">

After the change, a heap profile looks like:

<img width="1749" alt="Screen Shot 2022-01-07 at 7 17 32 PM" src="https://user-images.githubusercontent.com/5438456/148625174-629f4b47-07cc-4ef6-8723-2e556f7fc00d.png">

_(the dominant source of heap allocations is now `coldata.(*Nulls).Or`. #74592 should help here)_
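
To make the allocation difference concrete, here is a small Go sketch that counts allocations per operation with `testing.AllocsPerRun`. It does not measure `apd` itself: `inline128` is a hypothetical two-word value type standing in for an inline coefficient, and `big.Int` stands in for the heap-backed representation.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
	"testing"
)

// inline128 is a hypothetical two-word coefficient stored by value.
type inline128 struct{ lo, hi uint64 }

// add returns a+b modulo 2^128; overflow handling is omitted for brevity.
func (a inline128) add(b inline128) inline128 {
	lo, carry := bits.Add64(a.lo, b.lo, 0)
	hi, _ := bits.Add64(a.hi, b.hi, carry)
	return inline128{lo, hi}
}

// Package-level sinks keep the compiler from discarding the work and force
// the big.Int result to escape to the heap.
var (
	sinkBig    *big.Int
	sinkInline inline128
)

// heapAllocs reports the average allocations per big.Int addition.
func heapAllocs() float64 {
	a, b := big.NewInt(1<<62), big.NewInt(1<<62)
	return testing.AllocsPerRun(1000, func() {
		sinkBig = new(big.Int).Add(a, b)
	})
}

// inlineAllocs reports the average allocations per inline128 addition.
func inlineAllocs() float64 {
	a, b := inline128{lo: 1 << 62}, inline128{lo: 1 << 62}
	return testing.AllocsPerRun(1000, func() {
		sinkInline = a.add(b)
	})
}

func main() {
	fmt.Println("big.Int allocs/op:  ", heapAllocs())
	fmt.Println("inline128 allocs/op:", inlineAllocs())
}
```

The exact count for the `big.Int` path depends on the Go version, but it is at least one per result, while the by-value path does not allocate.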

### Heap allocation reduction in TPC-E

On the read-only portion of TPC-E (77% of the full workload, in terms of transaction mix), this change has a significant impact on total heap allocations. Before the change, `math/big.nat.make` was responsible for **51.07%** of total heap allocations:

<img width="1587" alt="Screen Shot 2021-12-31 at 8 01 00 PM" src="https://user-images.githubusercontent.com/5438456/147842722-965d649d-b29a-4f66-aa07-1b05e52e97af.png">

After the change, `math/big.nat.make` is responsible for only **1.1%** of total heap allocations:

<img width="1580" alt="Screen Shot 2021-12-31 at 9 04 24 PM" src="https://user-images.githubusercontent.com/5438456/147842727-a881a5a3-d038-48bb-bd44-4ade665afe73.png">

That equates to roughly a **50%** reduction in heap allocations.

### Microbenchmarks

```
name                                                                   old time/op    new time/op     delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10          65.6µs ± 2%     42.5µs ± 0%  -35.15%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10          68.4µs ± 1%     48.4µs ± 1%  -29.20%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10         1.65ms ± 1%     1.20ms ± 1%  -27.31%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10       51.4ms ± 1%     38.3ms ± 1%  -25.59%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10            12.5µs ± 1%      9.4µs ± 2%  -24.72%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10            12.5µs ± 1%      9.6µs ± 2%  -23.24%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10             10.5µs ± 1%      8.0µs ± 1%  -23.22%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10           12.4µs ± 1%      9.6µs ± 1%  -22.70%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10       60.5µs ± 1%     47.1µs ± 2%  -22.24%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10        61.2µs ± 1%     47.7µs ± 1%  -22.09%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10         62.3µs ± 1%     48.7µs ± 2%  -21.91%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10         1.31ms ± 0%     1.03ms ± 1%  -21.53%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10          82.3µs ± 1%     64.9µs ± 1%  -21.12%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10           86.6µs ± 1%     68.5µs ± 1%  -20.93%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10            96.0µs ± 1%     77.1µs ± 1%  -19.73%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10       41.2ms ± 0%     33.1ms ± 0%  -19.64%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10              17.5µs ± 1%     14.3µs ± 2%  -18.59%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                14.8µs ± 3%     12.1µs ± 3%  -18.26%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10               20.0µs ± 1%     16.4µs ± 1%  -18.04%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10               20.9µs ± 1%     17.2µs ± 3%  -17.80%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10       884µs ± 0%      731µs ± 0%  -17.30%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10    27.9ms ± 0%     23.1ms ± 0%  -17.27%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              218µs ± 2%      181µs ± 2%  -17.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10        911µs ± 1%      755µs ± 1%  -17.10%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         957µs ± 1%      798µs ± 0%  -16.66%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10         1.54ms ± 1%     1.29ms ± 1%  -16.56%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              188µs ± 1%      157µs ± 2%  -16.33%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10     28.8ms ± 0%     24.1ms ± 0%  -16.14%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10      30.4ms ± 0%     25.7ms ± 1%  -15.26%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10          135ms ± 1%      114ms ± 1%  -15.21%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10          1.79ms ± 1%     1.52ms ± 1%  -15.14%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10            6.29ms ± 1%     5.50ms ± 1%  -12.62%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10       62.2ms ± 0%     54.7ms ± 0%  -12.08%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10           2.46ms ± 1%     2.17ms ± 1%  -11.88%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10            5.64ms ± 0%     4.98ms ± 0%  -11.76%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10           354ms ± 2%      318ms ± 1%  -10.18%  (p=0.000 n=10+8)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10        91.8ms ± 1%     83.3ms ± 0%   -9.25%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10           396ms ± 1%      369ms ± 1%   -6.83%  (p=0.000 n=8+8)

name                                                                   old speed      new speed       delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10         125MB/s ± 2%    193MB/s ± 0%  +54.20%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10         120MB/s ± 1%    169MB/s ± 1%  +41.24%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10        159MB/s ± 1%    219MB/s ± 1%  +37.57%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10      163MB/s ± 1%    219MB/s ± 1%  +34.39%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10          20.4MB/s ± 1%   27.2MB/s ± 2%  +32.85%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10            764kB/s ± 2%    997kB/s ± 1%  +30.45%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10          20.5MB/s ± 1%   26.8MB/s ± 2%  +30.28%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10         20.7MB/s ± 1%   26.8MB/s ± 1%  +29.37%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10      135MB/s ± 1%    174MB/s ± 2%  +28.61%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10       134MB/s ± 1%    172MB/s ± 1%  +28.35%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10        131MB/s ± 1%    168MB/s ± 2%  +28.06%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10        200MB/s ± 0%    255MB/s ± 1%  +27.45%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10         100MB/s ± 1%    126MB/s ± 1%  +26.78%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10         94.6MB/s ± 1%  119.6MB/s ± 1%  +26.47%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10          85.3MB/s ± 1%  106.3MB/s ± 1%  +24.58%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10      204MB/s ± 0%    254MB/s ± 0%  +24.44%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10            14.6MB/s ± 1%   18.0MB/s ± 2%  +22.83%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10               544kB/s ± 3%    664kB/s ± 2%  +22.06%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10             12.8MB/s ± 1%   15.6MB/s ± 1%  +22.02%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10             12.3MB/s ± 1%   14.9MB/s ± 3%  +21.67%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10     296MB/s ± 0%    358MB/s ± 0%  +20.92%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10   300MB/s ± 0%    363MB/s ± 0%  +20.87%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10           37.5MB/s ± 2%   45.4MB/s ± 2%  +20.82%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10      288MB/s ± 1%    347MB/s ± 1%  +20.62%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10       274MB/s ± 1%    329MB/s ± 0%  +19.99%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10        170MB/s ± 1%    204MB/s ± 1%  +19.85%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10           43.6MB/s ± 1%   52.1MB/s ± 2%  +19.52%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10    292MB/s ± 0%    348MB/s ± 0%  +19.25%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10     276MB/s ± 0%    326MB/s ± 1%  +18.00%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10       62.1MB/s ± 1%   73.3MB/s ± 1%  +17.94%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10         147MB/s ± 1%    173MB/s ± 1%  +17.83%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10          41.7MB/s ± 1%   47.7MB/s ± 1%  +14.44%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10      135MB/s ± 0%    153MB/s ± 0%  +13.74%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10          106MB/s ± 1%    121MB/s ± 1%  +13.48%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10          46.5MB/s ± 0%   52.7MB/s ± 0%  +13.34%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10        23.7MB/s ± 2%   26.3MB/s ± 2%  +11.02%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10      91.3MB/s ± 0%  100.7MB/s ± 0%  +10.27%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10        21.2MB/s ± 1%   22.7MB/s ± 1%   +7.32%  (p=0.000 n=8+8)

name                                                                   old alloc/op   new alloc/op    delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10          354kB ± 0%      239kB ± 0%  -32.39%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10          348kB ± 0%      239kB ± 0%  -31.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10           251kB ± 0%      177kB ± 0%  -29.44%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10           246kB ± 0%      177kB ± 0%  -28.28%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         275kB ± 0%      198kB ± 0%  -28.06%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10          243kB ± 0%      177kB ± 0%  -27.15%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10         242kB ± 0%      177kB ± 0%  -27.09%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10        242kB ± 0%      177kB ± 0%  -27.06%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10        268kB ± 0%      198kB ± 0%  -26.05%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10       264kB ± 0%      198kB ± 0%  -25.04%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10            75.1kB ± 0%     56.9kB ± 0%  -24.25%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10            74.9kB ± 0%     56.9kB ± 0%  -24.12%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10           74.8kB ± 0%     56.9kB ± 0%  -23.99%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10             69.6kB ± 0%     53.1kB ± 0%  -23.66%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                95.2kB ± 0%     75.9kB ± 0%  -20.23%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10                102kB ± 0%       82kB ± 0%  -20.04%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10                103kB ± 0%       83kB ± 0%  -19.95%  (p=0.000 n=7+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10               100kB ± 0%       80kB ± 0%  -19.90%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10      1.14MB ± 0%     0.92MB ± 0%  -18.80%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10           271kB ± 0%      227kB ± 0%  -16.16%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10       1.10MB ± 0%     0.92MB ± 0%  -15.92%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10            280kB ± 1%      235kB ± 1%  -15.91%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10     1.09MB ± 1%     0.92MB ± 0%  -15.67%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10             291kB ± 0%      245kB ± 1%  -15.53%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10         1.11MB ± 0%     0.95MB ± 0%  -15.14%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10          1.22MB ± 0%     1.04MB ± 0%  -14.77%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10           1.65MB ± 0%     1.42MB ± 0%  -13.56%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              593kB ± 0%      513kB ± 0%  -13.36%  (p=0.000 n=9+8)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              520kB ± 0%      454kB ± 0%  -12.82%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10       1.04MB ± 0%     0.92MB ± 0%  -11.06%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10       2.48MB ± 0%     2.25MB ± 0%   -9.32%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10     967kB ± 0%      881kB ± 0%   -8.89%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10        7.86MB ± 0%     7.36MB ± 0%   -6.44%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10            14.2MB ± 1%     13.4MB ± 1%   -5.83%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10            12.3MB ± 0%     11.7MB ± 0%   -5.03%  (p=0.001 n=7+7)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10         27.2MB ± 1%     25.9MB ± 1%   -4.84%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10           465MB ± 0%      445MB ± 0%   -4.32%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10           403MB ± 0%      390MB ± 0%   -3.44%  (p=0.000 n=10+10)

name                                                                   old allocs/op  new allocs/op   delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10           1.07k ± 0%      0.05k ± 0%  -95.70%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10            702k ± 0%        32k ± 0%  -95.46%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10            489k ± 0%        28k ± 0%  -94.33%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10          4.40k ± 0%      0.30k ± 0%  -93.15%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10           1.11k ± 0%      0.09k ± 0%  -92.02%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10             561 ± 0%         46 ± 0%  -91.80%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10          3.45k ± 0%      0.30k ± 0%  -91.28%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10            1.19k ± 0%      0.15k ± 1%  -87.31%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10          4.87k ± 0%      0.70k ± 0%  -85.69%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10             32.2k ± 0%       6.3k ± 0%  -80.40%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         1.45k ± 3%      0.29k ± 0%  -79.66%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10             1.39k ± 0%      0.30k ± 1%  -78.64%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10             26.2k ± 0%       6.8k ± 1%  -73.95%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10           6.64k ± 0%      1.95k ± 0%  -70.67%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              3.44k ± 1%      1.12k ± 1%  -67.48%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10          62.4k ± 0%      20.4k ± 0%  -67.32%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              2.95k ± 1%      1.05k ± 1%  -64.52%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10            10.8k ± 0%       4.5k ± 0%  -58.21%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10          628 ± 3%        294 ± 0%  -53.21%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10         36.1k ± 0%      20.2k ± 0%  -44.06%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10           81.7 ± 3%       46.0 ± 0%  -43.67%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10       14.4k ± 1%       8.2k ± 0%  -42.97%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10              79.0 ± 0%       46.0 ± 0%  -41.77%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10        13.7k ± 1%       8.2k ± 0%  -40.05%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10                  191 ± 1%        120 ± 1%  -37.52%  (p=0.000 n=7+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10      12.9k ± 2%       8.2k ± 0%  -36.17%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10                  176 ± 2%        115 ± 1%  -34.33%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10        12.3k ± 0%       8.2k ± 0%  -33.21%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10        21.8k ± 0%      15.2k ± 0%  -30.13%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10                 118 ± 0%         84 ± 0%  -28.81%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10              63.0 ± 0%       46.0 ± 0%  -26.98%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10          57.2 ±14%       46.0 ± 0%  -19.58%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10     9.69k ± 1%      8.23k ± 0%  -15.07%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10         340 ± 2%        294 ± 0%  -13.43%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10               48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10             48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10         48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                  82.0 ± 0%       79.0 ± 0%   -3.66%  (p=0.000 n=10+10)
```

Co-authored-by: Nathan VanBenschoten <[email protected]>
@jordanlewis jordanlewis closed this Jan 3, 2023
@jordanlewis jordanlewis deleted the flat-dec branch January 3, 2023 14:57
@jordanlewis
Member Author

This is 1 year late but thanks for taking this over the finish line @nvanbenschoten!

Labels
X-noremind Bots won't notify about PRs with X-noremind

4 participants