Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coldata: operate on Nulls value, not reference #74592

Merged
merged 1 commit into from
Jan 10, 2022

Conversation

nvanbenschoten
Copy link
Member

This commit changes col.Vec.SetNulls to accept a Nulls struct by value instead of by pointer. This lets us avoid a heap allocation on each call to Nulls.Or.

We saw this in the "after" heap profiles in #74590, which looked like:

Screen Shot 2022-01-07 at 7 17 32 PM

      File: cockroach
Type: alloc_objects
Time: Jan 8, 2022 at 12:17am (UTC)
Showing nodes accounting for 5943494, 100% of 5943494 total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                            843873 48.47% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusDecimalDecimalOp.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:3938
                                            823389 47.29% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusInt64Int64Op.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:5732
                                             73736  4.24% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusInt64DecimalOp.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:5870
   1740998 29.29% 29.29%    1740998 29.29%                | github.com/cockroachdb/cockroach/pkg/col/coldata.(*Nulls).Or /go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/nulls.go:350
----------------------------------------------------------+-------------
                                            819219 49.50% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusInt64Int64Op.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:5732
                                            704530 42.57% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusDecimalDecimalOp.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:3938
                                            131076  7.92% |   github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj.projPlusInt64DecimalOp.Next.func1 /go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go:5870
   1654825 27.84% 57.14%    1654825 27.84%                | github.com/cockroachdb/cockroach/pkg/col/coldata.(*Nulls).Or /go/src/github.com/cockroachdb/cockroach/pkg/col/coldata/nulls.go:348
----------------------------------------------------------+-------------

This PR eliminates one of these two heap allocations.

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 8, 2022 01:04
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


-- commits, line 2 at r1:
nit: s/col/coldata/.


-- commits, line 6 at r1:
nit: no release note.

This commit changes `col.Vec.SetNulls` to accept a `Nulls` struct by
value instead of by pointer. This lets us avoid a heap allocation on
each call to `Nulls.Or`.

Releaes note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/valNulls branch from e1ecea5 to 8c56129 Compare January 10, 2022 04:27
@nvanbenschoten nvanbenschoten changed the title col: operate on Nulls value, not reference coldata: operate on Nulls value, not reference Jan 10, 2022
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=yuzefovich

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)


-- commits, line 2 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/col/coldata/.

Done.


-- commits, line 6 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: no release note.

Done.

@craig
Copy link
Contributor

craig bot commented Jan 10, 2022

Build succeeded:

@craig craig bot merged commit 62392dc into cockroachdb:master Jan 10, 2022
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/valNulls branch January 10, 2022 21:18
craig bot pushed a commit that referenced this pull request Jan 11, 2022
74590: colexec: integrate flat, compact decimal datums r=nvanbenschoten a=nvanbenschoten

Replaces #74369 and #57593.

This PR picks up the following changes to `cockroachdb/apd`:
- cockroachdb/apd#103
- cockroachdb/apd#104
- cockroachdb/apd#107
- cockroachdb/apd#108
- cockroachdb/apd#109
- cockroachdb/apd#110
- cockroachdb/apd#111

Release note (performance improvement): The memory representation of DECIMAL datums has been optimized to save space, avoid heap allocations, and eliminate indirection. This increases the speed of DECIMAL arithmetic and aggregation by up to 20% on large data sets.

----

At a high-level, those changes implement the "compact memory representation" for Decimals described in cockroachdb/apd#102 (comment) and later implemented in cockroachdb/apd#103.

Compared to the approach on master, the approach in cockroachdb/apd#103 is a) faster, b) avoids indirection + heap allocation, c) smaller.

Compared to the alternate approach in cockroachdb/apd#102, the approach in cockroachdb/apd#103 is a) [faster for most operations](cockroachdb/apd#102 (comment)), b) more usable because values can be safely copied, c) half the memory size (32 bytes per `Decimal`, vs. 64). 

The memory representation of the Decimal struct in this approach looks like:
```go
type Decimal struct {
    Form     int8
    Negative bool
    Exponent int32
    Coeff    BigInt {
        _inner  *big.Int // nil when value fits in _inline
        _inline [2]uint
    }
} // sizeof = 32
```

With a two-word inline array, any value that would fit in a 128-bit integer (i.e. decimals with a scale-adjusted absolute value up to 2^128 - 1) fit in `_inline`. The indirection through `_inner` is only used for values larger than this.

Before this change, the memory representation of the `Decimal` struct looked like:
```go
type Decimal struct {
    Form     int64
    Negative bool
    Exponent int32
    Coeff    big.Int {
        neg bool
        abs []big.Word {
            data uintptr ---------------. 
            len  int64                  v
            cap  int64         [uint, uint, ...] // sizeof = variable, but around cap = 4, so 32 bytes
        }
    }
} // sizeof = 48 flat bytes + variable-length heap allocated array
```

----

## Performance impact

### Speedup on TPC-DS dataset

The TPC-DS dataset is full of decimal columns, so it's a good playground to test this change. Unfortunately, the variance in the runtime performance of the TPC-DS queries themselves is high (many queries varied by 30-40% per attempt), so it was hard to get signal out of them. Instead, I imported the TPC-DS dataset with a scale factor of 10 and ran some custom aggregation queries against the largest table (`web_sales`, row count = 7,197,566):

Queries
```sql
# q1
select sum(ws_wholesale_cost + ws_ext_list_price) from web_sales;

# q2
select sum(2 * ws_wholesale_cost + ws_ext_list_price) - max(4 * ws_ext_ship_cost), min(ws_net_profit) from web_sales;

# q3
select max(ws_bill_customer_sk + ws_bill_cdemo_sk + ws_bill_hdemo_sk + ws_bill_addr_sk + ws_ship_customer_sk + ws_ship_cdemo_sk + ws_ship_hdemo_sk + ws_ship_addr_sk + ws_web_page_sk + ws_web_site_sk + ws_ship_mode_sk + ws_warehouse_sk + ws_promo_sk + ws_order_number + ws_quantity + ws_wholesale_cost + ws_list_price + ws_sales_price + ws_ext_discount_amt + ws_ext_sales_price + ws_ext_wholesale_cost + ws_ext_list_price + ws_ext_tax + ws_coupon_amt + ws_ext_ship_cost + ws_net_paid + ws_net_paid_inc_tax + ws_net_paid_inc_ship + ws_net_paid_inc_ship_tax + ws_net_profit) from web_sales;
```

Here's the difference in runtime of these three queries before and after this change on an `n2-standard-4` instance:
```
name              old s/op   new s/op   delta
TPC-DS/custom/q1  7.21 ± 3%  6.59 ± 0%   -8.57%  (p=0.000 n=10+10)
TPC-DS/custom/q2  10.2 ± 0%   9.7 ± 3%   -5.42%  (p=0.000 n=10+10)
TPC-DS/custom/q3  21.9 ± 1%  17.3 ± 0%  -21.13%  (p=0.000 n=10+10)
```

### Heap allocation reduction in TPC-DS

Part of the reason for this speedup was that it significantly reduces heap allocations because most decimal values are stored inline. We can see this in q3 from above. Before the change, a heap profile looks like:

<img width="1751" alt="Screen Shot 2022-01-07 at 7 12 49 PM" src="https://user-images.githubusercontent.com/5438456/148625159-9ceb470a-0742-4f75-a533-530d9944143c.png">

After the change, a heap profile looks like:

<img width="1749" alt="Screen Shot 2022-01-07 at 7 17 32 PM" src="https://user-images.githubusercontent.com/5438456/148625174-629f4b47-07cc-4ef6-8723-2e556f7fc00d.png">

_(the dominant source of heap allocations is now `coldata.(*Nulls).Or`. #74592 should help here)_

### Heap allocation reduction in TPC-E

On the read-only portion of the TPC-E (77% of the full workload, in terms of txn mix), this change has a significant impact on total heap allocations. Before the change, `math/big.nat.make` was responsible for **51.07%** of total heap allocations:

<img width="1587" alt="Screen Shot 2021-12-31 at 8 01 00 PM" src="https://user-images.githubusercontent.com/5438456/147842722-965d649d-b29a-4f66-aa07-1b05e52e97af.png">

After the change, `math/big.nat.make` is responsible for only **1.1%** of total heap allocations:

<img width="1580" alt="Screen Shot 2021-12-31 at 9 04 24 PM" src="https://user-images.githubusercontent.com/5438456/147842727-a881a5a3-d038-48bb-bd44-4ade665afe73.png">

That equates to roughly a **50%** reduction in heap allocations.

### Microbenchmarks

```
name                                                                   old time/op    new time/op     delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10          65.6µs ± 2%     42.5µs ± 0%  -35.15%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10          68.4µs ± 1%     48.4µs ± 1%  -29.20%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10         1.65ms ± 1%     1.20ms ± 1%  -27.31%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10       51.4ms ± 1%     38.3ms ± 1%  -25.59%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10            12.5µs ± 1%      9.4µs ± 2%  -24.72%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10            12.5µs ± 1%      9.6µs ± 2%  -23.24%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10             10.5µs ± 1%      8.0µs ± 1%  -23.22%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10           12.4µs ± 1%      9.6µs ± 1%  -22.70%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10       60.5µs ± 1%     47.1µs ± 2%  -22.24%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10        61.2µs ± 1%     47.7µs ± 1%  -22.09%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10         62.3µs ± 1%     48.7µs ± 2%  -21.91%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10         1.31ms ± 0%     1.03ms ± 1%  -21.53%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10          82.3µs ± 1%     64.9µs ± 1%  -21.12%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10           86.6µs ± 1%     68.5µs ± 1%  -20.93%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10            96.0µs ± 1%     77.1µs ± 1%  -19.73%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10       41.2ms ± 0%     33.1ms ± 0%  -19.64%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10              17.5µs ± 1%     14.3µs ± 2%  -18.59%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                14.8µs ± 3%     12.1µs ± 3%  -18.26%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10               20.0µs ± 1%     16.4µs ± 1%  -18.04%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10               20.9µs ± 1%     17.2µs ± 3%  -17.80%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10       884µs ± 0%      731µs ± 0%  -17.30%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10    27.9ms ± 0%     23.1ms ± 0%  -17.27%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              218µs ± 2%      181µs ± 2%  -17.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10        911µs ± 1%      755µs ± 1%  -17.10%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         957µs ± 1%      798µs ± 0%  -16.66%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10         1.54ms ± 1%     1.29ms ± 1%  -16.56%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              188µs ± 1%      157µs ± 2%  -16.33%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10     28.8ms ± 0%     24.1ms ± 0%  -16.14%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10      30.4ms ± 0%     25.7ms ± 1%  -15.26%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10          135ms ± 1%      114ms ± 1%  -15.21%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10          1.79ms ± 1%     1.52ms ± 1%  -15.14%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10            6.29ms ± 1%     5.50ms ± 1%  -12.62%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10       62.2ms ± 0%     54.7ms ± 0%  -12.08%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10           2.46ms ± 1%     2.17ms ± 1%  -11.88%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10            5.64ms ± 0%     4.98ms ± 0%  -11.76%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10           354ms ± 2%      318ms ± 1%  -10.18%  (p=0.000 n=10+8)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10        91.8ms ± 1%     83.3ms ± 0%   -9.25%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10           396ms ± 1%      369ms ± 1%   -6.83%  (p=0.000 n=8+8)

name                                                                   old speed      new speed       delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10         125MB/s ± 2%    193MB/s ± 0%  +54.20%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10         120MB/s ± 1%    169MB/s ± 1%  +41.24%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10        159MB/s ± 1%    219MB/s ± 1%  +37.57%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10      163MB/s ± 1%    219MB/s ± 1%  +34.39%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10          20.4MB/s ± 1%   27.2MB/s ± 2%  +32.85%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10            764kB/s ± 2%    997kB/s ± 1%  +30.45%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10          20.5MB/s ± 1%   26.8MB/s ± 2%  +30.28%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10         20.7MB/s ± 1%   26.8MB/s ± 1%  +29.37%  (p=0.000 n=8+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10      135MB/s ± 1%    174MB/s ± 2%  +28.61%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10       134MB/s ± 1%    172MB/s ± 1%  +28.35%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10        131MB/s ± 1%    168MB/s ± 2%  +28.06%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10        200MB/s ± 0%    255MB/s ± 1%  +27.45%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10         100MB/s ± 1%    126MB/s ± 1%  +26.78%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10         94.6MB/s ± 1%  119.6MB/s ± 1%  +26.47%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10          85.3MB/s ± 1%  106.3MB/s ± 1%  +24.58%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10      204MB/s ± 0%    254MB/s ± 0%  +24.44%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10            14.6MB/s ± 1%   18.0MB/s ± 2%  +22.83%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10               544kB/s ± 3%    664kB/s ± 2%  +22.06%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10             12.8MB/s ± 1%   15.6MB/s ± 1%  +22.02%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10             12.3MB/s ± 1%   14.9MB/s ± 3%  +21.67%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10     296MB/s ± 0%    358MB/s ± 0%  +20.92%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10   300MB/s ± 0%    363MB/s ± 0%  +20.87%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10           37.5MB/s ± 2%   45.4MB/s ± 2%  +20.82%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10      288MB/s ± 1%    347MB/s ± 1%  +20.62%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10       274MB/s ± 1%    329MB/s ± 0%  +19.99%  (p=0.000 n=9+9)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10        170MB/s ± 1%    204MB/s ± 1%  +19.85%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10           43.6MB/s ± 1%   52.1MB/s ± 2%  +19.52%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10    292MB/s ± 0%    348MB/s ± 0%  +19.25%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10     276MB/s ± 0%    326MB/s ± 1%  +18.00%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10       62.1MB/s ± 1%   73.3MB/s ± 1%  +17.94%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10         147MB/s ± 1%    173MB/s ± 1%  +17.83%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10          41.7MB/s ± 1%   47.7MB/s ± 1%  +14.44%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10      135MB/s ± 0%    153MB/s ± 0%  +13.74%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10          106MB/s ± 1%    121MB/s ± 1%  +13.48%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10          46.5MB/s ± 0%   52.7MB/s ± 0%  +13.34%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10        23.7MB/s ± 2%   26.3MB/s ± 2%  +11.02%  (p=0.000 n=10+9)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10      91.3MB/s ± 0%  100.7MB/s ± 0%  +10.27%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10        21.2MB/s ± 1%   22.7MB/s ± 1%   +7.32%  (p=0.000 n=8+8)

name                                                                   old alloc/op   new alloc/op    delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10          354kB ± 0%      239kB ± 0%  -32.39%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10          348kB ± 0%      239kB ± 0%  -31.23%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10           251kB ± 0%      177kB ± 0%  -29.44%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10           246kB ± 0%      177kB ± 0%  -28.28%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         275kB ± 0%      198kB ± 0%  -28.06%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10          243kB ± 0%      177kB ± 0%  -27.15%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10         242kB ± 0%      177kB ± 0%  -27.09%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10        242kB ± 0%      177kB ± 0%  -27.06%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10        268kB ± 0%      198kB ± 0%  -26.05%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10       264kB ± 0%      198kB ± 0%  -25.04%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10            75.1kB ± 0%     56.9kB ± 0%  -24.25%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10            74.9kB ± 0%     56.9kB ± 0%  -24.12%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10           74.8kB ± 0%     56.9kB ± 0%  -23.99%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10             69.6kB ± 0%     53.1kB ± 0%  -23.66%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                95.2kB ± 0%     75.9kB ± 0%  -20.23%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10                102kB ± 0%       82kB ± 0%  -20.04%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10                103kB ± 0%       83kB ± 0%  -19.95%  (p=0.000 n=7+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10               100kB ± 0%       80kB ± 0%  -19.90%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10      1.14MB ± 0%     0.92MB ± 0%  -18.80%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10           271kB ± 0%      227kB ± 0%  -16.16%  (p=0.000 n=9+9)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10       1.10MB ± 0%     0.92MB ± 0%  -15.92%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10            280kB ± 1%      235kB ± 1%  -15.91%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10     1.09MB ± 1%     0.92MB ± 0%  -15.67%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10             291kB ± 0%      245kB ± 1%  -15.53%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10         1.11MB ± 0%     0.95MB ± 0%  -15.14%  (p=0.000 n=8+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10          1.22MB ± 0%     1.04MB ± 0%  -14.77%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10           1.65MB ± 0%     1.42MB ± 0%  -13.56%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              593kB ± 0%      513kB ± 0%  -13.36%  (p=0.000 n=9+8)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              520kB ± 0%      454kB ± 0%  -12.82%  (p=0.000 n=9+8)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10       1.04MB ± 0%     0.92MB ± 0%  -11.06%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10       2.48MB ± 0%     2.25MB ± 0%   -9.32%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10     967kB ± 0%      881kB ± 0%   -8.89%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10        7.86MB ± 0%     7.36MB ± 0%   -6.44%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10            14.2MB ± 1%     13.4MB ± 1%   -5.83%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10            12.3MB ± 0%     11.7MB ± 0%   -5.03%  (p=0.001 n=7+7)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10         27.2MB ± 1%     25.9MB ± 1%   -4.84%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10           465MB ± 0%      445MB ± 0%   -4.32%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10           403MB ± 0%      390MB ± 0%   -3.44%  (p=0.000 n=10+10)

name                                                                   old allocs/op  new allocs/op   delta
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1024-10           1.07k ± 0%      0.05k ± 0%  -95.70%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1048576-10            702k ± 0%        32k ± 0%  -95.46%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1048576-10            489k ± 0%        28k ± 0%  -94.33%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32768-10          4.40k ± 0%      0.30k ± 0%  -93.15%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1024-10           1.11k ± 0%      0.09k ± 0%  -92.02%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1024-10             561 ± 0%         46 ± 0%  -91.80%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32768-10          3.45k ± 0%      0.30k ± 0%  -91.28%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1024-10            1.19k ± 0%      0.15k ± 1%  -87.31%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=32768-10          4.87k ± 0%      0.70k ± 0%  -85.69%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32768-10             32.2k ± 0%       6.3k ± 0%  -80.40%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32768-10         1.45k ± 3%      0.29k ± 0%  -79.66%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1024-10             1.39k ± 0%      0.30k ± 1%  -78.64%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32768-10             26.2k ± 0%       6.8k ± 1%  -73.95%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=32768-10           6.64k ± 0%      1.95k ± 0%  -70.67%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1024-10              3.44k ± 1%      1.12k ± 1%  -67.48%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=1048576-10          62.4k ± 0%      20.4k ± 0%  -67.32%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=1024-10              2.95k ± 1%      1.05k ± 1%  -64.52%  (p=0.000 n=9+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32768-10            10.8k ± 0%       4.5k ± 0%  -58.21%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=32768-10          628 ± 3%        294 ± 0%  -53.21%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=128/numInputRows=1048576-10         36.1k ± 0%      20.2k ± 0%  -44.06%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1024-10           81.7 ± 3%       46.0 ± 0%  -43.67%  (p=0.000 n=9+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=1048576-10       14.4k ± 1%       8.2k ± 0%  -42.97%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=32-10              79.0 ± 0%       46.0 ± 0%  -41.77%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=1048576-10        13.7k ± 1%       8.2k ± 0%  -40.05%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=32-10                  191 ± 1%        120 ± 1%  -37.52%  (p=0.000 n=7+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1048576-10      12.9k ± 2%       8.2k ± 0%  -36.17%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=2/numInputRows=32-10                  176 ± 2%        115 ± 1%  -34.33%  (p=0.000 n=10+9)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1048576-10        12.3k ± 0%       8.2k ± 0%  -33.21%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1024/numInputRows=1048576-10        21.8k ± 0%      15.2k ± 0%  -30.13%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=32/numInputRows=32-10                 118 ± 0%         84 ± 0%  -28.81%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=2/numInputRows=32-10              63.0 ± 0%       46.0 ± 0%  -26.98%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=128/numInputRows=1024-10          57.2 ±14%       46.0 ± 0%  -19.58%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1048576-10     9.69k ± 1%      8.23k ± 0%  -15.07%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=32768-10         340 ± 2%        294 ± 0%  -13.43%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1/numInputRows=1-10               48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=32/numInputRows=32-10             48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/ordered/decimal/groupSize=1024/numInputRows=1024-10         48.0 ± 0%       46.0 ± 0%   -4.17%  (p=0.000 n=10+10)
Aggregator/MIN/hash/decimal/groupSize=1/numInputRows=1-10                  82.0 ± 0%       79.0 ± 0%   -3.66%  (p=0.000 n=10+10)
```

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants