Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvstreamer: optimize singleRangeBatch.Less #82382

Merged
merged 2 commits into from
Jun 11, 2022

Conversation

yuzefovich
Copy link
Member

bench: add benchmarks for lookup joins

This commit adds benchmarks for lookup joins, both when equality columns
are and are not key, both with and without maintaining ordering.

Release note: None

kvstreamer: optimize singleRangeBatch.Less

This commit optimizes singleRangeBatch.Less method which is used when
sorting the requests inside of these objects in the OutOfOrder mode
(which is needed to get the low-level Pebble speedups) by storing the
start keys explicitly instead of performing a couple of function calls
on each Less invocation.

name                             old time/op    new time/op    delta
IndexJoin/Cockroach-24             6.30ms ± 1%    5.78ms ± 1%  -8.31%  (p=0.000 n=10+10)
IndexJoin/MultinodeCockroach-24    8.01ms ± 1%    7.51ms ± 1%  -6.28%  (p=0.000 n=10+10)

name                             old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24             1.55MB ± 0%    1.57MB ± 0%  +0.98%  (p=0.000 n=9+10)
IndexJoin/MultinodeCockroach-24    2.28MB ± 2%    2.30MB ± 1%    ~     (p=0.400 n=10+9)

name                             old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24              8.16k ± 1%     8.13k ± 1%    ~     (p=0.160 n=10+10)
IndexJoin/MultinodeCockroach-24     12.7k ± 1%     12.6k ± 0%    ~     (p=0.128 n=10+9)
name                                                    old time/op    new time/op    delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             6.89ms ± 1%    6.43ms ± 1%  -6.65%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    8.03ms ± 1%    7.48ms ± 2%  -6.92%  (p=0.000 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         9.21ms ± 3%    8.82ms ± 5%  -4.23%  (p=0.007 n=10+10)
LookupJoinNoOrdering/MultinodeCockroach-24                11.9ms ± 3%    11.5ms ± 3%  -3.36%  (p=0.002 n=9+10)

name                                                    old alloc/op   new alloc/op   delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             1.81MB ± 1%    1.84MB ± 0%  +1.23%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    2.50MB ± 2%    2.54MB ± 1%  +1.76%  (p=0.004 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         1.89MB ± 0%    1.91MB ± 1%  +1.09%  (p=0.000 n=9+9)
LookupJoinNoOrdering/MultinodeCockroach-24                2.37MB ± 2%    2.42MB ± 4%  +1.85%  (p=0.010 n=10+9)

name                                                    old allocs/op  new allocs/op  delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24              10.8k ± 0%     10.8k ± 1%    ~     (p=0.615 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24     15.1k ± 1%     15.0k ± 0%    ~     (p=0.101 n=10+10)
LookupJoinNoOrdering/Cockroach-24                          13.3k ± 1%     13.3k ± 1%    ~     (p=0.549 n=10+9)
LookupJoinNoOrdering/MultinodeCockroach-24                 17.3k ± 1%     17.3k ± 1%    ~     (p=0.460 n=10+8)

Addresses: #82159

Release note: None

@yuzefovich yuzefovich requested review from rharding6373, michae2 and a team June 2, 2022 23:34
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance - memory tradeoff seems fine. I have some questions about the benchmarks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


-- commits line 33 at r2:
What are the results for the new Ordering (as opposed to NoOrdering) benchmarks?


pkg/bench/bench_test.go line 1209 at r1 (raw file):

	defer log.Scope(b).Close(b)
	ForEachDB(b, func(b *testing.B, db *sqlutils.SQLRunner) {
		db.Exec(b, `CREATE TABLE t1 (a INT)`)

Should a be a primary key to meet the "equality columns are key" specification?


pkg/bench/bench_test.go line 1210 at r1 (raw file):

	ForEachDB(b, func(b *testing.B, db *sqlutils.SQLRunner) {
		db.Exec(b, `CREATE TABLE t1 (a INT)`)
		db.Exec(b, `INSERT INTO t1 SELECT generate_series(1, 1000)`)

Even though a is not a primary key in t1, does this insert order make it inherently in order in the storage layer? That seems like it would affect benchmark performance. Maybe that's ok, but it'd be good to be aware of.

Code quote:

SELECT * FROM t1 INNER LOOKUP JOIN t2 ON t1.a = t2.a

pkg/bench/bench_test.go line 1233 at r1 (raw file):

		for i := 0; i < b.N; i++ {
			db.Exec(b, `SELECT count(b) FROM t1 INNER LOOKUP JOIN t2 ON t1.a = t2.a GROUP BY t1.a ORDER BY t1.a`)

Why does this query need the count and GROUP BY?

Code quote:

GROUP BY t1.a

@yuzefovich yuzefovich force-pushed the lookup-join-benchmarks branch from 2fbfafc to 229ddd1 Compare June 4, 2022 05:06
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)


-- commits line 33 at r2:

Previously, rharding6373 (Rachael Harding) wrote…

What are the results for the new Ordering (as opposed to NoOrdering) benchmarks?

Those didn't change because we don't sort singleRangeBatch.reqs in the InOrder mode, so Less is not used.


pkg/bench/bench_test.go line 1209 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Should a be a primary key to meet the "equality columns are key" specification?

It doesn't really matter as long as the target index is unique. Here is the comment from JoinReaderSpec:

	// If set, the lookup columns form a key in the target table and thus each
	// lookup has at most one result.
	LookupColumnsAreKey bool `protobuf:"varint,8,opt,name=lookup_columns_are_key,json=lookupColumnsAreKey" json:"lookup_columns_are_key"`

pkg/bench/bench_test.go line 1210 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Even though a is not a primary key in t1, does this insert order make it inherently in order in the storage layer? That seems like it would affect benchmark performance. Maybe that's ok, but it'd be good to be aware of.

Hm, I don't think it matters much since the main lookup query will perform the lookup join without any information about the ordering (since no ordering is provided by t1), and the query doesn't have the ORDER BY, so we will internally sort the requests (in the OutOfOrder mode of the streamer) to get pebble-level optimizations.


pkg/bench/bench_test.go line 1233 at r1 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Why does this query need the count and GROUP BY?

Good point, this is not needed, removed.

@yuzefovich yuzefovich force-pushed the lookup-join-benchmarks branch from 229ddd1 to 9bd3042 Compare June 7, 2022 00:27
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some minor improvements.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)


pkg/bench/bench_test.go line 1210 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, I don't think it matters much since the main lookup query will perform the lookup join without any information about the ordering (since no ordering is provided by t1), and the query doesn't have the ORDER BY, so we will internally sort the requests (in the OutOfOrder mode of the streamer) to get pebble-level optimizations.

Your point matters for the Ordering benchmarks where the streamer (at least currently) doesn't sort requests according to their keys, so with the current version of this PR we'd be benchmarking the best case (when the ordering provided by the streamer's user matches the ordering of keys). I think it's ok though, at least for now.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I too have some questions about the benchmarks, but no really substantive complaints.

Reviewed 3 of 3 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rharding6373 and @yuzefovich)


pkg/kv/kvclient/kvstreamer/streamer.go line 549 at r4 (raw file):

			} else {
				r.reqsKeys = make([]roachpb.Key, len(r.reqs))
			}

unimportant code golfing side comment: I've been thinking about this common little idiom. Wouldn't append handle this for us? In other words, wouldn't it be a little simpler to write something like:

r.reqsKeys = reqsKeysScratch[:0]
for i := range r.reqs {
	r.reqsKeys = append(r.reqsKeys, r.reqs[i].GetInner().Header().Key)
}
sort.Sort(&r)
reqsKeysScratch = r.reqsKeys
r.reqsKeys = nil

pkg/bench/bench_test.go line 1209 at r3 (raw file):

	defer log.Scope(b).Close(b)
	ForEachDB(b, func(b *testing.B, db *sqlutils.SQLRunner) {
		db.Exec(b, `CREATE TABLE t1 (a INT)`)

Do we need to control the distribution of these tables during the benchmark? What if the KV layer initiates a split of one of them during the benchmark? Should we be explicit about whether this is testing local or distributed execution?


pkg/bench/bench_test.go line 1245 at r3 (raw file):

		db.Exec(b, `CREATE TABLE t1 (a INT)`)
		db.Exec(b, `INSERT INTO t1 SELECT generate_series(1, 1000)`)
		db.Exec(b, `CREATE TABLE t2 (a INT, INDEX (a))`)

Is the only difference between the EqColsAreKey benchmarks and these benchmarks a lookup join into a PK vs a lookup join into a secondary index? Are those different in the execution engine? I would expect them to be the same. Do we need different benchmarks?


pkg/bench/bench_test.go line 1250 at r3 (raw file):

		for i := 0; i < b.N; i++ {
			db.Exec(b, `SELECT t2.a FROM t1 INNER LOOKUP JOIN t2 ON t1.a = t2.a`)

nit: This uses SELECT t2.a instead of SELECT * like the other queries. Seems like it should also use SELECT * to return the same number of columns, for an apples-to-apples comparison.

This commit adds benchmarks for lookup joins, both when equality columns
are and are not key, both with and without maintaining ordering.

Release note: None
This commit optimizes `singleRangeBatch.Less` method which is used when
sorting the requests inside of these objects in the OutOfOrder mode
(which is needed to get the low-level Pebble speedups) by storing the
start keys explicitly instead of performing a couple of function calls
on each `Less` invocation.

```
name                             old time/op    new time/op    delta
IndexJoin/Cockroach-24             6.30ms ± 1%    5.78ms ± 1%  -8.31%  (p=0.000 n=10+10)
IndexJoin/MultinodeCockroach-24    8.01ms ± 1%    7.51ms ± 1%  -6.28%  (p=0.000 n=10+10)

name                             old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24             1.55MB ± 0%    1.57MB ± 0%  +0.98%  (p=0.000 n=9+10)
IndexJoin/MultinodeCockroach-24    2.28MB ± 2%    2.30MB ± 1%    ~     (p=0.400 n=10+9)

name                             old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24              8.16k ± 1%     8.13k ± 1%    ~     (p=0.160 n=10+10)
IndexJoin/MultinodeCockroach-24     12.7k ± 1%     12.6k ± 0%    ~     (p=0.128 n=10+9)
```
```
name                                                    old time/op    new time/op    delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             6.89ms ± 1%    6.43ms ± 1%  -6.65%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    8.03ms ± 1%    7.48ms ± 2%  -6.92%  (p=0.000 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         9.21ms ± 3%    8.82ms ± 5%  -4.23%  (p=0.007 n=10+10)
LookupJoinNoOrdering/MultinodeCockroach-24                11.9ms ± 3%    11.5ms ± 3%  -3.36%  (p=0.002 n=9+10)

name                                                    old alloc/op   new alloc/op   delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24             1.81MB ± 1%    1.84MB ± 0%  +1.23%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24    2.50MB ± 2%    2.54MB ± 1%  +1.76%  (p=0.004 n=10+10)
LookupJoinNoOrdering/Cockroach-24                         1.89MB ± 0%    1.91MB ± 1%  +1.09%  (p=0.000 n=9+9)
LookupJoinNoOrdering/MultinodeCockroach-24                2.37MB ± 2%    2.42MB ± 4%  +1.85%  (p=0.010 n=10+9)

name                                                    old allocs/op  new allocs/op  delta
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24              10.8k ± 0%     10.8k ± 1%    ~     (p=0.615 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24     15.1k ± 1%     15.0k ± 0%    ~     (p=0.101 n=10+10)
LookupJoinNoOrdering/Cockroach-24                          13.3k ± 1%     13.3k ± 1%    ~     (p=0.549 n=10+9)
LookupJoinNoOrdering/MultinodeCockroach-24                 17.3k ± 1%     17.3k ± 1%    ~     (p=0.460 n=10+8)
```

Release note: None
@yuzefovich yuzefovich force-pushed the lookup-join-benchmarks branch from 9bd3042 to 76175be Compare June 10, 2022 21:23
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

I think Rachael is onboard with the change too, so I'll go ahead and merge.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @rharding6373)


pkg/kv/kvclient/kvstreamer/streamer.go line 549 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

unimportant code golfing side comment: I've been thinking about this common little idiom. Wouldn't append handle this for us? In other words, wouldn't it be a little simpler to write something like:

r.reqsKeys = reqsKeysScratch[:0]
for i := range r.reqs {
	r.reqsKeys = append(r.reqsKeys, r.reqs[i].GetInner().Header().Key)
}
sort.Sort(&r)
reqsKeysScratch = r.reqsKeys
r.reqsKeys = nil

Good point, I like it, done.


pkg/bench/bench_test.go line 1209 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Do we need to control the distribution of these tables during the benchmark? What if the KV layer initiates a split of one of them during the benchmark? Should we be explicit about whether this is testing local or distributed execution?

These queries follow the pattern of this file where we insert small amount of data, so we know it'll be a single range setup. I think that's ok since the goal of the benchmarks is the "speed of light" check. It's highly unlikely that the KV will split any of these tables into separate ranges since they are just so small, and the benchmarks don't run long enough so that QPS based splitting would kick in.


pkg/bench/bench_test.go line 1245 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Is the only difference between the EqColsAreKey benchmarks and these benchmarks a lookup join into a PK vs a lookup join into a secondary index? Are those different in the execution engine? I would expect them to be the same. Do we need different benchmarks?

The important difference is a lookup join into a unique index (which PK is) vs a lookup join into a non-unique index. The uniqueness property guarantees that a single lookup will produce at most one row, and in the execution engine we use different strategies because of this property. In particular, in the non-streamer code path we parallelize KV batches IFF we have this "at most single row lookup" property. This is the case with the index joins as well as lookup joins when equality columns form a key. In the streamer code-path, the behavior is also different because we propagate this property as Hints.SingleRowLookup hint, and the streamer can simplify some of the bookkeeping if that is true.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2, @rharding6373, and @yuzefovich)


pkg/bench/bench_test.go line 1209 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

These queries follow the pattern of this file where we insert small amount of data, so we know it'll be a single range setup. I think that's ok since the goal of the benchmarks is the "speed of light" check. It's highly unlikely that the KV will split any of these tables into separate ranges since they are just so small, and the benchmarks don't run long enough so that QPS based splitting would kick in.

Makes sense!


pkg/bench/bench_test.go line 1245 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

The important difference is a lookup join into a unique index (which PK is) vs a lookup join into a non-unique index. The uniqueness property guarantees that a single lookup will produce at most one row, and in the execution engine we use different strategies because of this property. In particular, in the non-streamer code path we parallelize KV batches IFF we have this "at most single row lookup" property. This is the case with the index joins as well as lookup joins when equality columns form a key. In the streamer code-path, the behavior is also different because we propagate this property as Hints.SingleRowLookup hint, and the streamer can simplify some of the bookkeeping if that is true.

Ahhh, right, uniqueness! Thank you!

@craig
Copy link
Contributor

craig bot commented Jun 11, 2022

Build succeeded:

@craig craig bot merged commit f2cc0b3 into cockroachdb:master Jun 11, 2022
@yuzefovich yuzefovich deleted the lookup-join-benchmarks branch June 11, 2022 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants