sql/tests: create sysbench microbenchmark suite #133132

Merged

Conversation

nvanbenschoten
Member

This commit adds a new microbenchmark suite which emulates sysbench, with subtests for oltp_read_only, oltp_write_only, and oltp_read_write. The structure of the benchmark suite makes it easy to add more subtests in the future (e.g. for oltp_multi_insert).

The suite is designed to run the same workload against different levels of the CockroachDB stack, with the initial implementation targeting SQL (and below) through the sysbenchSQL driver and KV (and below) through the sysbenchKV driver. An example of an additional driver that could be added in the future is sysbenchPebble. It would also make sense to test drivers under different configurations. For example, we should add a variant of the sysbenchKV driver which disables the local RPC fast-path.

The goal of this suite is to provide developers with a way to quickly measure the performance of CockroachDB against sysbench in a Go microbenchmark environment. This will help with identifying opportunities for performance improvements and with providing a rapid feedback loop for evaluating the effectiveness of performance changes.

The initial benchmark performance looks like:

name                        time/op
Sysbench/SQL/OltpReadOnly   2.88ms ±16%
Sysbench/SQL/OltpWriteOnly  2.32ms ±16%
Sysbench/SQL/OltpReadWrite  5.47ms ± 6%
Sysbench/KV/OltpReadOnly     445µs ± 5%
Sysbench/KV/OltpWriteOnly    594µs ± 5%
Sysbench/KV/OltpReadWrite   1.07ms ± 4%

name                        alloc/op
Sysbench/SQL/OltpReadOnly    965kB ± 1%
Sysbench/SQL/OltpWriteOnly   487kB ± 4%
Sysbench/SQL/OltpReadWrite  1.34MB ± 0%
Sysbench/KV/OltpReadOnly     264kB ± 1%
Sysbench/KV/OltpWriteOnly    184kB ± 4%
Sysbench/KV/OltpReadWrite    440kB ± 0%

name                        allocs/op
Sysbench/SQL/OltpReadOnly    6.26k ± 1%
Sysbench/SQL/OltpWriteOnly   3.40k ± 0%
Sysbench/SQL/OltpReadWrite   9.65k ± 0%
Sysbench/KV/OltpReadOnly       650 ± 0%
Sysbench/KV/OltpWriteOnly    1.08k ± 1%
Sysbench/KV/OltpReadWrite    1.73k ± 0%

Epic: None
Release Note: None

@nvanbenschoten nvanbenschoten added the o-perf-efficiency (Related to performance efficiency) label Oct 21, 2024
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner October 21, 2024 22:54
@cockroach-teamcity
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/sysbenchBench branch from a3b9176 to 50ab44a Compare October 22, 2024 16:01
Member

@tbg tbg left a comment

Very nice, thanks for putting this together so quickly! Flushing out some comments, I still need to do the line-by-line review, though I don't think I'll find much.

For posterity: I collected and attached profiles via

./dev bench --ignore-cache --stream-output --bench-mem ./pkg/sql/tests/ --filter BenchmarkSysbench/SQL/OltpReadWrite --test-args '-test.cpuprofile=cpu.pb.gz -test.memprofile=mem.pb.gz -test.benchtime=10000x'

cpu.pb.gz
mem.pb.gz

It'll be interesting to compare them with the "real" sysbench on a three-node cluster.

Resolved review threads: pkg/roachpb/data.go (1), pkg/sql/tests/sysbench_test.go (4)
Member

@RaduBerinde RaduBerinde left a comment

Very cool!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @nvanbenschoten, and @tbg)


pkg/sql/tests/sysbench_test.go line 126 at r1 (raw file):

)

// sysbenchSQL is SQL-based implementation of sysbenchDriver.

[nit] WDYT about moving this to a sysbench_sql_test.go (and the KV one to sysbench_kv_test)? It will make it easier to navigate the code, especially if we add more drivers.


pkg/sql/tests/sysbench_test.go line 126 at r1 (raw file):

)

// sysbenchSQL is SQL-based implementation of sysbenchDriver.

[nit] A bit more to this comment would help someone who is just looking at this code for the first time. "It runs SQL statements against a single node cluster"


pkg/sql/tests/sysbench_test.go line 261 at r1 (raw file):

}

// sysbenchKV is KV-based implementation of sysbenchDriver.

[nit] "It bypasses the SQL layer and runs the workload directly against the KV layer, on a single node cluster"


pkg/sql/tests/sysbench_test.go line 570 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Should we have a flavor that disables the local server optimization? Otherwise we're eliding a bunch of networking/encoding/decoding overhead here.

Should we have variants with 3 nodes? (can be a TODO for now)


pkg/sql/tests/sysbench_test.go line 577 at r1 (raw file):

		// TODO(nvanbenschoten): add a pebble-level implementation.
	} {
		sysTyp := runtime.FuncForPC(reflect.ValueOf(sysFn).Pointer()).Name()

[nit] This is unnecessarily complicated to read and doesn't allow flexibility to make the names more friendly. We can just do

drivers := []struct{
  name string
  constructorFn func..
 }{
  { name: "SQL", constructorFn: newSysbenchSQL },
  { name: "KV", constructorFn: newSysbenchKV },
}

and similar for the ops.
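
A minimal sketch of how the table-driven layout suggested above could look once both the drivers and the ops are tables. The `driver` interface, its `readOnly`/`writeOnly` methods, and the in-memory `memDriver` are hypothetical stand-ins so the snippet compiles on its own; the real `sysbenchDriver` method set and the signatures of `newSysbenchSQL` and `newSysbenchKV` are not shown in this thread.

```go
package main

import (
	"fmt"
	"testing"
)

// driver is a hypothetical stand-in for sysbenchDriver; the two methods
// below are illustrative only and are not taken from the PR.
type driver interface {
	readOnly() error
	writeOnly() error
}

// memDriver is a toy in-memory driver that keeps the sketch self-contained.
type memDriver struct{ rows map[int]int }

func newMemDriver() driver { return &memDriver{rows: map[int]int{1: 1}} }

func (d *memDriver) readOnly() error  { _ = d.rows[1]; return nil }
func (d *memDriver) writeOnly() error { d.rows[1]++; return nil }

type driverCase struct {
	name          string
	constructorFn func() driver
}

type opCase struct {
	name string
	opFn func(driver) error
}

var drivers = []driverCase{
	{name: "SQL", constructorFn: newMemDriver}, // stand-in for newSysbenchSQL
	{name: "KV", constructorFn: newMemDriver},  // stand-in for newSysbenchKV
}

var ops = []opCase{
	{name: "oltp_read_only", opFn: driver.readOnly},
	{name: "oltp_write_only", opFn: driver.writeOnly},
}

// runSuite registers one sub-benchmark per (driver, op) pair, using the
// friendly names from the tables instead of reflecting on function names.
func runSuite(b *testing.B) {
	for _, d := range drivers {
		b.Run(d.name, func(b *testing.B) {
			for _, op := range ops {
				b.Run(op.name, func(b *testing.B) {
					sys := d.constructorFn()
					b.ResetTimer() // exclude driver construction from the timing
					for i := 0; i < b.N; i++ {
						if err := op.opFn(sys); err != nil {
							b.Fatal(err)
						}
					}
				})
			}
		})
	}
}

// subBenchmarkNames lists the names runSuite would produce when called from
// a top-level benchmark named "Sysbench".
func subBenchmarkNames() []string {
	var names []string
	for _, d := range drivers {
		for _, op := range ops {
			names = append(names, "Sysbench/"+d.name+"/"+op.name)
		}
	}
	return names
}

func main() {
	for _, n := range subBenchmarkNames() {
		fmt.Println(n)
	}
}
```

In a real `_test.go` file, `runSuite` would be called from `func BenchmarkSysbench(b *testing.B)`, and adding a driver or an op becomes a one-line change to the corresponding table.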

// NOTE: disabling background work makes the benchmark more predictable, but
// also moderately less realistic.
disableBackgroundWork(st)
s := serverutils.StartServerOnly(b, base.TestServerArgs{Settings: st})
Member

#133307 (comment) suggests this might not be as close to the real thing as we'd like. Leaving this comment as a discussion placeholder.

This gets us a stack trace in the panic value.

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/sysbenchBench branch from 50ab44a to ffa2ecb Compare November 3, 2024 22:14
Member Author

@nvanbenschoten nvanbenschoten left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RaduBerinde, and @tbg)


pkg/sql/tests/sysbench_test.go line 126 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] A bit more to this comment would help someone who is just looking at this code for the first time. "It runs SQL statements against a single node cluster"

Done.


pkg/sql/tests/sysbench_test.go line 126 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] WDYT about moving this to a sysbench_sql_test.go (and the KV one to sysbench_kv_test)? It will make it easier to navigate the code, especially if we add more drivers.

So far I've found it nice to have this all in the same file. If that changes, we can extract out some driver implementation files.


pkg/sql/tests/sysbench_test.go line 261 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] "It bypasses the SQL layer and runs the workload directly against the KV layer, on a single node cluster"

Done.


pkg/sql/tests/sysbench_test.go line 577 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] This is unnecessarily complicated to read and doesn't allow flexibility to make the names more friendly. We can just do

drivers := []struct{
  name string
  constructorFn func..
 }{
  { name: "SQL", constructorFn: newSysbenchSQL },
  { name: "KV", constructorFn: newSysbenchKV },
}

and similar for the ops.

Done.

This commit adds a new microbenchmark suite which emulates `sysbench`, with
subtests for `oltp_read_only`, `oltp_write_only`, and `oltp_read_write`. The
structure of the benchmark suite makes it easy to add more subtests in the
future (e.g. for `oltp_multi_insert`).

The suite is designed to run the same workload against different levels of the
CockroachDB stack, with the initial implementation targeting SQL (and below)
through the `sysbenchSQL` driver and KV (and below) through the `sysbenchKV`
driver. An example of an additional driver that could be added in the future is
`sysbenchPebble`. It would also make sense to test drivers under different
configurations. For example, we should add a variant of the `sysbenchKV` driver
which disables the local RPC fast-path.

The goal of this suite is to provide developers with a way to quickly measure
the performance of CockroachDB against sysbench in a Go microbenchmark
environment. This will help with identifying opportunities for performance
improvements and with providing a rapid feedback loop for evaluating the
effectiveness of performance changes.

The initial benchmark performance looks like:
```
name                        time/op
Sysbench/SQL/oltp_read_only   2.88ms ±16%
Sysbench/SQL/oltp_write_only  2.32ms ±16%
Sysbench/SQL/oltp_read_write  5.47ms ± 6%
Sysbench/KV/oltp_read_only     445µs ± 5%
Sysbench/KV/oltp_write_only    594µs ± 5%
Sysbench/KV/oltp_read_write   1.07ms ± 4%

name                        alloc/op
Sysbench/SQL/oltp_read_only    965kB ± 1%
Sysbench/SQL/oltp_write_only   487kB ± 4%
Sysbench/SQL/oltp_read_write  1.34MB ± 0%
Sysbench/KV/oltp_read_only     264kB ± 1%
Sysbench/KV/oltp_write_only    184kB ± 4%
Sysbench/KV/oltp_read_write    440kB ± 0%

name                        allocs/op
Sysbench/SQL/oltp_read_only    6.26k ± 1%
Sysbench/SQL/oltp_write_only   3.40k ± 0%
Sysbench/SQL/oltp_read_write   9.65k ± 0%
Sysbench/KV/oltp_read_only       650 ± 0%
Sysbench/KV/oltp_write_only    1.08k ± 1%
Sysbench/KV/oltp_read_write    1.73k ± 0%
```

Epic: None
Release Note: None
This commit adds a new "begin+commit" variant to the sysbench microbenchmark
suite. This test opens and closes a transaction, without actually performing any
reads or writes in that transaction. It is meant to measure the overhead of
transaction orchestration.

The initial benchmark performance looks like:
```
name                          time/op
Sysbench/SQL/oltp_begin_commit   125µs ± 7%
Sysbench/KV/oltp_begin_commit   1.30µs ± 3%

name                          alloc/op
Sysbench/SQL/oltp_begin_commit  19.2kB ± 0%
Sysbench/KV/oltp_begin_commit   2.50kB ± 0%

name                          allocs/op
Sysbench/SQL/oltp_begin_commit     143 ± 0%
Sysbench/KV/oltp_begin_commit     6.00 ± 0%
```

Epic: None
Release Note: None
This commit adds a new "point_select" variant to the sysbench microbenchmark
suite. Like the real sysbench version of this test, this new test performs a
single point select, outside of an explicit transaction.

The initial benchmark performance looks like:
```
name                            time/op
Sysbench/SQL/oltp_point_select   159µs ± 4%
Sysbench/KV/oltp_point_select   18.5µs ± 2%

name                            alloc/op
Sysbench/SQL/oltp_point_select  34.1kB ± 0%
Sysbench/KV/oltp_point_select   5.15kB ± 1%

name                            allocs/op
Sysbench/SQL/oltp_point_select     309 ± 0%
Sysbench/KV/oltp_point_select     37.0 ± 0%
```

Epic: None
Release Note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/sysbenchBench branch from 8556bc7 to 05ffe49 Compare November 4, 2024 19:18
Collaborator

@mgartner mgartner left a comment

Nice! :lgtm:

Reviewed 1 of 4 files at r1, 3 of 3 files at r3, 3 of 3 files at r4, 3 of 3 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 1 of 1 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @tbg)

@nvanbenschoten
Member Author

TFTRs!

bors r+

Member

@RaduBerinde RaduBerinde left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/sql/tests/sysbench_test.go line 669 at r9 (raw file):

func disableProfiling() {
	runtime.MemProfileRate = 0

The MemProfileRate doc says this:

The tools that process the memory profiles assume that the profile rate is constant across the lifetime of the program and equal to the current value. Programs that change the memory profiling rate should do so just once, as early as possible in the execution of the program (for example, at the beginning of main).


pkg/sql/tests/sysbench_test.go line 676 at r9 (raw file):

func enableProfiling() {
	runtime.GC()
	runtime.MemProfileRate = 1

MemProfileRate is 512KB by default. A value of 1 means we sample every allocation. Would the CPU profile still be useful when running with this kind of overhead?


pkg/sql/tests/sysbench_test.go line 677 at r9 (raw file):

	runtime.GC()
	runtime.MemProfileRate = 1
	runtime.SetMutexProfileFraction(1)

Where are we getting these values? Would it be better to just record the old values from disableProfiling and restore those?

Maybe an easier way would be to do the actual workload inside a child b.Run benchmark.

@nvanbenschoten
Member Author

bors r-

@craig
Contributor

craig bot commented Nov 4, 2024

Canceled.

@nvanbenschoten
Member Author

@RaduBerinde had some good points about the attempt to isolate memory, mutex, and block profiling to the workload portion of the benchmark. Since that needs some work, I've shaved off that commit for now to get the rest of this PR merged. I'll put up a separate PR once that part is ready.

bors r+

@craig craig bot merged commit aae0a58 into cockroachdb:master Nov 5, 2024
20 checks passed