-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: unexpected significant overhead under sysbench when tracing is off (0.3% allocs 1.6% cpu) #133307
Comments
|
in #133132 (review) I'm seeing wayyy under 0.5%. I'm unsure why that is - @nvanbenschoten any ideas, is the microbenchmark not tickling this in the same way? |
One thought is that |
Ack that Observability team will pick this up in Nov milestone. Will get back to y'all if it turns out to be a heavy lift and we need to re-evaluate. |
This simulates cockroachdb#133307 having been fixed. Completely disabling this is too optimistic - after all 1% should be traced - but this is fine for now. Epic: none Release note: None
This simulates cockroachdb#133307 having been fixed. Completely disabling this is too optimistic - after all 1% should be traced - but this is fine for now. Epic: none Release note: None
Just wanted to point out that there's also something potentially unexpected here:
|
more paper trail: setting the various cluster settings
doesn't seem to provide a workaround. I'm still seeing it in the profiles. |
…ldrenLocked May not be as important after cockroachdb#133307. This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid the use of a temporary buffer for structured events when a child is finished while a span is recording structured events. Instead of using a temporary buffer, we now collect the child's events directly into the parents, and then trim the slice to enforce the byte size limit. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.2ms ± 2% 35.8ms ± 1% -1.17% (p=0.014 n=9+9) Sysbench/SQL/oltp_write_only 19.0ms ± 1% 19.4ms ± 4% +2.18% (p=0.006 n=8+9) Sysbench/SQL/oltp_read_write 58.4ms ± 3% 58.3ms ± 4% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_point_select 1.61ms ± 2% 1.58ms ± 2% -1.68% (p=0.013 n=9+10) Sysbench/SQL/oltp_begin_commit 587µs ± 2% 587µs ± 2% ~ (p=0.842 n=10+9) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 1% 1.04MB ± 1% -0.68% (p=0.001 n=10+9) Sysbench/SQL/oltp_write_only 402kB ± 1% 403kB ± 1% ~ (p=0.661 n=10+9) Sysbench/SQL/oltp_read_write 1.50MB ± 1% 1.49MB ± 1% ~ (p=0.579 n=10+10) Sysbench/SQL/oltp_point_select 35.0kB ± 0% 34.9kB ± 0% -0.32% (p=0.002 n=8+9) Sysbench/SQL/oltp_begin_commit 18.8kB ± 0% 18.8kB ± 0% ~ (p=0.845 n=10+8) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.96k ± 1% 6.86k ± 0% -1.51% (p=0.000 n=10+7) Sysbench/SQL/oltp_write_only 3.61k ± 0% 3.60k ± 0% -0.30% (p=0.009 n=10+9) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.82% (p=0.014 n=10+10) Sysbench/SQL/oltp_point_select 317 ± 0% 314 ± 0% -1.04% (p=0.000 n=7+10) Sysbench/SQL/oltp_begin_commit 139 ± 0% 140 ± 0% ~ (p=0.828 n=10+9) ``` Epic: None Release note: None
…ldrenLocked May not be as important after cockroachdb#133307. This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid the use of a temporary buffer for structured events when a child is finished while a span is recording structured events. Instead of using a temporary buffer, we now collect the child's events directly into the parents, and then trim the slice to enforce the byte size limit. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.2ms ± 2% 35.8ms ± 1% -1.17% (p=0.014 n=9+9) Sysbench/SQL/oltp_write_only 19.0ms ± 1% 19.4ms ± 4% +2.18% (p=0.006 n=8+9) Sysbench/SQL/oltp_read_write 58.4ms ± 3% 58.3ms ± 4% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_point_select 1.61ms ± 2% 1.58ms ± 2% -1.68% (p=0.013 n=9+10) Sysbench/SQL/oltp_begin_commit 587µs ± 2% 587µs ± 2% ~ (p=0.842 n=10+9) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 1% 1.04MB ± 1% -0.68% (p=0.001 n=10+9) Sysbench/SQL/oltp_write_only 402kB ± 1% 403kB ± 1% ~ (p=0.661 n=10+9) Sysbench/SQL/oltp_read_write 1.50MB ± 1% 1.49MB ± 1% ~ (p=0.579 n=10+10) Sysbench/SQL/oltp_point_select 35.0kB ± 0% 34.9kB ± 0% -0.32% (p=0.002 n=8+9) Sysbench/SQL/oltp_begin_commit 18.8kB ± 0% 18.8kB ± 0% ~ (p=0.845 n=10+8) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.96k ± 1% 6.86k ± 0% -1.51% (p=0.000 n=10+7) Sysbench/SQL/oltp_write_only 3.61k ± 0% 3.60k ± 0% -0.30% (p=0.009 n=10+9) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.82% (p=0.014 n=10+10) Sysbench/SQL/oltp_point_select 317 ± 0% 314 ± 0% -1.04% (p=0.000 n=7+10) Sysbench/SQL/oltp_begin_commit 139 ± 0% 140 ± 0% ~ (p=0.828 n=10+9) ``` Epic: None Release note: None
…ldrenLocked May not be as important after cockroachdb#133307. This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid the use of a temporary buffer for structured events when a child is finished while a span is recording structured events. Instead of using a temporary buffer, we now collect the child's events directly into the parents, and then trim the slice to enforce the byte size limit. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.2ms ± 2% 35.8ms ± 1% -1.17% (p=0.014 n=9+9) Sysbench/SQL/oltp_write_only 19.0ms ± 1% 19.4ms ± 4% +2.18% (p=0.006 n=8+9) Sysbench/SQL/oltp_read_write 58.4ms ± 3% 58.3ms ± 4% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_point_select 1.61ms ± 2% 1.58ms ± 2% -1.68% (p=0.013 n=9+10) Sysbench/SQL/oltp_begin_commit 587µs ± 2% 587µs ± 2% ~ (p=0.842 n=10+9) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 1% 1.04MB ± 1% -0.68% (p=0.001 n=10+9) Sysbench/SQL/oltp_write_only 402kB ± 1% 403kB ± 1% ~ (p=0.661 n=10+9) Sysbench/SQL/oltp_read_write 1.50MB ± 1% 1.49MB ± 1% ~ (p=0.579 n=10+10) Sysbench/SQL/oltp_point_select 35.0kB ± 0% 34.9kB ± 0% -0.32% (p=0.002 n=8+9) Sysbench/SQL/oltp_begin_commit 18.8kB ± 0% 18.8kB ± 0% ~ (p=0.845 n=10+8) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.96k ± 1% 6.86k ± 0% -1.51% (p=0.000 n=10+7) Sysbench/SQL/oltp_write_only 3.61k ± 0% 3.60k ± 0% -0.30% (p=0.009 n=10+9) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.82% (p=0.014 n=10+10) Sysbench/SQL/oltp_point_select 317 ± 0% 314 ± 0% -1.04% (p=0.000 n=7+10) Sysbench/SQL/oltp_begin_commit 139 ± 0% 140 ± 0% ~ (p=0.828 n=10+9) ``` Epic: None Release note: None
132877: kv: deflake TestLeaseQueueLeasePreferencePurgatoryError r=nvanbenschoten a=nvanbenschoten Fixes #132652. This commit fixes a flake in TestLeaseQueueLeasePreferencePurgatoryError where a delayed span config split could trip up the test and cause it to flake. We now wait for the span config split to be processed before waiting on the lease queue's purgatory, which should eliminate the flake. Release note: None 133568: changefeedccl: add PTS to system.users r=andyyang890 a=asg0451 Add `system.users` to the list of system tables that changefeeds protect with PTS. This table is required for CDC Queries. Part of: #128806 Release note (enterprise change): Add `system.users` to the list of system tables that changefeeds protect with PTS. This table is required for CDC Queries. 134207: tracing: avoid temp buffer for structured events in recordFinishedChildrenLocked r=nvanbenschoten a=nvanbenschoten May not be as important after #133307. This commit reworks the logic in `crdbSpan.recordFinishedChildrenLocked` to avoid the use of a temporary buffer for structured events when a child is finished while a span is recording structured events. Instead of using a temporary buffer, we now collect the child's events directly into the parents, and then trim the slice to enforce the byte size limit. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.2ms ± 2% 35.8ms ± 1% -1.17% (p=0.014 n=9+9) Sysbench/SQL/oltp_write_only 19.0ms ± 1% 19.4ms ± 4% +2.18% (p=0.006 n=8+9) Sysbench/SQL/oltp_read_write 58.4ms ± 3% 58.3ms ± 4% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_point_select 1.61ms ± 2% 1.58ms ± 2% -1.68% (p=0.013 n=9+10) Sysbench/SQL/oltp_begin_commit 587µs ± 2% 587µs ± 2% ~ (p=0.842 n=10+9) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 1% 1.04MB ± 1% -0.68% (p=0.001 n=10+9) Sysbench/SQL/oltp_write_only 402kB ± 1% 403kB ± 1% ~ (p=0.661 n=10+9) Sysbench/SQL/oltp_read_write 1.50MB ± 1% 1.49MB ± 1% ~ (p=0.579 n=10+10) Sysbench/SQL/oltp_point_select 35.0kB ± 0% 34.9kB ± 0% -0.32% (p=0.002 n=8+9) Sysbench/SQL/oltp_begin_commit 18.8kB ± 0% 18.8kB ± 0% ~ (p=0.845 n=10+8) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.96k ± 1% 6.86k ± 0% -1.51% (p=0.000 n=10+7) Sysbench/SQL/oltp_write_only 3.61k ± 0% 3.60k ± 0% -0.30% (p=0.009 n=10+9) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.82% (p=0.014 n=10+10) Sysbench/SQL/oltp_point_select 317 ± 0% 314 ± 0% -1.04% (p=0.000 n=7+10) Sysbench/SQL/oltp_begin_commit 139 ± 0% 140 ± 0% ~ (p=0.828 n=10+9) ``` Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Miles Frankel <[email protected]>
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1]. [^1]: cockroachdb#133307
135682: tracing: default span registry to false r=tbg a=tbg This was extracted from #135146. The tracer active span registry was enabled by default. This forced real spans to be created across the board, and explains part of why tracing is so expensive even when it's not being used[^1]. This commit disables the span registry by default. [^1]: #133307 Epic: CRDB-43584 Release note (ops change): the default value for the cluster setting `trace.span_registry.enabled` has been changed from `true` to `false`. 135827: sql/tablemetadatacache: speed up tests r=dt a=dt This test was taking 35s or more before while sleeping for adoption intervals but can run in ~5s with fast adoptions. Release note: none. Epic: none. Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: David Taylor <[email protected]>
Update on the overhead. The below is on 6a7b1e2 on oltp_read_only. This still seems pretty bad. |
Given that the overhead is within One reason that we pay such a large cost is that in the case of a "no tracing" context, we need to traverse the entire tree to be able to tell if a context contains a span or not, there's no way to "exit early". Given that we already use a custom patched runtime, could we modify the
Maybe something like this: (of course the behavior would have to be extended to all the other context types. I'm unsure of the complexity involved there.)
@tbg do you think this is a direction worth pursuing? It seems like the "giant context chain" problem is widespread. I haven't spent time thinking through the other |
This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Stats for the disabled tracing benchmark: Before: ``` BenchmarkStartSpan/start_span:_1 100000 29.30 ns/op 0 B/op 0 allocs/op ``` After: ``` BenchmarkStartSpan/start_span:_1 100000 18.89 ns/op 0 B/op 0 allocs/op ``` Part of: cockroachdb#133307 Release note: None
A few things I tried today that I'm not confident in but will revisit: Modifying code from this:
to this:
This felt like making a bigger mess by duplicating logic and adding more fields to a struct thats gets copied often. I think I'll try to make these values precomputed instead. The |
This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Stats for the disabled tracing benchmark: Before: ``` BenchmarkStartSpan/start_span:_1 100000 29.30 ns/op 0 B/op 0 allocs/op ``` After: ``` BenchmarkStartSpan/start_span:_1 100000 18.89 ns/op 0 B/op 0 allocs/op ``` Part of: cockroachdb#133307 Release note: None
This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Stats for the disabled tracing benchmark: Before: ``` BenchmarkStartSpan/start_span:_1 100000 29.30 ns/op 0 B/op 0 allocs/op ``` After: ``` BenchmarkStartSpan/start_span:_1 100000 18.89 ns/op 0 B/op 0 allocs/op ``` Part of: cockroachdb#133307 Release note: None
This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Below are benchmark results which show the most impact on `opts=none` cpu. ``` ❯ benchdiff ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10 test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \ name old time/op new time/op delta Tracer_StartSpanCtx/opts=none/parallel=true-10 164ns ±10% 16ns ±64% -90.12% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=none/parallel=false-10 56.1ns ± 5% 47.7ns ± 5% -14.96% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real/parallel=true-10 584ns ± 3% 546ns ±20% -6.51% (p=0.002 n=8+9) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 636ns ± 5% 615ns ± 4% -3.37% (p=0.014 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 318ns ± 3% 308ns ± 3% -2.92% (p=0.001 n=10+9) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 473ns ± 3% 460ns ± 1% -2.79% (p=0.000 n=10+8) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 669ns ± 4% 651ns ± 2% -2.71% (p=0.014 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 353ns ± 2% 348ns ± 4% -1.32% (p=0.019 n=9+9) Tracer_StartSpanCtx/opts=real/parallel=false-10 285ns ± 3% 284ns ± 1% ~ (p=0.923 n=10+9) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 303ns ± 4% 303ns ± 3% ~ (p=0.811 n=10+10) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 590ns ± 2% 585ns ± 3% ~ (p=0.190 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 338ns ± 3% 333ns ± 3% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 651ns ± 4% 639ns ± 4% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 910ns ±18% 880ns ±24% ~ (p=0.436 n=10+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 741ns ± 3% 808ns ± 2% +9.10% (p=0.000 n=9+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 500ns ± 1% 722ns ± 1% +44.49% (p=0.000 n=8+8) name old alloc/op new alloc/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 51.0B ± 0% 51.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) ``` Part of: cockroachdb#133307 Epic: CRDB-42854 Release note: None
This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Below are benchmark results which show the most impact on `opts=none` cpu. ``` ❯ benchdiff ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10 test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \ name old time/op new time/op delta Tracer_StartSpanCtx/opts=none/parallel=true-10 164ns ±10% 16ns ±64% -90.12% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=none/parallel=false-10 56.1ns ± 5% 47.7ns ± 5% -14.96% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real/parallel=true-10 584ns ± 3% 546ns ±20% -6.51% (p=0.002 n=8+9) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 636ns ± 5% 615ns ± 4% -3.37% (p=0.014 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 318ns ± 3% 308ns ± 3% -2.92% (p=0.001 n=10+9) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 473ns ± 3% 460ns ± 1% -2.79% (p=0.000 n=10+8) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 669ns ± 4% 651ns ± 2% -2.71% (p=0.014 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 353ns ± 2% 348ns ± 4% -1.32% (p=0.019 n=9+9) Tracer_StartSpanCtx/opts=real/parallel=false-10 285ns ± 3% 284ns ± 1% ~ (p=0.923 n=10+9) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 303ns ± 4% 303ns ± 3% ~ (p=0.811 n=10+10) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 590ns ± 2% 585ns ± 3% ~ (p=0.190 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 338ns ± 3% 333ns ± 3% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 651ns ± 4% 639ns ± 4% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 910ns ±18% 880ns ±24% ~ (p=0.436 n=10+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 741ns ± 3% 808ns ± 2% +9.10% (p=0.000 n=9+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 500ns ± 1% 722ns ± 1% +44.49% (p=0.000 n=8+8) name old alloc/op new alloc/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 51.0B ± 0% 51.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) ``` Part of: cockroachdb#133307 Epic: CRDB-42854 Release note: None
136931: tracing: add `startSpanFast` for optimized path when disabled r=tbg a=dhartunian This change attempts to reduce the impact of `StartSpanCtx` when called with tracing disabled. A new fast path is introduced via the `spanSpanFast` method which avoids the more complex logic within `startSpanGeneric`. Additionally, the regexp-based verbose tracing is modified to use an atomic pointer instead of a mutex, and the `SpanOption` iterator is modified to avoid extra copies. Below are benchmark results which show the most impact on `opts=none` cpu. ``` ❯ benchdiff ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10 test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \ name old time/op new time/op delta Tracer_StartSpanCtx/opts=none/parallel=true-10 164ns ±10% 16ns ±64% -90.12% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=none/parallel=false-10 56.1ns ± 5% 47.7ns ± 5% -14.96% (p=0.000 n=10+10) Tracer_StartSpanCtx/opts=real/parallel=true-10 584ns ± 3% 546ns ±20% -6.51% (p=0.002 n=8+9) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 636ns ± 5% 615ns ± 4% -3.37% (p=0.014 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 318ns ± 3% 308ns ± 3% -2.92% (p=0.001 n=10+9) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 473ns ± 3% 460ns ± 1% -2.79% (p=0.000 n=10+8) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 669ns ± 4% 651ns ± 2% -2.71% (p=0.014 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 353ns ± 2% 348ns ± 4% -1.32% (p=0.019 n=9+9) Tracer_StartSpanCtx/opts=real/parallel=false-10 285ns ± 3% 284ns ± 1% ~ (p=0.923 n=10+9) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 303ns ± 4% 303ns ± 3% ~ (p=0.811 n=10+10) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 590ns ± 2% 585ns ± 3% ~ (p=0.190 n=9+9) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 338ns ± 3% 333ns ± 3% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 651ns ± 4% 639ns ± 4% ~ (p=0.052 n=10+10) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 910ns ±18% 880ns ±24% ~ (p=0.436 n=10+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 741ns ± 3% 808ns ± 2% +9.10% (p=0.000 n=9+10) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 500ns ± 1% 722ns ± 1% +44.49% (p=0.000 n=8+8) name old alloc/op new alloc/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 96.0B ± 0% 96.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 48.0B ± 0% 48.0B ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 51.0B ± 0% 51.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Tracer_StartSpanCtx/opts=none/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=none/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10 2.00 ± 0% 2.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10 1.00 ± 0% 1.00 ± 0% ~ (all equal) ``` Part of: #133307 Epic: CRDB-42854 Release note: None Co-authored-by: David Hartunian <[email protected]>
One note on #135567 is that it won't work for folks using the cockroach-go library or who have implemented retry loops as the docs recommend. Namely, we also will never thing we've sampled |
For SQL stats we trace every new fingerprint in an application container in order to populate more statistics that are only available via tracing for each fingerprint. This sampling logic was based on tracking the SQL fingerprint strings encountered by the application and did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK) were being treated as new fingerprints on each execution and resulting in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy.. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy.. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint is also fixed as it was previously assuming that previously seen statements in new transactions would trigger this sampling behaviour when it should not - this was being erronneously triggered by `BEGIN`. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint is also fixed as it was assuming that previously seen statements in new transactions would trigger this sampling behaviour when it should not - this was being erronneously triggered by `BEGIN`. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint is also fixed as it was assuming that previously seen statements in new transactions would trigger this sampling behaviour when it should not - this was being erronneously triggered by `BEGIN`. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: cockroachdb#133307 Release note: None
138041: sql: do not consider TCL stmts in sql stats tracing r=xinhaoz a=xinhaoz For SQL stats we trace every new fingerprint in an application container in order to populate statistics that are only available via tracing for each fingerprint. This sampling logic worked by tracking the SQL fingerprint strings encountered by each application but did not factor in fingerprints that are not tracked by SQL stats, such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated as new fingerprints on each execution which resulted in tracing being turned on. This commit ensures that we don't consider TCL statements in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint is also fixed as it was assuming that previously seen statements in new transactions would trigger this sampling behaviour when it should not - this was being erronneously triggered by `BEGIN`. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-10 5.61ms ± 3% 5.69ms ± 4% ~ (p=0.167 n=8+9) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-10 2.15MB ± 2% 2.12MB ± 2% -1.24% (p=0.029 n=10+10) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-10 10.7k ± 1% 10.5k ± 0% -1.69% (p=0.000 n=10+9) ``` Epic: none Part of: #133307 Release note: None Co-authored-by: Xin Hao Zhang <[email protected]>
In profile from https://github.com/cockroachlabs/perf-efficiency-team/pull/4 (oltp_read_write)
5% is very significant. Is this all on by accident?
Jira issue: CRDB-43624
The text was updated successfully, but these errors were encountered: