Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: unexpected significant overhead under sysbench when tracing is off (0.3% allocs 1.6% cpu) #133307

Open
tbg opened this issue Oct 23, 2024 · 10 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months T-observability

Comments

@tbg
Copy link
Member

tbg commented Oct 23, 2024

In profile from https://github.com/cockroachlabs/perf-efficiency-team/pull/4 (oltp_read_write)
5% is very significant. Is this all on by accident?

image

Jira issue: CRDB-43624

@tbg tbg added C-performance Perf of queries or internals. Solution not expected to change functional behavior. branch-master Failures and bugs on the master branch. P-2 Issues/test failures with a fix SLA of 3 months T-observability o-perf-efficiency Related to performance efficiency labels Oct 23, 2024
@tbg
Copy link
Member Author

tbg commented Oct 25, 2024

https://cockroachlabs.slack.com/archives/C06KHP4PU4X/p1729810236138319?thread_ts=1729720298.418539&cid=C06KHP4PU4X

The tl;dr is that there's logic to make sure that we sample with structured recording for 1% of transaction and for every statement fingerprint we've not previously sampled. That's all fine and good, except that we never ever write into our stats collector that BEGIN TRANSACTION or END TRANSACTION have been sampled. This leads to something very funny: all END TRANSACTION statements end up being traced with structured recording.

@tbg
Copy link
Member Author

tbg commented Oct 25, 2024

in #133132 (review) I'm seeing wayyy under 0.5%. I'm unsure why that is - @nvanbenschoten any ideas, is the microbenchmark not tickling this in the same way?

@tbg tbg changed the title tracing: unexpected significant overhead under sysbench when tracing is off tracing: unexpected 5% cpu overhead under sysbench when tracing is off Oct 25, 2024
@exalate-issue-sync exalate-issue-sync bot changed the title tracing: unexpected 5% cpu overhead under sysbench when tracing is off tracing: unexpected significant overhead under sysbench when tracing is off Oct 25, 2024
@nvanbenschoten
Copy link
Member

I'm unsure why that is - @nvanbenschoten any ideas, is the microbenchmark not tickling this in the same way?

One thought is that serverutils.StartServerOnly might be disabling more of the SQL server than expected.

@dhartunian
Copy link
Collaborator

Ack that Observability team will pick this up in Nov milestone. Will get back to y'all if it turns out to be a heavy lift and we need to re-evaluate.

@tbg tbg changed the title tracing: unexpected significant overhead under sysbench when tracing is off tracing: unexpected significant overhead under sysbench when tracing is off (5% cpu) Oct 25, 2024
tbg added a commit to tbg/cockroach that referenced this issue Oct 29, 2024
This simulates cockroachdb#133307
having been fixed.
Completely disabling this is too optimistic - after all 1% should be
traced - but this is fine for now.

Epic: none
Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Oct 29, 2024
This simulates cockroachdb#133307
having been fixed.
Completely disabling this is too optimistic - after all 1% should be
traced - but this is fine for now.

Epic: none
Release note: None
@tbg
Copy link
Member Author

tbg commented Oct 29, 2024

Just wanted to point out that there's also something potentially unexpected here:

https://cockroachlabs.slack.com/archives/C06KHP4PU4X/p1729810236138319?thread_ts=1729720298.418539&cid=C06KHP4PU4X

Aside: the statement stats flushing is pretty intense -- the math on the number of unique statement fingerprints being flushed doesn't really add up to the number of PersistedSQLStats.upsertStatementStats. We see that PersistedSQLStats.MaybeFlush takes 3 minutes... We have an interesting chart of it running statements at the bottom.

image

@tbg
Copy link
Member Author

tbg commented Oct 29, 2024

more paper trail: setting the various cluster settings

		roachprod sql "${cluster}:1" -- -e 'set cluster setting sql.txn_stats.sample_rate = 0;'
		roachprod sql "${cluster}:1" -- -e 'set cluster setting sql.metrics.statement_details.enabled = false;'
		roachprod sql "${cluster}:1" -- -e 'set cluster setting sql.metrics.transaction_details.enabled = false;'

doesn't seem to provide a workaround. I'm still seeing it in the profiles.

https://cockroachlabs.slack.com/archives/C06301L45NF/p1730217788024329?thread_ts=1730215659.655189&cid=C06301L45NF

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 4, 2024
…ldrenLocked

May not be as important after cockroachdb#133307.

This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid
the use of a temporary buffer for structured events when a child is finished
while a span is recording structured events. Instead of using a temporary buffer,
we now collect the child's events directly into the parents, and then trim the
slice to enforce the byte size limit.

```
name                            old time/op    new time/op    delta
Sysbench/SQL/oltp_read_only       36.2ms ± 2%    35.8ms ± 1%  -1.17%  (p=0.014 n=9+9)
Sysbench/SQL/oltp_write_only      19.0ms ± 1%    19.4ms ± 4%  +2.18%  (p=0.006 n=8+9)
Sysbench/SQL/oltp_read_write      58.4ms ± 3%    58.3ms ± 4%    ~     (p=0.730 n=9+9)
Sysbench/SQL/oltp_point_select    1.61ms ± 2%    1.58ms ± 2%  -1.68%  (p=0.013 n=9+10)
Sysbench/SQL/oltp_begin_commit     587µs ± 2%     587µs ± 2%    ~     (p=0.842 n=10+9)

name                            old alloc/op   new alloc/op   delta
Sysbench/SQL/oltp_read_only       1.05MB ± 1%    1.04MB ± 1%  -0.68%  (p=0.001 n=10+9)
Sysbench/SQL/oltp_write_only       402kB ± 1%     403kB ± 1%    ~     (p=0.661 n=10+9)
Sysbench/SQL/oltp_read_write      1.50MB ± 1%    1.49MB ± 1%    ~     (p=0.579 n=10+10)
Sysbench/SQL/oltp_point_select    35.0kB ± 0%    34.9kB ± 0%  -0.32%  (p=0.002 n=8+9)
Sysbench/SQL/oltp_begin_commit    18.8kB ± 0%    18.8kB ± 0%    ~     (p=0.845 n=10+8)

name                            old allocs/op  new allocs/op  delta
Sysbench/SQL/oltp_read_only        6.96k ± 1%     6.86k ± 0%  -1.51%  (p=0.000 n=10+7)
Sysbench/SQL/oltp_write_only       3.61k ± 0%     3.60k ± 0%  -0.30%  (p=0.009 n=10+9)
Sysbench/SQL/oltp_read_write       11.0k ± 1%     10.9k ± 1%  -0.82%  (p=0.014 n=10+10)
Sysbench/SQL/oltp_point_select       317 ± 0%       314 ± 0%  -1.04%  (p=0.000 n=7+10)
Sysbench/SQL/oltp_begin_commit       139 ± 0%       140 ± 0%    ~     (p=0.828 n=10+9)
```

Epic: None
Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 4, 2024
…ldrenLocked

May not be as important after cockroachdb#133307.

This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid
the use of a temporary buffer for structured events when a child is finished
while a span is recording structured events. Instead of using a temporary buffer,
we now collect the child's events directly into the parents, and then trim the
slice to enforce the byte size limit.

```
name                            old time/op    new time/op    delta
Sysbench/SQL/oltp_read_only       36.2ms ± 2%    35.8ms ± 1%  -1.17%  (p=0.014 n=9+9)
Sysbench/SQL/oltp_write_only      19.0ms ± 1%    19.4ms ± 4%  +2.18%  (p=0.006 n=8+9)
Sysbench/SQL/oltp_read_write      58.4ms ± 3%    58.3ms ± 4%    ~     (p=0.730 n=9+9)
Sysbench/SQL/oltp_point_select    1.61ms ± 2%    1.58ms ± 2%  -1.68%  (p=0.013 n=9+10)
Sysbench/SQL/oltp_begin_commit     587µs ± 2%     587µs ± 2%    ~     (p=0.842 n=10+9)

name                            old alloc/op   new alloc/op   delta
Sysbench/SQL/oltp_read_only       1.05MB ± 1%    1.04MB ± 1%  -0.68%  (p=0.001 n=10+9)
Sysbench/SQL/oltp_write_only       402kB ± 1%     403kB ± 1%    ~     (p=0.661 n=10+9)
Sysbench/SQL/oltp_read_write      1.50MB ± 1%    1.49MB ± 1%    ~     (p=0.579 n=10+10)
Sysbench/SQL/oltp_point_select    35.0kB ± 0%    34.9kB ± 0%  -0.32%  (p=0.002 n=8+9)
Sysbench/SQL/oltp_begin_commit    18.8kB ± 0%    18.8kB ± 0%    ~     (p=0.845 n=10+8)

name                            old allocs/op  new allocs/op  delta
Sysbench/SQL/oltp_read_only        6.96k ± 1%     6.86k ± 0%  -1.51%  (p=0.000 n=10+7)
Sysbench/SQL/oltp_write_only       3.61k ± 0%     3.60k ± 0%  -0.30%  (p=0.009 n=10+9)
Sysbench/SQL/oltp_read_write       11.0k ± 1%     10.9k ± 1%  -0.82%  (p=0.014 n=10+10)
Sysbench/SQL/oltp_point_select       317 ± 0%       314 ± 0%  -1.04%  (p=0.000 n=7+10)
Sysbench/SQL/oltp_begin_commit       139 ± 0%       140 ± 0%    ~     (p=0.828 n=10+9)
```

Epic: None
Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 4, 2024
…ldrenLocked

May not be as important after cockroachdb#133307.

This commit reworks the logic in crdbSpan.recordFinishedChildrenLocked to avoid
the use of a temporary buffer for structured events when a child is finished
while a span is recording structured events. Instead of using a temporary buffer,
we now collect the child's events directly into the parents, and then trim the
slice to enforce the byte size limit.

```
name                            old time/op    new time/op    delta
Sysbench/SQL/oltp_read_only       36.2ms ± 2%    35.8ms ± 1%  -1.17%  (p=0.014 n=9+9)
Sysbench/SQL/oltp_write_only      19.0ms ± 1%    19.4ms ± 4%  +2.18%  (p=0.006 n=8+9)
Sysbench/SQL/oltp_read_write      58.4ms ± 3%    58.3ms ± 4%    ~     (p=0.730 n=9+9)
Sysbench/SQL/oltp_point_select    1.61ms ± 2%    1.58ms ± 2%  -1.68%  (p=0.013 n=9+10)
Sysbench/SQL/oltp_begin_commit     587µs ± 2%     587µs ± 2%    ~     (p=0.842 n=10+9)

name                            old alloc/op   new alloc/op   delta
Sysbench/SQL/oltp_read_only       1.05MB ± 1%    1.04MB ± 1%  -0.68%  (p=0.001 n=10+9)
Sysbench/SQL/oltp_write_only       402kB ± 1%     403kB ± 1%    ~     (p=0.661 n=10+9)
Sysbench/SQL/oltp_read_write      1.50MB ± 1%    1.49MB ± 1%    ~     (p=0.579 n=10+10)
Sysbench/SQL/oltp_point_select    35.0kB ± 0%    34.9kB ± 0%  -0.32%  (p=0.002 n=8+9)
Sysbench/SQL/oltp_begin_commit    18.8kB ± 0%    18.8kB ± 0%    ~     (p=0.845 n=10+8)

name                            old allocs/op  new allocs/op  delta
Sysbench/SQL/oltp_read_only        6.96k ± 1%     6.86k ± 0%  -1.51%  (p=0.000 n=10+7)
Sysbench/SQL/oltp_write_only       3.61k ± 0%     3.60k ± 0%  -0.30%  (p=0.009 n=10+9)
Sysbench/SQL/oltp_read_write       11.0k ± 1%     10.9k ± 1%  -0.82%  (p=0.014 n=10+10)
Sysbench/SQL/oltp_point_select       317 ± 0%       314 ± 0%  -1.04%  (p=0.000 n=7+10)
Sysbench/SQL/oltp_begin_commit       139 ± 0%       140 ± 0%    ~     (p=0.828 n=10+9)
```

Epic: None
Release note: None
craig bot pushed a commit that referenced this issue Nov 4, 2024
132877: kv: deflake TestLeaseQueueLeasePreferencePurgatoryError r=nvanbenschoten a=nvanbenschoten

Fixes #132652.

This commit fixes a flake in TestLeaseQueueLeasePreferencePurgatoryError where a delayed span config split could trip up the test and cause it to flake. We now wait for the span config split to be processed before waiting on the lease queue's purgatory, which should eliminate the flake.

Release note: None

133568: changefeedccl: add PTS to system.users r=andyyang890 a=asg0451

Add `system.users` to the list of system tables
that changefeeds protect with PTS. This table is
required for CDC Queries.

Part of: #128806

Release note (enterprise change): Add
`system.users` to the list of system tables that
changefeeds protect with PTS. This table is
required for CDC Queries.


134207: tracing: avoid temp buffer for structured events in recordFinishedChildrenLocked r=nvanbenschoten a=nvanbenschoten

May not be as important after #133307.

This commit reworks the logic in `crdbSpan.recordFinishedChildrenLocked` to avoid the use of a temporary buffer for structured events when a child is finished while a span is recording structured events. Instead of using a temporary buffer, we now collect the child's events directly into the parents, and then trim the slice to enforce the byte size limit.

```
name                            old time/op    new time/op    delta
Sysbench/SQL/oltp_read_only       36.2ms ± 2%    35.8ms ± 1%  -1.17%  (p=0.014 n=9+9)
Sysbench/SQL/oltp_write_only      19.0ms ± 1%    19.4ms ± 4%  +2.18%  (p=0.006 n=8+9)
Sysbench/SQL/oltp_read_write      58.4ms ± 3%    58.3ms ± 4%    ~     (p=0.730 n=9+9)
Sysbench/SQL/oltp_point_select    1.61ms ± 2%    1.58ms ± 2%  -1.68%  (p=0.013 n=9+10)
Sysbench/SQL/oltp_begin_commit     587µs ± 2%     587µs ± 2%    ~     (p=0.842 n=10+9)

name                            old alloc/op   new alloc/op   delta
Sysbench/SQL/oltp_read_only       1.05MB ± 1%    1.04MB ± 1%  -0.68%  (p=0.001 n=10+9)
Sysbench/SQL/oltp_write_only       402kB ± 1%     403kB ± 1%    ~     (p=0.661 n=10+9)
Sysbench/SQL/oltp_read_write      1.50MB ± 1%    1.49MB ± 1%    ~     (p=0.579 n=10+10)
Sysbench/SQL/oltp_point_select    35.0kB ± 0%    34.9kB ± 0%  -0.32%  (p=0.002 n=8+9)
Sysbench/SQL/oltp_begin_commit    18.8kB ± 0%    18.8kB ± 0%    ~     (p=0.845 n=10+8)

name                            old allocs/op  new allocs/op  delta
Sysbench/SQL/oltp_read_only        6.96k ± 1%     6.86k ± 0%  -1.51%  (p=0.000 n=10+7)
Sysbench/SQL/oltp_write_only       3.61k ± 0%     3.60k ± 0%  -0.30%  (p=0.009 n=10+9)
Sysbench/SQL/oltp_read_write       11.0k ± 1%     10.9k ± 1%  -0.82%  (p=0.014 n=10+10)
Sysbench/SQL/oltp_point_select       317 ± 0%       314 ± 0%  -1.04%  (p=0.000 n=7+10)
Sysbench/SQL/oltp_begin_commit       139 ± 0%       140 ± 0%    ~     (p=0.828 n=10+9)
```

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 18, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 19, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 19, 2024
This forced real spans to be created across the board, explaining most of the tracing overhead we saw[^1].

[^1]: cockroachdb#133307
tbg added a commit to tbg/cockroach that referenced this issue Nov 19, 2024
craig bot pushed a commit that referenced this issue Nov 20, 2024
135682: tracing: default span registry to false r=tbg a=tbg

This was extracted from #135146.
The tracer active span registry was enabled by default. This forced real spans to be created across the board, and explains part of why tracing is so expensive even when it's not being used[^1].

This commit disables the span registry by default.

[^1]: #133307

Epic: CRDB-43584
Release note (ops change): the default value for the cluster setting `trace.span_registry.enabled` has been changed from `true` to `false`.

135827: sql/tablemetadatacache: speed up tests r=dt a=dt

This test was taking 35s or more before while sleeping for adoption intervals but can run in ~5s with fast adoptions.

Release note: none.
Epic: none.

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: David Taylor <[email protected]>
@tbg tbg changed the title tracing: unexpected significant overhead under sysbench when tracing is off (5% cpu) tracing: unexpected significant overhead under sysbench when tracing is off (0.3% allocs) Nov 26, 2024
@tbg
Copy link
Member Author

tbg commented Nov 26, 2024

Update on the overhead. The below is on 6a7b1e2 on oltp_read_only. This still seems pretty bad.
profile.pb.gz

image image

@tbg tbg changed the title tracing: unexpected significant overhead under sysbench when tracing is off (0.3% allocs) tracing: unexpected significant overhead under sysbench when tracing is off (0.3% allocs 1.6% cpu) Nov 26, 2024
@dhartunian
Copy link
Collaborator

Given that the overhead is within SpanFromContext, there aren't many options available to us. Plumbing down additional state from the tracer into the logging context seems challenging to do. Part of the appeal of logging with a ctx is that the caller doesn't need to worry about this and it would be nice to preserve all the callsites without modification.

One reason that we pay such a large cost is that in the case of a "no tracing" context, we need to traverse the entire tree to be able to tell if a context contains a span or not, there's no way to "exit early".

Given that we already use a custom patched runtime, could we modify the context package to do this for us? There are two options that come to mind:

  1. We can introduce a hasSpan boolean that's propagated upwards into the context stack anytime a valueCtx containing the span key is wrapped. This would allow us to do a "fast check" at the top of the stack. However, this expands the size of the context structs which is probably undesirable given their widespread usage.

  2. An alternative is that we can introduce a spanCtx struct that is the official "span container". valueCtx is no longer used to carry spans. Whenever a spanCtx is wrapped with a different parent, we can invert the wrapping and always hoist the spanCtx to the outer layer. This would allow us to just check the outer type of the Context implementation. If it's a spanCtx we extract the span, otherwise we know that spanCtx could not exist in the context chain because if it did, it would bubble to the top.

Maybe something like this: (of course the behavior would have to be extended to all the other context types. I'm unsure of the complexity involved there.)

type spanCtx struct {
	Context
	span *Span
}

func WithSpan(parent Context, span *Span) Context {
	return &spanCtx{parent, span}
}

func (c spanCtx) Span() *Span {
	return c.span
}

func WithValue(parent Context, key, val any) Context {
	if parent == nil {
		panic("cannot create context from nil parent")
	}
	if key == nil {
		panic("nil key")
	}
	if !reflectlite.TypeOf(key).Comparable() {
		panic("key is not comparable")
	}
	if ss, ok := parent.(*spanCtx); ok {
		return &spanCtx{&valueCtx{ss.Context, key, val}, ss.span}
	}
	return &valueCtx{parent, key, val}
}

// This implementation is possible because if there was a `spanCtx` in
// the parent chain, it would have "bubbled" to the top during
// wrapping. If we're asking a valueCtx for a `Span` it means the
// `spanCtx` does not exist in this chain.
func (c *valueCtx) Span() *Span {
	return nil
}

@tbg do you think this is a direction worth pursuing? It seems like the "giant context chain" problem is widespread.

I haven't spent time thinking through the other tracing package perf hits in your pprof screenshot yet.

dhartunian added a commit to dhartunian/cockroach that referenced this issue Dec 6, 2024
This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Stats for the disabled tracing benchmark:

Before:
```
BenchmarkStartSpan/start_span:_1         	  100000	        29.30 ns/op	       0 B/op	       0 allocs/op
```

After:
```
BenchmarkStartSpan/start_span:_1         	  100000	        18.89 ns/op	       0 B/op	       0 allocs/op
```

Part of: cockroachdb#133307

Release note: None
@dhartunian
Copy link
Collaborator

A few things I tried today that I'm not confident in but will revisit:

Modifying code from this:

func (opts *spanOptions) recordingType() tracingpb.RecordingType {
	if opts.recordingTypeExplicit {
		return opts.recordingTypeOpt
	}

	var recordingType tracingpb.RecordingType
	if !opts.Parent.empty() && !opts.Parent.IsNoop() {
		recordingType = opts.Parent.i.crdb.recordingType()
	} else if !opts.RemoteParent.Empty() {
		recordingType = opts.RemoteParent.recordingType
	}
	if recordingType < opts.minRecordingTypeOpt {
		recordingType = opts.minRecordingTypeOpt
	}
	return recordingType
}

to this:

func (opts *spanOptions) recordingType() tracingpb.RecordingType {
	if opts.recordingTypeExplicit {
		return opts.recordingTypeOpt
	}

	var recordingType tracingpb.RecordingType
	if opts.Parent.NotEmptyNotNoop() { // combining checks avoids a copy
		recordingType = opts.Parent.i.crdb.recordingType()
	} else if !opts.RemoteParentEmpty { // This is a bool now, avoid struct copy
		recordingType = opts.RemoteParent.recordingType
	}
	if recordingType < opts.minRecordingTypeOpt {
		recordingType = opts.minRecordingTypeOpt
	}
	return recordingType
}

This felt like making a bigger mess by duplicating logic and adding more fields to a struct thats gets copied often. I think I'll try to make these values precomputed instead. The SpanOptions struct should just know its recording type after construction instead of requiring so much dynamic lookup.

tbg pushed a commit to tbg/cockroach that referenced this issue Dec 9, 2024
This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Stats for the disabled tracing benchmark:

Before:
```
BenchmarkStartSpan/start_span:_1         	  100000	        29.30 ns/op	       0 B/op	       0 allocs/op
```

After:
```
BenchmarkStartSpan/start_span:_1         	  100000	        18.89 ns/op	       0 B/op	       0 allocs/op
```

Part of: cockroachdb#133307

Release note: None
tbg pushed a commit to tbg/cockroach that referenced this issue Dec 9, 2024
This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Stats for the disabled tracing benchmark:

Before:
```
BenchmarkStartSpan/start_span:_1         	  100000	        29.30 ns/op	       0 B/op	       0 allocs/op
```

After:
```
BenchmarkStartSpan/start_span:_1         	  100000	        18.89 ns/op	       0 B/op	       0 allocs/op
```

Part of: cockroachdb#133307

Release note: None
dhartunian added a commit to dhartunian/cockroach that referenced this issue Dec 9, 2024
This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Below are benchmark results which show the most impact on `opts=none` cpu.
```
❯ benchdiff  ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10
test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr
test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \

name                                                                            old time/op    new time/op    delta
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     164ns ±10%      16ns ±64%  -90.12%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=none/parallel=false-10                                   56.1ns ± 5%    47.7ns ± 5%  -14.96%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     584ns ± 3%     546ns ±20%   -6.51%  (p=0.002 n=8+9)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          636ns ± 5%     615ns ± 4%   -3.37%  (p=0.014 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         318ns ± 3%     308ns ± 3%   -2.92%  (p=0.001 n=10+9)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       473ns ± 3%     460ns ± 1%   -2.79%  (p=0.000 n=10+8)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      669ns ± 4%     651ns ± 2%   -2.71%  (p=0.014 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     353ns ± 2%     348ns ± 4%   -1.32%  (p=0.019 n=9+9)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    285ns ± 3%     284ns ± 1%     ~     (p=0.923 n=10+9)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             303ns ± 4%     303ns ± 3%     ~     (p=0.811 n=10+10)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              590ns ± 2%     585ns ± 3%     ~     (p=0.190 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       338ns ± 3%     333ns ± 3%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        651ns ± 4%     639ns ± 4%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        910ns ±18%     880ns ±24%     ~     (p=0.436 n=10+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              741ns ± 3%     808ns ± 2%   +9.10%  (p=0.000 n=9+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             500ns ± 1%     722ns ± 1%  +44.49%  (p=0.000 n=8+8)

name                                                                            old alloc/op   new alloc/op   delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              51.0B ± 0%     51.0B ± 0%     ~     (all equal)

name                                                                            old allocs/op  new allocs/op  delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                         1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

Part of: cockroachdb#133307
Epic: CRDB-42854

Release note: None
dhartunian added a commit to dhartunian/cockroach that referenced this issue Dec 9, 2024
This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Below are benchmark results which show the most impact on `opts=none` cpu.
```
❯ benchdiff  ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10
test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr
test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \

name                                                                            old time/op    new time/op    delta
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     164ns ±10%      16ns ±64%  -90.12%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=none/parallel=false-10                                   56.1ns ± 5%    47.7ns ± 5%  -14.96%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     584ns ± 3%     546ns ±20%   -6.51%  (p=0.002 n=8+9)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          636ns ± 5%     615ns ± 4%   -3.37%  (p=0.014 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         318ns ± 3%     308ns ± 3%   -2.92%  (p=0.001 n=10+9)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       473ns ± 3%     460ns ± 1%   -2.79%  (p=0.000 n=10+8)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      669ns ± 4%     651ns ± 2%   -2.71%  (p=0.014 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     353ns ± 2%     348ns ± 4%   -1.32%  (p=0.019 n=9+9)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    285ns ± 3%     284ns ± 1%     ~     (p=0.923 n=10+9)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             303ns ± 4%     303ns ± 3%     ~     (p=0.811 n=10+10)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              590ns ± 2%     585ns ± 3%     ~     (p=0.190 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       338ns ± 3%     333ns ± 3%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        651ns ± 4%     639ns ± 4%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        910ns ±18%     880ns ±24%     ~     (p=0.436 n=10+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              741ns ± 3%     808ns ± 2%   +9.10%  (p=0.000 n=9+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             500ns ± 1%     722ns ± 1%  +44.49%  (p=0.000 n=8+8)

name                                                                            old alloc/op   new alloc/op   delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              51.0B ± 0%     51.0B ± 0%     ~     (all equal)

name                                                                            old allocs/op  new allocs/op  delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                         1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

Part of: cockroachdb#133307
Epic: CRDB-42854

Release note: None
craig bot pushed a commit that referenced this issue Dec 10, 2024
136931: tracing: add `startSpanFast` for optimized path when disabled r=tbg a=dhartunian

This change attempts to reduce the impact of `StartSpanCtx` when
called with tracing disabled.

A new fast path is introduced via the `spanSpanFast` method which
avoids the more complex logic within `startSpanGeneric`.

Additionally, the regexp-based verbose tracing is modified to use an
atomic pointer instead of a mutex, and the `SpanOption` iterator is
modified to avoid extra copies.

Below are benchmark results which show the most impact on `opts=none` cpu.
```
❯ benchdiff  ./pkg/util/tracing --old ':/consolidate regexp bench' -b -r 'BenchmarkTracer_StartSpanCtx' -d 100000x -c 10
test binaries already exist for a2746cd: tracing: consolidate regexp bench into BenchmarkTr
test binaries already exist for 41fdcda: tracing: add startSpanFast for optimized path when

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/util/tracing \

name                                                                            old time/op    new time/op    delta
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     164ns ±10%      16ns ±64%  -90.12%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=none/parallel=false-10                                   56.1ns ± 5%    47.7ns ± 5%  -14.96%  (p=0.000 n=10+10)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     584ns ± 3%     546ns ±20%   -6.51%  (p=0.002 n=8+9)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          636ns ± 5%     615ns ± 4%   -3.37%  (p=0.014 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         318ns ± 3%     308ns ± 3%   -2.92%  (p=0.001 n=10+9)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       473ns ± 3%     460ns ± 1%   -2.79%  (p=0.000 n=10+8)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      669ns ± 4%     651ns ± 2%   -2.71%  (p=0.014 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     353ns ± 2%     348ns ± 4%   -1.32%  (p=0.019 n=9+9)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    285ns ± 3%     284ns ± 1%     ~     (p=0.923 n=10+9)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             303ns ± 4%     303ns ± 3%     ~     (p=0.811 n=10+10)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              590ns ± 2%     585ns ± 3%     ~     (p=0.190 n=9+9)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       338ns ± 3%     333ns ± 3%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        651ns ± 4%     639ns ± 4%     ~     (p=0.052 n=10+10)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        910ns ±18%     880ns ±24%     ~     (p=0.436 n=10+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              741ns ± 3%     808ns ± 2%   +9.10%  (p=0.000 n=9+10)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             500ns ± 1%     722ns ± 1%  +44.49%  (p=0.000 n=8+8)

name                                                                            old alloc/op   new alloc/op   delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                              48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                         48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                          48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                       48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                        48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10       96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10        96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                             48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                              51.0B ± 0%     51.0B ± 0%     ~     (all equal)

name                                                                            old allocs/op  new allocs/op  delta
Tracer_StartSpanCtx/opts=none/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=none/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=false-10                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real/parallel=true-10                                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,logtag/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=false-10                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent/parallel=true-10                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=false-10                        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent/parallel=true-10                         1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=false-10        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,autoparent,withEventListener/parallel=true-10         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=false-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,manualparent,withEventListener/parallel=true-10       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=false-10                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Tracer_StartSpanCtx/opts=real,regexp/parallel=true-10                               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

Part of: #133307
Epic: CRDB-42854

Release note: None

Co-authored-by: David Hartunian <[email protected]>
@ajwerner
Copy link
Contributor

One note on #135567 is that it won't work for folks using the cockroach-go library or who have implemented retry loops as the docs recommend. Namely, we also will never thing we've sampled RELEASE SAVEPOINT cockroach_restart, which happens to magically internally perform a commit. Same goes for the equally magical SHOW COMMIT TIMESTAMP statement.

xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 27, 2024
For SQL stats we trace every new fingerprint in an application
container in order to populate more statistics that are only available
via tracing for each fingerprint. This sampling logic was based on
tracking the SQL fingerprint strings encountered by the application
and did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK) were being treated
as new fingerprints on each execution and resulting in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy..

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 27, 2024
For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy..

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 27, 2024
For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy.

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 30, 2024
For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint
is also fixed as it was previously assuming that previously seen
statements in new transactions would trigger this sampling behaviour
when it should not - this was being erronneously triggered by
`BEGIN`.

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Dec 30, 2024
For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint
is also fixed as it was assuming that previously seen statements in
new transactions would trigger this sampling behaviour when it should
not - this was being erronneously triggered by `BEGIN`.

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Jan 6, 2025
For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint
is also fixed as it was assuming that previously seen statements in
new transactions would trigger this sampling behaviour when it should
not - this was being erronneously triggered by `BEGIN`.

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: cockroachdb#133307

Release note: None
craig bot pushed a commit that referenced this issue Jan 6, 2025
138041: sql: do not consider TCL stmts in sql stats tracing r=xinhaoz a=xinhaoz

For SQL stats we trace every new fingerprint in an application
container in order to populate statistics that are only available
via tracing for each fingerprint. This sampling logic worked by
tracking the SQL fingerprint strings encountered by each application
but did not factor in fingerprints that are not tracked by SQL stats,
such as TCL statements (BEGIN, COMMIT, ROLLBACK), were being treated
as new fingerprints on each execution which resulted in tracing being
turned on. This commit ensures that we don't consider TCL statements
in this sampling strategy. TestSampledStatsCollectionOnNewFingerprint
is also fixed as it was assuming that previously seen statements in
new transactions would trigger this sampling behaviour when it should
not - this was being erronneously triggered by `BEGIN`.

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.61ms ± 3%    5.69ms ± 4%    ~     (p=0.167 n=8+9)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.15MB ± 2%    2.12MB ± 2%  -1.24%  (p=0.029 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-10     10.7k ± 1%     10.5k ± 0%  -1.69%  (p=0.000 n=10+9)
```

Epic: none
Part of: #133307

Release note: None

Co-authored-by: Xin Hao Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months T-observability
Projects
None yet
Development

No branches or pull requests

4 participants