sql: track bytes and rows read unconditionally for every statement #38002
Conversation
At the moment, these metrics are only exposed via storing them in […]. I might not be fully understanding what I'm doing here, so please take a very critical look. Also cc @jordanlewis @andreimatei if you have time to take a look as well.
Force-pushed from 067aa07 to 0ba41e7.
A 3% hit is sort of a bummer; I'm interested in digging more into the perf implications.
name old ops/s new ops/s delta
KV99-throughput 42.1k ± 3% 40.3k ± 3% -4.17% (p=0.032 n=5+5)
name old ms/s new ms/s delta
KV99-P50 8.10 ± 0% 8.60 ± 3% +6.17% (p=0.016 n=4+5)
KV99-Avg 2.64 ± 6% 2.74 ± 6% ~ (p=0.333 n=5+5)
This is with longer runs on your later revision. We're going to need to look at the source of inefficiency. Hopefully we can pre-allocate or pool some objects to mitigate the cost.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @asubiotto)
This looks good to me minus the performance cost. There are several things you could try out:
- In the tableReader, tags are set on the span. This is unnecessary and something we can avoid, although we don't go into `getSpans`, which would be the more expensive part.
- There is no need to create any spans; that is probably expensive as well. I would just create a `RecordedSpan` with the `Stats` field set when the `tableReader` returns metadata (see the sketch below).
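To make the suggested alternative concrete, here is a rough Go sketch of what it could look like: marshal the stats proto and hand it to the consumer as the `Stats` field of a `RecordedSpan` in `TraceData`. It follows the pseudocode in this comment; the field names, the `bytesRead`/`rowsRead` variables, and the package placement are assumptions, not the code that was eventually merged.

```go
// Sketch only, not the merged code: attach the table reader's stats to a
// RecordedSpan instead of introducing a dedicated Metrics field.
stats, err := types.MarshalAny(&TableReaderStats{ // gogo/protobuf's MarshalAny
	BytesRead: bytesRead,
	RowsRead:  rowsRead, // field names taken from this comment
})
if err != nil {
	// Error handling TBD; the existing tracing code panics in the
	// comparable situation (see the tracer_span.go discussion below).
}
meta := distsqlpb.ProducerMetadata{
	TraceData: []tracing.RecordedSpan{{
		// Note: as discussed later in this thread, SpanID must not be left
		// at its zero value, since 0 is the root span's ID.
		Stats: stats,
	}},
}
// meta would then be emitted as trailing metadata when the tableReader drains.
```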
Reviewed 11 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/distsql_running.go, line 542 at r1 (raw file):
r.rowsRead += trs.InputStats.NumRows
} else {
	// TODO(yuzefovich): what should we do here?
I think this is fine
pkg/util/tracing/tracer_span.go, line 250 at r1 (raw file):
stats, err := types.MarshalAny(s.mu.stats)
if err != nil {
	// TODO(yuzefovich): what should we do here?
`getSpans` panics as well.
Force-pushed from b17b0dc to 34974ef.
Force-pushed from 4de3b7a to ec8d4e2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 542 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think this is fine
Removed the changes.
pkg/sql/distsqlpb/data.go, line 216 at r2 (raw file):
Previously, ajwerner wrote…
My preference would be for this pool not to be exported and to instead have strongly typed functions.
Done.
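For context, the pattern being asked for is the usual Go one: keep the `sync.Pool` itself unexported and expose strongly typed get/release helpers. A minimal self-contained sketch (the type and function names are illustrative, not necessarily the ones in the PR):

```go
package distsqlpb

import "sync"

// MetricsMetadata stands in for the pooled metadata payload; in the PR it is
// a generated protobuf message.
type MetricsMetadata struct {
	BytesRead int64
	RowsRead  int64
}

// The pool stays unexported; callers only see the typed helpers below.
var metricsMetadataPool = sync.Pool{
	New: func() interface{} { return &MetricsMetadata{} },
}

// GetMetricsMetadata returns a MetricsMetadata object from the pool.
func GetMetricsMetadata() *MetricsMetadata {
	return metricsMetadataPool.Get().(*MetricsMetadata)
}

// Release returns the object to the pool.
func (m *MetricsMetadata) Release() {
	metricsMetadataPool.Put(m)
}
```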
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is it necessary to add a `Metrics` field to metadata instead of just doing something like `distsqlpb.ProducerMetadata{TraceData: []RecordedSpan{Stats: &TableReaderStats{BytesRead: bytes, RowsRead: rows}}}`? We can use a `RecordedSpan` pool. In any case, it would be good to make sure that these bytes are also added to the `EXPLAIN ANALYZE` plan in case of tracing.
The reasoning for adding the `Metrics` field was an attempt at removing one marshaling/unmarshaling conversion, but the benchmarks do not show significant changes from that, so I removed it as we discussed.
Bytes read do show up in the output of `EXPLAIN ANALYZE`, since they are added to `TableReaderStats` in `outputStatsToTrace`.
pkg/sql/distsqlrun/tablereader.go, line 377 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why bother with a `metricsCollector`? Since this is unconditional, we could just have the `tableReader` keep track of the number of rows read.
Indeed, I feel so silly :/
pkg/util/tracing/tracer_span.go, line 250 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
`getSpans` panics as well.
Removed the changes.
Force-pushed from ca84892 to ca4728d.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, yuzefovich wrote…
The reasoning for adding the `Metrics` field was an attempt at removing one marshaling/unmarshaling conversion, but the benchmarks do not show significant changes from that, so I removed it as we discussed. Bytes read do show up in the output of `EXPLAIN ANALYZE`, since they are added to `TableReaderStats` in `outputStatsToTrace`.
My initial change was incomplete - the tests were failing due to duplicate spans. My understanding is that when we leave the `SpanID` field of `RecordedSpan` uninitialized, it has the value of 0, which belongs to the root span and breaks some assumptions. I think I have now fixed the issue, but rerunning the benchmarks shows a slight performance decrease (I believe due to marshaling/unmarshaling, as I was thinking earlier). Here are the numbers on `kv99`, run 4 times for 4 minutes with a 10 second ramp ("old" is using the `Metrics` field and "new" is this PR at the moment):
gcp
name old ops/s new ops/s delta
KV99-throughput 17.8k ± 2% 17.0k ± 3% -4.39% (p=0.029 n=4+4)
name old ms/s new ms/s delta
KV99-P50 19.7 ± 7% 20.7 ± 4% ~ (p=0.257 n=4+4)
KV99-Avg 6.00 ± 0% 6.38 ± 4% +6.25% (p=0.029 n=4+4)
aws
name old ops/s new ops/s delta
KV99-throughput 29.8k ± 1% 29.1k ± 3% ~ (p=0.200 n=4+4)
name old ms/s new ms/s delta
KV99-P50 11.8 ± 3% 11.8 ± 7% ~ (p=1.000 n=4+4)
KV99-Avg 3.70 ± 0% 3.80 ± 0% +2.70% (p=0.029 n=4+4)
The numbers look quite noisy, but it seems that there is a 2-3% performance hit (in addition to about a 1% performance hit on the `Metrics` binary), so now I'm leaning towards going back to using the `Metrics` field again.
Reviewed 1 of 12 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @asubiotto, and @yuzefovich)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
} } meta.Release()
Where is `meta` being acquired from the pool?
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, yuzefovich wrote…
My initial change was incomplete - the tests were failing due to duplicate spans. My understanding is that when we leave the `SpanID` field of `RecordedSpan` uninitialized, it has the value of 0, which belongs to the root span and breaks some assumptions. I think I have now fixed the issue, but rerunning the benchmarks shows a slight performance decrease (I believe due to marshaling/unmarshaling, as I was thinking earlier). Here are the numbers on `kv99`, run 4 times for 4 minutes with a 10 second ramp ("old" is using the `Metrics` field and "new" is this PR at the moment): [gcp and aws numbers quoted above]
The numbers look quite noisy, but it seems that there is a 2-3% performance hit (in addition to about a 1% performance hit on the `Metrics` binary), so now I'm leaning towards going back to using the `Metrics` field again.
Going back to using the `Metrics` field sounds like a good call.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Where is `meta` being acquired from the pool?
Within `generateMeta` of `tableReader`.
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Going back to using the `Metrics` field sounds like a good call.
Ok, I restored that version. Please take a - hopefully last - look.
Reviewed 3 of 12 files at r4, 9 of 9 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @yuzefovich)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, yuzefovich wrote…
Within `generateMeta` of `tableReader`.
But then you'd also want to acquire and release it in inbound streams and outboxes in the case of distributed flows, right?
pkg/sql/distsqlpb/data.go, line 226 at r5 (raw file):
// Release is part of Releasable interface. Note that we do not explicitly
// reset the values of the fields since those will be correctly assigned during
I think it's convention to reset them though (e.g. table reader specs)
pkg/sql/distsqlrun/tablereader.go, line 303 at r5 (raw file):
} tr.rowsRead++
I would add a comment that `rowsRead` is counted twice when tracing (I think): once in the `InputStats`, then again here.
Force-pushed from 3a0a19c to 69e47c3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
But then you'd also want to acquire and release it in inbound streams and outboxes in the case of distributed flows, right?
Hm, nice catch. You're right - as it was before, I think I would only release the meta to the pool on the gateway node, but I'd be getting it from the pool on all nodes. I added another acquisition/release in the stream encoder/decoder, which should (?) be sufficient.
The change seems a little dirty in `RemoteProducerMetaToLocalMeta`, since we want to allocate a new `ProducerMetadata` object in all cases except for the `Metrics` one. Let me know if you see a cleaner way to go about it (I could extract a separate type check for metrics to put before the type switch, but that seems inefficient and not cleaner).
Let's see what the build says about that change.
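To illustrate the asymmetry being described, here is a rough sketch of the shape `RemoteProducerMetaToLocalMeta` could take: only the metrics payload is served from the pool (because the gateway releases it after accumulating the counts), while every other payload keeps allocating a fresh `ProducerMetadata`. The accessor and helper names are assumptions for illustration only.

```go
// Sketch only; not the PR's exact code.
func remoteProducerMetaToLocalMeta(remote RemoteProducerMetadata) ProducerMetadata {
	var meta ProducerMetadata
	// The Metrics case is the one served from a pool: the object lives
	// until the gateway has added BytesRead/RowsRead to its totals and
	// calls Release on it.
	if m := remote.GetMetrics(); m != nil { // hypothetical accessor
		meta.Metrics = GetMetricsMetadata()
		meta.Metrics.BytesRead = m.BytesRead
		meta.Metrics.RowsRead = m.RowsRead
		return meta
	}
	// All other metadata variants are converted as before, allocating a
	// new ProducerMetadata without any pooling.
	// ...
	return meta
}
```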
pkg/sql/distsqlpb/data.go, line 226 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think it's convention to reset them though (e.g. table reader specs)
Ok, done.
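The convention referred to here (zero the fields before putting the object back) would change the earlier `Release` sketch to something like the following; again illustrative rather than the verbatim PR code:

```go
// Release resets the object and returns it to the pool, so a later Get never
// observes stale BytesRead/RowsRead values (same convention as the table
// reader specs).
func (m *MetricsMetadata) Release() {
	*m = MetricsMetadata{}
	metricsMetadataPool.Put(m)
}
```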
pkg/sql/distsqlrun/tablereader.go, line 303 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I would add a comment that `rowsRead` is counted twice when tracing (I think): once in the `InputStats`, then again here.
Done.
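The requested comment presumably reads along these lines (paraphrased, not the literal PR text), next to the unconditional counter in the tableReader's row loop:

```go
// Paraphrase of the requested comment; not the literal PR text.
//
// rowsRead is incremented for every row regardless of tracing. When tracing
// is enabled, the rows are also counted in InputStats, so a traced query
// effectively records the count in both places.
tr.rowsRead++
```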
Force-pushed from bac349d to d374e87.
LGTM, minus consolidation of metrics release into meta release.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @asubiotto, and @yuzefovich)
pkg/sql/distsql_running.go, line 529 at r6 (raw file):
r.bytesRead += meta.Metrics.BytesRead
r.rowsRead += meta.Metrics.RowsRead
meta.Metrics.Release()
It might make sense to do something similar to `ReleaseSetupFlowRequest`, where that takes care of checking and releasing fields instead of having to remember to do so.
We propagate metrics about bytes and rows read unconditionally by emitting appropriate metadata when table readers are drained.

Release note: None
As discussed with Alfonso, the two objects have different lifetimes, so it is easier to keep two […]

I kicked off another testing run, and the numbers look good (by mistake, both clusters were on GCP):

It seems like there is no noticeable performance hit (my explanation is that this commit improves things by pooling some allocations and incurs a performance hit due to extra bookkeeping and metadata propagation, which roughly cancel each other out).
Thanks for the reviews!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 529 at r6 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
It might make sense to do something similar to `ReleaseSetupFlowRequest`, where that takes care of checking and releasing fields instead of having to remember to do so.
It seems to me that because of the different lifetimes, it'd be more complicated to do something similar. I'll look into doing some refactoring around the usage of `ProducerMetadata` this Friday, since it seems like we could benefit from pooling those, so I'll merge this as is for now.
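For reference, the consolidation suggested above (and deferred here) would look roughly like the following, in the spirit of `ReleaseSetupFlowRequest`: a single helper checks for pooled fields and releases them, so callers don't have to remember to. The names are assumptions; this is not code from the PR.

```go
// Sketch of the deferred consolidation: releasing the outer metadata takes
// care of any pooled fields it carries.
func ReleaseProducerMetadata(meta *ProducerMetadata) {
	if meta.Metrics != nil {
		meta.Metrics.Release()
		meta.Metrics = nil
	}
	// If ProducerMetadata itself were pooled, it would be returned to its
	// pool here as well.
}
```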
38002: sql: track bytes and rows read unconditionally for every statement r=yuzefovich a=yuzefovich

In order to propagate metrics about bytes and rows read we use the existing tracing infrastructure, but, instead of using recording spans, we use recordable ones. All processors get a recordable span, but only table readers propagate their stats when full tracing is disabled.

Addresses: #37409.

In order to see the performance hit, I ran `kv` workload with read only queries (I guess it should have the largest hit) and saw a decrease of around 3%. Here is a representative output of 30 seconds of load:

```
With stats:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   30.0s        0         128368         4278.2      0.2      0.3      0.4      0.6     10.0

Without stats:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   30.0s        0         131946         4398.1      0.2      0.3      0.4      0.6     10.0
```

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Build succeeded
In order to propagate metrics about bytes and rows read we use
the existing tracing infrastructure, but, instead of using recording
spans, we use recordable ones. All processors get a recordable span,
but only table readers propagate their stats when full tracing is
disabled.
Addresses: #37409.
In order to see the performance hit, I ran the `kv` workload with read only queries (I guess it should have the largest hit) and saw a decrease of around 3%. Here is a representative output of 30 seconds of load:

Release note: None