sql: track bytes and rows read unconditionally for every statement #38002
Conversation
At the moment, these metrics are only exposed via storing them in […]. I might not be fully understanding what I'm doing here, so please take a very critical look. Also cc @jordanlewis @andreimatei if you have time to take a look as well.
Force-pushed from 067aa07 to 0ba41e7.
A 3% hit is sort of a bummer; I'm interested in digging more into the perf implications.
name old ops/s new ops/s delta
KV99-throughput 42.1k ± 3% 40.3k ± 3% -4.17% (p=0.032 n=5+5)
name old ms/s new ms/s delta
KV99-P50 8.10 ± 0% 8.60 ± 3% +6.17% (p=0.016 n=4+5)
KV99-Avg 2.64 ± 6% 2.74 ± 6% ~ (p=0.333 n=5+5)
This is with longer runs on your later revision. We're going to need to look at the source of inefficiency. Hopefully we can pre-allocate or pool some objects to mitigate the cost.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @asubiotto)
This looks good to me minus the performance cost. There are several things you could try out:
- In the tableReader, tags are set on the span. This is unnecessary and something we can avoid, although we don't go into `getSpans`, which would be the more expensive part.
- There is no need to create any spans; that is probably expensive as well. I would just create a `RecordedSpan` with the `Stats` field set when the `tableReader` returns metadata (see the sketch below).
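To make the suggested alternative concrete, here is a rough Go sketch of what it could look like: marshal the stats proto and hand it to the consumer as the `Stats` field of a `RecordedSpan` in `TraceData`. It follows the pseudocode in this comment; the field names, the `bytesRead`/`rowsRead` variables, and the package placement are assumptions, not the code that was eventually merged.

```go
// Sketch only, not the merged code: attach the table reader's stats to a
// RecordedSpan instead of introducing a dedicated Metrics field.
stats, err := types.MarshalAny(&TableReaderStats{ // gogo/protobuf's MarshalAny
	BytesRead: bytesRead,
	RowsRead:  rowsRead, // field names taken from this comment
})
if err != nil {
	// Error handling TBD; the existing tracing code panics in the
	// comparable situation (see the tracer_span.go discussion below).
}
meta := distsqlpb.ProducerMetadata{
	TraceData: []tracing.RecordedSpan{{
		// Note: as discussed later in this thread, SpanID must not be left
		// at its zero value, since 0 is the root span's ID.
		Stats: stats,
	}},
}
// meta would then be emitted as trailing metadata when the tableReader drains.
```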
Reviewed 11 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/distsql_running.go, line 542 at r1 (raw file):
r.rowsRead += trs.InputStats.NumRows
} else {
	// TODO(yuzefovich): what should we do here?
I think this is fine
pkg/util/tracing/tracer_span.go, line 250 at r1 (raw file):
stats, err := types.MarshalAny(s.mu.stats)
if err != nil {
	// TODO(yuzefovich): what should we do here?
`getSpans` panics as well.
Force-pushed from b17b0dc to 34974ef.
Force-pushed from 4de3b7a to ec8d4e2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 542 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think this is fine
Removed the changes.
pkg/sql/distsqlpb/data.go, line 216 at r2 (raw file):
Previously, ajwerner wrote…
My preference would be for this pool not to be exported and to instead have strongly typed functions.
Done.
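For context, the pattern being asked for is the usual Go one: keep the `sync.Pool` itself unexported and expose strongly typed get/release helpers. A minimal self-contained sketch (the type and function names are illustrative, not necessarily the ones in the PR):

```go
package distsqlpb

import "sync"

// MetricsMetadata stands in for the pooled metadata payload; in the PR it is
// a generated protobuf message.
type MetricsMetadata struct {
	BytesRead int64
	RowsRead  int64
}

// The pool stays unexported; callers only see the typed helpers below.
var metricsMetadataPool = sync.Pool{
	New: func() interface{} { return &MetricsMetadata{} },
}

// GetMetricsMetadata returns a MetricsMetadata object from the pool.
func GetMetricsMetadata() *MetricsMetadata {
	return metricsMetadataPool.Get().(*MetricsMetadata)
}

// Release returns the object to the pool.
func (m *MetricsMetadata) Release() {
	metricsMetadataPool.Put(m)
}
```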
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is it necessary to add a `Metrics` field to metadata instead of just doing something like `distsqlpb.ProducerMetadata{TraceData: []RecordedSpan{Stats: &TableReaderStats{BytesRead: bytes, RowsRead: rows}}}`? We can use a `RecordedSpan` pool. In any case, it would be good to make sure that these bytes are also added to the `EXPLAIN ANALYZE` plan in case of tracing.
The reasoning for adding the `Metrics` field was an attempt at removing one marshaling/unmarshaling conversion, but the benchmarks do not show significant changes from that, so I removed it as we discussed.
Bytes read do show up in the output of `EXPLAIN ANALYZE`, since they are added to `TableReaderStats` in `outputStatsToTrace`.
pkg/sql/distsqlrun/tablereader.go, line 377 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why bother with a `metricsCollector`? Since this is unconditional, we could just have the `tableReader` keep track of the number of rows read.
Indeed, I feel so silly :/
pkg/util/tracing/tracer_span.go, line 250 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
`getSpans` panics as well.
Removed the changes.
Force-pushed from ca84892 to ca4728d.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, yuzefovich wrote…
The reasoning for adding the `Metrics` field was an attempt at removing one marshaling/unmarshaling conversion, but the benchmarks do not show significant changes from that, so I removed it as we discussed. Bytes read do show up in the output of `EXPLAIN ANALYZE`, since they are added to `TableReaderStats` in `outputStatsToTrace`.
My initial change was incomplete - the tests were failing due to duplicate spans. My understanding is that when we leave the `SpanID` field of `RecordedSpan` uninitialized, it has the value of 0, which belongs to the root span and breaks some assumptions. I think I have now fixed the issue, but rerunning the benchmarks shows a slight performance decrease (I believe due to marshaling/unmarshaling, as I was thinking earlier). Here are the numbers on `kv99`, run 4 times for 4 minutes with a 10 second ramp ("old" is using the `Metrics` field and "new" is this PR at the moment):
gcp
name old ops/s new ops/s delta
KV99-throughput 17.8k ± 2% 17.0k ± 3% -4.39% (p=0.029 n=4+4)
name old ms/s new ms/s delta
KV99-P50 19.7 ± 7% 20.7 ± 4% ~ (p=0.257 n=4+4)
KV99-Avg 6.00 ± 0% 6.38 ± 4% +6.25% (p=0.029 n=4+4)
aws
name old ops/s new ops/s delta
KV99-throughput 29.8k ± 1% 29.1k ± 3% ~ (p=0.200 n=4+4)
name old ms/s new ms/s delta
KV99-P50 11.8 ± 3% 11.8 ± 7% ~ (p=1.000 n=4+4)
KV99-Avg 3.70 ± 0% 3.80 ± 0% +2.70% (p=0.029 n=4+4)
The numbers look quite noisy, but it seems that there is a 2-3% performance hit (in addition to about a 1% performance hit on the `Metrics` binary), so now I'm leaning towards going back to using the `Metrics` field again.
Reviewed 1 of 12 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @asubiotto, and @yuzefovich)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
} } meta.Release()
Where is `meta` being acquired from the pool?
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, yuzefovich wrote…
My initial change was incomplete - the tests were failing due to duplicate spans. My understanding is that when we leave the `SpanID` field of `RecordedSpan` uninitialized, it has the value of 0, which belongs to the root span and breaks some assumptions. I think I have now fixed the issue, but rerunning the benchmarks shows a slight performance decrease (I believe due to marshaling/unmarshaling, as I was thinking earlier). Here are the numbers on `kv99`, run 4 times for 4 minutes with a 10 second ramp ("old" is using the `Metrics` field and "new" is this PR at the moment): [gcp and aws numbers quoted above]
The numbers look quite noisy, but it seems that there is a 2-3% performance hit (in addition to about a 1% performance hit on the `Metrics` binary), so now I'm leaning towards going back to using the `Metrics` field again.
Going back to using the `Metrics` field sounds like a good call.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Where is `meta` being acquired from the pool?
Within `generateMeta` of `tableReader`.
pkg/sql/distsqlrun/tablereader.go, line 363 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Going back to using the `Metrics` field sounds like a good call.
Ok, I restored that version. Please take a - hopefully last - look.
Reviewed 3 of 12 files at r4, 9 of 9 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @yuzefovich)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, yuzefovich wrote…
Within `generateMeta` of `tableReader`.
But then you'd also want to acquire and release it in inbound streams and outboxes in the case of distributed flows, right?
pkg/sql/distsqlpb/data.go, line 226 at r5 (raw file):
// Release is part of Releasable interface. Note that we do not explicitly
// reset the values of the fields since those will be correctly assigned during
I think it's convention to reset them though (e.g. table reader specs)
pkg/sql/distsqlrun/tablereader.go, line 303 at r5 (raw file):
} tr.rowsRead++
I would add a comment that `rowsRead` is counted twice when tracing (I think): once in the `InputStats`, then again here.
Force-pushed from 3a0a19c to 69e47c3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 621 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
But then you'd also want to acquire and release it in inbound streams and outboxes in the case of distributed flows, right?
Hm, nice catch. You're right - as it was before, I think I would only release the meta to the pool on the gateway node, but I'd be getting it from the pool on all nodes. I added another acquisition/release in the stream encoder/decoder, which should (?) be sufficient.
The change seems a little dirty in `RemoteProducerMetaToLocalMeta`, since we want to allocate a new `ProducerMetadata` object in all cases except for the `Metrics` one. Let me know if you see a cleaner way to go about it (I could extract a separate type check for metrics to put before the type switch, but that seems inefficient and not cleaner).
Let's see what the build says about that change.
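To illustrate the asymmetry being described, here is a rough sketch of the shape `RemoteProducerMetaToLocalMeta` could take: only the metrics payload is served from the pool (because the gateway releases it after accumulating the counts), while every other payload keeps allocating a fresh `ProducerMetadata`. The accessor and helper names are assumptions for illustration only.

```go
// Sketch only; not the PR's exact code.
func remoteProducerMetaToLocalMeta(remote RemoteProducerMetadata) ProducerMetadata {
	var meta ProducerMetadata
	// The Metrics case is the one served from a pool: the object lives
	// until the gateway has added BytesRead/RowsRead to its totals and
	// calls Release on it.
	if m := remote.GetMetrics(); m != nil { // hypothetical accessor
		meta.Metrics = GetMetricsMetadata()
		meta.Metrics.BytesRead = m.BytesRead
		meta.Metrics.RowsRead = m.RowsRead
		return meta
	}
	// All other metadata variants are converted as before, allocating a
	// new ProducerMetadata without any pooling.
	// ...
	return meta
}
```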
pkg/sql/distsqlpb/data.go, line 226 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think it's convention to reset them though (e.g. table reader specs)
Ok, done.
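The convention referred to here (zero the fields before putting the object back) would change the earlier `Release` sketch to something like the following; again illustrative rather than the verbatim PR code:

```go
// Release resets the object and returns it to the pool, so a later Get never
// observes stale BytesRead/RowsRead values (same convention as the table
// reader specs).
func (m *MetricsMetadata) Release() {
	*m = MetricsMetadata{}
	metricsMetadataPool.Put(m)
}
```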
pkg/sql/distsqlrun/tablereader.go, line 303 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I would add a comment that `rowsRead` is counted twice when tracing (I think): once in the `InputStats`, then again here.
Done.
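The requested comment presumably reads along these lines (paraphrased, not the literal PR text), next to the unconditional counter in the tableReader's row loop:

```go
// Paraphrase of the requested comment; not the literal PR text.
//
// rowsRead is incremented for every row regardless of tracing. When tracing
// is enabled, the rows are also counted in InputStats, so a traced query
// effectively records the count in both places.
tr.rowsRead++
```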
Force-pushed from bac349d to d374e87.
LGTM, minus consolidation of metrics release into meta release.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @asubiotto, and @yuzefovich)
pkg/sql/distsql_running.go, line 529 at r6 (raw file):
r.bytesRead += meta.Metrics.BytesRead
r.rowsRead += meta.Metrics.RowsRead
meta.Metrics.Release()
It might make sense to do something similar to `ReleaseSetupFlowRequest`, where that takes care of checking and releasing fields instead of having to remember to do so.
We propagate metrics about bytes and rows read unconditionally by emitting appropriate metadata when table readers are drained.

Release note: None
As discussed with Alfonso, the two objects have different lifetimes, so it is easier to keep two […]

I kicked off another testing run, and the numbers look good (by mistake, both clusters were on GCP):

It seems like there is no noticeable performance hit (my explanation is that this commit improves things by pooling some allocations and incurs a performance hit due to extra bookkeeping and metadata propagation, which roughly cancel each other out).
Thanks for the reviews!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @asubiotto)
pkg/sql/distsql_running.go, line 529 at r6 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
It might make sense to do something similar to `ReleaseSetupFlowRequest`, where that takes care of checking and releasing fields instead of having to remember to do so.
It seems to me that because of the different lifetimes, it'd be more complicated to do something similar. I'll look into doing some refactoring around the usage of `ProducerMetadata` this Friday, since it seems like we could benefit from pooling those, so I'll merge this as is for now.
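For reference, the consolidation suggested above (and deferred here) would look roughly like the following, in the spirit of `ReleaseSetupFlowRequest`: a single helper checks for pooled fields and releases them, so callers don't have to remember to. The names are assumptions; this is not code from the PR.

```go
// Sketch of the deferred consolidation: releasing the outer metadata takes
// care of any pooled fields it carries.
func ReleaseProducerMetadata(meta *ProducerMetadata) {
	if meta.Metrics != nil {
		meta.Metrics.Release()
		meta.Metrics = nil
	}
	// If ProducerMetadata itself were pooled, it would be returned to its
	// pool here as well.
}
```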
38002: sql: track bytes and rows read unconditionally for every statement r=yuzefovich a=yuzefovich

In order to propagate metrics about bytes and rows read we use the existing tracing infrastructure, but, instead of using recording spans, we use recordable ones. All processors get a recordable span, but only table readers propagate their stats when full tracing is disabled.

Addresses: #37409.

In order to see the performance hit, I ran `kv` workload with read only queries (I guess it should have the largest hit) and saw a decrease of around 3%. Here is a representative output of 30 seconds of load:

```
With stats:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   30.0s        0         128368         4278.2      0.2      0.3      0.4      0.6     10.0

Without stats:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   30.0s        0         131946         4398.1      0.2      0.3      0.4      0.6     10.0
```

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Build succeeded
In order to propagate metrics about bytes and rows read we use
the existing tracing infrastructure, but, instead of using recording
spans, we use recordable ones. All processors get a recordable span,
but only table readers propagate their stats when full tracing is
disabled.
Addresses: #37409.
In order to see the performance hit, I ran the `kv` workload with read only queries (I guess it should have the largest hit) and saw a decrease of around 3%. Here is a representative output of 30 seconds of load:

Release note: None