sql: fix inflated "overhead" in statement timings #53846

solongordon · 2020-09-02T18:37:50Z

Release justification: Low-risk, high-reward fix to existing
functionality.

Release note (admin ui change): In some cases, the Execution Stats page
would show a confusingly high Overhead latency for a statement. This
could happen due to multiple statements being parsed together or due to
statement execution being retried. To avoid this, we stop considering
the time between when parsing ends and execution begins when determining
service latency.

Fixes cockroachdb#40675
Fixes cockroachdb#50108

arulajmani

I'm not completely in favour of changing the service latency definition to remove time spent retrying a statement -- this can often be large, and it seems wrong to swallow it here. Additionally, this change would make the overhead bucket always be zero (since overhead bucket = service latency - (parse latency + plan latency + execution latency)), which also seems wrong.

Considering the end goal is to remove confusion around this overhead bucket, I propose:

Let's keep the service latency as is.
Let's start recording the retry latency for statements as well (which is the same as what we record for the implicit transaction comprising the statement).
Change the overhead bucket calculation to be service latency - (parse latency + plan latency + exec latency + retry latency).

The only thing this doesn't consider is the case where there are multiple statements on the same line -- but we don't handle that properly anyway. Once we consider them to be part of the same transaction, the service timings of A;B;C would be no different from BEGIN;A;B;C;COMMIT, so I think it's fine to have slightly weird service timings for multi-statement lines in the meantime.

arulajmani · 2020-09-02T21:33:54Z

I was thinking about this a bit more, and this might be too late for this release, but I wonder if it makes sense to even track service latency at the statement level. Service latency and the overhead bucket should instead be tracked on a per-transaction basis. Instead of having a statement's service latency, we should track the statement's processing latency (parseLat + planLat + execLat).

solongordon

I don't think tracking retry latency really makes sense at the statement level, since we only retry at the transaction level. Consider this (contrived) example:

BEGIN;
SELECT 1;
SELECT crdb_internal.force_retry('5s');
END;

Both statements are going to get retried for 5 seconds. But it's confusing to tell users that SELECT 1 has a high retry latency since it never actually fails.

Ultimately I don't think we should be reporting retries at the statement level. It would make more sense to report how many times the statement fails with a retryable error.
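
A rough Go sketch of that alternative, counting retryable failures per statement instead of attributing retry time to every statement in the transaction. Everything here is an assumption made for illustration: the `attempt` type, the `retryableFailures` function, and the made-up log of three transaction attempts are not taken from the CockroachDB code.

```go
package main

import "fmt"

// attempt is one execution of one statement within a transaction attempt.
// The type is hypothetical and the data in main is made up.
type attempt struct {
	stmt      string
	retryable bool // true if this execution failed with a retryable error
}

// retryableFailures counts, per statement, how many executions failed
// with a retryable error. Statements that never failed map to zero.
func retryableFailures(log []attempt) map[string]int {
	counts := make(map[string]int)
	for _, a := range log {
		n := counts[a.stmt] // zero if the statement has not been seen yet
		if a.retryable {
			n++
		}
		counts[a.stmt] = n
	}
	return counts
}

func main() {
	const sel = "SELECT 1"
	const force = "SELECT crdb_internal.force_retry('5s')"
	// Assume the transaction is attempted three times: the first two
	// attempts fail on the second statement, the third succeeds.
	log := []attempt{
		{sel, false}, {force, true},
		{sel, false}, {force, true},
		{sel, false}, {force, false},
	}
	counts := retryableFailures(log)
	fmt.Println(sel, "failures:", counts[sel])
	fmt.Println(force, "failures:", counts[force])
}
```

Under this scheme SELECT 1 reports zero failures even though it was re-executed on every attempt, which matches the intuition in the comment above.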

I also think it's fine to make Overhead near-zero. (It's not quite zero after this change but I think it should be close.) I'm pretty sure the original point of the Overhead bucket was just to alert us if there was latency which was not being caught by the other buckets. The intention is for it to be very small.

solongordon

Yeah, I agree with your second comment and I think that's basically what this commit does. :) I think the remaining "overhead" would be the time between when planning ends and execution begins.

arulajmani

Yeah, ideally I would've liked service latency to be called something different, so as not to confuse it with the notion of servicing transactions. I'm not sure if that's a realistic ask, so if it isn't, this change is fine by me!

Also, do we need an overhead bucket for transaction metrics?

solongordon · 2020-09-03T14:42:07Z

Yeah I'm definitely open to renaming. Any suggestions? I think it would be pretty easy but maybe I'll open a separate PR for it if it touches a lot of files.

I think adding a concept of overhead to transactions would be more confusing than helpful right now.

arulajmani · 2020-09-03T19:28:44Z

If we do decide to rename, it'll require changes to crdb_internal.node_statement_statistics, and I wasn't sure whether that was something we could do. As for suggestions, maybe something like runLatency? I think somewhere in the code we call this processingLatency as well, but I prefer runLatency. Either way, +1 on the separate PR! :)

solongordon · 2020-09-03T19:33:17Z

OK, merging this and let's revisit the name change. TFTR!

bors r+

craig · 2020-09-03T20:59:58Z

Build succeeded:

GitHub CI (Cockroach)

solongordon requested review from arulajmani and a team September 2, 2020 18:37

arulajmani reviewed Sep 2, 2020

solongordon commented Sep 2, 2020

arulajmani approved these changes Sep 2, 2020

craig bot merged commit fcb30c7 into cockroachdb:master Sep 3, 2020

solongordon mentioned this pull request Sep 8, 2020

ui, sql: statement page mean latency can be negative #45011

Closed

solongordon deleted the overhead-fix branch September 14, 2020 12:00
