Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: show latency with server #49450

Closed
RaduBerinde opened this issue May 22, 2020 · 6 comments · Fixed by #52233
Closed

cli: show latency with server #49450

RaduBerinde opened this issue May 22, 2020 · 6 comments · Fixed by #52233
Assignees
Labels
A-sql-ui Why is my query slow? C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@RaduBerinde
Copy link
Member

RaduBerinde commented May 22, 2020

We recently had some confusion related to measuring latency of queries from the interactive sql client. The client shows the end-to-end latency as measured from the client, so it includes any latency to the server. This is not obvious and/or one might not be aware that the latency to the server is significant.

It would help a lot if at the startup of the interactive client, we would print out the latency to the server (this could be obtained by using the latency of the query it runs already to get the current version).

@RaduBerinde RaduBerinde added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-cli labels May 22, 2020
@RaduBerinde
Copy link
Member Author

@a-entin points out that we may want to somehow show server-side latency of each query (or show both). Even show the time-to-first-result-row separately from the time-to-last-result-row.

I think these are great ideas, but we need to figure out the best way to get (or estimate) this information. We can avoid clutter by only showing the timing breakdown when these times are significantly different (e.g. big client-server latency).

@a-entin
Copy link

a-entin commented May 23, 2020

The doc says re time reporting in the built-in interactive SQL client... "Reveal the time a query takes to complete." This leaves much to interpretation. Through tests and observations, like timing of select 1, it's clear the "time" includes at least one network roundtrip, perhaps the time to spool the entire result set. This is a significant problem, b/c:

  • We are making ourselves look worse (slower) that we really are. Anti-marketing. We shoot ourselves in the foot with what we report in the built-in SQL client.
  • Above issue is amplified against CCloud cluster, when users [who maybe tuning/testing performance] are connected via low bandwidth public Internet. This does not help to advance CCloud offering.
  • It seems WebUI reports server side execution time, which is likely to materially disagree with time reported by the interactive sql client, thus creating a user confusion
  • The savvy competitors are not doing that (example? - memsql reports server side exec number only); we are giving competitors a gift

[Extreme yet real world] Example:

[email protected]:26257/uuiddb> select 1;
  ?column?
------------
         1
(1 row)
Time: 130.7164ms

Suggested behavior:
Report 2 numbers - the server side time before the first record of the result set is emitted to the client and the total time including the time to spool the complete result set. That would be informative and boosting attractiveness of operating in CC.

Example:
Vertica is a relevant example because it's PG derivative, originally using unmodified pqlib. While using the PG wire protocol, Vertica always reported execution time and total time to transmit the results set to the client separately:
Time: First fetch (1 row): 52.728 ms. All rows formatted: 52.881 ms

@awoods187 awoods187 added the A-sql-ui Why is my query slow? label Jun 8, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 16, 2020
Previously, we did not show client-server latency for interactive sql
connections. Now, a line is printed at the top when an interactive sql
session begins which prints the client-server latency. Latency is
measured by the time it takes the "get cluster version" query to return.

Addresses cockroachdb#49450

Release note (cli change): cli now prints the client-server latency on
startup.
@arulajmani
Copy link
Collaborator

Dug into this a bit, and here's my preliminary analysis:

Currently, the time that we display is the end to end time it takes the client to send and receive back the result of a query. This means processing time on the server + the roundtrip. However, this does not include the time to render query results, which we track separately and alert on if the time it took to render is particularly high (higher than 1 second).

So in some sense, we are doing what Vertica is doing, except we aren't printing the time it takes to format rows when the time is trivially small.

The easiest fix involves printing the "client server latency" on startup, and measuring this by timing the query that gets the cluster version, as @RaduBerinde mentioned above. This may not be completely accurate, as this would include the timing for execution and planning of the query, but might suffice for all practical purposes -- we are timing a query that is relatively inexpensive to plan/execute, so the client/server latency will dominate. I've made a PR with this easy fix, which is linked above.

On the other hand, we do store a bunch of statistics like service latency, execution latency, parsing latency, planning latency etc. on the server side in StatsCollecter. So it is possible to calculate the true client-server latency for a particular query there. That being said, I don't think it's possible to stream such information over the PG wire protocol to display in an interactive SQL shell. Please let me know if my understanding here is incorrect.

@a-entin
Copy link

a-entin commented Jun 16, 2020

Have some concerns re printing the "client server latency" on startup, as a solution:

  1. May not always be possible/easy to calculate. Could increase a chance of confusion in case of large network latency and - particularly - large result set. Isn't it true that it may take several roundtrips to return the results set? Unclear how a user would do the calculations.
  2. It's easy to overlook 1 line in the beginning of a session. Imagine a user in a long running session trying different things... Chances are, a new prospect/user wouldn't right down the number report in line one, not it's obvious that number would have to be subtracted.
  3. Perhaps a user could figure out the "true" query execution time by doing time arithmetic with "client server latency" (per above, it may not always be possible)... wouldn't we rather promote a good first impression with best possible number clearly reported vs. obscuring the number in a "puzzle"? I think the product should present itself (yes, self market) in the best possible light and message that in the boldest, easiest to consume, most impactful way.

The first number, i.e. server side query execution should only include the time from beginning of the execution to the point the first record in the result set is emitted from the engine. No need to include even the time to buffer the result set on the server side.

Note that the time reported in interactive SQL should closely correlate to the time report in Web UI

@arulajmani
Copy link
Collaborator

Copying @knz 's comment on the PR linked above here:

Looked into this into the context of the original issue.
I think in the commit message / PR description you can't yet write "Addresses" but instead "Informs"

(Also, nit: github has magic words to auto-close a linked issue; the words include "Fixes" or "Closes" but not "Addresses". You recognize a magic words when it gets underlined on the web display.)

Now at a higher level I think you're not being ambitious enough in this PR. The original issue makes several points which I think you can achieve here:

display the time to receive the 1st row of result separately from the time to receive the last row, and possibly separately to the time to finish the display. The client has all 3 timestamps and we could enhance our time display to inform the user.

regarding server-side execution latency. This information could be stored server-side, and retrieved by a statement. Look into the implementation of SHOW SYNTAX and SHOW TRANSACTION STATUS (the "observer statements"). You could add another one which observes the execution time of the last executed statement before it. Then you could modify the shell to issue that new observer statement immediately after a statement execution completes. Then we'd get exactly what Alex is requesting, and you wouldn't even need to report the client-server roundtrip latency at the beginning of the session.

@arulajmani
Copy link
Collaborator

I like the idea of adding a new observer statement SHOW LAST QUERY STATS that can be issued by an interactive SQL shell to get a breakdown of times that the server already maintains for the last query.

This would allow us to display statistics such as the service time, execution time, planning time etc. (everything that we currently save in the StatsCollector). In the future, this can be expanded upon as/when we change how we bucket times on the server side.

What does this mean in the context of this issue?
In addition to timing every query on the client like the CLI does today, we will also issue the SHOW LAST QUERY STATS command underneath the hood. This will allow us to display the amount of time the server took to service the query in addition to the total time that we show today.
This also leaves scope to display fine grain timings in the future if the use-case arises (possibly behind some sort of settings flag).

All that being said, this will add a new user facing statement and there are considerations with that. I would love to get some additional thoughts on this from folks/if this is worth pursuing.

cc @awoods187 @solongordon @knz

arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 23, 2020
Adds a new observer statement `SHOW LAST QUERY STATISTICS` which
returns service, latency, planning, and parsing timigns for the last
executed statement. By exposing these timings this PR builds a bridge
towards providing more in-depth metrics to improve timings displayed by
the CLI.

Note that this new statement should probably not be exposed to users,
and should only be used for internal use (and therefore I'm leaving
the release note blank here). This is because the CLI issues some
"under the hood" queries (eg. retriving the current db so that it can
be displayed). If a user would use this statement without insight into
these "under the hood" queries it could prove to be confusing.

Informs cockroachdb#49450

Release note: none
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 29, 2020
Adds a new observer statement `SHOW LAST QUERY STATISTICS` which
returns service, latency, planning, and parsing timigns for the last
executed statement. By exposing these timings this PR builds a bridge
towards providing more in-depth metrics to improve timings displayed by
the CLI.

Note that this new statement should probably not be exposed to users,
and should only be used for internal use (and therefore I'm leaving
the release note blank here). This is because the CLI issues some
"under the hood" queries (eg. retriving the current db so that it can
be displayed). If a user would use this statement without insight into
these "under the hood" queries it could prove to be confusing.

Informs cockroachdb#49450

Release note: none
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 7, 2020
Adds a new observer statement `SHOW LAST QUERY STATISTICS` which
returns service, latency, planning, and parsing timigns for the last
executed statement. By exposing these timings this PR builds a bridge
towards providing more in-depth metrics to improve timings displayed by
the CLI.

Note that this new statement should probably not be exposed to users,
and should only be used for internal use (and therefore I'm leaving
the release note blank here). This is because the CLI issues some
"under the hood" queries (eg. retriving the current db so that it can
be displayed). If a user would use this statement without insight into
these "under the hood" queries it could prove to be confusing.

Informs cockroachdb#49450

Release note: none
craig bot pushed a commit that referenced this issue Jul 7, 2020
50575: sql: add a new observer statement to show last query statistics r=solongordon a=arulajmani

Adds a new observer statement `SHOW LAST QUERY STATISTICS` which
returns service, latency, planning, and parsing timigns for the last
executed statement. By exposing these timings this PR builds a bridge
towards providing more in-depth metrics to improve timings displayed by
the CLI.

Note that this new statement should probably not be exposed to users,
and should only be used for internal use (and therefore I'm leaving
the release note blank here). This is because the CLI issues some
"under the hood" queries (eg. retriving the current db so that it can
be displayed). If a user would use this statement without insight into
these "under the hood" queries it could prove to be confusing.

Informs #49450

Release note: none

Co-authored-by: arulajmani <[email protected]>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 3, 2020
Previously, when a user executed multiple statements on a single line
in the CLI, we would send the string as-is to the server. This would
skew statement execution timings by attributing all of the server
execution time to the first statement.

This patch fixes this issue by parsing the supplied string on the
client side and sending invididual statements one at a time. This is
in line with Postgres behavior.

Additionally, this patch also changes the way we calculate the time
displayed by the CLI. Previously, we would show the time it took
between a client sending a request and recieving a result. This would
include the client-server latency as well. Now instead of showing the
total time, we switch to showing just the server execution time. This
is done by issuing the `SHOW LAST QUERY STATISTICS` query underneath
the hood.

Fixes cockroachdb#48180
Informs cockroachdb#49450

Release note (cli change): The way the CLI now shows the server
execution timing instead of the total time it took to recieve results.
This means factors such as rendering time and client server latency
are no longer accounted for in the timings that the CLI prints for a
query execution. Additionally, if multiple statements are supplied on
a single line, the timings displayed now are a more accurate
representation than before. Earlier the time displayed for the first
statement included the processing time for all the statements supplied
in that line. This is no longer the case as individual statement
timings now represent only their execution times. This brings us in line
with the postgres CLI.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 5, 2020
Previously, when a user executed multiple statements on a single line
in the CLI, we would send the string as-is to the server. This would
skew statement execution timings by attributing all of the server
execution time to the first statement.

This patch fixes this issue by parsing the supplied string on the
client side and sending invididual statements one at a time. This is
in line with Postgres behavior.

Additionally, this patch also changes the way we calculate the time
displayed by the CLI. Previously, we would show the time it took
between a client sending a request and recieving a result. This would
include the client-server latency as well. Now instead of showing the
total time, we switch to showing just the server execution time. This
is done by issuing the `SHOW LAST QUERY STATISTICS` query underneath
the hood.

Fixes cockroachdb#48180
Informs cockroachdb#49450

Release note (cli change): The way the CLI now shows the server
execution timing instead of the total time it took to recieve results.
This means factors such as rendering time and client server latency
are no longer accounted for in the timings that the CLI prints for a
query execution. Additionally, if multiple statements are supplied on
a single line, the timings displayed now are a more accurate
representation than before. Earlier the time displayed for the first
statement included the processing time for all the statements supplied
in that line. This is no longer the case as individual statement
timings now represent only their execution times. This brings us in line
with the postgres CLI.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 10, 2020
Previously, when a user executed multiple statements on a single line
in the CLI, we would send the string as-is to the server. This would
skew statement execution timings by attributing all of the server
execution time to the first statement.

This patch fixes this issue by parsing the supplied string on the
client side and sending invididual statements one at a time. This is
in line with Postgres behavior.

Additionally, this patch also changes the way we calculate the time
displayed by the CLI. Previously, we would show the time it took
between a client sending a request and recieving a result. This would
include the client-server latency as well. Now instead of showing the
total time, we switch to showing just the server execution time. This
is done by issuing the `SHOW LAST QUERY STATISTICS` query underneath
the hood.

Fixes cockroachdb#48180
Informs cockroachdb#49450

Release note (cli change): The way the CLI now shows the server
execution timing instead of the total time it took to recieve results.
This means factors such as rendering time and client server latency
are no longer accounted for in the timings that the CLI prints for a
query execution. Additionally, if multiple statements are supplied on
a single line, the timings displayed now are a more accurate
representation than before. Earlier the time displayed for the first
statement included the processing time for all the statements supplied
in that line. This is no longer the case as individual statement
timings now represent only their execution times. This brings us in line
with the postgres CLI.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Sep 3, 2020
Previously, the CLI displayed timings by timing a query on the client.
This would include both the client-server latency and the query
execution time, without bucketing which is what. This patch changes
this by switching the way we calculate timings in the CLI. Instead of
timing on the client, we now send a `SHOW LAST QUERY STATISTICS` query
after a statement is executed. This allows us to display both the
network latency and the network latency as separate values.

Additionally, this patch no longer displays timings for queries that
include multiple statements on one line. This is to mitigate cockroachdb#48180,
where timings for executing all the statements are incorrectly
attributed to the first statement. Actually fixing the issue is beyond
the scope of this PR.

Fixes cockroachdb#49450

Release note (cli change): The CLI no longer prints a blanket `Time`
for queries. Instead, if `show_times` is turned on and the server
version is >=20.2, the CLI prints two separate times -- the server
execution time and the network latency.

Release justification: low risk, high benefit change to existing
functionality.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Sep 3, 2020
Previously, the CLI displayed timings by timing a query on the client.
This would include both the client-server latency and the query
execution time, without bucketing which is what. This patch changes
this by switching the way we calculate timings in the CLI. Instead of
timing on the client, we now send a `SHOW LAST QUERY STATISTICS` query
after a statement is executed. This allows us to display both the
network latency and the network latency as separate values.

Additionally, this patch no longer displays timings for queries that
include multiple statements on one line. This is to mitigate cockroachdb#48180,
where timings for executing all the statements are incorrectly
attributed to the first statement. Actually fixing the issue is beyond
the scope of this PR.

Fixes cockroachdb#49450

Release note (cli change): The CLI no longer prints a blanket `Time`
for queries. Instead, if `show_times` is turned on and the server
version is >=20.2, the CLI prints two separate times -- the server
execution time and the network latency.

Release justification: low risk, high benefit change to existing
functionality.
craig bot pushed a commit that referenced this issue Sep 3, 2020
52233: cli: improve interactive sql shell timings r=knz a=arulajmani

Previously, the CLI displayed timings by timing a query on the client.
This would include both the client-server latency and the query
execution time, without bucketing which is what. This patch changes
this by switching the way we calculate timings in the CLI. Instead of
timing on the client, we now send a `SHOW LAST QUERY STATISTICS` query
after a statement is executed. This allows us to display both the
network latency and the network latency as separate values.

Additionally, this patch no longer displays timings for queries that
include multiple statements on one line. This is to mitigate #48180,
where timings for executing all the statements are incorrectly
attributed to the first statement. Actually fixing the issue is beyond
the scope of this PR.

Fixes #49450

Release note (cli change): The CLI no longer prints a blanket `Time`
for queries. Instead, if `show_times` is turned on and the server
version is >=20.2, the CLI prints two separate times -- the server
execution time and the network latency.

Release justification: low risk, high benefit change to existing
functionality.

Co-authored-by: arulajmani <[email protected]>
@craig craig bot closed this as completed in 95d5819 Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-ui Why is my query slow? C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants