-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: improve interactive sql shell timings #52233
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/cli/sql.go, line 1170 at r1 (raw file):
// concatLines. func (c *cliState) doRunStatements(nextState cliStateEnum) cliStateEnum { stmts, err := parser.Parse(c.concatLines)
Nah that won't do unfortunately.
While it is safe to lex a SQL string into tokens, because the lexing rules only very rarely change, we can't make assumptions about the grammar in the CLI shell. Every two versions of the server will support different grammars, and a valid input for one version will not be valid for the CLI shell or vice-versa.
If you are really intent on parsing, then the SHOW SYNTAX statement is the way to go (i.e. do parsing server-side).
pkg/cli/sql_util.go, line 796 at r1 (raw file):
// TODO(arul): This 2 is kind of ugly and unfortunate. execLatencyRaw, hasVal := conn.getServerValueFromColumnIndex( "last query statistics", `SHOW LAST QUERY STATISTICS`, 2 /* exec_latency */)
You also need to make this code tolerant of the absence of support for this (new-ish) statement.
The way to do this is to detect the availability of the statement during connection set-up, e.g. in checkServerMetadata()
and then have a condition here.
ca0d5b2
to
43b4b86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good for another look!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/cli/sql.go, line 1170 at r1 (raw file):
Previously, knz (kena) wrote…
Nah that won't do unfortunately.
While it is safe to lex a SQL string into tokens, because the lexing rules only very rarely change, we can't make assumptions about the grammar in the CLI shell. Every two versions of the server will support different grammars, and a valid input for one version will not be valid for the CLI shell or vice-versa.
If you are really intent on parsing, then the SHOW SYNTAX statement is the way to go (i.e. do parsing server-side).
Thats true, I forgot how much our grammar changes. I don't really need to parse anyway, so I've added a new Scan
method and used that here.
I decided against using the server side parse because checking syntax is a client side configurable setting, and I didn't want to couple timing calculation (old vs this implementation) with the check syntax setting.
pkg/cli/sql_util.go, line 796 at r1 (raw file):
Previously, knz (kena) wrote…
You also need to make this code tolerant of the absence of support for this (new-ish) statement.
The way to do this is to detect the availability of the statement during connection set-up, e.g. incheckServerMetadata()
and then have a condition here.
Added tryEnableServerExecutionTimings
inspired by how we check for SHOW SYNTAX
support. Lmk what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
You'll need to tweak some tests (and add some minimal testing for your new feature)
Reviewed 4 of 4 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/cli/sql.go, line 1346 at r2 (raw file):
// Try to enable server execution timings for the CLI to display if // supported by the server. c.tryEnableServerExecutionTimings()
I recommend doing this check upon every re-connection. Consider the following scenario:
- user starts sh ell
- user stops node
- user restarts node in different version (e.g an upgrade)
- user continues using shell, shell reconnects
we want the shell to adjust to the new server's capabilities. Hence my suggestion to check these things in the connect method, like we already do for the version strings.
pkg/cli/sql.go, line 1551 at r2 (raw file):
func (c *cliState) tryEnableServerExecutionTimings() { if err := c.conn.Exec("SHOW LAST QUERY STATISTICS", nil); err != nil { fmt.Fprintf(stderr, "warning: cannot show server execution timings: %v\n", err)
-
you need to check that the result has a column named
exec_latency
in second position. That is the exact assumption made by the remainder of your code below. -
also set the thing to false if there's an error.
pkg/sql/parser/parse.go, line 228 at r2 (raw file):
} // Scan takes a sql string and returns potential sql statements by performing
Given this description, the word "scan" is inappropriate.
Why not use a name that says what it does, eg SplitAtSemicolons
43b4b86
to
0d9bc27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs. Re: tests -- I'm having trouble getting make acceptance TESTS=TestDockerCLI
to work locally, not quite sure whats going on. This is making the test tweaking/addition hard, I'll look into it again tomorrow.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @miretskiy)
pkg/cli/sql.go, line 1346 at r2 (raw file):
Previously, knz (kena) wrote…
I recommend doing this check upon every re-connection. Consider the following scenario:
- user starts sh ell
- user stops node
- user restarts node in different version (e.g an upgrade)
- user continues using shell, shell reconnects
we want the shell to adjust to the new server's capabilities. Hence my suggestion to check these things in the connect method, like we already do for the version strings.
Makes sense. I guess this should've applied to tryEnableCheckSyntax
as well, but I don't think it holds anymore because SHOW SYNTAX
seems to be quite old, so all server versions should be able to recognize that observer statement. Unless I'm missing something, we can cleanup the tryEnableCheckSyntax
stuff now, right?
pkg/cli/sql.go, line 1551 at r2 (raw file):
Previously, knz (kena) wrote…
you need to check that the result has a column named
exec_latency
in second position. That is the exact assumption made by the remainder of your code below.also set the thing to false if there's an error.
Done.
pkg/sql/parser/parse.go, line 228 at r2 (raw file):
Previously, knz (kena) wrote…
Given this description, the word "scan" is inappropriate.
Why not use a name that says what it does, eg
SplitAtSemicolons
Done.
0d9bc27
to
70522aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble getting make acceptance TESTS=TestDockerCLI to work locally
This is what I use to run and troubleshoot these tests:
$ killall -9 cockroach cockroachshort; rm -rf logs; expect -d -f pkg/cli/interactive_tests/foo.tcl ./cockroach
Reviewed 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/cli/sql.go, line 1346 at r2 (raw file):
Unless I'm missing something, we can cleanup the tryEnableCheckSyntax stuff now, right?
yes that sounds right
pkg/cli/sql_util.go, line 179 at r3 (raw file):
} defer func() { _ = rows.Close() }() if rows.Columns()[2] != "exec_latency" {
also check that len(rows.Columns()) >= 2
to avoid surprises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We like it when a client sends statements batched together without pgwire Syncs
in between them because then the server can auto-retry all of them, not just the first one. That's a pretty useful thing; I wouldn't want to lose it. In particular, I use it a lot for testing so at the very lease give me a way to disable it. Better yet, have you considered keeping the batching and, instead of sending the statements one by one, adding the show stats
thing in between every one of them while still batching? Like, transform the batch by interspersing these stats statements.
In any case, please add a flag to disable these new statements (a new \set <foo>
flag), and include it in--debug-sql-cli
. Any magic in the client is a pain in the ass when debugging.
I don't know if the patch does this already, but can we show both the execution time and the network time? The network time is useful to see in geo-distributed clusters.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/sql/parser/parse.go, line 228 at r3 (raw file):
} // SplitAtSemicolons takes a sql string and returns potential sql statements
Please spell out that this does lexing but not parsing (right?). That's important cause we don't want to do client-side parsing (with a potentially old grammar).
Consider renaming to ScanStatements
for coherency with other lexing around this package.
pkg/sql/parser/parse.go, line 241 at r3 (raw file):
stmts = append(stmts, sql) } return
no naked returns pls
70522aa
to
8ba928c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing the interspersed statement approach, but I ran into some trouble with statements that return no rows. Seems like lib/pq
doesn't return a result set if other statements in the batch return rows. This means that there if we intersperse a CREATE TABLE
with SHOW LAST QUERY STATISTICS
, instead of running the noRows hook for the create table, we end up printing the result of the SHOW LAST QUERY STATISTICS
! It's easy to test this behavior on master, by doing a BEGIN; CREATE TABLE t(a int); COMMIT;
-- only the COMMIT tag is printed! Not sure if there's a workaround, but I'd be open to implementing it as I like the interspersing approach.
For now, I've added a show_server_execution_times
flag to turn of the "new timings" as per @andreimatei's recommendation. When set to false, we continue doing the batching (and printing out the old timings).
Lastly, I have an open question about Notices. Currently, we have a policy to print out notices once all the statements on a line have been printed. This doesn't fit that well in the new approach where we lex on the client side. (see the change to test_notice.tcl
for an example of the change). Is this okay? Achieving the old behavior is still possible, but it does come with adding some complexity in the code by moving the responsibility of flushing notices, which makes me apprehensive.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/cli/sql.go, line 1346 at r2 (raw file):
Previously, knz (kena) wrote…
Unless I'm missing something, we can cleanup the tryEnableCheckSyntax stuff now, right?
yes that sounds right
Okay, I've removed it while Im in this part of the code then!
pkg/cli/sql_util.go, line 179 at r3 (raw file):
Previously, knz (kena) wrote…
also check that
len(rows.Columns()) >= 2
to avoid surprises
Ended up refactoring this stuff a little bit. Now I have a function responsible for getting and parsing last query statistics, which validates the columns as well. I like this more instead of having this 2
in two places.
pkg/sql/parser/parse.go, line 228 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Please spell out that this does lexing but not parsing (right?). That's important cause we don't want to do client-side parsing (with a potentially old grammar).
Consider renaming toScanStatements
for coherency with other lexing around this package.
Spelled out the intention to have this be called from the client and why we don't want to do any parsing on the client.
pkg/sql/parser/parse.go, line 241 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
no naked returns pls
Done.
Since you mention notices, can't we just use notices to get the server timing (instead of a dedicated statement), and then we don't have to argue about the value of batching? Like a session variable used by the shell that causes a notice to be returned with each statement? |
Btw, changing the batching changes the semantics of errors, doesn't it? An error in a batch interrupts the execution of the following statements, whereas across batches it doesn't (assuming implicit txn). Maybe we don't handle this well (#44803), but hopefully we'll fix that. |
I think my change brings us more in line with the postgres CLI behavior. PG seems to do do similar client side lexing as my patch, and doesn't treat multiple statements in a single line as an implicit transaction:
|
8ba928c
to
ab4d3cb
Compare
Re: using notices to convey timings back to the client -- this seems a bit hacky to me. Right now, notices are printed out to stderr whereas the query results + timing etc. is printed out to stdout. The idea of going through all notices, figuring out if they pertain to timing, and parsing out timing from the notice error string doesn't seem the correct use of a notice. Additionally, postgres seems to do this client side lexing too, so I don't see a compelling case to not be in line with it. I do acknowledge your point that you find this batching stuff on the client useful. Maybe the solution here is to provide that as an option users can opt in to, and let the default behavior mimic postgres. To that end, I think it might make sense to rename the flag from |
Using notices here seem like exactly the right use for them to me. It's extra information conveyed by the server to the client; it seems quite appropriate to me for the client to cherry-pick what it wants out of them and present them in whatever way it sees fit. Now that we have these notices, I wish we'd use them for reading the txn status too (which I think the shell also does after every batch).
When you say "postgres" you mean "psql", right? There's a difference between the two. I don't think being line with psql is a particular consideration; we made no efforts to support its flags for example. The thing is that, ever since I found out of postgres' behavior for batches, I've really wanted that behavior from the command line. It'd save me a lot of I was gonna say that I don't need to care too much and if you have another reviewer that likes it then run with it, but I guess I do care about our shell experience, and I see doing something with notices to be better in every way. And then other clients could benefit from it too if they set that session var. If you show me that I'm a minority here, I'll shut up. |
FWIW, |
This is my concern as well. The main point of this change is to give users more accurate time information on SQL statements and to break the concern into network and execution. We want to do this because users are concerned about latency/timing. Anything that would make that worse seems a non-starter for me. I also think Andrei's point about retries is a compelling one--we definitely don't want to make the retry behavior worse (and keeping batches from being retried together strikes me as worse). |
This doesn't apply because my change only affects interactive shells.
Yup, we do this for getting both the current database and the transaction status after every batch. I was simply adding on to this pattern. We also do it every time a user enters a new line or they're inputting a multi-line statement. I'm not sure if the clarification about non-interactive shells makes this approach any better or not. I'll try and prototype the notice approach and see if it goes smoothly though. |
@awoods187 @andreimatei What would you think about shipping this behind a flag which was off by default? That way it could still be used as a sales tool but wouldn't disrupt any existing use cases. The NOTICE approach sounds promising but there's no way we're doing it for 20.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't merge it; I don't think it's worth the code... If its for the sales engineers, you can just give them the binary. Or they can just type out SHOW LAST QUERY STATISTICS
themselves.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
FWIW sales engineers cant just type |
If it helps, all those statements can be inhibited with various shell settings. |
@arulajmani I reviewed your PR again today.
Here's the proposal: keep the new code that issues Regarding the issue #48180, here what we can do in 20.2: if there is a semicolon inside the query, then display a warning to the user to inform them that the times are misleading. (i.e. inform them of the limitation of #48180 and let them decide to split the statement across multiple lines) (I think you could even try disabling the timing report entirely if there's a semicolon in the input. With an explanatory message. This way there's no wrong information for the user to work with.) |
The reason to keep this PR is that it will improve the status quo significantly, by enabling the user to distinguish the network round-trip latency from the server-side latencies. This alone is a good improvement. |
ab4d3cb
to
5315685
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@knz thanks for the suggestions, I implemented them more or less as you suggested, and I think this is in a place where the change is minimally invasive while still giving us the ability to show different buckets for query execution (network latency/server exec time), as was the original motivation behind the change.
Re: multiple statements on one line -- these continue to behave the same functionally as they did before my change, albeit no timings are displayed for them anymore.
I might need to revert some tests back, but Ill let CI find them for me.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new approach. Just a few nits and this will be good to go.
Reviewed 9 of 9 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @arulajmani)
pkg/cli/sql.go, line 366 at r4 (raw file):
display: func(_ *cliState) string { return strconv.FormatBool(sqlCtx.showTimes) }, }, `show_server_execution_times`: {
you can simplify this to show_server_times
IMHO
pkg/cli/sql.go, line 1183 at r4 (raw file):
if c.exitErr != nil { cliOutputError(stderr, c.exitErr, true /*showSeverity*/, false /*verbose*/) if c.errExit {
I think you need to preserve this code?
pkg/cli/sql_util.go, line 380 at r4 (raw file):
return 0, 0, err } defer func() { _ = rows.Close() }()
nit: if we're being pedantic, it's possible for the SQL client driver to detect a SQL error only when Close() is called.
so the idiom would be more like this:
defer func() {
closeErr := rows.Close()
err = errors.CombineErrors(err, closeErr)
}()
(CombineError suitably returns nil
if both arguments are nil
)
pkg/cli/sql_util.go, line 891 at r4 (raw file):
fmt.Fprintln(w) } else { fmt.Fprintf(w, "Note: timings for multiple statements on a single line are not supported\n")
-
this
else
is incorrect. (Also the cause of the CI failures) You only want the warning to be printed ifsqlCtx.showTimes
is true -
maybe worth printing to
stderr
, notstdout
(ditto for the other Note above). not feeling strongly about this - you can see for yourself what the UX benefit is. -
ditto, probably only worth printing the notes if
cliCtx.isInteractive
is true. I think there's no point in explaining if there's no eyes to look at this.
pkg/sql/parser/parse.go, line 239 at r4 (raw file):
} // ScanStatements takes a sql string and returns potential sql statements
Given the new goal, I think you can simplify this function to just "HasMultipleStatements" returning bool, stopping the loop at the first semicolon.
5315685
to
41f96e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Wanna give CI one more run on this still.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/cli/sql.go, line 366 at r4 (raw file):
Previously, knz (kena) wrote…
you can simplify this to
show_server_times
IMHO
Done.
pkg/cli/sql.go, line 1183 at r4 (raw file):
Previously, knz (kena) wrote…
I think you need to preserve this code?
Didn't mean to delete, missed this when reverting back to the original state. Done.
pkg/cli/sql_util.go, line 380 at r4 (raw file):
Previously, knz (kena) wrote…
nit: if we're being pedantic, it's possible for the SQL client driver to detect a SQL error only when Close() is called.
so the idiom would be more like this:defer func() { closeErr := rows.Close() err = errors.CombineErrors(err, closeErr) }()(CombineError suitably returns
nil
if both arguments arenil
)
Makes sense, I'll change this in runQueryAndFormatResults
while I'm here as well.
pkg/cli/sql_util.go, line 891 at r4 (raw file):
Previously, knz (kena) wrote…
this
else
is incorrect. (Also the cause of the CI failures) You only want the warning to be printed ifsqlCtx.showTimes
is truemaybe worth printing to
stderr
, notstdout
(ditto for the other Note above). not feeling strongly about this - you can see for yourself what the UX benefit is.ditto, probably only worth printing the notes if
cliCtx.isInteractive
is true. I think there's no point in explaining if there's no eyes to look at this.
Makes sense, switched this logic up a bit. I don't have any strong preferences about point 2 really, but I made the switch to stderr
.
pkg/sql/parser/parse.go, line 239 at r4 (raw file):
Previously, knz (kena) wrote…
Given the new goal, I think you can simplify this function to just "HasMultipleStatements" returning bool, stopping the loop at the first semicolon.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo stderr
see below
Also assuming CI is green :)
Reviewed 2 of 8 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @arulajmani, and @knz)
pkg/cli/sql_util.go, line 877 at r5 (raw file):
// No need to print if no one's watching. if sqlCtx.isInteractive { fmt.Fprintf(os.Stderr, "Note: timings for multiple statements on a single line are not supported.\n")
stderr
not os.Stderr
ditto below
also, I'd argue it would be useful to add a reference to the github issue here (see unimplemented.MakeURL()
)
41f96e5
to
8550da4
Compare
Previously, the CLI displayed timings by timing a query on the client. This would include both the client-server latency and the query execution time, without bucketing which is what. This patch changes this by switching the way we calculate timings in the CLI. Instead of timing on the client, we now send a `SHOW LAST QUERY STATISTICS` query after a statement is executed. This allows us to display both the network latency and the network latency as separate values. Additionally, this patch no longer displays timings for queries that include multiple statements on one line. This is to mitigate cockroachdb#48180, where timings for executing all the statements are incorrectly attributed to the first statement. Actually fixing the issue is beyond the scope of this PR. Fixes cockroachdb#49450 Release note (cli change): The CLI no longer prints a blanket `Time` for queries. Instead, if `show_times` is turned on and the server version is >=20.2, the CLI prints two separate times -- the server execution time and the network latency. Release justification: low risk, high benefit change to existing functionality.
8550da4
to
95d5819
Compare
Thanks for all the reviews! bors r=knz |
Build succeeded: |
Previously, the CLI displayed timings by timing a query on the client.
This would include both the client-server latency and the query
execution time, without bucketing which is what. This patch changes
this by switching the way we calculate timings in the CLI. Instead of
timing on the client, we now send a
SHOW LAST QUERY STATISTICS
queryafter a statement is executed. This allows us to display both the
network latency and the network latency as separate values.
Additionally, this patch no longer displays timings for queries that
include multiple statements on one line. This is to mitigate #48180,
where timings for executing all the statements are incorrectly
attributed to the first statement. Actually fixing the issue is beyond
the scope of this PR.
Fixes #49450
Release note (cli change): The CLI no longer prints a blanket
Time
for queries. Instead, if
show_times
is turned on and the serverversion is >=20.2, the CLI prints two separate times -- the server
execution time and the network latency.
Release justification: low risk, high benefit change to existing
functionality.