-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: new heuristic-based completion engine #87606
Conversation
@rickystewart when you get a chance (NOT URGENT) could you help with this? I've defined a new generator rule in pkg/sql/scanner that generates Because it's generated, I'm not adding it to the However every time I run Do you know how I can prevent |
Yes -- add a comment to |
thank you! |
8e94931
to
9c6953e
Compare
@rickystewart hi, for when you have time (NOT URGENT), this PR could use some help: There is a new generated file in sql/scanner (token_names_test.go). How can I add the generation of this file to the dependencies for the linter? |
Oh nvm, found it -- it's because I forgot to update the Makefile. |
7125819
to
9a4f804
Compare
9a4f804
to
b00ea07
Compare
88009: sql: use schema desc ID for OIDs in pg_catalog r=rafiss a=knz Needed for #87606. Prior to this patch, a synthetic OID was used for "namespace" (schema) references in `pg_catalog`. This was making it difficult/impossible to retrieve schema comments using a non-`root` SQL query, e.g. via ```sql SELECT nspname AS schema, coalesce(pc.comment, sc.comment) as description FROM pg_catalog.pg_namespace t LEFT OUTER JOIN system.comments sc ON t.oid = sc.object_id AND sc.type = 4 LEFT OUTER JOIN crdb_internal.predefined_comments pc ON t.oid = pc.object_id AND pc.type = 4 ``` This patch fixes it. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
a049718
to
110fc48
Compare
05b6fc2
to
2c7954d
Compare
86457: cli: replace libedit with bubbline r=ZhouXing19 a=knz First commit from #88574. Benefits from (but is not dependent on) #87606 server-side. Fixes #21826 Fixes #71207 Fixes #71209 Fixes #57885 NB: this will benefit from upstream library releases based off the still-unmerged PRs listed in knz/bubbline#2. Release justification: n/a will not merge before stability ends Release note (cli change): The engine used as line editor in the interactive shell (`cockroach sql`, `demo`) has been updated. It includes numerous bug fixes and new features. The previous engine can still be accessed via the env var COCKROACH_SQL_FORCE_LIBEDIT. This support will be removed in a later version. 92335: kvserver,logstore: introduce log StateLoader r=tbg a=pavelkalinnikov Previously the `StateLoader` accessed both log and state machine state. This commit moves most of the log-specific accesstors to the `logstore` package. Part of #91979 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
88219: {bazci,dev}: send build events to beaver hub r=rickystewart a=healthy-pod This code change lets `bazci` and `dev` send build events to beaver hub so we can start collecting data about builds in CI and locally. Release note: None Epic: [CRDB-8350](https://cockroachlabs.atlassian.net/browse/CRDB-8350) 92263: sql,cli: add redacted sql stmts to debug zip r=xinhaoz a=xinhaoz ## Commit 1 This commit adds the builtin, `crdb_internal.anonymize_sql_constants` which takes in a sql string and returns it with constants redacted. This will be used to redact columns that are sql stmts in the redacted debug zip. Release note: None ## Commit 2 Closes #88823 This commit adds the following fields to the redacted debug zip: crdb_internal.create_statements: - create_statement - - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query Release note (cli change): The following fields have been redacted and added to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement - crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query ----------- Running ycsb, tpcc and movr default workloads for 15 minutes and requesting the debug zip on a fresh node in master vs with new changes: master <img width="927" alt="Pasted Graphic" src="https://user-images.githubusercontent.com/20136951/203359099-6a035d1c-00fe-4bbb-92c2-d1c2b2a3b706.png"> branch <img width="978" alt="Pasted Graphic 2" src="https://user-images.githubusercontent.com/20136951/203359158-a8b02ccb-2d82-40b4-a33d-9faf6ddaab70.png"> 92627: leaktest, stop: don't recover from panics r=andreimatei a=andreimatei See individual commits. Epic: None 92793: roachtest: update activerecord blocklist and ignore list r=ZhouXing19,rafiss a=andyyang890 This patch updates the blocklist and ignore list for the activerecord roachtest. This patch also eliminates the per-version lists since each release branch uses their own list now. Fixes #84955 Release note: None 92856: log: fix flakiness in TestStatusLogRedaction r=dhartunian a=abarganier TestStatusLogRedaction was continuously failing when run with `--race`. The test searches for a specific log line in the results from two separate RPCs - `LogFile()` and `Log()`. The test consistently failed when checking the results from the `Log()` RPC because the endpoint has a default max number of log entries set to 1000. Frequently, the log entry that we're searching for has a line index in the range of [1500, 2000], so the test was failing to find the log entry since it was being filtered out due to the default max number of entries. This patch simply sets the max to a high enough value where this should no longer happen. With multiple runs I never saw the total # of log lines exceed 2,500, so we set the max to 5,000 to be safe. Release note: None Fixes: #92789 92929: rttanalysis: skip test which doesn't work due to tracing limitations r=ajwerner a=ajwerner This test ends up not seeing all the round trips because the trace is way too large. Given that, we get flakes sometimes. See the referenced issue (#88885). Fixes #92770 Release note: None 92934: clisqlshell: fix a panic on tab key r=rail a=knz Fixes #92935. This patch ensures that there's no visible panic if the user presses "tab" and there's no completion available. NB: no unit tests here - the test coverage will be delivered by #87606. 92942: kvserver: remove the assertion on not exceeding MaxSpanRequestKeys r=yuzefovich a=yuzefovich This commit fixes the bug in a recently merged commit where we allowed to exceed `MaxSpanRequestKeys` in some cases - we forgot to update the remaining limit to be "exhausted". Furthermore, this commit removes the assertion altogether. Epic: None Release note: None 92953: ui: fix link for index from insights r=maryliag a=maryliag Previously, the link to index details on the drop index insights was using the URL format used by DB Console only. This commit updates to use the correct format when loading from CC Console. Fix #92944 Release note (bug fix): Fix link to index details on drop index insights on CC Console. Co-authored-by: healthy-pod <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: maryliag <[email protected]>
2c7954d
to
b44d969
Compare
This is now rebased, with more tests added, ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I just looked at the second commit (the one that updates the optimizer)
Reviewed 27 of 27 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @ZhouXing19)
pkg/sql/opt/exec/factory.opt
line 730 at r2 (raw file):
define ShowCompletions { Command *tree.ShowCompletions }
I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)
pkg/sql/opt/exec/execbuilder/statement.go
line 321 at r2 (raw file):
return execPlan{}, err } // ControlSchedules returns no columns.
nit: update comment
pkg/sql/opt/exec/explain/testdata/gists
line 1012 at r2 (raw file):
# cancelQueriesOp gist-explain-roundtrip
Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so COOL! Thanks for adding such comprehensive tests as well.
Most of my comments are just for nits.
Reviewed 8 of 8 files at r1, 27 of 27 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, 23 of 23 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/conn_executor_exec.go
line 2007 at r5 (raw file):
) error { res.SetColumns(ctx, colinfo.ShowCompletionsColumns) log.Warningf(ctx, "COMPLETION GENERATOR FOR: %+v", *n)
I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:
TODO(janexing): better bind the internal executor with the txn.
Thanks!
pkg/sql/compengine/sketch.go
line 54 at r5 (raw file):
var buf strings.Builder if offset < int(tokens[0].Start) { buf.WriteByte('_')
nit: change it to buf.WriteByte(byte(MarkCursorInSpace))
pkg/sql/compengine/sketch.go
line 76 at r5 (raw file):
// Requested offset at or beyond EOF. if cursorPosition == 0 && offset >= int(tokens[len(tokens)-1].Start) {
nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have >= nt(tokens[len(tokens)-1].End)
.
pkg/sql/compengine/sketch.go
line 82 at r5 (raw file):
// outside of the string, so that indexing using them is always // possible. offset = int(tokens[len(tokens)-1].Start)
ditto
pkg/sql/compengine/sketch.go
line 144 at r5 (raw file):
pos, len(sketch), sketch) } if sketch[pos] != '_' && sketch[pos] != '\'' {
nit: change it to if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken
pkg/sql/comprules/rules.go
line 156 at r5 (raw file):
c.Trace("completing for %q (%d,%d)", prefix, start, end) // TODO(knz): use the comment system instead of crdb_internal.
Just out of curiosity -- could you say more about the "comment system"?
pkg/sql/comprules/rules.go
line 174 at r5 (raw file):
// A database name can only occur after a keyword or a comma (,). var compDbRe = regexp.MustCompile(`i(i'|_)|,(['_]|i')`)
nit: For this regex, seems that sketches ,'
, ,_
and ,i
also match. Is this expected? Should we ask for an identifier token at the beginning?
pkg/sql/compengine/sketch_test.go
line 54 at r5 (raw file):
fmt.Fprintln(tw, "--") c := completions{tokens: tokens, sketch: sketch, queryTokIdx: pos} for i := -2; i <= 2; i++ {
nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.
pkg/sql/completions.go
line 41 at r5 (raw file):
txn := params.p.Txn() queryIterFn := func(ctx context.Context, opName string, stmt string, args ...interface{}) (sqlutil.InternalRows, error) { return ie.QueryIteratorEx(ctx, opName, txn,
We can just use params.p.QueryIteratorEx()
here.
initInternalExecutor()
is expected to be used only in planner.WithInternalExecutor()
or query/exec functions such as planner.ExecEx()
.
pkg/sql/completions_test.go
line 1 at r3 (raw file):
// Copyright 2022 The Cockroach Authors.
nit: We can remove this file.
Release note: None
Before, the planning was short cut using a delegate node. We want the show completion code to become more intelligent; for this, we need proper planning. This commit implements that. This also makes SHOW COMPLETIONS return 5 columns instead of just 1. We will use this in subsequent commits. Release note: None
Release note: None
We will want to run it without a planNode / planner context. Also, this encapsulation makes it possible to share some code with the observer invocation in connExecutor. Release note: None
This commit introduces a new heuristic-based completion engine using sketches of the input token sequence. It works roughly as follows: 1. first the input is scanned into a token array. 2. then the token array is translated to a *sketch*: a string with the same length as the token array, where each character corresponds to an input token and an extra character indicating where the completion was requested. 3. then the completion rules are applied in sequence. Each rule inspects the sketch (and/or the token sequence) and decides whether to do nothing (skip to next rule), or to run some SQL which will return a batch of completion candidates. Each method operates exclusively on the token sequence, and does not require the overall SQL syntax to be valid or complete. Also, the completion engine executes in a streaming fashion, so that there is no need to accumulate all the completion candidates in RAM before returning them to the client. This prevents excessive memory usage server-side, and offers the client an option to cancel the search once enough suggestions have been received. Code organization: - new package `sql/compengine`: the core completion engine, also where sketches are computed. Suggested reading: `api.go` in the new package. - new package `sql/comprules`: the actual completion methods/heuristics, where sketches are mapped into SQL queries that provide the completion candidates. Suggested reading: the test cases under `testdata`. Some more words about sketches: For example, `SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;'` produces the sketch `"ii'ii.i;"` where `i` indicates an identifier-like token and `'` indicates the cursor position. The purpose of the sketch is to simplify the pattern matching performed by the completion heuristics, and enables the application of regular expressions on the token sequence. For example, a heuristic to complete schema-qualified relations in the current database, or a database-qualified relation, would use the regexp `[^.;]i\.(['_]|i')`: an identifier not following a period or a semicolon, followed by a period, followed by the completion cursor (with the cursor either directly after the period or on an identifier after the period). Release note: None
Release note (cli change): The interactive SQL shell now supports a rudimentary & experimental form of tab completion to input the name of SQL objects and functions.
b44d969
to
28317c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @ZhouXing19)
pkg/sql/conn_executor_exec.go
line 2007 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:
TODO(janexing): better bind the internal executor with the txn.
Thanks!
Done.
pkg/sql/compengine/sketch.go
line 54 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: change it to
buf.WriteByte(byte(MarkCursorInSpace))
Done.
pkg/sql/compengine/sketch.go
line 76 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have
>= nt(tokens[len(tokens)-1].End)
.
Done.
pkg/sql/compengine/sketch.go
line 82 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
ditto
Done.
pkg/sql/compengine/sketch.go
line 144 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: change it to
if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken
Done.
pkg/sql/comprules/rules.go
line 156 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Just out of curiosity -- could you say more about the "comment system"?
Things from pg_catalog. Explained in comment.
pkg/sql/comprules/rules.go
line 174 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: For this regex, seems that sketches
,'
,,_
and,i
also match. Is this expected? Should we ask for an identifier token at the beginning?
matching on ,'
doesn't seem right, good catch. Modified.
,_
is expected, because we have select * from db1.t1, db2.t2
, we want to see databases listed after the comma.
pkg/sql/opt/exec/factory.opt
line 730 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)
Done.
pkg/sql/opt/exec/execbuilder/statement.go
line 321 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: update comment
Good catch.
pkg/sql/opt/exec/explain/testdata/gists
line 1012 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!
Good idea. Done.
pkg/sql/compengine/sketch_test.go
line 54 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.
Done.
pkg/sql/completions.go
line 41 at r5 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
We can just use
params.p.QueryIteratorEx()
here.
initInternalExecutor()
is expected to be used only inplanner.WithInternalExecutor()
or query/exec functions such asplanner.ExecEx()
.
Done.
TFYRs! bors r=rytaft,ZhouXing19 |
Build failed (retrying...): |
Build succeeded: |
This PR extends the server-side completion logic available under SHOW COMPLETIONS.
Epic: CRDB-22182
Fixes #37355.
Intended for use together with #86457.
The statement now returns 5 columns: completion, category, description, start, end.
This change is backward-compatible with the CLI code in previous versions, which was not checking the number of columns.
It works roughly as follows:
first the input is scanned into a token array.
then the token array is translated to a sketch: a string with the
same length as the token array, where each character corresponds to an
input token and an extra character indicating where the completion was
requested.
then the completion rules are applied in sequence. Each rule
inspects the sketch (and/or the token sequence) and decides whether
to do nothing (skip to next rule), or to run some SQL which will
return a batch of completion candidates.
Each method operates exclusively on the token sequence, and does
not require the overall SQL syntax to be valid or complete.
Also, the completion engine executes in a streaming fashion, so that
there is no need to accumulate all the completion candidates in RAM
before returning them to the client. This prevents excessive
memory usage server-side, and offers the client an option to cancel
the search once enough suggestions have been received.
Code organization:
new package
sql/compengine
: the core completion engine, alsowhere sketches are computed.
Suggested reading:
api.go
in the new package.new package
sql/comprules
: the actual completionmethods/heuristics, where sketches are mapped into SQL queries
that provide the completion candidates.
Suggested reading: the test cases under
testdata
.Some more words about sketches:
For example,
SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;'
produces the sketch"ii'ii.i;"
wherei
indicates anidentifier-like token and
'
indicates the cursor position.The purpose of the sketch is to simplify the pattern matching
performed by the completion heuristics, and enables the application of
regular expressions on the token sequence.
For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp
[^.;]i\.(['_]|i')
: an identifier not following aperiod or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).
Supersedes #45186.