sql: new heuristic-based completion engine #87606

knz · 2022-09-08T14:23:45Z

This PR extends the server-side completion logic available under SHOW COMPLETIONS.

Epic: CRDB-22182
Fixes #37355.
Intended for use together with #86457.

The statement now returns 5 columns: completion, category, description, start, end.
This change is backward-compatible with the CLI code in previous versions, which was not checking the number of columns.

It works roughly as follows:

first the input is scanned into a token array.
then the token array is translated to a sketch: a string with the
same length as the token array, where each character corresponds to an
input token and an extra character indicating where the completion was
requested.
then the completion rules are applied in sequence. Each rule
inspects the sketch (and/or the token sequence) and decides whether
to do nothing (skip to next rule), or to run some SQL which will
return a batch of completion candidates.

Each method operates exclusively on the token sequence, and does
not require the overall SQL syntax to be valid or complete.

Also, the completion engine executes in a streaming fashion, so that
there is no need to accumulate all the completion candidates in RAM
before returning them to the client. This prevents excessive
memory usage server-side, and offers the client an option to cancel
the search once enough suggestions have been received.

Code organization:

new package sql/compengine: the core completion engine, also
where sketches are computed.

Suggested reading: api.go in the new package.
new package sql/comprules: the actual completion
methods/heuristics, where sketches are mapped into SQL queries
that provide the completion candidates.

Suggested reading: the test cases under testdata.

Some more words about sketches:

For example, SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;' produces the sketch "ii'ii.i;" where i indicates an
identifier-like token and ' indicates the cursor position.

The purpose of the sketch is to simplify the pattern matching
performed by the completion heuristics, and enables the application of
regular expressions on the token sequence.

For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp [^.;]i\.(['_]|i'): an identifier not following a
period or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).

Supersedes #45186.

cockroach-teamcity · 2022-09-08T14:23:58Z

This change is

knz · 2022-09-08T14:26:07Z

@rickystewart when you get a chance (NOT URGENT) could you help with this?

I've defined a new generator rule in pkg/sql/scanner that generates token_names_test.go.

Because it's generated, I'm not adding it to the srcs in the go_test() invocation.

However every time I run dev generate bazel, bazel/gazelle auto-adds it to the srcs list. Then if I run generate go the build starts failing because the file exists both in my work tree and in the sandbox.

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

rickystewart · 2022-09-08T16:10:23Z

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

Yes -- add a comment to BUILD.bazel (probably the one in your package subdirectory) like # gazelle:exclude token_names_test.go.

knz · 2022-09-08T16:23:29Z

thank you!

knz · 2022-09-12T10:09:26Z

@rickystewart hi, for when you have time (NOT URGENT), this PR could use some help:

There is a new generated file in sql/scanner (token_names_test.go).
The linter is now complaining that it cannot build/lint the code because this generated file is missing.

How can I add the generation of this file to the dependencies for the linter?

knz · 2022-09-12T10:22:12Z

Oh nvm, found it -- it's because I forgot to update the Makefile.

88009: sql: use schema desc ID for OIDs in pg_catalog r=rafiss a=knz Needed for #87606. Prior to this patch, a synthetic OID was used for "namespace" (schema) references in `pg_catalog`. This was making it difficult/impossible to retrieve schema comments using a non-`root` SQL query, e.g. via ```sql SELECT nspname AS schema, coalesce(pc.comment, sc.comment) as description FROM pg_catalog.pg_namespace t LEFT OUTER JOIN system.comments sc ON t.oid = sc.object_id AND sc.type = 4 LEFT OUTER JOIN crdb_internal.predefined_comments pc ON t.oid = pc.object_id AND pc.type = 4 ``` This patch fixes it. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>

86457: cli: replace libedit with bubbline r=ZhouXing19 a=knz First commit from #88574. Benefits from (but is not dependent on) #87606 server-side. Fixes #21826 Fixes #71207 Fixes #71209 Fixes #57885 NB: this will benefit from upstream library releases based off the still-unmerged PRs listed in knz/bubbline#2. Release justification: n/a will not merge before stability ends Release note (cli change): The engine used as line editor in the interactive shell (`cockroach sql`, `demo`) has been updated. It includes numerous bug fixes and new features. The previous engine can still be accessed via the env var COCKROACH_SQL_FORCE_LIBEDIT. This support will be removed in a later version. 92335: kvserver,logstore: introduce log StateLoader r=tbg a=pavelkalinnikov Previously the `StateLoader` accessed both log and state machine state. This commit moves most of the log-specific accesstors to the `logstore` package. Part of #91979 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>

88219: {bazci,dev}: send build events to beaver hub r=rickystewart a=healthy-pod This code change lets `bazci` and `dev` send build events to beaver hub so we can start collecting data about builds in CI and locally. Release note: None Epic: [CRDB-8350](https://cockroachlabs.atlassian.net/browse/CRDB-8350) 92263: sql,cli: add redacted sql stmts to debug zip r=xinhaoz a=xinhaoz ## Commit 1 This commit adds the builtin, `crdb_internal.anonymize_sql_constants` which takes in a sql string and returns it with constants redacted. This will be used to redact columns that are sql stmts in the redacted debug zip. Release note: None ## Commit 2 Closes #88823 This commit adds the following fields to the redacted debug zip: crdb_internal.create_statements: - create_statement - - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query Release note (cli change): The following fields have been redacted and added to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement - crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query ----------- Running ycsb, tpcc and movr default workloads for 15 minutes and requesting the debug zip on a fresh node in master vs with new changes: master <img width="927" alt="Pasted Graphic" src="https://user-images.githubusercontent.com/20136951/203359099-6a035d1c-00fe-4bbb-92c2-d1c2b2a3b706.png"> branch <img width="978" alt="Pasted Graphic 2" src="https://user-images.githubusercontent.com/20136951/203359158-a8b02ccb-2d82-40b4-a33d-9faf6ddaab70.png"> 92627: leaktest, stop: don't recover from panics r=andreimatei a=andreimatei See individual commits. Epic: None 92793: roachtest: update activerecord blocklist and ignore list r=ZhouXing19,rafiss a=andyyang890 This patch updates the blocklist and ignore list for the activerecord roachtest. This patch also eliminates the per-version lists since each release branch uses their own list now. Fixes #84955 Release note: None 92856: log: fix flakiness in TestStatusLogRedaction r=dhartunian a=abarganier TestStatusLogRedaction was continuously failing when run with `--race`. The test searches for a specific log line in the results from two separate RPCs - `LogFile()` and `Log()`. The test consistently failed when checking the results from the `Log()` RPC because the endpoint has a default max number of log entries set to 1000. Frequently, the log entry that we're searching for has a line index in the range of [1500, 2000], so the test was failing to find the log entry since it was being filtered out due to the default max number of entries. This patch simply sets the max to a high enough value where this should no longer happen. With multiple runs I never saw the total # of log lines exceed 2,500, so we set the max to 5,000 to be safe. Release note: None Fixes: #92789 92929: rttanalysis: skip test which doesn't work due to tracing limitations r=ajwerner a=ajwerner This test ends up not seeing all the round trips because the trace is way too large. Given that, we get flakes sometimes. See the referenced issue (#88885). Fixes #92770 Release note: None 92934: clisqlshell: fix a panic on tab key r=rail a=knz Fixes #92935. This patch ensures that there's no visible panic if the user presses "tab" and there's no completion available. NB: no unit tests here - the test coverage will be delivered by #87606. 92942: kvserver: remove the assertion on not exceeding MaxSpanRequestKeys r=yuzefovich a=yuzefovich This commit fixes the bug in a recently merged commit where we allowed to exceed `MaxSpanRequestKeys` in some cases - we forgot to update the remaining limit to be "exhausted". Furthermore, this commit removes the assertion altogether. Epic: None Release note: None 92953: ui: fix link for index from insights r=maryliag a=maryliag Previously, the link to index details on the drop index insights was using the URL format used by DB Console only. This commit updates to use the correct format when loading from CC Console. Fix #92944 Release note (bug fix): Fix link to index details on drop index insights on CC Console. Co-authored-by: healthy-pod <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: maryliag <[email protected]>

knz · 2022-12-04T16:04:40Z

This is now rebased, with more tests added, ready for review.

rytaft

Very cool! I just looked at the second commit (the one that updates the optimizer)

Reviewed 27 of 27 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @ZhouXing19)

pkg/sql/opt/exec/factory.opt line 730 at r2 (raw file):

define ShowCompletions {
    Command *tree.ShowCompletions
}

I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)

pkg/sql/opt/exec/execbuilder/statement.go line 321 at r2 (raw file):

		return execPlan{}, err
	}
	// ControlSchedules returns no columns.

nit: update comment

pkg/sql/opt/exec/explain/testdata/gists line 1012 at r2 (raw file):


# cancelQueriesOp
gist-explain-roundtrip

Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!

ZhouXing19

This is so COOL! Thanks for adding such comprehensive tests as well.
Most of my comments are just for nits.

Reviewed 8 of 8 files at r1, 27 of 27 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, 23 of 23 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)

pkg/sql/conn_executor_exec.go line 2007 at r5 (raw file):

) error {
	res.SetColumns(ctx, colinfo.ShowCompletionsColumns)
	log.Warningf(ctx, "COMPLETION GENERATOR FOR: %+v", *n)

I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:

TODO(janexing): better bind the internal executor with the txn.

Thanks!

pkg/sql/compengine/sketch.go line 54 at r5 (raw file):

	var buf strings.Builder
	if offset < int(tokens[0].Start) {
		buf.WriteByte('_')

nit: change it to buf.WriteByte(byte(MarkCursorInSpace))

pkg/sql/compengine/sketch.go line 76 at r5 (raw file):

	// Requested offset at or beyond EOF.
	if cursorPosition == 0 && offset >= int(tokens[len(tokens)-1].Start) {

nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have >= nt(tokens[len(tokens)-1].End).

pkg/sql/compengine/sketch.go line 82 at r5 (raw file):

		// outside of the string, so that indexing using them is always
		// possible.
		offset = int(tokens[len(tokens)-1].Start)

ditto

pkg/sql/compengine/sketch.go line 144 at r5 (raw file):

			pos, len(sketch), sketch)
	}
	if sketch[pos] != '_' && sketch[pos] != '\'' {

nit: change it to if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken

pkg/sql/comprules/rules.go line 156 at r5 (raw file):

	c.Trace("completing for %q (%d,%d)", prefix, start, end)
	// TODO(knz): use the comment system instead of crdb_internal.

Just out of curiosity -- could you say more about the "comment system"?

pkg/sql/comprules/rules.go line 174 at r5 (raw file):

// A database name can only occur after a keyword or a comma (,).
var compDbRe = regexp.MustCompile(`i(i'|_)|,(['_]|i')`)

nit: For this regex, seems that sketches ,', ,_ and ,i also match. Is this expected? Should we ask for an identifier token at the beginning?

pkg/sql/compengine/sketch_test.go line 54 at r5 (raw file):

			fmt.Fprintln(tw, "--")
			c := completions{tokens: tokens, sketch: sketch, queryTokIdx: pos}
			for i := -2; i <= 2; i++ {

nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.

pkg/sql/completions.go line 41 at r5 (raw file):

	txn := params.p.Txn()
	queryIterFn := func(ctx context.Context, opName string, stmt string, args ...interface{}) (sqlutil.InternalRows, error) {
		return ie.QueryIteratorEx(ctx, opName, txn,

We can just use params.p.QueryIteratorEx() here.
initInternalExecutor() is expected to be used only in planner.WithInternalExecutor() or query/exec functions such as planner.ExecEx().

pkg/sql/completions_test.go line 1 at r3 (raw file):

// Copyright 2022 The Cockroach Authors.

nit: We can remove this file.

Release note: None

Before, the planning was short cut using a delegate node. We want the show completion code to become more intelligent; for this, we need proper planning. This commit implements that. This also makes SHOW COMPLETIONS return 5 columns instead of just 1. We will use this in subsequent commits. Release note: None

Release note: None

We will want to run it without a planNode / planner context. Also, this encapsulation makes it possible to share some code with the observer invocation in connExecutor. Release note: None

This commit introduces a new heuristic-based completion engine using sketches of the input token sequence. It works roughly as follows: 1. first the input is scanned into a token array. 2. then the token array is translated to a *sketch*: a string with the same length as the token array, where each character corresponds to an input token and an extra character indicating where the completion was requested. 3. then the completion rules are applied in sequence. Each rule inspects the sketch (and/or the token sequence) and decides whether to do nothing (skip to next rule), or to run some SQL which will return a batch of completion candidates. Each method operates exclusively on the token sequence, and does not require the overall SQL syntax to be valid or complete. Also, the completion engine executes in a streaming fashion, so that there is no need to accumulate all the completion candidates in RAM before returning them to the client. This prevents excessive memory usage server-side, and offers the client an option to cancel the search once enough suggestions have been received. Code organization: - new package `sql/compengine`: the core completion engine, also where sketches are computed. Suggested reading: `api.go` in the new package. - new package `sql/comprules`: the actual completion methods/heuristics, where sketches are mapped into SQL queries that provide the completion candidates. Suggested reading: the test cases under `testdata`. Some more words about sketches: For example, `SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;'` produces the sketch `"ii'ii.i;"` where `i` indicates an identifier-like token and `'` indicates the cursor position. The purpose of the sketch is to simplify the pattern matching performed by the completion heuristics, and enables the application of regular expressions on the token sequence. For example, a heuristic to complete schema-qualified relations in the current database, or a database-qualified relation, would use the regexp `[^.;]i\.(['_]|i')`: an identifier not following a period or a semicolon, followed by a period, followed by the completion cursor (with the cursor either directly after the period or on an identifier after the period). Release note: None

Release note (cli change): The interactive SQL shell now supports a rudimentary & experimental form of tab completion to input the name of SQL objects and functions.

knz

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @ZhouXing19)

pkg/sql/conn_executor_exec.go line 2007 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:
TODO(janexing): better bind the internal executor with the txn.
Thanks!

Done.

pkg/sql/compengine/sketch.go line 54 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: change it to buf.WriteByte(byte(MarkCursorInSpace))

Done.

pkg/sql/compengine/sketch.go line 76 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have >= nt(tokens[len(tokens)-1].End).

Done.

pkg/sql/compengine/sketch.go line 82 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

ditto

Done.

pkg/sql/compengine/sketch.go line 144 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: change it to if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken

Done.

pkg/sql/comprules/rules.go line 156 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Just out of curiosity -- could you say more about the "comment system"?

Things from pg_catalog. Explained in comment.

pkg/sql/comprules/rules.go line 174 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: For this regex, seems that sketches ,', ,_ and ,i also match. Is this expected? Should we ask for an identifier token at the beginning?

matching on ,' doesn't seem right, good catch. Modified.

,_ is expected, because we have select * from db1.t1, db2.t2, we want to see databases listed after the comma.

pkg/sql/opt/exec/factory.opt line 730 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)

Done.

pkg/sql/opt/exec/execbuilder/statement.go line 321 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: update comment

Good catch.

pkg/sql/opt/exec/explain/testdata/gists line 1012 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!

Good idea. Done.

pkg/sql/compengine/sketch_test.go line 54 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.

Done.

pkg/sql/completions.go line 41 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

We can just use params.p.QueryIteratorEx() here.
initInternalExecutor() is expected to be used only in planner.WithInternalExecutor() or query/exec functions such as planner.ExecEx().

Done.

knz · 2022-12-06T21:18:35Z

TFYRs!

bors r=rytaft,ZhouXing19

craig · 2022-12-06T21:41:40Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-12-06T22:32:01Z

Build succeeded:

Bazel Essential CI (Cockroach)

knz force-pushed the 20220908-completions branch 2 times, most recently from 8e94931 to 9c6953e Compare September 12, 2022 02:01

knz force-pushed the 20220908-completions branch 6 times, most recently from 7125819 to 9a4f804 Compare September 12, 2022 22:28

knz changed the title ~~sql/scanner: new Inspect function to try scanning harder~~ sql: new heuristic-based completion engine Sep 12, 2022

knz mentioned this pull request Sep 15, 2022

sql: use schema desc ID for OIDs in pg_catalog #88009

Merged

knz force-pushed the 20220908-completions branch from 9a4f804 to b00ea07 Compare September 16, 2022 12:48

knz force-pushed the 20220908-completions branch 4 times, most recently from a049718 to 110fc48 Compare September 18, 2022 22:01

knz mentioned this pull request Sep 18, 2022

cli: replace libedit with bubbline #86457

Merged

knz force-pushed the 20220908-completions branch 3 times, most recently from 05b6fc2 to 2c7954d Compare September 25, 2022 16:31

rafiss mentioned this pull request Oct 31, 2022

cli: review new sql shell #91023

Closed

knz mentioned this pull request Dec 2, 2022

clisqlshell: fix a panic on tab key #92934

Merged

knz force-pushed the 20220908-completions branch from 2c7954d to b44d969 Compare December 4, 2022 16:04

knz requested a review from ZhouXing19 December 4, 2022 16:04

knz marked this pull request as ready for review December 4, 2022 16:04

knz requested a review from a team December 4, 2022 16:04

knz requested a review from a team as a code owner December 4, 2022 16:04

knz requested a review from a team December 4, 2022 16:04

knz requested review from a team as code owners December 4, 2022 16:04

rytaft reviewed Dec 5, 2022

View reviewed changes

ZhouXing19 approved these changes Dec 6, 2022

View reviewed changes

knz added 6 commits December 6, 2022 20:35

sql/scanner: new Inspect function to try scanning harder

16de142

Release note: None

sql: move completion tests to logic test file

3c3629b

Release note: None

sql: encapsulate the completion logic out of planNode

8c268c0

We will want to run it without a planNode / planner context. Also, this encapsulation makes it possible to share some code with the observer invocation in connExecutor. Release note: None

cli/sql: integrate the new completion engine with the shell

28317c5

Release note (cli change): The interactive SQL shell now supports a rudimentary & experimental form of tab completion to input the name of SQL objects and functions.

knz force-pushed the 20220908-completions branch from b44d969 to 28317c5 Compare December 6, 2022 20:21

knz commented Dec 6, 2022

View reviewed changes

craig bot merged commit a0c086b into cockroachdb:master Dec 6, 2022

cockroach-teamcity mentioned this pull request Dec 7, 2022

PR #87606 - cli/sql: integrate the new completion engine with the shell cockroachdb/docs#15745

Open

exalate-issue-sync bot mentioned this pull request Jan 22, 2024

Add docs for pgcli support cockroachdb/docs#12018

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: new heuristic-based completion engine #87606

sql: new heuristic-based completion engine #87606

knz commented Sep 8, 2022 •

edited

Loading

cockroach-teamcity commented Sep 8, 2022

knz commented Sep 8, 2022

rickystewart commented Sep 8, 2022

knz commented Sep 8, 2022

knz commented Sep 12, 2022

knz commented Sep 12, 2022

knz commented Dec 4, 2022

rytaft left a comment

ZhouXing19 left a comment

knz left a comment

knz commented Dec 6, 2022

craig bot commented Dec 6, 2022

craig bot commented Dec 6, 2022

sql: new heuristic-based completion engine #87606

sql: new heuristic-based completion engine #87606

Conversation

knz commented Sep 8, 2022 • edited Loading

cockroach-teamcity commented Sep 8, 2022

knz commented Sep 8, 2022

rickystewart commented Sep 8, 2022

knz commented Sep 8, 2022

knz commented Sep 12, 2022

knz commented Sep 12, 2022

knz commented Dec 4, 2022

rytaft left a comment

Choose a reason for hiding this comment

ZhouXing19 left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz commented Dec 6, 2022

craig bot commented Dec 6, 2022

craig bot commented Dec 6, 2022

knz commented Sep 8, 2022 •

edited

Loading