Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: new heuristic-based completion engine #87606

Merged
merged 6 commits into from
Dec 6, 2022

Conversation

knz
Copy link
Contributor

@knz knz commented Sep 8, 2022

This PR extends the server-side completion logic available under SHOW COMPLETIONS.

Epic: CRDB-22182
Fixes #37355.
Intended for use together with #86457.

The statement now returns 5 columns: completion, category, description, start, end.
This change is backward-compatible with the CLI code in previous versions, which was not checking the number of columns.

It works roughly as follows:

  1. first the input is scanned into a token array.

  2. then the token array is translated to a sketch: a string with the
    same length as the token array, where each character corresponds to an
    input token and an extra character indicating where the completion was
    requested.

  3. then the completion rules are applied in sequence. Each rule
    inspects the sketch (and/or the token sequence) and decides whether
    to do nothing (skip to next rule), or to run some SQL which will
    return a batch of completion candidates.

    Each method operates exclusively on the token sequence, and does
    not require the overall SQL syntax to be valid or complete.

Also, the completion engine executes in a streaming fashion, so that
there is no need to accumulate all the completion candidates in RAM
before returning them to the client. This prevents excessive
memory usage server-side, and offers the client an option to cancel
the search once enough suggestions have been received.

Code organization:

  • new package sql/compengine: the core completion engine, also
    where sketches are computed.

    Suggested reading: api.go in the new package.

  • new package sql/comprules: the actual completion
    methods/heuristics, where sketches are mapped into SQL queries
    that provide the completion candidates.

    Suggested reading: the test cases under testdata.

Some more words about sketches:

For example, SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from sc.mytable;' produces the sketch "ii'ii.i;" where i indicates an
identifier-like token and ' indicates the cursor position.

The purpose of the sketch is to simplify the pattern matching
performed by the completion heuristics, and enables the application of
regular expressions on the token sequence.

For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp [^.;]i\.(['_]|i'): an identifier not following a
period or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).

Supersedes #45186.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Sep 8, 2022

@rickystewart when you get a chance (NOT URGENT) could you help with this?

I've defined a new generator rule in pkg/sql/scanner that generates token_names_test.go.

Because it's generated, I'm not adding it to the srcs in the go_test() invocation.

However every time I run dev generate bazel, bazel/gazelle auto-adds it to the srcs list. Then if I run generate go the build starts failing because the file exists both in my work tree and in the sandbox.

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

@rickystewart
Copy link
Collaborator

Do you know how I can prevent dev generate bazel from adding the generated file to srcs?

Yes -- add a comment to BUILD.bazel (probably the one in your package subdirectory) like # gazelle:exclude token_names_test.go.

@knz
Copy link
Contributor Author

knz commented Sep 8, 2022

thank you!

@knz knz force-pushed the 20220908-completions branch 2 times, most recently from 8e94931 to 9c6953e Compare September 12, 2022 02:01
@knz
Copy link
Contributor Author

knz commented Sep 12, 2022

@rickystewart hi, for when you have time (NOT URGENT), this PR could use some help:

There is a new generated file in sql/scanner (token_names_test.go).
The linter is now complaining that it cannot build/lint the code because this generated file is missing.

How can I add the generation of this file to the dependencies for the linter?

@knz
Copy link
Contributor Author

knz commented Sep 12, 2022

Oh nvm, found it -- it's because I forgot to update the Makefile.

@knz knz force-pushed the 20220908-completions branch 6 times, most recently from 7125819 to 9a4f804 Compare September 12, 2022 22:28
@knz knz changed the title sql/scanner: new Inspect function to try scanning harder sql: new heuristic-based completion engine Sep 12, 2022
@knz knz force-pushed the 20220908-completions branch from 9a4f804 to b00ea07 Compare September 16, 2022 12:48
craig bot pushed a commit that referenced this pull request Sep 16, 2022
88009: sql: use schema desc ID for OIDs in pg_catalog r=rafiss a=knz

Needed for #87606.

Prior to this patch, a synthetic OID was used for
"namespace" (schema) references in `pg_catalog`.

This was making it difficult/impossible to retrieve schema comments using a non-`root` SQL query, e.g. via

```sql
SELECT nspname AS schema,
       coalesce(pc.comment, sc.comment) as description
  FROM pg_catalog.pg_namespace t
LEFT OUTER JOIN system.comments sc
    ON t.oid = sc.object_id AND sc.type = 4
LEFT OUTER JOIN crdb_internal.predefined_comments pc
    ON t.oid = pc.object_id AND pc.type = 4
```

This patch fixes it.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz force-pushed the 20220908-completions branch 4 times, most recently from a049718 to 110fc48 Compare September 18, 2022 22:01
@knz knz force-pushed the 20220908-completions branch 3 times, most recently from 05b6fc2 to 2c7954d Compare September 25, 2022 16:31
craig bot pushed a commit that referenced this pull request Dec 1, 2022
86457: cli: replace libedit with bubbline r=ZhouXing19 a=knz

First commit from #88574.
Benefits from (but is not dependent on) #87606 server-side.

Fixes #21826
Fixes #71207
Fixes #71209
Fixes #57885

NB: this will benefit from upstream library releases based off the still-unmerged PRs listed in knz/bubbline#2.

Release justification: n/a will not merge before stability ends

Release note (cli change): The engine used as line editor in the
interactive shell (`cockroach sql`, `demo`) has been updated. It
includes numerous bug fixes and new features.
The previous engine can still be accessed via the env
var COCKROACH_SQL_FORCE_LIBEDIT. This support will be removed
in a later version.

92335: kvserver,logstore: introduce log StateLoader r=tbg a=pavelkalinnikov

Previously the `StateLoader` accessed both log and state machine state. This commit moves most of the log-specific accesstors to the `logstore` package.

Part of #91979
Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot pushed a commit that referenced this pull request Dec 2, 2022
88219: {bazci,dev}: send build events to beaver hub r=rickystewart a=healthy-pod

This code change lets `bazci` and `dev` send build events to beaver hub so
we can start collecting data about builds in CI and locally.

Release note: None
Epic: [CRDB-8350](https://cockroachlabs.atlassian.net/browse/CRDB-8350)

92263: sql,cli: add redacted sql stmts to debug zip r=xinhaoz a=xinhaoz

## Commit 1

This commit adds the builtin, `crdb_internal.anonymize_sql_constants`
which takes in a sql string and returns it with  constants redacted.
This will be used to redact columns that are sql stmts in the
redacted debug zip.

Release note: None

## Commit 2
Closes #88823

This commit adds the following fields to the redacted
debug zip:

crdb_internal.create_statements:
- create_statement
- - create_nofks
- alter_statements (each elem is redacted)

crdb_internal.create_function_statements:
- create_statement

crdb_internal.{node,cluster}_distsql_flows:
- stmt

crdb_internal.{cluster,node}_sessions:
- last_active
- active_queries

crdb_internal.{cluster,node}_queries:
- query

Release note (cli change):
The following fields have been redacted and added to
the redacted debug zip:
crdb_internal.create_statements:
- create_statement
- create_nofks
- alter_statements (each elem is redacted)

crdb_internal.create_function_statements:
- create_statement
- 
crdb_internal.{node,cluster}_distsql_flows:
- stmt

crdb_internal.{cluster,node}_sessions:
- last_active
- active_queries

crdb_internal.{cluster,node}_queries:
- query



-----------
Running ycsb, tpcc and movr default workloads for 15 minutes and requesting the debug zip on a fresh node in master vs with new changes:
master
<img width="927" alt="Pasted Graphic" src="https://user-images.githubusercontent.com/20136951/203359099-6a035d1c-00fe-4bbb-92c2-d1c2b2a3b706.png">

branch
<img width="978" alt="Pasted Graphic 2" src="https://user-images.githubusercontent.com/20136951/203359158-a8b02ccb-2d82-40b4-a33d-9faf6ddaab70.png">


92627: leaktest, stop: don't recover from panics r=andreimatei a=andreimatei

See individual commits.

Epic: None

92793: roachtest: update activerecord blocklist and ignore list r=ZhouXing19,rafiss a=andyyang890

This patch updates the blocklist and ignore list for the
activerecord roachtest. This patch also eliminates the
per-version lists since each release branch uses their
own list now.

Fixes #84955

Release note: None

92856: log: fix flakiness in TestStatusLogRedaction r=dhartunian a=abarganier

TestStatusLogRedaction was continuously failing when run with `--race`.

The test searches for a specific log line in the results from two separate RPCs - `LogFile()` and `Log()`. The test consistently failed when checking the results from the `Log()` RPC because the endpoint has a default max number of log entries set to 1000. Frequently, the log entry that we're searching for has a line index in the range of [1500, 2000], so the test was failing to find the log entry since it was being filtered out due to the default max number of entries.

This patch simply sets the max to a high enough value where this should no longer happen. With multiple runs I never saw the total # of log lines exceed 2,500, so we set the max to 5,000 to be safe.

Release note: None

Fixes: #92789

92929: rttanalysis: skip test which doesn't work due to tracing limitations r=ajwerner a=ajwerner

This test ends up not seeing all the round trips because the trace is way too large. Given that, we get flakes sometimes. See the referenced issue (#88885).

Fixes #92770

Release note: None

92934: clisqlshell: fix a panic on tab key r=rail a=knz

Fixes  #92935.

This patch ensures that there's no visible panic if the user presses "tab" and there's no completion available.

NB: no unit tests here - the test coverage will be delivered by #87606.

92942: kvserver: remove the assertion on not exceeding MaxSpanRequestKeys r=yuzefovich a=yuzefovich

This commit fixes the bug in a recently merged commit where we allowed to exceed `MaxSpanRequestKeys` in some cases - we forgot to update the remaining limit to be "exhausted". Furthermore, this commit removes the assertion altogether.

Epic: None

Release note: None

92953: ui: fix link for index from insights r=maryliag a=maryliag

Previously, the link to index details on the
drop index insights was using the URL format used by DB Console only. This commit updates to use the correct format when loading from CC Console.

Fix #92944

Release note (bug fix): Fix link to index details on drop index insights on CC Console.

Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: maryliag <[email protected]>
@knz knz force-pushed the 20220908-completions branch from 2c7954d to b44d969 Compare December 4, 2022 16:04
@knz knz requested a review from ZhouXing19 December 4, 2022 16:04
@knz knz marked this pull request as ready for review December 4, 2022 16:04
@knz knz requested a review from a team December 4, 2022 16:04
@knz knz requested a review from a team as a code owner December 4, 2022 16:04
@knz knz requested a review from a team December 4, 2022 16:04
@knz knz requested review from a team as code owners December 4, 2022 16:04
@knz
Copy link
Contributor Author

knz commented Dec 4, 2022

This is now rebased, with more tests added, ready for review.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! I just looked at the second commit (the one that updates the optimizer)

Reviewed 27 of 27 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @ZhouXing19)


pkg/sql/opt/exec/factory.opt line 730 at r2 (raw file):

define ShowCompletions {
    Command *tree.ShowCompletions
}

I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)


pkg/sql/opt/exec/execbuilder/statement.go line 321 at r2 (raw file):

		return execPlan{}, err
	}
	// ControlSchedules returns no columns.

nit: update comment


pkg/sql/opt/exec/explain/testdata/gists line 1012 at r2 (raw file):


# cancelQueriesOp
gist-explain-roundtrip

Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!

Copy link
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so COOL! :lgtm_strong: Thanks for adding such comprehensive tests as well.
Most of my comments are just for nits.

Reviewed 8 of 8 files at r1, 27 of 27 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, 23 of 23 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz)


pkg/sql/conn_executor_exec.go line 2007 at r5 (raw file):

) error {
	res.SetColumns(ctx, colinfo.ShowCompletionsColumns)
	log.Warningf(ctx, "COMPLETION GENERATOR FOR: %+v", *n)

I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:

TODO(janexing): better bind the internal executor with the txn.

Thanks!


pkg/sql/compengine/sketch.go line 54 at r5 (raw file):

	var buf strings.Builder
	if offset < int(tokens[0].Start) {
		buf.WriteByte('_')

nit: change it to buf.WriteByte(byte(MarkCursorInSpace))


pkg/sql/compengine/sketch.go line 76 at r5 (raw file):

	// Requested offset at or beyond EOF.
	if cursorPosition == 0 && offset >= int(tokens[len(tokens)-1].Start) {

nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have >= nt(tokens[len(tokens)-1].End).


pkg/sql/compengine/sketch.go line 82 at r5 (raw file):

		// outside of the string, so that indexing using them is always
		// possible.
		offset = int(tokens[len(tokens)-1].Start)

ditto


pkg/sql/compengine/sketch.go line 144 at r5 (raw file):

			pos, len(sketch), sketch)
	}
	if sketch[pos] != '_' && sketch[pos] != '\'' {

nit: change it to if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken


pkg/sql/comprules/rules.go line 156 at r5 (raw file):

	c.Trace("completing for %q (%d,%d)", prefix, start, end)
	// TODO(knz): use the comment system instead of crdb_internal.

Just out of curiosity -- could you say more about the "comment system"?


pkg/sql/comprules/rules.go line 174 at r5 (raw file):

// A database name can only occur after a keyword or a comma (,).
var compDbRe = regexp.MustCompile(`i(i'|_)|,(['_]|i')`)

nit: For this regex, seems that sketches ,', ,_ and ,i also match. Is this expected? Should we ask for an identifier token at the beginning?


pkg/sql/compengine/sketch_test.go line 54 at r5 (raw file):

			fmt.Fprintln(tw, "--")
			c := completions{tokens: tokens, sketch: sketch, queryTokIdx: pos}
			for i := -2; i <= 2; i++ {

nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.


pkg/sql/completions.go line 41 at r5 (raw file):

	txn := params.p.Txn()
	queryIterFn := func(ctx context.Context, opName string, stmt string, args ...interface{}) (sqlutil.InternalRows, error) {
		return ie.QueryIteratorEx(ctx, opName, txn,

We can just use params.p.QueryIteratorEx() here.
initInternalExecutor() is expected to be used only in planner.WithInternalExecutor() or query/exec functions such as planner.ExecEx().


pkg/sql/completions_test.go line 1 at r3 (raw file):

// Copyright 2022 The Cockroach Authors.

nit: We can remove this file.

knz added 6 commits December 6, 2022 20:35
Before, the planning was short cut using a delegate node.
We want the show completion code to become more intelligent;
for this, we need proper planning. This commit implements that.

This also makes SHOW COMPLETIONS return 5 columns instead of just 1.
We will use this in subsequent commits.

Release note: None
We will want to run it without a planNode / planner context.
Also, this encapsulation makes it possible to share some code
with the observer invocation in connExecutor.

Release note: None
This commit introduces a new heuristic-based completion engine using
sketches of the input token sequence.

It works roughly as follows:

1. first the input is scanned into a token array.

2. then the token array is translated to a *sketch*: a string with the
   same length as the token array, where each character corresponds to an
   input token and an extra character indicating where the completion was
   requested.

3. then the completion rules are applied in sequence. Each rule
   inspects the sketch (and/or the token sequence) and decides whether
   to do nothing (skip to next rule), or to run some SQL which will
   return a batch of completion candidates.

   Each method operates exclusively on the token sequence, and does
   not require the overall SQL syntax to be valid or complete.

Also, the completion engine executes in a streaming fashion, so that
there is no need to accumulate all the completion candidates in RAM
before returning them to the client. This prevents excessive
memory usage server-side, and offers the client an option to cancel
the search once enough suggestions have been received.

Code organization:

- new package `sql/compengine`: the core completion engine, also
  where sketches are computed.

  Suggested reading: `api.go` in the new package.

- new package `sql/comprules`: the actual completion
  methods/heuristics, where sketches are mapped into SQL queries
  that provide the completion candidates.

  Suggested reading: the test cases under `testdata`.

Some more words about sketches:

For example, `SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from
sc.mytable;'` produces the sketch `"ii'ii.i;"` where `i` indicates an
identifier-like token and `'` indicates the cursor position.

The purpose of the sketch is to simplify the pattern matching
performed by the completion heuristics, and enables the application of
regular expressions on the token sequence.

For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp `[^.;]i\.(['_]|i')`: an identifier not following a
period or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).

Release note: None
Release note (cli change): The interactive SQL shell now supports
a rudimentary & experimental form of tab completion to input the
name of SQL objects and functions.
@knz knz force-pushed the 20220908-completions branch from b44d969 to 28317c5 Compare December 6, 2022 20:21
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @ZhouXing19)


pkg/sql/conn_executor_exec.go line 2007 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

I'm a bit concerned about the usage of the internal executor here (see #91004 if you're curious). Could you add a comment:

TODO(janexing): better bind the internal executor with the txn.

Thanks!

Done.


pkg/sql/compengine/sketch.go line 54 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: change it to buf.WriteByte(byte(MarkCursorInSpace))

Done.


pkg/sql/compengine/sketch.go line 76 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: iiuc, the last token is always the EOF, so the start and the end indexes are the same. It may be easier to understand if we have >= nt(tokens[len(tokens)-1].End).

Done.


pkg/sql/compengine/sketch.go line 82 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

ditto

Done.


pkg/sql/compengine/sketch.go line 144 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: change it to if sketch[pos] != MarkCursorInSpace && sketch[pos] != MarkCursorInToken

Done.


pkg/sql/comprules/rules.go line 156 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Just out of curiosity -- could you say more about the "comment system"?

Things from pg_catalog. Explained in comment.


pkg/sql/comprules/rules.go line 174 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: For this regex, seems that sketches ,', ,_ and ,i also match. Is this expected? Should we ask for an identifier token at the beginning?

matching on ,' doesn't seem right, good catch. Modified.

,_ is expected, because we have select * from db1.t1, db2.t2, we want to see databases listed after the comma.


pkg/sql/opt/exec/factory.opt line 730 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this needs to go to the end of this file to avoid causing problems with gists (see the comment on line 39 of plan_gist_factory.go)

Done.


pkg/sql/opt/exec/execbuilder/statement.go line 321 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: update comment

Good catch.


pkg/sql/opt/exec/explain/testdata/gists line 1012 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you please add a test in this file for SHOW COMPLETIONS and make sure that none of these other tests change? Thank you!

Good idea. Done.


pkg/sql/compengine/sketch_test.go line 54 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: could you add a comment for this? IIUC we're printing out the token corresponding to the current cursor, and also its surrounding tokens.

Done.


pkg/sql/completions.go line 41 at r5 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

We can just use params.p.QueryIteratorEx() here.
initInternalExecutor() is expected to be used only in planner.WithInternalExecutor() or query/exec functions such as planner.ExecEx().

Done.

@knz
Copy link
Contributor Author

knz commented Dec 6, 2022

TFYRs!

bors r=rytaft,ZhouXing19

@craig
Copy link
Contributor

craig bot commented Dec 6, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 6, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli: tab complete in cockroach CLI
5 participants