Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deprecate ordinal column references #93754

Merged
merged 1 commit into from
Dec 20, 2022

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Dec 15, 2022

sql: deprecate ordinal column references

This commit disallows ordinal column references by default. Any
statements using them will result in an error. The session setting
allow_ordinal_column_references can be set to true to revert to
previous behavior.

Note that numeric tuple indexing (e.g., SELECT ((1,2,3)).@2) is still
allowed.

This deprecation is motivated by the inconsistent and surprising
behavior that ordinal column references provide. Ordinal column
references (e.g., SELECT @1 FROM t) are a vestigial and undocumented
feature originally motivated to aid the heuristic planner. The PR that
added them, #10729, discourages their use by users because "they are not
robust against schema changes". It also points out a subtle difference
between ordinal column references and SQL standard ordinals that could
confuse users. As an example, the following statements are not
equivalent:

SELECT @2 FROM t ORDER BY @1;

SELECT @2 FROM t ORDER BY 1;

In the current implementation, an ordinal column reference @n resolves
to the n-th column in the current scope's slice of columns. This has
several implications:

  1. Ordinal columns can refer to any column, not just columns of data
    sources in FROM clauses, as was originally intended. This makes it
    hard to reason about the resolution of an ordinal column reference.
    For example, it's somewhat surprising that the result of the SELECT
    below is not (10, 1).
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES (1, 10);

SELECT @2, @1 FROM (SELECT @2, @1 FROM t);
  ?column? | ?column?
-----------+-----------
         1 |       10
(1 row)
  1. The ordering of columns in the scope's slice is not guaranteed to be
    consistent, so any reasonable ordinal column resolution occurs
    more-or-less by chance. As an example of unexpected behavior,
    consider:
CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10)));
INSERT INTO t(a) VALUES (1);
ALTER TABLE t ADD COLUMN b INT;

SELECT @1, @2 FROM t;
  ?column? | ?column?
-----------+-----------
         1 |       11
(1 row)

Most users would expect the result of the SELECT statement to be
(1, NULL) because @1 should resolve to a and @2 should resolve
to b. Instead, the ordinal column reference @2 circumvents logic
that keeps inaccessible columns from being referenced, and resolves to
the virtual computed column backing the secondary index.

Epic: None

Release note (sql change): Ordinal column references (e.g.,
SELECT @1, @2 FROM t) are now deprecated. By default, statements using
this syntax will now result in an error. They can be allowed using the
session setting SET allow_ordinal_column_references=true. Support for
ordinal column references is scheduled to be removed in version 23.2.

@mgartner mgartner requested review from knz, rafiss and ajwerner December 15, 2022 22:17
@mgartner mgartner requested a review from a team as a code owner December 15, 2022 22:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor

knz commented Dec 15, 2022

Huh, but what feature then remains to select a column from a source by ordinal position? like "i want just the first column"

@knz
Copy link
Contributor

knz commented Dec 15, 2022

also, what's the motivation for this change generally?

@knz
Copy link
Contributor

knz commented Dec 15, 2022

maybe I can still do select (t.*).@1 from t instead of select @1 from t?

@ajwerner
Copy link
Contributor

Huh, but what feature then remains to select a column from a source by ordinal position? like "i want just the first column"

When does this come up?

@knz
Copy link
Contributor

knz commented Dec 16, 2022

When does this come up?

  • select @1 from pg_get_keywords() because I never remember the name
  • select @2, @1 from ... to invert the order of columns after a join

NB I'm ok with the change if we don't deprecate the syntax (t.*).@N which achieves the same (as far as I can see)

@mgartner mgartner force-pushed the deprecate-ordinal-col-refs branch 2 times, most recently from 5297fc9 to ccd263e Compare December 16, 2022 15:46
@mgartner
Copy link
Collaborator Author

I've added an explanation of the motivation to the commit message.

@knz
Copy link
Contributor

knz commented Dec 16, 2022

cheers that works for me 👍

This commit disallows ordinal column references by default. Any
statements using them will result in an error. The session setting
`allow_ordinal_column_references` can be set to true to revert to
previous behavior.

Note that numeric tuple indexing (e.g., `SELECT ((1,2,3)).@2`) is still
allowed.

This deprecation is motivated by the inconsistent and surprising
behavior that ordinal column references provide. Ordinal column
references (e.g., `SELECT @1 FROM t`) are a vestigial and undocumented
feature originally motivated to aid the heuristic planner. The PR that
added them, cockroachdb#10729, discourages their use by users because "they are not
robust against schema changes". It also points out a subtle difference
between ordinal column references and SQL standard ordinals that could
confuse users. As an example, the following statements are not
equivalent:

```sql
SELECT @2 FROM t ORDER BY @1;

SELECT @2 FROM t ORDER BY 1;
```

In the current implementation, an ordinal column reference `@n` resolves
to the n-th column in the current scope's slice of columns. This has
several implications:

1. Ordinal columns can refer to any column, not just columns of data
   sources in `FROM` clauses, as was originally intended. This makes it
   hard to reason about the resolution of an ordinal column reference.
   For example, it's somewhat surprising that the result of the `SELECT`
   below is not `(10, 1)`.

```
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES (1, 10);

SELECT @2, @1 FROM (SELECT @2, @1 FROM t);
  ?column? | ?column?
-----------+-----------
         1 |       10
(1 row)
```

2. The ordering of columns in the scope's slice is not guaranteed to be
   consistent, so any reasonable ordinal column resolution occurs
   more-or-less by chance. As an example of unexpected behavior,
   consider:

```
CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10)));
INSERT INTO t(a) VALUES (1);
ALTER TABLE t ADD COLUMN b INT;

SELECT @1, @2 FROM t;
  ?column? | ?column?
-----------+-----------
         1 |       11
(1 row)
```

Most users would expect the result of the `SELECT` statement to be
`(1, NULL)` because `@1` should resolve to `a` and `@2` should resolve
to `b`. Instead, the ordinal column reference `@2` circumvents logic
that keeps inaccessible columns from being referenced, and resolves to
the virtual computed column backing the secondary index.

Epic: None

Release note (sql change): Ordinal column references (e.g.,
`SELECT @1, @2 FROM t`) are now deprecated. By default, statements using
this syntax will now result in an error. They can be allowed using the
session setting `SET allow_ordinal_column_references=true`. Support for
ordinal column references is scheduled to be removed in version 23.2.
@mgartner mgartner force-pushed the deprecate-ordinal-col-refs branch from ccd263e to cfa32f1 Compare December 16, 2022 23:28
@mgartner mgartner requested a review from a team as a code owner December 16, 2022 23:28
@mgartner mgartner requested a review from a team December 16, 2022 23:28
@mgartner mgartner requested a review from a team as a code owner December 16, 2022 23:28
@mgartner mgartner requested review from benbardin and removed request for a team December 16, 2022 23:28
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 37 of 37 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @benbardin, @knz, and @rafiss)

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 20, 2022

Build succeeded:

@craig craig bot merged commit 060cdcb into cockroachdb:master Dec 20, 2022
mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 20, 2022
This commit removes some leftover ordinal column references in
`partialidx.Implicator`'s benchmarks. These benchmarks are failing
since cockroachdb#93754 which deprecates ordinal column references.

Epic: None

Release note: None
@mgartner mgartner deleted the deprecate-ordinal-col-refs branch December 20, 2022 20:34
craig bot pushed a commit that referenced this pull request Dec 20, 2022
93858: xform: use ordering from LIMIT as a hint for streaming group-by r=DrewKimball,rharding6373 a=msirek

Fixes #93410

A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided.

This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.

93992: cli,workload: don't hide useful workloads r=j82w a=knz

Prior to this patch, at least half of the workloads were hidden from view in the output of `cockroach --help`.

There was no good reason for this: most of the workloads are useful for teaching/learning and for experimentation. They all deserve more exposure, so that folk can learn about them without being told by the one person who built the workload in the first place.

So this patch fixes that by exposing of all of them through the online help.

One question that could remain is how much teaching value there is in letting someone experiment with a tool that was built for the benefit of one team only. One specific workload is under consideration here: `bulkingest`, used for benchmarking inside the D&R team, does not really do anything akin to what an end-user would possibly expect to do with a database. For that workload, and the benefit of future workloas akin to it, this patch adds a notice in its help text that it was developed for internal testing only.

Release note: None
Epic: None

94005: opt/partialidx: fix Implicator benchmark r=mgartner a=mgartner

This commit removes some leftover ordinal column references in `partialidx.Implicator`'s benchmarks. These benchmarks are failing since #93754 which deprecates ordinal column references.

Epic: None

Release note: None

Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this pull request Dec 21, 2022
This commit fixes a broken roachtest caused by cockroachdb#93754.

Fixes cockroachdb#94058

Release note: None
craig bot pushed a commit that referenced this pull request Dec 21, 2022
93991: tree: correct mutation/DDL property for some opaque operators r=ZhouXing19 a=rafiss

fixes #91713

In ed733ad, a framework was added to mark certain opaque operators as DDL or mutations.

This was enhanced in 06581b3, but that change wasn't exhaustive since it marked some statements as read-only, even if they could perform DDL.

With the addition of `StatementType()` in
8962176, we can make this a little more correct.

This allows the check at
https://github.com/cockroachdb/cockroach/blob/48ef0d89e6179c0d348a5236ad308d81fa392f7c/pkg/sql/opt/exec/execbuilder/relational.go#L163 to work correctly, and reject operations that shouldn't be allowed when using a read-only transaction.

To explain each change:
- BACKUP can modify job state and write to userfiles, so shouldn't be allowed in read-only mode.
- SET commands are always allowed in read-only mode in order to match Postgres behavior, and since those changes are all in-memory and session setting modifications don't respect transactions anyway.
- The crdb_internal tenant functions modify system tables.
- GRANT, REVOKE, and many other privilege-related statements are "DCL" (data control language), and all modify system tables or descriptors.

Release note (bug fix): CREATE ROLE, DELETE ROLE, GRANT, and REVOKE statements no longer work when the transaction is in read-only mode.

94025: storage: return error from {MVCC,Engine}Iterator.Value r=mgartner,nicktrav a=sumeerbhola

Informs cockroachdb/pebble#1170

Epic: CRDB-20378

Release note: None

94031: kvstreamer: fix the usage of the range iterator r=yuzefovich a=yuzefovich

Previously, after `Seek`ing the range iterator to the next key in the batch of requests in the streamer we forgot to check the validity of the iterator. In particular, this could lead to a crash of the process if `Seek` encountered an error for whatever reason. In practice, I've only observed this when running TPCH with high concurrency when GOMEMLIMIT is set.

The bug was introduced in an innocently-looking refactor in 041b104.

Epic: None

Release note (bug fix): CockroachDB could previously crash in rare circumstances when evaluating lookup and index joins. The bug is present since 22.2.0 release. Temporary workaround without upgrading to the release with this fix is changing the value of undocumented cluster setting `sql.distsql.use_streamer.enabled` to `false`.

94073: roachtest: replace ordinal column reference in transfer-leases r=mgartner a=mgartner

This commit fixes a broken roachtest caused by #93754.

Fixes #94058

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
adityamaru pushed a commit to adityamaru/cockroach that referenced this pull request Dec 22, 2022
This commit fixes a broken roachtest caused by cockroachdb#93754.

Fixes cockroachdb#94058

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants