Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/lex: quoted SQL identifiers are not NFC-normalized #55396

Closed
knz opened this issue Oct 9, 2020 · 0 comments · Fixed by #71140
Closed

sql/lex: quoted SQL identifiers are not NFC-normalized #55396

knz opened this issue Oct 9, 2020 · 0 comments · Fixed by #71140
Assignees
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@knz
Copy link
Contributor

knz commented Oct 9, 2020

CockroachDB, like PostgreSQL, intends to preserve the case of SQL identifier that are entered between double quotes, such that e.g.

CREATE TABLE "Abc"
CREATE ROLE "Abc"

preserve the case and result in a thing named "Abc".

Meanwhile, without double quotes the identified is case-folded and unicode-folded with NFC.

However, we want the NFC to occur also with double-quoted identifiers! This is because we do not want to have e.g. two tables named "è", one named with the single accented later and one with the 3-character combined diacritic.

Today this normalization does not take place, so we can see these ambiguous table/db/column names side-by-side which is poor UX.

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect labels Oct 9, 2020
craig bot pushed a commit that referenced this issue Oct 27, 2020
55398:  *: make conversions between `string` and usernames more explicit  r=aaron-crl,arulajmani a=knz

Informs (but does not fix) #55396

Informs #54696
Informs  #55389

tldr: the conversions between "external" strings and internal
usernames was unprincipled, and it turns out, incorrect in some
cases. This patch cleans this up by introducing a strict conversion
API.

**Background**

CockroachDB currently performs case folding and unicode NFC
normalization upon receiving usernames, specifically usernames received
from as SQL login principals.

Internally, usernames are often—but not always—considered
pre-normalized for the purpose of comparisons, privilege checks, role
membership checks and the like.

Finally, sometimes usernames are reported "back to outside".
In error messages, log files etc, but also:

- in the SQL syntax produced by SHOW CREATE, SHOW SYNTAX etc.
- to generate OS-level file names for exported files.

**New API**

This patch introduces a new data type `security.SQLUsername`.
It is incomparable and non-convertible with the Go `string`.

Engineers must now declare intent about how to do the conversion:

- `security.MakeSQLUsernameFromUserInput` converts an "outside" string
  to a username.
- `security.MakeSQLUsernameFromPreNormalizedString` promises that
  its argument has already been previously normalized.
- `security.MakeSQLUsernameFromPreNormalizedStringChecked` also checks
  that its argument has already been previously normalized.

To output usernames, the following APIs are also available.

- `username.Normalized()` produces the username itself, without
  decorations. These corresponds to the raw string (after
  normalization).

- `username.SQLIdentifier()` produces the username in valid
  SQL syntax, so that it can be injected safely in a SQL
  statement.

- `(*tree.FmtCtx).FormatUsername()` takes a username and properly
  handles quoting and anonymization, like `FormatName()` does for
  `tree.Name` already.

Likewise, conversion from/to protobuf is now regulated, via the new
APIs `username.EncodeProto()` and `usernameproto.Decode()`.

**Problems being solved**

- the usernames "from outside" were normalized sometimes, *but not
  consistently*:

  1. they were in the arguments of CREATE/DROP/ALTER ROLE. This was
     not changed.
  2. they were not consistently converted in `cockroach cert`. This was
     corrected.
  3. they were not in the `cockroach userfile` commands. This
     has been adjusted with a reference to issue #55389.
  4. they are *not* in GRANT/REVOKE. This patch does not change
     this behavior, but highlights it by spelling out
     `MakeSQLUsernameFromPreNormalizedString()` in the implementation.
  5. ditto for CREATE SCHEMA ... AUTHORIZATION and ALTER ... OWNER TO
  6. they were in the argument to `LoginRequest`. This was not
     changed.
  7. they were not in the argument of the other API requests that
     allow usernames, for example `ListSessions` or `CancelQuery`.
     This was not changed, but is now documented in the API.

- the usernames "inside" were incorrectly directly injected
  in SQL statements, even though they may contain special
  characters that create SQL syntax errors.

  This has been corrected by suitable uses of the new
  `SQLIdentifier()` method.

- There was an outright bug in a call to `CreateGCJobRec` (something
  about GCing jobs), where a `Description` field was passed in lieu
  of a username for a `User` field. The implications of this are unclear.

**Status after this change**

The new API makes it possible to audit exactly where "sensitive"
username/string conversion occurs. After this patch, we find the
following uses:

- `MakeSQLUsernameFromUserInput`:

  - pgwire user auth
  - CLI URL parsing
  - `cockroach userfile`
  - `cockroach cert`
  - `(*rpc.SecurityContext).PGURL()` (unsure whether that's a good thing)
  - CREATE/DROP/ALTER ROLE
  - when using literal strings as `role_spec` in the SQL grammar

- `MakeSQLUsernameFromPreNormalizedString`:

  - role membership checks inside SQL based on data read
    from `system` tables.
  - in GRANT/REVOKE (this is surprising, see above)

- `MakeSQLUsernameFromPreNormalizedStringChecked`:

  - when intepreting the username in API query parameters, for
    those API documented as using pre-normalized usernames.

Release note: None

55986: opt: use paired joins for left semi inverted joins r=sumeerbhola a=sumeerbhola

Semi joins using the inverted index used to be converted
to an inner inverted join followed by an inner lookup join
and a distinct for de-duplication. They are now converted
to paired-joins consisting of an inner inverted join
followed by a left semi lookup join, which is more
efficient.

Release note: None

55997: sql: fix object lookup for ALTER TABLE ... RENAME TO ... r=ajwerner a=jayshrivastava

Previously, ALTER TABLE schema.table RENAME TO schema.existing_table
would fail because the statement would look up the destination table in
the public schema only. This commit fixes this behavior by using
the correct destination schema.

Fixes #55985

Release note: None


Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
@knz knz added the S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. label Nov 18, 2020
@knz knz added the A-schema-descriptors Relating to SQL table/db descriptor handling. label Nov 18, 2020
@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@craig craig bot closed this as completed in 6159a73 Oct 12, 2021
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-schema-deprecated Use T-sql-foundations instead label May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants