-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/lex: quoted SQL identifiers are not NFC-normalized #55396
Labels
A-schema-descriptors
Relating to SQL table/db descriptor handling.
A-sql-syntax
Issues strictly related to the SQL grammar, with no semantic aspect
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3-ux-surprise
Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Comments
knz
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-sql-syntax
Issues strictly related to the SQL grammar, with no semantic aspect
labels
Oct 9, 2020
craig bot
pushed a commit
that referenced
this issue
Oct 27, 2020
55398: *: make conversions between `string` and usernames more explicit r=aaron-crl,arulajmani a=knz Informs (but does not fix) #55396 Informs #54696 Informs #55389 tldr: the conversions between "external" strings and internal usernames was unprincipled, and it turns out, incorrect in some cases. This patch cleans this up by introducing a strict conversion API. **Background** CockroachDB currently performs case folding and unicode NFC normalization upon receiving usernames, specifically usernames received from as SQL login principals. Internally, usernames are often—but not always—considered pre-normalized for the purpose of comparisons, privilege checks, role membership checks and the like. Finally, sometimes usernames are reported "back to outside". In error messages, log files etc, but also: - in the SQL syntax produced by SHOW CREATE, SHOW SYNTAX etc. - to generate OS-level file names for exported files. **New API** This patch introduces a new data type `security.SQLUsername`. It is incomparable and non-convertible with the Go `string`. Engineers must now declare intent about how to do the conversion: - `security.MakeSQLUsernameFromUserInput` converts an "outside" string to a username. - `security.MakeSQLUsernameFromPreNormalizedString` promises that its argument has already been previously normalized. - `security.MakeSQLUsernameFromPreNormalizedStringChecked` also checks that its argument has already been previously normalized. To output usernames, the following APIs are also available. - `username.Normalized()` produces the username itself, without decorations. These corresponds to the raw string (after normalization). - `username.SQLIdentifier()` produces the username in valid SQL syntax, so that it can be injected safely in a SQL statement. - `(*tree.FmtCtx).FormatUsername()` takes a username and properly handles quoting and anonymization, like `FormatName()` does for `tree.Name` already. Likewise, conversion from/to protobuf is now regulated, via the new APIs `username.EncodeProto()` and `usernameproto.Decode()`. **Problems being solved** - the usernames "from outside" were normalized sometimes, *but not consistently*: 1. they were in the arguments of CREATE/DROP/ALTER ROLE. This was not changed. 2. they were not consistently converted in `cockroach cert`. This was corrected. 3. they were not in the `cockroach userfile` commands. This has been adjusted with a reference to issue #55389. 4. they are *not* in GRANT/REVOKE. This patch does not change this behavior, but highlights it by spelling out `MakeSQLUsernameFromPreNormalizedString()` in the implementation. 5. ditto for CREATE SCHEMA ... AUTHORIZATION and ALTER ... OWNER TO 6. they were in the argument to `LoginRequest`. This was not changed. 7. they were not in the argument of the other API requests that allow usernames, for example `ListSessions` or `CancelQuery`. This was not changed, but is now documented in the API. - the usernames "inside" were incorrectly directly injected in SQL statements, even though they may contain special characters that create SQL syntax errors. This has been corrected by suitable uses of the new `SQLIdentifier()` method. - There was an outright bug in a call to `CreateGCJobRec` (something about GCing jobs), where a `Description` field was passed in lieu of a username for a `User` field. The implications of this are unclear. **Status after this change** The new API makes it possible to audit exactly where "sensitive" username/string conversion occurs. After this patch, we find the following uses: - `MakeSQLUsernameFromUserInput`: - pgwire user auth - CLI URL parsing - `cockroach userfile` - `cockroach cert` - `(*rpc.SecurityContext).PGURL()` (unsure whether that's a good thing) - CREATE/DROP/ALTER ROLE - when using literal strings as `role_spec` in the SQL grammar - `MakeSQLUsernameFromPreNormalizedString`: - role membership checks inside SQL based on data read from `system` tables. - in GRANT/REVOKE (this is surprising, see above) - `MakeSQLUsernameFromPreNormalizedStringChecked`: - when intepreting the username in API query parameters, for those API documented as using pre-normalized usernames. Release note: None 55986: opt: use paired joins for left semi inverted joins r=sumeerbhola a=sumeerbhola Semi joins using the inverted index used to be converted to an inner inverted join followed by an inner lookup join and a distinct for de-duplication. They are now converted to paired-joins consisting of an inner inverted join followed by a left semi lookup join, which is more efficient. Release note: None 55997: sql: fix object lookup for ALTER TABLE ... RENAME TO ... r=ajwerner a=jayshrivastava Previously, ALTER TABLE schema.table RENAME TO schema.existing_table would fail because the statement would look up the destination table in the public schema only. This commit fixes this behavior by using the correct destination schema. Fixes #55985 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: sumeerbhola <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]>
knz
added
the
S-3-ux-surprise
Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
label
Nov 18, 2020
rafiss
added
the
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
label
May 12, 2021
exalate-issue-sync
bot
removed
the
T-sql-schema-deprecated
Use T-sql-foundations instead
label
May 10, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-schema-descriptors
Relating to SQL table/db descriptor handling.
A-sql-syntax
Issues strictly related to the SQL grammar, with no semantic aspect
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3-ux-surprise
Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
CockroachDB, like PostgreSQL, intends to preserve the case of SQL identifier that are entered between double quotes, such that e.g.
preserve the case and result in a thing named "Abc".
Meanwhile, without double quotes the identified is case-folded and unicode-folded with NFC.
However, we want the NFC to occur also with double-quoted identifiers! This is because we do not want to have e.g. two tables named
"è"
, one named with the single accented later and one with the 3-character combined diacritic.Today this normalization does not take place, so we can see these ambiguous table/db/column names side-by-side which is poor UX.
The text was updated successfully, but these errors were encountered: