Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: decide on JSON ordering before 23.2 release #105668

Closed
rafiss opened this issue Jun 27, 2023 · 6 comments · Fixed by #101260
Closed

sql: decide on JSON ordering before 23.2 release #105668

rafiss opened this issue Jun 27, 2023 · 6 comments · Fixed by #101260
Assignees
Labels
A-sql-encoding Relating to the SQL/KV encoding. branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-sql-queries SQL Queries Team

Comments

@rafiss
Copy link
Collaborator

rafiss commented Jun 27, 2023

#101260 was created for the following reason:

Currently, #97928 and #99275 are responsible for laying out
a lexicographical ordering for JSON columns to be forward
indexable in nature. This ordering is based on the rules
posted by Postgres and is defined here: #99849

However, Postgres currently sorts the empty JSON array
before any other JSON values. An issue has been opened:
postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

Thus, this PR intends on replicating this behaviour until
the issue has been identified and resolved by Postgres.

However, @mgartner says:

Before this is merged, we'd like to check the SQL 2023 spec which reportedly defines the ordering of JSON values. If the ordering in the spec does not follow this empty array inconsistency of Postgres, we're probably fine with sticking with the current ordering. If the spec defines an ordering completely different than Postgres, we'll have to reconsider our options. This needs to be done before v23.2 is shipped.

This issue tracks that decision, and is a release blocker, since it will be a headache to change the ordering after the release goes out.

Epic: CRDB-24501

Jira issue: CRDB-29144

@rafiss rafiss added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-encoding Relating to the SQL/KV encoding. branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 27, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 27, 2023
@exalate-issue-sync exalate-issue-sync bot added T-sql-queries SQL Queries Team and removed T-sql-queries SQL Queries Team labels Jun 27, 2023
@mgartner mgartner self-assigned this Jun 28, 2023
@mgartner
Copy link
Collaborator

Thanks for creating this issue. This is currently waiting on me to examine the SQL:2023 spec and see if it details the ordering of JSON values.

@rytaft
Copy link
Collaborator

rytaft commented Jun 28, 2023

@rafiss @mgartner Can we call this a GA-blocker instead of release-blocker?

@mgartner mgartner added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 5, 2023
@mgartner mgartner moved this to 23.2 Release in SQL Queries Jul 24, 2023
@mgartner
Copy link
Collaborator

Empty JSON Array Comparison

The 2023 SQL spec (private copy) specifies the high level ordering of JSON values as:

null < scalar < array < object

Where scalar is a number, string, boolean, or datetime. There's no mention of special behavior for an empty array. The language is quite specific:

— An SQL/JSON null value compares less than any SQL/JSON scalar;
— An SQL/JSON scalar compares less than any SQL/JSON array;
— An SQL/JSON array compares less than any SQL/JSON object;

Therefore, Postgres's ordering of an empty JSON array before all other JSON values is inconsistent with the spec.

With the support of the 2023 spec, we have some justification to diverge from Postgres and NOT merge #101260. The spec's behavior is more intuitive than Postgres's. And I believe this is a bug in Postgres, though we've had no responses on the bug report and I think it's unlikely they'll be motivated to fix it anytime soon. I'd love to hear opinions from the SQL Foundations and Migrations team cc @rafiss @otan.

General JSON Array Comparison

There's another notable divergence in JSON array comparison behavior. The spec states:

— An SQL/JSON array A compares less than another SQL/JSON array B, if one of the following is true:

• The cardinality of A is less than the cardinality of B and the elements at the same array index in both SQL/JSON arrays compare equal for all elements of A;
• There exists an element E in A that compares less than the corresponding element in B and each element that precedes E in A compares equal to the corresponding element in B;

The Postgres docs highlight different comparison behavior with JSON arrays:

Array with n elements > array with n - 1 elements

For example, according to the 2023 spec, [1, 2, 3] is less than [1, 4] (via the second bullet point), but [1, 2, 3] is greater than [1, 4] in Postgres because the former has more elements than the latter.

Our current implementation matches Postgres's behavior here. I think it would be far more disruptive to users for Postgres to fix this inconsistency compared to the empty array inconsistency, so I think it's very unlikely they will ever change this behavior. For that reason, matching Postgres and diverging from the spec seems safe in this case.

@rafiss
Copy link
Collaborator Author

rafiss commented Jul 27, 2023

From the SQL Foundations perspective, I feel comfortable diverging from PG on this (with some public documentation somewhere). we now have enough 3rd party tooling support so we can handle this difference externally.

The Migrations view may be different. @otan has told me about some tools that verify migrations from PG down to some details such as ordering, so he may have more perspective.

@otan
Copy link
Contributor

otan commented Jul 27, 2023

I'd much rather match PG by default for reasons rafi mentions. The question I have is whether PG plans to change their implementation.

@mgartner
Copy link
Collaborator

mgartner commented Aug 4, 2023

Ok, let's mimic Postgres behavior exactly then. I'll go ahead and review and merge #101260.

mgartner pushed a commit to Shivs11/cockroach that referenced this issue Aug 4, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
mgartner pushed a commit to Shivs11/cockroach that referenced this issue Aug 4, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
craig bot pushed a commit that referenced this issue Aug 8, 2023
101260: sql: replicating JSON empty array ordering found in Postgres r=mgartner a=Shivs11

Currently, #97928 and #99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in #99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes #105668

Epic: CRDB-24501

Release note: None


108160: roachtest/awsdms: run once a week instead r=Jeremyyang920 a=otan

Save a bit of mad dosh by running awsdms once a weekly instead of daily. We don't need this tested every week.

Epic: None

Release note: None

108300: schemachanger: Unskip some backup tests r=Xiang-Gu a=Xiang-Gu

Randomly skip subtests in the BACKUP/RESTORE suites before parallelizing them.

Epic: None
Release note: None

108328: rowexec: fix TestUncertaintyErrorIsReturned under race r=yuzefovich a=yuzefovich

We just saw a case when `TestUncertaintyErrorIsReturned` failed under race because we got a different DistSQL plan. This seems plausible in case the range cache population (which the test does explicitly) isn't quick enough for some reason, so this commit allows for the DistSQL plan to match the expectation via `SucceedsSoon` (if we happen to get a bad plan, then the following query execution should have the up-to-date range cache).

Fixes: #108250.

Release note: None

108341: importer: fix stale comment on mysqlStrToDatum r=mgartner,DrewKimball a=otan

Release note: None
Epic: None

From #108286 (review)

108370: go.mod: bump Pebble to fffe02a195e3 r=RahulAggarwal1016 a=RahulAggarwal1016

fffe02a1 db: simplify ScanInternal()
df7e2ae1 vfs: deflake TestDiskHealthChecking_Filesystem ff5c929a Rate Limit Scan Statistics
af8c5f27 internal/cache: mark panic messages as redaction-safe

Epic: none
Release note: none

108379: changefeedccl: deflake TestChangefeedSchemaChangeBackfillCheckpoint r=miretskiy a=jayshrivastava

Previously, the test `TestChangefeedSchemaChangeBackfillCheckpoint` would fail because it would read a table span too early. A schema change using the delcarative schema changer will update a table span to point to a new set of ranges. Previously, this test would use the span from before the schema change, which is incorrect. This change makes it use the span from after the schema change.

I stress tested this 30k times under the new schema changer and 10k times under the legacy schema changer to ensure the test is not flaky anymore.

Fixes: #108084
Release note: None
Epic: None

Co-authored-by: Shivam Saraf <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rahul Aggarwal <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
@craig craig bot closed this as completed in 056c300 Aug 8, 2023
@github-project-automation github-project-automation bot moved this from 23.2 Release to Done in SQL Queries Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-encoding Relating to the SQL/KV encoding. branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants