Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/tech-notes: adding a note about JSONB key encoding #99849

Merged
merged 1 commit into from
Mar 30, 2023

Conversation

Shivs11
Copy link
Contributor

@Shivs11 Shivs11 commented Mar 28, 2023

This commit adds a tech note that goes ahead to explain
how JSON will be key encoded in CRDB. It builds on the
RFC for JSONB encoding and explains the main motivation
behind the requirement for a key encoding in the first
place. The tech note also dives into details pertaining
to the encoding for each of the present JSON values:
NULL, Strings, Numbers, Boolean, Arrays and Objects.

Lastly, the tech note also consists of examples showing
how JSON values are encoded in hopes of simplifying the
concept to a new reader.

Release note: None

Epic: CRDB-24501

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Shivs11 Shivs11 force-pushed the jsonb_key_encoding_tech_note branch from 8ab0fae to 286a36d Compare March 28, 2023 19:27
@Shivs11 Shivs11 requested review from mgartner, rytaft and a team March 28, 2023 19:28
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)


docs/tech-notes/jsonb_forward_indexing.md line 4 at r1 (raw file):

This document intends to build on the JSONB value encoding RFC

Add a link to the RFC

Currently, it is only possible to value-encode JSONB values in CRDB....

Since you're implementing key encoding right now, I would change this paragraph to say that the key encoding does exist, and this doc explains how it works.


docs/tech-notes/jsonb_forward_indexing.md line 8 at r1 (raw file):

The motivation and use-cases for including JSONB were included in the scoping RFC of JSONB.

Add a link

Currently, it is only possible to value-encode...

Same comment as above


docs/tech-notes/jsonb_forward_indexing.md line 32 at r1 (raw file):

4. Moreover, arrays with equal number of elements are compared in the order:
element - 1, element - 2, element - 3, ….

nit: this notation is a bit confusing, since it looks like you are subtracting 1 from element. Perhaps element1, element2, element3, ... would be clearer?


docs/tech-notes/jsonb_forward_indexing.md line 35 at r1 (raw file):

5. Objects with an equal number of key value pairs are compared in the order:
key - 1, value - 1, key - 2, value - 2, ….

same comment here


docs/tech-notes/jsonb_forward_indexing.md line 37 at r1 (raw file):

Moreover, care has to be taken to ensure that none of these bytes correspond to a possible encoding of a JSON value.

I thought we decided this wasn't a concern?


docs/tech-notes/jsonb_forward_indexing.md line 100 at r1 (raw file):

JSONB_Array, enc(1), enc(1), JSONB_Terminator,

Are you missing JSONB_Number in between the two enc(1)? And a second JSONB_Terminator at the end of the array?

JSONB_String, enc(b),

Are you missing a JSONB_Terminator here?

@Shivs11 Shivs11 force-pushed the jsonb_key_encoding_tech_note branch from 286a36d to 595fa45 Compare March 29, 2023 13:44
Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


docs/tech-notes/jsonb_forward_indexing.md line 4 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This document intends to build on the JSONB value encoding RFC

Add a link to the RFC

Currently, it is only possible to value-encode JSONB values in CRDB....

Since you're implementing key encoding right now, I would change this paragraph to say that the key encoding does exist, and this doc explains how it works.

Done!


docs/tech-notes/jsonb_forward_indexing.md line 8 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The motivation and use-cases for including JSONB were included in the scoping RFC of JSONB.

Add a link

Currently, it is only possible to value-encode...

Same comment as above

Done!


docs/tech-notes/jsonb_forward_indexing.md line 32 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: this notation is a bit confusing, since it looks like you are subtracting 1 from element. Perhaps element1, element2, element3, ... would be clearer?

I agree. Changes have been made.


docs/tech-notes/jsonb_forward_indexing.md line 35 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

same comment here

Done.


docs/tech-notes/jsonb_forward_indexing.md line 37 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Moreover, care has to be taken to ensure that none of these bytes correspond to a possible encoding of a JSON value.

I thought we decided this wasn't a concern?

You're right. The well defined structure of the encoding could resolve this issue. Removing this.


docs/tech-notes/jsonb_forward_indexing.md line 100 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

JSONB_Array, enc(1), enc(1), JSONB_Terminator,

Are you missing JSONB_Number in between the two enc(1)? And a second JSONB_Terminator at the end of the array?

JSONB_String, enc(b),

Are you missing a JSONB_Terminator here?

Yes, thank you for the catch. Was also missing a JSONB_Terminator at the very end of the last example to notify the end of an object.

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)


docs/tech-notes/jsonb_forward_indexing.md line 5 at r2 (raw file):

This document intends to build on the JSONB value encoding [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171005_jsonb_encoding.md). 
Previously, it was only possible to value-encode JSONB values in CRDB. The following document proposes a format 

nit: "proposes" still sounds like this is a feature that you haven't yet implemented. I'd say "describes" instead.


docs/tech-notes/jsonb_forward_indexing.md line 6 at r2 (raw file):

This document intends to build on the JSONB value encoding [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171005_jsonb_encoding.md). 
Previously, it was only possible to value-encode JSONB values in CRDB. The following document proposes a format 
for key-encoding JSONB values in hopes of having a semantic ordering for JSON which in turn allows forward indexes on JSON columns. 

ditto "in hopes of" -- maybe "for the purpose of" is better


docs/tech-notes/jsonb_forward_indexing.md line 113 at r2 (raw file):

enc('{“a”: ‘[1]’, “b”: {“c”: 0}'::JSONB)

Looks like you're missing a closing }

This commit adds a tech note that goes ahead to explain
how JSON will be key encoded in CRDB. It builds on the
RFC for JSONB encoding and explains the main motivation
behind the requirement for a key encoding in the first
place. The tech note also dives into details pertaining
to the encoding for each of the present JSON values:
NULL, Strings, Numbers, Boolean, Arrays and Objects.

Lastly, the tech note also consists of examples showing
how JSON values are encoded in hopes of simplifying the
concept to a new reader.

Release note: None

Epic: CRDB-24501
@Shivs11 Shivs11 force-pushed the jsonb_key_encoding_tech_note branch from 595fa45 to 0a28e35 Compare March 29, 2023 19:53
Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


docs/tech-notes/jsonb_forward_indexing.md line 5 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: "proposes" still sounds like this is a feature that you haven't yet implemented. I'd say "describes" instead.

Done!


docs/tech-notes/jsonb_forward_indexing.md line 6 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto "in hopes of" -- maybe "for the purpose of" is better

Done.


docs/tech-notes/jsonb_forward_indexing.md line 113 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

enc('{“a”: ‘[1]’, “b”: {“c”: 0}'::JSONB)

Looks like you're missing a closing }

Done!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

@Shivs11 Shivs11 requested a review from yuzefovich March 29, 2023 20:23
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Be sure to update this when we decide how to handle the encoding of an empty JSONB array.

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing @mgartner! Thank you for the review!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

@Shivs11
Copy link
Contributor Author

Shivs11 commented Mar 30, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 30, 2023

Build succeeded:

@craig craig bot merged commit 50b9c90 into cockroachdb:master Mar 30, 2023
Shivs11 added a commit to Shivs11/cockroach that referenced this pull request Apr 6, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out
a lexicographical ordering for JSON columns to be forward
indexable in nature. This ordering is based on the rules
posted by Postgres and is defined here: cockroachdb#99849

However, Postgres currently sorts the empty JSON array
before any other JSON values. An issue has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

Thus, this PR intends on replicating this behaviour until
the issue has been identified and resolved by Postgres.

Epic: CRDB-24501

Release note (sql change): This PR shall now place the empty JSON array
to have the lowest (highest) precedence when the JSON column
is being sorted in ascending (descending) order.
Shivs11 added a commit to Shivs11/cockroach that referenced this pull request Apr 11, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out
a lexicographical ordering for JSON columns to be forward
indexable in nature. This ordering is based on the rules
posted by Postgres and is defined here: cockroachdb#99849

However, Postgres currently sorts the empty JSON array
before any other JSON values. An issue has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

Thus, this PR intends on replicating this behaviour until
the issue has been identified and resolved by Postgres.

Epic: CRDB-24501

Release note (sql change): This PR shall now place the empty JSON array
to have the lowest (highest) precedence when the JSON column
is being sorted in ascending (descending) order.
mgartner pushed a commit to Shivs11/cockroach that referenced this pull request Aug 4, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
mgartner pushed a commit to Shivs11/cockroach that referenced this pull request Aug 4, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
mgartner pushed a commit to Shivs11/cockroach that referenced this pull request Aug 8, 2023
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
craig bot pushed a commit that referenced this pull request Aug 8, 2023
101260: sql: replicating JSON empty array ordering found in Postgres r=mgartner a=Shivs11

Currently, #97928 and #99275 are responsible for laying out a
lexicographical ordering for JSON columns to be forward indexable in
nature. This ordering is based on the rules posted by Postgres and is
in #99849.

However, Postgres currently sorts the empty JSON array before any other
JSON values. A Postgres bug report for this has been opened:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR intends on replicating the Postgres behavior.

Fixes #105668

Epic: CRDB-24501

Release note: None


108160: roachtest/awsdms: run once a week instead r=Jeremyyang920 a=otan

Save a bit of mad dosh by running awsdms once a weekly instead of daily. We don't need this tested every week.

Epic: None

Release note: None

108300: schemachanger: Unskip some backup tests r=Xiang-Gu a=Xiang-Gu

Randomly skip subtests in the BACKUP/RESTORE suites before parallelizing them.

Epic: None
Release note: None

108328: rowexec: fix TestUncertaintyErrorIsReturned under race r=yuzefovich a=yuzefovich

We just saw a case when `TestUncertaintyErrorIsReturned` failed under race because we got a different DistSQL plan. This seems plausible in case the range cache population (which the test does explicitly) isn't quick enough for some reason, so this commit allows for the DistSQL plan to match the expectation via `SucceedsSoon` (if we happen to get a bad plan, then the following query execution should have the up-to-date range cache).

Fixes: #108250.

Release note: None

108341: importer: fix stale comment on mysqlStrToDatum r=mgartner,DrewKimball a=otan

Release note: None
Epic: None

From #108286 (review)

108370: go.mod: bump Pebble to fffe02a195e3 r=RahulAggarwal1016 a=RahulAggarwal1016

fffe02a1 db: simplify ScanInternal()
df7e2ae1 vfs: deflake TestDiskHealthChecking_Filesystem ff5c929a Rate Limit Scan Statistics
af8c5f27 internal/cache: mark panic messages as redaction-safe

Epic: none
Release note: none

108379: changefeedccl: deflake TestChangefeedSchemaChangeBackfillCheckpoint r=miretskiy a=jayshrivastava

Previously, the test `TestChangefeedSchemaChangeBackfillCheckpoint` would fail because it would read a table span too early. A schema change using the delcarative schema changer will update a table span to point to a new set of ranges. Previously, this test would use the span from before the schema change, which is incorrect. This change makes it use the span from after the schema change.

I stress tested this 30k times under the new schema changer and 10k times under the legacy schema changer to ensure the test is not flaky anymore.

Fixes: #108084
Release note: None
Epic: None

Co-authored-by: Shivam Saraf <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rahul Aggarwal <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants