-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs/tech-notes: adding a note about JSONB key encoding #99849
docs/tech-notes: adding a note about JSONB key encoding #99849
Conversation
8ab0fae
to
286a36d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)
docs/tech-notes/jsonb_forward_indexing.md
line 4 at r1 (raw file):
This document intends to build on the JSONB value encoding RFC
Add a link to the RFC
Currently, it is only possible to value-encode JSONB values in CRDB....
Since you're implementing key encoding right now, I would change this paragraph to say that the key encoding does exist, and this doc explains how it works.
docs/tech-notes/jsonb_forward_indexing.md
line 8 at r1 (raw file):
The motivation and use-cases for including JSONB were included in the scoping RFC of JSONB.
Add a link
Currently, it is only possible to value-encode...
Same comment as above
docs/tech-notes/jsonb_forward_indexing.md
line 32 at r1 (raw file):
4. Moreover, arrays with equal number of elements are compared in the order: element - 1, element - 2, element - 3, ….
nit: this notation is a bit confusing, since it looks like you are subtracting 1 from element. Perhaps element1, element2, element3, ...
would be clearer?
docs/tech-notes/jsonb_forward_indexing.md
line 35 at r1 (raw file):
5. Objects with an equal number of key value pairs are compared in the order: key - 1, value - 1, key - 2, value - 2, ….
same comment here
docs/tech-notes/jsonb_forward_indexing.md
line 37 at r1 (raw file):
Moreover, care has to be taken to ensure that none of these bytes correspond to a possible encoding of a JSON value.
I thought we decided this wasn't a concern?
docs/tech-notes/jsonb_forward_indexing.md
line 100 at r1 (raw file):
JSONB_Array, enc(1), enc(1), JSONB_Terminator,
Are you missing JSONB_Number
in between the two enc(1)? And a second JSONB_Terminator
at the end of the array?
JSONB_String, enc(b),
Are you missing a JSONB_Terminator
here?
286a36d
to
595fa45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
docs/tech-notes/jsonb_forward_indexing.md
line 4 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This document intends to build on the JSONB value encoding RFC
Add a link to the RFC
Currently, it is only possible to value-encode JSONB values in CRDB....
Since you're implementing key encoding right now, I would change this paragraph to say that the key encoding does exist, and this doc explains how it works.
Done!
docs/tech-notes/jsonb_forward_indexing.md
line 8 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
The motivation and use-cases for including JSONB were included in the scoping RFC of JSONB.
Add a link
Currently, it is only possible to value-encode...
Same comment as above
Done!
docs/tech-notes/jsonb_forward_indexing.md
line 32 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: this notation is a bit confusing, since it looks like you are subtracting 1 from element. Perhaps
element1, element2, element3, ...
would be clearer?
I agree. Changes have been made.
docs/tech-notes/jsonb_forward_indexing.md
line 35 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
same comment here
Done.
docs/tech-notes/jsonb_forward_indexing.md
line 37 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Moreover, care has to be taken to ensure that none of these bytes correspond to a possible encoding of a JSON value.
I thought we decided this wasn't a concern?
You're right. The well defined structure of the encoding could resolve this issue. Removing this.
docs/tech-notes/jsonb_forward_indexing.md
line 100 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
JSONB_Array, enc(1), enc(1), JSONB_Terminator,
Are you missing
JSONB_Number
in between the two enc(1)? And a secondJSONB_Terminator
at the end of the array?
JSONB_String, enc(b),
Are you missing a
JSONB_Terminator
here?
Yes, thank you for the catch. Was also missing a JSONB_Terminator
at the very end of the last example to notify the end of an object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)
docs/tech-notes/jsonb_forward_indexing.md
line 5 at r2 (raw file):
This document intends to build on the JSONB value encoding [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171005_jsonb_encoding.md). Previously, it was only possible to value-encode JSONB values in CRDB. The following document proposes a format
nit: "proposes" still sounds like this is a feature that you haven't yet implemented. I'd say "describes" instead.
docs/tech-notes/jsonb_forward_indexing.md
line 6 at r2 (raw file):
This document intends to build on the JSONB value encoding [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171005_jsonb_encoding.md). Previously, it was only possible to value-encode JSONB values in CRDB. The following document proposes a format for key-encoding JSONB values in hopes of having a semantic ordering for JSON which in turn allows forward indexes on JSON columns.
ditto "in hopes of" -- maybe "for the purpose of" is better
docs/tech-notes/jsonb_forward_indexing.md
line 113 at r2 (raw file):
enc(
'{“a”: ‘[1]’, “b”: {“c”: 0}'::JSONB
)
Looks like you're missing a closing }
This commit adds a tech note that goes ahead to explain how JSON will be key encoded in CRDB. It builds on the RFC for JSONB encoding and explains the main motivation behind the requirement for a key encoding in the first place. The tech note also dives into details pertaining to the encoding for each of the present JSON values: NULL, Strings, Numbers, Boolean, Arrays and Objects. Lastly, the tech note also consists of examples showing how JSON values are encoded in hopes of simplifying the concept to a new reader. Release note: None Epic: CRDB-24501
595fa45
to
0a28e35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
docs/tech-notes/jsonb_forward_indexing.md
line 5 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: "proposes" still sounds like this is a feature that you haven't yet implemented. I'd say "describes" instead.
Done!
docs/tech-notes/jsonb_forward_indexing.md
line 6 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto "in hopes of" -- maybe "for the purpose of" is better
Done.
docs/tech-notes/jsonb_forward_indexing.md
line 113 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
enc(
'{“a”: ‘[1]’, “b”: {“c”: 0}'::JSONB
)Looks like you're missing a closing
}
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done! Be sure to update this when we decide how to handle the encoding of an empty JSONB array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing @mgartner! Thank you for the review!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
bors r+ |
Build succeeded: |
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is defined here: cockroachdb#99849 However, Postgres currently sorts the empty JSON array before any other JSON values. An issue has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org Thus, this PR intends on replicating this behaviour until the issue has been identified and resolved by Postgres. Epic: CRDB-24501 Release note (sql change): This PR shall now place the empty JSON array to have the lowest (highest) precedence when the JSON column is being sorted in ascending (descending) order.
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is defined here: cockroachdb#99849 However, Postgres currently sorts the empty JSON array before any other JSON values. An issue has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org Thus, this PR intends on replicating this behaviour until the issue has been identified and resolved by Postgres. Epic: CRDB-24501 Release note (sql change): This PR shall now place the empty JSON array to have the lowest (highest) precedence when the JSON column is being sorted in ascending (descending) order.
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is in cockroachdb#99849. However, Postgres currently sorts the empty JSON array before any other JSON values. A Postgres bug report for this has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org This PR intends on replicating the Postgres behavior. Fixes cockroachdb#105668 Epic: CRDB-24501 Release note: None
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is in cockroachdb#99849. However, Postgres currently sorts the empty JSON array before any other JSON values. A Postgres bug report for this has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org This PR intends on replicating the Postgres behavior. Fixes cockroachdb#105668 Epic: CRDB-24501 Release note: None
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is in cockroachdb#99849. However, Postgres currently sorts the empty JSON array before any other JSON values. A Postgres bug report for this has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org This PR intends on replicating the Postgres behavior. Fixes cockroachdb#105668 Epic: CRDB-24501 Release note: None
101260: sql: replicating JSON empty array ordering found in Postgres r=mgartner a=Shivs11 Currently, #97928 and #99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is in #99849. However, Postgres currently sorts the empty JSON array before any other JSON values. A Postgres bug report for this has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org This PR intends on replicating the Postgres behavior. Fixes #105668 Epic: CRDB-24501 Release note: None 108160: roachtest/awsdms: run once a week instead r=Jeremyyang920 a=otan Save a bit of mad dosh by running awsdms once a weekly instead of daily. We don't need this tested every week. Epic: None Release note: None 108300: schemachanger: Unskip some backup tests r=Xiang-Gu a=Xiang-Gu Randomly skip subtests in the BACKUP/RESTORE suites before parallelizing them. Epic: None Release note: None 108328: rowexec: fix TestUncertaintyErrorIsReturned under race r=yuzefovich a=yuzefovich We just saw a case when `TestUncertaintyErrorIsReturned` failed under race because we got a different DistSQL plan. This seems plausible in case the range cache population (which the test does explicitly) isn't quick enough for some reason, so this commit allows for the DistSQL plan to match the expectation via `SucceedsSoon` (if we happen to get a bad plan, then the following query execution should have the up-to-date range cache). Fixes: #108250. Release note: None 108341: importer: fix stale comment on mysqlStrToDatum r=mgartner,DrewKimball a=otan Release note: None Epic: None From #108286 (review) 108370: go.mod: bump Pebble to fffe02a195e3 r=RahulAggarwal1016 a=RahulAggarwal1016 fffe02a1 db: simplify ScanInternal() df7e2ae1 vfs: deflake TestDiskHealthChecking_Filesystem ff5c929a Rate Limit Scan Statistics af8c5f27 internal/cache: mark panic messages as redaction-safe Epic: none Release note: none 108379: changefeedccl: deflake TestChangefeedSchemaChangeBackfillCheckpoint r=miretskiy a=jayshrivastava Previously, the test `TestChangefeedSchemaChangeBackfillCheckpoint` would fail because it would read a table span too early. A schema change using the delcarative schema changer will update a table span to point to a new set of ranges. Previously, this test would use the span from before the schema change, which is incorrect. This change makes it use the span from after the schema change. I stress tested this 30k times under the new schema changer and 10k times under the legacy schema changer to ensure the test is not flaky anymore. Fixes: #108084 Release note: None Epic: None Co-authored-by: Shivam Saraf <[email protected]> Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Rahul Aggarwal <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]>
This commit adds a tech note that goes ahead to explain
how
JSON
will be key encoded in CRDB. It builds on theRFC for
JSONB
encoding and explains the main motivationbehind the requirement for a key encoding in the first
place. The tech note also dives into details pertaining
to the encoding for each of the present
JSON
values:NULL, Strings, Numbers, Boolean, Arrays and Objects
.Lastly, the tech note also consists of examples showing
how
JSON
values are encoded in hopes of simplifying theconcept to a new reader.
Release note: None
Epic: CRDB-24501