sql: support JSONB encoding and decoding for forward indexes #97928
Conversation
Force-pushed from 3ed3556 to dae1750
Force-pushed from dae1750 to 0aae184
Nice work!
I see that there are some tests in keyside
package that might be applicable, but I think it'd be good to confirm that they do provide sufficient test coverage.
Reviewed 2 of 2 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @Shivs11)
-- commits
line 2 at r1:
Seems like the first commit is a leftover and should just be squashed with the main change.
-- commits
line 46 at r2:
It'd be good to add a link to the Google doc in this commit message, or even better to merge that doc into the cockroach repo in some form. Perhaps you could create a new encoding/json.go file where you'd include most of the functions for the JSON encoding and then include the main idea of the JSON key encoding at the top of that file as a comment.
-- commits
line 50 at r2:
nit: the audience of the Release note is the docs writers, who then use it to compile the release notes list that we provide with each release. The release note should explain the user-visible behavior. Does this commit already make it possible for users to create forward indexes on JSON columns? Can we already ORDER BY JSON columns? If so, then that's what should be mentioned in the release note text; if not, then perhaps the release note should remain None.
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
func compareRow_false( t *topKSorter, vecIdx1 int, vecIdx2 int, rowIdx1 int, rowIdx2 int, groupIdx1 int, groupIdx2 int,
Hm, I'm surprised by the changes here. Do you know why this file changed? (It should be auto-generated, and I don't think the corresponding template file has been changed.)
pkg/sql/rowenc/keyside/decode.go
line 122 at r2 (raw file):
case types.JsonFamily:
	var json json.JSON
	var buf []byte
nit: let's remove buf
and use rkey
for the remainder of the key as is done in all other cases.
pkg/sql/rowenc/keyside/decode.go
line 129 at r2 (raw file):
}
dJson, err = tree.MakeDJSON(json)
nit: you can use tree.NewDJSON
and inline that into a.NewDJSON
call below.
pkg/sql/rowenc/keyside/json.go
line 1 at r2 (raw file):
package keyside
nit: this file is missing the license header. Take a look at this tip https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/154206209/Goland+Tips+and+Tricks#Add-License-header-automatically-in-%22Go-file%22-action.
pkg/sql/rowenc/keyside/json.go
line 13 at r2 (raw file):
// encodeJSONKey is responsible for encoding the different JSON
// values.
func encodeJSONKey(b []byte, json *tree.DJSON, dir encoding.Direction) ([]byte, error) {
nit: it seems like this function can be simplified to
return json.JSON.EncodeForwardIndex(b, dir)
so perhaps we can just remove this encodeJSONKey
altogether?
pkg/sql/rowenc/keyside/json.go
line 37 at r2 (raw file):
}
if (typ == encoding.JSONNull) || (typ == encoding.JSONNullDesc) {
nit: perhaps it's a bit cleaner if we use switch typ instead of ifs, then we'd also add a default case with an assertion, WDYT?
pkg/sql/rowenc/keyside/json.go
line 40 at r2 (raw file):
jsonVal, err = json.MakeJSON(nil)
if err != nil {
	panic("could not decode JSON Null")
Let's not use panics here given that we already have an error return parameter. Let's just do return nil, nil, errors.NewAssertionErrorWithWrappedErrf(err, "could not decode JSON Null") and alike.
pkg/sql/rowenc/keyside/json.go
line 54 at r2 (raw file):
} else if typ == encoding.JSONString {
	var str string
	buf, str, err = encoding.DecodeUnsafeStringAscending(buf, nil)
In keyside.Decode
we call DecodeUnsafeStringAscendingDeepCopy
for the ascending direction, and I think we need to do that here as well. The rationale is that DecodeUnsafeStringAscending
with nil
second argument might return str
that "aliases" the memory of its input buf
(meaning that it points into the same memory region). That can be undesirable since it could prevent the large buf
from being garbage collected, so we always ask for the "deep copy". (We don't need to do that with the descending direction because there we always perform the deep copy anyway.)
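To illustrate the aliasing concern, here is a minimal, self-contained Go sketch (toy helpers, not the actual encoding package code):

// decodeAlias returns a sub-slice that shares memory with buf, so the whole
// backing array of buf stays reachable for as long as the result is referenced.
func decodeAlias(buf []byte, n int) []byte {
	return buf[:n]
}

// decodeDeepCopy copies the bytes out, so the (possibly large) buf can be
// garbage collected once the caller drops it.
func decodeDeepCopy(buf []byte, n int) []byte {
	return append([]byte(nil), buf[:n]...)
}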
pkg/sql/rowenc/keyside/json.go
line 58 at r2 (raw file):
panic("could not decode the JSON string")
}
buf = buf[1:] //Removing the terminator
nit: for inline comments like this without a period we usually use a lower case as well as a space after the slashes, i.e. something like // removing the terminator
.
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
panic("could not decode the JSON string")
}
buf = buf[1:] //Removing the terminator
Should we add a non-zero length check before removing the terminators?
pkg/sql/rowenc/keyside/json.go
line 96 at r2 (raw file):
} return jsonVal, buf, err
nit: we should've handled all non-nil error cases above, so maybe explicitly do return jsonVal, buf, nil
here.
pkg/sql/rowenc/keyside/json.go
line 100 at r2 (raw file):
func decodeJSONArray(buf []byte, dir encoding.Direction) (json.JSON, []byte, error) {
	// extracting the total number of elements in the byte array
nit: "byte array"? maybe "json array"?
pkg/sql/rowenc/keyside/json.go
line 104 at r2 (raw file):
buf, length, err := encoding.DecodeJSONArrayOrJSONObjectLength(buf, dir)
// Allocating it len(buf) bytes since in the worst case, a JSON array
nit: something sounds off in "Allocating it len(buf) bytes", perhaps we want "Pre-allocate the array builder with 'length' number of JSON elements ...". Also, "len(buf) bytes" seems to contradict the current code, perhaps this comment is stale?
pkg/sql/rowenc/keyside/json.go
line 106 at r2 (raw file):
// Allocating it len(buf) bytes since in the worst case, a JSON array
// could consist of only primitive JSON values which take up only one
// byte during their encodings (taken up by their respective markers)
nit: missing period.
pkg/sql/rowenc/keyside/json.go
line 116 at r2 (raw file):
if encoding.IsJSONKeyDone(buf, dir) {
	buf = buf[1:]
	return jsonArray.Build(), buf, err
nit: err
should always be nil
here.
pkg/sql/rowenc/keyside/json.go
line 127 at r2 (raw file):
func decodeJSONObject(buf []byte, dir encoding.Direction) (json.JSON, []byte, error) {
	// extracting the total number of elements in the byte array
nit: s/byte array/json object/
.
pkg/sql/rowenc/keyside/json.go
line 162 at r2 (raw file):
} jsonObject.Add(*key, value)
jsonNull.AsText() will return key=nil; is it ever possible that jsonKey was jsonNull?
pkg/util/encoding/encoding.go
line 1907 at r2 (raw file):
}
// getJSONLength returns the length of a key encoded array. The input
nit: s/array/JSON/
.
pkg/util/encoding/encoding.go
line 2003 at r2 (raw file):
// Getting the number of elements present
// in the container
nit: missing period.
pkg/util/encoding/encoding.go
line 2006 at r2 (raw file):
numberElems, err := getVarintLen(b)
if err != nil {
	panic("failed to get the number of elements encoded in " +
nit: return an assertion failure error.
pkg/util/encoding/encoding.go
line 3597 at r2 (raw file):
// ValidateAndConsumeJSONKeyMarker checks that the marker at the front
// of buf is valid for a given JSON value of the given direction, and
// if so. It returns an error if the tag is valid.
nit: missing some words in "and if so". Also probably s/is valid/is invalid/
.
pkg/util/encoding/encoding.go
line 3617 at r2 (raw file):
}
} else {
	return nil, Unknown, errors.Newf("invalid type found %s", typ)
nit: "invalid direction".
pkg/util/json/json.go
line 98 at r2 (raw file):
Size() uintptr
EncodeForwardIndex(buf []byte, dir encoding.Direction) ([]byte, error)
nit: a quick comment would be helpful.
pkg/util/json/json.go
line 1875 at r2 (raw file):
buf = encoding.EncodeStringDescending(buf, string(j))
default:
	panic("invalid direction")
nit: let's use assertions instead of panics here too.
pkg/util/json/json.go
line 1911 at r2 (raw file):
for i := range j {
	var childBuf []byte
nit: no need to define it if you use :=
below.
pkg/util/json/json.go
line 1918 at r2 (raw file):
} for k := range childBuf {
Let's replace this loop with buf = append(buf, childBuf...)
. Or we could even simplify the whole loop:
var err error
for _, a := range j {
buf, err = a.EncodeForwardIndex(buf, dir)
if err != nil {
return nil, err
}
}
pkg/util/json/json.go
line 1931 at r2 (raw file):
buf = encoding.EncodeJSONArrayOrJSONObjectLength(buf, dir, int64(len(j)))
// A jsonObject will already have the k-v pairs sorted according to the keys.
nit: let's add validation of this sorted property behind the buildutil.CrdbTestBuild flag.
pkg/util/json/json.go
line 1932 at r2 (raw file):
// A jsonObject will already have the k-v pairs sorted according to the keys.
for i := range j {
This loop can be simplified too as in the comment above.
pkg/sql/rowenc/keyside/keyside_test.go
line 242 at r2 (raw file):
// Only some types are round-trip key encodable.
switch typ.Family() {
case types.JsonFamily, types.CollatedStringFamily, types.TupleFamily, types.DecimalFamily,
My guess is that JSON objects that have decimal numbers inside are not roundtrip key encodable (i.e., the key encodings of '1.0'::JSON and '1.00'::JSON should be the same).
I think we also need to modify colinfo.CanHaveCompositeKeyEncoding to return true for JSONs. A similar change needs to be done to DJSON so that it implements the CompositeDatum interface, and IsComposite would return true if a JSON contains a composite decimal.
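To make the roundtrip concern concrete, here is a small self-contained Go sketch (using math/big rather than CockroachDB's decimal type) showing why an order-preserving key alone cannot distinguish '1.0' from '1.00':

package main

import (
	"fmt"
	"math/big"
)

func main() {
	a, _ := new(big.Rat).SetString("1.0")
	b, _ := new(big.Rat).SetString("1.00")
	// The two literals are numerically equal, so an order-preserving key
	// encoding must map them to the same key; the textual difference can only
	// be preserved if the original value is also stored on the value side,
	// i.e. with a composite encoding.
	fmt.Println(a.Cmp(b) == 0) // true
}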
Hey @yuzefovich, thank you for reviewing this! I definitely agree that the testing needs to be more thorough, and I shall be submitting a fresh PR to increase the overall test coverage.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Seems like the first commit is a leftover and should just be squashed with the main change.
Done!
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It'd be good to add a link to the Google doc in this commit message, or even better to merge that doc into the cockroach repo in some form. Perhaps you could create new
encoding/json.go
file where you'd include most of the functions for the JSON encoding and then include the main idea of the JSON key encoding at the top of that file as a comment.
Hmm, I agree that the document should be more accessible. I was wondering if I could add it to the repo under https://github.com/cockroachdb/cockroach/tree/master/docs/tech-notes? I see that the RFC for encoding and the JSONB valueside encoding lives there too, so placing the document there might make it more accessible for a new reader.
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: the audience of the Release note is the docs writers, who then use it to compile the release notes list that we provide with each release. The release note should explain the user-visible behavior. Does this commit already make it possible for users to create forward indexes on JSON columns? Can we already ORDER BY JSON columns? If so, then that's what should be mentioned in the release note text; if not, then perhaps the release note should remain None.
Yes, thank you for pointing this out. I was a little confused about whether I should let it be None or keep the current message, since this commit specifically doesn't enable forward indexing (I shall be pushing a fresh PR for those changes), nor does it enable ordering by JSON columns. Shall change it to None then!
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, I'm surprised by the changes here. Do you know why this file changed? (It should be auto-generated, and I don't think the corresponding template file has been changed.)
Ah yes, I remember bringing this up with @rytaft as well and I was a little confused why this was occurring. The two of us double checked the configuration for crlfmt
and that seems to be right too. Not sure how this change got here.
pkg/sql/rowenc/keyside/decode.go
line 122 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: let's remove buf and use rkey for the remainder of the key as is done in all other cases.
Done!
pkg/sql/rowenc/keyside/decode.go
line 129 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: you can use tree.NewDJSON and inline that into a.NewDJSON call below.
The changes that I have for this reflect what I think you meant. Feel free to point me in the right direction if this is wrong.
pkg/sql/rowenc/keyside/json.go
line 1 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this file is missing the license header. Take a look at this tip https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/154206209/Goland+Tips+and+Tricks#Add-License-header-automatically-in-%22Go-file%22-action.
Done!
pkg/sql/rowenc/keyside/json.go
line 13 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it seems like this function can be simplified to
return json.JSON.EncodeForwardIndex(b, dir)
so perhaps we can just remove this
encodeJSONKey
altogether?
Yes, we could. However, I was wondering whether, for readability purposes, it would be better to have both encodeJSONKey and decodeJSONKey in the file? My train of thought was that it would help someone navigate through the code better in the future. WDYT?
pkg/sql/rowenc/keyside/json.go
line 54 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
In keyside.Decode we call DecodeUnsafeStringAscendingDeepCopy for the ascending direction, and I think we need to do that here as well. The rationale is that DecodeUnsafeStringAscending with a nil second argument might return str that "aliases" the memory of its input buf (meaning that it points into the same memory region). That can be undesirable since it could prevent the large buf from being garbage collected, so we always ask for the "deep copy". (We don't need to do that with the descending direction because there we always perform the deep copy anyway.)
Hmm, this makes sense. Thanks for a great explanation! Changed code to reflect this!
pkg/sql/rowenc/keyside/json.go
line 58 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: for inline comments like this without a period we usually use a lower case as well as a space after the slashes, i.e. something like
// removing the terminator
.
Done!
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should we add a non-zero length check before removing the terminators?
Hmm, I am struggling to think of where we could need this, though. I say this because, if the encoding follows the proposed algorithm, a terminator byte will always be present. Moreover, the encoding prior to the terminator byte uses pre-existing encodings for primitive types (such as strings, numbers, booleans, and null). The same can also be said for arrays and objects, which follow a similar mechanism.
However, I might be missing a case where this could be required and if this is the case, feel free to let me know!
pkg/sql/rowenc/keyside/json.go
line 96 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we should've handled all non-nil error cases above, so maybe explicitly do
return jsonVal, buf, nil
here.
Done!
pkg/sql/rowenc/keyside/json.go
line 100 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "byte array"? maybe "json array"?
Done!
pkg/sql/rowenc/keyside/json.go
line 104 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: something sounds off in "Allocating it len(buf) bytes", perhaps we want "Pre-allocate the array builder with 'length' number of JSON elements ...". Also, "len(buf) bytes" seems to contradict the current code, perhaps this comment is stale?
Ah I must have forgotten to update the comment. The comment is stale. Updated it.
pkg/sql/rowenc/keyside/json.go
line 106 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing period.
Done!
pkg/sql/rowenc/keyside/json.go
line 116 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: err should always be nil here.
Done!
pkg/sql/rowenc/keyside/json.go
line 127 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/byte array/json object/.
Done!
pkg/sql/rowenc/keyside/json.go
line 162 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
jsonNull.AsText() will return key=nil; is it ever possible that jsonKey was jsonNull?
Hmm, I see your concern. However, I believe that we cannot have null as a key for a given object, so key can never be nil, right?
pkg/util/encoding/encoding.go
line 1907 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: s/array/JSON/.
Done!
pkg/util/encoding/encoding.go
line 2003 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing period.
Done!
pkg/util/encoding/encoding.go
line 2006 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: return an assertion failure error.
Done!
pkg/util/encoding/encoding.go
line 3597 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing some words in "and if so". Also probably
s/is valid/is invalid/
.
Done!
pkg/util/encoding/encoding.go
line 3617 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: "invalid direction".
Done!
pkg/util/json/json.go
line 98 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: a quick comment would be helpful.
Done!
pkg/util/json/json.go
line 1875 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: let's use assertions instead of panics here too.
Done!
pkg/util/json/json.go
line 1911 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: no need to define it if you use
:=
below.
I tried removing the definition but I ended up getting an error. This seems to be happening since I am using childBuf as a function argument.
pkg/util/json/json.go
line 1918 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Let's replace this loop with buf = append(buf, childBuf...). Or we could even simplify the whole loop:
var err error
for _, a := range j {
	buf, err = a.EncodeForwardIndex(buf, dir)
	if err != nil {
		return nil, err
	}
}
If we do go ahead with replacing the loop as in the current example, won't buf be overwritten every time? If we have an array '[1, 2]'::JSONB, then won't buf just have the encoding of the element 2 and not 1?
Moreover, it is interesting to note that I did indeed try using append earlier on instead of the secondary loop. However, I did not seem to get the right results.
Force-pushed from 657a649 to a9fffd3
Reviewed 1 of 4 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @Shivs11)
Previously, Shivs11 (Shivam) wrote…
Hmm, I agree that the document should be more accessible. I was wondering if I could add it to the repo under https://github.com/cockroachdb/cockroach/tree/master/docs/tech-notes? I see that the RFC for encoding and the JSONB valueside encoding lives there too, so placing the document there might make it more accessible for a new reader.
Yeah, adding the doc as an RFC into docs/RFCS
sounds good to me. That should be done in a separate PR.
pkg/sql/rowenc/keyside/decode.go
line 122 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Done!
I actually meant removing this line altogether and using var rkey []byte
that is defined outside of the switch above.
pkg/sql/rowenc/keyside/decode.go
line 129 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
The changes that I have for this reflect what I think you meant. Feel free to point me in the right direction if this is wrong.
We can simplify this a bit further:
var json json.JSON
json, rkey, err = decodeJSONKey(key, dir)
if err != nil {
return nil, nil, err
}
d := a.NewDJSON(tree.DJSON{JSON: json})
return d, rkey, err
The difference is that dJson is tree.Datum, so when you assign tree.NewDJSON to dJson, we lose the concrete type of *tree.DJSON, and thus you needed to have an explicit conversion.
pkg/sql/rowenc/keyside/json.go
line 1 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Done!
Still don't see it.
pkg/sql/rowenc/keyside/json.go
line 13 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Yes, we could. However, I was wondering whether, for readability purposes, it would be better to have both encodeJSONKey and decodeJSONKey in the file? My train of thought was that it would help someone navigate through the code better in the future. WDYT?
I'll leave it up to you to decide whether to keep this helper function or not, but I think we should definitely simplify it - there is no reason for this function to be 8 lines long or so when a single line will do.
pkg/sql/rowenc/keyside/json.go
line 40 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Let's not use panics here given that we already have an error return parameter. Let's just do return nil, nil, errors.NewAssertionErrorWithWrappedErrf(err, "could not decode JSON Null") and alike.
Let's not lose the original error - let's use errors.NewAssertionErrorWithWrappedErrf(err, "could not decode JSON Null")
- this way we see both the original error (what exactly we couldn't decode) and the additional hint (that we tried to decode JSON Null).
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Hmm, I am struggling to think where we could need this though. I only state this as if the encoding is followed by the proposed algorithm, then a terminator byte will always be present. Moreover, the encoding prior to the terminator byte involves using pre-existing encodings for primitive types (such as Strings, Numbers, Boolean, Null). The same can also be said for Arrays and Objects which follow a similar mechanism.
However, I might be missing a case where this could be required and if this is the case, feel free to let me know!
I understand that we don't expect buf to be empty here, because that would indicate that the encoded value was corrupted.
However, it's good practice to try to handle even unexpected scenarios like this more gracefully than hitting out-of-bounds panic. For example, in DecodeBitArrayDescending
in the middle of the function we can return errBitArrayTerminatorMissing
when the function is given an invalid input (namely, it's missing a terminator). If that function didn't have such a check and an invalid encoding of the BitArray was passed in, it could panic.
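As a rough illustration of the graceful handling being suggested, a hedged Go sketch (hypothetical helper, not the real encoding package API):

import "errors"

// consumeTerminator strips a single terminator byte, returning an error
// (rather than panicking) when the input is truncated or corrupted.
func consumeTerminator(buf []byte, terminator byte) ([]byte, error) {
	if len(buf) == 0 || buf[0] != terminator {
		return nil, errors.New("JSON key terminator missing")
	}
	return buf[1:], nil
}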
pkg/sql/rowenc/keyside/json.go
line 162 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Hmm i see your concern. However, I believe that we cannot have
nill
as a key for a given object thereby never having key to benil
right?
Yeah, probably you're right.
pkg/sql/rowenc/keyside/json.go
line 89 at r3 (raw file):
var err error
var str string
buf, _, err = encoding.DecodeJSONValueLength(buf, dir)
I'm surprised that we now encode the length of the string - why do we need it? I'd assume we can just use the key-encoding of regular string.
pkg/util/encoding/encoding.go
line 1907 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Done!
Sorry, I should've been more clear. The syntax s/foo/bar/ comes from the sed command and means "replace every entry of foo with bar".
"key encoded array/JSON" sounds off - what kinds of JSON do we need this function for? Is it "json array or json object"?
pkg/util/json/json.go
line 1911 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
I tried removing the definition but I ended up getting an error. This seems to be happening since I am using childBuf as a function argument.
Oh yeah, but that seems off - we should just use buf as the argument and the result. This is related to the discussion below - EncodeForwardIndex should be appending the encoding of j[i] into the passed-in buffer and then returning the updated buffer as the result. We don't need childBuf at all.
pkg/util/json/json.go
line 1918 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
If we do go ahead with replacing the loop as in the current example, won't buf be overwritten every time? If we have an array '[1, 2]'::JSONB, then won't buf just have the encoding of the element 2 and not 1?
Moreover, it is interesting to note that I did indeed try using append earlier on instead of the secondary loop. However, I did not seem to get the right results.
Well, the behavior will depend on the contract of EncodeForwardIndex. I think all encoding methods take in buf []byte as their first parameter, the result of encoding some value is always appended to buf, and the result of the append is always returned. If so, then yes, buf will be overwritten on every iteration (or, to be more precise, it'll be appended to on every iteration), so I believe the code snippet I shared should just work.
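A minimal sketch of that append-style contract with a toy encoder (not the actual JSON interface):

// encodeByte follows the usual contract: append the encoding to buf and
// return the extended slice.
func encodeByte(buf []byte, v byte) []byte {
	return append(buf, v)
}

// encodeAll threads buf through every call, so nothing from earlier
// iterations is lost.
func encodeAll(buf []byte, vals []byte) []byte {
	for _, v := range vals {
		buf = encodeByte(buf, v)
	}
	return buf
}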
Force-pushed from a9fffd3 to 6f2795a
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Yeah, adding the doc as an RFC into
docs/RFCS
sounds good to me. That should be done in a separate PR.
Hmm, I see. @mgartner and I had a discussion about this and were thinking of putting it under tech-notes. However, after reviewing the README for RFCS, I think it will probably be better to include the document there, since this change is an enhancement to CockroachDB. WDYT @mgartner?
pkg/sql/rowenc/keyside/decode.go
line 122 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I actually meant removing this line altogether and using
var rkey []byte
that is defined outside of the switch above.
My apologies for the misunderstanding. The changes have been made.
pkg/sql/rowenc/keyside/decode.go
line 129 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We can simplify this a bit further:
var json json.JSON
json, rkey, err = decodeJSONKey(key, dir)
if err != nil {
	return nil, nil, err
}
d := a.NewDJSON(tree.DJSON{JSON: json})
return d, rkey, err
The difference is that dJson is tree.Datum, so when you assign tree.NewDJSON to dJson, we lose the concrete type of *tree.DJSON, and thus you needed to have an explicit conversion.
Ahh, I now see what you mean by the unnecessary conversion that can be avoided by inlining as you suggested. It is much better IMO, and I have made the changes.
pkg/sql/rowenc/keyside/json.go
line 1 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Still don't see it.
Strange. Tried it again.
pkg/sql/rowenc/keyside/json.go
line 13 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'll leave it up to you to decide whether to keep this helper function or not, but I think we should definitely simplify it - there is no reason for this function to be 8 lines long or so when a single line will do.
Hmm, yes, I see your point. I have reduced this helper from 8 lines to the single line you suggested.
pkg/sql/rowenc/keyside/json.go
line 37 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: perhaps it's a bit cleaner if we use switch typ instead of ifs, then we'd also add a default case with an assertion, WDYT?
I agree. Changes have been made.
pkg/sql/rowenc/keyside/json.go
line 40 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Let's not lose the original error - let's use
errors.NewAssertionErrorWithWrappedErrf(err, "could not decode JSON Null")
- this way we see both the original error (what exactly we couldn't decode) and the additional hint (that we tried to decode JSON Null).
Yes, I think this is a better wrapper for the original error indeed. I have made the changes in the other case statements as well.
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I understand that we don't expect buf to be empty here, because that would indicate that the encoded value was corrupted. However, it's good practice to try to handle even unexpected scenarios like this more gracefully than hitting an out-of-bounds panic. For example, in DecodeBitArrayDescending, in the middle of the function we can return errBitArrayTerminatorMissing when the function is given an invalid input (namely, it's missing a terminator). If that function didn't have such a check and an invalid encoding of the BitArray was passed in, it could panic.
Ah, I see. I have made the changes to add a check on the length of the buffer. However, I see that the terminators for different json values, such as jsonKeyTerminator and jsonKeyDescendingTerminator, are not exported and are defined inside of encoding.go. In order to replicate the terminator check, as done inside of the function DecodeBitArrayDescending, do you think it is good practice to export these variables? I am unsure if it would be good design for me to go ahead and export them.
pkg/sql/rowenc/keyside/json.go
line 89 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'm surprised that we now encode the length of the string - why do we need it? I'd assume we can just use the key-encoding of regular string.
Ah yes, that's a good question. In my first version of the code, I had in fact forgotten to encode the length of the string, even though it was part of my idea as described in the jsonb encoding document. Moreover, the length of a string is required since shorter strings are lexicographically smaller than longer strings. Once the lengths of two strings have been compared, the subsequent characters are compared. This behaviour was also verified in Postgres.
pkg/util/encoding/encoding.go
line 1907 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Sorry, I should've been more clear. The syntax s/foo/bar/ comes from the sed command and means "replace every entry of foo with bar".
"key encoded array/JSON" sounds off - what kinds of JSON do we need this function for? Is it "json array or json object"?
Hmm, you are right. This function will be required by all JSON values, such as strings, numbers, etc., and is not restricted to only objects/arrays. Changes have been made to reflect this in the comment.
pkg/util/json/json.go
line 1918 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Well, the behavior will depend on the contract of EncodeForwardIndex. I think all encoding methods take in buf []byte as their first parameter, the result of encoding some value is always appended to buf, and the result of the append is always returned. If so, then yes, buf will be overwritten on every iteration (or, to be more precise, it'll be appended to on every iteration), so I believe the code snippet I shared should just work.
Ah, I see your point now. Changes have been made!
pkg/util/json/json.go
line 1932 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This loop can be simplified too as in the comment above.
Done!
Force-pushed from 6f2795a to 7c713c2
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @yuzefovich)
pkg/sql/rowenc/keyside/json.go
line 89 at r3 (raw file):
Previously, Shivs11 (Shivam) wrote…
Ah yes, that's a good question. In my first version of the code, I had in fact forgotten to encode the length of the string, even though it was part of my idea as described in the jsonb encoding document. Moreover, the length of a string is required since shorter strings are lexicographically smaller than longer strings. Once the lengths of two strings have been compared, the subsequent characters are compared. This behaviour was also verified in postgres.
Commenting here since the previous comment is invalid. Tested this with Postgres as well to verify we don't need to encode the length of a string. Thank you @yuzefovich
Nice! I think I only have minor comments / questions left. Seems like the existing tests in keyside
package do cover us already, so probably the action item is to ensure that that coverage is sufficient.
The main remaining comment from me is about the composite type stuff. However, it might not be required to be resolved in this PR since we haven't hooked up the usage of this key encoding yet.
Reviewed 2 of 4 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @Shivs11)
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Ah yes, I remember bringing this up with @rytaft as well and I was a little confused why this was occurring. The two of us double checked the configuration for
crlfmt
and that seems to be right too. Not sure how this change got here.
We need to revert this change. Try running ./dev gen go
. Also, we do have a CI check for this which currently fails.
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Ah, I see. I have made the changes to add a check on the length of the buffer. However, I see that the terminators for different json values, such as jsonKeyTerminator and jsonKeyDescendingTerminator, are not exported and are defined inside of encoding.go. In order to replicate the terminator check, as done inside of the function DecodeBitArrayDescending, do you think it is good practice to export these variables? I am unsure if it would be good design for me to go ahead and export them.
I think exporting those terminators is not the best way forward. Perhaps we should define a couple of exported helper methods in the encoding package, like DecodeJSONNumberAscending and alike, which would internally call DecodeDecimalAscending to get the decimal but also check that the correct terminator follows the decimal and remove it too. This way we don't need to export such low-level details.
Alternatively, perhaps we could provide an encoding.IsJSONKeyTerminator method and use it in the keyside package. I probably like this approach more.
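A sketch of the second option, with hypothetical terminator values (the real constants live unexported in encoding.go):

// jsonKeyTerminator and jsonKeyDescendingTerminator stand in for the real,
// unexported constants.
const (
	jsonKeyTerminator           byte = 0x00
	jsonKeyDescendingTerminator byte = 0xff
)

// IsJSONKeyTerminator reports whether buf starts with the terminator for the
// given direction, so callers outside the package never need the raw bytes.
func IsJSONKeyTerminator(buf []byte, descending bool) bool {
	if len(buf) == 0 {
		return false
	}
	if descending {
		return buf[0] == jsonKeyDescendingTerminator
	}
	return buf[0] == jsonKeyTerminator
}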
pkg/sql/rowenc/keyside/json.go
line 98 at r6 (raw file):
func decodeJSONString(buf []byte, dir encoding.Direction) (json.JSON, []byte, error) {
	// extracting the total number of elements in the byte array
nit: leftover comment.
pkg/sql/rowenc/keyside/json.go
line 109 at r6 (raw file):
}
if err != nil {
	panic("could not decode the JSON string")
nit: there are a couple more panics here.
pkg/sql/rowenc/keyside/json.go
line 111 at r6 (raw file):
panic("could not decode the JSON string")
}
buf = buf[1:] // removing the terminator
nit: we should add a length check here too.
pkg/sql/rowenc/keyside/json.go
line 173 at r6 (raw file):
// should be a valid JSON value.
jsonKey, buf, err = decodeJSONKey(buf, dir)
nit: seems like unnecessary empty line.
pkg/util/encoding/encoding.go
line 1909 at r6 (raw file):
// getJSONLength returns the length of a key encoded JSON.
// The input must have had the type marker stripped from the front.
func getJSONLength(buf []byte, dir Direction) (int, error) {
nit: this method is extremely similar to getArrayLength
, should we unify them to avoid the copy paste? We'd only need to provide two additional arguments.
pkg/util/encoding/encoding.go
line 1912 at r6 (raw file):
result := 0
for {
	if IsJSONKeyDone(buf, dir) {
nit: missing len(buf) == 0
check here.
pkg/util/json/json.go
line 98 at r6 (raw file):
Size() uintptr
// EncodeForwardIndex implements forward indexing for JSONB values.
nit: perhaps mention explicitly that the key encoding is appended to buf
and the result is returned.
pkg/sql/rowenc/keyside/keyside_test.go
line 242 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
My guess is that JSON objects that have decimal numbers inside are not roundtrip key encodable (i.e., the key encodings of '1.0'::JSON and '1.00'::JSON should be the same).
I think we also need to modify colinfo.CanHaveCompositeKeyEncoding to return true for JSONs. A similar change needs to be done to DJSON so that it implements the CompositeDatum interface, and IsComposite would return true if a JSON contains a composite decimal.
This comment still seems relevant.
Perhaps the existing tests in the keyside package don't create such JSONs, so the test doesn't fail. It would actually be good to extend randomJSONNumber in the json package to rarely return decimals like 1.00 with a random number of zeroes. My guess is that if we make that change, then TestEncodeDecode will start failing.
This is great! Did a very brief skim. Sorry I won't have time to review it in detail before I head OOO today, so no need to wait on me for sign-off.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)
pkg/sql/rowenc/keyside/json.go
line 120 at r6 (raw file):
func decodeJSONArray(buf []byte, dir encoding.Direction) (json.JSON, []byte, error) {
	// extracting the total number of elements in the json array
nit: start comment with capital letter and end with period. (here and in several other places)
pkg/util/encoding/encoding.go
line 3604 at r6 (raw file):
if (typ == JSONNullDesc) || (typ == JSONNumberDesc) || (typ == JSONStringDesc) || (typ == JSONFalseDesc) || (typ == JSONTrueDesc) || (typ == JSONArrayDesc) || (typ == JSONObjectDesc) {
nit: you can use a switch
here and below.
e.g.:
switch typ {
case JSONNullDesc, JSONNumberDesc ... JSONObjectDesc:
...
default:
...
}
Can also use an outer switch for dir
Force-pushed from 7c713c2 to 435a4bc
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/rowenc/keyside/json.go
line 66 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think exporting those terminators is not the best way forward. Perhaps we should define a couple of exported helper methods in the encoding package, like DecodeJSONNumberAscending and alike, which would internally call DecodeDecimalAscending to get the decimal but also check that the correct terminator follows the decimal and remove it too. This way we don't need to export such low-level details.
Alternatively, perhaps we could provide an encoding.IsJSONKeyTerminator method and use it in the keyside package. I probably like this approach more.
Oh, I see. I had already defined a method in my previous commits, called IsJSONKeyDone, which provides the same functionality. I shall use it here.
pkg/sql/rowenc/keyside/json.go
line 98 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: leftover comment.
Done!
pkg/sql/rowenc/keyside/json.go
line 109 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: there are a couple more panics here.
Done!
pkg/sql/rowenc/keyside/json.go
line 111 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we should add a length check here too.
Done!
pkg/sql/rowenc/keyside/json.go
line 173 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: seems like unnecessary empty line.
Removed.
pkg/util/encoding/encoding.go
line 1909 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this method is extremely similar to
getArrayLength
, should we unify them to avoid the copy paste? We'd only need to provide two additional arguments.
I totally agree. Changes have been made.
pkg/util/encoding/encoding.go
line 1912 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing
len(buf) == 0
check here.
Done!
pkg/util/json/json.go
line 98 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: perhaps mention explicitly that the key encoding is appended to
buf
and the result is returned.
Done!
pkg/sql/rowenc/keyside/keyside_test.go
line 242 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This comment still seems relevant.
Perhaps the existing tests in the keyside package don't create such JSONs, so the test doesn't fail. It would actually be good to extend randomJSONNumber in the json package to rarely return decimals like 1.00 with a random number of zeroes. My guess is that if we make that change, then TestEncodeDecode will start failing.
Hmm, that's an interesting point for sure. Changes have been made as this point is relevant and the need for composite encoding was also discussed offline. Thank you.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We need to revert this change. Try running
./dev gen go
. Also, we do have a CI check for this which currently fails.
I tried running the command but I don't see the changes being made.
Force-pushed from 435a4bc to 55d3a4c
Reviewed 7 of 7 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
I tried running the command but I don't see the changes being made.
I'm not sure what's up with this. Here is my GoLand config: [screenshot omitted]
When I checkout your branch, it reverts the changes in this file if I run ./dev gen execgen
or ./dev gen go
. If nothing else works, consider just doing a git reset
on this file and not opening it up in the GoLand editor. The changes in this file will definitely need to be reverted before this can merge.
pkg/sql/rowenc/keyside/json.go
line 69 at r8 (raw file):
return nil, nil, errors.New("cannot find JSON terminator")
}
if !(encoding.IsJSONKeyDone(buf, dir)) {
nit: no need for parentheses, i.e. you can just do !encoding.IsJSONKeyDone.... Also, perhaps just squash this check with the one above into a single if.
pkg/sql/sem/tree/datum.go
line 3619 at r8 (raw file):
switch d.JSON.Type() {
case json.NumberJSONType:
	return true
We should only return true here if the wrapped decimal is composite (which is only the case for -0 and for decimals with trailing zeroes). We should extract a helper from DDecimal.IsComposite and use it here.
pkg/sql/sem/tree/datum.go
line 3620 at r8 (raw file):
case json.NumberJSONType:
	return true
case json.ArrayJSONType:
I think we also need to recurse into the values of JSON object type.
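A rough sketch of that recursion over a toy JSON representation (hypothetical types; the real change would live on DJSON and the json package):

import "strings"

type toyJSON struct {
	kind string             // "number", "array", "object", ...
	num  string             // decimal text when kind == "number"
	arr  []toyJSON          // elements when kind == "array"
	obj  map[string]toyJSON // values when kind == "object"
}

// decimalIsComposite stands in for the helper extracted from DDecimal.IsComposite.
func decimalIsComposite(s string) bool {
	return s == "-0" || (strings.Contains(s, ".") && strings.HasSuffix(s, "0"))
}

// isComposite returns true if any decimal reachable from j is composite.
func isComposite(j toyJSON) bool {
	switch j.kind {
	case "number":
		return decimalIsComposite(j.num)
	case "array":
		for _, e := range j.arr {
			if isComposite(e) {
				return true
			}
		}
	case "object":
		for _, v := range j.obj {
			if isComposite(v) {
				return true
			}
		}
	}
	return false
}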
pkg/sql/sem/tree/datum.go
line 3628 at r8 (raw file):
} } default:
nit: this default is redundant with the return below. I think we should either avoid the return (by having the return at the end of the ArrayJSONType case) or remove the default case.
pkg/util/encoding/encoding.go
line 1885 at r8 (raw file):
func getArrayOrJSONLength(buf []byte, dir Direction, isJSON bool) (int, error) {
	result := 0
	f := IsArrayKeyDone
nit: consider passing this function as the argument rather than a boolean. This will make the call sites of getArrayOrJSONLength clearer. (Generally speaking, we try to avoid bool arguments, and in this particular case it's extremely easy to do.)
pkg/util/json/encoded.go
line 588 at r8 (raw file):
func (j *jsonEncoded) AsArray() ([]JSON, bool) {
	return nil, false
Shouldn't we decode it and check whether it's an array?
Force-pushed from 2bff923 to 83e3d4f
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/colexec/sorttopk.eg.go
line 411 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'm not sure what's up with this. Here is my GoLand config: [screenshot omitted]
When I check out your branch, it reverts the changes in this file if I run ./dev gen execgen or ./dev gen go. If nothing else works, consider just doing a git reset on this file and not opening it up in the GoLand editor. The changes in this file will definitely need to be reverted before this can merge.
Yes, it's strange. The configurations are similar on mine and were verified by @rytaft too. Shall be doing a git reset on the file then.
pkg/sql/rowenc/index_encoding.go
line 1449 at r10 (raw file):
// Inverted indexes on a composite type (i.e. an array of composite types)
// should not add the indexed column to the value.
if index.GetType() == descpb.IndexDescriptor_INVERTED && id == index.InvertedColumnID() {
This change was added because unit tests in inverted_joiner_test were breaking without it.
pkg/sql/rowenc/keyside/json.go
line 120 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: start comment with capital letter and end with period. (here and in several other places)
Done!
pkg/sql/rowenc/keyside/json.go
line 69 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: no need for parentheses, i.e. you can just do !encoding.IsJSONKeyDone.... Also, perhaps just squash this check with the one above into a single if.
Done!
pkg/sql/sem/tree/datum.go
line 3619 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We should only return true here if the wrapped decimal is composite (which is only the case for -0 and for decimals with trailing zeroes). We should extract a helper from DDecimal.IsComposite and use it here.
Oh, some of my changes seem to have been lost in a stash since I had this earlier on... but yes, I do agree with this. Changes have been made. The idea I had was converting the current JSON Number into a DDecimal and then calling IsComposite on it.
pkg/sql/sem/tree/datum.go
line 3620 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think we also need to recurse into the values of JSON object type.
Done!
pkg/sql/sem/tree/datum.go
line 3628 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this default is redundant with the return below. I think we should either avoid the return (by having the return at the end of the ArrayJSONType case) or remove the default case.
I was getting a nogo
error when I had no return
and just a default
earlier on. I have switched this to now have only a return
.
pkg/util/encoding/encoding.go
line 3604 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: you can use a switch here and below, e.g.:
switch typ {
case JSONNullDesc, JSONNumberDesc ... JSONObjectDesc:
	...
default:
	...
}
Can also use an outer switch for dir.
Done!
pkg/util/encoding/encoding.go
line 1885 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: consider passing this function as the argument rather than a boolean. This will make the call sites of getArrayOrJSONLength clearer. (Generally speaking, we try to avoid bool arguments, and in this particular case it's extremely easy to do.)
Hmm, this is a great point. Yes, passing functions as parameters would be better design. Changes have been made!
pkg/util/json/encoded.go
line 588 at r8 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Shouldn't we decode it and check whether it's an array?
Ah, you are right this is a wrapper. Changes have been made.
pkg/util/json/json.go
line 1931 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: let's add validation of this sorted property behind the buildutil.CrdbTestBuild flag.
Struggling to understand what exactly you mean by "adding the validation of the sorted property behind the flag". Does this mean adding it as a comment where the flag has been defined?
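For context, a test-build-gated validation usually looks something like the sketch below (crdbTestBuild here is a stand-in for buildutil.CrdbTestBuild, which is true only in test builds):

import (
	"errors"
	"sort"
)

const crdbTestBuild = true // stand-in for buildutil.CrdbTestBuild

// checkObjectKeysSorted validates the sorted-keys invariant, but only in test
// builds, so production binaries pay nothing for the check.
func checkObjectKeysSorted(keys []string) error {
	if crdbTestBuild && !sort.StringsAreSorted(keys) {
		return errors.New("jsonObject keys are not sorted")
	}
	return nil
}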
Currently, cockroachdb#97928 outlines the scheme for JSONB encoding and decoding for forward indexes. However, the PR doesn't enable this feature to our users. This current PR aims to allow forward indexes on JSONB columns. The presence of a lexicographical ordering, as described in cockroachdb#97928, shall now allow primary and secondary indexes on JSONB columns along with the ability to use ORDER BY filter in their queries. Additionally, JSON values consist of decimal numbers and containers, such as Arrays and Objects, which can contain these decimal numbers. In order to preserve the values after the decimal, JSONB columns are now required to be composite in nature. This shall enable such values to be stored in both the key and the value side of a K/V pair in hopes of receiving the exact value. Fixes: cockroachdb#35706 Release note (sql change): This PR adds support for enabling forward indexes and ordering on JSON values. Epic: CRDB-24501
Currently, cockroachdb#97928 and cockroachdb#99275 are responsible for laying out a lexicographical ordering for JSON columns to be forward indexable in nature. This ordering is based on the rules posted by Postgres and is defined here: cockroachdb#99849 However, Postgres currently sorts the empty JSON array before any other JSON values. An issue has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org Thus, this PR intends on replicating this behaviour until the issue has been identified and resolved by Postgres. Epic: CRDB-24501 Release note (sql change): This PR shall now place the empty JSON array to have the lowest (highest) precedence when the JSON column is being sorted in ascending (descending) order.
99275: sql: enabling forward indexes and ORDERBY on JSONB columns r=celiala a=Shivs11 Currently, #97928 outlines the scheme for JSONB encoding and decoding for forward indexes. However, the PR doesn't enable this feature to our users. This current PR aims to allow forward indexes on JSONB columns. The presence of a lexicographical ordering, as described in #97928, shall now allow primary and secondary indexes on JSONB columns along with the ability to use `ORDER BY` filter in their queries. Additionally, JSON values consist of decimal numbers and containers, such as Arrays and Objects, which can contain these decimal numbers. In order to preserve the values after the decimal, JSONB columns are now required to be composite in nature. This shall enable such values to be stored in both the key and the value side of a K/V pair in hopes of receiving the exact value. Fixes: #35706 Release note (sql change): This PR adds support for enabling forward indexes and ordering on JSON values. Epic: [CRDB-24501](https://cockroachlabs.atlassian.net/browse/CRDB-24501) 100942: kvserver: add metrics to track snapshot queue size r=kvoli a=miraradeva Previously, we had metrics to track the number of snapshots waiting in the snapshot queue; however, snapshots may be of different sizes, so it is also helpful to track the size of all snapshots in the queue. This change adds the following metrics to track the total size of all snapshots waiting in the queue: range.snapshots.send-queue-bytes range.snapshots.recv-queue-bytes Informs: #85528 Release note (ops change): Added two new metrics, range.snapshots.(send|recv)-queue-bytes, to track the total size of all snapshots waiting in the snapshot queue. 101220: roachtest: prevent shared mutable state across c2c roachtest runs r=benbardin a=msbutler Previously, all `c2c/*` roachtests run with `--count` would provide incomprehensible results because multiple roachtest runs of the same test would override each other's state. Specifically, the latest call of `test_spec.Run()`, would override the `test.Test` harness, and `syncedCluster.Cluster` used by all other tests with the same registration. This patch fixes this problem by moving all fields in `replicationSpec` that are set during test execution (i.e. a `test_spec.Run` call), to a new `replicationDriver` struct. Now, `replicationSpec` gets defined during test registration and is shared across test runs, while `replicationDriver` gets set within a test run. Epic: None Release note: None Co-authored-by: Shivam Saraf <[email protected]> Co-authored-by: Mira Radeva <[email protected]> Co-authored-by: Michael Butler <[email protected]>
Previously, cockroachdb#99275 and cockroachdb#97928 laid the foundations of a JSON lexicographical ordering and enabled JSON forward indexes in CRDB. However, those commits did not account for version gates: all nodes in a cluster must be at a certain version before JSON forward indexes can be created and JSON columns can be ordered. This PR adds the version gates that ensure all nodes in a cluster meet this requirement.

Epic: CRDB-24501

Fixes: cockroachdb#35706

Release note (sql change): Version gates now require all nodes in a cluster to be at a minimum binary version before forward indexes on JSON columns can be created and JSON columns can be ordered.
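For operators, the practical implication is that the upgrade must be finalized before the new syntax is accepted. Below is a hedged sketch of checking the active cluster version from Go, again reusing the assumed connection from the first sketch; the exact version string required depends on the release.

```go
// checkClusterVersion reports the active cluster version. The new JSON
// forward-index syntax is only accepted once every node runs the required
// binary and the corresponding version gate has been crossed.
func checkClusterVersion(db *sql.DB) (string, error) {
	var v string
	if err := db.QueryRow(`SHOW CLUSTER SETTING version`).Scan(&v); err != nil {
		return "", err
	}
	return v, nil
}
```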
101932: sql: adding version gates for changes related to JSON forward indexes r=fqazi a=Shivs11

Previously, #99275 and #97928 laid the foundations of a JSON lexicographical ordering and enabled JSON forward indexes in CRDB. However, those commits did not account for version gates: all nodes in a cluster must be at a certain version before JSON forward indexes can be created and JSON columns can be ordered. This PR adds the version gates that ensure all nodes in a cluster meet this requirement.

Epic: [CRDB-24501](https://cockroachlabs.atlassian.net/browse/CRDB-24501)

Fixes: #35706

Release note (sql change): Version gates now require all nodes in a cluster to be at a minimum binary version before forward indexes on JSON columns can be created and JSON columns can be ordered.

Co-authored-by: Shivam Saraf <[email protected]>
Currently, cockroachdb#97928 and cockroachdb#99275 lay out a lexicographical ordering so that JSON columns can be forward indexed. This ordering follows the rules used by Postgres, described in cockroachdb#99849. However, Postgres currently sorts the empty JSON array before any other JSON value. A Postgres bug report has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR replicates the Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
101260: sql: replicating JSON empty array ordering found in Postgres r=mgartner a=Shivs11

Currently, #97928 and #99275 lay out a lexicographical ordering so that JSON columns can be forward indexed. This ordering follows the rules used by Postgres, described in #99849. However, Postgres currently sorts the empty JSON array before any other JSON value. A Postgres bug report has been opened: https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR replicates the Postgres behavior.

Fixes #105668

Epic: CRDB-24501

Release note: None

108160: roachtest/awsdms: run once a week instead r=Jeremyyang920 a=otan

Save a bit of money by running awsdms once a week instead of daily. We don't need this tested every day.

Epic: None

Release note: None

108300: schemachanger: Unskip some backup tests r=Xiang-Gu a=Xiang-Gu

Randomly skip subtests in the BACKUP/RESTORE suites before parallelizing them.

Epic: None

Release note: None

108328: rowexec: fix TestUncertaintyErrorIsReturned under race r=yuzefovich a=yuzefovich

We just saw a case where `TestUncertaintyErrorIsReturned` failed under race because we got a different DistSQL plan. This seems plausible if the range cache population (which the test does explicitly) isn't quick enough for some reason, so this commit allows the DistSQL plan to match the expectation via `SucceedsSoon` (if we happen to get a bad plan, the following query execution should have an up-to-date range cache).

Fixes: #108250.

Release note: None

108341: importer: fix stale comment on mysqlStrToDatum r=mgartner,DrewKimball a=otan

Release note: None

Epic: None

From #108286 (review)

108370: go.mod: bump Pebble to fffe02a195e3 r=RahulAggarwal1016 a=RahulAggarwal1016

fffe02a1 db: simplify ScanInternal()
df7e2ae1 vfs: deflake TestDiskHealthChecking_Filesystem
ff5c929a Rate Limit Scan Statistics
af8c5f27 internal/cache: mark panic messages as redaction-safe

Epic: none

Release note: none

108379: changefeedccl: deflake TestChangefeedSchemaChangeBackfillCheckpoint r=miretskiy a=jayshrivastava

Previously, the test `TestChangefeedSchemaChangeBackfillCheckpoint` would fail because it read a table span too early. A schema change using the declarative schema changer updates a table span to point to a new set of ranges. This test previously used the span from before the schema change, which is incorrect; it now uses the span from after the schema change. I stress tested this 30k times under the new schema changer and 10k times under the legacy schema changer to ensure the test is no longer flaky.

Fixes: #108084

Release note: None

Epic: None

Co-authored-by: Shivam Saraf <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rahul Aggarwal <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Currently, it is not possible to create a primary and a secondary index
on a JSON column in CRDB. This is because forward indexes on JSONB
columns have not yet been implemented due to a lack of a valid
lexicographic ordering.
To address this, a key encoding strategy was developed for each JSON
value. To maintain a lexicographical ordering over the encodings of JSON
values, a marker byte was defined for each JSON value type, with the
marker bytes themselves ordered to match the desired ordering of those
values. Encodings for primitive JSON values, such as Null, False, and
True, consist only of their marker bytes.
e.g: To encode a JSON False:
enc(false::JSONB) = [JSONB_False_Tag]
Encodings for JSON Strings and Numbers consist of a concatenation of
their respective marker bytes, the encoding of the string or number in
question, and a terminator byte to indicate that the encoding for the
JSON value has ended.
e.g: To encode a JSON String '"a"':
enc('"a"'::JSONB) = [JSONB_String_Tag, enc("a"), JSONB_Terminator_Tag]
Encodings for JSON Arrays and Objects consist of a concatenation of
their respective marker bytes, the total number of elements/key-value
pairs present within the container, and the encodings of the elements
present in the container, followed by a terminator tag to indicate
that the encoding for the given JSON container has ended.
e.g: To encode a JSON array '["a"]':
enc('["a"]'::JSONB) = [JSONB_Array_Tag, enc(1), JSONB_String_Tag, enc("a"), JSONB_Terminator_Tag, JSONB_Terminator_Tag]
Note: This PR shall not allow users to create any type of index on a JSON
column, nor will it allow users to ORDER BY JSON columns. It is intended
only to demonstrate how JSON will be key encoded and decoded. Moreover,
support for making JSON columns composite in nature will be added in a
separate PR.
Epic: CRDB-24501
Release note: None