sql: replicating JSON empty array ordering found in Postgres
Currently, cockroachdb#97928 and cockroachdb#99275 lay out a
lexicographical ordering that makes JSON columns forward-indexable.
This ordering is based on the rules documented by Postgres and is
described in cockroachdb#99849.

However, Postgres currently sorts the empty JSON array before all other
JSON values. A Postgres bug report has been filed for this:
https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org

This PR replicates that Postgres behavior.

Fixes cockroachdb#105668

Epic: CRDB-24501

Release note: None
Shivs11 authored and mgartner committed Aug 8, 2023
1 parent 1382b26 commit 056c300
Showing 9 changed files with 133 additions and 50 deletions.
10 changes: 9 additions & 1 deletion docs/tech-notes/jsonb_forward_indexing.md
@@ -44,10 +44,18 @@ The following rules were kept in mind while designing this form of encoding, as
5. Objects with an equal number of key value pairs are compared in the order:
`key1`, `value1`, `key2`, `value2`, ….

**NOTE:** There is one exception to these rules, which is neither documented by
Postgres, nor mentioned in the source code: empty arrays are the minimum JSON
value. As far as we can tell, this is a Postgres bug that has existed for some
time. We've decided to replicate this behavior to remain consistent with
Postgres. We've filed a [Postgres bug report](https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org)
to track the issue.

In order to satisfy property 1 at all times, tags are defined in an increasing order of bytes.
These tags will also have to be defined in a way where the tag representing an object is a large byte representation
for a hexadecimal value (such as 0xff) and the subsequent objects have a value 1 less than the previous one,
where the ordering is described in point 1 above.
where the ordering is described in point 1 above. There is a special tag for empty JSON arrays
in order to handle the special case of empty arrays being ordered before all other JSON values.

Additionally, tags representing terminators will also be defined. There will be two terminators, one for the ascending designation and the other for the descending one, and will be required to denote the end of a key encoding of the following JSON values: Objects, Arrays, Number and Strings. JSON Boolean and JSON Null are not required to have the terminator since they do not have variable length encoding due to the presence of a single tag (as explained later in this document).

22 changes: 11 additions & 11 deletions pkg/sql/logictest/testdata/logic_test/json_index
@@ -20,13 +20,13 @@ INSERT INTO t VALUES
query T
SELECT x FROM t ORDER BY x
----
[]
"a"
"aa"
"abcdefghi"
"b"
1
100
[]
{"a": "b"}


@@ -38,13 +38,13 @@ INSERT INTO t VALUES
query T
SELECT x FROM t@t_pkey ORDER BY x
----
[]
"a"
"aa"
"abcdefghi"
"b"
1
100
[]
{"a": "b"}

# Use the index for point lookups.
@@ -77,12 +77,12 @@ query T
SELECT x FROM t@t_pkey WHERE x > '1' ORDER BY x
----
100
[]
{"a": "b"}

query T
SELECT x FROM t@t_pkey WHERE x < '1' ORDER BY x
----
[]
"a"
"aa"
"abcdefghi"
@@ -92,12 +92,12 @@ SELECT x FROM t@t_pkey WHERE x < '1' ORDER BY x
query T
SELECT x FROM t@t_pkey WHERE x > '1' OR x < '1' ORDER BY x
----
[]
"a"
"aa"
"abcdefghi"
"b"
100
[]
{"a": "b"}

query T
@@ -109,12 +109,12 @@ query T
SELECT x FROM t@t_pkey WHERE x > '1' OR x < '1' ORDER BY x DESC
----
{"a": "b"}
[]
100
"b"
"abcdefghi"
"aa"
"a"
[]

# Adding more primitive JSON values.
statement ok
@@ -129,6 +129,7 @@ INSERT INTO t VALUES
query T
SELECT x FROM t@t_pkey ORDER BY x
----
[]
null
"Testing Punctuation?!."
"a"
@@ -141,18 +142,17 @@ null
100
false
true
[]
{"a": "b"}

query T
SELECT x FROM t@t_pkey WHERE x > 'true' ORDER BY x
----
[]
{"a": "b"}

query T
SELECT x FROM t@t_pkey WHERE x < 'false' ORDER BY x
----
[]
null
"Testing Punctuation?!."
"a"
@@ -330,12 +330,12 @@ query T
SELECT x FROM t@t_pkey ORDER BY x
----
NULL
[]
null
"crdb"
1
false
true
[]
[1, 2, 3]
{}
{"a": "b", "c": "d"}
@@ -346,24 +346,24 @@ SELECT x FROM t@t_pkey ORDER BY x DESC
{"a": "b", "c": "d"}
{}
[1, 2, 3]
[]
true
false
1
"crdb"
null
[]
NULL

# Test to show JSON Null is different from NULL.
query T
SELECT x FROM t@t_pkey WHERE x IS NOT NULL ORDER BY x
----
[]
null
"crdb"
1
false
true
[]
[1, 2, 3]
{}
{"a": "b", "c": "d"}
@@ -446,12 +446,12 @@ INSERT INTO t VALUES
query T
SELECT x FROM t@i ORDER BY x;
----
[]
null
"crdb"
1
false
true
[]
[null]
[1]
[{"a": "b"}]
14 changes: 7 additions & 7 deletions pkg/sql/opt/exec/execbuilder/testdata/json
@@ -205,7 +205,7 @@ vectorized: true
• scan
missing stats
table: t@t_pkey
spans: [/'null' - /'null'] [/'""' - /'""'] [/'[]' - /'[]'] [/'{}' - /'{}']
spans: [/'[]' - /'[]'] [/'null' - /'null'] [/'""' - /'""'] [/'{}' - /'{}']

# Multicolumn index, including JSONB

@@ -252,20 +252,20 @@ INSERT INTO composite VALUES (1, '1.00'::JSONB), (2, '1'::JSONB), (3, '2'::JSONB
(4, '3.0'::JSONB), (5, '"a"'::JSONB)
----
CPut /Table/108/1/1/0 -> /TUPLE/
InitPut /Table/108/2/"G*\x02\x00\x00\x89\x88" -> /BYTES/0x2f0f0c200000002000000403348964
InitPut /Table/108/2/"H*\x02\x00\x00\x89\x88" -> /BYTES/0x2f0f0c200000002000000403348964
CPut /Table/108/1/2/0 -> /TUPLE/
InitPut /Table/108/2/"G*\x02\x00\x00\x8a\x88" -> /BYTES/
InitPut /Table/108/2/"H*\x02\x00\x00\x8a\x88" -> /BYTES/
CPut /Table/108/1/3/0 -> /TUPLE/
InitPut /Table/108/2/"G*\x04\x00\x00\x8b\x88" -> /BYTES/
InitPut /Table/108/2/"H*\x04\x00\x00\x8b\x88" -> /BYTES/
CPut /Table/108/1/4/0 -> /TUPLE/
InitPut /Table/108/2/"G*\x06\x00\x00\x8c\x88" -> /BYTES/0x2f0f0c20000000200000040334891e
InitPut /Table/108/2/"H*\x06\x00\x00\x8c\x88" -> /BYTES/0x2f0f0c20000000200000040334891e
CPut /Table/108/1/5/0 -> /TUPLE/
InitPut /Table/108/2/"F\x12a\x00\x01\x00\x8d\x88" -> /BYTES/
InitPut /Table/108/2/"G\x12a\x00\x01\x00\x8d\x88" -> /BYTES/

query T kvtrace
SELECT j FROM composite where j = '1.00'::JSONB
----
Scan /Table/108/2/"G*\x02\x00\x0{0"-1"}
Scan /Table/108/2/"H*\x02\x00\x0{0"-1"}

query T
SELECT j FROM composite ORDER BY j;
2 changes: 1 addition & 1 deletion pkg/sql/rowenc/keyside/json.go
@@ -79,7 +79,7 @@ func decodeJSONKey(buf []byte, dir encoding.Direction) (json.JSON, []byte, error
}
buf = buf[1:] // removing the terminator
jsonVal = json.FromDecimal(dec)
case encoding.JSONArray, encoding.JSONArrayDesc:
case encoding.JSONArray, encoding.JSONArrayDesc, encoding.JsonEmptyArray, encoding.JsonEmptyArrayDesc:
jsonVal, buf, err = decodeJSONArray(buf, dir)
if err != nil {
return nil, nil, errors.NewAssertionErrorWithWrappedErrf(err, "could not decode JSON Array")
59 changes: 40 additions & 19 deletions pkg/util/encoding/encoding.go
@@ -107,13 +107,18 @@ const (

// Defining different key markers, for the ascending designation,
// for handling different JSON values.
jsonNullKeyMarker = voidMarker + 1
jsonStringKeyMarker = jsonNullKeyMarker + 1
jsonNumberKeyMarker = jsonStringKeyMarker + 1
jsonFalseKeyMarker = jsonNumberKeyMarker + 1
jsonTrueKeyMarker = jsonFalseKeyMarker + 1
jsonArrayKeyMarker = jsonTrueKeyMarker + 1
jsonObjectKeyMarker = jsonArrayKeyMarker + 1

// Postgres currently has a special case (maybe a bug) where the empty JSON
// Array sorts before all other JSON values. See the bug report:
// https://www.postgresql.org/message-id/17873-826fdc8bbcace4f1%40postgresql.org
jsonEmptyArrayKeyMarker = voidMarker + 1
jsonNullKeyMarker = jsonEmptyArrayKeyMarker + 1
jsonStringKeyMarker = jsonNullKeyMarker + 1
jsonNumberKeyMarker = jsonStringKeyMarker + 1
jsonFalseKeyMarker = jsonNumberKeyMarker + 1
jsonTrueKeyMarker = jsonFalseKeyMarker + 1
jsonArrayKeyMarker = jsonTrueKeyMarker + 1
jsonObjectKeyMarker = jsonArrayKeyMarker + 1

arrayKeyTerminator byte = 0x00
arrayKeyDescendingTerminator byte = 0xFF
@@ -127,13 +132,14 @@ const (

// Defining different key markers, for the descending designation,
// for handling different JSON values.
jsonNullKeyDescendingMarker = jsonObjectKeyMarker + 7
jsonStringKeyDescendingMarker = jsonNullKeyDescendingMarker - 1
jsonNumberKeyDescendingMarker = jsonStringKeyDescendingMarker - 1
jsonFalseKeyDescendingMarker = jsonNumberKeyDescendingMarker - 1
jsonTrueKeyDescendingMarker = jsonFalseKeyDescendingMarker - 1
jsonArrayKeyDescendingMarker = jsonTrueKeyDescendingMarker - 1
jsonObjectKeyDescendingMarker = jsonArrayKeyDescendingMarker - 1
jsonEmptyArrayKeyDescendingMarker = jsonObjectKeyMarker + 8
jsonNullKeyDescendingMarker = jsonEmptyArrayKeyDescendingMarker - 1
jsonStringKeyDescendingMarker = jsonNullKeyDescendingMarker - 1
jsonNumberKeyDescendingMarker = jsonStringKeyDescendingMarker - 1
jsonFalseKeyDescendingMarker = jsonNumberKeyDescendingMarker - 1
jsonTrueKeyDescendingMarker = jsonFalseKeyDescendingMarker - 1
jsonArrayKeyDescendingMarker = jsonTrueKeyDescendingMarker - 1
jsonObjectKeyDescendingMarker = jsonArrayKeyDescendingMarker - 1

// Terminators for JSON Key encoding.
jsonKeyTerminator byte = 0x00
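The marker arithmetic in the const block above can be checked with a short Go sketch. It assumes `voidMarker = 0x44`, a value inferred from the kvtrace output in the execbuilder test of this commit (the string marker moves from `'F'` to `'G'` and the number marker from `'G'` to `'H'`) rather than read from the source. The `layout` helper is hypothetical; it recomputes the eight ascending markers (empty array first, counting up from `voidMarker + 1`) and the eight descending markers (counting down from `jsonObjectKeyMarker + 8`).

```go
package main

import "fmt"

// Order of JSON kinds in the new ascending layout.
var kinds = [8]string{"empty array", "null", "string", "number", "false", "true", "array", "object"}

// layout recomputes the marker arithmetic of the new const block for a
// given voidMarker (hypothetical helper, for illustration only).
func layout(voidMarker byte) (asc, desc [8]byte) {
	// Ascending: jsonEmptyArrayKeyMarker = voidMarker + 1, each next kind +1.
	for i := range asc {
		asc[i] = voidMarker + 1 + byte(i)
	}
	// Descending: jsonEmptyArrayKeyDescendingMarker = jsonObjectKeyMarker + 8,
	// each next kind -1.
	objectMarker := asc[7]
	for i := range desc {
		desc[i] = objectMarker + 8 - byte(i)
	}
	return asc, desc
}

func main() {
	const voidMarker = 0x44 // assumption: inferred from the kvtrace output
	asc, desc := layout(voidMarker)
	for i, k := range kinds {
		fmt.Printf("%-12s asc=%#x (%q) desc=%#x\n", k, asc[i], asc[i], desc[i])
	}
}
```

With the extra empty-array marker, the ascending markers occupy `voidMarker+1` through `voidMarker+8` and the descending markers `voidMarker+9` through `voidMarker+16`. Keeping the old `jsonObjectKeyMarker + 7` base would have made `jsonObjectKeyDescendingMarker` equal to `jsonObjectKeyMarker`, so the base has to move to `+ 8` for the two ranges to stay disjoint.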
@@ -1789,6 +1795,9 @@ const (
JSONArrayDesc Type = 39
JSONObject Type = 40
JSONObjectDesc Type = 41
// Special case
JsonEmptyArray Type = 42
JsonEmptyArrayDesc Type = 43
)

// typMap maps an encoded type byte to a decoded Type. It's got 256 slots, one
@@ -1849,6 +1858,10 @@ func slowPeekType(b []byte) Type {
return JSONArray
case m == jsonArrayKeyDescendingMarker:
return JSONArrayDesc
case m == jsonEmptyArrayKeyMarker:
return JsonEmptyArray
case m == jsonEmptyArrayKeyDescendingMarker:
return JsonEmptyArrayDesc
case m == jsonObjectKeyMarker:
return JSONObject
case m == jsonObjectKeyDescendingMarker:
@@ -2009,10 +2022,12 @@ func PeekLength(b []byte) (int, error) {
length, err := getArrayOrJSONLength(b[1:], dir, IsJSONKeyDone)
return 1 + length, err
case jsonArrayKeyMarker, jsonArrayKeyDescendingMarker,
jsonObjectKeyMarker, jsonObjectKeyDescendingMarker:
jsonObjectKeyMarker, jsonObjectKeyDescendingMarker,
jsonEmptyArrayKeyMarker, jsonEmptyArrayKeyDescendingMarker:
dir := Ascending
if (m == jsonArrayKeyDescendingMarker) ||
(m == jsonObjectKeyDescendingMarker) {
(m == jsonObjectKeyDescendingMarker) ||
(m == jsonEmptyArrayKeyDescendingMarker) {
dir = Descending
}
// removing the starter tag
@@ -3500,11 +3515,17 @@ func EncodeJSONTrueKeyMarker(buf []byte, dir Direction) []byte {

// EncodeJSONArrayKeyMarker adds a JSON Array key encoding marker
// to buf and returns the new buffer.
func EncodeJSONArrayKeyMarker(buf []byte, dir Direction) []byte {
func EncodeJSONArrayKeyMarker(buf []byte, dir Direction, arrayLength int64) []byte {
switch dir {
case Ascending:
if arrayLength == 0 {
return append(buf, jsonEmptyArrayKeyMarker)
}
return append(buf, jsonArrayKeyMarker)
case Descending:
if arrayLength == 0 {
return append(buf, jsonEmptyArrayKeyDescendingMarker)
}
return append(buf, jsonArrayKeyDescendingMarker)
default:
panic("invalid direction")
@@ -3621,15 +3642,15 @@ func ValidateAndConsumeJSONKeyMarker(buf []byte, dir Direction) ([]byte, Type, e
case Descending:
switch typ {
case JSONNullDesc, JSONNumberDesc, JSONStringDesc, JSONFalseDesc,
JSONTrueDesc, JSONArrayDesc, JSONObjectDesc:
JSONTrueDesc, JSONArrayDesc, JSONObjectDesc, JsonEmptyArrayDesc:
return buf[1:], typ, nil
default:
return nil, Unknown, errors.Newf("invalid type found %s", typ)
}
case Ascending:
switch typ {
case JSONNull, JSONNumber, JSONString, JSONFalse, JSONTrue, JSONArray,
JSONObject:
JSONObject, JsonEmptyArray:
return buf[1:], typ, nil
default:
return nil, Unknown, errors.Newf("invalid type found %s", typ)
6 changes: 6 additions & 0 deletions pkg/util/encoding/type_string.go

12 changes: 11 additions & 1 deletion pkg/util/json/encoded.go
@@ -606,10 +606,20 @@ func (j *jsonEncoded) AreKeysSorted() bool {
return decoded.AreKeysSorted()
}

func (j *jsonEncoded) Compare(other JSON) (int, error) {
func (j *jsonEncoded) Compare(other JSON) (_ int, err error) {
if other == nil {
return -1, nil
}
// We must first check for the special case of empty arrays, which are the
// minimum JSON value.
switch {
case isEmptyArray(j) && isEmptyArray(other):
return 0, nil
case isEmptyArray(j):
return -1, nil
case isEmptyArray(other):
return 1, nil
}
if cmp := cmpJSONTypes(j.Type(), other.Type()); cmp != 0 {
return cmp, nil
}