-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: inverted-index accelerate filters of the form j->'a' = '{"b": "c"}' #60541
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, that was fast!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):
{"a": []} query T
These tests should use query T rowsort
because there is no guarantee about row ordering (some test configs distribute the table data randomly between multiple nodes).
pkg/sql/logictest/testdata/logic_test/inverted_index, line 809 at r1 (raw file):
query T SELECT j FROM f@i WHERE j->'a' = '{}'
I believe this should only return {"a": {}}
. By the way, you should try to run the new logictests with @primary
to make sure we get the same results.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):
{"a": {"b": {"d": 2}}} {"a": {}} {"a": {"b": "c"}}
Add {"a": {"b": "c", "d": "e"}}
and {"a":{"b": "c"}, "d": "e"}
to make sure we don't incorrectly return these for j->'a' = '{"b": "c"}'
pkg/sql/opt/invertedidx/json_array.go, line 93 at r1 (raw file):
// constrains a json or array index according to the given constant. func getInvertedExprForJSONOrArrayIndex( evalCtx *tree.EvalContext, d tree.Datum, setNotTight bool,
[nit] would be cleaner to use tight bool
and do if !tight {}
below.
pkg/sql/opt/invertedidx/json_array.go, line 187 at r1 (raw file):
var spanExpr *inverted.SpanExpression if d, ok := nonIndexParam.(tree.Datum); ok { invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, d, false)
[nit] add /* setNotTight */
(we do this so the code is easier to read)
pkg/sql/opt/invertedidx/json_array.go, line 232 at r1 (raw file):
return nil, nil } return getInvertedExprForJSONOrArrayIndex(g.evalCtx, d, false), nil
ditto
pkg/sql/opt/invertedidx/json_array.go, line 335 at r1 (raw file):
} return getInvertedExprForJSONOrArrayIndex(evalCtx, d, false)
ditto
pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):
# A query with the fetch val operator with a single key/val pair object on the # right side uses the inverted index.
Isn't this plan missing an "inverted-filterer"? From what i can tell, it would return all objects where j->'a'
contains {"b": "c"}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu, @RaduBerinde, and @rytaft)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):
Previously, RaduBerinde wrote…
These tests should use
query T rowsort
because there is no guarantee about row ordering (some test configs distribute the table data randomly between multiple nodes).
FYI sometimes the rowsort
option doesn't behave nicely with JSON output. If run into issues you might have to change the query to guarantee the order, e.g., SELECT j FROM f@i WHERE j->'a' = '[]' ORDER BY k
.
pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):
} if setNotTight { invertedExpr.SetNotTight()
Is there a test case that you found that fails without this addition? I suppose an expression like j @> '{"a": {"b": "c"}}'
is tight, while j->'a' = '{"b": "c"}' is not tight? Can we call
SetNotTightat the end of
extractJSONFetchValEqConditionrather than passing the
setNotTight` bool here?
pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):
Previously, RaduBerinde wrote…
Isn't this plan missing an "inverted-filterer"? From what i can tell, it would return all objects where
j->'a'
contains{"b": "c"}
.
@angelazxu here is where we determine if an inverted-filterer should be added after an inverted index scan:
cockroach/pkg/sql/opt/xform/select_funcs.go
Line 718 in 0bf4147
needInvertedFilter := !spanExpr.Unique || spanExpr.Operator != inverted.None |
This is a good example of one of the problems with using getInvertedExprForJSONOrArrayIndex
for generating inverted spans for the ->
and =
operators. It seems to cater specifically to the concept of containment. Maybe that line I linked to should also check for tightness of the expression? I'm not sure why it's not currently considering the tightness of spanExpr
, it seems like it should be, maybe @rytaft knows.
pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file): Previously, mgartner (Marcus Gartner) wrote…
I think we're just not setting it as "not tight" in this case (we only set it in the array case) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu, @mgartner, and @RaduBerinde)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 766 at r1 (raw file):
(22, '{"z": {"a": "b", "c": "d", "e": "f"}}'), (23, '{"a": "b", "x": ["c", "d", "e"]}}'), (24, '{"a": "b", "c": [{"d": 1}, {"e": 2}]}}')
since you mentioned that this should support arrays as well as objects, would be good to add a few rows with a top-level array, not just nested arrays.
pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is there a test case that you found that fails without this addition? I suppose an expression like
j @> '{"a": {"b": "c"}}'
is tight, whilej->'a' = '{"b": "c"}' is not tight? Can we call
SetNotTightat the end of
extractJSONFetchValEqConditionrather than passing the
setNotTight` bool here?
agreed, you should be able to do this at the end of extractJSONFetchValEqCondition
pkg/sql/opt/invertedidx/json_array.go, line 365 at r1 (raw file):
// When the right side is an array, the InvertedExpression generated is not // tight. We must indicate it is non-tight so an additional filter is added. if typ == json.ArrayJSONType {
I think this should include objects as well
pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):
Previously, RaduBerinde wrote…
I think we're just not setting it as "not tight" in this case (we only set it in the array case)
We shouldn't need an inverted filter just because an expression is not tight. An inverted filter is only needed if there is a set operation that must be applied (either union or intersection).
But if an expression is not tight, we do need to apply the filter on the output of the scan or scan+inverted filter (just as a regular filter, not inverted filter). So j->'a' = '{"b": "c"}'
should be added as a filter of the index-join here. I think @RaduBerinde is right that we just need to be sure to set it as not tight. I've added a comment in the code above where I think you'll need to make a change to fix this.
5ed92ee
to
0b1fccf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @i, @mgartner, @primary, @RaduBerinde, and @rytaft)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 766 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
since you mentioned that this should support arrays as well as objects, would be good to add a few rows with a top-level array, not just nested arrays.
Added some arrays (27, '[1, 2, {"b": "c"}]')
and (28, '[{"a": {"b": "c"}}, "d", "e"]')
to show that this change only supports arrays within json objects (when the expression is a json fetch operator with an array on the right side), and will not include top-level arrays in the output, as we discussed today :)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
FYI sometimes the
rowsort
option doesn't behave nicely with JSON output. If run into issues you might have to change the query to guarantee the order, e.g.,SELECT j FROM f@i WHERE j->'a' = '[]' ORDER BY k
.
Done - yup rowsort
was being weird so I ended up using ORDER BY k
.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 809 at r1 (raw file):
Previously, RaduBerinde wrote…
I believe this should only return
{"a": {}}
. By the way, you should try to run the new logictests with@primary
to make sure we get the same results.
Works as expected now, after setting Not Tight to expressions with objects. Also, I compared all the @i tests with @primary and all results match!
pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):
Previously, RaduBerinde wrote…
Add
{"a": {"b": "c", "d": "e"}}
and{"a":{"b": "c"}, "d": "e"}
to make sure we don't incorrectly return these forj->'a' = '{"b": "c"}'
Added! Just to clarify, I think {"a": {"b": "c"}, "d": "e"}
should be returned for j->'a' = '{"b": "c"}'
, since we are just checking for the key/value pair "a": {"b": "c"}
inside the json, not necessarily the only pair - does that sound right?
pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
agreed, you should be able to do this at the end of
extractJSONFetchValEqCondition
Ohh I see what you mean. Changed it to just be inside extractJSONFetchValEqCondition
!
pkg/sql/opt/invertedidx/json_array.go, line 365 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think this should include objects as well
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @angelazxu, @i, @primary, and @RaduBerinde)
pkg/sql/opt/invertedidx/json_array.go, line 360 at r2 (raw file):
tight := true // When the right side is an array, the InvertedExpression generated is not
[nit] an array or an object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @angelazxu, @i, and @primary)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):
Previously, angelazxu (Angela Xu) wrote…
Added! Just to clarify, I think
{"a": {"b": "c"}, "d": "e"}
should be returned forj->'a' = '{"b": "c"}'
, since we are just checking for the key/value pair"a": {"b": "c"}
inside the json, not necessarily the only pair - does that sound right?
That sounds correct to me.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 849 at r2 (raw file):
query T SELECT j FROM f@i WHERE j->'a' = '["b", "c", "d", "e"]' OR j->'a' = '["b", "e", "c", "d"]'ORDER BY k
[nit] add space before ORDER
pkg/sql/opt/invertedidx/json_array.go, line 438 at r2 (raw file):
invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, tree.NewDJSON(obj)) if !tight {
[nit] I don't see a need for the tight
variable. You can replace this conditional with the one above (and move the comment about tight down here):
if typ == json.ArrayJSONType || typ == json.ObjectJSONType {
and maybe move the initialization of typ
down here too since this is the only thing its used for.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file): Previously, mgartner (Marcus Gartner) wrote…
Ah, yes, you're right |
0b1fccf
to
a16127e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 360 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] an array or an object
Done.
50e8138
to
f24b0c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, 2 of 2 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @angelazxu)
pkg/sql/opt/invertedidx/json_array_test.go, line 410 at r4 (raw file):
}, { // Arrays on the right side of the equality are not yet supported.
This comment needs to be updated -- maybe mention that arrays on the right side of the equality are not tight.
pkg/sql/opt/invertedidx/json_array_test.go, line 417 at r4 (raw file):
}, { // Objects on the right side of the equality are not yet supported.
Ditto for objects
f24b0c2
to
474e456
Compare
Previously, the optimizer did not plan inverted index scans for queries with the JSON fetch value operator `->` when an object or array was on the right side of the equality operator `=`, only when it was a boolean, string, number, or null. This change allows the inverted index to be used in these types of queries. It supports objects with multiple key/value pairs, nested objects, arrays, arrays nested within objects, and objects nested within arrays. Fixes: cockroachdb#59605 Release note: None
474e456
to
5886177
Compare
bors r+ |
This PR was included in a batch that was canceled, it will be automatically retried |
Build succeeded: |
Previously, the optimizer did not plan inverted index scans for queries with
the JSON fetch value operator
->
when an object or array was on the right sideof the equality operator
=
, only when it was a boolean, string, number, ornull.
This change allows the inverted index to be used in these types of queries. It
supports objects with multiple key/value pairs, nested objects, arrays, arrays
nested within objects, and objects nested within arrays.
Fixes: #59605
Release note: None