Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: inverted-index accelerate filters of the form j->'a' = '{"b": "c"}' #60541

Merged
merged 1 commit into from
Feb 17, 2021

Conversation

angelazxu
Copy link
Contributor

Previously, the optimizer did not plan inverted index scans for queries with
the JSON fetch value operator -> when an object or array was on the right side
of the equality operator =, only when it was a boolean, string, number, or
null.

This change allows the inverted index to be used in these types of queries. It
supports objects with multiple key/value pairs, nested objects, arrays, arrays
nested within objects, and objects nested within arrays.

Fixes: #59605

Release note: None

@angelazxu angelazxu requested a review from a team as a code owner February 12, 2021 21:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, that was fast!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):

{"a": []}

query T

These tests should use query T rowsort because there is no guarantee about row ordering (some test configs distribute the table data randomly between multiple nodes).


pkg/sql/logictest/testdata/logic_test/inverted_index, line 809 at r1 (raw file):


query T
SELECT j FROM f@i WHERE j->'a' = '{}'

I believe this should only return {"a": {}}. By the way, you should try to run the new logictests with @primary to make sure we get the same results.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):

{"a": {"b": {"d": 2}}}
{"a": {}}
{"a": {"b": "c"}}

Add {"a": {"b": "c", "d": "e"}} and {"a":{"b": "c"}, "d": "e"} to make sure we don't incorrectly return these for j->'a' = '{"b": "c"}'


pkg/sql/opt/invertedidx/json_array.go, line 93 at r1 (raw file):

// constrains a json or array index according to the given constant.
func getInvertedExprForJSONOrArrayIndex(
	evalCtx *tree.EvalContext, d tree.Datum, setNotTight bool,

[nit] would be cleaner to use tight bool and do if !tight {} below.


pkg/sql/opt/invertedidx/json_array.go, line 187 at r1 (raw file):

			var spanExpr *inverted.SpanExpression
			if d, ok := nonIndexParam.(tree.Datum); ok {
				invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, d, false)

[nit] add /* setNotTight */ (we do this so the code is easier to read)


pkg/sql/opt/invertedidx/json_array.go, line 232 at r1 (raw file):

				return nil, nil
			}
			return getInvertedExprForJSONOrArrayIndex(g.evalCtx, d, false), nil

ditto


pkg/sql/opt/invertedidx/json_array.go, line 335 at r1 (raw file):

	}

	return getInvertedExprForJSONOrArrayIndex(evalCtx, d, false)

ditto


pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):


# A query with the fetch val operator with a single key/val pair object on the
# right side uses the inverted index.

Isn't this plan missing an "inverted-filterer"? From what i can tell, it would return all objects where j->'a' contains {"b": "c"}.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu, @RaduBerinde, and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):

Previously, RaduBerinde wrote…

These tests should use query T rowsort because there is no guarantee about row ordering (some test configs distribute the table data randomly between multiple nodes).

FYI sometimes the rowsort option doesn't behave nicely with JSON output. If run into issues you might have to change the query to guarantee the order, e.g., SELECT j FROM f@i WHERE j->'a' = '[]' ORDER BY k.


pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):

	}
	if setNotTight {
		invertedExpr.SetNotTight()

Is there a test case that you found that fails without this addition? I suppose an expression like j @> '{"a": {"b": "c"}}' is tight, while j->'a' = '{"b": "c"}' is not tight? Can we call SetNotTightat the end ofextractJSONFetchValEqConditionrather than passing thesetNotTight` bool here?


pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):

Previously, RaduBerinde wrote…

Isn't this plan missing an "inverted-filterer"? From what i can tell, it would return all objects where j->'a' contains {"b": "c"}.

@angelazxu here is where we determine if an inverted-filterer should be added after an inverted index scan:

needInvertedFilter := !spanExpr.Unique || spanExpr.Operator != inverted.None

This is a good example of one of the problems with using getInvertedExprForJSONOrArrayIndex for generating inverted spans for the -> and = operators. It seems to cater specifically to the concept of containment. Maybe that line I linked to should also check for tightness of the expression? I'm not sure why it's not currently considering the tightness of spanExpr, it seems like it should be, maybe @rytaft knows.

@RaduBerinde
Copy link
Member


pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

@angelazxu here is where we determine if an inverted-filterer should be added after an inverted index scan:

needInvertedFilter := !spanExpr.Unique || spanExpr.Operator != inverted.None

This is a good example of one of the problems with using getInvertedExprForJSONOrArrayIndex for generating inverted spans for the -> and = operators. It seems to cater specifically to the concept of containment. Maybe that line I linked to should also check for tightness of the expression? I'm not sure why it's not currently considering the tightness of spanExpr, it seems like it should be, maybe @rytaft knows.

I think we're just not setting it as "not tight" in this case (we only set it in the array case)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu, @mgartner, and @RaduBerinde)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 766 at r1 (raw file):

  (22, '{"z": {"a": "b", "c": "d", "e": "f"}}'),
  (23, '{"a": "b", "x": ["c", "d", "e"]}}'),
  (24, '{"a": "b", "c": [{"d": 1}, {"e": 2}]}}')

since you mentioned that this should support arrays as well as objects, would be good to add a few rows with a top-level array, not just nested arrays.


pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is there a test case that you found that fails without this addition? I suppose an expression like j @> '{"a": {"b": "c"}}' is tight, while j->'a' = '{"b": "c"}' is not tight? Can we call SetNotTightat the end ofextractJSONFetchValEqConditionrather than passing thesetNotTight` bool here?

agreed, you should be able to do this at the end of extractJSONFetchValEqCondition


pkg/sql/opt/invertedidx/json_array.go, line 365 at r1 (raw file):

	// When the right side is an array, the InvertedExpression generated is not
	// tight. We must indicate it is non-tight so an additional filter is added.
	if typ == json.ArrayJSONType {

I think this should include objects as well


pkg/sql/opt/memo/testdata/stats/inverted-json, line 388 at r1 (raw file):

Previously, RaduBerinde wrote…

I think we're just not setting it as "not tight" in this case (we only set it in the array case)

We shouldn't need an inverted filter just because an expression is not tight. An inverted filter is only needed if there is a set operation that must be applied (either union or intersection).

But if an expression is not tight, we do need to apply the filter on the output of the scan or scan+inverted filter (just as a regular filter, not inverted filter). So j->'a' = '{"b": "c"}' should be added as a filter of the index-join here. I think @RaduBerinde is right that we just need to be sure to set it as not tight. I've added a comment in the code above where I think you'll need to make a change to fix this.

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @i, @mgartner, @primary, @RaduBerinde, and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 766 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

since you mentioned that this should support arrays as well as objects, would be good to add a few rows with a top-level array, not just nested arrays.

Added some arrays (27, '[1, 2, {"b": "c"}]') and (28, '[{"a": {"b": "c"}}, "d", "e"]') to show that this change only supports arrays within json objects (when the expression is a json fetch operator with an array on the right side), and will not include top-level arrays in the output, as we discussed today :)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 808 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

FYI sometimes the rowsort option doesn't behave nicely with JSON output. If run into issues you might have to change the query to guarantee the order, e.g., SELECT j FROM f@i WHERE j->'a' = '[]' ORDER BY k.

Done - yup rowsort was being weird so I ended up using ORDER BY k .


pkg/sql/logictest/testdata/logic_test/inverted_index, line 809 at r1 (raw file):

Previously, RaduBerinde wrote…

I believe this should only return {"a": {}}. By the way, you should try to run the new logictests with @primary to make sure we get the same results.

Works as expected now, after setting Not Tight to expressions with objects. Also, I compared all the @i tests with @primary and all results match!


pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):

Previously, RaduBerinde wrote…

Add {"a": {"b": "c", "d": "e"}} and {"a":{"b": "c"}, "d": "e"} to make sure we don't incorrectly return these for j->'a' = '{"b": "c"}'

Added! Just to clarify, I think {"a": {"b": "c"}, "d": "e"} should be returned for j->'a' = '{"b": "c"}', since we are just checking for the key/value pair "a": {"b": "c"} inside the json, not necessarily the only pair - does that sound right?


pkg/sql/opt/invertedidx/json_array.go, line 103 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

agreed, you should be able to do this at the end of extractJSONFetchValEqCondition

Ohh I see what you mean. Changed it to just be inside extractJSONFetchValEqCondition!


pkg/sql/opt/invertedidx/json_array.go, line 365 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this should include objects as well

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm:

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @angelazxu, @i, @primary, and @RaduBerinde)


pkg/sql/opt/invertedidx/json_array.go, line 360 at r2 (raw file):

	tight := true

	// When the right side is an array, the InvertedExpression generated is not

[nit] an array or an object

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!! :lgtm: Just left a few nits.

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @angelazxu, @i, and @primary)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Added! Just to clarify, I think {"a": {"b": "c"}, "d": "e"} should be returned for j->'a' = '{"b": "c"}', since we are just checking for the key/value pair "a": {"b": "c"} inside the json, not necessarily the only pair - does that sound right?

That sounds correct to me.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 849 at r2 (raw file):


query T
SELECT j FROM f@i WHERE j->'a' = '["b", "c", "d", "e"]' OR  j->'a' = '["b", "e", "c", "d"]'ORDER BY k

[nit] add space before ORDER


pkg/sql/opt/invertedidx/json_array.go, line 438 at r2 (raw file):

	invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, tree.NewDJSON(obj))
	if !tight {

[nit] I don't see a need for the tight variable. You can replace this conditional with the one above (and move the comment about tight down here):

if typ == json.ArrayJSONType || typ == json.ObjectJSONType {

and maybe move the initialization of typ down here too since this is the only thing its used for.

@RaduBerinde
Copy link
Member


pkg/sql/logictest/testdata/logic_test/inverted_index, line 819 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

That sounds correct to me.

Ah, yes, you're right

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 360 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] an array or an object

Done.

@angelazxu angelazxu force-pushed the opt_59605_new branch 2 times, most recently from 50e8138 to f24b0c2 Compare February 17, 2021 05:49
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @angelazxu)


pkg/sql/opt/invertedidx/json_array_test.go, line 410 at r4 (raw file):

		},
		{
			// Arrays on the right side of the equality are not yet supported.

This comment needs to be updated -- maybe mention that arrays on the right side of the equality are not tight.


pkg/sql/opt/invertedidx/json_array_test.go, line 417 at r4 (raw file):

		},
		{
			// Objects on the right side of the equality are not yet supported.

Ditto for objects

Previously, the optimizer did not plan inverted index scans for queries with
the JSON fetch value operator `->` when an object or array was on the right
side of the equality operator `=`, only when it was a boolean, string, number,
or null.

This change allows the inverted index to be used in these types of queries. It
supports objects with multiple key/value pairs, nested objects, arrays, arrays
nested within objects, and objects nested within arrays.

Fixes: cockroachdb#59605

Release note: None
@angelazxu
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 17, 2021

This PR was included in a batch that was canceled, it will be automatically retried

@craig craig bot merged commit 1a43d11 into cockroachdb:master Feb 17, 2021
@craig
Copy link
Contributor

craig bot commented Feb 17, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: inverted-index accelerate filters of the form j->'a' = '{"b": "c"}
5 participants