Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: index accelerate JSON filters in the forms j->'a' @> '1' and j->'a' <@ '1' #63048

Merged
merged 1 commit into from
Apr 12, 2021

Conversation

angelazxu
Copy link
Contributor

We previously did not have inverted index support for expressions with a
JSON fetch val operator on the left side of @> (contains) or <@ (contained by)
expressions.

This commit adds support to use the inverted index for query filters with JSON
fetch val and containment operators. These include any contains or contained by
expressions with fetch val or chained fetch val operators on the left side, and
a constant value on the right side, including booleans, strings, numbers,
nulls, arrays, and objects.

Fixes #61430

Release note (performance improvement): Expressions with the -> (fetch val)
operator on the left side of either <@ (contained by) or @> (contains) now
support index-acceleration.

@angelazxu angelazxu requested a review from a team as a code owner April 3, 2021 00:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/opt/exec/execbuilder/testdata/virtual_columns, line 1431 at r1 (raw file):

  estimated row count: 3 (missing stats)
  table: inv@iv_j_idx
  spans: /10/"a"/"a"/"b"-/10/"a"/"a"/"b"/PrefixEnd /20/"a"/"a"/"b"-/20/"a"/"a"/"b"/PrefixEnd /30/"a"/"a"/"b"-/30/"a"/"a"/"b"/PrefixEnd

The (j->'a') virtual column index is not longer used here, possibly because this change now supports j->'a' in general, so the costs might be the same? Not sure if it matters, or if we want to make it so we choose the virtual column when it's available?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This is a tricky one but you did a really nice job using the building blocks we already have.

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 772 at r1 (raw file):

  (28, '[{"a": {"b": "c"}}, "d", "e"]'),
  (29, '{"a": null}'),
  (30, '{"a": [1, 2, null]}')

can you add rows with {}, [], and NULL?


pkg/sql/logictest/testdata/logic_test/inverted_index, line 982 at r1 (raw file):

----
{"a": null}

Can you add a few more tests, just to check the logic with empty arrays/objects:
SELECT j FROM f@i WHERE j->'a' <@ '{"b": [1, 2]}'
SELECT j FROM f@i WHERE j->'a' <@ '[]'
SELECT j FROM f@i WHERE j->'a' <@ '{}'
SELECT j FROM f@i WHERE j->'a' @> '[]'
SELECT j FROM f@i WHERE j->'a' @> '{}'

Also would be good to add some tests with a couple of different nesting levels of arrays/objects. The logic is pretty tricky here, so in the absence of randomized tests I want to make sure we've covered a wide range of cases.


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 464 at r1 (raw file):

                  estimated row count: 111 (missing stats)
                  table: d@foo_inv
                  spans: /{}-/{}/PrefixEnd /"a"/"b"-/"a"/"b"/PrefixEnd

It's unfortunate that we need to scan {} here, but I'm not sure it's worth the complexity to avoid it...

Might be worth opening an issue to track or at least adding a TODO somewhere in the code


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 496 at r1 (raw file):

                  estimated row count: 111 (missing stats)
                  table: d@foo_inv
                  spans: /{}-/{}/PrefixEnd /"a"/{}-/"a"/{}/PrefixEnd /"a"/{}-/"a"/{}/PrefixEnd /"a"/"c"/"b"-/"a"/"c"/"b"/PrefixEnd

Do you know why we're including the spans for {"a": {}} twice? Not really a big deal, but maybe we're doing extra work that's not needed?


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 552 at r1 (raw file):

                  estimated row count: 111 (missing stats)
                  table: d@foo_inv
                  spans: /{}-/{}/PrefixEnd /"a"/1-/"a"/1/PrefixEnd /"a"/2-/"a"/2/PrefixEnd /"a"/[]-/"a"/{} /"a"/[]-/"a"/{} /"a"/Arr/1-/"a"/Arr/1/PrefixEnd /"a"/Arr/2-/"a"/Arr/2/PrefixEnd

{"a": []} is duplicated here too


pkg/sql/opt/exec/execbuilder/testdata/virtual_columns, line 1431 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

The (j->'a') virtual column index is not longer used here, possibly because this change now supports j->'a' in general, so the costs might be the same? Not sure if it matters, or if we want to make it so we choose the virtual column when it's available?

I think it's ok that we don't choose the virtual column, but it may be a sign that our costing isn't quite right. I'd just update the comment for this test case and change it to a TODO to investigate. If you can find another test case that does use the virtual column then that would be great to include (e.g., what if you change the predicate to <@ instead of @>?), but I don't think it's essential.


pkg/sql/opt/invertedidx/json_array.go, line 387 at r1 (raw file):

	} else {
		if fetch, ok := left.(*memo.FetchValExpr); ok {
			// When the expression has a JSON fetch operator on the left, it is handled in extractJSONFetchValCondition.

do you have any test cases where the fetch val condition is on the right? can we handle that case?


pkg/sql/opt/invertedidx/json_array.go, line 423 at r1 (raw file):

	left *memo.FetchValExpr,
	right opt.ScalarExpr,
	equals, containedBy bool,

The two booleans here is getting a bit awkward... just passing the operator directly might be better, but I also see why you did it this way. Either way, this needs a comment explaining what the parameters mean (and that if equals is true then containedBy is ignored)


pkg/sql/opt/invertedidx/json_array.go, line 508 at r1 (raw file):

		return invertedExpr
	}
	// For Contains and ContainedBy expressions, we may need to build additional

nit: add a blank line above to separate this out


pkg/sql/opt/invertedidx/json_array.go, line 517 at r1 (raw file):

	// We get an inverted expression for each object constructed, and union
	// these expressions.
	for i := 0; i < len(objs); i++ {

nit: can do for i := range objs


pkg/sql/opt/invertedidx/json_array.go, line 534 at r1 (raw file):

// fetchContainmentObjects constructs new JSON objects based on the value type and
// whether the expression uses <@ or @>. An array of these JSONs is returned.

nit: this comment could use a bit more info to explain that the keys and val come from a fetchVal expression, maybe with an example


pkg/sql/opt/invertedidx/json_array.go, line 540 at r1 (raw file):

	switch typ {
	case json.ArrayJSONType:
		// For Array values, we must create a scalar value object, because

nit: mention this is only for ContainedBy


pkg/sql/opt/invertedidx/json_array.go, line 547 at r1 (raw file):

		//   j->'a' @> '[1]', no new object required, we already have '{"a": [1]}'
		//   j->'a' <@ '[1]', build '{"a": 1}', we already have '{"a": [1]}'
		//   j->'a' <@ '[1, [2], 3]', build '{"a": 1}', '{"a": 3}', we already have '{"a": [1, [2], 3]}'

nice examples!


pkg/sql/opt/invertedidx/json_array.go, line 565 at r1 (raw file):

	case json.ObjectJSONType:
		// For objects in ContainedBy expressions, we do not need to generate the

nit: mention why it's also not needed for Contains


pkg/sql/opt/invertedidx/json_array.go, line 580 at r1 (raw file):

		// Scalar value examples:
		//   j->'a' @> '1', we want '{"a": 1}', '{"a": [1]}'
		//   j->'a' <@ '1', we want '{"a": 1}'

nit: also mention what we already have (like you did above)


pkg/sql/opt/invertedidx/json_array.go, line 596 at r1 (raw file):

// Note that key0 is the outer-most fetch val index, so the expression
// j->'a'->'b' = 1 results in {"a": {"b": 1}}.
func buildObject(keys []string, val json.JSON) json.JSON {

nit: This comment is a bit out of place now that this is a separate function. Probably just needs a few more words to explain that the given slice of keys was extracted from a fetchVal expression by the caller.


pkg/sql/opt/memo/testdata/stats/inverted-json, line 1193 at r1 (raw file):


# A query with fetch val and contains operators uses the inverted index when an
# array is on the right side, and the expression is not tight..

nit: extra .

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 772 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you add rows with {}, [], and NULL?

Added - also created the issue I mentioned: #63180


pkg/sql/logictest/testdata/logic_test/inverted_index, line 982 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Can you add a few more tests, just to check the logic with empty arrays/objects:
SELECT j FROM f@i WHERE j->'a' <@ '{"b": [1, 2]}'
SELECT j FROM f@i WHERE j->'a' <@ '[]'
SELECT j FROM f@i WHERE j->'a' <@ '{}'
SELECT j FROM f@i WHERE j->'a' @> '[]'
SELECT j FROM f@i WHERE j->'a' @> '{}'

Also would be good to add some tests with a couple of different nesting levels of arrays/objects. The logic is pretty tricky here, so in the absence of randomized tests I want to make sure we've covered a wide range of cases.

Added the tests you listed and a couple of nested arrays/objects, also rearranged and grouped similar cases so hopefully it's easier to look through! Also - currently two cases are commented out temporarily due to the issue I mentioned above, and will be able to be run after #63196 is merged.


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 464 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It's unfortunate that we need to scan {} here, but I'm not sure it's worth the complexity to avoid it...

Might be worth opening an issue to track or at least adding a TODO somewhere in the code

added a TODO above the test case, also created the issue here: #63184


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 496 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Do you know why we're including the spans for {"a": {}} twice? Not really a big deal, but maybe we're doing extra work that's not needed?

So for b->'a'->'c' <@ '"b"' we would be constructing {"a": {"c": "b"}}, which means the spans for {"a": {}} are created twice somewhere in encodeContainedInvertedIndexSpans...
I spent some time stepping through the code and it looks like this is happening:

In the for loop, when the value/child is an object, we call the function twice - on the non-empty child and on an empty child with prefixWithEnd.
At the beginning of the function, we create the empty object span with the prefix and union this span with whatever else is created - so the non-empty child function call returns an extra empty object value span.

I was able to fix this and get rid of all the duplicate spans, but I'll put those changes in a separate PR and link it below :)


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 552 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

{"a": []} is duplicated here too

also due to what I mentioned above, will be fixed in that PR!


pkg/sql/opt/exec/execbuilder/testdata/virtual_columns, line 1431 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think it's ok that we don't choose the virtual column, but it may be a sign that our costing isn't quite right. I'd just update the comment for this test case and change it to a TODO to investigate. If you can find another test case that does use the virtual column then that would be great to include (e.g., what if you change the predicate to <@ instead of @>?), but I don't think it's essential.

Ran a test with <@ and it doesn't use the virtual column either, added a TODO


pkg/sql/opt/invertedidx/json_array.go, line 387 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

do you have any test cases where the fetch val condition is on the right? can we handle that case?

Just added support for that case! We will just pass it into extractJSONFetchValCondition as the equivalent expression with fetch val on the left and flip the contains/containedby operator. The rest should follow the existing logic, so I just added a few test cases in each of the test files.


pkg/sql/opt/invertedidx/json_array.go, line 423 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The two booleans here is getting a bit awkward... just passing the operator directly might be better, but I also see why you did it this way. Either way, this needs a comment explaining what the parameters mean (and that if equals is true then containedBy is ignored)

Ah yes it is kinda awkward, but I wanted the containedBy bool since it's more convenient to use in a lot of places :) added comment!

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 423 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Ah yes it is kinda awkward, but I wanted the containedBy bool since it's more convenient to use in a lot of places :) added comment!

If you extract collectKeys to a separate helper function, then you could have two extractJSONFetchValCondition... functions, one for = and one for @> and <@, without too much duplication. That might make things simpler.


pkg/sql/opt/invertedidx/json_array.go, line 539 at r2 (raw file):

			invertedExpr = expr
		} else {
			invertedExpr = inverted.Or(invertedExpr, expr)

Do we need to explicitly set this expr as "not tight" with invertedExpr.SetNotTight()? inverted.Or returns a tight=true expression if both the left and the right input are tight, but in this case where we are manufacturing this union it seems like it would never be tight. Is there a test case that proves this is ok or not?


pkg/sql/opt/invertedidx/json_array.go, line 551 at r2 (raw file):

// {"a", "b"} as keys, "c" as val, and construct {"a": "b": ["c"]}.
// An array of the constructed JSONs is returned.
func fetchContainmentObjects(keys []string, val json.JSON, containedBy bool) ([]json.JSON, error) {

nit: it might be more clear to name this buildFetchContainmentObjects or use some verb other than fetch. fetch sounds like a verb that this function is doing, but it's not "fetching", it's just related to FetchVal operators.

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 423 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

If you extract collectKeys to a separate helper function, then you could have two extractJSONFetchValCondition... functions, one for = and one for @> and <@, without too much duplication. That might make things simpler.

Oh!! Changed it :D


pkg/sql/opt/invertedidx/json_array.go, line 539 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Do we need to explicitly set this expr as "not tight" with invertedExpr.SetNotTight()? inverted.Or returns a tight=true expression if both the left and the right input are tight, but in this case where we are manufacturing this union it seems like it would never be tight. Is there a test case that proves this is ok or not?

So all <@ expressions are not tight, only some of the @> expressions have tight=true, which I think is okay - for example, we convert something like j->'a'->'b' @> [1] to j @> {"a": {"b": [1]}} which has tight spans, and there wouldn't be any JSONs satisfying the second expression that wouldn't satisfy the first. Based on all the logic tests I've run comparing queries using j@i vs. j@primary, there hasn't been any cases where tight spans produce incorrect results. I think in cases where the expression has a single scalar/array/object on the right side of @>, the span is tight. For all other cases, it wouldn't be tight, and this is already checked for in encodeContainedInvertedIndexSpans/encodeContainingInvertedIndexSpans

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 496 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

So for b->'a'->'c' <@ '"b"' we would be constructing {"a": {"c": "b"}}, which means the spans for {"a": {}} are created twice somewhere in encodeContainedInvertedIndexSpans...
I spent some time stepping through the code and it looks like this is happening:

In the for loop, when the value/child is an object, we call the function twice - on the non-empty child and on an empty child with prefixWithEnd.
At the beginning of the function, we create the empty object span with the prefix and union this span with whatever else is created - so the non-empty child function call returns an extra empty object value span.

I was able to fix this and get rid of all the duplicate spans, but I'll put those changes in a separate PR and link it below :)

#63355

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r1, 5 of 6 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 952 at r3 (raw file):


query T
SELECT j FROM f@i WHERE j->'a' <@ '{"b": [1, 2]}' ORDER BY k

Can you add a row that is {"a": {"b": []}} and make sure it is returned for this case? Unless there is another test that is testing something like this already that I'm not seeing.


pkg/sql/logictest/testdata/logic_test/inverted_index, line 1041 at r3 (raw file):

# query T
# SELECT j FROM f@i WHERE j->'a' @> '"b"' AND j->'a' <@ '["b", "c", "d", "e"]' ORDER BY k
# ----

nit: Add a TODO(angelazxu): Uncommented these tests once #63180 is fixed.


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 435 at r3 (raw file):


# TODO(angelazxu): The {} span does not need to be scanned here, but is
 # included when finding spans contained by {"a": "b"} (see #63184).

nit: remove space at the beginning of the line


pkg/sql/opt/exec/execbuilder/testdata/virtual_columns, line 1431 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Ran a test with <@ and it doesn't use the virtual column either, added a TODO

I think the actual cost of scanning iv_jv_idx and iv_j_idx would be the same in this case - the same number of inverted index entries would be scanned in both cases. Or would we consider the former cheaper because the keys are shorter? Would that have a significant impact on the scan performance?

You could just remove your TODO and drop the iv_j_idx right before this test so that the test still exercises what it was originally meant to - to verify that we can plan a scan on a multi-column inverted index where the prefix and inverted columns are virtual.


pkg/sql/opt/invertedidx/json_array.go, line 539 at r2 (raw file):

Previously, angelazxu (Angela Xu) wrote…

So all <@ expressions are not tight, only some of the @> expressions have tight=true, which I think is okay - for example, we convert something like j->'a'->'b' @> [1] to j @> {"a": {"b": [1]}} which has tight spans, and there wouldn't be any JSONs satisfying the second expression that wouldn't satisfy the first. Based on all the logic tests I've run comparing queries using j@i vs. j@primary, there hasn't been any cases where tight spans produce incorrect results. I think in cases where the expression has a single scalar/array/object on the right side of @>, the span is tight. For all other cases, it wouldn't be tight, and this is already checked for in encodeContainedInvertedIndexSpans/encodeContainingInvertedIndexSpans

Makes sense, thanks for the explanation!


pkg/sql/opt/invertedidx/json_array.go, line 547 at r3 (raw file):

func (j *jsonOrArrayFilterPlanner) collectKeys(
	currKeys []string, fetch *memo.FetchValExpr,
) (keys []string, foundKeys bool) {

nit: it might be more idiomatic to rename foundKeys to ok. you could also remove the bool entirely and return empty keys in the case that we can't collect any, and check the length of the keys from the call sites.


pkg/sql/opt/invertedidx/json_array.go, line 552 at r3 (raw file):

	// expression.
	if !memo.CanExtractConstDatum(fetch.Index) {
		return

nit: make the return values explicit: return nil, false


pkg/sql/opt/invertedidx/json_array.go, line 556 at r3 (raw file):

	key, ok := memo.ExtractConstDatum(fetch.Index).(*tree.DString)
	if !ok {
		return

ditto: make the return values explicit: return nil, false


pkg/sql/opt/invertedidx/json_array.go, line 573 at r3 (raw file):

	if innerFetch, ok := fetch.Json.(*memo.FetchValExpr); ok {
		keys, foundKeys := j.collectKeys(keys, innerFetch)
		return keys, foundKeys

If you remove the bool return value (suggested above), these two lines simplify to return j.collectKeys(keys)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 6 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @mgartner)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 1041 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Add a TODO(angelazxu): Uncommented these tests once #63180 is fixed.

Could you also try rebasing this PR on top of #63355 if you haven't already and see if it fixes these tests? I think the duplicate spans you fixed in #63355 had incorrect JSON encodings, and I am wondering if by chance that could be the cause of #63180?


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 464 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

added a TODO above the test case, also created the issue here: #63184

Thanks!


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 496 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

#63355

Good find!


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 607 at r3 (raw file):


query T
EXPLAIN (VERBOSE) SELECT * from d where '"b"' <@ b->'a'

did you intend for this to use the inverted index? You could force it with d@foo_inv if that was the intention


pkg/sql/opt/invertedidx/json_array.go, line 537 at r3 (raw file):

// outer-most fetch val index.
//
// Later on, we iterate forward through these keys to build a JSON object

nit: now that this is a separate function, I'd change "Later on, we" to "Callers of this function should"


pkg/sql/opt/invertedidx/json_array.go, line 579 at r3 (raw file):

}

// fetchContainmentObjects constructs new JSON objects with given keys and val.

nit: update to match function name

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 1041 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Could you also try rebasing this PR on top of #63355 if you haven't already and see if it fixes these tests? I think the duplicate spans you fixed in #63355 had incorrect JSON encodings, and I am wondering if by chance that could be the cause of #63180?

Added the comment, tried rebasing but still getting the errors (as mentioned in slack)


pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 607 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

did you intend for this to use the inverted index? You could force it with d@foo_inv if that was the intention

fixed!


pkg/sql/opt/exec/execbuilder/testdata/virtual_columns, line 1431 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think the actual cost of scanning iv_jv_idx and iv_j_idx would be the same in this case - the same number of inverted index entries would be scanned in both cases. Or would we consider the former cheaper because the keys are shorter? Would that have a significant impact on the scan performance?

You could just remove your TODO and drop the iv_j_idx right before this test so that the test still exercises what it was originally meant to - to verify that we can plan a scan on a multi-column inverted index where the prefix and inverted columns are virtual.

done!


pkg/sql/opt/invertedidx/json_array.go, line 547 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: it might be more idiomatic to rename foundKeys to ok. you could also remove the bool entirely and return empty keys in the case that we can't collect any, and check the length of the keys from the call sites.

removed the bool!


pkg/sql/opt/invertedidx/json_array.go, line 556 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

ditto: make the return values explicit: return nil, false

Done. (return nil now since I removed the bool)


pkg/sql/opt/invertedidx/json_array.go, line 573 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

If you remove the bool return value (suggested above), these two lines simplify to return j.collectKeys(keys)

done!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewed 6 of 6 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

…'a'<@'1'

We previously did not have inverted index support for expressions with a
JSON fetch val operator on the left side of @> (contains) or <@ (contained by)
expressions.

This commit adds support to use the inverted index for query filters with JSON
fetch val and containment operators. These include any contains or contained by
expressions with fetch val or chained fetch val operators on the left side, and
a constant value on the right side, including booleans, strings, numbers,
nulls, arrays, and objects.

Fixes cockroachdb#61430

Release note (performance improvement): Expressions with the -> (fetch val)
operator on the left side of either <@ (contained by) or @> (contains) now
support index-acceleration.
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @angelazxu)

@angelazxu
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 12, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: index accelerate JSON filters in the form j->'a' @> '1'
4 participants