sql: fix JSON fetch value operator evaluation #55316
6bcf389 to a5c02fe
│ table d@primary · ·
│ key columns a · ·
└── scan · · (a) ·
· estimated row count 110 (missing stats) · ·
· table d@foo_inv · ·
· spans /"a"/"b"-/"a"/"b"/PrefixEnd · ·

# TODO(mgartner): Add support for building inverted index constraints for chained JSON
# fetch operators.
I created a ticket to track here: #55317
I also created a ticket to track the `j->'a' @> '{"x": "y"}'` case: #55318
@@ -273,33 +273,37 @@ filter · · (a, b) ·
· spans FULL SCAN · ·

# TODO(mgartner): It should not be required to force the index scan. It is
# required until the statistics builder treats b->'a' = '"b"' similarly to the
# containment operator, @>.
I created a ticket to track here: #55319
@yuzefovich please take a look at the vectorized engine changes in the first commit, and let me know if this is the right approach.
Reviewed 12 of 12 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)
The approach looks good to me, thanks. I just have a couple of nits.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)
pkg/sql/colexec/proj_const_ops_tmpl.go, line 101 at r1 (raw file):
    }
    projCol := projVec._RET_TYP()
    _outNulls := projVec.Nulls()
nit: this deserves a comment to spell out the "contract" (similar to what we have about `_overloadHelper` above).
pkg/sql/colexec/proj_const_ops_tmpl.go, line 140 at r1 (raw file):
    }
    }
    // {{if _HAS_NULLS}}
nit: initially I thought that there might be a bug that we're not updating the nulls vector when the argument vector didn't have nulls (i.e. `_HAS_NULLS == false`), but in that case `_outNulls` has already been updated by the `_ASSIGN` function if need be. I think it's worth adding a comment to mention this observation.
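To illustrate the observation above, here is a minimal sketch of a projection loop in which the per-element operation can itself produce NULL from two non-NULL inputs. It is a toy model, not the generated colexec code: `fetchVal`, `projectFetch`, and the `[]bool` null vectors are hypothetical stand-ins for the `_ASSIGN` step, the projection operator, and `coldata.Nulls`.

```go
package main

import "fmt"

// fetchVal is a hypothetical stand-in for the `->` overload on a flat
// JSON object. The second result reports whether the result is NULL.
func fetchVal(obj map[string]string, key string) (string, bool) {
	v, ok := obj[key]
	return v, !ok
}

// projectFetch applies fetchVal with a constant key to every element.
// inNulls marks NULL inputs; the returned []bool marks NULL outputs.
func projectFetch(col []map[string]string, inNulls []bool, key string) ([]string, []bool) {
	out := make([]string, len(col))
	outNulls := make([]bool, len(col))
	for i := range col {
		if inNulls[i] {
			// NULL input always yields NULL output.
			outNulls[i] = true
			continue
		}
		v, isNull := fetchVal(col[i], key)
		if isNull {
			// The operator itself produced NULL from a non-NULL input.
			// This is the case that must be recorded in the output nulls
			// even when the input vector has no nulls at all.
			outNulls[i] = true
			continue
		}
		out[i] = v
	}
	return out, outNulls
}

func main() {
	col := []map[string]string{{"a": "x"}, {"b": "y"}, nil}
	inNulls := []bool{false, false, true}
	vals, nulls := projectFetch(col, inNulls, "a")
	fmt.Println(vals, nulls)
}
```

In this sketch the second row has a non-NULL input but a NULL output, which is the situation the comment is meant to document.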
pkg/sql/colexec/proj_const_ops_tmpl.go, line 141 at r1 (raw file):
    }
    // {{if _HAS_NULLS}}
    projVec.SetNulls(_outNulls.Or(colNulls))
I think here we want to be more efficient in non-datum cases - `coldata.Nulls.Copy()` is faster than `coldata.Nulls.Or()`. I'd do something like this:
    // {{if eq .VecMethod "Datum"}}
    projVec.SetNulls(_outNulls.Or(colNulls))
    // {{else}}
    colNullsCopy := colNulls.Copy()
    projVec.SetNulls(&colNullsCopy)
    // {{end}}
This will also make the `_outNulls` variable unused in many code paths, so you'll need to work around the unused warning.
Update: I tried it out, and the microbenchmarks don't show much difference, so the current approach looks good. (If you're curious, the diff is here.)
pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 112 at r1 (raw file):
    col1 := vec1._L_TYP()
    col2 := vec2._R_TYP()
    _outNulls := projVec.Nulls()
nit: ditto for comments
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/idxconstraint/index_constraints.go, line 996 at r2 (raw file):
    }
    rightConst := rhs.(*memo.ConstExpr)
`, ok` missing. Also, I think we have an extractconstval somewhere since other operators can be constant (e.g. True, False, Null)
a5c02fe to 31a5d89
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)
pkg/sql/colexec/proj_const_ops_tmpl.go, line 101 at r1 (raw file):
Previously, yuzefovich wrote…
nit: this deserves a comment to spell out the "contract" (similar to what we have about `_overloadHelper` above).
Done.
pkg/sql/colexec/proj_const_ops_tmpl.go, line 140 at r1 (raw file):
Previously, yuzefovich wrote…
nit: initially I thought that there might be a bug that we're not updating the nulls vector when the argument vector didn't have nulls (i.e. `_HAS_NULLS == false`), but in that case `_outNulls` has already been updated by the `_ASSIGN` function if need be. I think it's worth adding a comment to mention this observation.
Good idea, done.
pkg/sql/colexec/proj_const_ops_tmpl.go, line 141 at r1 (raw file):
Previously, yuzefovich wrote…
I think here we want to be more efficient in non-datum cases - `coldata.Nulls.Copy()` is faster than `coldata.Nulls.Or()`. I'd do something like this:
    // {{if eq .VecMethod "Datum"}}
    projVec.SetNulls(_outNulls.Or(colNulls))
    // {{else}}
    colNullsCopy := colNulls.Copy()
    projVec.SetNulls(&colNullsCopy)
    // {{end}}
This will also make the `_outNulls` variable unused in many code paths, so you'll need to work around the unused warning.
Update: I tried it out, and the microbenchmarks don't show much difference, so the current approach looks good. (If you're curious, the diff is here.)
I tried the same approach and also ran into `_outNulls` being unused, and gave up at that point.
But I did look at the implementation of `Or`, and if one of the Nulls has no nulls, then it performs a copy. So it makes sense that your microbenchmarks show little difference.
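A rough sketch of why `Or` and `Copy` can cost about the same when one side has no nulls. This is a simplified model written for illustration only; the `nulls` type, its fields, and the `clone` and `or` methods are hypothetical and do not mirror the actual `coldata.Nulls` implementation. Both operands are assumed to have the same length.

```go
package main

import "fmt"

// nulls is a toy null vector: bitmap[i] == true means row i is NULL.
// maybeHasNulls == false guarantees that no row is NULL.
type nulls struct {
	maybeHasNulls bool
	bitmap        []bool
}

// clone returns a deep copy so the result does not alias the receiver.
func (n nulls) clone() nulls {
	c := nulls{maybeHasNulls: n.maybeHasNulls, bitmap: make([]bool, len(n.bitmap))}
	copy(c.bitmap, n.bitmap)
	return c
}

// or returns the union of two null vectors of equal length.
func (n nulls) or(o nulls) nulls {
	// Fast paths: if one side has no nulls, the union is just a copy of
	// the other side, so no element-wise work is needed.
	if !n.maybeHasNulls {
		return o.clone()
	}
	if !o.maybeHasNulls {
		return n.clone()
	}
	res := nulls{maybeHasNulls: true, bitmap: make([]bool, len(n.bitmap))}
	for i := range res.bitmap {
		res.bitmap[i] = n.bitmap[i] || o.bitmap[i]
	}
	return res
}

func main() {
	outNulls := nulls{maybeHasNulls: false, bitmap: []bool{false, false}}
	colNulls := nulls{maybeHasNulls: true, bitmap: []bool{true, false}}
	fmt.Println(outNulls.or(colNulls).bitmap)
}
```

Under this model, the common non-datum case (the output nulls are empty) takes the copy path, which would be consistent with the microbenchmarks showing little difference.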
pkg/sql/colexec/proj_non_const_ops_tmpl.go, line 112 at r1 (raw file):
Previously, yuzefovich wrote…
nit: ditto for comments
Done.
pkg/sql/opt/idxconstraint/index_constraints.go, line 996 at r2 (raw file):
Previously, RaduBerinde wrote…
`, ok` missing. Also, I think we have an extractconstval somewhere since other operators can be constant (e.g. True, False, Null)
Great catch on the missing `, ok`! Fixed.
We only care about the case when the RHS is a constant DJSON, so neglecting `True`, `False`, and `Null` is not an issue. But I've updated to use `memo.CanExtractConstDatum` and `memo.ExtractConstDatum` anyway since it seems more robust.
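For readers unfamiliar with the issue, the missing `, ok` matters because a single-value type assertion in Go panics when the dynamic type does not match, while the two-value form reports the mismatch. A minimal, self-contained sketch follows; `expr`, `constExpr`, `variableExpr`, and `constValue` are hypothetical types and helpers for illustration, not the optimizer's `memo` types.

```go
package main

import "fmt"

// expr is a hypothetical stand-in for an optimizer expression.
type expr interface{}

type constExpr struct{ value string }
type variableExpr struct{ name string }

// constValue returns the constant's value and true if e is a *constExpr.
// The comma-ok form avoids a panic when e has any other dynamic type.
func constValue(e expr) (string, bool) {
	c, ok := e.(*constExpr)
	if !ok {
		return "", false
	}
	return c.value, true
}

func main() {
	fmt.Println(constValue(&constExpr{value: "1"}))
	fmt.Println(constValue(&variableExpr{name: "j"}))
	// By contrast, e.(*constExpr) without ", ok" would panic for the
	// *variableExpr argument above.
}
```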
Thanks! You need to run `make execgen` on the first commit to regenerate the colexec code with comments.
Reviewed 3 of 7 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)
31a5d89 to 104e5e2
You need to run make execgen on the first commit to regenerate the colexec code with comments.
Oops! Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)
This commit fixes incorrect evaluation of the JSON fetch value operator, `->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that produced inequivalent scalar expressions when converting `->` to `@>`. The rules would convert expressions like `j -> '"a"' = 1` to `j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when the LHS does not contain the RHS key, but the resulting `@>` expression is always either `true` or `false`, never `NULL`. These normalization rules have been removed.

These two rules existed to provide inverted index-acceleration for queries with `->`, because the optimizer can only index-accelerate `@>` operators during exploration. As a result of their removal, queries with `->` operators are no longer index-accelerated. This will be remedied in a future commit.

The second issue was a bug in the vectorized overload of the `->` operator. Previously, when the operator evaluated to `NULL` with two non-`NULL` inputs, the resulting `NULL` would not be tracked by the `Nulls` struct.

Fixes cockroachdb#49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`, would evaluate incorrectly in some cases. This has been fixed.
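The difference between the two expressions can be shown with a small model of SQL three-valued logic. This is a sketch under simplifying assumptions: JSON values are flat objects of numbers, and `Tri`, `fetchEquals`, `contains`, and `not` are hypothetical helpers, not CockroachDB code.

```go
package main

import "fmt"

// Tri models a SQL boolean that may also be NULL.
type Tri int

const (
	False Tri = iota
	True
	Null
)

func (t Tri) String() string { return [...]string{"false", "true", "NULL"}[t] }

// fetchEquals models `j -> key = want`: a missing key yields NULL.
func fetchEquals(j map[string]float64, key string, want float64) Tri {
	v, ok := j[key]
	if !ok {
		return Null
	}
	if v == want {
		return True
	}
	return False
}

// contains models `j @> '{key: want}'`: the result is never NULL here.
func contains(j map[string]float64, key string, want float64) Tri {
	if v, ok := j[key]; ok && v == want {
		return True
	}
	return False
}

// not applies SQL NOT, which leaves NULL unchanged.
func not(t Tri) Tri {
	switch t {
	case True:
		return False
	case False:
		return True
	}
	return Null
}

func main() {
	row := map[string]float64{"b": 2} // no key "a"
	fmt.Println(fetchEquals(row, "a", 1), contains(row, "a", 1))
	// Under NOT the two forms diverge: NULL filters the row out,
	// while true keeps it.
	fmt.Println(not(fetchEquals(row, "a", 1)), not(contains(row, "a", 1)))
}
```

In a plain `WHERE` filter both NULL and false drop the row, so the mismatch stays hidden until the expression is negated or its value is used directly.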
This commit allows inverted indexes to be scanned to satisfy query filters in the form: `j->'a' = '1'`. The optimizer had previously supported this by normalizing these expressions into expressions with JSON containment operators, `@>`, but it lost this support when these normalization rules were found to produce inequivalent expressions.

Note that query filters of several forms, which were previously accelerated via the incorrect normalization rules, are not accelerated as part of this commit, such as:

    j->'a' @> '{"x": "y"}'
    j->'a'->'b' = '"c"'

Release note: None
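As a rough mental model of what an inverted index scan for such a filter looks up, the sketch below builds a toy inverted index over flat JSON objects with string values and answers `j->key = value` with a single entry lookup. The entry format only loosely imitates the `/"a"/"b"` span shown in the test output earlier; `invertedIndex`, `entryKey`, `build`, and `lookupFetchEquals` are hypothetical and do not reflect CockroachDB's actual key encoding or optimizer code.

```go
package main

import "fmt"

// invertedIndex maps one entry per (key, value) pair to the IDs of the
// rows whose JSON object contains that pair.
type invertedIndex map[string][]int

// entryKey renders a (key, value) pair in a `/"key"/"value"` form.
func entryKey(key, value string) string {
	return fmt.Sprintf("/%q/%q", key, value)
}

// build indexes every top-level key/value pair of every row. Row IDs are
// the slice positions, so each posting list is in ascending order.
func build(rows []map[string]string) invertedIndex {
	idx := invertedIndex{}
	for id, row := range rows {
		for k, v := range row {
			e := entryKey(k, v)
			idx[e] = append(idx[e], id)
		}
	}
	return idx
}

// lookupFetchEquals returns the rows satisfying `j->key = value` by
// reading a single index entry instead of scanning every row.
func lookupFetchEquals(idx invertedIndex, key, value string) []int {
	return idx[entryKey(key, value)]
}

func main() {
	rows := []map[string]string{
		{"a": "b"},
		{"a": "c"},
		{"x": "y", "a": "b"},
	}
	idx := build(rows)
	fmt.Println(entryKey("a", "b"), lookupFetchEquals(idx, "a", "b"))
}
```

Rows without the key never appear under the entry, so in this toy model the NULL result of `->` for a missing key does not cause wrong rows to be returned by the lookup.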
104e5e2 to 30f8072
TFTRs! bors r+
Build succeeded:
@mgartner are you planning to backport this? (just wondering since it affects JSON compatibility with Django 3.1)
I think we should backport for 20.2.1.
Created a backport PR: #55447
Prior to cockroachdb#55316, the optimizer generated inverted index scans on indexed JSON columns when queries had filters with chained fetch value operators, for example `j->'a'->'b' = '1'`. The logic that made this possible was found to create query plans not equivalent to the query, so it was removed.

This commit restores the ability to index accelerate chained `->` operators.

Fixes cockroachdb#55317

Release note (performance improvement): A bug fix included in 20.2.1 for the JSON fetch value operator, `->`, resulted in chained `->` operators in query filters not being index accelerated, e.g., `j->'a'->'b' = '1'`. Chained `->` are now index accelerated.
59494: opt: index accelerate chained fetch value operators r=rytaft a=mgartner

#### opt: add test for JSON fetch val inverse

Release note: None

#### opt: index accelerate chained fetch value operators

Prior to #55316, the optimizer generated inverted index scans on indexed JSON columns when queries had filters with chained fetch value operators, for example `j->'a'->'b' = '1'`. The logic that made this possible was found to create query plans not equivalent to the query, so it was removed.

This commit restores the ability to index accelerate chained `->` operators.

Fixes #55317

Release note (performance improvement): A bug fix included in 20.2.1 for the JSON fetch value operator, `->`, resulted in chained `->` operators in query filters not being index accelerated, e.g., `j->'a'->'b' = '1'`. Chained `->` are now index accelerated.

Co-authored-by: Marcus Gartner <[email protected]>
This commit removes an invalid normalization from the NormalizeVisitor. It was previously discovered that transforming expressions in the form `j->'a' = '1'` to `j @> '{"a": 1}'` is invalid (see cockroachdb#49143). This transformation rule was removed from the optimizer in cockroachdb#55316. But the same transformation was not removed from the NormalizeVisitor.

This visitor is only used to normalize scalar expressions in table descriptors (`DEFAULT` expressions, computed column expressions, and partial index predicates) during a backfill.

Fixes cockroachdb#75097

Release note (bug fix): A bug has been fixed that caused incorrect values to be written to computed columns when their expressions were of the form `j->x = y`, where `j` is a `JSON` column and `x` and `y` are constants. This bug also caused corruption of partial indexes with `WHERE` clauses containing expressions of the same form. This bug was present since version 2.0.0.
75908: scripts: per-branch bump-pebble.sh script r=jbowens a=nicktrav

Currently, the master branch and each release branch rely on the same `bump-pebble.sh` script from the master branch.

There are subtle differences between master and the release branches (i.e. the build system) that end up breaking the script when changes are introduced on the master branch but not backported to the release branches.

One solution to this problem is to continue to maintain one script on the master branch, but include switching logic for each release branch to account for the differences.

An alternative approach is to have a script per release branch. Rather than having switching logic, the script has the appropriate logic for that branch. When a new release branch is cut, the script inherits the most up-to-date logic from master, and all that needs to change is the name of the branch and the corresponding Pebble branch.

Pin the `bump-pebble.sh` script to the master branch. The script will error out if it is run from a different branch.

Release note: None

75914: tree: remove invalid normalization r=mgartner a=mgartner

This commit removes an invalid normalization from the NormalizeVisitor. It was previously discovered that transforming expressions in the form `j->'a' = '1'` to `j @> '{"a": 1}'` is invalid (see #49143). This transformation rule was removed from the optimizer in #55316. But the same transformation was not removed from the NormalizeVisitor.

This visitor is only used to normalize scalar expressions in table descriptors (`DEFAULT` expressions, computed column expressions, and partial index predicates) during a backfill.

Fixes #75907

Release note (bug fix): A bug has been fixed that caused incorrect values to be written to computed columns when their expressions were of the form `j->x = y`, where `j` is a `JSON` column and `x` and `y` are constants. This bug also caused corruption of partial indexes with `WHERE` clauses containing expressions of the same form. This bug was present since version 2.0.0.

75963: batcheval: use same stats timestamp for `AddSSTable` assertions r=rhu713 a=erikgrinaker

Resolves #75643.
Resolves #75642.

Release note: None

Co-authored-by: Nick Travers <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
sql: fix JSON fetch value operator evaluation

This commit fixes incorrect evaluation of the JSON fetch value operator, `->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that produced inequivalent scalar expressions when converting `->` to `@>`. The rules would convert expressions like `j -> '"a"' = 1` to `j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when the LHS does not contain the RHS key, but the resulting `@>` expression is always either `true` or `false`, never `NULL`. These normalization rules have been removed.

These two rules existed to provide inverted index-acceleration for queries with `->`, because the optimizer can only index-accelerate `@>` operators during exploration. As a result of their removal, queries with `->` operators are no longer index-accelerated. This will be remedied in a future commit.

The second issue was a bug in the vectorized overload of the `->` operator. Previously, when the operator evaluated to `NULL` with two non-`NULL` inputs, the resulting `NULL` would not be tracked by the `Nulls` struct.

Fixes #49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`, would evaluate incorrectly in some cases. This has been fixed.

opt: index-accelerate equalities with JSON fetch expressions

This commit allows inverted indexes to be scanned to satisfy query filters in the form: `j->'a' = '1'`. The optimizer had previously supported this by normalizing these expressions into expressions with JSON containment operators, `@>`, but it lost this support when these normalization rules were found to produce inequivalent expressions.

Note that query filters of several forms, which were previously accelerated via the incorrect normalization rules, are not accelerated as part of this commit, such as:

    j->'a' @> '{"x": "y"}'
    j->'a'->'b' = '"c"'

Release note: None