Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: excluding json field with -> operator incorrectly returns values where the key doesn't exist #49143

Closed
timgraham opened this issue May 16, 2020 · 2 comments · Fixed by #55316
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

@timgraham
Copy link
Contributor

timgraham commented May 16, 2020

Describe the problem

A SQL clause like WHERE NOT (("jsontest"."value" -> 'j') = 'null') unexpectedly (unlike PostgreSQL) returns values where 'j' doesn't exist. Instead, it should return only values where 'j' exists and isn't null.

To Reproduce

CREATE TABLE jsontest (
    value json
);
INSERT INTO jsontest VALUES ('{}');
INSERT INTO jsontest VALUES ('{"a": 1}');
INSERT INTO jsontest VALUES ('{"j": null}');
INSERT INTO jsontest VALUES ('{"j": 1}');
SELECT * FROM jsontest WHERE NOT (("jsontest"."value" -> 'j') = 'null');

Expected behavior

Expected result (as PostgreSQL):

{"j": 1}

Actual results:

   value
------------
  {}
  {"j": 1}
  {"a": 1}

Environment:

  • CockroachDB version 20.1.0

Django 3.1 (to be released in August 2020) adds support for JSONField and this issue came up in a failing test.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels May 16, 2020
@timgraham timgraham changed the title sql: excluding null json keys incorrectly returns values where the key doesn't exist sql: excluding null json field incorrectly returns values where the key doesn't exist May 16, 2020
@timgraham
Copy link
Contributor Author

This also applies to values besides 'null'. For example, given the same setup as in the issue description:

SELECT * FROM jsontest WHERE NOT (("jsontest"."value" ->'a') = '1');

gives no results on PostgreSQL but 3 results on CockroachDB:

     value
---------------
  {}
  {"j": null}
  {"j": 1}

@timgraham timgraham changed the title sql: excluding null json field incorrectly returns values where the key doesn't exist sql: excluding json field with -> operator incorrectly returns values where the key doesn't exist Sep 11, 2020
@jordanlewis
Copy link
Member

This is an interesting issue - the problem is that the optimizer reduces things of the form

SELECT json->key = value

to

SELECT value @> '{key: value}'

which isn't valid when json->key isn't present. @> returns false when there's something not present, but ->key returns null.

@rafiss rafiss added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. labels Sep 11, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Sep 11, 2020
@mgartner mgartner assigned mgartner and unassigned rytaft Sep 14, 2020
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 7, 2020
This commit fixes incorrect evaluation of the JSON fetch value operator,
`->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that
produced unequivalent scalar expressions when converting `->` to `@>`.
The rules would convert expressions like `j -> '"a"' = 1` to
`j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when
the LHS does not contain the RHS key, but the resulting `@>` expression
is always either `true` or `false`, never `NULL`. These normalization
rules have been removed.

These two rules existed to provide inverted index-acceleration for
queries with `->`, because the optimizer can only index-accerlate `@>`
operators during exploration. As a result of their removal, queries with
`->` operators are no longer index-accelerated. This will be remedied in
a future commit.

The second issue was a bug in the vectorized overload of the `->`
operator. Previously, when the operator evaluated to `NULL` with two
non-`NULL` inputs, the resulting `NULL` would not be tracked by the
`Nulls` struct.

Fixes cockroachdb#49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`,
would evaluate incorrectly in some cases. This has been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 8, 2020
This commit fixes incorrect evaluation of the JSON fetch value operator,
`->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that
produced unequivalent scalar expressions when converting `->` to `@>`.
The rules would convert expressions like `j -> '"a"' = 1` to
`j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when
the LHS does not contain the RHS key, but the resulting `@>` expression
is always either `true` or `false`, never `NULL`. These normalization
rules have been removed.

These two rules existed to provide inverted index-acceleration for
queries with `->`, because the optimizer can only index-accerlate `@>`
operators during exploration. As a result of their removal, queries with
`->` operators are no longer index-accelerated. This will be remedied in
a future commit.

The second issue was a bug in the vectorized overload of the `->`
operator. Previously, when the operator evaluated to `NULL` with two
non-`NULL` inputs, the resulting `NULL` would not be tracked by the
`Nulls` struct.

Fixes cockroachdb#49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`,
would evaluate incorrectly in some cases. This has been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 9, 2020
This commit fixes incorrect evaluation of the JSON fetch value operator,
`->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that
produced unequivalent scalar expressions when converting `->` to `@>`.
The rules would convert expressions like `j -> '"a"' = 1` to
`j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when
the LHS does not contain the RHS key, but the resulting `@>` expression
is always either `true` or `false`, never `NULL`. These normalization
rules have been removed.

These two rules existed to provide inverted index-acceleration for
queries with `->`, because the optimizer can only index-accerlate `@>`
operators during exploration. As a result of their removal, queries with
`->` operators are no longer index-accelerated. This will be remedied in
a future commit.

The second issue was a bug in the vectorized overload of the `->`
operator. Previously, when the operator evaluated to `NULL` with two
non-`NULL` inputs, the resulting `NULL` would not be tracked by the
`Nulls` struct.

Fixes cockroachdb#49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`,
would evaluate incorrectly in some cases. This has been fixed.
craig bot pushed a commit that referenced this issue Oct 9, 2020
55316: sql: fix JSON fetch value operator evaluation r=mgartner a=mgartner

#### sql: fix JSON fetch value operator evaluation

This commit fixes incorrect evaluation of the JSON fetch value operator,
`->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that
produced unequivalent scalar expressions when converting `->` to `@>`.
The rules would convert expressions like `j -> '"a"' = 1` to
`j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when
the LHS does not contain the RHS key, but the resulting `@>` expression
is always either `true` or `false`, never `NULL`. These normalization
rules have been removed.

These two rules existed to provide inverted index-acceleration for
queries with `->`, because the optimizer can only index-accerlate `@>`
operators during exploration. As a result of their removal, queries with
`->` operators are no longer index-accelerated. This will be remedied in
a future commit.

The second issue was a bug in the vectorized overload of the `->`
operator. Previously, when the operator evaluated to `NULL` with two
non-`NULL` inputs, the resulting `NULL` would not be tracked by the
`Nulls` struct.

Fixes #49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`,
would evaluate incorrectly in some cases. This has been fixed.

#### opt: index-accelerate equalities with JSON fetch expressions

This commit allows inverted indexes to be scanned to satisfy query
filters in the form: `j->'a' = '1'`. The optimizer had previously
supported this by normalizing these expressions into expressions with
JSON containment operators, `@>`, but it lost this support when these
normalization rules were found to produce inequivalent expressions.

Note that query filters of several forms, which were previously
accelerated via the incorrect normalization rules, are not accelerated
as part of this commit, such as:

    j->'a' @> '{"x": "y"}
    j->'a'->'b' = '"c"'

Release note: None


55383: docker: Add hostname to the docker image r=DuskEagle a=jlinder

Before: `hostname` was not installed on the image.

Why: The `hostname` command is used by our publicly available k8s
manifests.

Now: The `hostname` command is installed in the standard base image.

Release note (bug fix): Add the `hostname` command to the docker image so the
image can be used with our helm chart and cockroach-operator.

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: James H. Linder <[email protected]>
@craig craig bot closed this as completed in f5a4779 Oct 9, 2020
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 12, 2020
This commit fixes incorrect evaluation of the JSON fetch value operator,
`->`. There were two bugs causing incorrect evaluation.

The first issue was a result of optimizer normalization rules that
produced unequivalent scalar expressions when converting `->` to `@>`.
The rules would convert expressions like `j -> '"a"' = 1` to
`j @> '{"a": 1}'`. This is invalid because `->` results in `NULL` when
the LHS does not contain the RHS key, but the resulting `@>` expression
is always either `true` or `false`, never `NULL`. These normalization
rules have been removed.

These two rules existed to provide inverted index-acceleration for
queries with `->`, because the optimizer can only index-accerlate `@>`
operators during exploration. As a result of their removal, queries with
`->` operators are no longer index-accelerated. This will be remedied in
a future commit.

The second issue was a bug in the vectorized overload of the `->`
operator. Previously, when the operator evaluated to `NULL` with two
non-`NULL` inputs, the resulting `NULL` would not be tracked by the
`Nulls` struct.

Fixes cockroachdb#49143

Release note (bug fix): Previously, the JSON fetch value operator, `->`,
would evaluate incorrectly in some cases. This has been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Feb 3, 2022
This commit removes an invalid normalization from the NormalizeVisitor.
It was previously discovered that transforming expressions in the form
`j->'a' = '1'` to `j @> '{"a": 1}'` is invalid (see cockroachdb#49143). This
transformation rule was removed from the optimizer in cockroachdb#55316. But the
same transformation was not removed from the NormalizeVisitor. This
visitor is only used to normalize scalar expressions in table
descriptors (`DEFAULT` expressions, computed column expressions, and
partial index predicates) during a backfill.

Fixes cockroachdb#75097

Release note (bug fix): A bug has been fixed that caused incorrect
values to be written to computed columns when their expressions were of
the form `j->x = y`, where `j` is a `JSON` column and `x` and `y` are
constants. This bug also caused corruption of partial indexes with
`WHERE` clauses containing expressions of the same form. This bug was
present since version 2.0.0.
craig bot pushed a commit that referenced this issue Feb 3, 2022
75908: scripts: per-branch bump-pebble.sh script r=jbowens a=nicktrav

Currently, the master branch, in addition to each release branch relies
on the same `bump-pebbble.sh` from the master branch. There are subtle
differences between master and the release branches (i.e. build system)
that ends up breaking the script as the changes are introduced on the
master branch but not backported to the release branches.

One solution to this problem is to continue to maintain one script on
the master branch, but include switching logic for each release branch
to account for the differences.

An alternative approach is to have a script per release branch. Rather
than having switching logic, the script has the appropriate logic for
that branch. When a new release branch is cut, the script inherits the
most up-to-date logic from master, and all that needs to change is the
name of the branch and the corresponding Pebble branch.

Pin the `bump-pebble.sh` script to the master branch. The script will
error out if it is run from a different branch.

Release note: None

75914: tree: remove invalid normalization r=mgartner a=mgartner

This commit removes an invalid normalization from the NormalizeVisitor.
It was previously discovered that transforming expressions in the form
`j->'a' = '1'` to `j @> '{"a": 1}'` is invalid (see #49143). This
transformation rule was removed from the optimizer in #55316. But the
same transformation was not removed from the NormalizeVisitor. This
visitor is only used to normalize scalar expressions in table
descriptors (`DEFAULT` expressions, computed column expressions, and
partial index predicates) during a backfill.

Fixes #75907

Release note (bug fix): A bug has been fixed that caused incorrect
values to be written to computed columns when their expressions were of
the form `j->x = y`, where `j` is a `JSON` column and `x` and `y` are
constants. This bug also caused corruption of partial indexes with
`WHERE` clauses containing expressions of the same form. This bug was
present since version 2.0.0.

75963: batcheval: use same stats timestamp for `AddSSTable` assertions r=rhu713 a=erikgrinaker

Resolves #75643.
Resolves #75642.

Release note: None

Co-authored-by: Nick Travers <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants