Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: cast UNION inputs to identical types #59148

Closed
RaduBerinde opened this issue Jan 19, 2021 · 2 comments · Fixed by #60560
Closed

opt: cast UNION inputs to identical types #59148

RaduBerinde opened this issue Jan 19, 2021 · 2 comments · Fixed by #60560
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@RaduBerinde
Copy link
Member

Info from #58285: the following case causes an internal error:

CREATE TABLE t (_int4 INT4);
INSERT INTO t VALUES (2);
WITH
  cte (col)
    AS (
      SELECT * FROM (VALUES (1:::INT8))
      UNION
        SELECT
          *
        FROM
          (VALUES ((SELECT _int4 FROM t LIMIT 1)))
    )
SELECT
  col
FROM
  cte;

EXPLAIN (TYPES):

  • root
  │ columns: (col int)
  │
  ├── • render
  │   │ columns: (col int)
  │   │ estimated row count: 2
  │   │ render 0: (column1)[int]
  │   │
  │   └── • union
  │       │ columns: (column1 int)
  │       │ estimated row count: 2
  │       │
  │       ├── • values
  │       │     columns: (column1 int)
  │       │     size: 1 column, 1 row
  │       │     row 0, expr 0: (1)[int]
  │       │
  │       └── • values
  │             columns: (column1 int4)
  │             size: 1 column, 1 row
  │             row 0, expr 0: (@S1)[int4]
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT _int4 FROM t LIMIT 1)
      │ exec mode: one row
      │
      └── • scan
            columns: (_int4 int4)
            estimated row count: 1 (missing stats)
            table: t@primary
            spans: LIMITED SCAN
            limit: 1

We're UNIONing INT (treated as INT8) with INT4 which trips up the vectorized engine.

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 19, 2021
@RaduBerinde
Copy link
Member Author

A simpler repro:

create table ab (a int4, b int8);
insert into ab values (1,1), (1,2), (2,1), (2,2);
set vectorize = experimental_always;
select a from ab union select b from ab;
ERROR: mismatched types at index 0: expected [int4]	actual [int] 

This was referenced Jan 26, 2021
@yuzefovich
Copy link
Member

Have another repro here.

@RaduBerinde RaduBerinde self-assigned this Feb 14, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 14, 2021
This change makes the optbuilder more strict with respect to set
operations. Previously, it would only require types to be `Equivalent`
across the two sides. This leads to errors in vectorized execution,
when we e.g. try to union a INT8 with an INT4.

We now require the types to be `Identical`, and we add casts as
necessary. We use the type from the left side in this case. This can
lead to questionable behavior when the right side is a "wider" type,
but we don't have any facility to robustly determine what the type
should be.

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 17, 2021
This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 17, 2021
This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 17, 2021
This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
craig bot pushed a commit that referenced this issue Feb 18, 2021
57827: importccl: cleanup how we ignore stmts in IMPORT PGDUMP r=pbardea,miretskiy a=adityamaru

There are two kinds of unsupported statements when processing an IMPORT
PGDUMP:
- Unparseable
- Parseable, but unsupported

Previously, we had a set of regex statements to skip over statements
which fell into the first category. Those which fell into the second
category were skipped when we saw an AST Node of that kind.

This change does not ignore any additional statements, but removes the
regex. It does this by leaning on existing yacc grammar rules, or adding
new "unsupported" rules. These unsupported rules raise a special kind of
error which we catch during IMPORT PGDUMP, and is decorated with
required information.

This change also introduces a `ignore_unsupported` flag to IMPORT PGDUMP
which will allow users to skip all the stmts (of both categories
mentioned above). If not set, we will fail with either a parse or
unsupported stmt error.

Release note (sql change): New IMPORT PGDUMP option `ignore_unsupported`
to skip over all the unsupported PGDUMP stmts. The collection of these
statements will be appropriately documented.

59716: storage: limit RevertRange batches to 32mb r=dt a=dt

The existing limit in key-count/span-count can produce batches in excess of 64mb,
if, for example, they have very large keys. These batches then are rejected for
exceeding the raft command size limit.

This adds an aditional hard-coded limit of 32mb on the write batch to which keys
or spans to clear are added (if the command is executed against a non-Batch the
limit is ignored). The size of the batch is re-checked once every 32 keys.

Release note (bug fix): avoid creating batches that exceed the raft command limit (64mb)  when reverting ranges that contain very large keys.

60466: sql: test namespace entry for multi-region enum is drained properly r=arulajmani a=arulajmani

Testing only patch. When the primary region is removed from a
multi-region database we queue up an async job to reclaim the namespace
entry for the multi-region type descriptor (and its array alias). This
patch adds a test to make sure that the namespace entry is drained
properly even if the job runs into an error.

This patch adds a new testing knob to the type schema changer,
RunAfterAllFailOrCancel, to synch on the `OnFailOrCancel` job
completing.

Release note: None

60499: storage: unsafe key mangling for MVCCIterator r=sumeerbhola a=sumeerbhola

It can uncover bugs in code that misuses unsafe
keys. It uncovered the bug fixed in
https://github.com/cockroachdb/cockroach/pull/60460/files#diff-84108c53fd1e766ef8802b87b394981d3497d87c40d86e084f2ed77bba0ca71a

Release note: None

60560: opt: cast to identical types for set operations r=RaduBerinde a=RaduBerinde

This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes #59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).

60601: opt: add test constraining partial index on virtual column r=mgartner a=mgartner

No code changes were necessary in order to generate a constrained scan
for a partial index on a virtual column.

Release note: None

60659: sql: fix dangling zone config on REGIONAL BY ROW -> REGIONAL BY TABLE r=ajstorm a=otan

We previously did not set NumReplicas to 0 for RBR -> RBT conversions.
This means the zone configuration do not cleanup and inherit properly
after the async job completes for dropping old indexes since it no
longer thinks the RBR zone config is a placeholder, leaving a
dangling reference. Patch this up correctly.

N.B.: In theory there's nothing wrong with this. But this patch makes
SHOW ZONE CONFIGURATION consistent when you convert RBR to RBT
compared to making a RBT table directly.

Release note: None



Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in 4037bd8 Feb 18, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 24, 2021
This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
This was referenced Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants