Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: internal error when recursive CTE query refers to an outer CTE #54324

Closed
RaduBerinde opened this issue Sep 14, 2020 · 0 comments · Fixed by #54325
Closed

opt: internal error when recursive CTE query refers to an outer CTE #54324

RaduBerinde opened this issue Sep 14, 2020 · 0 comments · Fixed by #54325
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@RaduBerinde
Copy link
Member

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
INSERT INTO ab VALUES (1,1);

WITH
  cte1 AS MATERIALIZED (SELECT a FROM ab WHERE a = b)
SELECT * FROM
  (
    WITH RECURSIVE
      cte2 (x) AS (SELECT 1 UNION ALL SELECT x + a FROM cte2, cte1 WHERE x < 10)
    SELECT * FROM cte2
  );

results in internal error: interface conversion: exec.Node is *explain.Node, not *sql.bufferNode.

This happens when explain infrastructure is in use (which is always the case for the first instance of a statement fingerprint); the scan buffer node was built against the explain factory, but the inner CTE query is built against the regular factory.

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 14, 2020
@RaduBerinde RaduBerinde self-assigned this Sep 14, 2020
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Sep 14, 2020
When the new explain infrastructure is in use, the plan is built
against an explain.Factory but the "inner" recursive CTE plan is built
against a normal factory. This leads to an internal error. To avoid
this, we unwrap the node in `ConstructScanBuffer`.

Fixes cockroachdb#54324.

Release note (bug fix): fixed an internal error in some cases when
recursive CTEs are used.
craig bot pushed a commit that referenced this issue Sep 14, 2020
54230: kv: disallow 1PC evaluation when heartbeating and txn record present r=nvanbenschoten a=nvanbenschoten

Fixes #53403.
Fixes #53518.
Fixes #53772
Fixes #54094.

This commit disables one-phase commit evaluation for transactions with existing PENDING transaction records. As we saw in #53518 (comment), failing to do so can lead to a transaction that commits on the one-phase commit fast-path but still has a PENDING transaction record. It doesn't seem like this can actually cause serious correctness issues today other than in the presence of replays because a 1PC transaction does not have remote intents (by definition). However, it was creating the appearance of abandoned transaction records in `kv/contention/nodes=4` and causing that test to fail.

This commit needs a backport to release-20.2 and release-20.1. This was not an issue before release-20.1 because before then, we would never begin a transaction's heartbeat loop for 1PC transactions. This changed in v20.1 because of unreplicated locks. We allow transactions that acquire unreplicated locks to still hit the one-phase commit fast-path, but we also need to start heartbeating once a transaction has acquired any locks so that it doesn't get aborted by conflicting transactions.

In the vast majority of these cases, the heartbeat loop will never actually fire (for any txn that takes less than 1s), so with this change, we'll still be able to perform a 1PC evaluation. However, this is adding in a disk read for those cases, which is a little disappointing but doesn't seem easy to avoid without disabling the heartbeat loop before issuing the 1PC batch (another alternative, happy to discuss). The upside of this is that we now have enough information on the server to avoid a bit of work for 1PC txns that have not previously acquired locks (see TODO in evaluate1PC).

54325: opt: unwrap explain.Node in ConstructScanBuffer r=RaduBerinde a=RaduBerinde

When the new explain infrastructure is in use, the plan is built
against an explain.Factory but the "inner" recursive CTE plan is built
against a normal factory. This leads to an internal error. To avoid
this, we unwrap the node in `ConstructScanBuffer`.

Note that the new explain infrastructure is used automatically for the
first instance of a query fingerprint, in order to populate the plan
in the UI.

Fixes #54324.

Release note (bug fix): fixed an internal error in some cases when
recursive CTEs are used.


54350: roachtest: disable load-based splitting in copy/bank/rows=10000000,nodes=9,txn=false r=nvanbenschoten a=nvanbenschoten

Fixes #54301.

Speculative fix for that roachtest.

54351: workload/schemachange: add enum support r=otan a=ajwerner

This commit adds support for creating enums, adding enum columns, and using
enums to insert data into rows. This found a bug so I'm pleased.

Release note: None

54352: kv: increment bytesSent, not batchSize in kvBatchSnapshotStrategy.sendBatch r=nvanbenschoten a=nvanbenschoten

Partly responsible for #54311.

This commit fixes a bug introduced in #48579 where the snapshot
batch size was increased when each batch was sent instead of the
bytesSent metric. This had two effects:
1. it undermined the memory footprint limit (256 KB) placed on
   snapshot senders by doubling the batch size on each subsequent
   batch.
2. it failed to track the snapshot data rate properly, so the log
   message introduced in #48579 always contained "0 B/s".

This needs to be backported to release-20.1 and release-20.2.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in a513dc2 Sep 14, 2020
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Sep 16, 2020
When the new explain infrastructure is in use, the plan is built
against an explain.Factory but the "inner" recursive CTE plan is built
against a normal factory. This leads to an internal error. To avoid
this, we unwrap the node in `ConstructScanBuffer`.

Note that the new explain infrastructure is used automatically for the
first instance of a query fingerprint, in order to populate the plan
in the UI.

Fixes cockroachdb#54324.

Release note (bug fix): fixed an internal error in some cases when
recursive CTEs are used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant