Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected hang on subquery #105882

Closed
DerZc opened this issue Jun 30, 2023 · 4 comments · Fixed by #105931
Closed

Unexpected hang on subquery #105882

DerZc opened this issue Jun 30, 2023 · 4 comments · Fixed by #105931
Assignees
Labels
A-sql-vec SQL vectorized engine C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner

Comments

@DerZc
Copy link

DerZc commented Jun 30, 2023

Describe the problem

Please describe the issue you observed, and any steps we can take to reproduce it:

The following program hangs on latest commit version:

USE test;
DROP DATABASE IF EXISTS database61 CASCADE;
CREATE DATABASE database61;
USE database61;

SET CLUSTER SETTING diagnostics.reporting.enabled    = false;
SET CLUSTER SETTING diagnostics.reporting.send_crash_reports = false;
SET CLUSTER SETTING sql.metrics.statement_details.plan_collection.enabled = 'off';
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = 'off';
SET CLUSTER SETTING timeseries.storage.enabled = 'off';
set experimental_enable_hash_sharded_indexes='on';

CREATE TABLE t1 (c0 INT);

UPSERT INTO t1 (c0) VALUES(1);
SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1.rowid) IN (SELECT MAX(t1.rowid) FROM t1))) FROM t1;

To Reproduce

What did you do? Describe in your own words.

Expected behavior
A clear and concise description of what you expected to happen.
No hang.

Environment:

  • CockroachDB version 180af61
  • Server OS: Ubuntu 22.04
  • Client app CLI

Jira issue: CRDB-29254

@DerZc DerZc added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 30, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jun 30, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-foundations (found keywords: DROP DATABASE)
  • @cockroachdb/sql-queries (found keywords: plan)
  • @dhartunian (found keywords: metrics)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jun 30, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 30, 2023
@michae2
Copy link
Collaborator

michae2 commented Jun 30, 2023

Thank you for finding this issue, @DerZc!

More information: looks like this only reproduces using the vectorized engine.

# michae2@michae2-crl-mbp ~crdb % cockroach demo -e "SET statement_timeout = '10s'; SET vectorize = on; CREATE TABLE t1 (c0 INT); UPSERT INTO t1 (c0) VALUES (1); SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1.rowid) IN (SELECT MAX(t1.rowid) FROM t1))) FROM t1;"
SET
SET
CREATE TABLE
INSERT 0 1
ERROR: this query requires additional disk space: apply-join-disk: disk budget exceeded: 1048576 bytes requested, 104857600 currently allocated, 104857600 bytes in budget
SQLSTATE: 53100
Failed running "demo"

# michae2@michae2-crl-mbp 1 ~crdb % cockroach demo -e "SET statement_timeout = '10s'; SET vectorize = off; CREATE TABLE t1 (c0 INT); UPSERT INTO t1 (c0) VALUES (1); SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1.rowid) IN (SELECT MAX(t1.rowid) FROM t1))) FROM t1;"
SET
SET
CREATE TABLE
INSERT 0 1
  count
---------
      1
(1 row)

And I think this was introduced sometime in v22.1 (though maybe I'm conflating this with the vectorized engine being used by default)?

# michae2@michae2-crl-mbp 1 ~/old_crdbs % ./cockroach-v21.2.17.darwin-10.9-amd64/cockroach demo -e "SET statement_timeout = '10s'; SET vectorize = experimental_always; CREATE TABLE t1 (c0 INT); UPSERT INTO t1 (c0) VALUES (1); SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1.rowid) IN (SELECT MAX(t1.rowid) FROM t1))) FROM t1;"
  count
---------
      1
(1 row)

# michae2@michae2-crl-mbp ~/old_crdbs % ./cockroach-v22.1.20.darwin-10.9-amd64/cockroach demo -e "SET statement_timeout = '10s'; SET vectorize = on; CREATE TABLE t1 (c0 INT); UPSERT INTO t1 (c0) VALUES (1); SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1.rowid) IN (SELECT MAX(t1.rowid) FROM t1))) FROM t1;"
ERROR: this query requires additional disk space: in-mem temp storage: disk budget exceeded: 1048576 bytes requested, 104857600 currently allocated, 104857600 bytes in budget
SQLSTATE: 53100
Failed running "demo"

@michae2 michae2 added the A-sql-vec SQL vectorized engine label Jun 30, 2023
@michae2
Copy link
Collaborator

michae2 commented Jun 30, 2023

The plan looks correct to me:

[email protected]:26257/demoapp/defaultdb> EXPLAIN (OPT, VERBOSE) SELECT (SELECT COUNT(t1a.rowid) FROM t1 AS t1a WHERE ((t1outer.rowid) IN (SELECT MAX(t1inner.rowid) FROM t1 AS t1inner))) FROM t1 AS t1outer;
                                                                      info
-------------------------------------------------------------------------------------------------------------------------------------------------
  project
   ├── columns: count:15
   ├── stats: [rows=1]
   ├── cost: 69.175
   ├── distribution: us-east1
   ├── prune: (15)
   ├── group-by (hash)
   │    ├── columns: t1outer.rowid:2 count:14
   │    ├── grouping columns: t1outer.rowid:2
   │    ├── stats: [rows=1, distinct(2)=1, null(2)=0]
   │    ├── cost: 69.135
   │    ├── key: (2)
   │    ├── fd: (2)-->(14)
   │    ├── distribution: us-east1
   │    ├── left-join-apply
   │    │    ├── columns: t1outer.rowid:2 canary:16
   │    │    ├── stats: [rows=1, distinct(2)=1, null(2)=0]
   │    │    ├── cost: 69.065
   │    │    ├── distribution: us-east1
   │    │    ├── prune: (16)
   │    │    ├── interesting orderings: (+2)
   │    │    ├── scan t1 [as=t1outer]
   │    │    │    ├── columns: t1outer.rowid:2
   │    │    │    ├── stats: [rows=1, distinct(2)=1, null(2)=0]
   │    │    │    │   histogram(2)=  0          1
   │    │    │    │                <--- 878440277775843329
   │    │    │    ├── cost: 30.12
   │    │    │    ├── key: (2)
   │    │    │    ├── distribution: us-east1
   │    │    │    ├── prune: (2)
   │    │    │    ├── interesting orderings: (+2)
   │    │    │    └── unfiltered-cols: (1-4)
   │    │    ├── project
   │    │    │    ├── columns: canary:16
   │    │    │    ├── outer: (2)
   │    │    │    ├── stats: [rows=1]
   │    │    │    ├── cost: 38.885
   │    │    │    ├── fd: ()-->(16)
   │    │    │    ├── distribution: us-east1
   │    │    │    ├── prune: (16)
   │    │    │    ├── semi-join (cross)
   │    │    │    │    ├── outer: (2)
   │    │    │    │    ├── stats: [rows=1, distinct(2)=1, null(2)=0]
   │    │    │    │    ├── cost: 38.845
   │    │    │    │    ├── distribution: us-east1
   │    │    │    │    ├── scan t1 [as=t1a]
   │    │    │    │    │    ├── stats: [rows=1]
   │    │    │    │    │    ├── cost: 29.625
   │    │    │    │    │    ├── distribution: us-east1
   │    │    │    │    │    └── unfiltered-cols: (5-8)
   │    │    │    │    ├── scalar-group-by
   │    │    │    │    │    ├── columns: max:13
   │    │    │    │    │    ├── cardinality: [1 - 1]
   │    │    │    │    │    ├── stats: [rows=1, distinct(13)=1, null(13)=0]
   │    │    │    │    │    ├── cost: 9.15
   │    │    │    │    │    ├── key: ()
   │    │    │    │    │    ├── fd: ()-->(13)
   │    │    │    │    │    ├── distribution: us-east1
   │    │    │    │    │    ├── prune: (13)
   │    │    │    │    │    ├── scan t1,rev [as=t1inner]
   │    │    │    │    │    │    ├── columns: t1inner.rowid:10
   │    │    │    │    │    │    ├── limit: 1(rev)
   │    │    │    │    │    │    ├── stats: [rows=1]
   │    │    │    │    │    │    ├── cost: 9.11
   │    │    │    │    │    │    ├── key: ()
   │    │    │    │    │    │    ├── fd: ()-->(10)
   │    │    │    │    │    │    └── distribution: us-east1
   │    │    │    │    │    └── aggregations
   │    │    │    │    │         └── const-agg [as=max:13, outer=(10)]
   │    │    │    │    │              └── t1inner.rowid:10
   │    │    │    │    └── filters
   │    │    │    │         └── t1outer.rowid:2 = max:13 [outer=(2,13), constraints=(/2: (/NULL - ]; /13: (/NULL - ]), fd=(2)==(13), (13)==(2)]
   │    │    │    └── projections
   │    │    │         └── true [as=canary:16]
   │    │    └── filters (true)
   │    └── aggregations
   │         └── count [as=count:14, outer=(16)]
   │              └── canary:16
   └── projections
        └── count:14 [as=count:15, outer=(14)]
(80 rows)

The tricky thing about this apply join is that it contains a semi-join where the filter depends on a column from outside the apply join. Maybe the column, t1outer.rowid, isn't being passed into the left-apply-join correctly? Somehow causing an infinite loop?

@DrewKimball
Copy link
Collaborator

Looks like it's because the left input of the cross-join has no columns - the vectorized implementation doesn't expect this, and doesn't advance its state when outputting a batch. That's what causes the infinite loop.

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 30, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes cockroachdb#105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jul 14, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes cockroachdb#105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
craig bot pushed a commit that referenced this issue Jul 15, 2023
105931: colexec: don't infinite loop cross-join with zero-column left input r=DrewKimball a=DrewKimball

Previously, the cross-joiner wouldn't advance its internal state when its left input projected no columns. This would result in an infinite loop as the right rows were repeatedly emitted. This patch advances the state when there are no left columns as if values from the left side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause a join to infinite-loop in rare cases when (1) the join filter is not an equality and (2) no columns from the left input are returned.

Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in 72c8fa1 Jul 15, 2023
blathers-crl bot pushed a commit that referenced this issue Jul 15, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
blathers-crl bot pushed a commit that referenced this issue Jul 15, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
yuzefovich pushed a commit that referenced this issue Aug 15, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
yuzefovich pushed a commit that referenced this issue Aug 15, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
yuzefovich pushed a commit that referenced this issue Aug 16, 2023
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes #105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-vec SQL vectorized engine C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants