Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in-mem temp storage disk budget exceeded on multiple DISTINCT ON query #74554

Closed
msirek opened this issue Jan 6, 2022 · 2 comments · Fixed by #80117
Closed

in-mem temp storage disk budget exceeded on multiple DISTINCT ON query #74554

msirek opened this issue Jan 6, 2022 · 2 comments · Fixed by #80117
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community

Comments

@msirek
Copy link
Contributor

msirek commented Jan 6, 2022

Describe the problem

While testing a PR fix, I found a test case which exhausts the 100MB in-mem temp storage pool in the fakedist-disk flavor of logic tests. The query doesn't involve many rows, but it does use multiple DISTINCT ON clauses.

To Reproduce

In the cockroach source code, create file pkg/sql/logictest/testdata/logic_test/distinct_failure with the following contents:

statement ok
CREATE TABLE A(a1 INT, a2 INT, a3 INT, a4 INT, PRIMARY KEY(a1, a2, a3, a4))

statement ok
CREATE TABLE B(b1 INT, b2 INT, b3 INT, b4 INT,
               INDEX (b1, b2) STORING (b3, b4),
               INDEX (b2) STORING (b1, b3, b4),
               INDEX (b3) STORING (b1, b2, b4))

statement ok
CREATE TABLE C(c1 INT, c2 INT, c3 INT, c4 INT)

statement ok
CREATE TABLE D(d1 INT, d2 INT, d3 INT, d4 INT,
               INDEX d (d1) STORING (d2, d3, d4))

# shared column values
statement ok
INSERT INTO A VALUES (0, 0, 0, 0),(1, 10, 100, 1000);
INSERT INTO B VALUES (0, 0, 0, 0),(1, 10, 100, 1000);

# duplicate rows
statement ok
INSERT INTO A VALUES (11, 110, 1100, 11000);
INSERT INTO B VALUES (11, 110, 1100, 11000), (11, 110, 1100, 11000);

# null values
statement ok
INSERT INTO B VALUES (NULL, NULL, NULL, NULL), (NULL, NULL, NULL, NULL);

# duplicates in first three columns
statement ok
INSERT INTO A VALUES (12, 120, 1200, 1), (12, 120, 1200, 2);
INSERT INTO B VALUES (12, 120, 1200, 1), (12, 120, 1200, 2), (12, 120, 1200, 2);

# nulls combined with the duplicates
statement ok
INSERT INTO B VALUES (NULL, 120, 1200, 1)   , (12, NULL, 1200, 2);

# partially shared combinations
statement ok
INSERT INTO A VALUES (2, 20, 200, 2000), (3, 30, 300, 3000), (4, 40, 400, 4000), (5, 50, 500, 5000), (6, 60, 600, 6000), (7, 70, 700, 7000);
INSERT INTO B VALUES (2, 20, 200, 2000), (3, 30, 300, 3000),                     (5, 50, 500, 5000), (6, 60, 600, 6000)                    , (8, 80, 800, 8000), (9, 90, 900, 9000);

# combinations with null values
statement ok
INSERT INTO B VALUES (2, NULL, 200, NULL), (3, 30, 300, NULL)    , (NULL, 40, 400, 4000)  , (NULL, NULL, NULL, 5000)                       , (7, NULL, 700, NULL);
INSERT INTO B VALUES (2, 20, NULL, 200)  , (3, 30, 300, 3000)    , (4, NULL, NULL, NULL)  , (5, 50, NULL, 5000)                            , (7, NULL, 700, NULL);

# combinations with null and unique values
statement ok
INSERT INTO B VALUES (82, NULL, 207, NULL), (NULL, 567, NULL, 789);

# combinations with unique values
statement ok
INSERT INTO A VALUES (15,   55, 555, 5555), (15, 55,   500, 5555), (15, 50, 555,  5555);
INSERT INTO B VALUES (17,   77, 777, 7777), (17, 77,   700, 7777), (17, 70, 777,  7777);
INSERT INTO B VALUES (NULL, 77, 777, 7777), (17, NULL, 777, 7777), (17, 77, NULL, 7777);

# cross column value matches (e.g. a1 = b2)
statement ok
INSERT INTO A VALUES (101, 200, 3000, 40);
INSERT INTO A VALUES (102, 5, 60, 70);
INSERT INTO A VALUES (103, 7, 8, 70);
INSERT INTO A VALUES (104, 5, 5, 5);
INSERT INTO A VALUES (50, 5, 5000, 500);
INSERT INTO A VALUES (80, 11, 110, 11000);
INSERT INTO B VALUES (30, 7, 40, 2);
INSERT INTO B VALUES (120, 80, 90, 10);

# Multiple DISTINCT ON clauses exhausts the 100MB in-mem temp storage pool
query IIII rowsort
SELECT DISTINCT ON (a1,a2,a3,a4) * FROM (
SELECT     *
FROM       A
           WHERE EXISTS (SELECT * FROM B WHERE a1 = b1)
           UNION ALL
SELECT DISTINCT ON (a1,a2,a3,a4) * FROM (
SELECT     *
FROM       A
           WHERE EXISTS (SELECT * FROM B WHERE a2 = b2)
           UNION ALL
SELECT DISTINCT ON (a1,a2,a3,a4) * FROM (
SELECT     *
FROM       A
           WHERE EXISTS (SELECT * FROM B WHERE a3 = b3)
           UNION ALL
SELECT DISTINCT ON (a1,a2,a3,a4) * FROM (
SELECT     *
FROM       A
           WHERE EXISTS (SELECT * FROM B WHERE a4 = b4)
))));
----
0    0    0     0
1    10   100   1000
2    20   200   2000
3    30   300   3000
4    40   400   4000
5    50   500   5000
6    60   600   6000
7    70   700   7000
11   110  1100  11000
12   120  1200  1
12   120  1200  2
80   11   110   11000
15   50   555   5555
103  7    8     70
15   55   500   5555

Next, run the test:

make test PKG=./pkg/sql/logictest TESTS="TestLogic/fakedist-disk/distinct_failure"

The test fails with the following information:

    expected success, but found
    (53100) in-mem temp storage: disk budget exceeded: 1048576 bytes requested, 104857600 currently allocated, 104857600 bytes in budget
    resource.go:57: in NewBudgetExceededError()
logic.go:2444: 
     pq: in-mem temp storage: disk budget exceeded: 1048576 bytes requested, 104857600 currently allocated, 104857600 bytes in budget

Expected behavior
The test should pass.

Additional data / screenshots

Environment:

  • CockroachDB version v22.1.0-alpha
  • Server OS: Linux/Ubuntu 20.04 LTS
  • logic tests

Additional context
Found issue when testing #74303

Jira issue: CRDB-12148

@msirek msirek added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 6, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jan 6, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Jan 6, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jan 7, 2022
@yuzefovich yuzefovich removed X-blathers-untriaged blathers was unable to find an owner T-sql-queries SQL Queries Team labels Jan 7, 2022
@yuzefovich
Copy link
Member

I saw this flake on my one of PRs now that #74303 has been merged. We need to fix the flake somehow.

msirek pushed a commit to msirek/cockroach that referenced this issue Apr 8, 2022
This commit comments out a flaky test which will be fixed by cockroachdb#74554.

Release note: none
craig bot pushed a commit that referenced this issue Apr 9, 2022
79667: logictest: Ignore test flake in disjunction_in_join r=msirek a=msirek

This commit comments out a flaky test which will be fixed by #74554.

Release note: none

79669: dev: address paper cuts and improve `dev <cmd> -h` consistency r=irfansharif a=irfansharif

- Disable test sharding under --rewrite; sharded package test runners
  can trample over one another when updating data files
- Generate cgo files with naked `dev gen`
- Remove the `dev gen go+docs` target but retain the fast path when both
  the `go` and `docs` targets are specified (`dev gen go docs` for e.g.)
- Remove the `all` and `all_tests` moniker for `dev test`, and retain
  the sanitized path when testing everything using `dev test pkg/...`.
  The latter syntax feels closer to the `go test` expansion.
- Edit some help text padding to wrap more cleanly in terminals

Release note: None

79689: ci: build `cockroach-sql` binary r=jlinder a=rail

Previously, the `cockroach-sql` binary was added to the source code, but
has never been added as a compilation target in our CI.

This patch enables building `cockroach-sql` as a part of CI.

Release note: None

79690: changefeedccl: Make scan request size configurable. r=miretskiy a=miretskiy

Add a `changefeed.backfill.scan_request_size` setting to control
scan request size during backfill.  The default is maintained
at 16MB.  However, some latency sensitive environments may choose
to lower this setting, while increasing scan parallelism to
ensure that the latches are held for shorter periods of time.

Release Notes: Add a `changefeed.backfill.scan_request_size` setting
to control scan request size during backfill.

Relese Justification: Low danger stability and performance improvement.

Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@craig craig bot closed this as completed in a24275c Apr 18, 2022
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants