Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
84324: sqlsmith: make order-dependent aggregation functions deterministic r=msirek,mgartner a=michae2

**cmd: add smith command**

Add a new top-level command `smith` which dumps randomly-generated
sqlsmith queries. This is useful for testing modifications to sqlsmith.

Assists: #83024

Release note: None

**sqlsmith: make order-dependent aggregation functions deterministic**

Some aggregation functions (e.g. string_agg) have results that depend
on the order of input rows. To make sqlsmith more deterministic, add
ORDER BY clauses to these aggregation functions whenever their argument
is a column reference. (When their argument is a constant, ordering
doesn't matter.)

Fixes: #83024

Release note: None

84398: colflow: prevent deadlocks when many queries spill to disk at same time r=yuzefovich a=yuzefovich

**colflow: prevent deadlocks when many queries spill to disk at same time**

This commit fixes a long-standing issue which could cause
memory-intensive queries to deadlock on acquiring the file descriptors
quota when vectorized execution spills to disk. This bug has been
present since the introduction of disk-spilling (over two and a half
years ago, introduced in #45318 and partially mitigated in #45892), but
we haven't seen this in any user reports, only in `tpch_concurrency`
roachtest runs, so the severity seems pretty minor.

Consider the following query plan:
```
   Node 1                   Node 2

TableReader              TableReader
    |                         |
HashRouter                HashRouter
    |     \  ___________ /    |
    |      \/__________       |
    |      /           \      |
HashAggregator         HashAggregator
```
and let's imagine that each hash aggregator has to spill to disk. This
would require acquiring the file descriptors quota. Now, imagine that
because of that hash aggregators' spilling, each of the hash routers has
slow outputs causing them to spill too. As a result, this query plan can
require `A + 2 * R` number of FDs of a single node to succeed where `A`
is the quota for a single hash aggregator (equal to 16 - with the
default value of `COCKROACH_VEC_MAX_OPEN_FDS` environment variable which
is 256) and `R` is the quota for a single router output (2). This means
that we can estimate that 20 FDs from each node are needed for the query
to finish execution with 16 FDs being acquired first.

Now imagine that this query is run with concurrency of 16. We can end up
in such a situation that all hash aggregators have spilled, fully
exhausting the global node limit on each node, so whenever the hash
router outputs need to spill, they block forever since no FDs will ever
be released, until a query is canceled or a node is shutdown. In other
words, we have a deadlock.

This commit fixes this situation by introducing a retry mechanism to
exponentially backoff when trying to acquire the FD quota, until a time
out. The randomizations provided by the `retry` package should be
sufficient so that some of the queries succeed while others result in
an error.

Unfortunately, I don't see a way to prevent this deadlock from occurring
in the first place without possible increase in latency in some case.
The difficult thing is that we currently acquire FDs only once we need
them, meaning once a particular component spills to disk. We could
acquire the maximum number of FDs that a query might need up-front,
before the query execution starts, but that could lead to starvation of
the queries that ultimately won't spill to disk. This seems like a much
worse impact than receiving timeout errors on some analytical queries
when run with high concurrency. We're not an OLAP database, so this
behavior seems ok.

Fixes: #80290.

Release note (bug fix): Previously, CockroachDB could deadlock when
evaluating analytical queries f multiple queries had to spill to disk
at the same time. This is now fixed by making some of the queries error
out instead.

**roachtest: remove some debugging printouts in tpch_concurrency**

This was added to track down the deadlock fixed in the previous commit,
so we no longer need it.

Release note: None

84430: sql/schemachanger/scplan: allow plan to move to NonRevertible earlier r=ajwerner a=ajwerner

This is critical for DROP COLUMN. In `DROP COLUMN` we cannot perform the
primary index swap until the dropping column reaches `DELETE_ONLY`, which is
not revertible. The primary index swap itself is revertible. Given this fact,
we need a mechanism to be able to "promote" revertible operations (operations
which don't destroy information or cause irrevocable visible side effects) to
be grouped with or after non-revertible operations. This commit makes that
happen naturally.

Release note: None

84433: rowexec: increase the batch size for join reader ordering strategy r=yuzefovich a=yuzefovich

This commit increases the default value of
`sql.distsql.join_reader_ordering_strategy.batch_size` cluster setting
(which determines the input row batch size for the lookup joins when
ordering needs to be maintained) from 10KiB to 100KiB since the previous
number is likely to have been too conservative. We have seen support
cases (https://github.com/cockroachlabs/support/issues/1627) where
bumping up this setting was needed to get reasonable performance, we
also change this setting to 100KiB in our TPC-E setup
(https://github.com/cockroachlabs/tpc-e/blob/d47d3ea5ce71ecb1be5e0e466a8aa7c94af63d17/tier-a/src/schema.rs#L404).

I did some historical digging, and I believe that the number 10KiB was
chosen somewhat arbitrarily with no real justification in #48058. That
PR changed how we measure the input row batches from the number of rows
to the memory footprint of the input rows. Prior to that change we had
100 rows limit, so my guess that 10KiB was somewhat dependent on that
number.

The reason we don't want this batch size to be too large is that we
buffer all looked up rows in a disk-backed container, so if too many
responses come back (because we included too many input rows in the
batch), that container has to spill to disk. To make sure we don't
regress in such scenarios this commit teaches the join reader to lower
the batch size bytes limit if the container does spill to disk, until
10 KiB which is treated as the minimum.

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
  • Loading branch information
4 people committed Jul 14, 2022
5 parents 2546b2a + f118f5d + 6189bb3 + 20acd12 + e9f1c05 commit e8818d5
Show file tree
Hide file tree
Showing 84 changed files with 1,120 additions and 621 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,7 @@
/pkg/cmd/skip-test/ @cockroachdb/test-eng
/pkg/cmd/skiperrs/ @cockroachdb/sql-experience
/pkg/cmd/skipped-tests/ @cockroachdb/test-eng
/pkg/cmd/smith/ @cockroachdb/sql-queries
/pkg/cmd/smithcmp/ @cockroachdb/sql-queries
/pkg/cmd/smithtest/ @cockroachdb/sql-queries
/pkg/cmd/teamcity-trigger/ @cockroachdb/dev-inf
Expand Down
3 changes: 3 additions & 0 deletions pkg/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -860,6 +860,8 @@ GO_TARGETS = [
"//pkg/cmd/skiperrs:skiperrs_lib",
"//pkg/cmd/skipped-tests:skipped-tests",
"//pkg/cmd/skipped-tests:skipped-tests_lib",
"//pkg/cmd/smith:smith",
"//pkg/cmd/smith:smith_lib",
"//pkg/cmd/smithcmp:smithcmp",
"//pkg/cmd/smithcmp:smithcmp_lib",
"//pkg/cmd/smithtest:smithtest",
Expand Down Expand Up @@ -2107,6 +2109,7 @@ GET_X_DATA_TARGETS = [
"//pkg/cmd/skip-test:get_x_data",
"//pkg/cmd/skiperrs:get_x_data",
"//pkg/cmd/skipped-tests:get_x_data",
"//pkg/cmd/smith:get_x_data",
"//pkg/cmd/smithcmp:get_x_data",
"//pkg/cmd/smithtest:get_x_data",
"//pkg/cmd/teamcity-trigger:get_x_data",
Expand Down
79 changes: 29 additions & 50 deletions pkg/ccl/schemachangerccl/testdata/end_to_end/create_index
Original file line number Diff line number Diff line change
Expand Up @@ -285,7 +285,7 @@ notified job registry to adopt jobs: [1]
begin transaction #2
commit transaction #2
begin transaction #3
## PostCommitPhase stage 1 of 7 with 3 MutationType ops
## PostCommitPhase stage 1 of 6 with 3 MutationType ops
upsert descriptor #104
...
- ABSENT
Expand Down Expand Up @@ -314,14 +314,14 @@ upsert descriptor #104
unexposedParentSchemaId: 101
- version: "2"
+ version: "3"
update progress of schema change job #1: "PostCommitPhase stage 2 of 7 with 1 BackfillType op pending"
update progress of schema change job #1: "PostCommitPhase stage 2 of 6 with 1 BackfillType op pending"
commit transaction #3
begin transaction #4
## PostCommitPhase stage 2 of 7 with 1 BackfillType op
## PostCommitPhase stage 2 of 6 with 1 BackfillType op
backfill indexes [2] from index #1 in table #104
commit transaction #4
begin transaction #5
## PostCommitPhase stage 3 of 7 with 3 MutationType ops
## PostCommitPhase stage 3 of 6 with 3 MutationType ops
upsert descriptor #104
...
- PUBLIC
Expand Down Expand Up @@ -350,10 +350,10 @@ upsert descriptor #104
unexposedParentSchemaId: 101
- version: "3"
+ version: "4"
update progress of schema change job #1: "PostCommitPhase stage 4 of 7 with 1 MutationType op pending"
update progress of schema change job #1: "PostCommitPhase stage 4 of 6 with 1 MutationType op pending"
commit transaction #5
begin transaction #6
## PostCommitPhase stage 4 of 7 with 3 MutationType ops
## PostCommitPhase stage 4 of 6 with 3 MutationType ops
upsert descriptor #104
...
- PUBLIC
Expand Down Expand Up @@ -382,29 +382,35 @@ upsert descriptor #104
unexposedParentSchemaId: 101
- version: "4"
+ version: "5"
update progress of schema change job #1: "PostCommitPhase stage 5 of 7 with 1 BackfillType op pending"
update progress of schema change job #1: "PostCommitPhase stage 5 of 6 with 1 BackfillType op pending"
commit transaction #6
begin transaction #7
## PostCommitPhase stage 5 of 7 with 1 BackfillType op
## PostCommitPhase stage 5 of 6 with 1 BackfillType op
merge temporary indexes [3] into backfilled indexes [2] in table #104
commit transaction #7
begin transaction #8
## PostCommitPhase stage 6 of 7 with 1 ValidationType op
## PostCommitPhase stage 6 of 6 with 1 ValidationType op
validate forward indexes [2] in table #104
commit transaction #8
begin transaction #9
## PostCommitPhase stage 7 of 7 with 4 MutationType ops
## PostCommitNonRevertiblePhase stage 1 of 2 with 5 MutationType ops
upsert descriptor #104
...
- PUBLIC
- PUBLIC
- - MERGE_ONLY
- - ABSENT
- PUBLIC
- - WRITE_ONLY
- PUBLIC
- PUBLIC
+ - TRANSIENT_DELETE_ONLY
- PUBLIC
- PUBLIC
+ - PUBLIC
+ - PUBLIC
- WRITE_ONLY
- PUBLIC
jobId: "1"
relevantStatements:
...
BY LIST (id) (PARTITION p1 VALUES IN (1))
statementTag: CREATE INDEX
Expand Down Expand Up @@ -448,7 +454,8 @@ upsert descriptor #104
+ version: 4
+ modificationTime: {}
mutations:
- direction: ADD
- - direction: ADD
+ - direction: DROP
index:
- constraintId: 2
- createdExplicitly: true
Expand Down Expand Up @@ -485,35 +492,6 @@ upsert descriptor #104
- index:
constraintId: 3
createdExplicitly: true
...
time: {}
unexposedParentSchemaId: 101
- version: "5"
+ version: "6"
update progress of schema change job #1: "PostCommitNonRevertiblePhase stage 1 of 2 with 1 MutationType op pending"
set schema change job #1 to non-cancellable
commit transaction #9
begin transaction #10
## PostCommitNonRevertiblePhase stage 1 of 2 with 3 MutationType ops
upsert descriptor #104
...
- PUBLIC
- PUBLIC
- - WRITE_ONLY
+ - TRANSIENT_DELETE_ONLY
- PUBLIC
- PUBLIC
...
- money
version: 4
- modificationTime:
- wallTime: "1640995200000000009"
+ modificationTime: {}
mutations:
- - direction: ADD
+ - direction: DROP
index:
constraintId: 3
...
version: 4
mutationId: 1
Expand All @@ -524,11 +502,12 @@ upsert descriptor #104
...
time: {}
unexposedParentSchemaId: 101
- version: "6"
+ version: "7"
- version: "5"
+ version: "6"
update progress of schema change job #1: "PostCommitNonRevertiblePhase stage 2 of 2 with 2 MutationType ops pending"
commit transaction #10
begin transaction #11
set schema change job #1 to non-cancellable
commit transaction #9
begin transaction #10
## PostCommitNonRevertiblePhase stage 2 of 2 with 4 MutationType ops
upsert descriptor #104
...
Expand Down Expand Up @@ -694,7 +673,7 @@ upsert descriptor #104
- money
version: 4
- modificationTime:
- wallTime: "1640995200000000010"
- wallTime: "1640995200000000009"
- mutations:
- - direction: DROP
- index:
Expand Down Expand Up @@ -737,12 +716,12 @@ upsert descriptor #104
...
time: {}
unexposedParentSchemaId: 101
- version: "7"
+ version: "8"
- version: "6"
+ version: "7"
write *eventpb.FinishSchemaChange to event log for descriptor 104
create job #2 (non-cancelable: true): "GC for "
descriptor IDs: [104]
update progress of schema change job #1: "all stages completed"
commit transaction #11
commit transaction #10
notified job registry to adopt jobs: [2]
# end PostCommitPhase
3 changes: 3 additions & 0 deletions pkg/cmd/dev/build.go
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,9 @@ var buildTargetMapping = map[string]string{
"roachprod-stress": "//pkg/cmd/roachprod-stress:roachprod-stress",
"roachtest": "//pkg/cmd/roachtest:roachtest",
"short": "//pkg/cmd/cockroach-short:cockroach-short",
"smith": "//pkg/cmd/smith:smith",
"smithcmp": "//pkg/cmd/smithcmp:smithcmp",
"smithtest": "//pkg/cmd/smithtest:smithtest",
"staticcheck": "@co_honnef_go_tools//cmd/staticcheck:staticcheck",
"stress": stressTarget,
"swagger": "@com_github_go_swagger_go_swagger//cmd/swagger:swagger",
Expand Down
2 changes: 1 addition & 1 deletion pkg/cmd/roachtest/tests/query_comparison_util.go
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,7 @@ func runOneRoundQueryComparison(

// Initialize a smither that generates only deterministic SELECT statements.
smither, err := sqlsmith.NewSmither(conn, rnd,
sqlsmith.DisableMutations(), sqlsmith.DisableImpureFns(), sqlsmith.DisableLimits(),
sqlsmith.DisableMutations(), sqlsmith.DisableNondeterministicFns(), sqlsmith.DisableLimits(),
sqlsmith.UnlikelyConstantPredicate(), sqlsmith.FavorCommonData(),
sqlsmith.UnlikelyRandomNulls(), sqlsmith.DisableCrossJoins(),
sqlsmith.DisableIndexHints(), sqlsmith.DisableWith(),
Expand Down
2 changes: 1 addition & 1 deletion pkg/cmd/roachtest/tests/tlp.go
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ func runOneTLP(

// Initialize a smither that will never generate mutations.
tlpSmither, err := sqlsmith.NewSmither(conn, rnd,
sqlsmith.DisableMutations(), sqlsmith.DisableImpureFns())
sqlsmith.DisableMutations(), sqlsmith.DisableNondeterministicFns())
if err != nil {
t.Fatal(err)
}
Expand Down
17 changes: 0 additions & 17 deletions pkg/cmd/roachtest/tests/tpch_concurrency.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ package tests
import (
"context"
"fmt"
"strings"
"time"

"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster"
Expand Down Expand Up @@ -107,22 +106,6 @@ func registerTPCHConcurrency(r registry.Registry) {
// Run each query once on each connection.
for queryNum := 1; queryNum <= tpch.NumQueries; queryNum++ {
t.Status("running Q", queryNum)
// To aid during the debugging later, we'll print the DistSQL
// diagram of the query.
rows, err := conn.Query("EXPLAIN (DISTSQL) " + tpch.QueriesByNumber[queryNum])
if err != nil {
return err
}
defer rows.Close()
for rows.Next() {
var line string
if err = rows.Scan(&line); err != nil {
t.Fatal(err)
}
if strings.Contains(line, "Diagram:") {
t.Status(line)
}
}
// The way --max-ops flag works is as follows: the global ops
// counter is incremented **after** each worker completes a
// single operation, so it is possible for all connections start
Expand Down
22 changes: 22 additions & 0 deletions pkg/cmd/smith/BUILD.bazel
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
load("//build/bazelutil/unused_checker:unused.bzl", "get_x_data")
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

go_library(
name = "smith_lib",
srcs = ["main.go"],
importpath = "github.com/cockroachdb/cockroach/pkg/cmd/smith",
visibility = ["//visibility:private"],
deps = [
"//pkg/internal/sqlsmith",
"//pkg/util/randutil",
"@com_github_cockroachdb_errors//:errors",
],
)

go_binary(
name = "smith",
embed = [":smith_lib"],
visibility = ["//visibility:public"],
)

get_x_data(name = "get_x_data")
Loading

0 comments on commit e8818d5

Please sign in to comment.