colflow: prevent deadlocks when many queries spill to disk at same time
This commit fixes a long-standing issue that could cause
memory-intensive queries to deadlock on acquiring the file descriptors
quota when vectorized execution spills to disk. The bug has been present
since disk spilling was introduced over two and a half years ago (in
cockroachdb#45318, partially mitigated in cockroachdb#45892), but we
haven't seen it in any user reports, only in `tpch_concurrency`
roachtest runs, so the severity seems pretty minor.

Consider the following query plan:
```
   Node 1                   Node 2

TableReader              TableReader
    |                         |
HashRouter                HashRouter
    |     \  ___________ /    |
    |      \/__________       |
    |      /           \      |
HashAggregator         HashAggregator
```
and let's imagine that each hash aggregator has to spill to disk. This
would require acquiring the file descriptors quota. Now, imagine that
because of the hash aggregators' spilling, each of the hash routers has
slow outputs, causing them to spill too. As a result, this query plan
can require `A + 2 * R` FDs on a single node to succeed, where `A` is
the quota for a single hash aggregator (16 with the default value of the
`COCKROACH_VEC_MAX_OPEN_FDS` environment variable, which is 256) and `R`
is the quota for a single router output (2). In other words, we estimate
that 20 FDs are needed on each node for the query to finish execution,
with the 16 FDs of the hash aggregator being acquired first.
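
As a quick sanity check of that estimate, here is a throwaway Go sketch;
the constants mirror the defaults quoted in this message, and none of
this is CockroachDB code:

```go
package main

import "fmt"

func main() {
	// Per-component FD quotas with the default COCKROACH_VEC_MAX_OPEN_FDS=256.
	const (
		aggregatorQuota   = 16 // A: FDs acquired by one spilled hash aggregator
		routerOutputQuota = 2  // R: FDs acquired by one spilled router output
	)
	// Each node runs one hash aggregator plus the two outputs of its hash
	// router (one feeding the local aggregator, one feeding the remote one).
	perNode := aggregatorQuota + 2*routerOutputQuota
	fmt.Println("FDs needed on each node for this plan:", perNode) // 20
}
```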

Now imagine that this query is run with a concurrency of 16. We can end
up in a situation where all hash aggregators have spilled, fully
exhausting the global limit on each node, so whenever the hash router
outputs need to spill, they block forever since no FDs will ever be
released until a query is canceled or a node is shut down. In other
words, we have a deadlock.
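
To make the deadlock condition concrete, a minimal sketch of the
per-node semaphore state under the same assumptions (16 concurrent
copies of the query, every hash aggregator already spilled, per-node
limit of 256); again purely illustrative, not CockroachDB code:

```go
package main

import "fmt"

func main() {
	const (
		nodeLimit       = 256 // default COCKROACH_VEC_MAX_OPEN_FDS
		concurrency     = 16  // concurrently running copies of the query
		aggregatorQuota = 16  // FDs already held by each spilled hash aggregator
		routerOutputFDs = 2   // FDs a router output still needs in order to spill
	)
	held := concurrency * aggregatorQuota
	free := nodeLimit - held
	fmt.Printf("held=%d free=%d\n", held, free) // held=256 free=0
	// Every query still needs more FDs for its router outputs, none can be
	// granted, and nothing is released until some query finishes: a deadlock.
	fmt.Println("router output can spill:", free >= routerOutputFDs) // false
}
```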

This commit fixes this situation by introducing a retry mechanism that
exponentially backs off when trying to acquire the FD quota, until it
times out. The randomization provided by the `retry` package should be
sufficient so that some of the queries succeed while others result in
an error.
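
For intuition, the retry parameters used in the diff below (100ms
initial backoff, 2x multiplier, 25% jitter, 10s cap) produce roughly the
following nominal schedule; the snippet only illustrates those
parameters and is not CockroachDB code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		initialBackoff = 100 * time.Millisecond
		multiplier     = 2.0
		maxBackoff     = 10 * time.Second
	)
	backoff := initialBackoff
	for attempt := 1; attempt <= 8; attempt++ {
		// Each wait is additionally jittered by +/-25% at runtime.
		fmt.Printf("attempt %d: wait ~%v\n", attempt, backoff)
		backoff = time.Duration(float64(backoff) * multiplier)
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
	// Prints 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, 6.4s, 10s.
}
```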

Unfortunately, I don't see a way to prevent this deadlock from occurring
in the first place without a possible increase in latency in some cases.
The difficulty is that we currently acquire FDs only once we need them,
i.e. once a particular component spills to disk. We could acquire the
maximum number of FDs that a query might need up front, before query
execution starts, but that could lead to starvation of the queries that
ultimately won't spill to disk. That seems like a much worse impact than
receiving timeout errors on some analytical queries when run with high
concurrency. We're not an OLAP database, so this behavior seems ok.

Release note (bug fix): Previously, CockroachDB could deadlock when
evaluating analytical queries if multiple queries had to spill to disk
at the same time. This is now fixed by making some of the queries error
out instead.
yuzefovich committed Jul 13, 2022
1 parent 750b231 commit 1c9a092
Showing 2 changed files with 35 additions and 5 deletions.
1 change: 1 addition & 0 deletions pkg/sql/colflow/BUILD.bazel
```diff
@@ -48,6 +48,7 @@ go_library(
         "//pkg/util/mon",
         "//pkg/util/optional",
         "//pkg/util/randutil",
+        "//pkg/util/retry",
         "//pkg/util/syncutil",
         "//pkg/util/timeutil",
         "//pkg/util/tracing",
```
39 changes: 34 additions & 5 deletions pkg/sql/colflow/vectorized_flow.go
```diff
@@ -47,6 +47,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/util/metric"
 	"github.com/cockroachdb/cockroach/pkg/util/mon"
 	"github.com/cockroachdb/cockroach/pkg/util/optional"
+	"github.com/cockroachdb/cockroach/pkg/util/retry"
 	"github.com/cockroachdb/cockroach/pkg/util/syncutil"
 	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
 	"github.com/cockroachdb/errors"
@@ -78,13 +79,41 @@ func newCountingSemaphore(sem semaphore.Semaphore, globalCount *metric.Gauge) *c
 	return s
 }
 
+var acquireTimeoutErr = errors.New(
+	"acquiring of file descriptors timed out, consider increasing " +
+		"COCKROACH_VEC_MAX_OPEN_FDS environment variable",
+)
+
 func (s *countingSemaphore) Acquire(ctx context.Context, n int) error {
-	if err := s.Semaphore.Acquire(ctx, n); err != nil {
-		return err
+	if s.TryAcquire(n) {
+		return nil
 	}
-	atomic.AddInt64(&s.count, int64(n))
-	s.globalCount.Inc(int64(n))
-	return nil
+	// Currently there is not enough capacity in the semaphore to acquire the
+	// desired number, so we set up a retry loop that exponentially backs off,
+	// until either the semaphore opens up or we time out (most likely due to a
+	// deadlock).
+	//
+	// The latter situation is possible when multiple queries already hold some
+	// of the quota and each of them needs more to proceed resulting in a
+	// deadlock. We get out of such a deadlock by randomly erroring out one of
+	// the queries (which would release some quota back to the semaphore) making
+	// it possible for other queries to proceed.
+	opts := retry.Options{
+		InitialBackoff:      100 * time.Millisecond,
+		Multiplier:          2.0,
+		RandomizationFactor: 0.25,
+		MaxBackoff:          10 * time.Second,
+	}
+	for r := retry.StartWithCtx(ctx, opts); r.Next(); {
+		if s.TryAcquire(n) {
+			return nil
+		}
+	}
+	if ctx.Err() != nil {
+		return ctx.Err()
+	}
+	log.Warning(ctx, "acquiring of file descriptors for disk-spilling timed out")
+	return acquireTimeoutErr
 }
 
 func (s *countingSemaphore) TryAcquire(n int) bool {
```
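
For readers who want to play with the pattern outside of CockroachDB,
here is a self-contained sketch of the same idea (retry a non-blocking
semaphore acquisition with jittered exponential backoff, and give up
after a deadline instead of blocking forever). It uses only the standard
library; the toy semaphore and all names are made up for the example and
are not part of the CockroachDB code base:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// toySemaphore stands in for the FD quota: a buffered channel with one slot
// per file descriptor; a held FD is represented by a token in the channel.
type toySemaphore chan struct{}

func (s toySemaphore) tryAcquire(n int) bool {
	for i := 0; i < n; i++ {
		select {
		case s <- struct{}{}:
		default:
			// Not enough capacity: return what we grabbed and report failure.
			for j := 0; j < i; j++ {
				<-s
			}
			return false
		}
	}
	return true
}

// acquireWithBackoff mimics the shape of the fix: try immediately, then retry
// with jittered exponential backoff, and error out once a deadline passes.
func acquireWithBackoff(s toySemaphore, n int, timeout time.Duration) error {
	if s.tryAcquire(n) {
		return nil
	}
	deadline := time.Now().Add(timeout)
	backoff := 100 * time.Millisecond
	for time.Now().Before(deadline) {
		// +/-25% jitter so that concurrent waiters do not retry in lockstep.
		jitter := 1 + (rand.Float64()-0.5)/2
		time.Sleep(time.Duration(float64(backoff) * jitter))
		if s.tryAcquire(n) {
			return nil
		}
		backoff *= 2
		if backoff > 10*time.Second {
			backoff = 10 * time.Second
		}
	}
	return errors.New("acquisition timed out, consider raising the quota")
}

func main() {
	sem := make(toySemaphore, 4)
	sem.tryAcquire(4) // another query already holds the entire quota
	fmt.Println(acquireWithBackoff(sem, 2, time.Second))
}
```

In the actual change, the backoff loop is driven by the `retry` package
shown in the diff above rather than hand-rolled sleeps.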
