Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
81576: multitenantccl: fix rare deadlock in the tenant cost controller r=irfansharif a=stevendanna

This fixes a rare deadlock in the main loop of the tenant cost
controller. At the heart of the cause of the deadlock are the facts
that (1) we use 2 state variables (`requestInProgress` and
`requestRequiresRetry`) to control whether to send a new token bucket
request and (2) in the case of a quiescing stopper the main loop
itself writes to a response channel that only it is reading.

The loop logic is only correct if we have at most 1 outstanding bucket
response. It attempts enforce this invariant with the
`requestInProgress` variable. However, it also triggers a bucket
request based on the value of `requestNeedsRetry`:

https://github.com/cockroachdb/cockroach/blob/92947c29c55ff909f50a5e625811d34a1bbe71f7/pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go#L532-L535

Unfortunately, at least one sequence of events (see cockroachdb#81575 for a more
complete theory) can lead to `requestNeedsRetry` and
`requestInProgress` both being true, triggering a new request while
another still has a pending response.

When this happens in the face of a quiescing stopper, the end result
can be the main loop being blocked attempting to send the second
response to itself.

https://github.com/cockroachdb/cockroach/blob/92947c29c55ff909f50a5e625811d34a1bbe71f7/pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go#L408

This small change ensures that we set requestNeedsRetry to false when
setting requestInProgress to true, ensuring that we won't attempt
another send when we already potentially have one outstanding.

Note that we can probably make this more robust by always doing a
non-blocking send from sendBucketResponse in the failure branch and
checking the stopper before the rest of the channels used in that loop,
but I've opted for the smaller change for now.

Fixes cockroachdb#81575

Release note: None

Co-authored-by: Steven Danna <[email protected]>
  • Loading branch information
craig[bot] and stevendanna committed May 25, 2022
2 parents 9024015 + e35fdae commit 709881d
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
Original file line number Diff line number Diff line change
Expand Up @@ -436,6 +436,8 @@ func (c *tenantSideCostController) sendTokenBucketRequest(ctx context.Context) {
c.run.lastRequestTime = c.run.now
// TODO(radu): in case of an error, we undercount some consumption.
c.run.lastReportedConsumption = c.run.consumption

c.run.requestNeedsRetry = false
c.run.requestInProgress = true

ctx, _ = c.stopper.WithCancelOnQuiesce(ctx)
Expand Down Expand Up @@ -586,7 +588,6 @@ func (c *tenantSideCostController) mainLoop(ctx context.Context) {
c.run.fallbackRateStart = time.Time{}
}
if c.run.requestNeedsRetry || c.shouldReportConsumption() {
c.run.requestNeedsRetry = false
c.sendTokenBucketRequest(ctx)
}
if c.testInstr != nil {
Expand Down

0 comments on commit 709881d

Please sign in to comment.