Commit
81576: multitenantccl: fix rare deadlock in the tenant cost controller r=irfansharif a=stevendanna

This fixes a rare deadlock in the main loop of the tenant cost controller.

Two facts are at the heart of the deadlock: (1) we use two state variables (`requestInProgress` and `requestNeedsRetry`) to control whether to send a new token bucket request, and (2) when the stopper is quiescing, the main loop itself writes to a response channel that only it reads.

The loop logic is only correct if we have at most one outstanding bucket response. It attempts to enforce this invariant with the `requestInProgress` variable. However, it also triggers a bucket request based on the value of `requestNeedsRetry`:

https://github.com/cockroachdb/cockroach/blob/92947c29c55ff909f50a5e625811d34a1bbe71f7/pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go#L532-L535

Unfortunately, at least one sequence of events (see cockroachdb#81575 for a more complete theory) can lead to `requestNeedsRetry` and `requestInProgress` both being true, which triggers a new request while another still has a pending response. When this happens while the stopper is quiescing, the main loop can end up blocked trying to send the second response to itself:

https://github.com/cockroachdb/cockroach/blob/92947c29c55ff909f50a5e625811d34a1bbe71f7/pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go#L408

This small change sets `requestNeedsRetry` to false whenever we set `requestInProgress` to true, so that we won't attempt another send while one is potentially still outstanding.

Note that we can probably make this more robust by always doing a non-blocking send from `sendBucketResponse` in the failure branch and by checking the stopper before the rest of the channels used in that loop, but I've opted for the smaller change for now.

Fixes cockroachdb#81575

Release note: None

Co-authored-by: Steven Danna <[email protected]>
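
The failure mode described in this commit message can be illustrated with a small, self-contained Go model. This is only a sketch and not the code in `tenant_side.go`: the `controller` type, the `sendRequest`, `tick`, and `simulate` functions, the exact send condition, and the response channel's capacity of 1 are all assumptions made for illustration. Only the flag names `requestInProgress` and `requestNeedsRetry` come from the message. To keep the example from hanging, the self-send uses a non-blocking `select` and reports whether a blocking send would have stalled the loop.

```go
package main

import "fmt"

// controller is a hypothetical, heavily simplified stand-in for the
// cost controller's main-loop state.
type controller struct {
	requestInProgress bool
	requestNeedsRetry bool
	// responseChan is read only by the main loop. The capacity of 1 is an
	// assumption that matches "at most one outstanding bucket response".
	responseChan chan struct{}
}

// sendRequest models a request that fails right away because the stopper is
// quiescing, so the loop writes the failure response to its own channel.
// It returns true if a blocking send would have stalled the loop.
func (c *controller) sendRequest(clearRetry bool) (wouldBlock bool) {
	c.requestInProgress = true
	if clearRetry {
		// The change described above: starting a request cancels any
		// pending retry.
		c.requestNeedsRetry = false
	}
	select {
	case c.responseChan <- struct{}{}:
		return false
	default:
		// Buffer already holds an unread response; the only reader is
		// the goroutine that is trying to send.
		return true
	}
}

// tick models one loop iteration (assumed condition): send when idle, or
// when a retry was requested.
func (c *controller) tick(clearRetry bool) (wouldBlock bool) {
	if !c.requestInProgress || c.requestNeedsRetry {
		return c.sendRequest(clearRetry)
	}
	return false
}

// simulate runs two iterations without reading the response in between and
// reports whether the loop would have blocked on a send to itself.
func simulate(clearRetry bool) bool {
	c := &controller{responseChan: make(chan struct{}, 1)}
	c.requestNeedsRetry = true // an earlier failure asked for a retry
	if c.tick(clearRetry) {
		return true
	}
	return c.tick(clearRetry)
}

func main() {
	fmt.Println("without fix, loop would block:", simulate(false))
	fmt.Println("with fix, loop would block:", simulate(true))
}
```

In this model, the run without the fix reports `true` because the second iteration still sees `requestNeedsRetry` set and tries to send into a full channel, while the run with the fix reports `false` because no second send is attempted. The non-blocking `select` also hints at the more robust alternative mentioned in the message.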