kvcoord: catchup scan quota acquisition #105058

Closed
miretskiy opened this issue Jun 16, 2023 · 1 comment · Fixed by #105083
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

miretskiy commented Jun 16, 2023

Observed on a customer test cluster:


goroutine 1271105048 [select, 319 minutes]:
github.com/cockroachdb/cockroach/pkg/util/quotapool.(*AbstractPool).Acquire(0xc0aab5f340, {0x7281950, 0xc0c85c7320}, {0x7255b30, 0xc0afa1f690})
	github.com/cockroachdb/cockroach/pkg/util/quotapool/quotapool.go:281 +0x75c
github.com/cockroachdb/cockroach/pkg/util/quotapool.(*IntPool).acquireMaybeWait(0xc0b9f430f8, {0x7281950, 0xc0c85c7320}, 0x1, 0x1)
	github.com/cockroachdb/cockroach/pkg/util/quotapool/intpool.go:178 +0x13f
github.com/cockroachdb/cockroach/pkg/util/quotapool.(*IntPool).Acquire(...)
	github.com/cockroachdb/cockroach/pkg/util/quotapool/intpool.go:147
github.com/cockroachdb/cockroach/pkg/util/limit.(*ConcurrentRequestLimiter).Begin(0xc0b9f430e0, {0x72818a8, 0xc0b4d9b800})
	github.com/cockroachdb/cockroach/pkg/util/limit/limiter.go:58 +0x22a
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.acquireCatchupScanQuota({0x72818a8, 0xc0b4d9b800}, 0xc0008f1900, 0xc0b9f430e0)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go:574 +0x99
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*rangefeedMuxer).startSingleRangeFeed(0xc05b1e18f0, {0x72818a8, 0xc0b4d9b800}, {{0xc008327890, 0x16, 0x18}, {0xc0083278d8, 0x16, 0x18}}, {0x1768f5473b4aa152, ...}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:213 +0x159
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.divideSpanOnRangeBoundaries({0x72818a8, 0xc0b4d9b800}, 0x18?, {{0xc008327890, 0x16, 0x18}, {0xc0083278d8, 0x16, 0x18}}, {0x1768f5473b4aa152, ...}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go:394 +0x42f
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.divideAllSpansOnRangeBoundaries.func1.1({0x72818a8, 0xc0b4d9b800})
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go:243 +0x157
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
	github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:72 +0xa5

Determine if there is an issue with muxrangefeed catchup scan quota management.

Jira issue: CRDB-28843

@miretskiy miretskiy added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Jun 16, 2023
blathers-crl bot commented Jun 16, 2023

cc @cockroachdb/replication

craig bot pushed a commit that referenced this issue Jun 20, 2023
104401: kvserver: add TestCreateManyUnappliedProbes r=pavelkalinnikov a=tbg

This is the test used for #102953.

This PR also makes modest progress on #75729, not by making log application
stand-alone, but by making it somewhat less convoluted to manufacture log
entries programmatically. It would be desirable to be able to test the
replication layer in CRDB in the same way that Raft allows via the
InteractionEnv[^1].

[^1]: https://github.com/etcd-io/raft/blob/6bf4f7fe3122b064e0a0d76314298dca6f379fc7/interaction_test.go

Epic: none
Release note: none


105083: kvcoord: Release catchup reservation before re-acquire attempt r=miretskiy a=miretskiy

Release the catchup scan reservation before attempting to re-acquire it. Failing to do so could leave a mux rangefeed stuck when enough ranges encounter an error (such as a range split) before receiving their first checkpoint event, because each such error triggers another attempt to acquire catchup scan quota.

Fixes #105058

Release note (bug fix): Fix a bug in the mux rangefeed implementation that could cause a mux rangefeed to become stuck if enough ranges encounter certain errors concurrently.

Co-authored-by: Tobias Grieger
Co-authored-by: Yevgeniy Miretskiy
@craig craig bot closed this as completed in be09f10 Jun 20, 2023
blathers-crl bot pushed a commit that referenced this issue Jun 20, 2023
Release the catchup scan reservation before attempting to re-acquire
it. Failing to do so could leave a mux rangefeed stuck when enough
ranges encounter an error at the same time (which can happen if,
e.g., a node gets restarted).

Fixes #105058

Release note (bug fix): Fix a bug in the mux rangefeed implementation that
could cause a mux rangefeed to become stuck if enough ranges encounter
certain errors concurrently.
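
The failure mode described in the commit message can be illustrated with a small, self-contained Go sketch. This is not the CockroachDB code: `limiter`, `restartFeed`, and `stuck` are hypothetical names, and a buffered channel stands in for the real concurrency limiter. It only models the ordering problem: when every slot is held by a feed that must restart, trying to acquire a new slot before releasing the old one can never succeed, while releasing first always makes progress.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// limiter is a hypothetical stand-in for a concurrency limiter:
// a buffered channel used as a counting semaphore.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// acquire blocks until a slot is free or ctx is done.
func (l limiter) acquire(ctx context.Context) error {
	select {
	case l <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release gives one held slot back to the limiter.
func (l limiter) release() { <-l }

// restartFeed models one feed restarting after an error while it
// still holds a reservation from its previous attempt.
func restartFeed(ctx context.Context, l limiter, releaseFirst bool) error {
	if releaseFirst {
		// Ordering described by the fix: give the old slot back, then re-acquire.
		l.release()
		return l.acquire(ctx)
	}
	// Problematic ordering: the old slot is returned only after a new
	// one is obtained, which cannot happen once all slots are held.
	if err := l.acquire(ctx); err != nil {
		return err
	}
	l.release()
	return nil
}

// stuck reports whether restarting every feed of a fully subscribed
// limiter runs into the timeout instead of completing.
func stuck(limit int, releaseFirst bool) bool {
	l := newLimiter(limit)
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	// Every slot is taken by a feed that has not yet seen a checkpoint.
	for i := 0; i < limit; i++ {
		if err := l.acquire(ctx); err != nil {
			return true
		}
	}
	// All feeds hit an error and restart.
	for i := 0; i < limit; i++ {
		if err := restartFeed(ctx, l, releaseFirst); err != nil {
			return errors.Is(err, context.DeadlineExceeded)
		}
	}
	return false
}
```

In this model, `stuck(2, false)` blocks until the timeout fires, whereas `stuck(2, true)` completes immediately. The goroutine in the trace above had no such timeout, which would be consistent with it waiting in `select` for 319 minutes.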