storage: use of finalized trace in TestStoreRangeRebalance in multi-cpu #2593

Closed
mrtracy opened this issue Sep 21, 2015 · 6 comments
Labels: C-test-failure Broken test (automatically or manually discovered).

Comments

@mrtracy
Contributor

mrtracy commented Sep 21, 2015

This occurred on a six-node cluster which was under constant insertion load; splits were occurring and ranges were being rebalanced. At some point the node which was receiving the insertion load crashed with this error; I was able to restart it.

Unfortunately I do not currently have a test which reproduces this; working on that.

Stack trace:

W0919 00:47:53.450129 9746 kv/dist_sender.go:569 failed to invoke *proto.MergeRequest ["\x00tsdcr.store.keycount.2\x00\x01\n\x01\f\x06\x1d\\",""): range 1 was not found
panic: use of finalized Trace:
Name Origin Ts Dur Desc File
c638073448831290.11 localhost:8082 00:47:53.449029 920.404µs node server/node.go:483
c638073448831290.11 localhost:8082 00:47:53.449049 884.354µs ·executing Batch storage/store.go:1198
c638073448831290.11 localhost:8082 00:47:53.449055 876.869µs ··read-write path storage/replica.go:584
c638073448831290.11 localhost:8082 00:47:53.449060 7.317µs ···command queue storage/replica.go:816
c638073448831290.11 localhost:8082 00:47:53.449072 859.503µs ···raft storage/replica.go:875
c638073448831290.11 localhost:8082 00:47:53.449940 0 ·error: *proto.RangeNotFoundError server/node.go:490


goroutine 85 [running]:
github.com/cockroachdb/cockroach/util/tracer.(*Trace).epoch(0xc82230e140, 0x4dac6d0, 0xe, 0x0)
/Users/matt/go/src/github.com/cockroachdb/cockroach/util/tracer/tracer.go:106 +0xda
github.com/cockroachdb/cockroach/util/tracer.(*Trace).Epoch(0xc82230e140, 0x4dac6d0, 0xe, 0x10)
/Users/matt/go/src/github.com/cockroachdb/cockroach/util/tracer/tracer.go:101 +0x4d
github.com/cockroachdb/cockroach/storage.(*Replica).processRaftCommand(0xc8203ea000, 0xc8209338c0, 0x10, 0x2499, 0x1, 0x200000002, 0x140547af3362b531, 0x0, 0x140547af335f193a, 0x6569eff0d6f26a13, ...)
/Users/matt/go/src/github.com/cockroachdb/cockroach/storage/replica.go:941 +0x256
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1()
/Users/matt/go/src/github.com/cockroachdb/cockroach/storage/store.go:1498 +0xa39
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc820072d80, 0xc82024ca30)
/Users/matt/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:88 +0x52
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
/Users/matt/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:89 +0x62
@tbg
Member

tbg commented Sep 21, 2015

what's on this line here? /Users/matt/go/src/github.com/cockroachdb/cockroach/storage/replica.go:941 +0x256

This error can occur if you have a panic (since the Finalize call is usually in a defer). It can also happen if the trace isn't properly duplicated when the ctx crosses goroutine boundaries. That is a bit unexpected, but the panic has been quite effective at making sure an unduplicated trace doesn't cross goroutines (it should not).
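
To make the failure mode concrete, here is a minimal sketch of a trace that panics when touched after finalization. The `Trace` type, its `Event` and `Finalize` methods, and `useAfterFinalize` are hypothetical stand-ins for illustration; they are not the actual `util/tracer` API.

```go
package main

import (
	"fmt"
	"sync"
)

// Trace is a hypothetical stand-in for the real tracer type.
type Trace struct {
	mu        sync.Mutex
	finalized bool
	events    []string
}

// Event records a step and panics if the trace was already finalized.
func (t *Trace) Event(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.finalized {
		panic("use of finalized Trace: " + name)
	}
	t.events = append(t.events, name)
}

// Finalize marks the trace as done; later Event calls panic.
func (t *Trace) Finalize() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.finalized = true
}

// useAfterFinalize replays the problematic order: the request handler
// finalizes the trace in a defer, and a late user touches it afterwards.
// It returns the recovered panic message.
func useAfterFinalize() (msg string) {
	t := &Trace{}
	handle := func() {
		defer t.Finalize()
		t.Event("executing Batch")
	}
	handle() // the client request returns and the trace is finalized

	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	t.Event("raft") // late use by whoever still holds the trace
	return ""
}

func main() {
	fmt.Println(useAfterFinalize())
}
```

In the sketch the late use is sequential; in the report above it comes from another goroutine that still holds a reference to the request's trace.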

@tbg
Member

tbg commented Oct 14, 2015

ok, I took another look. It looks like the request is aborted with a multiraft.ErrGroupDeleted which turns into a RangeNotFoundError, but a little later the request still pops up in processRaftCommand. This can happen if the group is deleted while the request is pending, in which case it's cancelled explicitly by MultiRaft, but may already have been picked up by the leader.

I also noticed that we're leaking pending commands: r.pendingCmds is written to when the command is being proposed, but it's deleted only in processRaftCommand (but we don't necessarily ever get there).

I think the latter problem can be solved by some refactoring which deals with pendingCmds and errChan simultaneously (i.e. delete from the former after reading from the latter). That change would also greatly reduce the chances of seeing the trace issue again, though only because we'd be "virtually certain" that the cleanup of the pending cmd would precede the eventual processing of the command, which would then run with a "dummy" context and not use the trace. But there isn't any guarantee and I don't see a straightforward solution.
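
A rough sketch of that refactoring idea, assuming a simplified replica: the waiter deletes its own entry from the pending map right after it reads a result, so an aborted command no longer leaks. All names here (`replica`, `propose`, `apply`, `demoCancelled`) are hypothetical and do not mirror the real `storage` package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type pendingCmd struct {
	done chan error // buffered so apply never blocks
}

// replica is a hypothetical, heavily simplified stand-in.
type replica struct {
	mu          sync.Mutex
	pendingCmds map[string]*pendingCmd
}

// propose registers the command and returns a wait function. The waiter
// removes its own entry once it has read a result from either channel.
func (r *replica) propose(id string, errChan <-chan error) func() error {
	cmd := &pendingCmd{done: make(chan error, 1)}
	r.mu.Lock()
	r.pendingCmds[id] = cmd
	r.mu.Unlock()
	return func() error {
		var err error
		select {
		case err = <-errChan: // aborted, e.g. group deleted
		case err = <-cmd.done: // applied normally
		}
		r.mu.Lock()
		delete(r.pendingCmds, id) // cleanup happens on every path
		r.mu.Unlock()
		return err
	}
}

// apply runs when a committed command is processed. It reports whether a
// client was still waiting; if not, the command would run with a dummy
// context and never touch the client's trace.
func (r *replica) apply(id string) bool {
	r.mu.Lock()
	cmd, ok := r.pendingCmds[id]
	delete(r.pendingCmds, id)
	r.mu.Unlock()
	if ok {
		cmd.done <- nil
	}
	return ok
}

// demoCancelled aborts a command first and applies it afterwards.
func demoCancelled() (errText string, pending int, hadWaiter bool) {
	r := &replica{pendingCmds: map[string]*pendingCmd{}}
	errChan := make(chan error, 1)
	wait := r.propose("cmd-1", errChan)
	errChan <- errors.New("range not found")
	err := wait()
	return err.Error(), len(r.pendingCmds), r.apply("cmd-1")
}

func main() {
	fmt.Println(demoCancelled())
}
```

As noted above, this only narrows the window: `apply` can still find the entry between the waiter's channel read and its delete, so the sketch carries no guarantee either.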

tbg added a commit to tbg/cockroach that referenced this issue Oct 30, 2015
as observed in cockroachdb#2593, the following could happen:

* client proposes command to raft, waits
* command commits, but hasn't reached client yet
* group gets removed, triggering the waiting client
* client decides the request is done, leaves
* committed command executes
* r.pendingCmds still holds the command, so the contained trace
  panics (use-after-finalize).

The issue here is that `Replica`, `Store` and `Raft` are
moving parts that can't fully move in lockstep with the removal
(the store provides the Raft storage, so you can't remove the
group under the lock).

Instead, this change first "quiesces" the Replica, after which
it can be dropped from Raft, and, finally, from `Store`.
tbg added a commit to tbg/cockroach that referenced this issue Nov 2, 2015
@tbg
Member

tbg commented Nov 2, 2015

should be fixed with #2977. If it reoccurs, we'll investigate anew.

tbg closed this as completed Nov 2, 2015
@tbg
Member

tbg commented Nov 3, 2015

Alas. Investigating anew.

tbg reopened this Nov 3, 2015
tbg assigned mrtracy and unassigned tbg Nov 4, 2015
@tbg
Member

tbg commented Nov 4, 2015

@mrtracy you got this one, right?

@mrtracy
Contributor Author

mrtracy commented Nov 5, 2015

Yes, working on it

tamird changed the title from "panic: use of finalized trace" to "storage: panic: use of finalized trace in TestStoreRangeRebalance in multi-cpu" Nov 5, 2015
tamird added the C-test-failure (Broken test, automatically or manually discovered) and multi-cpu labels Nov 5, 2015
tamird changed the title from "storage: panic: use of finalized trace in TestStoreRangeRebalance in multi-cpu" to "storage: use of finalized trace in TestStoreRangeRebalance in multi-cpu" Nov 5, 2015
mrtracy pushed a commit to mrtracy/cockroach that referenced this issue Nov 10, 2015
The issue arose as follows:

+ Incoming client request to a node creates a trace context.
+ The context is attached to a raft command which is proposed. The command is
added to the 'pending' map in multiraft before being proposed. The client
request will be answered once the proposed command is committed and applied.
+ Concurrently, another raft command changes the configuration of the range's
raft group, removing this node's replica. In existing code, all pending commands
on that node which target that replica are synchronously dismissed with an
error; the trace is therefore finalized.
+ However, while the replica has been removed from the group, the group itself
has not yet been removed from the node; the proposed command can actually commit,
it just commits after the configuration change.
+ When the committed change is applied, it attempts to use the trace, but the
trace has already been finalized.

The fix is to no longer abort pending commands on a replica just because that
replica has been removed from the group; it is not yet safe to immediately
abort pending requests, because they may actually complete. Instead, we do not
abort commands until the group itself is removed (by the range GC queue).
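
The difference between the old and new behavior can be sketched with a toy model. Everything below (`group`, `command`, `scenario`, and the `finalized` flag standing in for the trace) is hypothetical and only illustrates the ordering, not the multiraft code itself.

```go
package main

import (
	"errors"
	"fmt"
)

var errUseAfterFinalize = errors.New("use of finalized trace")

// command carries a flag that stands in for the request's trace state.
type command struct {
	finalized bool
}

// group is a hypothetical stand-in for a raft group's pending map.
type group struct {
	pending map[string]*command
}

func (g *group) propose(id string) *command {
	c := &command{}
	g.pending[id] = c
	return c
}

// abortPending fails every pending command; each waiting client then
// finalizes its trace.
func (g *group) abortPending() {
	for id, c := range g.pending {
		c.finalized = true
		delete(g.pending, id)
	}
}

// removeReplica models the config change that removes this node's replica.
// abortEarly=true is the old behavior, false is the behavior after the fix.
func (g *group) removeReplica(abortEarly bool) {
	if abortEarly {
		g.abortPending()
	}
}

// removeGroup models the later removal by the range GC queue, which is
// the only place that aborts pending commands after the fix.
func (g *group) removeGroup() { g.abortPending() }

// apply processes a committed command and fails if its trace is finalized.
func (g *group) apply(id string, c *command) error {
	if c.finalized {
		return errUseAfterFinalize
	}
	delete(g.pending, id)
	c.finalized = true // client is answered and finalizes normally
	return nil
}

// scenario replays: propose, replica removed, command applies, group removed.
func scenario(abortEarly bool) error {
	g := &group{pending: map[string]*command{}}
	c := g.propose("cmd-1")
	g.removeReplica(abortEarly)
	err := g.apply("cmd-1", c)
	g.removeGroup()
	return err
}

func main() {
	fmt.Println("abort on replica removal:", scenario(true))
	fmt.Println("abort on group removal:  ", scenario(false))
}
```

With `abortEarly` set, the command is applied after its trace was finalized; without it, the command completes first and group removal finds nothing left to abort.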
mrtracy pushed a commit to mrtracy/cockroach that referenced this issue Nov 10, 2015
Replica.Quiesce() was added in an attempt to fix cockroachdb#2593; however, it did not fix
that issue, and was not necessary to fix it in the first place. This commit
removes Replica.Quiesce().
craig bot pushed a commit that referenced this issue Jan 30, 2024
117544: deps: upgrade Shopify/sarama v1.38.1 to IBM/sarama v1.42.1 r=rharding6373 a=wenyihu6

This patch updates sarama library to the latest version. Note that the ownership
of the sarama library has been transferred from Shopify to IBM.

Fixes: #117522
Release note: none

Here is the list of commits between the two versions.
```
d88a48a chore: update CHANGELOG.md to v1.42.1 (#2711)
385b3b4 fix(config): relax ClientID validation after 1.0.0 (#2706)
3364ff0 chore(doc): add CODE_OF_CONDUCT.md
768496e chore(ci): bump actions/dependency-review-action from 3.1.0 to 3.1.1 (#2707)
27710af fix: make fetchInitialOffset use correct protocol (#2705)
a46917f chore(ci): bump actions/dependency-review-action from 2.5.1 to 3.1.0 (#2702)
4168f7c chore(ci): bump ossf/scorecard-action from 2.1.2 to 2.3.1 (#2703)
7155d51 chore(ci): add kafka 3.6.0 to FVT and versions
e0c3c62 fix(txmgr): ErrOffsetsLoadInProgress is retriable
2e077cf Fix default retention time value in offset commit (#2700)
f97ced2 Merge pull request #2678 from lzakharov/fix-data-race-in-async-produce
56d5044 fix: data race in Broker.AsyncProduce
a15034a Fix data race on Broker.done channel (#2698)
82f0e48 Asynchronously close brokers during a RefreshBrokers (#2693)
ee1944b chore(ci): bump github/codeql-action from 2.22.4 to 2.22.5 (#2695)
1d4de95 chore(ci): bump actions/upload-artifact from 3.1.0 to 3.1.3 (#2696)
3ca69a8 chore(doc): add OpenSSF Scorecard badge (#2691)
d2023bf feat(test): add a simple fuzzing example (#2039)
b8b29e1 chore(doc): add OpenSSF badge (#2690)
d38f08c fix(ci): always run CodeQL on every commit (#2689)
c5815ae chore(ci): bump github/codeql-action from 2.2.4 to 2.22.4 (#2686)
3a893f5 Merge pull request #2688 from IBM/dnwe/security-dot-md
7ae18cb fix(ci): ignore markdown changes for dep review
3b0f32e fix(doc): add SECURITY.md for vuln reporting
40ec971 chore(ci): bump actions/checkout from 3.1.0 to 4.1.1 (#2687)
25137dc chore(ci): add Dependency Review Actions
8ce03ed chore(ci): add golangci-lint and gitleaks checks
9658e0e chore(ci): ensure GH actions are pinned by hash
3d56b4c chore(ci): ensure gh permissions are explicit
8892f3f chore(ci): add dependabot to /examples tree
05af18e chore(ci): ossf scorecard.yml (#2683)
c42b2e0 fix(client): ignore empty Metadata responses when refreshing (#2672)
6678dd1 chore(deps): bump the golang-org-x group with 1 update (#2671)
24f1249 fix: pre-compile regex for parsing kafka version (#2663)
64d2044 fix(docs): correct topic name in rebalancing strategy example (#2657)
44f6db5 chore(deps): bump the golang-org-x group with 2 updates (#2661)
e16473b chore(ci): bump docker/setup-buildx-action from 2 to 3 (#2653)
98ec384 fix: use least loaded broker to refresh metadata
4b55bb3 perf: Alloc records in batch (#2646)
05cb9fa fix(consumer): don't retry session if ctx canceled
0b17025 chore(deps): bump the golang-org-x group with 1 update (#2641)
9e75986 chore(ci): bump actions/checkout from 3 to 4
9b0419d fix(consumer): guard against nil client
f3c4194 fix: typo
87229d9 fix: add retry logic to AlterUserScramCredentials
ae5eee5 fix(client): force Event Hubs to use V1_0_0_0 (#2633)
dedd86d fix: make clear that error is configuration issue not server error (#2628)
a4eafb4 chore(proto): doc CreateTopics/JoinGroup fields
503ade3 fix: add paragraph break to fix markdown render
ffaa252 fix(gh): correct issue template comments
78c7b63 chore(gh): add new style issue templates
c7e6bca chore(ci): ignore .md-only changes
09395f6 chore(docs): remove gopkg.in link
261043a chore(ci): add workflow_dispatch to stale
bbf6ee4 chore(ci): improve stale behaviour
b1bf950 chore(docs): add 1.41.0 to CHANGELOG.md
2b4ba74 chore(lint): bump golangci-lint and tweak config
9282d75 fix(doc): add missing doc for mock consumer
e9bd1b8 fix(proto): handle V3 member metadata and empty owned partitions
96c37d1 fix(docs): use go install for fetching tools
5cd9fa6 fix: flaky TestFuncOffsetManager
1bcf2d9 feat(fvt): test wider range of kafkas
827ec18 fix(fvt): reduce minimum compression-ratio metric
d44ebdc fix(fvt): fresh metrics registry for each test
2b54832 fix(fvt): ensure correct version in consumer tests
270f507 chore(fvt): tweak to work across more versions
b4e0554 feat(ci): experiment with tcpdump during FVT
0bb3316 fix(fvt): versioned cfg for invalid topic producer
d4dc7bc fix(examples): sync exactly_once and consumergroup
913b18f fix(fvt): Metadata version in ensureFullyReplicated
8681621 fix(fvt): handle msgset vs batchset
26792a3 feat(fvt): add healthcheck, depends_on and --wait
d2dba29 feat(gzip): switch to klauspost/compress gzip (#2600)
f8daee4 chore(deps): bump github.com/eapache/go-resiliency from 1.3.0 to 1.4.0 (#2598)
868ed33 Merge pull request #2595 from IBM/dnwe/close
b0363d1 fix(test): ensure some more clients are closed
31a8693 fix(fvt): disable keepalive on toxiproxy client
45313c3 fix(test): add missing closes to admin client tests (#2594)
a5b6e6a Merge pull request #2593 from IBM/dnwe/toxiproxy
0b9db06 chore(ci): implement toxiproxy client
4d8bb31 chore(ci): replace toxiproxy client dep
3d7b37f feat(fvt): experiment with per-kafka-version image
bd81a11 chore(fvt): tidyup broker await
8d0df91 chore(deps): bump module github.com/klauspost/compress to v1.16.7
6ff3567 chore(test): fix a couple of leaks
f033fc7 chore(deps): bump github.com/eapache/go-xerial-snappy digest to c322873
0409ed9 chore(deps): bump module github.com/jcmturner/gokrb5/v8 to v8.4.4
9dc4305 chore(deps): bump module github.com/pierrec/lz4/v4 to v4.1.18
108e264 chore(fvt): roll some tests back to DefaultVersion
f4e6453 chore(test): use modern protocol versions in FVT
991b2b0 chore(test): speedup some slow tests
fa7db9a chore(ci): pre-build FVT docker image
e31a540 chore(ci): use latest Go in actions (#2580)
500399c chore(test): ensure all mockresponses use version
43eae9b feat: add new error for MockDeleteTopicsResponse (#2475)
4cde6b3 chore(config): make DefaultVersion V2_1_0_0 (#2574)
8a09ef3 Merge pull request #2575 from IBM/dnwe/mockbroker
f4f435c chore(test): add verbose logger for unittests
03368ff chore(test): ensure MockBroker closed within test
e8808a6 chore(proto): match HeartbeatResponse version (#2576)
be809f9 chore(ci): remove manual go cache
a3024e7 chore(test): add V2_1_0_0 ApiVersions
00741ec feat(proto): add support for TxnOffsetCommit V2
0c39b9f feat(proto): add support for ListOffsetsRequest V4
765bfa3 chore(config): make DefaultVersion V2_0_0_0
bb864d7 fix(test): shutdown MockBroker
1c9ebab Merge pull request #2570 from IBM/dnwe/proto
d9cb01e fix(proto): use full ranges for remaining proto
826ef81 fix(proto): use full range of Txn protocols
09c8186 fix: avoid logging value of proxy.Dialer
76ca69a feat(proto): support for Metadata V6-V10
10dd922 fix(proto): use full range of ListGroupsRequest
4175433 fix(proto): use full range of SyncGroupRequest
29487f1 bug: implement unsigned modulus for partitioning with crc32 hashing
f35d212 fix: a rebalance isn't triggered after adding new partitions
4659dd0 chore(deps): bump the golang-org-x group with 1 update (#2561)
d8d9e73 Merge pull request #2558 from IBM/dnwe/proto
e4bf4df fix(proto): tweak LeaveGroupRequest requiredVersion
24b54f6 fix(proto): use full range of OffsetFetchRequest
57969b4 fix(proto): use full range of HeartbeatRequest
3d1c345 fix(proto): use full range of FindCoordinator
cf96776 chore: add .pre-commit-config
02d1209 chore(ci): tidyup and improve actions workflows
09ced0b fix(proto): use full range of MetadataRequest
8c40629 fix(proto): use up to V3 of OffsetRequest
cdf36d5 fix(proto): use range of OffsetCommitRequest vers
1a8a3ed fix(consumer): support JoinGroup V4
40b52c5 fix(consumer): use full range of FetchRequest vers
5530d61 fix(example): check if msg channel is closed
68312a5 chore(test): add V2_0_0_0 ApiVersions
c10bd1e fix(test): test timing error
82a6d57 fix(proto): correct JoinGroup usage for wider version range
23d4561 chore(proto): permit DeleteGroupsRequest V1
6010af0 chore(proto): permit AlterConfigsRequest V1
c32ffd1 chore(proto): permit CreatePartitionsRequest V1
1532d9f fix(proto): ensure req+resp requiredVersion match (#2548)
2a5f0f6 chore: add 1.40.1 to CHANGELOG.md
4cce955 Fix printing of final metrics
1d8f80e fix(test): use correct v7 mock ProduceResponse
973a9b7 fix(producer): use newer ProduceReq as appropriate
fb761f2 feat: support up to v4 of the ListGroups API (#2541)
017083e fix(consumer): use newer LeaveGroup as appropriate (#2544)
ce1ac25 Merge pull request #2538 from IBM/dnwe/is-valid-version
a9126ad fix(proto): use DeleteRecordsRequest v1
b8cc2b1 feat(proto): add test around supported versions
ee2872c fix(admin): remove group member needs >= 2.4.0
3b82606 fix(proto): correct consumer metadata shim
c240c67 fix(proto): extend txn types for identical versions
40fa609 fix(proto): ensure req+resp requiredVersion match
fa37d61 fix(proto): use DescribeLogDirsRequest v1
3dfbf99 feat(proto): add isValidVersion to protocol types
02c5de3 chore(deps): bump the golang-org-x group with 1 update (#2542)
6d094b8 Merge the two CONTRIBUTING.md's (#2543)
bbee916 chore: rollup fvt kafka to latest three (#2537)
87209f8 Merge pull request #2536 from hindessm/mrh/sleep-when-throttled
102513a test: add throttling support test
5ac5dc0 feat: support for sleeping when throttled
7d7ac52 Implement resolve_canonical_bootstrap_servers_only (#2156)
f07b129 Merge pull request #2533 from hindessm/mrh/extend-throttling-metric-scope
b678d34 chore(typo): trivial typo
e18c6cf feat: refactor throttle metrics to handle more responses
2d7ccb8 feat: add throttleTime function for responses with time.Duration fields
fa93017 feat: add throttleTime function for responses with int32 ms fields
ae24dbf chore(typo): fix field documentation typo
aa72f59 fix: avoiding burning cpu if all partitions are paused (#2532)
34bc8f9 fix: correct unsupported version check (#2528)
c28ecc0 fix(fvt): ensure fully-replicated at test start (#2531)
849c8b1 fix(test): allow testing of skipped test without IsTransactional panic (#2525)
ecf43f4 fix: concurrent issue on updateMetaDataMs (#2522)
e07f521 Merge pull request #2520 from hindessm/mrh/admin-retry-logic
aad8cf3 fix: add retry logic to ListPartitionReassignments
66ef5a9 fix: add retry logic to DescribeCluster
c7ce32f fix: add retry logic to DescribeTopics
80899bf Added support for Kerberos authentication with a credentials cache (KRB5_CCACHE_AUTH). (#2457)
735f33b feat(consumer): use buffer pools for decompression (#2484)
669d2bc fix: comments for PauseAll and ResumeAll (#2523)
63ff8d1 Merge pull request #2519 from hindessm/mrh/fix-retry-logic-again
df58534 fix: use safer condition
3ba807b fix: admin retry logic
08ff0ff Merge pull request #2517 from hindessm/mrh/fix-some-retry-issues
39c18fc fix: off-by-one errors in attempts remaining logging
7888004 fix: admin retry logic
6ecdb50 Merge pull request #2514 from hindessm/main
f6ccc6f fix(test): ubi-minimal seems to be missing zoneinfo files
b53dbe9 fix(tools): remove default duplication from help
d439508 chore(docs): fix iotuil.Discard reference
441b083 chore(typos): random typos spotted while browsing code
12c24a8 chore(deps): bump github.com/stretchr/testify from 1.8.1 to 1.8.3 (#2512)
55ea700 chore(deps): bump github.com/klauspost/compress from 1.15.14 to 1.16.6 (#2513)
2420fcd chore(deps): bump the golang-org-x group with 2 updates
cc72418 chore(ci): bump actions/setup-go from 3 to 4 (#2508)
dcf5196 chore(ci): remove empty scope from dependabot.yml
fb81408 Merge pull request #2504 from EladLeev/golangci-cleanup
5185d46 chore(ci): tweak scope in dependabot.yml
492d4f9 chore(ci): remove scope from dependabot.yml
eb52957 chore(ci): add dependabot.yml
da48ff2 chore(ci): add depguard config
3654162 chore: add 1.40.0 to CHANGELOG.md
0863085 chore(ci): bump golangci, remove deprecated linters
848522f chore(ci): add simple apidiff workflow
ee207f8 chore(ci): fix stale action params
3f22fd3 chore(ci): migrate probot-stale to actions/stale
4b9e8f6 fix: restore (*OffsetCommitRequest) AddBlock func
c2cab9d chore: migrate module to github.com/IBM/sarama
fd35e17 chore: bytes.Equal instead bytes.Compare (#2485)
fd21bd2 chore(ci): remove Shopify/shopify-cla-action (#2489)
9127f1c fix: data race in balance strategy (#2453)
2f8dcd0 fix(mock consumer): HighWaterMarkOffset (#2447)
7dbf0b5 fix: use version 4 of DescribeGroupsRequest only if kafka broker version is >= 2.4
1015b4f chore(deps): bump golang.org/x/net from 0.5.0 to 0.7.0 (#2452)
397cee4 fix: simplify some balance_strategy.go logic
d8bcfcc chore: refresh CHANGELOG.md from github-releases
40329aa chore: add kafka 3.3.2 (#2434)
0b15695 fix(metrics): fix race condition when calling Broker.Open() twice (#2428)
66e60c7 fix(consumer): don't retry FindCoordinator forever (#2427)
```

117678: roachtest: add elastic backup equivalent test for aws r=sumeerbhola a=aadityasondhi

Informs #107770.

Release note: None

Co-authored-by: Wenyi Hu <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>