Commit
k/group_manager: return not_coordinator quickly in tx operations
group_manager::attached_partition::catchup_lock can be blocked for extended periods of time, for example in the following scenario:

1. The consumer_offsets partition leader gets isolated.
2. Some group operation acquires a read lock and tries to replicate a batch to the consumer_offsets partition. This operation hangs indefinitely.
3. The consumer_offsets leader steps down.
4. Group state cleanup is triggered, tries to acquire a write lock, and hangs until (2) finishes.

Meanwhile, clients trying to perform any tx group operation get coordinator_load_in_progress errors and blindly retry, without even trying to find the real coordinator.

Check for leadership without the read lock first to prevent that. This is basically a "double-check" pattern, as we have to check a second time under the lock.
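For illustration only, the double-check described above could be sketched with the standard library as follows. This is not the code from this commit: `partition_state`, `begin_tx_operation`, and the `error_code` enum are hypothetical names, `std::shared_mutex` stands in for the real catchup_lock, and a non-blocking try-lock is assumed as a stand-in for a lock acquisition that cannot currently succeed.

```cpp
#include <atomic>
#include <mutex>
#include <shared_mutex>

// Hypothetical error codes, named after the errors mentioned in the message.
enum class error_code { none, not_coordinator, coordinator_load_in_progress };

// Hypothetical stand-in for an attached partition; not the real type.
struct partition_state {
    std::atomic<bool> is_leader{true};
    std::shared_mutex catchup_lock;
};

// Double-checked leadership test around the read lock.
error_code begin_tx_operation(partition_state& p) {
    // First check, without the lock: if we are not the leader, fail fast
    // so that the client goes looking for the real coordinator.
    if (!p.is_leader.load()) {
        return error_code::not_coordinator;
    }

    // Assumption: a try-lock models "the read lock is unavailable because
    // cleanup holds or is waiting for the write lock".
    std::shared_lock<std::shared_mutex> read_lock(p.catchup_lock, std::try_to_lock);
    if (!read_lock.owns_lock()) {
        return error_code::coordinator_load_in_progress;
    }

    // Second check, under the lock: leadership may have changed between
    // the first check and the lock acquisition.
    if (!p.is_leader.load()) {
        return error_code::not_coordinator;
    }

    // The tx group operation itself would run here, under the read lock.
    return error_code::none;
}
```

In this sketch, a caller that reaches a node which has already stepped down gets not_coordinator even while another thread holds the write lock, because the first check never touches the lock. The second check is still needed because the first one is only a hint.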