c/controller_backend: try to force-abort reconfiguration only on leaders
Previously, when force-aborting a reconfiguration, we appended an
aborting configuration on all replicas. This can lead to log inconsistencies
because on followers the configuration is duplicated (one copy from the
follower's own append, one replicated by the leader). Although these
inconsistencies are expected for force-abort, if the leader is alive we can
minimize the chance of their appearance by having followers wait for the
aborting config to be replicated from the leader instead of appending it
themselves.

Fixes #17847

(cherry picked from commit 8e221d3)
ztlpn committed Apr 24, 2024
1 parent 83f04be commit 0b97630
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion src/v/cluster/controller_backend.cc
@@ -1777,12 +1777,20 @@ ss::future<std::error_code> controller_backend::force_abort_replica_set_update(
 }
 co_return errc::waiting_for_recovery;
 } else {
+auto current_leader = partition->get_leader_id();
+if (current_leader && current_leader != _self) {
+// The leader is alive and we are a follower. Wait for the leader to
+// replicate the aborting configuration, but don't append it
+// ourselves to minimize the chance of log inconsistency.
+co_return errc::not_leader;
+}
+
 auto ec = co_await partition->force_abort_replica_set_update(rev);
 
 if (ec) {
 co_return ec;
 }
-auto current_leader = partition->get_leader_id();
+current_leader = partition->get_leader_id();
 if (!current_leader.has_value() || current_leader == _self) {
 co_return check_configuration_update(
 _self, partition, replicas, rev);
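
The guard added in this hunk can be read in isolation as a small decision function. The following is a minimal sketch under stated assumptions, not the code in controller_backend.cc: `node_id`, `abort_action`, and `decide_force_abort` are hypothetical names introduced only for illustration, and the real function also handles recovery and the post-append configuration check.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the node id type; not the actual Redpanda type.
using node_id = std::int32_t;

// Hypothetical result type describing what a replica does on force-abort.
enum class abort_action {
    wait_for_leader, // a different node is the known leader: do not append locally
    append_locally,  // we are the leader, or no leader is currently known
};

// Mirrors the condition `current_leader && current_leader != _self` from the
// diff: only a replica that sees some other node as leader skips the append.
constexpr abort_action
decide_force_abort(std::optional<node_id> current_leader, node_id self) {
    if (current_leader && *current_leader != self) {
        return abort_action::wait_for_leader;
    }
    return abort_action::append_locally;
}
```

Note that the no-leader case still falls through to the local append. This matches the commit message, which only avoids the duplicate append when the leader is alive; when no leader is known, every replica appends the aborting configuration as before.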