From 0b97630d2b60a197424f77e0c748ceb62fe09758 Mon Sep 17 00:00:00 2001
From: Alexey Zatelepin <ztlpn@redpanda.com>
Date: Tue, 23 Apr 2024 12:52:19 +0200
Subject: [PATCH] c/controller_backend: try to force-abort reconfiguration only
 on leaders

Previously, when force-aborting a reconfiguration, we appended an
aborting configuration on all replicas. This can lead to log
inconsistencies because on followers the configuration is duplicated
(once from the follower's own append, once replicated by the leader).
Although such inconsistencies are expected for force-abort, if the
leader is alive we can reduce the chance of their appearance by having
followers wait for the aborting config to be replicated from the
leader instead of appending it themselves.

Fixes https://github.com/redpanda-data/redpanda/issues/17847

(cherry picked from commit 8e221d36888652b0d79b81925473e4bb80da2351)
---
 src/v/cluster/controller_backend.cc | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/v/cluster/controller_backend.cc b/src/v/cluster/controller_backend.cc
index 6676a2092af3..84beb921b722 100644
--- a/src/v/cluster/controller_backend.cc
+++ b/src/v/cluster/controller_backend.cc
@@ -1777,12 +1777,20 @@ ss::future<std::error_code> controller_backend::force_abort_replica_set_update(
         }
         co_return errc::waiting_for_recovery;
     } else {
+        auto current_leader = partition->get_leader_id();
+        if (current_leader && current_leader != _self) {
+            // The leader is alive and we are a follower. Wait for the leader to
+            // replicate the aborting configuration, but don't append it
+            // ourselves to minimize the chance of log inconsistency.
+            co_return errc::not_leader;
+        }
+
         auto ec = co_await partition->force_abort_replica_set_update(rev);
 
         if (ec) {
             co_return ec;
         }
-        auto current_leader = partition->get_leader_id();
+        current_leader = partition->get_leader_id();
         if (!current_leader.has_value() || current_leader == _self) {
             co_return check_configuration_update(
               _self, partition, replicas, rev);