
Fixed Raft voter priority override with single replica topics #10800

Merged

Conversation


@mmaslankaprv mmaslankaprv commented May 16, 2023

The Redpanda Raft implementation exposes an API that allows overriding a voter's
priority. This is used by the drain manager when a node is in
maintenance mode. In the current implementation, when the only voter is in
maintenance mode the Raft group is unable to elect a leader because the
reported priority is too low (the priority override in maintenance is set to 0).

Fixed the Raft implementation to make sure it prioritizes
availability over the user's priority preference: if a node is the only
voter, the priority override is ignored.
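
Roughly, the idea of the fix can be sketched as follows (illustrative Python only; the `effective_priority` helper and its names are hypothetical, not Redpanda's actual C++ Raft code):

```python
def effective_priority(self_id, voters, base_priority, override=None):
    """Priority this node advertises when requesting votes.

    voters:   ids of the voting replicas in the group configuration
    override: priority set by the drain manager (0 in maintenance mode)
    """
    only_voter = len(voters) == 1 and voters[0] == self_id
    if override is not None and not only_voter:
        return override
    # Sole voter: honouring an override of 0 would leave the group without a
    # leader, so availability wins over the user priority preference.
    return base_priority


assert effective_priority(0, [0], 100, override=0) == 100       # single replica: override ignored
assert effective_priority(0, [0, 1, 2], 100, override=0) == 0   # other voters exist: override honoured
```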

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Bug Fixes

  • Fixed the inability to elect a leader when the only voter is in maintenance mode

ztlpn previously approved these changes May 16, 2023
tests/rptest/tests/maintenance_test.py (outdated)

target = random.choice(self.redpanda.nodes)

self._enable_maintenance(target)
Contributor

I wonder what will happen to the maintenance status in this case? Presumably the operator will wait for the node to become fully drained before rebooting it, and this will fail because leaders of single-replica topics have nowhere to move.

Member Author

The drain manager doesn't care about leaders of single-replica partitions. It is the same as the case where maintenance mode is enabled and the node wasn't restarted: single-replica partition leaders stay in place.

@bharathv
Contributor

Patch looks fine, but I'm wondering how the only replica ended up leaderless in the first place; don't we have checks against it? The sequence of actions in drain_manager is:

  1. block_new_leadership -- override voter priority to 0
  2. transfer_leadership -- find another replica to be the leader.

(2) always returns an error, no?

@mmaslankaprv
Member Author

> Patch looks fine, but I'm wondering how the only replica ended up leaderless in the first place; don't we have checks against it? The sequence of actions in drain_manager is:
>
>   1. block_new_leadership -- override voter priority to 0
>   2. transfer_leadership -- find another replica to be the leader.
>
> (2) always returns an error, no?

Node restart is critical in this case
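
A rough sketch of the sequence described in this exchange (hypothetical names, not the real `drain_manager` API): leadership is blocked first, then a transfer is attempted, and a single-replica partition simply has no other replica to hand leadership to:

```python
class NoOtherReplicaError(Exception):
    pass


class PartitionStub:
    """Toy stand-in for a partition being drained (not the real API)."""

    def __init__(self, self_id, replicas):
        self.self_id = self_id
        self.replicas = replicas
        self.priority_override = None

    def block_new_leadership(self):
        # step 1: override the local voter priority to 0
        self.priority_override = 0

    def transfer_leadership(self):
        # step 2: find another replica to take over leadership
        others = [r for r in self.replicas if r != self.self_id]
        if not others:
            raise NoOtherReplicaError("no other replica to transfer leadership to")
        return others[0]


p = PartitionStub(self_id=0, replicas=[0])
p.block_new_leadership()
try:
    p.transfer_leadership()
except NoOtherReplicaError:
    # Single-replica partition: leadership stays put. Before this fix, a
    # subsequent node restart left the group leaderless, because the sole
    # voter kept reporting the overridden priority of 0.
    pass
```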

@mmaslankaprv mmaslankaprv force-pushed the fix-single-replica-maintence-mode branch from 0bb9778 to ce6d675 on May 22, 2023 17:22
bharathv previously approved these changes May 22, 2023
The Redpanda Raft implementation exposes an API that allows overriding a voter's
priority. This is used by the drain manager when a node is in
maintenance mode. In the current implementation, when the only voter is in
maintenance mode the Raft group is unable to elect a leader because the
reported priority is too low (the priority override in maintenance is set to 0).

Fixed the Raft implementation to make sure it prioritizes
availability over the user's priority preference: if a node is the only
voter, the priority override is ignored.

Fixes: redpanda-data/cloudv2#6174

Signed-off-by: Michal Maslanka <[email protected]>
Since the leader is now elected earlier, there is a race condition in
updating the health report when a single node starts. Made the timeout
longer to allow the `feature_manager` to retry activating the cluster
version.

Signed-off-by: Michal Maslanka <[email protected]>
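
The test-side change amounts to waiting longer; a hedged sketch using ducktape's `wait_until` (the helper and predicate below are illustrative, not the actual test code):

```python
from ducktape.utils.util import wait_until


def wait_for_cluster_version(get_active_version, expected, timeout_sec=60):
    # Poll until feature_manager has retried and activated the expected
    # cluster version; the longer timeout tolerates the race with the first
    # health report on single-node startup.
    wait_until(lambda: get_active_version() == expected,
               timeout_sec=timeout_sec,
               backoff_sec=1,
               err_msg=f"cluster version {expected} was not activated in time")
```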
@mmaslankaprv mmaslankaprv force-pushed the fix-single-replica-maintence-mode branch from 241108f to 5ce6ef8 on June 16, 2023 14:11
@mmaslankaprv
Member Author

CI failure: #11454

@mmaslankaprv mmaslankaprv merged commit 432cee6 into redpanda-data:dev Jun 20, 2023
@mmaslankaprv mmaslankaprv deleted the fix-single-replica-maintence-mode branch June 20, 2023 08:07
@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

/backport v22.3.x

@vbotbuildovich
Collaborator

/backport v22.2.x

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-10800-v22.2.x-206 remotes/upstream/v22.2.x
git cherry-pick -x 1b21f8372646133d1e0bf1015a5fd5ec255e4abe 04191fb403b35121ded7ac3ecc75b10df96c1813 5ce6ef842fed8c207a901c9b76d21668f4b5d15b

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-10800-v22.3.x-205 remotes/upstream/v22.3.x
git cherry-pick -x 1b21f8372646133d1e0bf1015a5fd5ec255e4abe 04191fb403b35121ded7ac3ecc75b10df96c1813 5ce6ef842fed8c207a901c9b76d21668f4b5d15b

Workflow run logs.
