node wise recovery #13943
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/38327#018afe5f-7762-4503-a150-c716cc166f9a
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/38327#018afe6e-270d-4d05-af8b-98baefa51760
GET with a request body is questionable in HTTP: "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request." Rather than trying to use methods to determine a dry run, what about using a query parameter? Also, does this move all partitions or only unhealthy ones?
Temporarily moving to draft while I polish and get tests passing with the upcoming revision.
Thanks, TIL. That's a bummer. I tend to agree with Elastic's sentiment on this; let me figure out a way to encode the list of broker IDs into a query parameter.
Any particular reason? In a way we are preserving existing RF so the user doesn't have to run a separate command to up-replicate again.
Both (in the upcoming rev). The nodes in the list are marked "defunct" from the RP perspective, and the balancer tries to move replicas away gracefully (for partitions that still have a majority) and forcefully reconfigures the rest (after user approval).
1adbc49 to ebcfc9e
941b306 to b49f996
Failures are unrelated (changed some RPC structs, need to update the compat test, will do it along with the next rev). Ready for review.
f75a059 to 54c6415
54c6415 to b05ad5e
Can only be submitted on the controller leader; it would be simpler to redirect rather than expecting the caller to submit to the controller leader.
Given an input set of dead nodes, generates a list of partition instances that have a majority of their replicas on these dead nodes. This is a read-only API and makes no changes to the metadata state.
Example: curl -X GET http://localhost:9644/v1/partitions/majority_lost\?defunct_nodes\=0,1
The input query parameter is a comma-separated list of node ID integers.
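The read-only majority-lost check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `partition_replicas`, `majority_on_defunct`, and `majority_lost` are hypothetical names, the topic table is reduced to a flat vector, and "majority" is assumed to mean strictly more than half of a partition's replicas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-ins for the real metadata types.
using node_id = int32_t;

struct partition_replicas {
    std::string ntp;               // e.g. "kafka/topic/0"
    std::vector<node_id> replicas; // nodes hosting a replica
};

// True when strictly more than half of the replicas sit on defunct nodes
// (assumed definition of "majority lost").
bool majority_on_defunct(
  const std::vector<node_id>& replicas,
  const std::unordered_set<node_id>& defunct) {
    std::size_t dead = 0;
    for (node_id n : replicas) {
        if (defunct.count(n) > 0) {
            ++dead;
        }
    }
    return dead * 2 > replicas.size();
}

// Read-only scan: returns the affected partitions without mutating anything.
std::vector<std::string> majority_lost(
  const std::vector<partition_replicas>& partitions,
  const std::unordered_set<node_id>& defunct) {
    std::vector<std::string> result;
    for (const auto& p : partitions) {
        if (majority_on_defunct(p.replicas, defunct)) {
            result.push_back(p.ntp);
        }
    }
    return result;
}
```

With defunct nodes {0, 1}, a partition on replicas {0, 1, 2} is reported, while one on {0, 2, 3} is not, since two of its three replicas are still available.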
A broker is in a functional state by default unless explicitly marked defunct by the user. The defunct state indicates that the broker is no longer reachable and its data is irrecoverable. This is currently irreversible. A defunct broker can only be decommissioned.
handle_ntp_move_begin_or_cancel
This ensures that replicas are drained from the unavailable nodes.
These ntps are force reconfigured as requested by the user. Introduces a new partition variant called 'force_reassignable_partition', which is dealt with in a separate pass.
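A rough sketch of how a partition might be sorted into the graceful and forced passes, given a set of defunct nodes. The enum and `classify` are hypothetical; only the name 'force_reassignable_partition' comes from the commit above, and the real balancer logic is likely more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using node_id = int32_t;

// Hypothetical labels for the outcome of inspecting one partition.
enum class recovery_action {
    none,           // no replica on a defunct node
    drain,          // majority still alive: move replicas away gracefully
    force_reassign, // majority lost: needs a forced reconfiguration
};

recovery_action classify(
  const std::vector<node_id>& replicas,
  const std::unordered_set<node_id>& defunct) {
    std::size_t dead = 0;
    for (node_id n : replicas) {
        if (defunct.count(n) > 0) {
            ++dead;
        }
    }
    if (dead == 0) {
        return recovery_action::none;
    }
    // Assumption: "majority lost" means more than half of the replicas are dead.
    if (dead * 2 > replicas.size()) {
        return recovery_action::force_reassign;
    }
    return recovery_action::drain;
}
```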
Adds additional validation on the input set of nodes.
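As an illustration of the kind of input validation meant here, the sketch below parses a comma-separated `defunct_nodes` value into node ids. The specific rules (reject empty input, empty or non-numeric tokens, negative ids, and duplicates) are assumptions, and `parse_defunct_nodes` is a hypothetical helper, not a function from this PR. It needs C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

using node_id = int32_t;

// Parses "0,1,2" into node ids. Returns std::nullopt on empty input, empty or
// non-numeric tokens, negative ids, or duplicates (assumed validation rules).
std::optional<std::vector<node_id>> parse_defunct_nodes(std::string_view csv) {
    if (csv.empty()) {
        return std::nullopt;
    }
    std::vector<node_id> ids;
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = csv.find(',', start);
        const std::size_t len = (comma == std::string_view::npos)
                                  ? std::string_view::npos
                                  : comma - start;
        const std::string_view token = csv.substr(start, len);
        node_id id = 0;
        const char* first = token.data();
        const char* last = first + token.size();
        const auto [ptr, ec] = std::from_chars(first, last, id);
        // The whole token must be consumed and the id must be non-negative.
        if (token.empty() || ec != std::errc{} || ptr != last || id < 0) {
            return std::nullopt;
        }
        if (std::find(ids.begin(), ids.end(), id) != ids.end()) {
            return std::nullopt;
        }
        ids.push_back(id);
        if (comma == std::string_view::npos) {
            break;
        }
        start = comma + 1;
    }
    return ids;
}
```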
A Document is also a ValueType; passing a ValueType ensures that we can validate any ValueType, including a subset of a JSON document (to be used in a later commit).
Includes the # of pending force reconfigurations and a sample of 10 ntps from that list in the balancer status page.
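A sketch of how the count and the capped sample could be derived from the pending list. `force_recovery_status` and `make_status` are hypothetical, and taking the first 10 entries as the sample is an assumption; the commit only states that the sample is limited to 10 ntps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status payload holding the count and a bounded sample.
struct force_recovery_status {
    std::size_t pending_count;
    std::vector<std::string> pending_sample;
};

constexpr std::size_t max_sample_size = 10;

force_recovery_status make_status(const std::vector<std::string>& pending) {
    // Never copy more than max_sample_size entries into the status.
    const std::size_t n = std::min(pending.size(), max_sample_size);
    return force_recovery_status{
      pending.size(),
      std::vector<std::string>(
        pending.begin(), pending.begin() + static_cast<std::ptrdiff_t>(n))};
}
```

Bounding the sample keeps the status response small even when thousands of partitions are pending, while the count still reports the full backlog.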
Reuses the feature flag from enhanced force reconfiguration.
b05ad5e to faf26ce
Rebased and force pushed to fix conflicts.
const auto& assignments = (it->second).get_assignments();
const auto topic_revision = it->second.get_revision();
for (const auto& assignment : assignments) {
    const auto& current = assignment.replicas;
Please add an it.check() here.
PRD: link
Given an input set of nodes, generates a move plan that force reconfigures all partitions that lost majority (or all replicas) due to the unavailability of these nodes. This is intended to be used as an escape hatch to bulk move all such majority lost partitions when nodes are dead and are irrecoverable.
Adds the following admin endpoints. Input for both is an array of node_ids.
GET /v1/partitions/majority_lost?defunct_nodes=<csv nodes>
- Given an input set of defunct nodes, generates a list of partitions that should be force recovered (aka lost majority).
POST /v1/partitions/force_recover_from_nodes
- Submits the plan generated from the above step; the partitions are eventually force recovered.
Example invocations
The partitions_to_force_recover part of the POST payload is the plan generated from GET. The plan is validated before generating the controller command that orchestrates the moves.
Monitoring: Balancer status (v1/cluster/partition_balancer/status) now contains the following:
partitions_pending_force_recovery_count
- # of partitions that are yet to be force recovered.
partitions_pending_force_recovery_sample
- A sample list of partitions pending force recovery (capped at 10).
Notes on defunct nodes:
force_replicas_from_nodes commands: all the brokers marked as "defunct" in the request are marked defunct for future processing, and this is irreversible for now.
Fixes: https://github.com/redpanda-data/planning/issues/84
Backports Required
Release Notes
Features