Added migration of tx manager coordinator in recovery mode #15121
Force-pushed from bbe5625 to c36bac4
new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00da-046d-49ad-8532-e225a273895b:
new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-2242-4aaa-bc0c-a59cf82382cd:
new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-223d-476c-a7cc-992d72f9a7e3:
new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-223f-46b3-bb50-09ec92ea04bb:
new failures in https://buildkite.com/redpanda/redpanda/builds/41749#018c101b-f865-4d19-ac62-8a21acd01abb:
new failures in https://buildkite.com/redpanda/redpanda/builds/41753#018c10b2-31d5-4759-a037-b48ca6fd969e:
new failures in https://buildkite.com/redpanda/redpanda/builds/41753#018c10b2-31d1-4c5e-b3d7-1032baa57b2d:
new failures in https://buildkite.com/redpanda/redpanda/builds/41757#018c114b-48eb-4c8d-99ff-6e5fcd7c0d24:
new failures in https://buildkite.com/redpanda/redpanda/builds/41872#018c1705-ddbd-48db-98b6-b713fe9c2ad1:
new failures in https://buildkite.com/redpanda/redpanda/builds/42108#018c248a-7577-4009-bcd1-cf91f0779559:
new failures in https://buildkite.com/redpanda/redpanda/builds/42115#018c25f7-5c17-4b64-8710-8445ee583552:
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-223a-4e80-8017-97f3957dc780
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/42026#018c1f70-ff16-471d-b6b5-927208bffbab
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/42054#018c215e-f0a7-48f5-b1db-4ddf481f959b
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/42115#018c25f7-5c21-480a-a1e8-972ba5f7a986
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/42209#018c3451-c8c8-401c-b80a-10c9a98da1f4
Force-pushed from 0f6bbf6 to 77ff61a
Force-pushed from 77ff61a to 1e6dea7
looks great to me overall.
    break;
}
case migration_step::rehash_tx_manager_topic: {
    auto results = co_await ssx::parallel_transform(
nit: do we need to wrap it in an abortable timeout?
we already have a timeout in transform batches, maybe that is enough?
    != new_partition_count) {
    co_return errc::topic_invalid_partitions;
}
auto results = co_await ssx::parallel_transform(
nit: same comment as above, maybe an abortable timeout to avoid a hang?
Force-pushed from 1e6dea7 to f5a670d
Force-pushed from f5a670d to 5290680
When migrating the tx manager topic to a new partition count, one of the cluster nodes has to read all the data from the current tx manager topic and write it back to the new tx manager topic with the changed partition count. Added an RPC service definition that allows the migrator service to read and write tx manager topic partitions on remote nodes. Signed-off-by: Michal Maslanka <[email protected]>
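The read-and-write-back step described in this commit can be sketched in plain C++, outside of Seastar and the RPC layer. Everything here is an assumption made for illustration: `tx_record`, `partition_for` and `rehash` are hypothetical names, and `std::hash` stands in for whatever hash the real partitioner uses, which this PR page does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type: only the transactional id matters for routing.
struct tx_record {
    std::string tx_id;
    std::string payload;
};

// Assumption: a transactional id maps to a partition by hash modulo the
// partition count. std::hash is a stand-in, not the real hash function.
inline std::size_t
partition_for(const std::string& tx_id, std::size_t partition_count) {
    return std::hash<std::string>{}(tx_id) % partition_count;
}

// Read every record from every current partition and route it according
// to the new partition count. Records are copied, never dropped.
inline std::vector<std::vector<tx_record>> rehash(
  const std::vector<std::vector<tx_record>>& current,
  std::size_t new_partition_count) {
    if (new_partition_count == 0) {
        throw std::invalid_argument("partition count must be positive");
    }
    std::vector<std::vector<tx_record>> rehashed(new_partition_count);
    for (const auto& partition : current) {
        for (const auto& record : partition) {
            rehashed[partition_for(record.tx_id, new_partition_count)]
              .push_back(record);
        }
    }
    return rehashed;
}
```

In the sketch a single caller holds all partitions in memory; in the PR the reads and writes go through the new RPC service because the partitions may live on remote nodes.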
Previously all tx_cache instances on every node were initialized for all the partitions. This is not necessary, as not all partitions are instantiated on every shard. Changed the logic in the tm_cache manager to create a cache instance on demand. Signed-off-by: Michal Maslanka <[email protected]>
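A minimal sketch of the on-demand pattern, assuming a map keyed by partition id. `tx_cache_sketch` and `cache_manager_sketch` are hypothetical names, not the types touched by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical cache type; the real cache holds transaction state.
struct tx_cache_sketch {
    int partition;
};

class cache_manager_sketch {
public:
    // Returns the cache for a partition, creating it on first access
    // instead of eagerly creating one cache per partition up front.
    std::shared_ptr<tx_cache_sketch> get(int partition) {
        auto it = _caches.find(partition);
        if (it == _caches.end()) {
            it = _caches
                   .emplace(
                     partition,
                     std::make_shared<tx_cache_sketch>(
                       tx_cache_sketch{partition}))
                   .first;
        }
        return it->second;
    }

    // Number of caches that have actually been created.
    std::size_t size() const { return _caches.size(); }

private:
    std::unordered_map<int, std::shared_ptr<tx_cache_sketch>> _caches;
};
```

With this shape, a shard that never hosts a given partition never pays for its cache.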
As tm_stm owns the tm_stm_cache, it should clear it every time we start a new instance. This way we guarantee that the state is managed solely by the tm_stm instance that is currently running. For now we do not want to clear the state on shutdown, as we may start seeing UNKNOWN_SERVER_ERRORS caused by partition movements (right now a node can always ask others about the previous tx state). Signed-off-by: Michal Maslanka <[email protected]>
Force-pushed from 5290680 to 4aab45b
Force-pushed from 4aab45b to 9d0fca4
Signed-off-by: Michal Maslanka <[email protected]>
Signed-off-by: Michal Maslanka <[email protected]>
Sometimes it may be useful to execute topic deletion from a node which is not the controller leader. Added a `cluster::topics_frontend` API allowing topic deletion from any Redpanda node. Signed-off-by: Michal Maslanka <[email protected]>
Added a service handling tx manager topic migration (rehashing of tx manager stm updates to a new partition count). The service is a simple state machine that works in the following steps:
1) Create a temporary topic for the new partitioning scheme
2) For each partition of the current tx manager topic, read all the data, assign a new partition and replicate according to the new scheme
3) Delete the current tx manager topic
4) Create the tx manager topic with the new number of partitions
5) Replicate data from the temporary topic to the new tx manager topic
6) Delete the temporary topic
If the process is interrupted by a failure it may be retried, as the logic in the migrator service determines the starting condition and adjusts the initial step accordingly. Error handling is very simple: a failed operation must be retried. If any operation fails, the migration is always started from scratch to prevent leaving intermediate state behind.
Signed-off-by: Michal Maslanka <[email protected]>
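The six steps in this commit message can be sketched as a linear state machine that stops at the first failed step. Only `migration_step::rehash_tx_manager_topic` appears in the diff quoted in this PR; the other enumerator names, `next_step` and `run_migration` are assumptions made for illustration, and the synchronous callback replaces the coroutine-based code in the real service.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One enumerator per step of the commit message, in execution order.
// Only rehash_tx_manager_topic is a name taken from the PR diff.
enum class migration_step {
    create_temporary_topic,
    rehash_tx_manager_topic,
    delete_tx_manager_topic,
    create_tx_manager_topic,
    copy_temporary_topic,
    delete_temporary_topic,
    finished,
};

// Advance to the following step; finished is terminal.
inline migration_step next_step(migration_step s) {
    if (s == migration_step::finished) {
        return s;
    }
    return static_cast<migration_step>(static_cast<int>(s) + 1);
}

// Runs steps from `start` until finished. Returns false as soon as a
// step fails, leaving the retry decision to the caller.
inline bool run_migration(
  migration_step start,
  const std::function<bool(migration_step)>& execute) {
    for (auto step = start; step != migration_step::finished;
         step = next_step(step)) {
        if (!execute(step)) {
            return false;
        }
    }
    return true;
}
```

The `start` parameter is where a caller would plug in the starting condition it determined; how the real service derives that condition is not shown on this page.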
Since tx migration may only happen when Redpanda is in recovery mode, we instantiate the migration service only when recovery mode is enabled. Signed-off-by: Michal Maslanka <[email protected]>
Added handling of special REST APIs that are only enabled in recovery mode. The first API exposed in this way is an endpoint triggering tx manager migration. Signed-off-by: Michal Maslanka <[email protected]>
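One way to picture recovery-mode-only endpoints is a router that registers the extra route only when the flag is set. This is a sketch under stated assumptions: `admin_router_sketch`, both paths and the integer status codes are hypothetical and do not reflect the actual admin server API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

class admin_router_sketch {
public:
    explicit admin_router_sketch(bool recovery_mode) {
        // Hypothetical route that is always available.
        _routes["/hypothetical/status"] = [] { return 200; };
        if (recovery_mode) {
            // Hypothetical path; registered only in recovery mode.
            _routes["/hypothetical/migrate_tx_manager"] = [] { return 200; };
        }
    }

    // Returns the handler's status code, or 404 for unregistered paths.
    int handle(const std::string& path) const {
        auto it = _routes.find(path);
        return it == _routes.end() ? 404 : it->second();
    }

private:
    std::map<std::string, std::function<int()>> _routes;
};
```

Because the route is never registered outside recovery mode, a request to it falls through to the same not-found path as any unknown URL.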
Signed-off-by: Michal Maslanka <[email protected]>
Force-pushed from 9d0fca4 to 7226b9a
think only the test changed between force pushes, difficult to tell from diffs, lgtm.
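Added a service handling tx manager topic migration (rehashing of tx
manager stm updates to a new partition count). The service is a simple state
machine that works in the following steps:
1) Create a temporary topic for the new partitioning scheme
2) For each partition of the current tx manager topic, read all the data,
assign a new partition and replicate according to the new scheme
3) Delete the current tx manager topic
4) Create the tx manager topic with the new number of partitions
5) Replicate data from the temporary topic to the new tx manager topic
6) Delete the temporary topic
If the process is interrupted by a failure it may be retried, as the
logic in the migrator service determines the starting condition and adjusts
the initial step accordingly.
Error handling is very simple: a failed operation must be retried.
If any operation fails, the migration is always started from scratch to
prevent leaving intermediate state behind.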
Backports Required
Release Notes
Features