Added migration of tx manager coordinator in recovery mode #15121

Merged
merged 10 commits into redpanda-data:dev from coorinator-migration on Dec 4, 2023

Conversation

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Nov 24, 2023

Added a service handling tx manager topic migration (rehashing of tx
manager stm updates to a new partition count). The service is a simple state
machine that works in the following steps:

  1. Create a temporary topic for the new partitioning scheme
  2. For each partition of the current tx manager topic, read all the data,
    assign a new partition, and replicate according to the new scheme
  3. Delete the current tx manager topic
  4. Create the tx manager topic with the new number of partitions
  5. Replicate data from the temporary topic to the new tx manager topic
  6. Delete the temporary topic

If the process is interrupted by a failure it may be retried: the migrator
service determines the starting condition and adjusts the initial step
accordingly (a rough sketch of this resume logic is shown below).

Error handling is intentionally simple: a failed operation must be
retried.

If any operation fails, the affected step always restarts from scratch to
prevent leaving intermediate state behind.
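
As a rough illustration of the resume logic: a restartable migrator can infer its starting step from which topics currently exist and whether the tx manager topic already has the target partition count. This is only a sketch; apart from `rehash_tx_manager_topic` (which appears in the diff below), the enum values, function name, and decision order are assumptions, not the actual implementation.

```cpp
#include <cstdint>

// Hypothetical sketch of how a restartable migrator could pick its starting
// step; names other than rehash_tx_manager_topic are illustrative only.
enum class migration_step {
    create_temporary_topic,
    rehash_tx_manager_topic,
    recreate_tx_manager_topic,
    copy_back_from_temporary,
    delete_temporary_topic,
    finished,
};

migration_step determine_initial_step(
  bool tx_topic_exists,
  bool tx_topic_has_new_partition_count,
  bool temporary_topic_exists) {
    if (!temporary_topic_exists) {
        // Either nothing has happened yet, or the whole migration already
        // completed and the temporary topic was cleaned up.
        return tx_topic_has_new_partition_count
                 ? migration_step::finished
                 : migration_step::create_temporary_topic;
    }
    if (!tx_topic_exists) {
        // Old topic already deleted; recreate it with the new partition count.
        return migration_step::recreate_tx_manager_topic;
    }
    if (tx_topic_has_new_partition_count) {
        // Topic already recreated; continue by copying data back from the
        // temporary topic.
        return migration_step::copy_back_from_temporary;
    }
    // Both topics exist and the tx topic still has the old partition count:
    // redo the rehashing step from scratch rather than resuming it halfway.
    return migration_step::rehash_tx_manager_topic;
}
```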

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Features

  • ability to change number of partitions in tx manager topic

@vbotbuildovich
Collaborator

vbotbuildovich commented Nov 24, 2023

new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00da-046d-49ad-8532-e225a273895b:

"rptest.tests.cluster_recovery_test.ClusterRecoveryTest.test_basic_controller_snapshot_restore"

new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-2242-4aaa-bc0c-a59cf82382cd:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-223d-476c-a7cc-992d72f9a7e3:

"rptest.tests.transactions_test.TxUpgradeTest.upgrade_does_not_change_tx_coordinator_assignment_test"

new failures in https://buildkite.com/redpanda/redpanda/builds/41701#018c00ea-223f-46b3-bb50-09ec92ea04bb:

"rptest.tests.cluster_recovery_test.ClusterRecoveryTest.test_basic_controller_snapshot_restore"
"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/41749#018c101b-f865-4d19-ac62-8a21acd01abb:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/41753#018c10b2-31d5-4759-a037-b48ca6fd969e:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/41753#018c10b2-31d1-4c5e-b3d7-1032baa57b2d:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/41757#018c114b-48eb-4c8d-99ff-6e5fcd7c0d24:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/41872#018c1705-ddbd-48db-98b6-b713fe9c2ad1:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/42108#018c248a-7577-4009-bcd1-cf91f0779559:

"rptest.tests.consumer_group_test.ConsumerGroupTest.test_consumer_is_removed_when_timedout.static_members=True"
"rptest.tests.consumer_group_recovery_tool_test.ConsumerOffsetsRecoveryToolTest.test_consumer_offsets_partition_count_change"
"rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.archival_test.ArchivalTest.test_timeboxed_uploads.acks=0.cloud_storage_type=CloudStorageType.ABS"
"rptest.tests.prefix_truncate_recovery_test.PrefixTruncateRecoveryTest.test_prefix_truncate_recovery.acks=-1.start_empty=False"
"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_size_based_retention.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/42115#018c25f7-5c17-4b64-8710-8445ee583552:

"rptest.tests.tx_coordinator_migration_test.TxCoordinatorMigrationTest.test_migrating_tx_manager_coordinator.with_failures=True.upgrade=True"

@mmaslankaprv mmaslankaprv force-pushed the coorinator-migration branch 5 times, most recently from 0f6bbf6 to 77ff61a on November 29, 2023 08:37
@mmaslankaprv mmaslankaprv marked this pull request as ready for review November 29, 2023 15:06
Contributor

@bharathv bharathv left a comment

looks great to me overall.

Resolved review threads:
src/v/cluster/tm_stm_cache_manager.h
src/v/cluster/topics_frontend.cc
src/v/cluster/topics_frontend.cc
src/v/cluster/migrations/tx_manager_migrator.cc
src/v/cluster/migrations/tx_manager_migrator.cc
break;
}
case migration_step::rehash_tx_manager_topic: {
auto results = co_await ssx::parallel_transform(
Contributor

nit: do we need to wrap it in an abortable timeout?

Member Author

we already have a timeout in transform batches, maybe that is enough?

!= new_partition_count) {
co_return errc::topic_invalid_partitions;
}
auto results = co_await ssx::parallel_transform(
Contributor

nit: same comment as above, maybe an abortable timeout to avoid a hang?

bharathv previously approved these changes Nov 30, 2023
When migrating the tx manager topic to a new partition count, one of the
cluster nodes has to read all the data from the current tx manager topic and
write it back to the new tx manager topic with the changed partition count.
Added an RPC service definition that allows the migrator service to read
and write tx manager topic partitions on remote nodes.

Signed-off-by: Michal Maslanka <[email protected]>
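
A minimal, self-contained sketch of the two remote operations the migrator needs is below. The actual service is defined through Redpanda's RPC code generation; the request/reply names and field types here are assumptions used only to illustrate the shape of the interface.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative only: the shape of the remote operations the migrator relies
// on. Real definitions live in Redpanda's RPC service and use internal types.
struct record {
    std::string key;
    std::string value;
};

// Read every record stored in one partition of the tx manager topic.
struct read_tx_partition_request {
    int32_t partition;
};
struct read_tx_partition_reply {
    int32_t error_code;
    std::vector<record> records;
};

// Replicate a set of records into one partition of the target topic.
struct replicate_tx_partition_request {
    int32_t partition;
    std::vector<record> records;
};
struct replicate_tx_partition_reply {
    int32_t error_code;
};
```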
Previously all tx_cache instances on every node were initialized for all
the partitions. This is not necessary, as not all partitions are
instantiated on every shard. Changed the logic in the tm_cache manager to
create a cache instance on demand.

Signed-off-by: Michal Maslanka <[email protected]>
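
A sketch of the on-demand pattern this commit describes, with illustrative names (the real tm_stm_cache_manager API differs): a per-partition cache is materialized only when a partition first asks for it.

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Illustrative sketch: create a per-partition cache on first use instead of
// pre-creating one for every possible partition on every shard.
struct tm_stm_cache_sketch { /* per-partition transaction state */ };

class cache_manager_sketch {
public:
    std::shared_ptr<tm_stm_cache_sketch> get_or_create(int32_t partition) {
        auto it = _caches.find(partition);
        if (it == _caches.end()) {
            it = _caches
                   .emplace(partition, std::make_shared<tm_stm_cache_sketch>())
                   .first;
        }
        return it->second;
    }

private:
    std::map<int32_t, std::shared_ptr<tm_stm_cache_sketch>> _caches;
};
```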
As tm_stm owns the tm_stm_cache, it should clear it every time we start a
new instance. This way we guarantee that the state is managed
solely by the tm_stm instance that is currently running.

For now we do not clear the state on shutdown, as we might start
seeing UNKNOWN_SERVER_ERRORS caused by partition movements (right now a
node can always ask others about the previous tx state).

Signed-off-by: Michal Maslanka <[email protected]>
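
A few lines to illustrate the ownership rule stated above (names are illustrative, not the real tm_stm interface): the cache outlives any single tm_stm instance, so each new instance clears it when it starts and intentionally leaves it intact on shutdown.

```cpp
#include <string>
#include <unordered_map>

// Illustrative only: the state machine resets the cache it owns on start, so
// only the currently running instance manages that state.
struct tx_state {};
using tx_cache = std::unordered_map<std::string, tx_state>;

struct tm_stm_sketch {
    tx_cache& cache; // per-partition cache that persists across restarts

    void start() {
        cache.clear(); // discard anything a previous instance left behind
        // ... proceed with normal start-up (snapshot / log replay, etc.)
    }
};
```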
bharathv previously approved these changes Dec 1, 2023
Sometimes it may be useful to execute topic deletion from a node which
is not the controller leader. Added a `cluster::topics_frontend` API
allowing topic deletion from any Redpanda node.

Signed-off-by: Michal Maslanka <[email protected]>
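
A conceptual sketch of what "delete a topic from any node" amounts to; the helper functions and types below are hypothetical stand-ins, not the actual `cluster::topics_frontend` API: if the local node is the controller leader the deletion is applied directly, otherwise the request is forwarded to the leader.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helpers standing in for the real cluster machinery.
enum class errc { success, no_leader, timeout };
struct node_id { int32_t id; };

errc delete_topic_locally(const std::string& topic);
errc forward_delete_to(node_id leader, const std::string& topic);
std::optional<node_id> controller_leader();
node_id self();

// Delete a topic regardless of which node handles the request.
errc delete_topic_from_any_node(const std::string& topic) {
    auto leader = controller_leader();
    if (!leader) {
        return errc::no_leader; // caller is expected to retry later
    }
    if (leader->id == self().id) {
        return delete_topic_locally(topic);
    }
    return forward_delete_to(*leader, topic);
}
```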
Added a service handling tx manager topic migration (rehashing of tx
manager stm updates to a new partition count). The service is a simple state
machine that works in the following steps:

1) Create a temporary topic for the new partitioning scheme
2) For each partition of the current tx manager topic, read all the data,
   assign a new partition (see the sketch after this message), and
   replicate according to the new scheme
3) Delete the current tx manager topic
4) Create the tx manager topic with the new number of partitions
5) Replicate data from the temporary topic to the new tx manager topic
6) Delete the temporary topic

If the process is interrupted by a failure it may be retried: the migrator
service determines the starting condition and adjusts the initial step
accordingly.

Error handling is intentionally simple: a failed operation must be
retried.

If any operation fails, the affected step always restarts from scratch to
prevent leaving intermediate state behind.

Signed-off-by: Michal Maslanka <[email protected]>
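
Step 2's "assign a new partition" boils down to re-hashing each transactional id against the new partition count. A minimal illustration, assuming a generic string hash rather than the exact partitioner Redpanda uses for the tx manager topic:

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Illustration only: map a tx manager record to a partition under the new
// partition count by hashing its transactional id. Redpanda's real
// partitioner for the tx manager topic may use a different hash function.
int32_t partition_for(const std::string& transactional_id, int32_t partition_count) {
    auto h = std::hash<std::string>{}(transactional_id);
    return static_cast<int32_t>(h % static_cast<uint64_t>(partition_count));
}
```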
Since tx migration may only happen when Redpanda is in recovery mode, the
migration service is instantiated only when recovery mode is enabled.

Signed-off-by: Michal Maslanka <[email protected]>
Added handling of special REST APIs that are only enabled in recovery
mode. The first API exposed in this way is an endpoint triggering tx
manager migration.

Signed-off-by: Michal Maslanka <[email protected]>
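
A conceptual sketch of gating such an endpoint on recovery mode; the route path, router type, and handler below are hypothetical and only illustrate the gating, not Redpanda's actual admin API:

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Toy router standing in for the real admin server.
struct router {
    std::map<std::string, std::function<std::string()>> routes;
    void post(std::string path, std::function<std::string()> handler) {
        routes.emplace(std::move(path), std::move(handler));
    }
};

// Register the migration endpoint only when the node runs in recovery mode.
void register_recovery_mode_routes(router& r, bool recovery_mode_enabled) {
    if (!recovery_mode_enabled) {
        return; // outside recovery mode the endpoint simply does not exist
    }
    r.post("/v1/migrate/tx_manager", [] {
        // would kick off the tx manager migrator state machine
        return std::string{"migration started"};
    });
}

int main() {
    router r;
    register_recovery_mode_routes(r, /*recovery_mode_enabled=*/true);
    std::cout << r.routes.count("/v1/migrate/tx_manager") << "\n"; // prints 1
}
```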
Contributor

@bharathv bharathv left a comment

I think only the test changed between force pushes; it is difficult to tell from the diffs. LGTM.

@mmaslankaprv mmaslankaprv merged commit ea656d6 into redpanda-data:dev Dec 4, 2023
20 checks passed
@mmaslankaprv mmaslankaprv deleted the coorinator-migration branch December 4, 2023 12:58