Allow concurrent broker restarts from same AZ (broker rack) #62

Merged on Jun 6, 2023 (3 commits)

@ctrlaltluc ctrlaltluc commented May 25, 2023

Description

Cluster restarts are time-consuming for big clusters.

Given that we use the Kafka broker.rack config to spread replicas across 3 AZs, we can speed up a cluster restart by allowing concurrent broker restarts within the same AZ. Restricting concurrent restarts to a single AZ ensures that at most one third of the replicas of any topic-partition are offline at any time.
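The reasoning above can be checked with a small sketch. It is illustrative only and not part of this PR: the function name maxOfflineReplicas, the partition names, and the replica assignment are hypothetical, and it assumes replication factor 3 with exactly one replica per AZ.

```go
package main

// maxOfflineReplicas returns the largest number of replicas that any single
// partition loses when the brokers in `down` are offline.
// `assignment` maps a partition name to the broker IDs hosting its replicas.
// Hypothetical helper for illustration; not taken from the koperator code.
func maxOfflineReplicas(assignment map[string][]int32, down map[int32]bool) int {
	worst := 0
	for _, replicas := range assignment {
		offline := 0
		for _, id := range replicas {
			if down[id] {
				offline++
			}
		}
		if offline > worst {
			worst = offline
		}
	}
	return worst
}
```

For example, assume brokers 1 and 2 are in one AZ, 3 and 4 in a second, and 5 and 6 in a third, and each partition has one replica per AZ (such as brokers 1, 3, 5). Taking brokers 1 and 2 down together then costs every partition at most 1 of its 3 replicas, while taking brokers 1 and 3 down together can cost a partition 2 of 3.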

Key changes:

  • add configuration parameter RollingUpgradeConfig.ConcurrentBrokerRestartsAllowed, which defaults to 1
  • allow a concurrent broker restart only if no brokers are failed or restarting in other AZs, and the number of failed or restarting brokers in the same AZ stays within both RollingUpgradeConfig.ConcurrentBrokerRestartsAllowed and RollingUpgradeConfig.FailureThreshold
  • keep the semantics of RollingUpgradeConfig.FailureThreshold unchanged: as before, multiple failures are tolerated even when they are not in the same AZ
  • if broker.rack is not properly configured for all brokers, treat each broker as being in its own AZ, which errs on the side of caution and restarts at most 1 broker at a time

Note: to enable multiple concurrent restarts, both RollingUpgradeConfig.ConcurrentBrokerRestartsAllowed and RollingUpgradeConfig.FailureThreshold must be set to a value above 1.
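A minimal sketch of one possible reading of the rules above, for readers who want to see the decision in code. It is not the code merged in this PR: the Broker type, azKey, and canRestart are hypothetical names, and the sketch covers only the concurrent-restart gate (same AZ only, both limits respected, fallback to one AZ per broker when broker.rack is missing), not the full FailureThreshold handling.

```go
package main

// Broker is a hypothetical, simplified view of a broker during a rolling restart.
type Broker struct {
	ID         int32
	Rack       string // value of broker.rack; empty if not configured
	Restarting bool
	Failed     bool
}

// azKey identifies the AZ a broker is counted in.
type azKey struct {
	rack string
	id   int32 // only set when falling back to one AZ per broker
}

// allRacksConfigured reports whether every broker has broker.rack set.
func allRacksConfigured(brokers []Broker) bool {
	for _, b := range brokers {
		if b.Rack == "" {
			return false
		}
	}
	return true
}

// azOf returns the broker's rack, or a per-broker key if racks are not
// configured for all brokers (the cautious fallback).
func azOf(b Broker, racksOK bool) azKey {
	if racksOK {
		return azKey{rack: b.Rack}
	}
	return azKey{id: b.ID}
}

// canRestart decides whether candidate may be restarted now.
// Assumption: candidate is one of the entries in brokers.
func canRestart(candidate Broker, brokers []Broker, concurrentAllowed, failureThreshold int) bool {
	racksOK := allRacksConfigured(brokers)
	impacted := 0 // other brokers that are failed or restarting
	for _, b := range brokers {
		if b.ID == candidate.ID || !(b.Restarting || b.Failed) {
			continue
		}
		impacted++
		// Any failure or restart in another AZ blocks a concurrent restart.
		if azOf(b, racksOK) != azOf(candidate, racksOK) {
			return false
		}
	}
	if impacted == 0 {
		return true // nothing else is down, so this is not a concurrent restart
	}
	// Both limits must leave room for one more broker to go down.
	return impacted < concurrentAllowed && impacted < failureThreshold
}
```

With either limit left at 1, the last check fails as soon as one other broker is down, which matches the note that both parameters must be above 1 before any concurrent restart happens.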

@ctrlaltluc ctrlaltluc self-assigned this May 25, 2023
@ctrlaltluc ctrlaltluc changed the title from "Allow parallel broker restarts from same AZ (broker rack)" to "Allow concurrent broker restarts from same AZ (broker rack)" May 25, 2023

@alex-necula alex-necula left a comment

Nice work!

pkg/resources/kafka/kafka.go (resolved)
pkg/resources/kafka/kafka_test.go (resolved)

@dobrerazvan dobrerazvan left a comment

Awesome! 🚢

@ctrlaltluc ctrlaltluc merged commit 9df4df5 into master Jun 6, 2023
@ctrlaltluc ctrlaltluc deleted the luciani/parallel-restart branch June 6, 2023 10:58
ctrlaltluc added a commit that referenced this pull request Jun 8, 2023
* Allow parallel broker restarts from same AZ (broker rack)

* Change parameter description to be more accurate

* Address CR comments
ctrlaltluc added a commit that referenced this pull request Jun 8, 2023
* Allow parallel broker restarts from same AZ (broker rack)

* Change parameter description to be more accurate

* Address CR comments
dvaseekara pushed a commit to dvaseekara/koperator that referenced this pull request Nov 14, 2024
5 participants