Upgrade hang in group_metadata_migration when consumer group topic doesn't already exist #4469
VadimPlh
added a commit
to VadimPlh/redpanda
that referenced
this issue
Apr 28, 2022
After upgrading a node in the cluster from 21.11.x to 22.1.x, if the upgraded node isn't the controller leader, it enters group_metadata_migration::start and hits the "kafka_internal/group topic does not exists, activating" path. This call waits for activate_feature. activate_feature loops until the feature is active, but the feature cannot be activated because only the controller leader runs the feature_manager logic for activating features, and the controller leader is a 21.11.x node that doesn't have the code. The node remains in the 'booting' state indefinitely. Fixes: redpanda-data#4469
VadimPlh added eight further commits with the same message to VadimPlh/redpanda that referenced this issue: two on Apr 29, 2022, two on May 1, 2022, and four on May 4, 2022.
vbotbuildovich
pushed a commit
to vbotbuildovich/redpanda
that referenced
this issue
May 4, 2022
After upgrading a node in the cluster from 21.11.x to 22.1.x, if the upgraded node isn't the controller leader, it enters group_metadata_migration::start and hits the "kafka_internal/group topic does not exists, activating" path. This call waits for activate_feature. activate_feature loops until the feature is active, but the feature cannot be activated because only the controller leader runs the feature_manager logic for activating features, and the controller leader is a 21.11.x node that doesn't have the code. The node remains in the 'booting' state indefinitely. Fixes: redpanda-data#4469 (cherry picked from commit b7fb5bd)
Found by @VadimPlh with the new ducktape upgrade test.
I think the fix is probably simple: spawn a background fiber for the activate_feature call, so that its loop doesn't block Redpanda's startup.
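To make the proposed fix concrete, here is a rough sketch of the idea, not Redpanda's actual code. It uses plain C++ threads instead of Seastar fibers, and every name in it (`feature_state`, `activate_feature`, `migration`) is a hypothetical stand-in chosen for illustration. The point is only that `start()` hands the polling loop to a background task and returns, so startup is no longer tied to the feature becoming active.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the cluster-wide feature flag. In the real system
// only the controller leader can flip it; here the caller sets it by hand.
struct feature_state {
    std::atomic<bool> active{false};
};

// Models the retry loop described in the issue: poll until the feature is
// active. Returns true once active, false if a stop was requested first.
bool activate_feature(feature_state& f, const std::atomic<bool>& stop) {
    while (!f.active.load()) {
        if (stop.load()) {
            return false;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}

// Hypothetical migration object whose start() does not block on activation.
class migration {
public:
    explicit migration(feature_state& f) : _feature(f) {}
    ~migration() { stop(); }

    // Returns immediately; the activation wait runs in the background.
    void start() {
        assert(!_worker.joinable()); // start() must be called at most once
        _worker = std::thread([this] {
            _activated.store(activate_feature(_feature, _stop));
        });
    }

    // Asks the background loop to give up, then waits for it to exit.
    void stop() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    bool activated() const { return _activated.load(); }

private:
    feature_state& _feature;
    std::atomic<bool> _stop{false};
    std::atomic<bool> _activated{false};
    std::thread _worker;
};
```

With this shape, a node whose controller leader is still on the old version would finish booting and keep retrying in the background; once an upgraded node becomes leader and the feature turns active, `activated()` starts returning true. The explicit `stop()` matters as well, since a background loop that can never succeed still has to exit cleanly on shutdown.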