k8s: Improve ghost nodes removal #9750
Conversation
Force-pushed from f6b6f20 to fbb16a1
LGTM, very clear, and the requeue condition makes sense. There is some hesitation about this, so please do not merge until we have agreement.
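For context, a hedged illustration of the kind of requeue condition mentioned above, written against controller-runtime; the reconciler type, the 30-second interval, and the placeholder counts are assumptions, not the operator's actual code.

```go
package sketch

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// reconcilerSketch is a hypothetical reconciler used only to illustrate the
// requeue pattern; it is not the operator's Cluster reconciler.
type reconcilerSketch struct{}

func (r *reconcilerSketch) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Placeholder values standing in for the observed pod count and the
	// replica count from the Cluster spec.
	observedPods, desiredReplicas := 2, 3
	if observedPods != desiredReplicas {
		// The world has not converged yet, so it is unsafe to decide which
		// node IDs are ghosts; back off and try again later.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	// Ghost node removal would run here once the counts agree.
	return ctrl.Result{}, nil
}
```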
nodeIDs[brokers[i].NodeID] = nil
}

pods, err := r.podList(ctx, rp)
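A minimal, self-contained sketch of the pattern in the quoted diff: collect the node IDs the cluster reports, then drop every ID that is backed by a pod. The broker and pod structs, and the way the node ID reaches the pod struct, are stand-ins for the operator's real types.

```go
package main

import "fmt"

// Stand-in types; the operator uses the Redpanda admin API client and the
// controller-runtime pod list instead of these hypothetical structs.
type broker struct{ NodeID int }
type pod struct {
	Name   string
	NodeID int // assumed to be parsed from the pod's node-ID annotation
}

// knownNodeIDs mirrors the loop in the diff above: collect every node ID the
// cluster currently reports so it can be compared against the running pods.
func knownNodeIDs(brokers []broker) map[int]struct{} {
	ids := make(map[int]struct{}, len(brokers))
	for i := range brokers {
		ids[brokers[i].NodeID] = struct{}{}
	}
	return ids
}

func main() {
	brokers := []broker{{NodeID: 0}, {NodeID: 1}, {NodeID: 5}}
	pods := []pod{{Name: "redpanda-0", NodeID: 0}, {Name: "redpanda-1", NodeID: 1}}

	ids := knownNodeIDs(brokers)
	for _, p := range pods {
		delete(ids, p.NodeID) // node IDs backed by a live pod are not ghosts
	}
	fmt.Println("ghost node IDs:", ids) // only 5 remains: no pod backs it
}
```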
One question now that I think about it a bit more: here you are relying on the Cluster resource to give you the pod list? Is this populated into the resource or dynamically retrieved from the state of the world in the namespace? Wouldn't that lead to a race condition? Would it be better to just retrieve all pods that have the right annotations? Pods without those annotations will obviously be left out, and you can start removing anything not in that list.
I changed the implementation as it was inaccurate. The eventual consistency of the Kubernetes API sometimes returned an old annotation.
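A hedged sketch of the reviewer's suggestion to list pods by annotation rather than trusting the Cluster resource, using client-go directly; the annotation key, label selector, and namespace are assumptions and will differ from what the operator actually uses.

```go
package main

import (
	"context"
	"fmt"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Hypothetical annotation key holding a pod's Redpanda node ID; the
// operator's real key may differ.
const nodeIDAnnotation = "operator.redpanda.com/node-id"

// podNodeIDs lists pods matching the selector and collects the node IDs
// stored in their annotations; pods without the annotation are skipped, as
// the review comment suggests.
func podNodeIDs(ctx context.Context, cs kubernetes.Interface, namespace, selector string) (map[int]struct{}, error) {
	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, fmt.Errorf("listing pods: %w", err)
	}
	ids := make(map[int]struct{}, len(pods.Items))
	for i := range pods.Items {
		raw, ok := pods.Items[i].Annotations[nodeIDAnnotation]
		if !ok {
			continue // not annotated yet; leave it out rather than guess
		}
		id, err := strconv.Atoi(raw)
		if err != nil {
			return nil, fmt.Errorf("pod %s has invalid node ID %q: %w", pods.Items[i].Name, raw, err)
		}
		ids[id] = struct{}{}
	}
	return ids, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Placeholder namespace and label selector.
	ids, err := podNodeIDs(context.Background(), cs, "redpanda", "app.kubernetes.io/name=redpanda")
	if err != nil {
		panic(err)
	}
	fmt.Println("node IDs backed by pods:", ids)
}
```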
Force-pushed from fbb16a1 to f745014
Calling decommission when a Pod annotation changes might not be possible if the Pod was removed along with the annotation where the previous Redpanda node ID was stored. There is a dedicated function to handle ghost brokers. Reference redpanda-data/redpanda#9750 redpanda-data/redpanda#13298 redpanda-data/redpanda#13132 redpanda-data/helm-charts#253 redpanda-data/redpanda#12847
On top of #9749, add a ghost nodes removal function that is run after:
Thanks to all the prerequisites, when the removeGhostNodeIDs function is called the replica specification count should match the number of running Pods, so that all Pods have the latest node IDs.
This use case can be triggered in a GKE environment where Redpanda is backed by local SSDs defined by a PersistentVolume resource and the VM is recreated. The recreation can be triggered by deleting the VM or by other factors.
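A minimal sketch of what such a cleanup step could look like, assuming a hypothetical admin-API interface; the method names BrokerIDs and DecommissionBroker are illustrative, not the operator's or the Redpanda admin client's actual API.

```go
package main

import (
	"context"
	"fmt"
)

// adminAPI is a hypothetical abstraction over the Redpanda admin client; the
// real client has different method names and types.
type adminAPI interface {
	BrokerIDs(ctx context.Context) ([]int, error)
	DecommissionBroker(ctx context.Context, id int) error
}

// removeGhostNodeIDsSketch decommissions every broker the cluster reports
// that is not backed by a running pod. It assumes the prerequisites from the
// description above hold: the replica count matches the running pods and all
// pods carry their latest node IDs.
func removeGhostNodeIDsSketch(ctx context.Context, admin adminAPI, podNodeIDs map[int]struct{}) error {
	brokerIDs, err := admin.BrokerIDs(ctx)
	if err != nil {
		return fmt.Errorf("listing brokers: %w", err)
	}
	for _, id := range brokerIDs {
		if _, ok := podNodeIDs[id]; ok {
			continue // a pod still backs this node ID, so it is not a ghost
		}
		// No pod owns this ID, e.g. a GKE VM with local SSDs was recreated
		// and the broker rejoined under a new ID; remove the stale one.
		if err := admin.DecommissionBroker(ctx, id); err != nil {
			return fmt.Errorf("decommissioning ghost broker %d: %w", id, err)
		}
	}
	return nil
}

// fakeAdmin lets the sketch run without a real cluster.
type fakeAdmin struct{ ids []int }

func (f *fakeAdmin) BrokerIDs(ctx context.Context) ([]int, error) { return f.ids, nil }
func (f *fakeAdmin) DecommissionBroker(ctx context.Context, id int) error {
	fmt.Println("decommissioning ghost broker", id)
	return nil
}

func main() {
	admin := &fakeAdmin{ids: []int{0, 1, 5}}
	backed := map[int]struct{}{0: {}, 1: {}}
	if err := removeGhostNodeIDsSketch(context.Background(), admin, backed); err != nil {
		panic(err)
	}
	// Output: decommissioning ghost broker 5
}
```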
Backports Required
Release Notes
Features