Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s: Improve ghost nodes removal #9750

Merged

Conversation

RafalKorepta
Copy link
Contributor

On top of #9749 add ghost nodes removal function that is run after:

  • all resources successfully reconciles
  • rolling updates did not return an error
  • all pods are annotated with latest Nodes ID
  • entire status is reported to cluster customer resource
  • Redpanda configuration is in sync

Thanks to all the prerequisites when the removeGhostNodeIDs function
is called the replica specification number should match with the running pods,
so that all PODs have latest Node IDs.

This use case can be triggered in GKE environment where Redpanda is backed by
local SSDs defined by PersistentVolume resource and the VM is recreated. The
recreation can be triggered by deleting VM or by other factors.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Features

  • Remove ghost Redpanda Node IDs

@RafalKorepta RafalKorepta requested a review from a team as a code owner March 30, 2023 20:21
@RafalKorepta RafalKorepta force-pushed the rk/improve-ghost-nodes-removal branch 2 times, most recently from f6b6f20 to fbb16a1 Compare March 31, 2023 11:00
alejandroEsc
alejandroEsc previously approved these changes Mar 31, 2023
Copy link
Contributor

@alejandroEsc alejandroEsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, very clear and requeue condition makes sense. There is some hesitation about this, so please do not merge until we have agreement.

nodeIDs[brokers[i].NodeID] = nil
}

pods, err := r.podList(ctx, rp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one question here now that i think about it a bit more. Here you are relying on the Cluster resource to give you the podlist? Is this something that is populated into or dynamically retrieved from the state of the world in the namespace? That would lead to a race condition right? Would it be better to just retrieve all pods that have the right annotations? Those that do not exist with these will obviously be left out and you can start removing things not in the same list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation as it was inaccurate. Eventual consistency of k8s api sometimes returned old annotation.

@RafalKorepta RafalKorepta merged commit f7869db into redpanda-data:dev Apr 3, 2023
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this pull request Jun 19, 2024
Calling decommission in the case of changing Pod annotation might be not
possible if Pod was removed along with its annotation where previous
Redpanda ID was stored. There is dedicated function to handle Ghost
brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this pull request Jun 21, 2024
Calling decommission in the case of changing Pod annotation might be not
possible if Pod was removed along with its annotation where previous
Redpanda ID was stored. There is dedicated function to handle Ghost
brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this pull request Jun 28, 2024
Calling decommission in the case of changing Pod annotation might be not
possible if Pod was removed along with its annotation where previous
Redpanda ID was stored. There is dedicated function to handle Ghost
brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this pull request Jul 2, 2024
Calling decommission in the case of changing Pod annotation might be not
possible if Pod was removed along with its annotation where previous
Redpanda ID was stored. There is dedicated function to handle Ghost
brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this pull request Jul 2, 2024
Calling decommission in the case of changing Pod annotation might be not
possible if Pod was removed along with its annotation where previous
Redpanda ID was stored. There is dedicated function to handle Ghost
brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants