Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubectl drain ignores terminating pods when reboot master node #1532

Closed
LuckyChen666 opened this issue Dec 11, 2023 · 3 comments · Fixed by kubernetes/kubernetes#122574
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@LuckyChen666
Copy link

LuckyChen666 commented Dec 11, 2023

What happened:
When I evict and update one of the worker nodes, a master node is also rebooting at this time. At this time, the evicted pod needs to wait for terminationGracePeriodSeconds before the eviction is successful.
Then something strange happened, the eviction was successful, and the currently terminating pod was ignored.

if apierrors.IsNotFound(err) || (p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID) {

I found that because the master node was rebooting at this time, obtaining the pod's interface failed. At this time, the above judgment p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID was met, so the current pod was ignored.

What you expected to happen:
Judgment here should be strictly controlled,such as:

if apierrors.IsNotFound(err)  {
  if (p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID) {
     ...
  }else{
   ...
  }
} else {
...
}

Maybe there is a better way to fix it.
How to reproduce it (as minimally and precisely as possible):

You can try to drain a worker node which owns terminating pod for a long time. Then reboot master node.

@LuckyChen666 LuckyChen666 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 11, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 11, 2023
@LuckyChen666 LuckyChen666 changed the title kubectl drain ignores some pods when reboot master node kubectl drain ignores terminating pods when reboot master node Dec 11, 2023
@eddiezane
Copy link
Member

I am assuming that if the client-go Pod Get function is returning an err that p is nil by Go convention. If that is the case then I believe the current logic is correct?

@LuckyChen666 thoughts?

@eddiezane
Copy link
Member

Actually no that assumption is incorrect. client go's getters use named returns and do alloc a struct.

So in this case p is never nil so our current logic is incorrect.

https://github.com/kubernetes/kubernetes/blob/cacdf6c70728086bfcab387b901d9765cbcbfb6b/staging/src/k8s.io/client-go/kubernetes/typed/core/v1/pod.go#L76

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2024
@eddiezane
Copy link
Member

/assign @brianpursley

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants