fix(kuma-cp): remove Dataplane for Pod without IP #4964

Merged
merged 7 commits into from
Sep 8, 2022

jakubdyszkiewicz
Contributor

Our Pod -> Dataplane reconciler was skipping reconciliation in two cases:

Pod IP is empty

Our assumption was that an empty Pod IP means the Pod is still being created.
The problem is that a Pod can also lose its IP, for example when it is evicted.
In this scenario, we would not reconcile the Pod to mark it as unhealthy.

Currently, evicted Pods are marked as unhealthy only sometimes: when at least one container is down and the IP is still present.

Our Dataplane model requires an IP address (Dataplane#networking.address).
When reconciling a Pod with an empty IP, I considered:

  1. putting some non-routable IP address when Pod IP is missing and always marking this Dataplane as unhealthy
  2. retrieving previous networking address and always marking this Dataplane as unhealthy
  3. removing Dataplane

I picked option 3 because the other options sound like hacks, even though technically an unhealthy Dataplane would not be used anyway.
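The choice of option 3 can be sketched roughly as follows. This is a minimal illustration, not the actual kuma-cp reconciler code: the `pod` and `action` types and the `decide` function are hypothetical, and the sketch assumes the Pod IP is the only input needed to choose between deleting and creating/updating the Dataplane.

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type pod struct {
	Name string
	IP   string // empty while the Pod is being created, or after it lost its IP (e.g. eviction)
}

// action is a hypothetical result type describing what the reconciler should do.
type action string

const (
	deleteDataplane action = "delete"
	upsertDataplane action = "upsert"
)

// decide mirrors option 3: a Pod without an IP gets no Dataplane at all,
// because the Dataplane model requires networking.address to be set.
func decide(p pod) action {
	if p.IP == "" {
		return deleteDataplane
	}
	return upsertDataplane
}

func main() {
	fmt.Println(decide(pod{Name: "evicted"}))                 // delete
	fmt.Println(decide(pod{Name: "running", IP: "10.0.0.7"})) // upsert
}
```

Deleting a Dataplane that does not exist yet is assumed to be a no-op, so the same branch covers both a Pod that is still being created and a Pod that lost its IP.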

Pod is completed (all containers aside from kuma-sidecar are terminated)

This check was added so that we remove the Dataplane of a finished Job in the pod status reconciler and do not recreate it in the pod reconciler.
I was tempted to remove this functionality, but I don't want to introduce a breaking change while fixing another issue.
Instead, I changed the logic to remove the Dataplane when the Pod phase is Succeeded.

// PodSucceeded means that all containers in the pod have voluntarily terminated
// with a container exit code of 0, and the system is not going to restart any of these containers.
PodSucceeded PodPhase = "Succeeded"
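A phase-based check along these lines could look like the sketch below. It is an assumption-laden illustration rather than the code from this PR: `PodPhase` and its constants are redeclared locally to mirror the Kubernetes definitions so the example compiles without `k8s.io/api`, and `shouldRemoveForPhase` is a hypothetical helper name.

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes type of the same name. It is redeclared
// here only so that the sketch is self-contained.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// shouldRemoveForPhase is a hypothetical helper: it reports whether the
// Dataplane should be removed based on the Pod phase alone. Only Succeeded
// matches; a Failed or Running Pod is left to the other reconciliation rules.
func shouldRemoveForPhase(phase PodPhase) bool {
	return phase == PodSucceeded
}

func main() {
	for _, phase := range []PodPhase{PodRunning, PodSucceeded, PodFailed} {
		fmt.Printf("%s -> remove Dataplane: %v\n", phase, shouldRemoveForPhase(phase))
	}
}
```

Relying on the phase instead of inspecting individual container statuses keeps the condition tied to what Kubernetes itself reports for a finished Job.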

I also added some additional logging around the pod reconciler.

Checklist prior to review

  • Link to docs PR or issue --
  • Link to UI issue or PR --
  • Is the issue worked on linked? --
  • The PR does not hardcode values that might break projects that depend on kuma (e.g. "kumahq" as a image registry) --
  • The PR will work for both Linux and Windows, system specific functions like syscall.Mkfifo have equivalent implementation on the other OS --
  • Unit Tests --
  • E2E Tests --
  • Manual Universal Tests --
  • Manual Kubernetes Tests --
  • Do you need to update UPGRADE.md? --
  • Does it need to be backported according to the backporting policy? --
  • Do you need to explicitly set a > Changelog: entry here?

Signed-off-by: Jakub Dyszkiewicz <[email protected]>
@jakubdyszkiewicz jakubdyszkiewicz marked this pull request as ready for review September 5, 2022 07:27
@jakubdyszkiewicz jakubdyszkiewicz requested a review from a team as a code owner September 5, 2022 07:27
@lobkovilya
Contributor

It's kind of nice that today Pod and DPP are always 1 to 1. We don't show Pods in the Kuma GUI, so if there is no DPP, users may think there is no Pod either (while in reality it was evicted). Can we change the DPP model to allow an empty address only if health.ready: false?

@jakubdyszkiewicz
Contributor Author

> Can we change the DPP model to allow an empty address only if health.ready: false?

I looked around the code. This is an interesting idea, but it would be hard to pull off. A lot of code assumes that a Dataplane has an address, and I feel that going in this direction would introduce many bugs.

@jakubdyszkiewicz
Contributor Author

@Mergifyio backport release-1.8

@mergify
Contributor

mergify bot commented Sep 8, 2022

backport release-1.8

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Sep 8, 2022
Signed-off-by: Jakub Dyszkiewicz <[email protected]>
(cherry picked from commit f341cff)
@jakubdyszkiewicz jakubdyszkiewicz deleted the fix/evicted-pods branch September 8, 2022 12:00
jakubdyszkiewicz added a commit that referenced this pull request Sep 8, 2022
…4980)

fix(kuma-cp): remove Dataplane for Pod without IP (#4964)

Signed-off-by: Jakub Dyszkiewicz <[email protected]>
(cherry picked from commit f341cff)

Co-authored-by: Jakub Dyszkiewicz <[email protected]>
@lobkovilya
Contributor

@Mergifyio backport release-1.7 release-1.6

mergify bot pushed a commit that referenced this pull request Oct 5, 2022
Signed-off-by: Jakub Dyszkiewicz <[email protected]>
(cherry picked from commit f341cff)
mergify bot pushed a commit that referenced this pull request Oct 5, 2022
Signed-off-by: Jakub Dyszkiewicz <[email protected]>
(cherry picked from commit f341cff)

# Conflicts:
#	pkg/plugins/runtime/k8s/controllers/pod_status_controller.go
#	test/e2e/graceful/eviction.go
#	test/e2e_env/kubernetes/jobs/jobs.go
#	test/e2e_env/kubernetes/kubernetes_suite_test.go
@mergify
Contributor

mergify bot commented Oct 5, 2022

backport release-1.7 release-1.6

✅ Backports have been created

lobkovilya pushed a commit that referenced this pull request Oct 5, 2022
Signed-off-by: Jakub Dyszkiewicz <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
# Conflicts:
#	pkg/plugins/runtime/k8s/controllers/pod_status_controller.go
#	test/e2e/graceful/eviction.go
#	test/e2e_env/kubernetes/jobs/jobs.go
#	test/e2e_env/kubernetes/kubernetes_suite_test.go
lobkovilya pushed a commit that referenced this pull request Oct 5, 2022
…5096)

fix(kuma-cp): remove Dataplane for Pod without IP (#4964)

Signed-off-by: Jakub Dyszkiewicz <[email protected]>
(cherry picked from commit f341cff)

Co-authored-by: Jakub Dyszkiewicz <[email protected]>
lobkovilya pushed a commit that referenced this pull request Oct 5, 2022
…5097)

fix(kuma-cp): remove Dataplane for Pod without IP (#4964)

Signed-off-by: Jakub Dyszkiewicz <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
# Conflicts:
#	pkg/plugins/runtime/k8s/controllers/pod_status_controller.go
#	test/e2e/graceful/eviction.go
#	test/e2e_env/kubernetes/jobs/jobs.go
#	test/e2e_env/kubernetes/kubernetes_suite_test.go

Co-authored-by: Jakub Dyszkiewicz <[email protected]>