fix: delete misconfigured pods in error state. #338

Merged
hessjcg merged 8 commits into main from gh-337-pod-config-sync on May 15, 2023

Conversation

hessjcg (Collaborator) commented May 8, 2023

The webhook cannot be configured to reliably intercept every pod creation event due to limitations in the K8s API.
Thus, the operator needs to listen for pod create and update events and check whether each pod is configured correctly.
If a pod is not configured correctly and is in an error or waiting state, the operator will delete the pod so that it can
be replaced. The replacement pod will be configured correctly by the webhook as it is created.

Fixes #337
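
The deletion rule described above can be illustrated with a minimal sketch. It is not the operator's code: podSnapshot, its fields, and shouldDelete are hypothetical names invented for this example, and pod state is reduced to a plain string, whereas the operator inspects a real corev1.Pod.

```go
package main

// podSnapshot is a hypothetical, simplified view of a pod used only in this
// sketch. The operator itself works with corev1.Pod.
type podSnapshot struct {
	Name            string
	HasProxySidecar bool   // true when the webhook injected the proxy container
	State           string // simplified: "running", "waiting", or "error"
}

// shouldDelete reports whether a pod should be deleted so that its
// replacement passes through the admission webhook again.
func shouldDelete(p podSnapshot) bool {
	if p.HasProxySidecar {
		return false // configured correctly, nothing to do
	}
	// Misconfigured pods are only deleted when they are already failing.
	return p.State == "error" || p.State == "waiting"
}
```

Under these assumptions, a misconfigured pod that is still running is left alone; only a pod that is both missing the proxy container and stuck in an error or waiting state is deleted.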

@@ -81,51 +84,68 @@ func (a *PodAdmissionWebhook) Handle(ctx context.Context, req admission.Request)
// handleCreatePodRequest Finds relevant AuthProxyWorkload resources and updates the pod
// with matching resources, returning a non-nil pod when the pod was updated.
func (a *PodAdmissionWebhook) handleCreatePodRequest(ctx context.Context, p corev1.Pod) (*corev1.Pod, error) {
l := logf.FromContext(ctx)
Collaborator Author

This change moves code from the handleCreatePodRequest() function into a new function called findMatchingProxies() so that it can be reused between the webhook and the new Pod event listener.

}

// listOwners returns the list of this object's owners and its extended owners.
// Warning: this is a recursive function
-func (a *PodAdmissionWebhook) listOwners(ctx context.Context, object client.Object) ([]workload.Workload, error) {
+func listOwners(ctx context.Context, c client.Client, object client.Object) ([]workload.Workload, error) {
Collaborator Author

This refactors listOwners() from a method on PodAdmissionWebhook into a plain function that takes the client as a parameter, so that it can be used by both the webhook and the pod event listener.
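
As a rough illustration of this refactor, the sketch below passes the lookup dependency in as a parameter and walks owners recursively. The types object, getter, and mapGetter are hypothetical stand-ins for client.Object and client.Client, introduced only so the example compiles with the standard library. It also assumes the owner graph has no cycles.

```go
package main

import "fmt"

// object is a hypothetical stand-in for client.Object: a name plus the names
// of its direct owners.
type object struct {
	Name   string
	Owners []string
}

// getter is a hypothetical stand-in for the client.Client dependency.
// Passing it as a parameter lets any caller reuse listOwners.
type getter interface {
	Get(name string) (object, error)
}

// mapGetter is an in-memory getter used only for this example.
type mapGetter map[string]object

func (m mapGetter) Get(name string) (object, error) {
	o, ok := m[name]
	if !ok {
		return object{}, fmt.Errorf("object %q not found", name)
	}
	return o, nil
}

// listOwners returns the names of obj's direct and extended owners, depth
// first. It is recursive and assumes the owner graph is acyclic.
func listOwners(c getter, obj object) ([]string, error) {
	var owners []string
	for _, name := range obj.Owners {
		owner, err := c.Get(name)
		if err != nil {
			return nil, err
		}
		owners = append(owners, owner.Name)
		parents, err := listOwners(c, owner)
		if err != nil {
			return nil, err
		}
		owners = append(owners, parents...)
	}
	return owners, nil
}
```

Because the function no longer depends on a receiver, both a webhook and an event listener could call it with whatever getter they hold.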

@@ -168,3 +188,104 @@ func (a *PodAdmissionWebhook) listOwners(ctx context.Context, object client.Obje
}
return owners, nil
}

Collaborator Author

This is the new pod event listener implementation.

@@ -71,6 +71,11 @@ func SetupManagers(mgr manager.Manager, userAgent, defaultProxyImage string) err
setupLog.Error(err, "unable to create workload admission webhook controller")
return err
}
err = registerPodInformer(mgr, u)
Collaborator Author

Register the PodEventHandler when the operator starts up.

hessjcg marked this pull request as ready for review May 8, 2023 16:40
hessjcg requested a review from a team as a code owner May 8, 2023 16:40
hessjcg requested review from enocom and jackwotherspoon May 8, 2023 16:40
jackwotherspoon (Collaborator) commented May 8, 2023

@hessjcg Is it possible to add tests for this to make sure PodEventHandler is working as intended (WAI)? We should aim to have tests for all feat PRs.

enocom (Member) commented May 8, 2023

Does this fix #337?

hessjcg (Collaborator, Author) commented May 8, 2023

@enocom, yes, this fixes #337.
@jackwotherspoon, I added some unit tests in pod_controller_test.go.

hessjcg force-pushed the gh-337-pod-config-sync branch from 01a864b to 8a33ea6 May 8, 2023 23:30
enocom changed the title from "feat: Listen for pod changes and delete misconfigured pods in error state." to "fix: delete misconfigured pods in error state." May 9, 2023
enocom (Member) commented May 9, 2023

Updated the title to reflect what we're doing here.

internal/controller/pod_controller.go (outdated, resolved)
internal/controller/pod_controller.go (outdated, resolved)
}

// OnAdd is called by the informer when a Pod is added.
func (h *PodEventHandler) OnAdd(obj interface{}) {
Member

Are this and the following methods part of an interface that Kubernetes uses?

Collaborator Author

The OnAdd, OnUpdate, and OnDelete methods all implement cache.ResourceEventHandler. I've updated the method comment.
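
To show what implementing such an interface involves, here is a small sketch. The resourceEventHandler interface below is a local stand-in that mirrors the three-method shape seen in this PR's diff (OnAdd with a single argument); it is declared here only so the example compiles without client-go, and recordingHandler is a hypothetical type, not the PR's PodEventHandler.

```go
package main

// resourceEventHandler is a local stand-in mirroring the method set of
// cache.ResourceEventHandler as it appears in this PR's diff. It is an
// assumption made for illustration, not the client-go definition itself.
type resourceEventHandler interface {
	OnAdd(obj interface{})
	OnUpdate(oldObj, newObj interface{})
	OnDelete(obj interface{})
}

// recordingHandler is a hypothetical handler that records which callbacks
// were invoked, in order.
type recordingHandler struct {
	events []string
}

// OnAdd records that an object was added.
func (h *recordingHandler) OnAdd(obj interface{}) {
	h.events = append(h.events, "add")
}

// OnUpdate records that an object was updated.
func (h *recordingHandler) OnUpdate(oldObj, newObj interface{}) {
	h.events = append(h.events, "update")
}

// OnDelete records that an object was deleted.
func (h *recordingHandler) OnDelete(obj interface{}) {
	h.events = append(h.events, "delete")
}

// Compile-time check that *recordingHandler satisfies the interface.
var _ resourceEventHandler = (*recordingHandler)(nil)
```

The blank-identifier assignment at the end is a common Go idiom: if a method is renamed or its signature drifts, the build fails instead of the handler silently no longer being called.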

internal/controller/setup.go (outdated, resolved)
hessjcg requested a review from pwschuurman May 10, 2023 19:58
hessjcg (Collaborator, Author) left a comment

Updated in response to @enocom

hessjcg requested a review from enocom May 10, 2023 19:59
enocom (Member) left a comment

LGTM. I'd like @pwschuurman to approve this one just to make sure we have the logic and nuances right here.

pwschuurman commented

/lgtm
/approve

hessjcg removed the request for review from jackwotherspoon May 12, 2023 21:55

hessjcg (Collaborator, Author) commented May 12, 2023

@enocom This needs an "Approve" review from you before I can merge it, since @pwschuurman is not a code owner.

hessjcg added the "tests: run" label (Run all the tests for this PR) May 15, 2023
github-actions bot removed the "tests: run" label (Run all the tests for this PR) May 15, 2023
return nil, fmt.Errorf("there is an AuthProxyWorkloadConfiguration error reconciling this workload %v", wlConfigErr)
}

return wl.Pod, nil // updated
Collaborator

Is this comment needed here? It seems slightly vague. Maybe change it to "// update pod", if that is what it is intended to say?

updater *workload.Updater
}

// newDeletePodController constructs an podDeleteController
Collaborator

Suggested change
-// newDeletePodController constructs an podDeleteController
+// newDeletePodController constructs a podDeleteController

Collaborator Author

fixed.

return nil
}

// Check if this pod is in error and missing proxy containers
Collaborator

Add detail here? Is "in error" the proper wording? Perhaps "in error state"?

Suggested change
-// Check if this pod is in error and missing proxy containers
+// Check if this pod is in error state and missing proxy containers

Collaborator Author

fixed.

// Check if this pod is in error and missing proxy containers
wlConfigErr := r.updater.CheckWorkloadContainers(wl, proxies)

// If this pod is in error, delete it. Simply logging an error is sufficient.
Collaborator

Same here: should we be saying "in error"?

Collaborator Author

fixed.

hessjcg requested a review from jackwotherspoon May 15, 2023 17:51
hessjcg merged commit 4a02aa7 into main May 15, 2023
hessjcg deleted the gh-337-pod-config-sync branch May 15, 2023 18:16
Successfully merging this pull request may close these issues.

Delete workloads pods that are missing an proxy container when they are in an error state
4 participants