
live-EKS: automatically remove completed Kubernetes Jobs created by a CronJob #3055

Closed
vijay-veeranki opened this issue Jul 16, 2021 · 3 comments

@vijay-veeranki
Contributor

When users create CronJobs, we clean up the completed Jobs, which also removes the Pods they create and helps the Kubernetes cluster use its CPU and memory resources efficiently.

This ticket is to work on the three issues below, related to clean-up in eks-live.

  1. We suggest users set ttlSecondsAfterFinished, but that field is only supported from Kubernetes v1.20 (see the example manifest after this list).
    https://user-guide.cloud-platform.service.justice.gov.uk/documentation/other-topics/Cronjobs.html#deploying-a-cronjob-to-your-namespace

Related issue:
aws/containers-roadmap#255

Investigate whether there are any workarounds, or whether we should wait until v1.20, and communicate this to users before the migration to EKS-Live.

  2. We have a delete-completed-jobs Concourse job which cleans up all completed Jobs that do not have ttlSecondsAfterFinished defined.

Set up this job in the eks-live cluster.

  3. Users can set the ".spec.successfulJobsHistoryLimit" and ".spec.failedJobsHistoryLimit" fields, which specify how many completed and failed Jobs should be kept, but these fields are not working as expected (they are also shown in the example manifest after this list).

Try to set up a cron job with these fields and figure out why they are not working on EKS when they work on live-1.
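
For reference, a minimal sketch of a CronJob manifest with both ttlSecondsAfterFinished and the history-limit fields set. The name, schedule and image are placeholders, and the TTL field is only honoured on clusters where the TTL-after-finished controller is available (v1.20+ per this ticket):

```yaml
apiVersion: batch/v1beta1        # CronJob API version on pre-1.21 clusters
kind: CronJob
metadata:
  name: example-cronjob          # placeholder name
spec:
  schedule: "0 1 * * *"
  successfulJobsHistoryLimit: 3  # default: keep the last 3 completed Jobs
  failedJobsHistoryLimit: 1      # default: keep the last failed Job
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600   # delete the Job 1 hour after it finishes (needs v1.20+)
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: main
              image: busybox
              command: ["sh", "-c", "echo hello"]
```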

@poornima-krishnasamy
Contributor

poornima-krishnasamy commented Aug 4, 2021

Because ttlSecondsAfterFinished will not be honoured, there is a real need to delete completed Jobs as a form of garbage collection.

If users use a CronJob, they can set failedJobsHistoryLimit and successfulJobsHistoryLimit, which we need to respect rather than delete those Jobs with the delete-completed-jobs pipeline. Currently all Jobs are deleted irrespective of whether they are owned by a CronJob or not.
Hence we need a more robust approach, something like https://github.com/lwolf/kube-cleanup-operator, which ignores CronJobs but cleans up Jobs after completion.
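
For context, a Job created by a CronJob carries an ownerReference pointing back at the CronJob; an ownership-aware cleanup (such as kube-cleanup-operator) can filter on this and leave such Jobs to the history limits. A sketch of that metadata, with placeholder name and UID:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-cronjob-27099840   # placeholder; CronJobs name their Jobs <cronjob>-<schedule timestamp>
  ownerReferences:
    - apiVersion: batch/v1beta1
      kind: CronJob                # a cleanup pipeline can skip Jobs owned by a CronJob
      name: example-cronjob
      uid: 00000000-0000-0000-0000-000000000000   # placeholder; set by the API server
      controller: true
      blockOwnerDeletion: true
```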

By default the fields have these values:
.spec.successfulJobsHistoryLimit: 3
.spec.failedJobsHistoryLimit: 1

The ".spec.successfulJobsHistoryLimit" and ".spec.failedJobsHistoryLimit" fields work in combination with restartPolicy and backoffLimit: whether a Job is marked as failed depends on those other parameters. When testing these fields in EKS they worked as expected, with the same behaviour as in the kops test cluster.
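
As a concrete illustration of that interaction, a minimal Job sketch (name and image are placeholders) whose Pods always exit non-zero: with restartPolicy: Never, the Job controller retries by creating new Pods until backoffLimit is exhausted, at which point the Job is marked Failed; when the same spec is used in a CronJob's jobTemplate, that failed Job then counts against failedJobsHistoryLimit.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-failing-job   # placeholder name
spec:
  backoffLimit: 2             # give up after 2 retries, then mark the Job as Failed
  template:
    spec:
      restartPolicy: Never    # failures are retried with new Pods rather than container restarts
      containers:
        - name: main
          image: busybox
          command: ["sh", "-c", "exit 1"]   # always fails, to exercise the failure path
```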

@poornima-krishnasamy
Contributor

After discussing with the team, we have decided:

  1. Not to run the delete-completed-jobs pipeline until the KubeTooManyPods alert is triggered in "live", or until "ttlSecondsAfterFinished" is enabled in EKS 1.20

  2. Let completed/failed Jobs be deleted according to the failedJobsHistoryLimit and successfulJobsHistoryLimit that users set up

  3. Update the migration guide to cover setting failedJobsHistoryLimit and successfulJobsHistoryLimit, noting that Jobs will be deleted based on the default limits of 3 and 1

