-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Finalizers to Jobs so ReplicatedJobStatus can be updated before jobs are removed. #360
Comments
Hey @kannon92 @danielvegamyhre I can take on this issue but I need some more details and steps to get started as I am a new contributor |
Hi @kannon92 @danielvegamyhre , do we still need this issue? I would like to get involved in this issue recently, and once this PR #490 is merged, I can start working on resolving this issue. |
/assign |
Is this issue still needed? I am a little confused about something. If we create a Finalizer for a sub-job, it means that we cannot delete the job arbitrarily through kubectl. |
I'm not sure if this is correct, but here's what I came up with: func (r *JobSetReconciler) deleteJobs(ctx context.Context, jobsForDeletion []*batchv1.Job) error {
...
workqueue.ParallelizeUntil(ctx, constants.MaxParallelism, len(jobsForDeletion), func(i int) {
targetJob := jobsForDeletion[i]
// Skip deleting jobs with deletion timestamp already set.
if targetJob.DeletionTimestamp != nil {
return
}
// Remove finalizer before deleting the job.
err := r.removeFinalizer(ctx, targetJob)
if err != nil {
...
}
// Delete job. This deletion event will trigger another reconciliation,
// where the jobs are recreated.
foregroundPolicy := metav1.DeletePropagationForeground
if err := r.Delete(ctx, targetJob, &client.DeleteOptions{PropagationPolicy: &foregroundPolicy}); client.IgnoreNotFound(err) != nil {
...
}
log.V(2).Info("successfully deleted job", "job", klog.KObj(targetJob), "restart attempt", targetJob.Labels[targetJob.Labels[constants.RestartsKey]])
})
return errors.Join(finalErrs...)
}
func (r *JobSetReconciler) removeFinalizer(ctx context.Context, job *batchv1.Job) error {
// Remove finalizer
if controllerutil.ContainsFinalizer(job, jobset.JobSetFinalizerName) {
controllerutil.RemoveFinalizer(job, jobset.JobSetFinalizerName)
}
// Update Job object
return r.Update(ctx, job)
} func (r *JobSetReconciler) createJobs(ctx context.Context, js *jobset.JobSet, jobs []*batchv1.Job) error {
...
workqueue.ParallelizeUntil(ctx, constants.MaxParallelism, len(jobs), func(i int) {
job := jobs[i]
// Add Finalizer
if !controllerutil.ContainsFinalizer(js, jobset.JobSetFinalizerName) {
controllerutil.AddFinalizer(js, jobset.JobSetFinalizerName)
}
// Set jobset controller as owner of the job for garbage collection and reconcilation.
...
// Create the job.
// TODO(#18): Deal with the case where the job exists but is not owned by the jobset.
if err := r.Create(ctx, job); err != nil {
...
}
...
})
allErrs := errors.Join(finalErrs...)
return allErrs
} I have another question, when I delete the Finalizer, I will update the job, but do I also need to update the jobset? Or when I update the job, the controller will automatically trigger. |
What would you like to be added:
Various policies in JobSet utilize ReplicatedJobStatus for tracking. If a Job is terminated on a successful run (ie via
ttlAfterFinished
is set and the update is quick), then we could lose the latest status updates.One should make sure to drop the finalizers after the ReplicatedJobStatus has been updated.
Why is this needed:
#244 (comment)
Adding a finalizer to the jobs to confirm that an update of the replicatedJobStatus happens before the job is deleted would be useful.
The text was updated successfully, but these errors were encountered: