Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy some labels from Pods/Jobs into the Workload object #1959

Merged
merged 21 commits into from
Apr 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions apis/config/v1beta1/configuration_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -301,6 +301,17 @@ type Integrations struct {
Frameworks []string `json:"frameworks,omitempty"`
// PodOptions defines kueue controller behaviour for pod objects
PodOptions *PodIntegrationOptions `json:"podOptions,omitempty"`

// labelKeysToCopy is a list of label keys that should be copied from the job into the
// workload object. It is not required for the job to have all the labels from this
// list. If a job does not have some label with the given key from this list, the
// constructed workload object will be created without this label. In the case
// of creating a workload from a composable job (pod group), if multiple objects
// have labels with some key from the list, the values of these labels must
// match or otherwise the workload creation would fail. The labels are copied only
// during the workload creation and are not updated even if the labels of the
// underlying job are changed.
LabelKeysToCopy []string `json:"labelKeysToCopy,omitempty"`
}

type PodIntegrationOptions struct {
Expand Down
5 changes: 5 additions & 0 deletions apis/config/v1beta1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion cmd/importer/pod/import.go
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ func Import(ctx context.Context, c client.Client, cache *util.ImportCache, jobs

kp := pod.FromObject(p)
// Note: the recorder is not used for single pods, we can just pass nil for now.
wl, err := kp.ConstructComposableWorkload(ctx, c, nil)
wl, err := kp.ConstructComposableWorkload(ctx, c, nil, nil)
if err != nil {
return false, fmt.Errorf("construct workload: %w", err)
}
Expand Down
1 change: 1 addition & 0 deletions cmd/kueue/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -269,6 +269,7 @@ func setupControllers(mgr ctrl.Manager, cCache *cache.Cache, queues *queue.Manag
jobframework.WithIntegrationOptions(corev1.SchemeGroupVersion.WithKind("Pod").String(), cfg.Integrations.PodOptions),
jobframework.WithEnabledFrameworks(cfg.Integrations),
jobframework.WithManagerName(constants.KueueName),
jobframework.WithLabelKeysToCopy(cfg.Integrations.LabelKeysToCopy),
}
if err := jobframework.SetupControllers(mgr, setupLog, opts...); err != nil {
setupLog.Error(err, "Unable to create controller or webhook", "kubernetesVersion", serverVersionFetcher.GetServerVersion())
Expand Down
121 changes: 121 additions & 0 deletions keps/1834-copy-labels-into-workload/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# KEP-1834: Copy some labels from Pods/Jobs into the Workload object

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit Tests](#unit-tests)
- [Integration tests](#integration-tests)
- [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

When creating a kueue workload, copy the job (or pod) labels into the workload object (see https://github.com/kubernetes-sigs/kueue/issues/1834). Allow to configure which labels should be copied.

## Motivation

Currently the workloads do not "inherit" any of the labels of the jobs based on which they are created. Since workloads in Kueue are internal representations of the Kubernetes jobs, allowing to easily copy the labels of the job object into the workload object is natural and useful. For instance it would facilitate selection/filtering of the workloads by the administrators.

### Goals
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved

* Establish a mechanism to configure which set of labels should be copied.
* Copy the selected labels when creating the workload based on a job.

### Non-Goals
* This proposal does not contain any form of validation or analysis of the labels as they are going to be copied from one object to another.
* This proposal only concerns copying of labels into newly created workloads. Updating the labels of an existing workload (for example if the label of the underlying job is changed) is out of scope of this proposal.

## Proposal

We want to do the following API change. The proposal is to add a field named `labelKeysToCopy` into the configuration API (under `Integrations`). This field will hold a list of keys of labels that should be copied. This configuration will be global in the sense that it will apply to all the job frameworks. Since the list `labelKeysToCopy` will be empty by default, this change will not affect the existing functionality.

We will not require for all the labels with keys from `labelKeysToCopy` to be present at the job object. When some of the labels will not be present at the job object, this label will not be assigned to the created workload object.

A case that requires more attention is creating workloads from pod groups because in that case a single workload is based on multiple pods (each of which might have labels). We propose:
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved
* When a label from the `labelKeysToCopy` list will be present at some of the pods from the group and the value of this label on all these pods will be identical then this label will be copied into the workload. Note that we do not require that all the pods have the label but all that do have must have the same value.
* When multiple pods from the group will have the label with different values, we will raise an exception. The exception will be raised during the workload creation.

alculquicondor marked this conversation as resolved.
Show resolved Hide resolved

### Risks and Mitigations

None.

## Design Details


The proposal contains a single modification to the API. We propose to add a new field `LabelKeysToCopy` to `Integrations` struct in the `Configuration` API.
``` go
type Integrations struct {
...
// labelKeysToCopy is a list of label keys that should be copied from the job into the
// workload object. It is not required for the job to have all the labels from this
// list. If a job does not have some label with the given key from this list, the
// constructed workload object will be created without this label. In the case
// of creating a workload from a composable job (pod group), if multiple objects
// have labels with some key from the list, the values of these labels must
// match or otherwise the workload creation would fail. The labels are copied only
// during the workload creation and are not updated even if the labels of the
// underlying job are changed.
LabelKeysToCopy []string `json:"labelKeysToCopy,omitempty"`
...
}
```
### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.

All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.

[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->

#### Unit Tests

New unit tests should be added testing the functionality for jobs and pods.

#### Integration tests

The idea is to enhance the existing integrations tests to check if workload objects are created with correct labels.

### Graduation Criteria

## Implementation History

* 2024-04-08 First draft

## Drawbacks

With this KEP some workload objects that previously succeeded could fail to be created if such workload is based on a pod group with mismatched labels (i.e., pods in the same pod group having different label values for some label key that should be copied). This will only happen if a user explicitly configures this label key to be copied.

This proposal introduces the label copying only during the workload creation. If a user modifies labels on a running job, the modification will not be reflected on the workload object, which might be confusing.

## Alternatives

An alternative way of handling mismatched labels would be to copy an arbitrary value among the values on the individual pods. The downside of this approach is that such a case is most likely due to an error and should not be silently accepted.

Another alternative could be to copy all the labels by default and have a list of excluded keys.
25 changes: 25 additions & 0 deletions keps/1834-copy-labels-into-workload/kep.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
title: Copy labels into workload
kep-number: 1834
authors:
- "@pajakd"
status: implementable
creation-date: 2024-04-08
reviewers:
- "@alculquicondor"
- "@mimowo"
approvers:
- "@alculquicondor"
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved
- "@tenzen-y"

# The target maturity stage in the current dev cycle for this KEP.
stage: beta

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v0.7"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v0.7"
beta: "v0.7"
2 changes: 1 addition & 1 deletion pkg/controller/jobframework/interface.go
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ type ComposableJob interface {
// counts extracting from workload to all members of the ComposableJob.
Run(ctx context.Context, c client.Client, podSetsInfo []podset.PodSetInfo, r record.EventRecorder, msg string) error
// ConstructComposableWorkload returns a new Workload that's assembled out of all members of the ComposableJob.
ConstructComposableWorkload(ctx context.Context, c client.Client, r record.EventRecorder) (*kueue.Workload, error)
ConstructComposableWorkload(ctx context.Context, c client.Client, r record.EventRecorder, labelKeysToCopy []string) (*kueue.Workload, error)
// ListChildWorkloads returns all workloads related to the composable job
ListChildWorkloads(ctx context.Context, c client.Client, parent types.NamespacedName) (*kueue.WorkloadList, error)
// FindMatchingWorkloads returns all related workloads, workload that matches the ComposableJob and duplicates that has to be deleted.
Expand Down
19 changes: 15 additions & 4 deletions pkg/controller/jobframework/reconciler.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ type JobReconciler struct {
record record.EventRecorder
manageJobsWithoutQueueName bool
waitForPodsReady bool
labelKeysToCopy []string
}

type Options struct {
Expand All @@ -76,6 +77,7 @@ type Options struct {
IntegrationOptions map[string]any
EnabledFrameworks sets.Set[string]
ManagerName string
LabelKeysToCopy []string
}

// Option configures the reconciler.
Expand Down Expand Up @@ -140,6 +142,13 @@ func WithManagerName(n string) Option {
}
}

// WithLabelKeysToCopy
func WithLabelKeysToCopy(n []string) Option {
return func(o *Options) {
o.LabelKeysToCopy = n
}
}

var defaultOptions = Options{}

func NewReconciler(
Expand All @@ -153,6 +162,7 @@ func NewReconciler(
record: record,
manageJobsWithoutQueueName: options.ManageJobsWithoutQueueName,
waitForPodsReady: options.WaitForPodsReady,
labelKeysToCopy: options.LabelKeysToCopy,
}
}

Expand Down Expand Up @@ -789,11 +799,10 @@ func (r *JobReconciler) constructWorkload(ctx context.Context, job GenericJob, o
log := ctrl.LoggerFrom(ctx)

if cj, implements := job.(ComposableJob); implements {
wl, err := cj.ConstructComposableWorkload(ctx, r.client, r.record)
wl, err := cj.ConstructComposableWorkload(ctx, r.client, r.record, r.labelKeysToCopy)
if err != nil {
return nil, err
}

return wl, nil
}

Expand All @@ -803,7 +812,7 @@ func (r *JobReconciler) constructWorkload(ctx context.Context, job GenericJob, o
ObjectMeta: metav1.ObjectMeta{
Name: GetWorkloadNameForOwnerWithGVK(object.GetName(), object.GetUID(), job.GVK()),
Namespace: object.GetNamespace(),
Labels: map[string]string{},
Labels: maps.FilterKeys(job.Object().GetLabels(), r.labelKeysToCopy),
Finalizers: []string{kueue.ResourceInUseFinalizerName},
Annotations: admissioncheck.FilterProvReqAnnotations(job.Object().GetAnnotations()),
},
Expand All @@ -812,7 +821,9 @@ func (r *JobReconciler) constructWorkload(ctx context.Context, job GenericJob, o
QueueName: QueueName(job),
},
}

if wl.Labels == nil {
wl.Labels = make(map[string]string)
}
jobUid := string(job.Object().GetUID())
if errs := validation.IsValidLabelValue(jobUid); len(errs) == 0 {
wl.Labels[controllerconsts.JobUIDLabel] = jobUid
Expand Down
3 changes: 3 additions & 0 deletions pkg/controller/jobframework/reconciler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ func TestProcessOptions(t *testing.T) {
WithIntegrationOptions(corev1.SchemeGroupVersion.WithKind("Pod").String(), &configapi.PodIntegrationOptions{
PodSelector: &metav1.LabelSelector{},
}),
WithLabelKeysToCopy([]string{"toCopyKey"}),
},
wantOpts: Options{
ManageJobsWithoutQueueName: true,
Expand All @@ -124,6 +125,7 @@ func TestProcessOptions(t *testing.T) {
PodSelector: &metav1.LabelSelector{},
},
},
LabelKeysToCopy: []string{"toCopyKey"},
},
},
"a single option is passed": {
Expand All @@ -143,6 +145,7 @@ func TestProcessOptions(t *testing.T) {
WaitForPodsReady: false,
KubeServerVersion: nil,
IntegrationOptions: nil,
LabelKeysToCopy: nil,
},
},
}
Expand Down
35 changes: 35 additions & 0 deletions pkg/controller/jobs/job/job_controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -457,6 +457,41 @@ func TestReconciler(t *testing.T) {
},
},
},
"when workload is created, it has correct labels set": {
job: *baseJobWrapper.Clone().
Label("toCopyKey", "toCopyValue").
Label("dontCopyKey", "dontCopyValue").
UID("test-uid").
Obj(),
wantJob: *baseJobWrapper.Clone().
Label("toCopyKey", "toCopyValue").
Label("dontCopyKey", "dontCopyValue").
UID("test-uid").
Suspend(true).
Obj(),
reconcilerOptions: []jobframework.Option{
jobframework.WithLabelKeysToCopy([]string{"toCopyKey", "redundantToCopyKey"}),
},
wantWorkloads: []kueue.Workload{
*utiltesting.MakeWorkload("job", "ns").
Finalizers(kueue.ResourceInUseFinalizerName).
PodSets(*utiltesting.MakePodSet(kueue.DefaultPodSetName, 10).Request(corev1.ResourceCPU, "1").Obj()).
Queue("foo").
Priority(0).
Labels(map[string]string{
controllerconsts.JobUIDLabel: "test-uid",
"toCopyKey": "toCopyValue"}).
Obj(),
},
wantEvents: []utiltesting.EventRecord{
{
Key: types.NamespacedName{Name: "job", Namespace: "ns"},
EventType: "Normal",
Reason: "CreatedWorkload",
Message: "Created Workload: ns/" + GetWorkloadNameForJob(baseJobWrapper.Name, types.UID("test-uid")),
},
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add another case to verify if the labels are not populated to Workloads after the Workload is already created?

I'd like to prove this comment: "The labels are copied only during the workload creation and are not updated even if the labels of the underlying job are changed."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check in the job_controller integration test. Please take a look.

"when workload is admitted the PodSetUpdates are propagated to job": {
job: *baseJobWrapper.Clone().
Obj(),
Expand Down
Loading