Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TAS: PodSet label and Workload annotation for PodTemplates #3228

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Oct 14, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #2724

Special notes for your reviewer:

The annotation and label are used only for TAS now, but the names are generic since they can be used in other contexts too. For TAS they are used by TopologyUngater to lookup The admitted Workload from a Pod.

Does this PR introduce a user-facing change?

Introduce the new PodTemplate annotation kueue.x-k8s.io/workload, and label kueue.x-k8s.io/podset.
The annotation and label are alpha-level and gated by the new TopologyAwareScheduling feature gate.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 14, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Oct 14, 2024

/test all

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 14, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 14, 2024
Copy link

netlify bot commented Oct 14, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 9ce743b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/670e2dccd94b730008e5cc77
😎 Deploy Preview https://deploy-preview-3228--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 14, 2024
@mimowo mimowo marked this pull request as ready for review October 14, 2024 13:36
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 14, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Oct 14, 2024

cc @PBundyra

@tenzen-y
Copy link
Member

/cc

@@ -39,4 +39,14 @@ const (

// ProvReqAnnotationPrefix is the prefix for annotations that should be pass to ProvisioningRequest as Parameters.
ProvReqAnnotationPrefix = "provreq.kueue.x-k8s.io/"

// WorkloadAnnotation is an annotation set on the Job's PodTemplate to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So reading this I think its limiting to a K8s job. I think you mean a general kueue supported job but not sure what term we pick to reference this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is supported for any Job (can be MPIJob, JobSet, etc.). In Kueue we say Job in many contexts meaning any of the CRDs, see the user-facing docs: https://kueue.sigs.k8s.io/docs/overview/.

@@ -772,6 +774,8 @@ func TestReconciler(t *testing.T) {
*basePodWrapper.
Clone().
Label(constants.ManagedByKueueLabel, "true").
Label(controllerconsts.PodSetLabel, "dc85db45").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define these as constants and reuse that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best course of action, because:

  1. this string is used 105 times in the file already:
    *utiltesting.MakePodSet("dc85db45", 2).
    , and so it would mix a feature with extended cleanup
  2. if I introduce a const like podSetNameForTest at the top of the file it could suggest it should be used in all tests, but it is derived from the Pods actually, it was just copy-pasted many times.

So, I don't see this would be very clean, but let me know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind if I do a cleanup?

I agree with you that this should not be in scope for this feature then. I didn't see how many occurences this has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feel free to follow up! maybe instead of hard-coding a const we could compute the hash in tests, but not sure at the moment.

@kannon92
Copy link
Contributor

/hold

Since author has approval rights, we should unhold when it is ready to merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2024
Comment on lines 43 to 46
// WorkloadAnnotation is an annotation set on the Job's PodTemplate to
// indicate the name of the admitted Workload corresponding to the Job. The
// annotation is set when starting the Job, and removed on stopping the Job.
WorkloadAnnotation = "kueue.x-k8s.io/workload"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this annotation and the Workload name hash?

func GetWorkloadNameForOwnerWithGVK(ownerName string, ownerUID types.UID, ownerGVK schema.GroupVersionKind) string {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, is there any case where this annotation and pre-build workload annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this annotation and the Workload name hash?

This new annotation contains the full workload name, for example job-sample-job-12345, where 12345 is the hash. The annotation is set on PodTemplate so that newly created Pods carry the information to lookup the Workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, is there any case where this annotation and pre-build workload annotation?

First, note that the two annotations are set at different levels. Prebuilt-workload is the Job-level annotations, whilst the new annotation is set at the PodTemplate level, so that the new Pods "know" their workload.

This might happen on MultiKueue worker cluster. The MK would set the prebuit-workload annotation, and the admission process on the worker will set the new "workload" annotation. However, the new annotation is set on the PodTemplate rather than at the Job level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new annotation contains the full workload name, for example job-sample-job-12345, where 12345 is the hash. The annotation is set on PodTemplate so that newly created Pods carry the information to lookup the Workload.

In that case, we can remove the hash mechanism after we graduate this feature to GA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might happen on MultiKueue worker cluster. The MK would set the prebuit-workload annotation, and the admission process on the worker will set the new "workload" annotation. However, the new annotation is set on the PodTemplate rather than at the Job level.

I see, so the Pod has only new "workload" annotation. In that case, there seems no to be conflict.
Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we can remove the hash mechanism after we graduate this feature to GA?

I think the hash mechanism is here to stay, and was introduced prior to TAS. The hash in the workload name is to ensure Workload names don't exceed the max length (currently defined as 253 chars). This is needed because workload names add the prefix to Job names. For example for an MPIJob "test-job", we have "mpijob-test-job-".

Comment on lines 976 to 977
info.Labels[controllerconsts.PodSetLabel] = podSetFlavor.Name
info.Annotations[controllerconsts.WorkloadAnnotation] = w.Name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we guard these labels and annotations insertion by the feature gates?

Copy link
Contributor Author

@mimowo mimowo Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's summarize the options:

  1. no feature gate (as currently)
  2. use the TopologyAwareScheduling feature-gate
  3. use another dedicated feature gate like PodTemplatePodSetLabelAndWorkloadAnnotation

TBH, I would be leaning towards 1. as the code is simple and low risk.

My second choice would be (2.) as it is introduced for the needs for TopologyAwareScheduling. if we do (2.) we can always enable it by or with other feature gates. For example TopologyAwareScheduling || AnotherFeatureNeedingTheLabels. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with @mimowo offline.
We decided to add the TAS gate here so that we can break this labels and annotations specification during the beta / GA graduation. For example, we may want to move the "Workload" annotation to labels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. After a second thought I think it is safer. For the purpose of fast lookup of the workload from Pod it would be better to have a Workload label rather than annotation. For now it is an annotation as the workload names can be longer that 63 characters, so it requires some preparatory work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to gate it for 0.9, but this has the potential to be useful beyond TAS. An efficient mapping from Pods back to their Workloads is a very useful building block for general fault tolerance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you.
So, the during beta graduation, we will consider if we move all to labels or change key name. After beta graduation, these labels are added by default.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 15, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Oct 15, 2024

FYI following the discussion here. I have updated the PR to introduce the TopologyAwareScheduling gate which I will also reuse in the follow up PRs for the feature. PTAL.

@PBundyra
Copy link
Contributor

/lgtm
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 15, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 413450b80b29c498e21873c3a408249d0ce8c058

@k8s-ci-robot k8s-ci-robot merged commit 167380e into kubernetes-sigs:main Oct 15, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Oct 15, 2024
@mimowo mimowo changed the title PodSet label and Workload annotation for PodTemplates TAS: PodSet label and Workload annotation for PodTemplates Oct 15, 2024
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants