Start JobFramework controller and webhook on new CRDs availability #2574

Merged

@ChristianZaccaria ChristianZaccaria commented Jul 11, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a function that waits for API availability using the RESTMapper from the controller manager.

This PR checks whether the API of each JobFramework integration is available in the Kubernetes cluster before executing the action().

The function uses the RESTMapper from the controller-runtime manager to verify that the CRD exists, looking it up by its GroupVersionKind (GVK). When one of the CRDs becomes available, the respective controller and webhook start.

Users and developers no longer need to restart the Kueue pod to enable integrations for CRDs installed after Kueue starts.

The Kueue pod is not restarted.

Which issue(s) this PR fixes:

Fixes #2414

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Detect and enable support for job CRDs installed after Kueue starts.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 11, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 11, 2024
@k8s-ci-robot
Contributor

Hi @ChristianZaccaria. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 11, 2024

netlify bot commented Jul 11, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 86e480d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66c8e26c405f47000744dd52
😎 Deploy Preview https://deploy-preview-2574--kubernetes-sigs-kueue.netlify.app

@ChristianZaccaria
Contributor Author

cc: @trasc thank you for your previous input on this.

@mbobrovskyi
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 11, 2024
pkg/controller/jobframework/setup.go (3 outdated review threads, resolved)
Contributor

@KPostOffice KPostOffice left a comment

Nice, I tested this out on a cluster and it worked as expected for me

pkg/controller/jobframework/setup.go (3 outdated review threads, resolved)
@ChristianZaccaria
Contributor Author

Rebased

@Ygnas Ygnas left a comment

Works as expected.

@ChristianZaccaria
Contributor Author

ChristianZaccaria commented Jul 15, 2024

Hi @trasc, what are the next steps to get this PR merged? Thank you.

@ChristianZaccaria ChristianZaccaria force-pushed the restmapper-crd-start branch 3 times, most recently from 7f87160 to c98e070 Compare July 16, 2024 16:46
@k8s-ci-robot k8s-ci-robot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 17, 2024
@ChristianZaccaria
Contributor Author

Rebased

Contributor

@alculquicondor alculquicondor left a comment

/lgtm
/approve

@@ -123,29 +149,73 @@ func TestSetupControllers(t *testing.T) {
				return gvk.GroupVersion()
			})
			mapper := apimeta.NewDefaultRESTMapper(gvs)
			testMapper := &TestRESTMapper{
				DefaultRESTMapper: mapper,
				lock:              sync.RWMutex{},
Contributor

nit: this line is not needed
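
The nit holds because the zero value of `sync.RWMutex` is an unlocked mutex, so a struct field of that type needs no explicit initializer. A minimal sketch of the point follows; the `registry` type and its methods are hypothetical and are not the `TestRESTMapper` from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical type guarded by a read-write mutex.
type registry struct {
	lock  sync.RWMutex // the zero value is already a usable, unlocked mutex
	kinds map[string]bool
}

// add registers a kind under the write lock.
func (r *registry) add(kind string) {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.kinds[kind] = true
}

// has reports whether a kind is registered, under the read lock.
func (r *registry) has(kind string) bool {
	r.lock.RLock()
	defer r.lock.RUnlock()
	return r.kinds[kind]
}

func main() {
	// The lock field is deliberately left out of the composite literal.
	r := &registry{kinds: map[string]bool{}}

	var wg sync.WaitGroup
	for _, k := range []string{"RayJob", "PyTorchJob"} {
		wg.Add(1)
		go func(kind string) {
			defer wg.Done()
			r.add(kind)
		}(k)
	}
	wg.Wait()

	fmt.Println(r.has("RayJob"), r.has("MPIJob")) // prints "true false"
}
```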

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a08aaa26c555df954eb00e2fbe84214100690f0a

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, ChristianZaccaria, Ygnas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit 21bd446 into kubernetes-sigs:main Aug 23, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Aug 23, 2024
@mimowo
Contributor

mimowo commented Sep 4, 2024

@tenzen-y @alculquicondor as we are about to prepare the 0.8.1 release, what do you think about cherry-picking this PR?

I feel this issue sits on the boundary between a feature and a bug. It could be considered a feature, because we didn't plan for it up front. OTOH, I participated in some debugging sessions for users, and I think the users' perception was always that this is a bug.

Finally, the change does not introduce any API changes, so I would suggest considering it for cherry-picking.

@tenzen-y
Member

tenzen-y commented Sep 4, 2024

> OTOH, I participated in some debugging sessions for users, and I think the users' perception was always that this is a bug.

Thank you for sharing this experience. In that case, I agree with cherry-picking.

@tenzen-y tenzen-y mentioned this pull request Sep 4, 2024
27 tasks
@mimowo
Contributor

mimowo commented Sep 5, 2024

/cherry-pick release-0.8

@k8s-infra-cherrypick-robot
Contributor

@mimowo: #2574 failed to apply on top of branch "release-0.8":

Applying: Start Controller and Webhook on new CRDs availability
Applying: Log errors from JobFramework controller and webhook
Applying: Add test case for delayed JobFramework API becoming available
Applying: Wait for API integration to be enabled
Applying: Implement synchronization for safe concurrent access
Using index info to reconstruct a base tree...
M	pkg/controller/jobframework/integrationmanager.go
M	pkg/controller/jobframework/setup.go
M	pkg/controller/jobframework/setup_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/jobframework/setup_test.go
Auto-merging pkg/controller/jobframework/setup.go
Auto-merging pkg/controller/jobframework/integrationmanager.go
Applying: Implement exponential backoff for waitForAPI() and add function type to streamline REST mapping checks
Using index info to reconstruct a base tree...
M	pkg/controller/jobframework/setup.go
M	pkg/controller/jobframework/setup_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/jobframework/setup_test.go
Auto-merging pkg/controller/jobframework/setup.go
Applying: Add Job Framework API to mapper directly
Using index info to reconstruct a base tree...
M	pkg/controller/jobframework/integrationmanager.go
M	pkg/controller/jobframework/integrationmanager_test.go
M	pkg/controller/jobframework/setup.go
M	pkg/controller/jobframework/setup_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/jobframework/setup_test.go
Auto-merging pkg/controller/jobframework/setup.go
Auto-merging pkg/controller/jobframework/integrationmanager_test.go
CONFLICT (content): Merge conflict in pkg/controller/jobframework/integrationmanager_test.go
Auto-merging pkg/controller/jobframework/integrationmanager.go
CONFLICT (content): Merge conflict in pkg/controller/jobframework/integrationmanager.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0007 Add Job Framework API to mapper directly
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.8

@mimowo
Contributor

mimowo commented Sep 5, 2024

There is a conflict in integrationmanager. @ChristianZaccaria would you like to prepare the cherry-pick manually? You can use the https://github.com/kubernetes-sigs/kueue/blob/main/hack/cherry_pick_pull.sh script

@ChristianZaccaria
Contributor Author

Thank you, on it now.

@ChristianZaccaria
Contributor Author

For reference, here is the generated PR: #2991 Thanks!

k8s-ci-robot pushed a commit that referenced this pull request Sep 5, 2024
…RDs availability (#2991)

* Start Controller and Webhook on new CRDs availability

* Log errors from JobFramework controller and webhook

* Add test case for delayed JobFramework API becoming available

* Wait for API integration to be enabled

* Implement synchronization for safe concurrent access

* Implement exponential backoff for waitForAPI() and add function type to streamline REST mapping checks

* Add Job Framework API to mapper directly

* Cast mgr.GetRESTMapper() to *TestRESTMapper and ensure proper locking for tests
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
…ubernetes-sigs#2574)

* Start Controller and Webhook on new CRDs availability

* Log errors from JobFramework controller and webhook

* Add test case for delayed JobFramework API becoming available

* Wait for API integration to be enabled

* Implement synchronization for safe concurrent access

* Implement exponential backoff for waitForAPI() and add function type to streamline REST mapping checks

* Add Job Framework API to mapper directly

* Cast mgr.GetRESTMapper() to *TestRESTMapper and ensure proper locking for tests
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add a watch for Ray and Training Operator CRDs