Update CDI for device plugins KEP for GA graduation #4446
/cc @klueska
Could you fill in this section as part of this update?
Co-authored-by: Kevin Hannon <[email protected]>
9bbd2ef to 6697eb3
Thanks for pointing this out. I added the links, but the DevicePlugin tests are failing. I'm considering running the CDI-related tests separately and will update the links when that's done.
Do you think we should have coverage for both containerd and CRI-O? I've seen some container-runtime-related features have this as a requirement for GA. I think CDI is supported in both CRI-O and containerd, but I'm not sure about test coverage.
@kannon92 Yes, I think adding a CRI-O job would make sense. It would be even easier than the containerd job, as CDI is enabled in CRI-O out of the box.
/assign @johnbelamaric or a shadow!
@klueska @SergeyKanzhelev @mrunalp Can you lgtm and approve, please?
/lgtm
Links to tests:
- TBD: Will fill-in by code freeze
Links to test grid:
- https://testgrid.k8s.io/sig-node-containerd#e2e-cos-device-plugin-gpu
These tests are all red: https://testgrid.k8s.io/sig-node-containerd#e2e-cos-device-plugin-gpu
I'm going to fix them and add CRI-O tests.
@aojea All DevicePlugin test cases except one (unrelated to CDI) pass locally:
$ make test-e2e-node FOCUS='Device Plugin' SKIP='\[Flaky\]' PARALLELISM=1
...
Summarizing 1 Failure:
[FAIL] [sig-node] Device Plugin [Feature:DevicePluginProbe] [NodeFeature:DevicePluginProbe] [Serial] DevicePlugin [Serial] [Disruptive] [BeforeEach] Keeps device plugin assignments across node reboots (no pod restart, no device plugin re-registration) [sig-node, Feature:DevicePluginProbe, NodeFeature:DevicePluginProbe, Serial, Disruptive]
k8s.io/kubernetes/test/e2e/framework/pod/pod_client.go:106
Ran 10 of 559 Specs in 859.812 seconds
FAIL! -- 9 Passed | 1 Failed | 0 Pending | 549 Skipped
--- FAIL: TestE2eNode (859.85s)
FAIL
Ginkgo ran 1 suite in 14m20.00390501s
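As a quick sanity check on the summary above, here is a minimal Python sketch that parses the Ginkgo result line and confirms it agrees with "Ran 10 of 559 Specs". The helper name `parse_ginkgo_summary` is hypothetical and not part of Ginkgo or the Kubernetes tooling; the input string is copied from the output above.

```python
import re

# Summary line copied verbatim from the test output above.
SUMMARY = "FAIL! -- 9 Passed | 1 Failed | 0 Pending | 549 Skipped"


def parse_ginkgo_summary(line: str) -> dict:
    """Hypothetical helper: map each result category to its count."""
    pairs = re.findall(r"(\d+) (Passed|Failed|Pending|Skipped)", line)
    return {name.lower(): int(count) for count, name in pairs}


counts = parse_ginkgo_summary(SUMMARY)

# Specs that actually ran are the ones that passed or failed.
ran = counts["passed"] + counts["failed"]
# Adding pending and skipped specs gives the total number of specs.
total = ran + counts["pending"] + counts["skipped"]

print(f"ran {ran} of {total} specs")
```

The counts are consistent with the run reported above: one failure out of the ten specs matched by the `Device Plugin` focus.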
Test infra seems to be broken: https://testgrid.k8s.io/sig-node-containerd#e2e-cos-device-plugin-gpu fails to start the device plugin tests with this error:
Last output from querying API server follows:
-----------------------------------------------------
* Trying 34.168.10.172:443...
* connect to 34.168.10.172 port 443 failed: Connection refused
* Failed to connect to 34.168.10.172 port 443 after 39 ms: Couldn't connect to server
* Closing connection 0
curl: (7) Failed to connect to 34.168.10.172 port 443 after 39 ms: Couldn't connect to server
At first glance, it looks like it uses the external IP instead of the internal one.
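One way to back up that observation is to check whether the address in the curl output falls in a private range. This is only a sketch using Python's standard `ipaddress` module; `10.0.0.1` is an illustrative private address chosen for comparison, not one taken from the job logs.

```python
import ipaddress

# Address taken from the curl output above.
API_SERVER_IP = "34.168.10.172"
# Illustrative private address for comparison (an assumption, not from the logs).
EXAMPLE_INTERNAL_IP = "10.0.0.1"

addr = ipaddress.ip_address(API_SERVER_IP)
internal = ipaddress.ip_address(EXAMPLE_INTERNAL_IP)

# is_private is True for private ranges such as 10.0.0.0/8.
print(API_SERVER_IP, "private:", addr.is_private)
print(EXAMPLE_INTERNAL_IP, "private:", internal.is_private)
```

The address from the failing job is not in a private range, which fits the guess that the job is querying the API server through its external IP.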
Anyway, my point is that this should be fixed before this feature goes GA, but it shouldn't block merging this PR.
Considering that Device Plugins are already GA and most test cases are not related to this KEP, would it make sense to create new jobs to test only this feature?
@aojea @kannon92 @SergeyKanzhelev WDYT?
It's your call really; I just pointed out that the links meant to demonstrate the test health were red. Independently of the KEP, I think you should have coverage on device plugins too.
Yes, they definitely need to be fixed. I've created an issue for that: kubernetes/test-infra#31849
And I'm going to separate CDI-related tests at least until this feature goes GA. Will update both links when it's done.
Links to k8s-triage for tests:
- TBD: Will fill-in by code freeze
- https://storage.googleapis.com/k8s-triage/index.html?test=DevicePlugin
and these are totally flaky: https://storage.googleapis.com/k8s-triage/index.html?test=DevicePlugin
@aojea ^^^^
/approve for PRR. Obviously the tests Antonio pointed out need to be working before release.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bart0sh, johnbelamaric, mrunalp
One-line PR description: Update CDI for device plugins KEP for GA graduation
Issue link: Add CDI devices to device plugin API #4009