Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu: copy nfdhook functionality to gpu-plugin #1492

Merged
merged 3 commits into from
Sep 12, 2023

Conversation

tkatila
Copy link
Contributor

@tkatila tkatila commented Aug 4, 2023

In order to support v0.14+ NFD and extended resources, write node labels into a file and store it for NFD's features.d directory.

The old functionality still exists in the initcontainer, but if it's used NFD has to be configured to allow it.

Fixes #1181

@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from bbc8d79 to 218a29d Compare August 4, 2023 10:58
@tkatila tkatila requested a review from ozhuraki as a code owner August 4, 2023 10:58
@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2023

Codecov Report

Merging #1492 (9006fff) into main (8d3ef43) will decrease coverage by 0.52%.
Report is 16 commits behind head on main.
The diff coverage is 39.62%.

❗ Current head 9006fff differs from pull request most recent head 55e57e3. Consider uploading reports for the commit 55e57e3 to get more accurate results

@@            Coverage Diff             @@
##             main    #1492      +/-   ##
==========================================
- Coverage   50.04%   49.52%   -0.52%     
==========================================
  Files          43       42       -1     
  Lines        4884     4917      +33     
==========================================
- Hits         2444     2435       -9     
- Misses       2301     2341      +40     
- Partials      139      141       +2     
Files Changed Coverage Δ
cmd/gpu_plugin/rm/gpu_plugin_resource_manager.go 69.88% <0.00%> (ø)
cmd/internal/labeler/labeler.go 63.63% <8.19%> (ø)
cmd/gpu_plugin/gpu_plugin.go 86.74% <50.00%> (-1.54%) ⬇️
pkg/controllers/gpu/controller.go 47.70% <94.11%> (+11.05%) ⬆️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

cmd/gpu_plugin/gpu_plugin.go Outdated Show resolved Hide resolved
cmd/gpu_nfdhook/main.go Outdated Show resolved Hide resolved
cmd/internal/labeler/labeler.go Outdated Show resolved Hide resolved
cmd/gpu_plugin/gpu_plugin.go Outdated Show resolved Hide resolved
cmd/internal/labeler/labeler.go Outdated Show resolved Hide resolved
cmd/internal/labeler/labeler.go Show resolved Hide resolved
cmd/internal/labeler/labeler.go Outdated Show resolved Hide resolved
cmd/internal/labeler/labeler.go Outdated Show resolved Hide resolved
cmd/internal/labeler/labeler.go Show resolved Hide resolved
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch 2 times, most recently from c8a62e9 to 107d8a0 Compare August 21, 2023 06:24
@eero-t
Copy link
Contributor

eero-t commented Aug 21, 2023

My main issues have been fixed (last two items are more of nit-picking), so I think it would be time for other(s) review this too. @uniemimu ?

@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 107d8a0 to 302a14b Compare August 21, 2023 12:59
cmd/gpu_plugin/gpu_plugin.go Outdated Show resolved Hide resolved
uniemimu
uniemimu previously approved these changes Aug 25, 2023
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 302a14b to fc959f6 Compare September 5, 2023 06:58
@tkatila tkatila requested a review from kad as a code owner September 5, 2023 06:58
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from fc959f6 to 1511b06 Compare September 5, 2023 06:59
@tkatila
Copy link
Contributor Author

tkatila commented Sep 5, 2023

Changes in the PR:

  • debugfs access removed, replaced with NFD's rule
    • Some deprecated labels removed in the process
  • Initcontainer not used anymore: support left in-place but deployments do not use it.
  • Ext resource labels are now added when resource management is enabled
    • No additional cli argument
  • NFD feature file path is static

@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 1511b06 to 5532976 Compare September 5, 2023 07:10
@mythi
Copy link
Contributor

mythi commented Sep 5, 2023

Changes in the PR:

LGTM.

* debugfs access removed, replaced with NFD's rule
  
  * Some deprecated labels removed in the process

Update the README table to match this change?

@tkatila
Copy link
Contributor Author

tkatila commented Sep 5, 2023

Update the README table to match this change?

Yes, that's needed. It might make sense to add the labeling info to GPU plugin also. Or create a new .md to which both initcontainer and plugin points to.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed other than the operator code (as I'm not familiar enough with that).

@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from d43b1f4 to b8dc72a Compare September 6, 2023 10:52
cmd/gpu_plugin/labels.md Outdated Show resolved Hide resolved
@tkatila
Copy link
Contributor Author

tkatila commented Sep 7, 2023

PTAL

I'll squash the commits once approved.

@mythi
Copy link
Contributor

mythi commented Sep 8, 2023

LGTM

@eero-t
Copy link
Contributor

eero-t commented Sep 8, 2023

FYI: we've had some internal discussion on how to best label things, so label rules & names have been changing a bit, but we should have now consensus on final ones (i.e. there will be one more small update to them with squash).

cmd/gpu_plugin/labels.md Outdated Show resolved Hide resolved
cmd/gpu_plugin/labels.md Outdated Show resolved Hide resolved
uniemimu
uniemimu previously approved these changes Sep 11, 2023
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 55e57e3 to 4235e69 Compare September 11, 2023 10:43
@tkatila
Copy link
Contributor Author

tkatila commented Sep 11, 2023

@hj-johannes-lee can you please review?

mythi
mythi previously approved these changes Sep 12, 2023
@mythi
Copy link
Contributor

mythi commented Sep 12, 2023

@uniemimu this should be OK to merge I think

@tkatila tkatila dismissed stale reviews from mythi and uniemimu via 01d5042 September 12, 2023 10:43
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 4235e69 to 01d5042 Compare September 12, 2023 10:43
NFD v0.14+ doesn't support binary NFD hooks by default, so there is
a need to move the label creation away from the GPU nfdhook.

Move extended resource label creation to plugin, and drop labels that were
already marked deprecated (platform_gen, media_version etc.).

Drop init-container from deployment files and operator. It is still possible
to use an initcontainer, but the default deployments do not support it.

Signed-off-by: Tuomas Katila <[email protected]>
@tkatila tkatila force-pushed the gpu/labeling-integration-to-gpuplugin branch from 01d5042 to bd2a35d Compare September 12, 2023 11:22
@uniemimu uniemimu merged commit 827b9a0 into intel:main Sep 12, 2023
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move from NFD hook usage to NFD feature-file usage
5 participants