Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: remove one-cr-per-kind limitation #1579

Merged
merged 3 commits into from
Nov 13, 2023

Conversation

tkatila
Copy link
Contributor

@tkatila tkatila commented Oct 26, 2023

Differentiate objects by adding cr names as suffixes
Drop kind book keeping and related functions from controllers
Make serviceaccounts and clusterrolebindings shared between GPU CRs.
Deny GPU CRs with mixed resource manager selections.

Fixes #1529

@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2023

Codecov Report

Merging #1579 (94b6e1b) into main (dab899a) will increase coverage by 2.12%.
Report is 2 commits behind head on main.
The diff coverage is 8.33%.

❗ Current head 94b6e1b differs from pull request most recent head 4e06690. Consider uploading reports for the commit 4e06690 to get more accurate results

@@            Coverage Diff             @@
##             main    #1579      +/-   ##
==========================================
+ Coverage   49.56%   51.69%   +2.12%     
==========================================
  Files          42       42              
  Lines        4965     4869      -96     
==========================================
+ Hits         2461     2517      +56     
+ Misses       2361     2206     -155     
- Partials      143      146       +3     
Files Coverage Δ
pkg/controllers/dlb/controller.go 18.48% <100.00%> (+1.41%) ⬆️
pkg/controllers/dsa/controller.go 7.47% <100.00%> (+0.72%) ⬆️
pkg/controllers/fpga/controller.go 19.11% <100.00%> (+2.45%) ⬆️
pkg/controllers/iaa/controller.go 7.47% <100.00%> (+0.72%) ⬆️
pkg/controllers/qat/controller.go 39.45% <100.00%> (+29.93%) ⬆️
pkg/controllers/sgx/controller.go 11.96% <100.00%> (+1.22%) ⬆️
pkg/controllers/gpu/controller.go 48.88% <3.12%> (+0.87%) ⬆️
pkg/controllers/reconciler.go 5.26% <0.00%> (+0.69%) ⬆️

... and 1 file with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with operator code, but this looks pretty straighforward simplification.

Only thing I did not like is the inconsistent use of prefixedName(). Why not use it everywhere (also tests)?

cmd/operator/README.md Outdated Show resolved Hide resolved
@mythi
Copy link
Contributor

mythi commented Oct 27, 2023

Only thing I did not like is the inconsistent use of prefixedName(). Why not use it everywhere (also tests)?

There is a bigger problem to think about. I'm not sure we need to have new service accounts created for every new GpuDevicePlugin. Or do we? Following this, my question is can this be simplified so that the operator setup creates one GPU service account instead of giving it's the powers for the GPU controller to do it?

@tkatila
Copy link
Contributor Author

tkatila commented Oct 27, 2023

There is a bigger problem to think about. I'm not sure we need to have new service accounts created for every new GpuDevicePlugin. Or do we? Following this, my question is can this be simplified so that the operator setup creates one GPU service account instead of giving it's the powers for the GPU controller to do it?

Yeah, this did cross my mind when making the changes. As there can be different CRs (with and without resource management), the reconciler would need to keep track of the need for these additional objects. Create them when first one comes up and remove them when the last one goes away.

I'll think about how to do that.

@tkatila
Copy link
Contributor Author

tkatila commented Nov 1, 2023

Changed service account and cluster role bindings to be shared, and added garbage collection to remove them.
Added a check to GPU webhook to prevent cases where one CR is with resourseManager and another is without.

@tkatila tkatila force-pushed the operator/multiple-crs branch 2 times, most recently from c40f477 to 1c52c3e Compare November 1, 2023 14:06
Differentiate objects by adding cr names as suffixes
Drop kind book keeping and related functions from controllers

Signed-off-by: Tuomas Katila <[email protected]>
@tkatila tkatila marked this pull request as ready for review November 9, 2023 12:10
ozhuraki
ozhuraki previously approved these changes Nov 9, 2023
pkg/controllers/reconciler.go Outdated Show resolved Hide resolved
pkg/controllers/reconciler.go Outdated Show resolved Hide resolved
Additional objects are shared between device plugin CRs. Once the last
CR is removed, the additional objects are also removed.

Signed-off-by: Tuomas Katila <[email protected]>
@mythi mythi merged commit f8eeeef into intel:main Nov 13, 2023
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Qat: Ability to do node configurations in cluster with large number of QAT nodes more easily
5 participants