-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multus: Add sample job manifest for multus config validation #12495
multus: Add sample job manifest for multus config validation #12495
Conversation
Hi, can someone please help me understand how to fix the |
what is the specific error? |
Not in CI, but in local testing of the job.
Note: I am not providing any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the submission @Nikhil-Ladha !
deploy/charts/library/templates/_cluster-clusterrolebinding.tpl
Outdated
Show resolved
Hide resolved
- apiGroups: [""] | ||
resources: ["configmaps", "configmaps/finalizers", "pods"] | ||
verbs: ["get", "create", "update", "delete"] | ||
- apiGroups: ["k8s.cni.cncf.io"] | ||
resources: ["network-attachment-definitions"] | ||
verbs: ["get"] | ||
- apiGroups: ["batch"] | ||
resources: ["jobs"] | ||
verbs: ["get", "list", "delete"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These definitely aren't all the privileges the serviceaccount will need. It'll also need the ability to get,create,delete daemonsets and get,list,create,delete,deletecollection deployments. Probably list pods also.
@Nikhil-Ladha am I wrong, or have you not been running the job in a development environment to see if it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been running the job in a dev env to confirm it's working, and that's when I am facing the above mentioned SCC issue.
Regarding the above privileges, they maybe required, but the thing is that the job is stuck at the SCC issue, so I never actually found out if these were also needed.
@Nikhil-Ladha are you still facing the issue? I quickly tested your yaml in Minikube ka multus-validation.yaml
role.rbac.authorization.k8s.io/rook-ceph-multus-validation created
rolebinding.rbac.authorization.k8s.io/rook-ceph-multus-validation created
serviceaccount/rook-ceph-multus-validation created
job.batch/rook-ceph-multus-validation created
kc get pods
NAME READY STATUS RESTARTS AGE
multus-validation-test-web-server 1/1 Running 0 10m
rook-ceph-multus-validation-7jngc 0/1 Error 0 10m
rook-ceph-multus-validation-95xpn 0/1 Error 0 10m
rook-ceph-multus-validation-bkzhz 0/1 Error 0 7m43s
rook-ceph-multus-validation-btcjn 0/1 Error 0 11m
rook-ceph-multus-validation-gg556 0/1 Error 0 10m
rook-ceph-multus-validation-t79sl 0/1 Error 0 10m
rook-ceph-multus-validation-vzc97 0/1 Error 0 10m
rook-ceph-operator-58665dfccf-ft5ng 1/1 Running 4 (2m4s ago) 11m
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (pr/12495) $ kc get pods rook-ceph-multus-validation-95xpn -f
error: flag needs an argument: 'f' in -f
See 'kubectl get --help' for usage.
~/go/src/github.com/rook/deploy/examples
srai@192 ~ (pr/12495) $ kc logs rook-ceph-multus-validation-95xpn -f
2023-07-19 14:55:59.452611 I | multus-validation: starting multus validation test with the following config:
namespace: rook-ceph
publicNetwork: <nad-name>
clusterNetwork: <nad-name>
daemonsPerNode: 16
resourceTimeout: 3m0s
nginxImage: nginxinc/nginx-unprivileged:stable-alpine
RESULT: multus validation test failed: failed to create validation test config object: failed to create validation test config object [{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:multus-validation-test-owner GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Immutable:<nil> Data:map[] BinaryData:map[]}]: configmaps "multus-validation-test-owner" already exists
---
---
---
To clean up resources when you are done debugging: rook multus validation cleanup --namespace rook-ceph I don't see the rbac issue |
The issue is in an openshift cluster, I haven't tested in a Minikube cluster. |
@Nikhil-Ladha for openshift-based clusters we need to create |
Finally, able to run the job successfully on a Minkube cluster :)
If we want to configure the job for openshift clusters as well, then we would need to fix the SCC issue. |
At last the SCC issue is resolved, now I have added the extra SCC required for the job deployment on an openshift cluster. |
@BlaineEXE can you please take a look again? TIA :) |
# --- | ||
# scc for the Rook and Ceph daemons | ||
# kind: SecurityContextConstraints | ||
# apiVersion: security.openshift.io/v1 | ||
# metadata: | ||
# name: rook-ceph-multus-validation | ||
# allowPrivilegedContainer: true | ||
# allowHostDirVolumePlugin: true | ||
# allowHostPID: false | ||
# # set to true if running rook with host networking enabled | ||
# allowHostNetwork: true | ||
# # set to true if running rook with the provider as host | ||
# allowHostPorts: true | ||
# priority: | ||
# allowedCapabilities: ["MKNOD"] | ||
# allowHostIPC: true | ||
# readOnlyRootFilesystem: false | ||
# # drop all default privileges | ||
# requiredDropCapabilities: ["All"] | ||
# defaultAddCapabilities: [] | ||
# runAsUser: | ||
# type: RunAsAny | ||
# seLinuxContext: | ||
# type: RunAsAny | ||
# fsGroup: | ||
# type: RunAsAny | ||
# supplementalGroups: | ||
# type: RunAsAny | ||
# seccompProfiles: | ||
# - "*" | ||
# volumes: | ||
# - configMap | ||
# - emptyDir | ||
# - projected | ||
# users: | ||
# - system:serviceaccount:rook-ceph:rook-ceph-multus-validation # serviceaccount:namespace:cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's include this at the end.
value: DEBUG | ||
restartPolicy: Never | ||
--- | ||
# apiVersion: rbac.authorization.k8s.io/v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add instruction for PSP
- "--public-network" | ||
- "<nad-name>" | ||
- "--cluster-network" | ||
- "<nad-name>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's comment these out by default so that anyone running this without reading it won't have the tool seem to run with environment/test failures, and instead it will be failures from not providing input.
- "--public-network" | |
- "<nad-name>" | |
- "--cluster-network" | |
- "<nad-name>" | |
# - --public-network=<NAD-NAME> # uncomment and replace NAD name if using public network | |
# - --cluster-network=<NAD-NAME> # uncomment and replace NAD name if using cluster network |
# TODO: Insert the NAD name for public network and cluster network | ||
# If you want to use any other flags along with the basic command, | ||
# add the `--help` flag in the end to see the list of flags available, and use accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO is more of a developer comment.
# TODO: Insert the NAD name for public network and cluster network | |
# If you want to use any other flags along with the basic command, | |
# add the `--help` flag in the end to see the list of flags available, and use accordingly. | |
# Insert the NAD name for public network and cluster network | |
# If you want to use any other flags along with the basic command, | |
# add the `--help` flag in the end to see the list of flags available, and use accordingly. |
args: | ||
- "multus" | ||
- "validation" | ||
- "run" | ||
- "--public-network" | ||
- "<nad-name>" | ||
- "--cluster-network" | ||
- "<nad-name>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another flag that is very commonly used is the --nginx-image
flag. Let's include that also (commented-out) with a quick comment note as suggested above.
# A job that runs the multus validation tool | ||
apiVersion: batch/v1 | ||
kind: Job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is the primary "thing" that users will be modifying, make sure this is the first yaml definition in this manifest file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, this would cause some delay (warnings) in the pod creation as the service account and roles used by the job won't be ready to use when the job is created. For example, check the output below after updating the yaml and moving the job definition to the top.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 46s (x2 over 46s) job-controller Error creating: pods "rook-ceph-multus-validation-" is forbidden: error looking up service account rook-ceph/rook-ceph-multus-validation: serviceaccount "rook-ceph-multus-validation" not found
Normal SuccessfulCreate 36s job-controller Created pod: rook-ceph-multus-validation-7924n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate, but I think you're right to leave where it is then. Users generally report errors like this as bugs. As long as the comment at the top of the file is the primary documentation place, users still can see the primary docs early in the file, which is good.
# TODO: Insert the NAD name for public network and cluster network | ||
# If you want to use any other flags along with the basic command, | ||
# add the `--help` flag in the end to see the list of flags available, and use accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, it seems best to me to move this doc to the header of the file. It will be easiest for users if there is one primary place to read about how to use the file, and that primary doc is at the very top. Any other comments in the file should be supplementary, to help users find "where" to make changes with brief reminders.
added a sample job manifest named `multus-validation` that validates the multus configuration in the cluster. Signed-off-by: Nikhil-Ladha <[email protected]>
multus: Add sample job manifest for multus config validation (backport #12495)
Description of your changes:
Added a sample job manifest named
multus-validation
that validates the multus configuration in the cluster.Which issue is resolved by this Pull Request:
Resolves #12172
Checklist:
skip-ci
on the PR.