multus: Add sample job manifest for multus config validation #12495

Merged
merged 1 commit into from
Aug 11, 2023
6 changes: 5 additions & 1 deletion Documentation/CRDs/Cluster/ceph-cluster-crd.md
@@ -294,12 +294,16 @@ installing any Custom Resources, run the tool from the operator pod.
The tool's CLI is designed to be as helpful as possible. Get help text for the multus validation
tool like so:
```console
-kubectl --namespace rook-ceph exec -it deploy/rook-ceph-operator -- rook ctl multus validation run --help
+kubectl --namespace rook-ceph exec -it deploy/rook-ceph-operator -- rook multus validation run --help
```

Then, update the args in the [multus-validation](https://github.com/rook/rook/blob/master/deploy/examples/multus-validation.yaml) job template. Minimally, add the NAD name(s) for the public and/or cluster networks as needed, then create the job to validate the Multus configuration.
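
Updating the args can also be scripted. The following is a minimal sketch, not part of this PR: `set_network_arg` is a hypothetical helper that uncomments one of the `--public-network` / `--cluster-network` placeholder lines in the job template and fills in a NAD name. It assumes the placeholder lines look exactly like those in `multus-validation.yaml`, and `public-net` in the usage note is a made-up NAD name.

```shell
# Hypothetical helper: reads a manifest on stdin and writes it to stdout with
# the chosen network arg uncommented and its <NAD-NAME> placeholder replaced.
#   $1: "public" or "cluster"
#   $2: name of the NetworkAttachmentDefinition to validate
set_network_arg() {
  # "|" is used as the sed delimiter so that names containing "/" still work.
  # The trailing ".*" drops the inline "uncomment and replace ..." hint.
  sed -E "s|^([[:space:]]*)# - \"--$1-network=<NAD-NAME>\".*|\1- \"--$1-network=$2\"|"
}
```

For example, `set_network_arg public public-net < multus-validation.yaml | kubectl create -f -` would create the job with only the public network configured. Lines that do not match the placeholder pass through unchanged.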

If the tool fails, it will suggest what things may be preventing Multus networks from working
properly, and it will request the logs and outputs that will help debug issues.

Check the logs of the pod created by the job to see the status of the validation test.
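
A minimal sketch of checking on the job from a shell, assuming the job name and namespace from the sample manifest (`rook-ceph-multus-validation` in `rook-ceph`). The `KUBECTL` and `NAMESPACE` variables, the helper function names, and the default timeout are illustrative assumptions, not part of the tool.

```shell
# KUBECTL can be overridden (for example with `echo`) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-rook-ceph}"   # namespace used by the sample manifest
JOB="rook-ceph-multus-validation"     # job name used by the sample manifest

# Print the logs of the pod created by the validation job.
# Extra arguments (for example --follow) are passed through to `kubectl logs`.
validation_logs() {
  "$KUBECTL" --namespace "$NAMESPACE" logs "job/$JOB" "$@"
}

# Block until the job reports the Complete condition, or the timeout expires.
#   $1: optional timeout (default 15m, an arbitrary choice)
wait_for_validation() {
  "$KUBECTL" --namespace "$NAMESPACE" wait --for=condition=complete "job/$JOB" --timeout="${1:-15m}"
}
```

Running `validation_logs --follow` streams the output while the test runs. Note that `wait_for_validation` only returns successfully on completion; if the job fails, it waits until the timeout, so the logs remain the primary place to look.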

##### Known limitations with Multus

Daemons leveraging Kubernetes service IPs (Monitors, Managers, Rados Gateways) are not listening on the NAD specified in the `selectors`.
140 changes: 140 additions & 0 deletions deploy/examples/multus-validation.yaml
@@ -0,0 +1,140 @@
#################################################################################################################
# We highly recommend validating your Multus configuration before you install Rook.
# This job aims to automate that operation by using the validation tool. Run this job after
# installing the rook operator and before installing any Custom Resources.
# Insert the NAD name for public network and cluster network in the Job definition below.
# If you want to use any other flags along with the basic command in the Job,
# add the `--help` flag at the end to see the list of available flags, and use them accordingly.
#################################################################################################################
---
# Service account for job that validates multus configuration
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-ceph-multus-validation
  namespace: rook-ceph # namespace:cluster
# imagePullSecrets:
#   - name: my-registry-secret
---
# Aspects of multus validation job that require access to the operator/cluster namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rook-ceph-multus-validation
  namespace: rook-ceph # namespace:cluster
rules:
  - apiGroups: [""]
    resources: ["configmaps", "configmaps/finalizers", "pods"]
    verbs: ["get", "list", "create", "update", "delete"]
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["list", "create", "delete", "deletecollection"]
  - apiGroups: ["k8s.cni.cncf.io"]
    resources: ["network-attachment-definitions"]
    verbs: ["get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "delete"]
---
# Allow the multus validation job to run in this namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rook-ceph-multus-validation
  namespace: rook-ceph # namespace:cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rook-ceph-multus-validation
subjects:
  - kind: ServiceAccount
    name: rook-ceph-multus-validation
    namespace: rook-ceph # namespace:cluster
---
# A job that runs the multus validation tool
apiVersion: batch/v1
kind: Job
Comment on lines +54 to +56
Member

Because this is the primary "thing" that users will be modifying, make sure this is the first yaml definition in this manifest file.

Contributor Author

However, this would delay pod creation and produce warnings, because the service account and roles used by the job won't exist yet when the job is created. For example, here are the job's events after moving the job definition to the top of the YAML:

Events:
  Type     Reason            Age                From            Message
  ----     ------            ----               ----            -------
  Warning  FailedCreate      46s (x2 over 46s)  job-controller  Error creating: pods "rook-ceph-multus-validation-" is forbidden: error looking up service account rook-ceph/rook-ceph-multus-validation: serviceaccount "rook-ceph-multus-validation" not found
  Normal   SuccessfulCreate  36s                job-controller  Created pod: rook-ceph-multus-validation-7924n

Member

That's unfortunate, but I think you're right to leave it where it is then. Users generally report errors like this as bugs. As long as the comment at the top of the file is the primary place for documentation, users can still see the primary docs early in the file, which is good.
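
The ordering constraint discussed above can be checked mechanically. This is a rough sketch under stated assumptions: `list_kinds` and `service_account_precedes_job` are hypothetical helpers, and they assume a normally indented multi-document manifest in which each resource's `kind:` starts at column 0 (nested fields such as `roleRef.kind` are indented and therefore ignored, as are commented-out documents).

```shell
# Print the top-level `kind:` of each YAML document, in file order.
# Reads the files given as arguments, or stdin if none are given.
list_kinds() {
  awk '/^kind:/ { print $2 }' "$@"
}

# Succeed if a ServiceAccount is defined before the first Job (or if there is no Job).
service_account_precedes_job() {
  list_kinds "$@" | awk '
    $1 == "ServiceAccount" { seen = 1 }
    $1 == "Job"            { exit(seen ? 0 : 1) }
  '
}
```

For instance, `service_account_precedes_job multus-validation.yaml` would return a non-zero status if the Job were moved to the top of the file, which is the arrangement that produced the `FailedCreate` warning shown earlier.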

metadata:
  name: rook-ceph-multus-validation
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-multus-validation
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-multus-validation
    spec:
      serviceAccountName: rook-ceph-multus-validation
      containers:
        - name: multus-validation
          image: rook/ceph:master
          command: ["rook"]
          args:
            - "multus"
            - "validation"
            - "run"
            # - "--public-network=<NAD-NAME>" # uncomment and replace NAD name if using public network
            # - "--cluster-network=<NAD-NAME>" # uncomment and replace NAD name if using cluster network
            # - "--nginx-image=<IMAGE>" # uncomment and replace IMAGE with the nginx image you want to use for the validation server and clients
            # - "--daemons-per-node=<COUNT>" # uncomment and replace COUNT with the maximum number of daemons that should be running on each node during validation
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_LOG_LEVEL
              value: DEBUG
      restartPolicy: Never
---
# This Pod Security Policy (PSP) allows the job to run in Kubernetes environments using PSPs
# apiVersion: rbac.authorization.k8s.io/v1
Member

Add instruction for PSP

# kind: RoleBinding
# metadata:
#   name: rook-ceph-multus-validation-psp
#   namespace: rook-ceph # namespace:cluster
# roleRef:
#   apiGroup: rbac.authorization.k8s.io
#   kind: ClusterRole
#   name: psp:rook
# subjects:
#   - kind: ServiceAccount
#     name: rook-ceph-multus-validation
#     namespace: rook-ceph # namespace:cluster
# ---
# SecurityContextConstraints (SCC) for the Rook and Ceph daemons
# kind: SecurityContextConstraints
# apiVersion: security.openshift.io/v1
# metadata:
#   name: rook-ceph-multus-validation
# allowPrivilegedContainer: true
# allowHostDirVolumePlugin: true
# allowHostPID: false
# # set to true if running rook with host networking enabled
# allowHostNetwork: true
# # set to true if running rook with the provider as host
# allowHostPorts: true
# priority:
# allowedCapabilities: ["MKNOD"]
# allowHostIPC: true
# readOnlyRootFilesystem: false
# # drop all default privileges
# requiredDropCapabilities: ["All"]
# defaultAddCapabilities: []
# runAsUser:
#   type: RunAsAny
# seLinuxContext:
#   type: RunAsAny
# fsGroup:
#   type: RunAsAny
# supplementalGroups:
#   type: RunAsAny
# seccompProfiles:
#   - "*"
# volumes:
#   - configMap
#   - emptyDir
#   - projected
# users:
#   - system:serviceaccount:rook-ceph:rook-ceph-multus-validation # serviceaccount:namespace:cluster
---