The HyperConverged Cluster allows modifying the KubeVirt cluster configuration by editing the HyperConverged Cluster CR (Custom Resource).
The HyperConverged Cluster operator copies the cluster configuration values to the CRs of its operands.
The Hyperconverged Cluster Operator configures KubeVirt and its supporting operators in an opinionated way and overwrites its operands when there is an unexpected change to them. Users are expected not to modify the operands directly. The HyperConverged custom resource is the source of truth for the configuration.
To make this more visible and clear to end users, the Hyperconverged Cluster Operator counts these revert actions in a metric named `kubevirt_hco_out_of_band_modifications_count`. Based on the value of that metric over the last 10 minutes, an alert named `KubevirtHyperconvergedClusterOperatorCRModification` will eventually be fired:
Labels:
- `alertname=KubevirtHyperconvergedClusterOperatorCRModification`
- `component_name=kubevirt-kubevirt-hyperconverged`
- `severity=warning`

The alert is expected to resolve after 10 minutes if there was no further manual intervention to the operands during the last 10 minutes.
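The underlying metric can also be inspected directly. As a sketch, assuming the cluster's Prometheus is reachable via a local port-forward on port 9090 (adjust to your monitoring setup):

```bash
# Query the HCO out-of-band modification counter over the last 10 minutes
# through the Prometheus HTTP API.
$ curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=increase(kubevirt_hco_out_of_band_modifications_count[10m])'
```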
Note: The cluster configurations are supported only in API version `v1beta1` or higher.
Some configurations are applied separately to infra and workloads. The CR's `spec` object contains the `infra` and the `workloads` objects, whose structures are identical. The HyperConverged Cluster operator updates the other operators' CRs according to each CR's specific structure. This means that if, for example, another CR does not support infra cluster configurations, but only workloads configurations, the HyperConverged Cluster operator will only copy the workloads configurations to that operator's CR.
Below are the cluster configuration details. Currently, "Node Placement" is the only configuration that is set separately for infra and workloads.
Kubernetes lets the cluster admin influence node placement in several ways; see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ for a general overview.
The HyperConverged Cluster's CR is the single entry point to let the cluster admin influence the placement of all the pods directly and indirectly managed by the HyperConverged Cluster Operator.
The `nodePlacement` object is an optional field in the HyperConverged Cluster's CR, under the `spec.infra` and `spec.workloads` fields.
Note: The HyperConverged Cluster operator does not allow modifying the workloads' node placement configuration if there are already existing virtual machines or data volumes.
The `nodePlacement` object contains the following fields:

- `nodeSelector` is the node selector applied to the relevant kind of pods. It specifies a map of key-value pairs: for the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector.
- `affinity` enables pod affinity/anti-affinity placement, expanding the types of constraints that can be expressed with `nodeSelector`. `affinity` is applied to the relevant kind of pods in parallel with `nodeSelector`. See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity.
- `tolerations` is a list of tolerations applied to the relevant kind of pods. See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ for more info.
The HyperConverged Cluster Operator and the operators for its components are supposed to be deployed by the Operator Lifecycle Manager (OLM). Thus, the HyperConverged Cluster Operator is not going to directly influence its own placement; that should be influenced by the OLM. The cluster admin can, however, influence the placement of the pods directly created by the OLM by configuring a `nodeSelector` or `tolerations` directly on the OLM Subscription object.
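For illustration, a minimal sketch of such a Subscription, assuming HCO was installed through a subscription named `hco-operatorhub` in the `kubevirt-hyperconverged` namespace (the names, labels and taints here are placeholders):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: kubevirt-hyperconverged
spec:
  # ... channel, name, source and sourceNamespace as in your installation ...
  config:
    # Schedule the operator pods on infra nodes.
    nodeSelector:
      nodeType: infra
    # Tolerate a hypothetical taint reserved for infra nodes.
    tolerations:
    - key: "infra"
      operator: "Equal"
      value: "reserved"
      effect: "NoSchedule"
```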
- Place the infra resources on nodes labeled with "nodeType = infra", and the workloads on nodes labeled with "nodeType = nested-virtualization", using a node selector:

```yaml
...
spec:
  infra:
    nodePlacement:
      nodeSelector:
        nodeType: infra
  workloads:
    nodePlacement:
      nodeSelector:
        nodeType: nested-virtualization
```
- Place the infra resources on nodes labeled with "nodeType = infra", and the workloads on nodes labeled with "nodeType = nested-virtualization", preferring nodes with more than 8 CPUs, using affinity:

```yaml
...
spec:
  infra:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeType
                operator: In
                values:
                - infra
  workloads:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeType
                operator: In
                values:
                - nested-virtualization
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: my-cloud.io/num-cpus
                operator: Gt
                values:
                - "8"
```
- In this example, several nodes are reserved for KubeVirt resources (e.g. VMs), and are already set with the `key=kubevirt:NoSchedule` taint. This taint prevents any scheduling to these nodes, except for pods with the matching tolerations:

```yaml
...
spec:
  workloads:
    nodePlacement:
      tolerations:
      - key: "key"
        operator: "Equal"
        value: "kubevirt"
        effect: "NoSchedule"
```
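For reference, a taint like the one assumed in this example can be applied with (the node name is a placeholder):

```bash
$ kubectl taint nodes <node-name> key=kubevirt:NoSchedule
```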
The `featureGates` field is an optional set of boolean feature enablers. The features in this list are advanced or new features that are not enabled by default.

To enable a feature, add its name to the `featureGates` list and set it to `true`. A missing or `false` feature gate disables the feature.
Set the `withHostPassthroughCPU` feature gate in order to allow migrating a virtual machine with CPU host-passthrough mode. This can provide slightly better CPU performance, but should be enabled only when the cluster is homogeneous from a CPU hardware perspective.

Default: `false`

Additional information: LibvirtXMLCPUModel
Set the `sriovLiveMigration` feature gate in order to allow migrating a virtual machine with SR-IOV interfaces. When enabled, virt-launcher pods of virtual machines with SR-IOV interfaces run with the CAP_SYS_RESOURCE capability. This may degrade virt-launcher security.

Default: `false`
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  infra: {}
  workloads: {}
  featureGates:
    withHostPassthroughCPU: true
    sriovLiveMigration: true
```
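A feature gate can also be flipped on a running cluster; as a sketch, using a merge patch on the HyperConverged CR:

```bash
# Enable the withHostPassthroughCPU feature gate in place.
$ kubectl patch hco kubevirt-hyperconverged -n kubevirt-hyperconverged \
    --type=merge -p '{"spec":{"featureGates":{"withHostPassthroughCPU":true}}}'
```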
Set the live migration configurations by modifying the fields in the `liveMigrationConfig` object under the `spec` field.
- `bandwidthPerMigration`: bandwidth limit of each migration, in MiB/s. The format is a number with the `Mi` suffix, e.g. `64Mi`. Default: `64Mi`
- `completionTimeoutPerGiB`: the migration will be canceled if it has not completed in this time, in seconds per GiB of memory. For example, a virtual machine instance with 6GiB memory will time out if it has not completed migration in 4800 seconds. If the migration method is BlockMigration, the size of the migrating disks is included in the calculation. The format is a number. Default: `800`
- `parallelMigrationsPerCluster`: number of migrations running in parallel in the cluster. The format is a number. Default: `5`
- `parallelOutboundMigrationsPerNode`: maximum number of outbound migrations per node. The format is a number. Default: `2`
- `progressTimeout`: the migration will be canceled if memory copy fails to make progress in this time, in seconds. The format is a number. Default: `150`
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  liveMigrationConfig:
    bandwidthPerMigration: 64Mi
    completionTimeoutPerGiB: 800
    parallelMigrationsPerCluster: 5
    parallelOutboundMigrationsPerNode: 2
    progressTimeout: 150
```
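To verify that HCO propagated these values to its operand, one can inspect the KubeVirt CR; for example (assuming `jq` is available):

```bash
$ kubectl get kubevirt kubevirt-kubevirt-hyperconverged -n kubevirt-hyperconverged -o json \
    | jq '.spec.configuration.migrations'
```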
Administrators can control which host devices are exposed and permitted to be used in the cluster. Permitted host devices in the cluster will need to be allowlisted in the KubeVirt CR by their `vendor:product` selector for PCI devices or by mediated device names. Use the `permittedHostDevices` field in order to manage the permitted host devices.

The `permittedHostDevices` field is an optional field under the HyperConverged `spec` field. It contains two optional arrays: the `pciHostDevices` and the `mediatedDevices` arrays.

HCO propagates these arrays as-is to the KubeVirt custom resource; i.e., no merge is done, but a replacement.
The `pciHostDevices` array is an array of `PciHostDevice` objects. The fields of this object are:

- `pciDeviceSelector` - a combination of a `vendor_id:product_id` required to identify a PCI device on a host. This identifier, e.g. `10de:1eb8`, can be found using `lspci`; for example: `lspci -nnv | grep -i nvidia`
- `resourceName` - the name by which a device is advertised and being requested.
- `externalResourceProvider` - indicates that this resource is being provided by an external device plugin. In this case, KubeVirt will only permit the usage of this device in the cluster but will leave the allocation and monitoring to an external device plugin. Default: `false`
- `disabled` - set to `true` to disable one of the default host devices, because these devices cannot be removed from the list.
The `mediatedDevices` array is an array of `MediatedDevice` objects. The fields of this object are:

- `mdevNameSelector` - the name of a mediated device type required to identify a mediated device on a host. For example: mdev type `nvidia-226` represents GRID T4-2A. The selector is matched against the content of `/sys/class/mdev_bus/$mdevUUID/mdev_type/name`.
- `resourceName` - the name by which a device is advertised and being requested.
- `externalResourceProvider` - indicates that this resource is being provided by an external device plugin. In this case, KubeVirt will only permit the usage of this device in the cluster but will leave the allocation and monitoring to an external device plugin. Default: `false`
HCO enforces the existence of two PCI host devices in the list:

```yaml
pciHostDevices:
- pciDeviceSelector: "10DE:1DB6"
  resourceName: "nvidia.com/GV100GL_Tesla_V100"
- pciDeviceSelector: "10DE:1EB8"
  resourceName: "nvidia.com/TU104GL_Tesla_T4"
```

It is possible to add more devices, but these two devices must be in the list. If you need to remove them, use the `disabled` field instead of deleting them.
In this example, we're adding a new device in addition to the two default ones, and disabling the default "nvidia.com/TU104GL_Tesla_T4" device:
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10DE:1DB6"
      resourceName: "nvidia.com/GV100GL_Tesla_V100"
    - pciDeviceSelector: "10DE:1EB8"
      resourceName: "nvidia.com/TU104GL_Tesla_T4"
      disabled: true
    - pciDeviceSelector: "8086:6F54"
      resourceName: "intel.com/qat"
      externalResourceProvider: true
    mediatedDevices:
    - mdevNameSelector: "GRID T4-1Q"
      resourceName: "nvidia.com/GRID_T4-1Q"
```
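Once permitted, such a device can be requested from a virtual machine by its resource name. A minimal, abridged sketch (the VMI name and device choice are illustrative; disks, resources and other required fields are omitted):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-with-gpu
spec:
  domain:
    devices:
      # Request one of the devices permitted above, by its resourceName.
      hostDevices:
      - deviceName: nvidia.com/GV100GL_Tesla_V100
        name: gpu1
```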
Administrators can override the storage class used for scratch space during transfer operations by setting the `scratchSpaceStorageClass` field under the HyperConverged `spec` field.

The scratch space storage class is determined in the following order:

1. The value of `scratchSpaceStorageClass`, if it exists.
2. Otherwise, the default storage class.
3. If there is no default storage class, the storage class of the DataVolume.
4. If no storage class is specified, no storage class is used for scratch space.
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  scratchSpaceStorageClass: aStorageClassName
```
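HCO propagates this value to CDI; one way to verify it, as a sketch (assuming `jq` is available):

```bash
$ kubectl get cdi -o json \
    | jq '.items[].spec.config.scratchSpaceStorageClass'
```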
The administrator can limit storage workload resources and require minimal resources. Use the `resourceRequirements` field under the HyperConverged `spec` field. Add the `storageWorkloads` field under `resourceRequirements`. The content of the `storageWorkloads` field is the standard Kubernetes resource configuration.
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  resourceRequirements:
    storageWorkloads:
      limits:
        cpu: "500m"
        memory: "2Gi"
      requests:
        cpu: "250m"
        memory: "1Gi"
```
You can schedule a virtual machine (VM) on a node where the CPU model and policy attribute of the VM are compatible with the CPU models and policy attributes that the node supports. By specifying a list of obsolete CPU models in the HyperConverged custom resource, you can exclude them from the list of labels created for CPU models.

Through the process of iteration, the list of base CPU features in the minimum CPU model is eliminated from the list of labels generated for the node. For example, an environment might have two supported CPU models: Penryn and Haswell.
Use the `spec.obsoleteCPUs` object to configure the CPU plugin. Add the obsolete CPU list under `spec.obsoleteCPUs.cpuModels`, and the minimum CPU model as the value of `spec.obsoleteCPUs.minCPUModel`.

The default value for the `spec.obsoleteCPUs.minCPUModel` field in KubeVirt is `"Penryn"`, but it won't be visible if it is missing in the CR. The default value for the `spec.obsoleteCPUs.cpuModels` field is a hardcoded predefined list, which is not visible in the HyperConverged CR. You can add new CPU models to the list, but you can't remove CPU models from the predefined list.
The hard-coded predefined list of obsolete CPU models is:
486
pentium
pentium2
pentium3
pentiumpro
coreduo
n270
core2duo
Conroe
athlon
phenom
qemu64
qemu32
kvm64
kvm32
You don't need to add a CPU model to the `spec.obsoleteCPUs.cpuModels` field if it is in this list.
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  obsoleteCPUs:
    cpuModels:
    - "486"
    - "pentium"
    - "pentium2"
    - "pentium3"
    - "pentiumpro"
    minCPUModel: "Penryn"
```
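The effect can be observed on the node labels; as a sketch, assuming `jq` is available and that the node-labeller publishes CPU-model labels under the `cpu-model.node.kubevirt.io/` prefix:

```bash
# List the CPU-model labels that remain on each node after the
# obsolete models have been excluded.
$ kubectl get nodes -o json \
    | jq '.items[].metadata.labels
          | with_entries(select(.key | startswith("cpu-model.node.kubevirt.io/")))'
```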
Use the HyperConverged `spec.workloadUpdateStrategy` object to define how to handle automated workload updates at the cluster level.
The `workloadUpdateStrategy` fields are:

- `batchEvictionInterval` - represents the interval to wait before issuing the next batch of shutdowns. The default value is `1m`.
- `batchEvictionSize` - represents the number of VMIs that can be forcibly updated per `batchEvictionInterval` interval. The default value is `10`.
- `workloadUpdateMethods` - defines the methods that can be used to disrupt workloads during automated workload updates. When multiple methods are present, the least disruptive method takes precedence over more disruptive methods. For example, if both `LiveMigrate` and `Shutdown` methods are listed, only VMs which are not live-migratable will be restarted/shut down. An empty list defaults to no automated workload updating. The default values are `LiveMigrate` and `Evict`.
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  workloadUpdateStrategy:
    workloadUpdateMethods:
    - LiveMigrate
    - Evict
    batchEvictionSize: 10
    batchEvictionInterval: "1m"
```
If there is a need to import data images from an insecure registry, these registries should be added to the `insecureRegistries` field under the `storageImport` object in the HyperConverged CR's `spec` field.
```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  storageImport:
    insecureRegistries:
    - "private-registry-example-1:5000"
    - "private-registry-example-2:5000"
    ...
```
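A DataVolume can then import from such a registry; a minimal sketch (the image path is a placeholder):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: registry-import
spec:
  source:
    registry:
      # One of the registries listed under insecureRegistries above.
      url: "docker://private-registry-example-1:5000/vm-images/example:latest"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
```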
In addition to the `featureGates` field in the HyperConverged CR's spec, the user can set annotations in the HyperConverged CR to unfold more configuration options.

Warning: Annotations are a less formal means of cluster configuration and may be dropped without the same deprecation process as a regular API, such as the `spec` section.
Starting from HCO version 1.3.0, OvS CNI support is disabled by default on new installations. In order to enable the deployment of the OvS CNI DaemonSet on all workload nodes, an annotation of `deployOVS: true` must be set on the HyperConverged CR. It can be set while creating the HyperConverged custom resource during the initial deployment, or during run time.
- To enable OvS CNI on the cluster, the HyperConverged CR should be similar to:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  annotations:
    deployOVS: "true"
...
```
- OvS CNI can also be enabled during run time of HCO, by annotating its CR:

```bash
$ kubectl annotate HyperConverged kubevirt-hyperconverged -n kubevirt-hyperconverged deployOVS=true --overwrite
```
If HCO was upgraded to 1.3.0 from a previous version, the annotation will be added as `true` and OvS will be deployed. Subsequent upgrades to newer versions will preserve the state from the previous version, i.e. OvS will be deployed in the upgraded version if and only if it was deployed in the previous one.
HCO enables users to modify the operand CRs directly using jsonpatch annotations in the HyperConverged CR. Modifications done to CRs using jsonpatch annotations won't be reconciled back by HCO to the opinionated defaults.

The following annotations are supported in the HyperConverged CR:

- `kubevirt.kubevirt.io/jsonpatch` - for KubeVirt configurations
- `containerizeddataimporter.kubevirt.io/jsonpatch` - for CDI configurations
- `networkaddonsconfig.kubevirt.io/jsonpatch` - for CNAO configurations

The content of the annotation is a JSON array of patch objects, as defined in RFC 6902.
- The user wants to set the KubeVirt CR's `spec.configuration.migrations.allowPostCopy` field to `true`. In order to do that, the following annotation should be added to the HyperConverged CR:
```yaml
metadata:
  annotations:
    kubevirt.kubevirt.io/jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/configuration/migrations",
          "value": {"allowPostCopy": true}
        }
      ]
```
From the CLI it will be:

```bash
$ kubectl annotate --overwrite -n kubevirt-hyperconverged hco kubevirt-hyperconverged \
    kubevirt.kubevirt.io/jsonpatch='[{"op": "add", \
    "path": "/spec/configuration/migrations", \
    "value": {"allowPostCopy": true} }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged annotated

$ kubectl get kubevirt -n kubevirt-hyperconverged kubevirt-kubevirt-hyperconverged -o json \
    | jq '.spec.configuration.migrations.allowPostCopy'
true
```
And to set it back to `false`:

```bash
$ kubectl annotate --overwrite -n kubevirt-hyperconverged hco kubevirt-hyperconverged \
    kubevirt.kubevirt.io/jsonpatch='[{"op": "add", \
    "path": "/spec/configuration/migrations", \
    "value": {"allowPostCopy": false} }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged annotated

$ kubectl get kubevirt -n kubevirt-hyperconverged kubevirt-kubevirt-hyperconverged -o json \
    | jq '.spec.configuration.migrations.allowPostCopy'
false
```
HCO reports the use of these annotations in the `TaintedConfiguration` condition of its CR:

```bash
$ kubectl get hco -n kubevirt-hyperconverged kubevirt-hyperconverged -o json \
    | jq '.status.conditions[] | select(.type == "TaintedConfiguration")'
{
  "lastHeartbeatTime": "2021-03-24T17:25:49Z",
  "lastTransitionTime": "2021-03-24T11:33:11Z",
  "message": "Unsupported feature was activated via an HCO annotation",
  "reason": "UnsupportedFeatureAnnotation",
  "status": "True",
  "type": "TaintedConfiguration"
}
```
- The user wants to configure KubeVirt pods to log more verbosely:
```yaml
metadata:
  annotations:
    kubevirt.kubevirt.io/jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/configuration/developerConfiguration/logVerbosity",
          "value": {
            "virtAPI": 4,
            "virtController": 4,
            "virtHandler": 4,
            "virtLauncher": 4,
            "virtOperator": 4
          }
        }
      ]
```
From the CLI it will be:

```bash
$ kubectl patch hco -n kubevirt-hyperconverged kubevirt-hyperconverged --type=merge -p='{"metadata":{"annotations":{"kubevirt.kubevirt.io/jsonpatch":"[{\"op\":\"add\",\"path\":\"/spec/configuration/developerConfiguration/logVerbosity\",\"value\":{\"virtAPI\":4,\"virtController\":4,\"virtHandler\":4,\"virtLauncher\":4,\"virtOperator\":4}}]"}}}'
```
- The user wants to forcefully customize the virt-handler configuration by setting custom values under `/spec/customizeComponents/patches` of the KV CR. In order to do that, the following annotation should be added to the HyperConverged CR:
```yaml
metadata:
  annotations:
    kubevirt.kubevirt.io/jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/customizeComponents/patches",
          "value": [{
            "patch": "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/command/-\",\"value\":\"--max-devices=250\"}]",
            "resourceName": "virt-handler",
            "resourceType": "Daemonset",
            "type": "json"
          }]
        }
      ]
```
- The user wants to override the default URL used when uploading to a DataVolume, by setting the CDI CR's `spec.config.uploadProxyURLOverride` to `myproxy.example.com`. In order to do that, the following annotation should be added to the HyperConverged CR:
```yaml
metadata:
  annotations:
    containerizeddataimporter.kubevirt.io/jsonpatch: |-
      [
        {
          "op": "add",
          "path": "/spec/config/uploadProxyURLOverride",
          "value": "myproxy.example.com"
        }
      ]
```
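To stop patching an operand, the annotation can be removed again, using kubectl's trailing-dash syntax; HCO should then reconcile the operand back to its opinionated defaults:

```bash
$ kubectl annotate -n kubevirt-hyperconverged hco kubevirt-hyperconverged \
    containerizeddataimporter.kubevirt.io/jsonpatch-
```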
Note: The full configuration options for KubeVirt, CDI and CNAO, which are available on the cluster, can be explored by using `kubectl explain <resource name>.spec`. For example:
```
$ kubectl explain kv.spec
KIND: KubeVirt
VERSION: kubevirt.io/v1
RESOURCE: spec <Object>
DESCRIPTION:
  <empty>
FIELDS:
  certificateRotateStrategy <Object>
  configuration <Object>
    holds kubevirt configurations. same as the virt-configMap
  customizeComponents <Object>
  imagePullPolicy <string>
    The ImagePullPolicy to use.
  imageRegistry <string>
    The image registry to pull the container images from Defaults to the same
    registry the operator's container image is pulled from.
<truncated>
```
To inspect lower-level objects under `spec`, they can be specified in `kubectl explain`, recursively, e.g.:
```
$ kubectl explain kv.spec.configuration.network
KIND: KubeVirt
VERSION: kubevirt.io/v1
RESOURCE: network <Object>
DESCRIPTION:
  NetworkConfiguration holds network options
FIELDS:
  defaultNetworkInterface <string>
  permitBridgeInterfaceOnPodNetwork <boolean>
  permitSlirpInterface <boolean>
```
- To explore KubeVirt configuration options, use `kubectl explain kv.spec`
- To explore CDI configuration options, use `kubectl explain cdi.spec`
- To explore CNAO configuration options, use `kubectl explain networkaddonsconfig.spec`
Using the jsonpatch annotation feature incorrectly might lead to unexpected results and could potentially render the KubeVirt-Hyperconverged system unstable.

The jsonpatch annotation feature is particularly dangerous when upgrading KubeVirt-Hyperconverged, as the structure or the semantics of the underlying components' CRs might change. Please remove any jsonpatch annotation usage prior to the upgrade, to avoid any potential issues.

USE WITH CAUTION!