---
title: Enforcement
weight: 5
description: Configure policies and restrictions on a per-tenant basis
---
The cluster admin can "taint" the namespaces created by tenant owners with additional metadata as labels and annotations. There is no specific semantic assigned to these labels and annotations: they will be assigned to the namespaces in the tenant as they are created. This can help the cluster admin to implement specific use cases as, for example, leave only a given tenant to be backed up by a backup service.
Assigns additional labels and annotations to all namespaces created in the solar
tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  namespaceOptions:
    additionalMetadata:
      annotations:
        storagelocationtype: s3
      labels:
        projectcapsule.dev/backup: "true"
```
When the tenant owner creates a namespace, it inherits the given label and/or annotation:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    storagelocationtype: s3
  labels:
    capsule.clastix.io/tenant: solar
    kubernetes.io/metadata.name: solar-production
    projectcapsule.dev/backup: "true"
  name: solar-production
  ownerReferences:
  - apiVersion: capsule.clastix.io/v1beta2
    blockOwnerDeletion: true
    controller: true
    kind: Tenant
    name: solar
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
```
By default, Capsule allows tenant owners to add and modify any label or annotation on their namespaces.

But in some scenarios tenant owners should not be able to add or modify specific labels or annotations (for example, labels used in Kubernetes network policies which are added by the cluster administrator).

Bill, the cluster admin, can deny Alice the ability to set specific labels and annotations on namespaces:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  namespaceOptions:
    forbiddenAnnotations:
      denied:
      - foo.acme.net
      - bar.acme.net
      deniedRegex: .*.acme.net
    forbiddenLabels:
      denied:
      - foo.acme.net
      - bar.acme.net
      deniedRegex: .*.acme.net
  owners:
  - name: alice
    kind: User
```
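With the configuration above, an attempt by Alice to set a forbidden label on one of her namespaces is rejected by the Validation Webhook. A minimal sketch; the exact error output is illustrative:

```
kubectl --as alice --as-group projectcapsule.dev label ns solar-production foo.acme.net=bar
Error from server: admission webhook denied the request: label foo.acme.net is forbidden for namespaces of the current tenant
```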
Due to CVE-2021-25735, this feature is only supported on Kubernetes versions v1.18.18, v1.19.10, v1.20.6, v1.21.0, or newer.
When using Capsule together with capsule-proxy, Bill can allow Tenant Owners to modify Nodes.

By default, tenant owners are allowed to add and modify any label or annotation on their nodes.

But in some scenarios tenant owners should not be able to add or modify specific labels or annotations: some types of labels or annotations must be protected from modification, for example those set by cloud providers or autoscalers.

Bill, the cluster admin, can deny Tenant Owners the ability to add or modify specific labels and annotations on Nodes:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: CapsuleConfiguration
metadata:
  name: default
spec:
  nodeMetadata:
    forbiddenAnnotations:
      denied:
      - foo.acme.net
      - bar.acme.net
      deniedRegex: .*.acme.net
    forbiddenLabels:
      denied:
      - foo.acme.net
      - bar.acme.net
      deniedRegex: .*.acme.net
  userGroups:
  - projectcapsule.dev
  - system:serviceaccounts:default
```
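With this configuration in place, an attempt by a tenant owner to set a forbidden label on a Node is rejected. A minimal sketch; the exact error output is illustrative:

```
kubectl --as alice --as-group projectcapsule.dev label node worker06.acme.com foo.acme.net=bar
Error from server: admission webhook denied the request: label foo.acme.net is forbidden for nodes
```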
The cluster admin can "taint" the services created by the tenant owners with additional metadata as labels and annotations.
Assigns additional labels and annotations to all services created in the solar
tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    additionalMetadata:
      annotations:
        storagelocationtype: s3
      labels:
        projectcapsule.dev/backup: "true"
```
When the tenant owner creates a service in a tenant namespace, it inherits the given label and/or annotation:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: solar-production
  labels:
    projectcapsule.dev/backup: "true"
  annotations:
    storagelocationtype: s3
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  selector:
    run: nginx
  type: ClusterIP
```
The cluster admin can "taint" the pods created by the tenant owners with additional metadata as labels and annotations.
Assigns additional labels and annotations to all services created in the solar
tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  podOptions:
    additionalMetadata:
      annotations:
        storagelocationtype: s3
      labels:
        projectcapsule.dev/backup: "true"
```
When the tenant owner creates a pod in a tenant namespace, it inherits the given label and/or annotation:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: solar-production
  labels:
    projectcapsule.dev/backup: "true"
  annotations:
    storagelocationtype: s3
...
```
This feature will be deprecated in a future release of Capsule. Use TenantReplications instead.
Bill, the cluster admin, can also set Limit Ranges for each namespace in Alice's tenant by defining limits for pods and containers in the tenant spec:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  ...
  limitRanges:
    items:
    - limits:
      - type: Pod
        min:
          cpu: "50m"
          memory: "5Mi"
        max:
          cpu: "1"
          memory: "1Gi"
    - limits:
      - type: Container
        defaultRequest:
          cpu: "100m"
          memory: "10Mi"
        default:
          cpu: "200m"
          memory: "100Mi"
        min:
          cpu: "50m"
          memory: "5Mi"
        max:
          cpu: "1"
          memory: "1Gi"
    - limits:
      - type: PersistentVolumeClaim
        min:
          storage: "1Gi"
        max:
          storage: "10Gi"
```
Limits will be inherited by all the namespaces created by Alice. In our case, when Alice creates the namespace `solar-production`, Capsule creates the following:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-0
  namespace: solar-production
spec:
  limits:
  - max:
      cpu: "1"
      memory: 1Gi
    min:
      cpu: 50m
      memory: 5Mi
    type: Pod
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-1
  namespace: solar-production
spec:
  limits:
  - default:
      cpu: 200m
      memory: 100Mi
    defaultRequest:
      cpu: 100m
      memory: 10Mi
    max:
      cpu: "1"
      memory: 1Gi
    min:
      cpu: 50m
      memory: 5Mi
    type: Container
---
apiVersion: v1
kind: LimitRange
metadata:
  name: capsule-solar-2
  namespace: solar-production
spec:
  limits:
  - max:
      storage: 10Gi
    min:
      storage: 1Gi
    type: PersistentVolumeClaim
```
Note: since a LimitRange applies to individual resources, there is no aggregate to count.
Alice doesn't have permission to change or delete the resources according to the assigned RBAC profile.
```
kubectl -n solar-production auth can-i patch resourcequota
no
kubectl -n solar-production auth can-i delete resourcequota
no
kubectl -n solar-production auth can-i patch limitranges
no
kubectl -n solar-production auth can-i delete limitranges
no
```
Going forward, cluster administrators should distribute LimitRanges via TenantReplications. This is a more flexible and powerful way to distribute LimitRanges, as it allows distributing any kind of resource, not only LimitRanges. Here's an example of how to distribute a LimitRange to all the namespaces of a tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-limitranges
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
  - namespaceSelector:
      matchLabels:
        capsule.clastix.io/tenant: solar
    rawItems:
    - apiVersion: v1
      kind: LimitRange
      metadata:
        name: cpu-resource-constraint
      spec:
        limits:
        - default: # this section defines default limits
            cpu: 500m
          defaultRequest: # this section defines default requests
            cpu: 500m
          max: # max and min define the limit range
            cpu: "1"
          min:
            cpu: 100m
          type: Container
```
Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible. See Kubernetes documentation.
In a multi-tenant cluster, not all users can be trusted, as a tenant owner could create Pods at the highest possible priorities, causing other Pods to be evicted/not get scheduled.
To prevent misuses of Pod Priority Class, Bill, the cluster admin, can enforce the allowed Pod Priority Class at tenant level:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  priorityClasses:
    matchLabels:
      env: "production"
```
With the said Tenant specification, Alice can create a Pod resource if `spec.priorityClassName` equals to:

- any PriorityClass which has the label `env` with the value `production`

If a Pod is going to use a non-allowed PriorityClass, it will be rejected by the Validation Webhook enforcing it.
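For example, Alice can reference any PriorityClass carrying the allowed label. A minimal sketch, assuming the class below exists (its name and value are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: solar-high-priority   # hypothetical name
  labels:
    env: "production"         # matches the Tenant matchLabels
value: 1000
globalDefault: false
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: solar-production
spec:
  priorityClassName: solar-high-priority
  containers:
  - name: nginx
    image: nginx
```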
Note: This feature supports type PriorityClass only on API version `scheduling.k8s.io/v1`.
This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (`globalDefault=true`) that acts only at the cluster level.

It's possible to assign each tenant a PriorityClass which will be used if no PriorityClass is set on a per-pod basis:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  priorityClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"
```
Let's create a PriorityClass which is used as the default:
```
kubectl apply -f - << EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-default
  labels:
    env: "production"
value: 1313
preemptionPolicy: Never
globalDefault: false
description: "This is the default PriorityClass for the solar-tenant"
EOF
```
Note the `globalDefault: false`, which is important to avoid the PriorityClass being used as the default for all the tenants. If a Pod has no value for `spec.priorityClassName`, the default value for PriorityClass (`tenant-default`) will be used.
Pods can be assigned different runtime classes. With the assigned runtime class you can control which Container Runtime Interface (CRI) is used for each pod. See the Kubernetes documentation for more information.
To prevent misuses of Pod Runtime Classes, Bill, the cluster admin, can enforce the allowed Pod Runtime Class at tenant level:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  runtimeClasses:
    matchLabels:
      env: "production"
```
With the said Tenant specification, Alice can create a Pod resource if `spec.runtimeClassName` equals to:

- any RuntimeClass which has the label `env` with the value `production`
If a Pod is going to use a non-allowed Runtime Class, it will be rejected by the Validation Webhook enforcing it.
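For example, Alice can reference any RuntimeClass carrying the allowed label. A minimal sketch, assuming the class below exists (its name and CRI handler are illustrative):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed             # hypothetical name
  labels:
    env: "production"         # matches the Tenant matchLabels
handler: runsc                # hypothetical CRI handler
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: solar-production
spec:
  runtimeClassName: sandboxed
  containers:
  - name: nginx
    image: nginx
```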
Bill, the cluster admin, can dedicate a pool of worker nodes to the solar tenant, to isolate the tenant applications from other noisy neighbors. These nodes are labeled by Bill as `pool=renewable`:
```
kubectl get nodes --show-labels

NAME                STATUS   ROLES    AGE   VERSION   LABELS
...
worker06.acme.com   Ready    worker   8d    v1.25.2   pool=renewable
worker07.acme.com   Ready    worker   8d    v1.25.2   pool=renewable
worker08.acme.com   Ready    worker   8d    v1.25.2   pool=renewable
```
This approach requires the PodNodeSelector Admission Controller plugin to be active. If the plugin is not active, the pods will be scheduled on any node. If your distribution does not support this feature, you can use Expression Node Selectors.
The label `pool=renewable` is defined as `.spec.nodeSelector` in the tenant manifest:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux
```
The Capsule controller makes sure that any namespace created in the tenant has the annotation `scheduler.alpha.kubernetes.io/node-selector: pool=renewable`. This annotation tells the Kubernetes scheduler to apply the node selector `pool=renewable` to all the pods deployed in the tenant. The effect is that all the pods deployed by Alice are placed only on the designated pool of nodes.
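For example, the namespace `solar-production` created by Alice would carry the annotation. A minimal sketch of the relevant metadata:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: solar-production
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: pool=renewable
  labels:
    capsule.clastix.io/tenant: solar
```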
Multiple node selector labels can be defined as in the following snippet:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  nodeSelector:
    pool: renewable
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    hardware: gpu
```
Any attempt of Alice to change the selector on the pods will result in an error from the PodNodeSelector Admission Controller plugin.
```
kubectl auth can-i edit ns -n solar-production
no
```
Feature TBD
Bill, the cluster admin, can prevent the creation of services with specific service types.

When dealing with a shared multi-tenant scenario, multiple NodePort services can become cumbersome to manage. The reason is the potential overlap of the Tenant owners' needs, since a NodePort is opened on every node and, when using `hostNetwork=true`, is accessible to any Pod regardless of any specific NetworkPolicy.

Bill, the cluster admin, can block the creation of services with the NodePort service type for a given tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      nodePort: false
```
With the above configuration, any attempt of Alice to create a Service of type `NodePort` is denied by the Validation Webhook enforcing it. The default value is `true`.
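For example, an attempt by Alice to expose a Deployment through a NodePort Service would fail. A minimal sketch; the exact error output is illustrative:

```
kubectl --as alice --as-group projectcapsule.dev -n solar-production expose deployment nginx --port 80 --type NodePort
Error from server: admission webhook denied the request: NodePort service types are forbidden for the tenant solar
```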
Services of type ExternalName have been found subject to many security issues. To prevent tenant owners from creating services of type ExternalName, the cluster admin can forbid them:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      externalName: false
```
With the above configuration, any attempt of Alice to create a Service of type `ExternalName` is denied by the Validation Webhook enforcing it. The default value is `true`.
Likewise, Services of type LoadBalancer could be blocked for various reasons. To prevent tenant owners from creating these kinds of services, the cluster admin can forbid them:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  serviceOptions:
    allowedServices:
      loadBalancer: false
```
With the above configuration, any attempt of Alice to create a Service of type `LoadBalancer` is denied by the Validation Webhook enforcing it. The default value is `true`.
Note: This feature is offered only by API type `GatewayClass` in group `gateway.networking.k8s.io` version `v1`.
GatewayClass is a cluster-scoped resource defined by the infrastructure provider. This resource represents a class of Gateways that can be instantiated. Read More
Bill can assign a set of dedicated GatewayClasses to the `solar` tenant to force the applications in the `solar` tenant to be published only by the assigned Gateway Controller:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  gatewayOptions:
    allowedClasses:
      matchLabels:
        env: "production"
```
With the said Tenant specification, Alice can create a Gateway resource if `spec.gatewayClassName` equals to:

- any `GatewayClass` which has the label `env` with the value `production`

If a `Gateway` is going to use a non-allowed `GatewayClass`, it will be rejected by the Validation Webhook enforcing it.
Alice can create a `Gateway` using only an allowed `GatewayClass`:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: solar-production
spec:
  gatewayClassName: customer-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```
Any attempt of Alice to use a non-valid `GatewayClass`, or to omit it, is denied by the Validation Webhook enforcing it.
Note: The Default GatewayClass must have a label which is allowed within the tenant. This behavior is only implemented this way for the GatewayClass default.
This feature allows specifying a custom default value on a Tenant basis. Currently there is no global default feature for a GatewayClass. Each Gateway must have a `spec.gatewayClassName` set.
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  gatewayOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"
```
Here's what the Tenant default `GatewayClass` could look like:
```
kubectl apply -f - << EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: tenant-default
  labels:
    env: "production"
spec:
  controllerName: example.com/gateway-controller
EOF
```
If a `Gateway` has no value for `spec.gatewayClassName`, the `tenant-default` `GatewayClass` is automatically applied to the `Gateway` resource.
Bill can control ingress hostnames in the `solar` tenant to force the applications to be published only using the given hostname or set of hostnames:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedHostnames:
      allowed:
      - solar.acmecorp.com
      allowedRegex: ^.*acmecorp.com$
```
The Capsule controller assures that all Ingresses created in the tenant can use only one of the valid hostnames. Alice can create an Ingress using any allowed hostname:
```
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: solar
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF
```
Any attempt of Alice to use a non-valid hostname is denied by the Validation Webhook enforcing it.
In a multi-tenant environment, as more and more ingresses are defined, there is a chance of collision on the hostname leading to unpredictable behavior of the Ingress Controller. Bill, the cluster admin, can enforce hostname collision detection at different scope levels:
- Cluster
- Tenant
- Namespace
- Disabled (default)
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  - name: joe
    kind: User
  ingressOptions:
    hostnameCollisionScope: Tenant
```
When a tenant owner creates an Ingress resource, Capsule will check the collision of hostname in the current ingress with all the hostnames already used, depending on the defined scope.
For example, Alice, one of the tenant owners, creates an Ingress:
```
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF
```
Another user, Joe, creates an Ingress with the same hostname:
```
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-development
spec:
  rules:
  - host: web.solar.acmecorp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF
```
When a collision is detected at the scope defined by `spec.ingressOptions.hostnameCollisionScope`, the creation of the Ingress resource will be rejected by the Validation Webhook enforcing it. When `spec.ingressOptions.hostnameCollisionScope=Disabled` (default), no collision detection is made at all.
Bill, the cluster admin, can deny the use of wildcard hostnames in Ingresses. Let's assume that Acme Corp. uses the domain acme.com.

As a tenant owner of `solar`, Alice could create an Ingress with a host like `- host: "*.acme.com"`. That can lead to problems for the `water` tenant, because Alice could deliberately capture traffic intended for `host: water.acme.com`.

To avoid this kind of problem, Bill can deny the use of wildcard hostnames in the following way:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowWildcardHostnames: false
```
Doing this, Alice will not be able to use wildcard hostnames like `*.water.acme.com`, being the tenant owner of solar and green only.
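For example, an attempt by Alice to create an Ingress with a wildcard host would fail. A minimal sketch; the exact error output is illustrative:

```
kubectl --as alice --as-group projectcapsule.dev -n solar-production create ingress wildcard --rule="*.acme.com/*=nginx:80"
Error from server: admission webhook denied the request: hostname *.acme.com is a wildcard and not allowed for the tenant solar
```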
An Ingress Controller is used in Kubernetes to publish services and applications outside of the cluster. An Ingress Controller can be provisioned to accept only Ingresses with a given Ingress Class.
Bill can assign a set of dedicated Ingress Classes to the `solar` tenant to force the applications in the `solar` tenant to be published only by the assigned Ingress Controller:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedClasses:
      matchLabels:
        env: "production"
```
With the said Tenant specification, Alice can create an Ingress resource if `spec.ingressClassName` or the annotation `kubernetes.io/ingress.class` equals to:

- any IngressClass which has the label `env` with the value `production`
If an Ingress is going to use a non-allowed IngressClass, it will be rejected by the Validation Webhook enforcing it.
Alice can create an Ingress using only an allowed Ingress Class:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  namespace: solar-production
spec:
  ingressClassName: legacy
  rules:
  - host: solar.acmecorp.com
    http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
```
Any attempt of Alice to use a non-valid Ingress Class, or to omit it, is denied by the Validation Webhook enforcing it.
Note: This feature is offered only by API type IngressClass in group networking.k8s.io version v1. However, the Ingress resource is supported in `networking.k8s.io/v1` and `networking.k8s.io/v1beta1`.
This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (the annotation `ingressclass.kubernetes.io/is-default-class=true`) that acts only at the cluster level. More information: Default IngressClass
It's possible to assign each tenant an Ingress Class which will be used if a class is not set on a per-Ingress basis:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  ingressOptions:
    allowedClasses:
      default: "tenant-default"
      matchLabels:
        env: "production"
```
Here's what the Tenant default IngressClass could look like:
```
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  labels:
    env: "production"
    app.kubernetes.io/component: controller
  name: tenant-default
  annotations:
    ingressclass.kubernetes.io/is-default-class: "false"
spec:
  controller: k8s.io/customer-nginx
EOF
```
If an Ingress has no value for `spec.ingressClassName` or the annotation `kubernetes.io/ingress.class`, the `tenant-default` IngressClass is automatically applied to the Ingress resource.
This feature will be deprecated in a future release of Capsule. Use TenantReplications instead. This also applies if you would like to use another NetworkPolicy implementation, like Cilium.
Kubernetes network policies control network traffic between namespaces and between pods in the same namespace. Bill, the cluster admin, can enforce network traffic isolation between different tenants while leaving to Alice, the tenant owner, the freedom to set isolation between namespaces in the same tenant or even between pods in the same namespace.
To meet this requirement, Bill needs to define network policies that deny pods belonging to Alice's namespaces to access pods in namespaces belonging to other tenants, e.g. Bob's tenant `water`, or in system namespaces, e.g. `kube-system`.
Keep in mind that, because of how the NetworkPolicy API works, users can still add a policy which contradicts what the Tenant has set, resulting in users being able to circumvent the initial limitation set by the tenant admin. Two options can be put in place to mitigate this potential privilege escalation:

1. providing a restricted role rather than the default admin one
2. using Calico's GlobalNetworkPolicy or Cilium's CiliumClusterwideNetworkPolicy, which are defined at the cluster level, thus creating an order of packet filtering
Also, Bill can make sure pods belonging to a tenant namespace cannot access other network infrastructures like cluster nodes, load balancers, and virtual machines running other services.
Bill can set network policies in the tenant manifest, according to the requirements:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  networkPolicies:
    items:
    - policyTypes:
      - Ingress
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
            - 192.168.0.0/16
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              capsule.clastix.io/tenant: solar
        - podSelector: {}
        - ipBlock:
            cidr: 192.168.0.0/16
      podSelector: {}
```
The Capsule controller, watching for namespace creation, creates the Network Policies for each namespace in the tenant.
Alice has access to network policies:
```
kubectl -n solar-production get networkpolicies

NAME              POD-SELECTOR   AGE
capsule-solar-0   <none>         42h
```
Alice can create, patch, and delete additional network policies within her namespaces:
```
kubectl -n solar-production auth can-i get networkpolicies
yes
kubectl -n solar-production auth can-i delete networkpolicies
yes
kubectl -n solar-production auth can-i patch networkpolicies
yes
```
For example, she can create:
```
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-network-policy
  namespace: solar-production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF
```
Check all the network policies:

```
kubectl -n solar-production get networkpolicies

NAME                        POD-SELECTOR   AGE
capsule-solar-0             <none>         42h
production-network-policy   <none>         3m
```
And delete the namespace network policies:

```
kubectl -n solar-production delete networkpolicy production-network-policy
```
Any attempt of Alice to delete the tenant network policy defined in the tenant manifest is denied by the Validation Webhook enforcing it. Any deletion by a cluster-administrator will cause the network policy to be recreated by the Capsule controller.
Going forward, cluster administrators should distribute NetworkPolicies via TenantReplications. This is a more flexible and powerful way to distribute NetworkPolicies, as it allows distributing any kind of resource. Here's an example of how to distribute a `CiliumNetworkPolicy` to all the namespaces of a tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: TenantResource
metadata:
  name: solar-networkpolicies
  namespace: solar-system
spec:
  resyncPeriod: 60s
  resources:
  - namespaceSelector:
      matchLabels:
        capsule.clastix.io/tenant: solar
    rawItems:
    - apiVersion: "cilium.io/v2"
      kind: CiliumNetworkPolicy
      metadata:
        name: "l3-rule"
      spec:
        endpointSelector:
          matchLabels:
            role: backend
        ingress:
        - fromEndpoints:
          - matchLabels:
              role: frontend
```
Any Tenant owner is able to create a `PersistentVolumeClaim` that, backed by a given StorageClass, will provide volumes for their applications. In most cases, once a `PersistentVolumeClaim` is deleted, the bound `PersistentVolume` will be recycled.

However, in some scenarios, the `StorageClass` or the provisioned `PersistentVolume` itself could change the retention policy of the volume, keeping it available for recycling and consumable by another Pod.

In such a scenario, Capsule enforces the Volume mount only for the Namespaces belonging to the Tenant in which it was consumed, by adding a label to the Volume as follows.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rancher.io/local-path
  creationTimestamp: "2022-12-22T09:54:46Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    capsule.clastix.io/tenant: solar
  name: pvc-1b3aa814-3b0c-4912-9bd9-112820da38fe
  resourceVersion: "2743059"
  uid: 9836ae3e-4adb-41d2-a416-0c45c2da41ff
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: melange
    namespace: caladan
    resourceVersion: "2743014"
    uid: 1b3aa814-3b0c-4912-9bd9-112820da38fe
```
Once the `PersistentVolume` becomes available again, it can be referenced by any `PersistentVolumeClaim` in the `solar` Tenant Namespace resources.

If another Tenant, like `green`, tries to use it, it will get an error:
```
$ kubectl describe pv pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Name:            pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d
Labels:          capsule.clastix.io/tenant=solar
Annotations:     pv.kubernetes.io/provisioned-by: rancher.io/local-path
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    standard
Status:          Available
...

$ cat /tmp/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: melange
  namespace: green-energy
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  volumeName: pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d

$ kubectl apply -f /tmp/pvc.yaml
Error from server: error when creating "/tmp/pvc.yaml": admission webhook "pvc.capsule.clastix.io" denied the request: PeristentVolume pvc-9788f5e4-1114-419b-a830-74e7f9a33f5d cannot be used by the following Tenant, preventing a cross-tenant mount
```
Persistent storage infrastructure is provided to tenants. Different types of storage requirements, with different levels of QoS, e.g. SSD versus HDD, are available for different tenants according to the tenant's profile. To meet these different requirements, Bill, the cluster admin, can provision different Storage Classes and assign them to the tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  storageClasses:
    matchLabels:
      env: "production"
```
With the said Tenant specification, Alice can create a Persistent Volume Claim if `spec.storageClassName` equals to:

- any `StorageClass` which has the label `env` with the value `production`
Capsule assures that all Persistent Volume Claims created by Alice will use only one of the valid storage classes. Assume the StorageClass `ceph-rbd` has the label `env: production`:
```
kubectl apply -f - << EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc
  namespace: solar-production
spec:
  storageClassName: ceph-rbd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 12Gi
EOF
```
If a Persistent Volume Claim is going to use a non-allowed Storage Class, it will be rejected by the Validation Webhook enforcing it.
Note: This feature supports type StorageClass only on API version `storage.k8s.io/v1`.
This feature allows specifying a custom default value on a Tenant basis, bypassing the global cluster default (`.metadata.annotations.storageclass.kubernetes.io/is-default-class=true`) that acts only at the cluster level. See the Default Storage Class section in the Kubernetes documentation.
It's possible to assign each tenant a StorageClass which will be used if no value is set on a per-Persistent Volume Claim basis:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  storageClasses:
    default: "tenant-default"
    matchLabels:
      env: "production"
```
Here's what the new Storage Class could look like:
```
kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-default
  labels:
    env: production
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
```
If a Persistent Volume Claim has no value for `spec.storageClassName`, the `tenant-default` value will be used on new Persistent Volume Claim resources.
Bill is a cluster admin providing a Container as a Service platform using shared nodes.
Alice, a Tenant Owner, can start containers using private images: according to the Kubernetes architecture, the kubelet downloads the image layers into its local cache.

Bob, an attacker, could try to schedule a Pod on the same node where Alice is running her Pods backed by private images: using `ImagePullPolicy=IfNotPresent`, he would be able to start them even without the required authentication, since the image is cached on the node.
To avoid this kind of attack, Bill, the cluster admin, can force Alice, the tenant owner, to start her Pods using only the allowed values for ImagePullPolicy, enforcing the kubelet to check the authorization first.
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  imagePullPolicies:
  - Always
```
Allowed values are: `Always`, `IfNotPresent`, `Never`, as defined by the Kubernetes API.
Any attempt of Alice to use a disallowed `imagePullPolicy` value is denied by the Validation Webhook enforcing it.
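For example, with only `Always` allowed, a Pod requesting `IfNotPresent` would be rejected. A minimal sketch; the exact error output is illustrative:

```
kubectl -n solar-production apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
EOF
Error from server: admission webhook denied the request: ImagePullPolicy IfNotPresent for container nginx is forbidden, use one of the allowed values: Always
```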
Bill, the cluster admin, can set a strict policy on the applications running in Alice's tenant: he'd like to allow running only images hosted on a list of specific container registries.

The `spec.containerRegistries` field addresses this task, providing hard enforcement through a combination of a list of allowed values and a regular expression.
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  containerRegistries:
    allowed:
    - docker.io
    - quay.io
    allowedRegex: 'internal.registry.\\w.tld'
```
In case of a Pod running non-FQCI (non fully qualified container image) containers, the container registry enforcement will disallow the execution. If you would like to run a `busybox:latest` container, which is commonly hosted on Docker Hub, the Tenant Owner has to specify its name explicitly, like `docker.io/library/busybox:latest`.
A Pod running `internal.registry.foo.tld/capsule:latest` will be allowed, as well as one running `internal.registry.bar.tld`, since both match the regular expression.
A catch-all regex entry such as `.*` allows every kind of registry, which gives the same result as leaving `.spec.containerRegistries` unset.
Any attempt of Alice to use a non-allowed `.spec.containerRegistries` value is denied by the Validation Webhook enforcing it.
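For example, a Pod referencing an image from a registry outside the allowed set would be rejected. A minimal sketch; the registry name and error output are illustrative:

```
kubectl -n solar-production run nginx --image ghcr.io/acme/nginx:latest
Error from server: admission webhook denied the request: Container image ghcr.io/acme/nginx:latest registry is not allowed for the tenant solar
```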
Bill needs to cordon a Tenant and its Namespaces for several reasons:
- Avoid accidental resource modification(s) including deletion during a Production Freeze Window
- During the Kubernetes upgrade, to prevent any workload updates
- During incidents or outages
- During planned maintenance of a dedicated nodes pool in a BYOD scenario
With this in place, the Tenant Owner and the related Service Accounts living in the managed Namespaces cannot perform any create, update, or delete action.
This is possible by just toggling the specific Tenant specification:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  cordoned: true
  owners:
  - kind: User
    name: alice
```
Any operation performed by Alice, the Tenant Owner, will be rejected by the Admission controller.
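For example, while the tenant is cordoned, any write attempt by Alice fails (the exact error output is illustrative):

```
$ kubectl --as alice --as-group projectcapsule.dev -n solar-dev create deployment nginx --image nginx
Error from server: admission webhook denied the request: tenant solar is cordoned, write operations are forbidden
```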
Uncordoning can be done by removing the said specification key:
```
$ cat <<EOF | kubectl apply -f -
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  cordoned: false
  owners:
  - kind: User
    name: alice
EOF

$ kubectl --as alice --as-group projectcapsule.dev -n solar-dev create deployment nginx --image nginx
deployment.apps/nginx created
```
The cordoned state is also reported in the status of the tenant:
```
kubectl get tenants

NAME     STATE      NAMESPACE QUOTA   NAMESPACE COUNT   NODE SELECTOR   AGE
bronze   Active                       2                                 3d13h
gold     Active                       2                                 3d13h
solar    Cordoned                     4                                 2d11h
silver   Active                       2                                 3d13h
```
Sometimes it is important to protect business-critical tenants from accidental deletion. This can be achieved by toggling the `preventDeletion` specification key on the tenant:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: solar
spec:
  owners:
  - name: alice
    kind: User
  preventDeletion: true
```
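With this set, any attempt to delete the tenant is rejected by the Validation Webhook (the exact error output is illustrative):

```
$ kubectl delete tenant solar
Error from server: admission webhook denied the request: tenant solar is protected and cannot be deleted
```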