title | authors | reviewers | creation-date | last-updated | status | see-also | replaces | superseded-by | |||
---|---|---|---|---|---|---|---|---|---|---|---|
Single Controller Multitenancy |
|
|
2020-07-20 |
2020-07-20 |
implementable |
- Single Controller Multitenancy
- Table of Contents
- Glossary
- Summary
- Motivation
- Proposal
- Requirements
- Upgrade Strategy
- Additional Details
- Implementation History
- Principal Type - One of several ways to provide a form of identity that is ultimately resolved to an Azure Active Directory (AAD) Principal.
- Authorizer - An implementation of the Azure SDK for Golang's Authorizer Interface.
- CAPZ - An abbreviation of Cluster API Provider Azure.
The CAPZ operator is able to manage Azure cloud infrastructure within the permission scope of the AAD principal it is initialized with, usually through environment vars. The CAPZ operator will be provided credentials via the deployment, either explicitly via environment variables or implicitly via the default SDK credential provider chain, including Azure instance metadata service via User Assigned Identities.
In addition, CAPZ uses the environmentally configured identity for the lifetime of the deployment. This also means that an AzureCluster could be broken if the instance of CAPZ that created it is misconfigured for another set of credentials.
This proposal outlines new capabilities for CAPZ to assume a different AAD Principal, at runtime, on a per-cluster basis. The proposed changes would be fully backwards compatible and maintain the existing behavior with no changes to user configuration required.
For large organizations, especially highly-regulated organizations, there is a need to be able to perform separate duties at various levels of infrastructure - permissions, networks and accounts. Azure Role Based Authorization (RBAC) Controls provides a model which allows admins to provide identities with the least privilege to perform activities. Within this model it is appropriate for tooling running within the 'management' account to manage infrastructure within the 'workload' accounts. Within the AAD model, the controller can assume the identity of the cluster to perform cluster management activities. For CAPZ to be most useful within these organizations it will need to support multi-account models.
Some organizations may also delegate the management of clusters to another third-party. In that case, the boundary between organizations needs to be secured. In AAD, this can be accomplished by providing a third party AAD principal RBAC access to the Azure resources required to manage cluster infrastructure.
Because a single deployment of the CAPZ operator may reconcile many clusters in its lifetime, it is
necessary to modify the CAPZ operator to scope its Azure Authorizer
to within the reconciliation
process.
Unlike AWS, Azure doesn't provide mechanisms for assuming roles across account boundaries, but rather allows RBAC rights to be enabled across account boundaries. This is the largest break in this proposal with regard to prior art of the CAPA multitenancy proposal
- To enable AzureCluster resources reconciliation using a cluster specified AAD Identity
- To maintain backwards compatibility and cause no impact for users who don't intend to make use of this capability
Alex is an engineer in a large organization which has a strict Azure account architecture. This architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with AAD identity having RBAC rights to provision the infrastructure only in the Subscription. The workload clusters must run with a System Assigned machine identity. The organization has adopted Cluster API in order to manage Kubernetes infrastructure, and expects 'management' clusters running the Cluster API controllers to manage 'workload' clusters in dedicated Azure Subscriptions with an AAD account which only has access to that Subscription.
The current configuration exists:
- Subscription for each cluster
- AAD Service Principals with Subscription Owner rights for each Subscription
- A management Kubernetes cluster running Cluster API Provider Azure controllers
Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD Service Principal by creating new Cluster API resources in the management cluster. Each of the workload cluster machines would run as the System Assigned identity described in the Cluster API resources. The CAPZ controller in the management cluster uses the Service Principal credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.
Alex is an engineer in a large organization which has a strict Azure account architecture. This architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with AAD identity having RBAC rights to provision the infrastructure only in the Subscription. The workload clusters must run with a System Assigned machine identity.
Erin is a security engineer in the same company as Alex. Erin is responsible for provisioning identities. Erin will create a Service Principal for use by Alex to provision the infrastructure in Alex's cluster. The identity Erin creates should only be able to be used in a predetermined Kubernetes namespace where Alex will define the workload cluster. The identity should be able to be used by CAPZ to provision workload clusters in other namespaces.
The organization has adopted Cluster API in order to manage Kubernetes infrastructure, and expects 'management' clusters running the Cluster API controllers to manage 'workload' clusters in dedicated Azure Subscriptions with an AAD account which only has access to that Subscription.
The current configuration exists:
- Subscription for each cluster
- AAD Service Principals with Subscription Owner rights for each Subscription
- A management Kubernetes cluster running Cluster API Provider Azure controllers
Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD Service Principal by creating new Cluster API resources in the management cluster in the predetermined namespace. Each of the workload cluster machines would run as the System Assigned identity described in the Cluster API resources. The CAPZ controller in the management cluster uses the Service Principal credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.
Erin can provision an identity in a namespace of limited access and define the allowed namespaces, which will include the predetermined namespace for the workload cluster.
Erin is an engineer working in a large organization. Erin does not want to be responsible for ensuring Service Principal secrets are rotated on a regular basis. Erin would like to use an Azure User Assigned Identity to provision workload cluster infrastructure. The User Assigned Identity will have the RBAC rights needed to provision the infrastructure in Erin's subscription.
The current configuration exists:
- Subscription for the workload cluster
- A User Assigned Identity with RBAC with Subscription Owner rights for the Subscription
- A management Kubernetes cluster running Cluster API Provider Azure controllers
Erin can provision a new workload cluster in the specified Subscription with the Azure User Assigned Identity by creating new Cluster API resources in the management cluster. The CAPZ controller in the management cluster uses the User Assigned Identity credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.
Dascha is an engineer in a smaller, less strict organization with a few Azure accounts intended to build all infrastructure. There is a single Azure Subscription named 'dev', and Dascha wants to provision a new cluster in this Subscription. An existing Kubernetes cluster is already running the Cluster API operators and managing resources in the dev Subscription. Dascha can provision a new cluster by creating Cluster API resources in the existing cluster, omitting the ProvisionerIdentity field in the AzureCluster spec. The CAPZ operator will use the Azure credentials provided in its deployment template.
ACME Industries is offering Kubernetes as a service to other organizations. ACME creates an AAD Identity for each organization and each organization grants that Identity access to provision infrastructure in one or multiple of their Azure Subscriptions. ACME Industries wants to minimise the memory footprint of managing many clusters, and wants to move to having a single instance of CAPZ to managed infrastructure across multiple organizations.
FR1. CAPZ MUST support assuming an identity specified by an AzureCluster.ProvisioningIdentity
.
FR2. CAPZ MUST support static credentials.
FR3. CAPZ MUST prevent privilege escalation allowing users to create clusters in Azure accounts they should not be able to.
FR4. CAPZ SHOULD support credential refreshing modified principal data.
FR5. CAPZ SHOULD provide validation for principal data submitted by users.
FR6. CAPZ MUST support clusterctl move scenarios.
NFR1. Each instance of CAPZ SHOULD be able to support 200 clusters using role assumption.
NFR2. CAPZ MUST call AAD APIs only when necessary to prevent rate limiting.
NFR3. Unit tests MUST exist for all credential provider code.
NFR4. e2e tests SHOULD exist for all credential provider code.
NFR5. Credential provider code COULD be audited by security engineers.
The current implementation of CAPZ requests new instances of Azure services per cluster and sub-cluster resources. The input for these services is a ClusterScope, which provides the identity Azure service will operate as.
type ClusterScope struct {
logr.Logger
client client.Client
patchHelper *patch.Helper
AzureClients
Cluster *clusterv1.Cluster
AzureCluster *infrav1.AzureCluster
}
The ClusterScope contains the information needed to make an authenticated request against an
Azure cloud. The Authorizer
currently contains the AAD identity information loaded from the CAPZ
controller environment. The Authorizer
is used to fetch and refresh tokes for all Azure clients.
// AzureClients contains all the Azure clients used by the scopes.
type AzureClients struct {
SubscriptionID string
ResourceManagerEndpoint string
ResourceManagerVMDNSSuffix string
Authorizer autorest.Authorizer
}
The signatures for the functions which create these instances are as follows:
func NewClusterScope(params ClusterScopeParams) (*ClusterScope, error) {
...
return &ClusterScope{
...
}, nil
}
The proposed changes below only apply to infrastructure nodes which run the CAPZ controllers. These changes have no impact on the identities specified on AzureMachines or other workload infrastructure.
AAD Pod Identity enables Kubernetes applications to access cloud resources securely using AAD Identities, Service Principal and User Assigned Identities. To enable CAPZ to authenticate as a User Assigned Identity, CAPZ needs to have access to the Azure IMDS service running on the host. This can be accomplished indirectly by using AAD Pod Identity.
With AAD Pod Identity running within the management cluster CAPZ can create AzureIdentityBindings and other related structures to enable CAPZ to bind to Azure Identities.
To use Azure Managed Identities, the infrastructure nodes hosting the CAPZ controller must be hosted in Azure. Outside of Azure, Service Principal identities are the only available identity type.
Changed Resources
AzureCluster
New Resources
Cluster scoped resources
AzureClusterIdentity
represents the information needed to create an AzureIdentity and AzureIdentityBinding via Pod Identity. The type should also contain a string list representing the namespace which it is allowed to be used.
Changes to AzureCluster
A new field is added to the AzureClusterSpec
to reference an AzureClusterIdentity
. We intend to use
corev1.LocalObjectReference
in order to ensure that the only objects that can be references are
either in the same namespace or are scoped to the entire cluster.
// AzureClusterIdentity is the Schema for the azureclustersidentities API
type AzureClusterIdentity struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec AzureClusterIdentitySpec `json:"spec,omitempty"`
Status AzureClusterIdentityStatus `json:"status,omitempty"`
}
type AzureClusterIdentitySpec struct {
// UserAssignedMSI or Service Principal
Type IdentityType `json:"type"`
// User assigned MSI resource id.
// +optional
ResourceID string `json:"resourceID,omitempty"`
// Both User Assigned MSI and SP can use this field.
ClientID string `json:"clientID"`
// ClientSecret is a secret reference which should contain either a Service Principal password or certificate secret.
// +optional
ClientSecret corev1.SecretReference `json:"clientSecret,omitempty"`
// Service principal primary tenant id.
TenantID string `json:"tenantID"`
// AllowedNamespaces is an array of namespaces that AzureClusters can
// use this Identity from.
//
// An empty list (default) indicates that AzureClusters can use this
// Identity from any namespace. This field is intentionally not a
// pointer because the nil behavior (no namespaces) is undesirable here.
// +optional
AllowedNamespaces []string `json:"allowedNamespaces"`
}
type AzureClusterSpec struct {
...
// +optional
IdentityRef *corev1.ObjectReference `json:"identityRef,omitempty"`
Example:
Service Principal
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureCluster
metadata:
name: cluster1
namespace: default
spec:
identityRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
name: sp-identity
namespace: default
location: westus2
networkSpec:
vnet:
name: cluster1-vnet
resourceGroup: cluster1
subscriptionID: 8000873c-41a6-11eb-8907-4db4e45a2e79
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
metadata:
name: sp-identity
namespace: default
spec:
clientID: 74e870e6-cac6-11ea-87d0-0242ac130003
clientSecret:
name: secretName
namespace: secretNamespace
tenantID: 6bec3eaa-cac6-11ea-87d0-0242ac130003
type: ServicePrincipal
User Assigned Identity
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureCluster
metadata:
name: cluster1
namespace: default
spec:
identityRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
name: sp-identity
namespace: default
location: westus2
networkSpec:
vnet:
name: cluster1-vnet
resourceGroup: cluster1
subscriptionID: 8000873c-41a6-11eb-8907-4db4e45a2e79
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureClusterIdentity
metadata:
name: sp-identity
namespace: default
spec:
clientID: 74e870e6-cac6-11ea-87d0-0242ac130003
tenantID: 6bec3eaa-cac6-11ea-87d0-0242ac130003
type: UserAssignedMSI
- If IdentityRef is specified, the CRD is fetched and used to create an
azure.Authorizer
- The controller will compare the hash of the credential provider against the same secret’s provider in a cache (NFR2).
- The controller will take the newer of the two and instantiate Azure clients with the selected
Authorizer
. - The controller will set the
AzureClusterIdentity
resource as one of the OwnerReferences of theAzureCluster
if the resource is in the same namespace as theAzureCluster
. - The controller should reconcile
AzureClusterIdentity
and create correspondingAzureIdentity
andAzureIdentityBinding
resources within the AAD Pod Identity namespace to enable the management cluster to use the identities specified. - The controller should have a label which will be used by the
AzureIdentityBinding
to inform AAD Pod Identity.
Today, clusterctl move
operates by tracking objectreferences within the same namespace, since we
are now proposing to use cluster-scoped resources, we will need to add requisite support to
clusterctl's object graph to track ownerReferences pointing at cluster-scoped resources, and ensure
they are moved. We will naively not delete cluster-scoped resources during a move, as they maybe
referenced across namespaces.
Implementations for all principal types will implement the AutoREST Authorizer
interface as well
as an additional function signature to support caching.
// Authorizer is the interface that provides a PrepareDecorator used to supply request
// authorization. Most often, the Authorizer decorator runs last so it has access to the full
// state of the formed HTTP request.
type Authorizer interface {
WithAuthorization() PrepareDecorator
}
type AzureIdentityProvider interface {
Authorizer
// Hash returns a unique hash of the data forming the credentials
// for this principal
Hash() (string, error)
}
Azure client sessions are structs implementing the Authorizer
interface. The Azure SDK for Golang
will verify the token in the Authorizer
has not expired. If the token has expired, the
Authorizer
will attempt to fetch a new token from AAD.
The controller will maintain a cache of all the Authorizers
used by the clusters. In practice,
this should be a single Authorizer
for the cluster. If we were to create clusters which span
Azure AAD Tenants, then there may be a need for multiple Authorizers
per cluster.
The intended RBAC model mirrors that for Service APIs:
For the purposes of this security model, 3 common roles have been identified:
-
Infrastructure provider: The infrastructure provider (infra) is responsible for the overall environment that the cluster(s) are operating in or the PaaS provider in a company.
-
Management cluster operator: The cluster operator (ops) is responsible for administration of the Cluster API management cluster. They manage policies, network access, application permissions.
-
Workload cluster operator: The workload cluster operator (dev) is responsible for management of the cluster relevant to their particular applications .
There are two primary components to the Service APIs security model: RBAC and namespace restrictions.
RBAC (role-based access control) is the standard used for Kubernetes authorization. This allows users to configure who can perform actions on resources in specific scopes. RBAC can be used to enable each of the roles defined above. In most cases, it will be desirable to have all resources be readable by most roles, so instead we'll focus on write access for this model.
AzureServicePrincipal, etc | Azure RBAC API | Cluster | |
---|---|---|---|
Infrastructure Provider | Yes | Yes | Yes |
Management Cluster Operators | Yes | Yes | Yes |
Workload Cluster Operator | No | No | Yes |
The extra configuration options are not possible to control with RBAC. Instead,
they will be controlled with configuration fields on AzureClusterIdentity
:
- allowedNamespaces: This field is a list of namespaces that can use the
AzureClusterIdentity
from. CAPZ will not support AzureClusters in namespaces outside this selector. An empty selector (default) indicates that AzureCluster can use this Azure Identity from any namespace. This field is intentionally not a pointer because the nil behavior (no namespaces) is undesirable here.
The CAPZ controller will need to:
- Populate condition fields on AzureClusters and indicate if it is compatible with the Azure Identity. For example, if UserAssignedMSI is specified and is not available, a condition should be set indicating failure.
- Not implement invalid configuration. For example, if the AzureCluster references an
AzureClusterIdentity
in an invalid namespace for it, it should indicate it through a condition or ignore. - Respond to changes in an
AzureClusterIdentity
spec.
For handling many accounts, the number of calls to AAD must be minimised. To minimise the number of
calls to AAD, the cache should store Authorizers
by a key consisting of two parts, the credential
type and the credential name, key={cred-type|cred-name}, value=Authorizer
.
The data changes are additive and optional, so existing AzureCluster specifications will continue to reconcile as before. These changes will only come into play when specifying Azure Identity in the new field in AzureClusterSpec. Upgrades to versions with this new field will be broken.
- Unit tests to validate that the cluster controller can reconcile an AzureCluster when PrincipalRef field is nil, or specified for each principal type.
- Propose performing an initial Azure API call and fail pre-flight if this fails.
- e2e test for each principal type.
- clusterctl e2e test with a move of a self-hosted cluster using a principalRef.
- Support using an Azure Service Principal using the IdentityRef
- Ensure
clusterctl move
works with the mechanism.
- Support using Azure User Assigned Identities
- Documentation describing all identities used in Management and Workload Clusters and their roles
- Full e2e coverage for both Service Principals and User Assigned Identities
- Identity caching to minimize authentication requests
- Two releases since beta.
- Describe cluster identities as the preferred way to provision infrastructure as opposed to env vars
- 2020/07/20: Initial proposal
- 2020/12/17: Initial PR merge #977
- 2020/12/18: Proposal update to reflect the adaptations from #977