-
Notifications
You must be signed in to change notification settings - Fork 47
Conversation
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/09928d8ccf7c4eb18f114acf04ed98b6 ❌ openstack-k8s-operators-content-provider FAILURE in 8m 43s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This general approach looks good to me.
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
bf8c23c
to
339b302
Compare
pkg/deployment/cert.go
Outdated
// create a secret to hold the certs for the service | ||
serviceCertsSecret := &corev1.Secret{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: GetServiceCertsSecretName(instance, serviceName), | ||
Namespace: instance.Namespace, | ||
}, | ||
Data: certsData, | ||
} | ||
_, _, err := secret.CreateOrPatchSecret(ctx, helper, instance, serviceCertsSecret) | ||
if err != nil { | ||
err = fmt.Errorf("Error creating certs secret for %s - %w", serviceName, err) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if we have to create a single secret with the certs, or just could mount certs from several cert secrets based on the label using the secretTamplate
as in https://cert-manager.io/docs/usage/certificate/ . would just have to update https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L167 for having the secretTemplate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion when we pursue the second option, then it means that we can mount only the certs that are needed. Which is in general safer approach. The serviceCertsSecret
aggregates all the individual node certificates into one secret, is this correct? Having one secret with all certificates for a service across all nodes is a point of potential issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I submitted openstack-k8s-operators/lib-common#358 so that annotations/labels can be passed for the secretTemplate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand the way things work, there is a single ansibleEE pod that is created for a service, which installs the service on every node in the nodeset. That means that if there are 1000 nodes in the nodeset, then there are 1000 secrets to be mounted. Is there a limit to the number of mounts that a pod can have? Does it make sense to have 1000 mounts? I created this single secret because I was concerned with not breaching that limit.
But, I also realize that there is a limit in the size of the secret (4M?) So, how to avoid that limit?
-
We can also create multiple secrets (with as many certs as can fit in a secret) - and then just mount all these secrets as volumes.
-
Another alternative is to create a single volume per service and copy all the certs and keys into that volume. The main concern there though is that this volume will contain a bunch of secret stuff - ie. keys. We can specify that this volume is ephemeral and should be deleted - but we don't have any control of when that happens.
So, I'm looking for guidance here. What is the expectation of the size of a nodeset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on some back of the envelope calculations, I figure we can end up with about 500 certs in a single secret.
That seems reasonable as an approach.
We do need to do some work where we create multiple secrets if we go over that amount -- well, really keep track of the secret size as we add certs - but I'd like to leave that to a new PR.
I will look into only removing the cacerts from the secret once the PRs with the cacert chains are merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we rotate a cert for a node, we should only touch its own secret (that also should reduce the numbers of reconciliation loops triggered by an unrelated certificate change in the common secret?). To avoid bind mounting housands of certs in large envs, let's use kolla to copy it in instead, or bind-mount a directory (so we could avoid encrypted volumes requirement) that contains all certificates collected by ansibleEE operator directly from secrets (based on secretTemplate/labels provided), instead of bind-mounts? This is how we could mitigate mounts count and secret size limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bogdando We're using a secret containing a bunch of other secrets here rather than a bind-mounted directory because we're concerned that the directory could stick around awhile after the ansible EE pod is completed, potentially exposing the secrets for all the compute nodes. Secrets have some built-in protections. They are stored encrypted in etcd, so they are safe at rest. They are only sent to the nodes that need them and they are mounted using ramfs - so there is no place where they are written to disk. The secret size limitation is because the data is stored in etcd.
We could potentially use a mounted volume if that volume were encrypted in some way, but as far as I can tell, there is no openstack-style encrypted cinder volume available in k8s, where a volume is mounted with an encryptor/decryptor mechanism. If you know of such a mechanism, I'd be happy to use it. I did find mention of something like that here -- https://docs.portworx.com/portworx-enterprise/operations/key-management/kubernetes-secrets/pvc-encryption-using-annotations -- but not in openshift.
For openshift, there appear to be a couple of options.
https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html/planning_your_deployment/security-considerations_rhocs (configuring encryption at the container storage level)
https://red-hat-storage.github.io/ocs-training/training/ocs4/ocs4-encryption.html#_overview (integration with vault as a kms)
All of this is way beyond the scope of this patch :)
For the reconciliation loop issue, recall that there will be maybe 500 certs in a secret, so the number of extra reconciliations will be somewhat small. Also, we can add a label to the secret of secrets and some code to tell the reconciliation loop to ignore it. We can handle that in a follow on patch when we test reconciations for cert changes.
I propose we go with this mechanism for now - we know it works - and it provides the needed security without too much extra complexity.
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/fff2fa5866a34c88996d27041521cb73 ❌ openstack-k8s-operators-content-provider FAILURE in 9m 19s |
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/72c02d56d08f47188ddd22a458a3f754 ❌ openstack-k8s-operators-content-provider FAILURE in 9m 22s |
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/4cc08449e92944098ffadd557633e0a8 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 25s |
recheck |
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/590cc0244a8346899438cc3cbef60ca5 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 51m 40s |
The dataplane operator will provide certs and keys to the ansibleEE pod in /var/lib/openstack/certs as well as a cacert file in /var/lib/openstack/cacerts. It will also pass an environment variable tls_certs_enabled to indicate that the certs are present. This PR copies these files to the appropriate place on the compute node and then mounts these files in the nova-compute container. We still need to confirm that these files are in the right location in the container and add config to use these certs - as well as code to trust the cacert. Will need help from compute DFG for that. Depends-On: openstack-k8s-operators/dataplane-operator#442
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/43129b7ea2a94a5ab60890e7e0dd864a ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 32s |
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/55fc1925d0c9420d8890adf261550575 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 41m 41s |
4257550
to
3e4e97f
Compare
// HasTLSCerts - Whether the nodes have TLS certs | ||
// +kubebuilder:validation:Optional | ||
// +operator-sdk:csv:customresourcedefinitions:type=spec,xDescriptors={"urn:alm:descriptor:com.tectonic.ui:booleanSwitch"} | ||
HasTLSCerts *bool `json:"hasTLSCerts,omitempty" yaml:"hasTLSCerts,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to also use the field name TLSEnabled here as well? Or, just as a naming convention, call this one TLSCertsEnabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - will use TLSCertsEnabled
|
||
// CACerts - Secret containing the CA certificate chain | ||
// +kubebuilder:validation:Optional | ||
CACerts string `json:"caCerts,omitempty" yaml:"caCerts,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the Issuers and CACerts fields that need to be Service specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - we have been discussing creating a specific CA for ovn related services. This will allow OVN components to only accept communications from other ovn components.
pkg/deployment/cert.go
Outdated
ips = allIPs[nodeName] | ||
|
||
switch service.Name { | ||
case "nova", "libvirt": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to find a way to do this generically, instead of having special handling for individual services. Could we add something to the CRD that indicates what needs to be in the cert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack - we could do something that ends up in the crd or in the service definition. I suggest we refactor this as a follow-on CR so that we have something working while we design this out.
pkg/deployment/inventory.go
Outdated
|
||
// add TLS ansible variable | ||
if instance.Spec.TLSEnabled != nil && *instance.Spec.TLSEnabled { | ||
nodeSetGroup.Vars["tls_certs_enabled"] = "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a variable that multiple roles from edpm-ansible would use right? Could we call this edpm_tls_certs_enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - we can do that
@@ -209,6 +209,24 @@ func (r *OpenStackDataPlaneNodeSetReconciler) Reconcile(ctx context.Context, req | |||
instance.Status.DNSClusterAddresses = dnsClusterAddresses | |||
instance.Status.CtlplaneSearchDomain = ctlplaneSearchDomain | |||
|
|||
// Issue certs for TLS for services that need them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs to be refactored out into a function that can also be called from https://github.com/openstack-k8s-operators/dataplane-operator/blob/main/controllers/openstackdataplanedeployment_controller.go as that can use an entirely different set of services if servicesOverride is set.
In fact...maybe this logic should just live entirely in openstackdataplanedeployment_controller.go. I'm thinking that might be the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - will refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually -- I don't understand this comment entirely .. will sync with you,
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vakwetu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This is completely untested(although it does compile!), but it gives an idea of how EDPM certs could be generated and mounted for each of the services that need them.
While I will test and refine, I'm hoping that this PR starts conversations and questions so that we can decide on an implementation.
The basic idea is as follows: