Skip to content
This repository has been archived by the owner on Jun 25, 2024. It is now read-only.

Enable TLS-e by default #754

Merged

Conversation

SeanMooney
Copy link
Contributor

@SeanMooney SeanMooney commented Mar 11, 2024

TLS-e is intended to be on by default with an explict opt-out
when not required. This change has already been done for the
contolplane and this commit does the same for the dataplaneNodeSet

This change updates the dataplaneNodeset TLSEnabled default to true
the envtests are modifed to account for this change and
the dataplane-create-test kuttl test is exented to assert the default
behavior.

exisiting test covergeage fo tls supprot is unmodifed.

The base example cr is also updated to make it clear that tls is the
default however that change is not strictly requried.

With this change the tempest jobs should now default to useing
TLS unless they explictly opt out.

Related: OSPRH-2382

@vakwetu
Copy link
Contributor

vakwetu commented Mar 11, 2024

I'll be super happy when this lands, but we're not quite there yet. Specifically we're waiting on some work to be completed on the control plane side.

The control plane is not yet enabled for internal TLS by default.

https://github.com/openstack-k8s-operators/openstack-operator/blob/81a98c76364917b7e555711505353e4a6f79b129/apis/core/v1beta1/openstackcontrolplane_types.go#L79

@vakwetu vakwetu requested a review from stuggi March 11, 2024 21:22
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/0488afe1d9834cf98c2fe8a38188c724

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 09s
podified-multinode-edpm-deployment-crc FAILURE in 1h 16m 00s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 12m 26s

@SeanMooney
Copy link
Contributor Author

I'll be super happy when this lands, but we're not quite there yet. Specifically we're waiting on some work to be completed on the control plane side.

The control plane is not yet enabled for internal TLS by default.

https://github.com/openstack-k8s-operators/openstack-operator/blob/81a98c76364917b7e555711505353e4a6f79b129/apis/core/v1beta1/openstackcontrolplane_types.go#L79

oh ok so we need a patch to enable internal tls on the contolplane first and then use depend on form this to that. to test them together before we can proceed with this

@SeanMooney
Copy link
Contributor Author

ok so this fails copying the cert the same way it was failing for me locally last week

https://logserver.rdoproject.org/54/754/c672cf974e19da1492e5cb219c8ae6348856c1d5/github-check/podified-multinode-edpm-deployment-crc/51a7438/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/pods/install-certs-edpm-deployment-4jcgf/logs/openstackansibleee.log

i tried enabling internal tls after the contolane and daata plane was intillly deployed and it never seamed to create the tls keys for the the compute node

@SeanMooney
Copy link
Contributor Author

ah there is a bug in the generic edpm_install_certs role @vakwetu

the combined secte has compute-0.ci-rdo.local-tls.key but its trying to copy
var/lib/openstack/certs/libvirt/compute-0-tls.key

so the generic role is copying using the hostname but the comnined secrert is using the fqdn

this is likely because the hostname filed is the fqdn but the node name is the hostname not the fqdn.

this can either be fixed in the ansible code or in the dataplane operator go code but we need to deciend on one or the other.

@vakwetu
Copy link
Contributor

vakwetu commented Mar 12, 2024

@SeanMooney nice catch.

The generic secret is using the node hostName as in here: https://github.com/openstack-k8s-operators/dataplane-operator/blob/main/pkg/deployment/cert.go#L136 , whereas the install-certs role is copying using the inventory_hostname - https://github.com/openstack-k8s-operators/edpm-ansible/blob/f98ed5f09467620a00e2936c27a80120b634a56c/roles/edpm_install_certs/tasks/copy_certs_and_keys.yaml#L49

I'm not sure, but is inventory_hostname being populated here :

host := nodeSetGroup.AddHost(strings.Split(node.HostName, ".")[0])
?

I have no particular preference for either method. However, as I'll be at a conference for the next few days - please go ahead and choose one accordingly. I'll review as I can.

@SeanMooney
Copy link
Contributor Author

SeanMooney commented Mar 12, 2024

thanks this is exactly what i was expcting one is
hostName := node.HostName
and the other is using
strings.Split(node.HostName, ".")[0]

while i generally recommend having unique hostnames not just unique fqdns it might be better to update the role to use ansible_hostname instead of inventory_hostname

inventory_hostname in this case is edpm-compute-0

 nodes:
    edpm-compute-0:
      ansible:
        ansibleHost: 192.168.122.100
        ansibleUser: ""
      hostName: compute-0.ci-rdo.local
      networks:
      - defaultRoute: true
        fixedIP: 192.168.122.100
        name: CtlPlane
        subnetName: subnet1
      - name: InternalApi
        subnetName: subnet1
      - name: Storage
        subnetName: subnet1
      - name: Tenant
        subnetName: subnet1

is translated into the following in the inventory

hosts:
        compute-0:
            ansible_host: 192.168.122.100
            canonical_hostname: compute-0.ci-rdo.local
            ctlplane_cidr: 24

so if we use host_vars[inventory_hostname][canonical_hostname]
or we use ansible_hostname

we will match the key in the cert secrete

i might just use host_vars[inventory_hostname][canonical_hostname]
to ensure its matching the hostname we declared rather then relying on it being discovert via ansibel gather_facts since we are always generating canonical_hostname and i dont need toworry about zuul/molecue names being differnt then we expect.

@SeanMooney
Copy link
Contributor Author

i believe openstack-k8s-operators/edpm-ansible#596 will fix this

@SeanMooney
Copy link
Contributor Author

check-rdo lets see if the depends on resolves the name missmatch

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/0205b3f0fa9546cc89a7e2d170b73999

openstack-k8s-operators-content-provider FAILURE in 9m 42s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@SeanMooney
Copy link
Contributor Author

check-rdo content-provider failed

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a4ffc2ccb483470990dd2e88d9d95ca9

openstack-k8s-operators-content-provider FAILURE in 7m 06s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@SeanMooney
Copy link
Contributor Author

we are getting connection resets in the content provider

Get "https://proxy.golang.org/github.com/openstack-k8s-operators/lib-common/modules/common/@v/v0.3.1-0.20240229121803-169ced56d56e.mod\": read tcp 38.102.83.153:51364->142.251.33.177:443: read: connection reset by peer", "make: *** [Makefile:412: gowork] Error 1"

ill try this again later and see if i can run this locally in the mean time

@SeanMooney
Copy link
Contributor Author

check-rdo lets see if the package download issues have resolved themselves

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/be302e29aea64931bed5e245b255b4fb

openstack-k8s-operators-content-provider FAILURE in 9m 24s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@SeanMooney
Copy link
Contributor Author

ok so my fix as written wont quite work

given

    nodes:
      edpm-compute-0:
        ansible:
          ansibleHost: 192.168.122.100
          ansibleUser: ""
        hostName: edpm-compute-0
        networks:
        - defaultRoute: true
          fixedIP: 192.168.122.100
          name: CtlPlane
          subnetName: subnet1
        - name: InternalApi
          subnetName: subnet1
        - name: Storage
          subnetName: subnet1
        - name: Tenant
          subnetName: subnet1

the dataplae operator calulats the host names for edpm-comptue-0 as

   allHostnames:
      edpm-compute-0:
        ctlplane: edpm-compute-0.ctlplane.example.com
        internalapi: edpm-compute-0.internalapi.example.com
        storage: edpm-compute-0.storage.example.com
        tenant: edpm-compute-0.tenant.example.com

and canonical host name is calculated as

edpm-compute-0.ctlplane.example.com

the key in the libvirt secrete however is
edpm-compute-0-tls.key

so canonical_hostname will not always match that key.

this is what the relevnet part of the inventory looks like

   "hosts": {
      "edpm-compute-0": {
        "ansible_host": "192.168.122.100",
        "canonical_hostname": "edpm-compute-0.ctlplane.example.com",
        "ctlplane_cidr": 24,
        "ctlplane_dns_nameservers": [
          "192.168.122.80"
        ],

so really to fix this we need a change to the go code.

we need to normalise the keys of the secrete to either rthe inventory_hostname or the canonical name

@SeanMooney
Copy link
Contributor Author

SeanMooney commented Mar 12, 2024

the Depends-On: openstack-k8s-operators/edpm-ansible#596 should now be flipped and that should depend on this patch but ill wait for the ci jobs to complete before doing that.

--- later---
actually maybe not we might need to update the ansible first or split this into two prs

the job default change cant happen until the ansible code change an dhte ansible code change requires the go code change.

in anycase we like will want to merge both together if we decide to proceed with these patches.

we could merge the ansible change first since no job is testing it yet but it will technically break deployment that done have the contolpalne FQDN set in the hostname field of the nodes

install_yamls does not use the fqdn but the ci_framework does in the jobs so this would only be a minor local issues while we with for both to merge.

@SeanMooney
Copy link
Contributor Author

ok cool so this work for tempest i just need to fix the kuttl tests.
I'm going to keep the depndeinces they way i have them now.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/4cf9340b223b4335afabf15da2241ba5

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 16m 59s
podified-multinode-edpm-deployment-crc FAILURE in 1h 43m 47s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 54m 43s

@rabi
Copy link
Contributor

rabi commented Mar 14, 2024

TLS-e is intended to be on by default with an explict opt-out when not required

This is orthogonal to what we've had in earlier OSP versions where it was an opt-in. Have we discussed and agreed on the general and upgrade implications for customers and normal non-production deployments?

@SeanMooney
Copy link
Contributor Author

yes the aggree strage is that this will not be change as part of adoption.
meaning if your coming form a non tls-e env you will be expect to disabel tls in the contolplane and datapalne as part of the adoption procedure.

for greenfiled the intent is that all new greenfeild deployments should be tls-e but you can use the same option to opt our in greenfiled as is used for adoption.

i do not belive there is any intent to force tls-e only

we did discuss that as an option and decide the impact to customer was not warranted and default to on was sufficent.

TLS-e is intended to be on by default with an explict opt-out
when not required. This change has already been done for the
contolplane and this commit does the same for the dataplaneNodeSet

This change updates the dataplaneNodeset TLSEnabled default to true
the envtests are modifed to account for this change and
the dataplane-create-test kuttl test is exented to assert the default
behavior.

exisiting test covergeage fo tls supprot is unmodifed.

The base example cr is also updated to make it clear that tls is the
default however that change is not strictly requried.

With this change the tempest jobs should now default to useing
TLS unless they explictly opt out.

Related: OSPRH-2382
Signed-off-by: Fabricio Aguiar <[email protected]>
@olliewalsh
Copy link
Contributor

With this change the tempest jobs should now default to useing
TLS unless they explictly opt out.

Not sure how this PR affects tempest

@fao89
Copy link
Collaborator

fao89 commented Apr 2, 2024

/approve

@openshift-ci openshift-ci bot added the approved label Apr 2, 2024
@openshift-ci openshift-ci bot added the lgtm label Apr 2, 2024
Copy link
Contributor

openshift-ci bot commented Apr 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fao89, SeanMooney

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 4059a52 into openstack-k8s-operators:main Apr 2, 2024
8 checks passed
@vakwetu
Copy link
Contributor

vakwetu commented Apr 2, 2024

/lgtm

olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 3, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 4, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 5, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
karelyatin added a commit to karelyatin/architecture that referenced this pull request Apr 5, 2024
It's required with tls enabled, [1] already enabled
tls by default so need to include the install-certs
service before the services requiring the certs.
The issue should be visible once dataplane-operator
is updated[2]

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
karelyatin added a commit to karelyatin/ci-framework that referenced this pull request Apr 5, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the install-certs service before the services
requiring the certs.
The issue should be visible once dataplane-operator is updated[2]
or tlsEnabled flag is explicitly set in the nodeset.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
karelyatin added a commit to karelyatin/ci-framework that referenced this pull request Apr 5, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the install-certs service before the services
requiring the certs.
The issue should be visible once dataplane-operator is updated[2]
or tlsEnabled flag is explicitly set in the nodeset.

Also update nova-custom-ceph dataplane service to
include missing tls params.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
karelyatin added a commit to karelyatin/architecture that referenced this pull request Apr 5, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the missing certs.
The issue should be visible once dataplane-operator is updated[2] or
tlsEnabled flag is explicitly set in the nodeset.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
tosky pushed a commit to tosky/ci-framework that referenced this pull request Apr 7, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the install-certs service before the services
requiring the certs.
The issue should be visible once dataplane-operator is updated[2]
or tlsEnabled flag is explicitly set in the nodeset.

Also update nova-custom-ceph dataplane service to
include missing tls params.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
karelyatin added a commit to karelyatin/architecture that referenced this pull request Apr 8, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the missing certs.
The issue should be visible once dataplane-operator is updated[2] or
tlsEnabled flag is explicitly set in the nodeset.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/ci-framework that referenced this pull request Apr 8, 2024
It's required with tls enabled, [1] already enabled tls by default
so need to include the install-certs service before the services
requiring the certs.
The issue should be visible once dataplane-operator is updated[2]
or tlsEnabled flag is explicitly set in the nodeset.

Also update nova-custom-ceph dataplane service to
include missing tls params.

[1] openstack-k8s-operators/dataplane-operator#754
[2] openstack-k8s-operators/openstack-operator#732
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 8, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 9, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 9, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
olliewalsh added a commit to olliewalsh/openstack-operator that referenced this pull request Apr 9, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
stuggi pushed a commit to stuggi/openstack-operator that referenced this pull request Apr 9, 2024
As there is no longer an external dependency, TLS-e can now be enabled
by default (with explicit opt-out e.g for adoption).

Depends-On: openstack-k8s-operators/dataplane-operator#754

Related: OSPRH-2038
gibizer pushed a commit to SeanMooney/nova-operator that referenced this pull request Apr 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants