Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Barbican #562

Merged

Conversation

vakwetu
Copy link
Contributor

@vakwetu vakwetu commented Nov 17, 2023

This commit adds the barbican operator to the openstack-operator.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/96f5fb72a9f441278f1b294d30bce329

openstack-k8s-operators-content-provider FAILURE in 5m 56s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 17, 2023

/retest

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/737d77657c894f67b344e5f3aa9a5ab5

openstack-k8s-operators-content-provider NODE_FAILURE Node request 200-0006650721 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 17, 2023

/retest

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 17, 2023

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/61b5da1db41b4818a27c4ad7a7058186

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 24m 50s
podified-multinode-edpm-deployment-crc FAILURE in 1h 05m 14s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 05m 15s
openstack-operator-tempest-multinode FAILURE in 1h 09m 47s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question here and for all the other samples: do we intend to have Barbican enabled by default? I'm just asking because I don't know one way or the other. Right now it's enabled.

@abays
Copy link
Contributor

abays commented Nov 20, 2023

One thing I've noticed while trying this out is that the rabbitMqClusterName field is empty in the Barbican section of the OpenStackControlPlane CR. This results in the control plane never finishing deployment, because the TransportURL created by the Barbican controller then uses that empty string as the RabbitMQ cluster name (which the Infra operator considers an error and thus it goes no further).

I see that other operators have the rabbitMqClusterName in their base spec struct as opposed to a template struct that is in-lined [1]. Perhaps using that same approach would fix the problem here. I think the webhooks and the kubebuilder defaulting are interfering here and are setting the rabbitMqClusterName to the empty string, which is somehow tricking the validation into allowing such a zero-value as if the user had explicitly provided it. We've seen similar problems in other operators.

I think this is the cause of the CI failure too:

TASK [edpm_prepare : Wait for OpenStack controlplane to be deployed _raw_params=oc wait OpenStackControlPlane {{ cr_name }} --namespace={{ cifmw_install_yamls_defaults['NAMESPACE'] }} --for=condition=ready --timeout={{ cifmw_edpm_prepare_timeout }}m] ***
Friday 17 November 2023  16:49:55 -0500 (0:00:00.724)       0:10:14.725 ******* 
fatal: [localhost]: FAILED! => ...

[1] https://github.com/openstack-k8s-operators/cinder-operator/blob/3c5c40e6cc3aba6e287b94d26770519f76ae9528/api/v1beta1/cinder_types.go#L55-L59

@abays
Copy link
Contributor

abays commented Nov 20, 2023

One thing I've noticed while trying this out is that the rabbitMqClusterName field is empty in the Barbican section of the OpenStackControlPlane CR. This results in the control plane never finishing deployment, because the TransportURL created by the Barbican controller then uses that empty string as the RabbitMQ cluster name (which the Infra operator considers an error and thus it goes no further).

I see that other operators have the rabbitMqClusterName in their base spec struct as opposed to a template struct that is in-lined [1]. Perhaps using that same approach would fix the problem here. I think the webhooks and the kubebuilder defaulting are interfering here and are setting the rabbitMqClusterName to the empty string, which is somehow tricking the validation into allowing such a zero-value as if the user had explicitly provided it. We've seen similar problems in other operators.

I think this is the cause of the CI failure too:

TASK [edpm_prepare : Wait for OpenStack controlplane to be deployed _raw_params=oc wait OpenStackControlPlane {{ cr_name }} --namespace={{ cifmw_install_yamls_defaults['NAMESPACE'] }} --for=condition=ready --timeout={{ cifmw_edpm_prepare_timeout }}m] ***
Friday 17 November 2023  16:49:55 -0500 (0:00:00.724)       0:10:14.725 ******* 
fatal: [localhost]: FAILED! => ...

[1] https://github.com/openstack-k8s-operators/cinder-operator/blob/3c5c40e6cc3aba6e287b94d26770519f76ae9528/api/v1beta1/cinder_types.go#L55-L59

I did some more testing. It seems like the struct definition as-is is actually fine. Something else must be causing CI to fail.

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 20, 2023

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/463c81acca384963926bf21b7d4e3de2

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 05m 31s
podified-multinode-edpm-deployment-crc FAILURE in 1h 06m 44s
cifmw-crc-podified-edpm-baremetal FAILURE in 24m 25s
openstack-operator-tempest-multinode RETRY_LIMIT in 1h 52m 07s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/9f9194d5630b407886782401fd61288f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 24m 53s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 03m 00s
cifmw-crc-podified-edpm-baremetal FAILURE in 23m 06s
openstack-operator-tempest-multinode FAILURE in 1h 10m 26s

@@ -159,6 +159,17 @@ spec:
replicas: 0 # backend needs to be configured
designateBackendbind9:
replicas: 0 # backend needs to be configured
barbican:
enabled: false
Copy link
Contributor

@abays abays Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current samples have Barbican enabled by default, since they don't have enabled: false in the YAML. So we need to reflect that here. Also, since Barbican is enabled by default, its condition entry needs to be added to the status.conditions list after the first entry in the list (the one with type: Ready):

  - message: OpenStackControlPlane Barbican completed
    reason: Ready
    status: "True"
    type: OpenStackControlPlaneBarbicanReady

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also needs an entry for the Barbican "expose" condition:

    - message: OpenStackControlPlane barbican service exposed
      reason: Ready
      status: "True"
      type: OpenStackControlPlaneExposeBarbicanReady

That should go after the OpenStackControlPlaneClientReady condition.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, per JP Jung, the current plan is to always enable Barbican. This also means other projects that integrate with Barbican will be set up to use it by default. e.g. encrypted volumes will store keys there, etc.

@@ -155,6 +155,10 @@ func (r *OpenStackControlPlane) checkDepsEnabled(name string) string {
if !((r.Spec.Mariadb.Enabled || r.Spec.Galera.Enabled) && r.Spec.Memcached.Enabled && r.Spec.Keystone.Enabled) {
reqs = "MariaDB or Galera, Memcached, Keystone"
}
case "Barbican":
if !((r.Spec.Mariadb.Enabled || r.Spec.Galera.Enabled) && r.Spec.Memcached.Enabled && r.Spec.Keystone.Enabled) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, Barbican does not depend or use Memcached in any way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@vakwetu vakwetu force-pushed the add_barbican_2 branch 5 times, most recently from b02981d to 83ceac6 Compare November 21, 2023 23:05
@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 21, 2023

/retest

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f2cd794f02964a72aec1d93f7c3a4913

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 05s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 08m 31s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 05m 06s
openstack-operator-tempest-multinode FAILURE in 1h 31m 54s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/70dd054cee1446b1862c2aac14a77820

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 36m 09s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 03m 02s
cifmw-crc-podified-edpm-baremetal FAILURE in 12m 43s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 21m 21s

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 27, 2023

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/eb80594def0445c5a9c0c03889d478a8

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 50s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 07m 04s
cifmw-crc-podified-edpm-baremetal FAILURE in 14m 25s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 25m 23s

@abays
Copy link
Contributor

abays commented Nov 28, 2023

recheck

error: Failed to attach interface\nerror: Cannot get interface MTU on 'default': No such device

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/ac14f2e0cac9465498c8aeb7338339d3

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 04m 55s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 03m 37s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 58m 09s
openstack-operator-tempest-multinode FAILURE in 1h 33m 15s

@abays
Copy link
Contributor

abays commented Nov 28, 2023

Between the last two Zuul attempts, we have full success for the 4 jobs there. Let's try one more recheck and then we can override if it flakes-out again.

@vakwetu
Copy link
Contributor Author

vakwetu commented Nov 28, 2023

recheck

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a277a3a34c62429fbbe1f69e60f8714c

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 41m 15s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 52m 53s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 03m 20s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 19m 45s

Copy link
Contributor

@abays abays left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Nov 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, vakwetu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abays
Copy link
Contributor

abays commented Nov 28, 2023

The Zuul jobs have all passed independently over several attempts, so I think we're good there despite the hiccups. I will override by the end of the day unless another reviewer requests changes to the PR.

gibizer added a commit to gibizer/nova-operator that referenced this pull request Nov 28, 2023
This adds the barbican section to the nova.conf template so nova can
access the barbican.

Depends-On: openstack-k8s-operators/openstack-operator#562
@abays
Copy link
Contributor

abays commented Nov 29, 2023

/override rdoproject.org/github-check

@gibizer
Copy link
Contributor

gibizer commented Nov 29, 2023

I try to deploy this from a local build and I see that for me barbican is disabled by default even though the code suggests it should be enabled by default.

Copy link
Contributor

openshift-ci bot commented Nov 29, 2023

@abays: Overrode contexts on behalf of abays: rdoproject.org/github-check

In response to this:

/override rdoproject.org/github-check

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abays
Copy link
Contributor

abays commented Nov 29, 2023

/hold

@gibizer
Copy link
Contributor

gibizer commented Nov 29, 2023

I try to deploy this from a local build and I see that for me barbican is disabled by default even though the code suggests it should be enabled by default.

I used the wrong sample and updating it from the sample from this PR did not work. But recreating the OpenStackControlPlane from the sample from this PR works. Sorry for the noise

@abays
Copy link
Contributor

abays commented Nov 29, 2023

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit b296006 into openstack-k8s-operators:main Nov 29, 2023
1 check passed
gibizer added a commit to gibizer/nova-operator that referenced this pull request Nov 30, 2023
This adds the barbican section to the nova.conf template so nova can
access the barbican.

Depends-On: openstack-k8s-operators/openstack-operator#562
gibizer added a commit to gibizer/nova-operator that referenced this pull request Dec 5, 2023
This adds the barbican section to the nova.conf template so nova can
access the barbican.

Depends-On: openstack-k8s-operators/openstack-operator#562
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/nova-operator that referenced this pull request Dec 6, 2023
This adds the barbican section to the nova.conf template so nova can
access the barbican.

Depends-On: openstack-k8s-operators/openstack-operator#562
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants