
Create several services route and svc endpoint overrides #416

Closed
wants to merge 11 commits

Conversation

stuggi
Contributor

@stuggi stuggi commented Jul 28, 2023

Creates the routes for keystoneapi, glance, placement, cinder, neutron, nova, heat, horizon, manila and swift. Also allows customizing the routes via an override.

Generates the service override for the environment from what is configured in externalEndpoints, or from what is specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/glance-operator#285
Depends-On: openstack-k8s-operators/placement-operator#48
Depends-On: openstack-k8s-operators/cinder-operator#248
Depends-On: openstack-k8s-operators/neutron-operator#194
Depends-On: openstack-k8s-operators/nova-operator#489
Depends-On: openstack-k8s-operators/heat-operator#227
Depends-On: openstack-k8s-operators/horizon-operator#202
Depends-On: openstack-k8s-operators/manila-operator#117
Depends-On: openstack-k8s-operators/swift-operator#43

Jira: OSP-26690

@stuggi
Contributor Author

stuggi commented Jul 28, 2023

/hold

@openshift-ci
Contributor

openshift-ci bot commented Jul 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stuggi
Contributor Author

stuggi commented Jul 28, 2023

  • in osctlplane
    • externalEndpoints is now outside the service template
    • it has an override which allows customizing the route
    • the service template has an override which allows customizing the k8s services created for the endpoints
  keystone:
    enabled: true
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
      sharedIP: true
      sharedIPKey: ""
    override:
      route:
        metadata:
          annotations:
            "my": "annotation"
    template:
...
      override:
        service:
          - endpoint: public
            metadata:
              annotations:
                "my": "annotation"

The openstack-operator uses the service override to pass in the expected overrides for the endpoint services, which results in:

apiVersion: keystone.openstack.org/v1beta1
kind: KeystoneAPI 
...
spec:
...
  override:      
    service:      
    - endpoint: public
      endpointURL: http://keystone-public-openstack.apps-crc.testing
      metadata:              
        annotations:
          dnsmasq.network.openstack.org/hostname: keystone-public.openstack.svc
          my: annotation
        labels:                               
          public: "true"         
          service: keystone
      spec:       
        ports:           
        - name: keystone-public               
          port: 5000                   
          protocol: TCP
          targetPort: 0
        selector:           
          service: keystone                   
        type: ClusterIP         
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:     
        annotations:                          
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:         
          internal: "true"                    
          service: keystone              
      spec:      
        ports:    
        - name: keystone-internal
          port: 5000                          
          protocol: TCP         
          targetPort: 0
        selector: 
          service: keystone
        type: LoadBalancer  
  • it is also possible to use just externalEndpoints to use MetalLB for both the public and internal endpoints:
  keystone:
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
    - endpoint: public
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.81
    template:
      databaseInstance: openstack
      secret: osp-secret

results in the following override in keystoneapi:

  override:       
    service:     
    - endpoint: public                        
      endpointURL: http://keystone-public.openstack.svc:5000
      metadata:      
        annotations:
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.81
        labels:                    
          public: "true"
          service: keystone
      spec:              
        ports:                                
        - name: keystone-public          
          port: 5000
          protocol: TCP
          targetPort: 0     
        selector:                             
          service: keystone     
        type: LoadBalancer
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:                               
        annotations:                                   
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:                               
          internal: "true"                 
          service: keystone
      spec:        
        ports:                   
        - name: keystone-internal             
          port: 5000            
          protocol: TCP
          targetPort: 0
        selector:         
          service: keystone                   
        type: LoadBalancer
  • or use the service override to use another LoadBalancer for the public endpoint:
  keystone:
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
    template:
      databaseInstance: openstack
      secret: osp-secret
      override:
        service:
        - endpoint: public
          metadata:
            annotations:
              my: loadbalancerannotation
          spec:
            type: LoadBalancer

results in the following override in keystoneapi:

  override:
    service:
    - endpoint: public
      endpointURL: http://keystone-public.openstack.svc:5000
      metadata:
        annotations:
          my: loadbalancerannotation
        labels:
          public: "true"
          service: keystone
      spec:
        ports:
        - name: keystone-public
          port: 5000
          protocol: TCP
          targetPort: 0
        selector:
          service: keystone
        type: LoadBalancer
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:
        annotations:
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:
          internal: "true"
          service: keystone
      spec:
        ports:
        - name: keystone-internal
          port: 5000
          protocol: TCP
          targetPort: 0
        selector:
          service: keystone
        type: LoadBalancer
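
For illustration, here is a minimal Go sketch of how an externalEndpoints entry could be turned into the MetalLB service annotations seen in the generated overrides above. The type and helper names are made up for this sketch (they are not the actual openstack-operator API), and the allow-shared-ip defaulting is only inferred from the example output.

package main

import (
	"fmt"
	"strings"
)

// ExternalEndpoint mirrors the externalEndpoints entries shown above
// (illustrative field names, not the real CRD types).
type ExternalEndpoint struct {
	Endpoint        string
	IPAddressPool   string
	SharedIPKey     string
	LoadBalancerIPs []string
}

// metallbAnnotations builds the annotations that end up on the generated
// LoadBalancer Service override in the examples above.
func metallbAnnotations(e ExternalEndpoint) map[string]string {
	sharedIP := e.SharedIPKey
	if sharedIP == "" {
		// in the example output the shared-ip value falls back to the pool name
		sharedIP = e.IPAddressPool
	}
	return map[string]string{
		"metallb.universe.tf/address-pool":    e.IPAddressPool,
		"metallb.universe.tf/allow-shared-ip": sharedIP,
		"metallb.universe.tf/loadBalancerIPs": strings.Join(e.LoadBalancerIPs, ","),
	}
}

func main() {
	e := ExternalEndpoint{
		Endpoint:        "internal",
		IPAddressPool:   "internalapi",
		LoadBalancerIPs: []string{"172.17.0.80"},
	}
	fmt.Println(metallbAnnotations(e))
}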

@stuggi stuggi requested review from olliewalsh, gibizer and abays and removed request for viroel July 28, 2023 07:24
@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Changes openstack-k8s-operators/lib-common#313, openstack-k8s-operators/keystone-operator#289 are needed.

apis/core/v1beta1/openstackcontrolplane_types.go (outdated review thread)
svc, err := sd.CreateEndpointServiceOverride()
if err != nil {
	instance.Status.Conditions.Set(condition.FalseCondition(
		condition.ExposeServiceReadyCondition,
Contributor

I think we need to add this condition to the initConditions as well.

Contributor Author

right

Contributor

I have two structural concerns.

  1. we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for the openstack keystone api service pods. Then the openstack-operator reads those out, massages them into a set of overrides, and sends them back to keystone-operator. This feels like an unnecessary coupling between the two operators.
  2. I assumed that we are moving the route management to openstack-operator as it can be done in a generic way, and therefore we can move generic logic from each service operator to a single place in openstack-operator. However, I see that in the current PR this is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

Contributor Author

I have two structural concerns.

  1. we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for the openstack keystone api service pods. Then the openstack-operator reads those out, massages them into a set of overrides, and sends them back to keystone-operator. This feels like an unnecessary coupling between the two operators.

Yeah, thanks for that pointer. Now thinking about this, I think we could get rid of it. It was only required when passing both endpoint URLs into the service operator. We can basically pass only the public endpoint name into the service operator if it is a route; if it is a service (ClusterIP or LoadBalancer), let the service operator create the endpoint information.

Re the service label, I can improve that so it is not required to get it from the api module. When the openstack-operator creates a service label and passes it via the override to the service operator, the operator just has to use this instead of the local default one. Initially I had used the gvk.Kind as a lowercase string, but then realized it's different from the service name const; with the passed-in information we can still do that.

  2. I assumed that we are moving the route management to openstack-operator as it can be done in a generic way, and therefore we can move generic logic from each service operator to a single place in openstack-operator. However, I see that in the current PR this is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

I think the service labels can be done better, see above.

This is not keystone specific; this is what we have right now in the endpoint.ExposeEndpoints() func. I am currently integrating another service (glance) to see what can be done in a more generic way.

As mentioned above, we can do it in a more generic way, but some logic will remain for at least some of the services, e.g. for nova we have to create a route per API and per cell for the vnc proxy.

Contributor

I have two structural concerns.

  1. we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for the openstack keystone api service pods. Then the openstack-operator reads those out, massages them into a set of overrides, and sends them back to keystone-operator. This feels like an unnecessary coupling between the two operators.

Yeah, thanks for that pointer. Now thinking about this, I think we could get rid of it. It was only required when passing both endpoint URLs into the service operator. We can basically pass only the public endpoint name into the service operator if it is a route; if it is a service (ClusterIP or LoadBalancer), let the service operator create the endpoint information.

Sounds good.

Re the service label, I can improve that so it is not required to get it from the api module. When the openstack-operator creates a service label and passes it via the override to the service operator, the operator just has to use this instead of the local default one. Initially I had used the gvk.Kind as a lowercase string, but then realized it's different from the service name const; with the passed-in information we can still do that.

I think in general "tell, don't ask" is a good principle. So if you can tell the service operator which label to put on the pods it creates, so that the service selecting those pods becomes routable, then we can have a simpler architecture.
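
A minimal sketch of that "tell, don't ask" idea, assuming a simplified override shape (the real lib-common/service types differ): the openstack-operator derives and passes the selector label down, and the service operator prefers it over its local default.

package main

import (
	"fmt"
	"strings"
)

// ServiceOverride is a simplified stand-in for the service override type;
// the field names are illustrative only.
type ServiceOverride struct {
	Labels map[string]string
}

// openstack-operator side: derive the label from the CR kind and pass it down.
func buildOverride(kind string) ServiceOverride {
	return ServiceOverride{Labels: map[string]string{"service": strings.ToLower(kind)}}
}

// service operator side: prefer the passed-in label over the local default const.
func effectiveSelector(ov ServiceOverride, localDefault string) map[string]string {
	if v, ok := ov.Labels["service"]; ok && v != "" {
		return map[string]string{"service": v}
	}
	return map[string]string{"service": localDefault}
}

func main() {
	ov := buildOverride("KeystoneAPI")
	// note the mismatch mentioned above: "keystoneapi" (lowercased Kind) vs the
	// operator's local "keystone" service name const
	fmt.Println(effectiveSelector(ov, "keystone"))
}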

  2. I assumed that we are moving the route management to openstack-operator as it can be done in a generic way, and therefore we can move generic logic from each service operator to a single place in openstack-operator. However, I see that in the current PR this is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

I think the service labels can be done better, see above.

This is not keystone specific; this is what we have right now in the endpoint.ExposeEndpoints() func. I am currently integrating another service (glance) to see what can be done in a more generic way.

You are right, it is there, I missed that. But then my question is: do we need to move that logic (or the comment?) from lib-common to both the service operator and the openstack-operator?

As mentioned above, we can do it in a more generic way, but some logic will remain for at least some of the services, e.g. for nova we have to create a route per API and per cell for the vnc proxy.

I think that is OK. If we express the need for those routes in the Nova CRD per nova api and nova vnc proxy sub CR, then we can easily see that nova needs those routes, so whoever creates a Nova CRD needs to prepare them and pass information about them. This would be very similar to how nova expresses the need for a rabbitmqcluster per cell while other operators only ask for a single rabbitmq.

Thanks @stuggi for the answers. I think this discussion is moving in the right direction.

Contributor

As mentioned above, we can do it in a more generic way, but some logic will remain for at least some of the services, e.g. for nova we have to create a route per API and per cell for the vnc proxy.

I think that is OK. If we express the need for those routes in the Nova CRD per nova api and nova vnc proxy sub CR, then we can easily see that nova needs those routes, so whoever creates a Nova CRD needs to prepare them and pass information about them. This would be very similar to how nova expresses the need for a rabbitmqcluster per cell while other operators only ask for a single rabbitmq.

I need to walk that back now that I see more. In the case of rabbitmq it is fairly simple: you need to pass a cluster for each cell. However, for vnc and metadata we have a lot more logic in the nova-operator.

  1. neither vnc nor metadata is needed in cell0.
  2. the vnc service is an optional cell service. If it is requested to be deployed in a given cell then it needs a route; if it is not requested to be deployed then the route is not needed either.
  3. the metadata service is also optional, but in a different way. It can be deployed at the top level only, or it can be deployed per cell. There can be cells where it is not deployed while it is deployed in others. But it can never be deployed both at the top level and in some cells.

To be able to handle these cases openstack-operator needs to grow logic that is fairly nova specific. I would like to avoid that if possible.

Contributor

Another complication was mentioned by @kk7ds today during our discussion. The openstack-operator now needs to know the number and type of the endpoints the service operator will expose. But that is fairly service specific. For example, if cinder develops the v4 major API (or for some reason nova needs to expose the legacy v2.0 API), then that can only be done by exposing new endpoints in keystone with different paths. As far as I see, supporting the extra endpoints would require not just a service operator change but also a change to the service specific logic in the openstack-operator.

Contributor

That's not the case for glance. glanceinternal is for the internal endpoint, and routes are only created for the public endpoint, so for this example there will not be multiple routes. In general, if there are multiple endpoints of different types which are served by the same pod via different paths, the same route is used. We already have this config. If there are different pods serving a service with different endpoints, a new service would be required and then another route. A new endpoint type in keystone gets registered with this route. If we stay with glance as an example (although, as said above, we don't create two routes here), you have two api templates which each have an override to pass in an endpointURL, so both can register their keystoneEndpoint with their specific route.

Maybe the glance example wasn't perfect. Today we assume that the glance internal endpoint does not need a route, but I think that is an artificial restriction; e.g. in an edge deployment scenario the internal glance needed for the computes might be connected to the computes over a WAN. Anyhow, I think this issue is solved. As you said, if we need two differently configured backend Pods then we will need two API CRs anyhow, with independent Services and Routes if needed.

Contributor

Btw, manila, for example, is already registering multiple version endpoints with one route:

| 270f74661cd047098a031b6c15877e87 | regionOne | manilav2     | sharev2        | True    | internal  | http://manila-internal.openstack.svc:8786/v2                   |
| bbf81f93fb754fe08e2c2486654acc79 | regionOne | manila       | share          | True    | public    | http://manila-openstack.apps-crc.testing/v1/%(project_id)s     |
| bd0670a401d44f998f71af7acdc3acde | regionOne | manilav2     | sharev2        | True    | public    | http://manila-openstack.apps-crc.testing/v2                    |
| c6ae13a34e68463bb62ee0f766852b7b | regionOne | manila       | share          | True    | internal  | http://manila-internal.openstack.svc:8786/v1/%(project_id)s    |

Yes, if the multiple endpoints can be served by the same backend Pod (i.e. no configuration change is needed for the openstack service), then sharing the same route, and therefore routing to the same Pod, is the way to go.

Contributor

Late in this conversation about Routes, but yeah, even in an Edge scenario a given Glance Edge Pod needs a service as it's not publicly exposed. A service should be enough to ensure the communication with the Pods that live in Central. Note that this refers to an architecture where a list of GlanceAPI Pods are only "deployed in Central", but logically connected to the storage backend that lives at the Edge.
We can register endpoints (internal only) to allow the Edge Compute node to use this GlancePod, and this will be similar to what we do today for a generic deployment with the API split between internal and external.
/cc @fultonj @konan-abhi ^.
Regarding Manila, yes, for tempest purposes we needed to configure both v1 and v2, but they are served by the same Pod, so no concerns on my side about this as long as we can rely on the endpointURL override.

Contributor


Are we mixing two things here?

  1. what service needs to be exposed on a public network
  2. what service needs to have a routable service (e.g. needs to be accessed via WAN instead of LAN), regardless of whether the service access is private or public

allow the Edge Compute node to use this GlancePod,

Does it mean the IP of this internal glance service needs to be exposed to a routed network?

Contributor


I think so; I suspect we're mixing things here (a publicly exposed service which needs a Route vs an internal service), and talking about Edge is not the right place/example.

pkg/openstack/common.go (outdated review thread)
@abays
Contributor

abays commented Jul 28, 2023

/test openstack-operator-build-deploy-kuttl

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,f8932064f7aeaccb1fbce76f1d28468371a10846

@stuggi stuggi changed the title from "Create keystoneapi route and svc endpoint overrides" to "Create keystoneapi and glanceapi route and svc endpoint overrides" Aug 1, 2023
@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,7e0f9e6aba055dff4cdd346b662cfaaaaaabdf84

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,9be40d4f6a544134fc6a830b9e5af265ffb8f458

@stuggi
Contributor Author

stuggi commented Aug 1, 2023

The PR now has two commits, one for keystoneapi and one for glanceapi, to show an API which has sub-CRDs. There are still things to improve, e.g. using a dedicated condition, setting it correctly, and kuttl tests.

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Changes openstack-k8s-operators/keystone-operator#289, openstack-k8s-operators/glance-operator#285 are needed.

@softwarefactory-project-zuul

Build failed (check pipeline). For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

https://review.rdoproject.org/zuul/buildset/bb03046769f9461280b1ab2cf2b3b8b8

openstack-operator-content-provider FAILURE in 5m 51s
⚠️ openstack-operator-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-operator-content-provider
⚠️ openstack-operator-crc-podified-edpm-deployment SKIPPED Skipped due to failed job openstack-operator-content-provider

stuggi added 8 commits August 17, 2023 16:35
Creates the route for the cinderapi, also allows customizing the
route via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/cinder-operator#248

Jira: OSP-26690
Creates the route for the neutronapi, also allows customizing the
route via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/neutron-operator#194

Jira: OSP-26690
Creates the routes for the nova components, also allows customizing the
routes via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/nova-operator#489

Jira: OSP-26690
Creates the routes for the heatapi and heatcfnapi, also allows
customizing the routes via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/heat-operator#227

Jira: OSP-26690
Creates the route for horizon, also allows customizing the
route via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/horizon-operator#202

Jira: OSP-26690
Creates the route for manila, also allows customizing the
route via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/manila-operator#117

Jira: OSP-26690
Creates the route for swift, also allows customizing the
route via override.

Generates the service override for the env with what is configured in
the externalEndpoints, or specified in the service template override.

Depends-On: openstack-k8s-operators/lib-common#313
Depends-On: openstack-k8s-operators/keystone-operator#289
Depends-On: openstack-k8s-operators/swift-operator#43

Jira: OSP-26690
cellOverride := instance.Spec.Nova.CellOverride[cellName]

// metadata
metadataServiceName := novav1.GetMetadataServiceName(&cellName)
Contributor Author

@gibizer how about this really simple approach: openstack-k8s-operators/nova-operator@1b769dd

Contributor

I still think that by taking the cell name as a parameter we force the client (openstack-operator) to know that the name of the service is based on the cell name, and what that cell name is.

Also, if any new field of NovaMetadataSpec is included in the generated Service name, then that cannot just be implemented inside the GetMetadataServiceName function; it needs to be provided as a new function parameter.
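
A tiny illustration of the coupling being discussed; the function bodies and names below are hypothetical, not the real nova-operator API.

package main

import "fmt"

// Current-ish shape: the caller (openstack-operator) must know both the cell
// name and that the Service name is derived from it.
func metadataServiceName(cellName string) string {
	return "nova-metadata-" + cellName // naming scheme assumed for illustration
}

// If another NovaMetadataSpec field ever influences the name, the signature has
// to grow too, leaking more nova-internal knowledge to the caller.
func metadataServiceNameV2(cellName, extra string) string {
	return "nova-metadata-" + cellName + "-" + extra
}

func main() {
	fmt.Println(metadataServiceName("cell1"))
	fmt.Println(metadataServiceNameV2("cell1", "az0"))
}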

@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2023

@stuggi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/openstack-operator-build-deploy-kuttl
Commit: 79e3416
Details: link
Required: true
Rerun command: /test openstack-operator-build-deploy-kuttl

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

}

sd.hostname = ptr.To(route.GetHostname())
sd.endpointURL = "http://" + *sd.hostname
Contributor

If I understand the resulting CRD model and the code correctly then KeystoneAPISpec.override["public"].endpointURL will be unconditionally populated with the information generated here.

I think this means that OpenStackControlPlane.spec.keystone.template.override[].endpointURL is a field in the CRD that should never be set by the human creating the OpenStackControlPlane CR. I think the best would be to not have such a spec field that is never intended to be set.

In the Nova CRD we intentionally do not include the service CRD Spec structs directly in NovaSpec but have a parallel template struct. This way we can keep the fields with generated values out of the struct used by the user (the template) but keep them in the struct used by the service reconciler (the spec).

If we want to apply that pattern here to hide the unsettable endpointURL from the OpenStackControlPlane, then the OpenStackControlPlaneSpec should stop including the KeystoneAPISpec. Instead, a KeystoneAPITemplate struct needs to be created that has all the KeystoneAPISpec fields except endpointURL. The value of that field is generated by the openstack-operator, so it can provide that extra value when it creates the KeystoneAPISpec from the KeystoneAPITemplate.

(of course KeystoneAPISpec is just an example here, this comment is valid for all the endpointURL fields being introduced now)
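
A minimal Go sketch of the parallel-template pattern described above, with illustrative field names (not the actual keystone-operator types):

package main

import "fmt"

// KeystoneAPISpec is what the service reconciler consumes; it can carry
// generated values such as the endpoint URL of the created route.
type KeystoneAPISpec struct {
	Replicas    int
	EndpointURL string // generated by the openstack-operator, never user-set
}

// KeystoneAPITemplate is what the user edits in OpenStackControlPlane: the same
// fields minus anything whose value is generated.
type KeystoneAPITemplate struct {
	Replicas int
}

// specFromTemplate is where the openstack-operator adds the generated values
// while copying over everything the user configured.
func specFromTemplate(t KeystoneAPITemplate, endpointURL string) KeystoneAPISpec {
	return KeystoneAPISpec{Replicas: t.Replicas, EndpointURL: endpointURL}
}

func main() {
	tmpl := KeystoneAPITemplate{Replicas: 3}
	fmt.Printf("%+v\n", specFromTemplate(tmpl, "http://keystone-public-openstack.apps-crc.testing"))
}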

Contributor Author


The idea is that a human/user can set it if the intent is to use it with an ingress other than an OCP route (not implemented in this patch). When endpointURL gets set by a user, no route gets created and just the endpointURL is passed into the service operators, to be used for the service registration in keystone. The user then has to create the ingress manually.
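
A rough sketch of that behaviour, assuming a hypothetical helper (this is not the code in this PR): if the user already provided endpointURL, route creation is skipped and the URL is only passed through; otherwise the route is created and the URL derived from it.

package main

import "fmt"

// ensurePublicEndpoint decides whether to create a route or to honour a
// user-provided endpoint URL (placeholder logic for illustration only).
func ensurePublicEndpoint(userEndpointURL string) (string, error) {
	if userEndpointURL != "" {
		// user brings their own ingress; just register this URL in keystone
		return userEndpointURL, nil
	}
	// default path: create/update the OCP route and derive the URL from it
	hostname, err := ensureRoute()
	if err != nil {
		return "", err
	}
	return "http://" + hostname, nil
}

// ensureRoute is a placeholder for the real route handling.
func ensureRoute() (string, error) {
	return "keystone-public-openstack.apps-crc.testing", nil
}

func main() {
	url, _ := ensurePublicEndpoint("")
	fmt.Println(url)
}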

Contributor

Do we really want to support both use cases (route creation by the openstack-operator, and endpoint URL passthrough)? Does it mean the user can first configure OpenStackControlPlane with automatic route creation and then later specify the endpoint URL manually, thereby instructing the openstack-operator to delete the route and instructing the service operator to update the endpoint URL in keystone according to the new user-specified value? And the reverse direction too, to switch from a manually created route to an openstack-operator created route?

Comment on lines +121 to +122
// novncproxy
for _, endpointType := range []service.Endpoint{service.EndpointPublic, service.EndpointInternal} {
Contributor

The novncproxy is not created as a keystone endpoint so there is no concept of public and private novncproxy in nova.

(This again shows that now you need to teach openstack-operator nova specific things)


apiServiceOverrides[string(endpointType)] = *svcOverride
}

Contributor

I think you missed the case when NovaMetadata is deployed not per cell but as one single service at the top level. While NovaMetadata does not need to be routed to the k8s public network, I think the service overrides still need to be passed through. (If I'm mistaken then I don't get why you have explicit NovaMetadata handling below for the case where NovaMetadata is deployed per cell.)

@@ -67,6 +173,17 @@ func ReconcileNova(ctx context.Context, instance *corev1beta1.OpenStackControlPl
// we need to support either rabbitmq vhosts or deploy a separate
// RabbitMQCluster per nova cell.
instance.Spec.Nova.Template.DeepCopyInto(&nova.Spec)
nova.Spec.APIServiceTemplate.Override.Service = apiServiceOverrides

Contributor

Missing the case when metadata is deployed at the top level instead of per cell.

metallb.universe.tf/allow-shared-ip: internalapi
metallb.universe.tf/loadBalancerIPs: 172.17.0.80
spec:
type: LoadBalancer
Contributor

Is this something we expect the human user to set or is it more like something that the service operator implementation should decide?

Contributor Author

The human/user. This moves to a more native k8s interface for configuring the MetalLB endpoint, and also allows using a different LoadBalancer than MetalLB.

@gibizer
Contributor

gibizer commented Aug 19, 2023

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals:

  1. make it possible to implement TLS in a single place

    I think this is not possible after this change. Based on "Update keystoneapi to use service override" keystone-operator#289 (comment), for the TLS implementation we still need to adapt each service operator to use TLS (endpoint scheme change and cert secret usage).

  2. allow seamlessly changing the openshift Route to some other way of exposing our services (Gateway API?)

    This change makes the switch possible by changing openstack-operator. However before this change we already had every Route creation hidden within lib-common behind ExposeEndpoint() so switching to a new route resource would be a lib-common change.

    The openstack-operator based route creation forces us to teach service operator specific knowledge (the number of services to expose with a route, the name of the k8s services etc) to openstack-operator, while the lib-common based solution would allow keeping the service operator specific knowledge to stay only in the service operators. (This is mostly due to the "call path". If openstack-operator creates the route then openstack-operator needs to know the information to create them, if lib-common creates the route then because lib-common is called by the service operator it is enough if the service operator knows this information)

  3. decouple the service deployment step from the external exposure of that service

    I think this goal refers to the deployment pattern supported by the oc expose command. However, I don't think we are actually supporting this pattern with the proposed change. For me the pattern is:
    i) create a deployment with a service
    ii) check that it works OK
    iii) use oc expose to publish the service to the outside world
    However the proposed change pre-creates the routes even before any service is deployed.
    Also note that openstack itself might not support this pattern today as openstack needs to know the public URL of the service (e.g. for the keystone endpoint or for the compute vnc configuration) even before the openstack service can be fully deployed.

Please let me know if I missed other goals.

@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bogdando added a commit to bogdando/openstack-operator that referenced this pull request Sep 1, 2023
No longer relevant, see
openstack-k8s-operators#416 (comment)

Isol-net CR to become collapsed cells as well
  Not clear for cell-local vs global services replicas

Turn off unrelated services

TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41

Signed-off-by: Bohdan Dobrelia <[email protected]>
@bogdando

bogdando commented Sep 1, 2023

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals: […]

This indeed looks like unwanted coupling, teaching the control-plane operator a lot of service-specific things. Great summary.

@stuggi
Contributor Author

stuggi commented Sep 4, 2023

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals:

  1. make it possible to implement TLS in a single place
    I think this is not possible after this change. Based on Update keystoneapi to use service override keystone-operator#289 (comment) For the TLS implementation we still need to adapt each service operator to use TLS (endpoint scheme change and cert secret usage).

With this change it is possible to:

  • handle TLS for public endpoints in a single place as all routes get created in a single place
  • manage TLS certs in a single place for the TLS-E situation, where the route should use re-encryption to the backend deployment and only a cert secret is passed to the service (analogous to rabbitmq)

For the TLS-E situation, the required service operator changes are:

  • use https if a cert secret is provided
  • use the cert information from the secret in the deployment, e.g. for httpd or the service directly
  • include the cert secret in the hash calculation for the configuration data, so the service restarts if the certs get renewed by cert-manager
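
A minimal sketch of that third point, assuming a generic hashing helper rather than the actual lib-common API: fold the cert secret's data into the hash that guards the deployment, so a cert-manager renewal changes the hash and triggers a restart.

package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// hashSecretData folds a secret's data into a deterministic hash; including the
// TLS cert secret here means a renewed cert changes the hash and therefore
// triggers a pod restart.
func hashSecretData(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering across reconciles
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(data[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	certSecret := map[string][]byte{
		"tls.crt": []byte("...pem..."),
		"tls.key": []byte("...pem..."),
	}
	fmt.Println("config hash input:", hashSecretData(certSecret))
}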
  2. allow seamlessly changing the openshift Route to some other way of exposing our services (Gateway API?)
    This change makes the switch possible by changing openstack-operator. However before this change we already had every Route creation hidden within lib-common behind ExposeEndpoint() so switching to a new route resource would be a lib-common change.

Or use a different ingress than a route by passing in the endpointURL. No route would then be created, and the user has to create the ingress manually.

The openstack-operator based route creation forces us to teach service operator specific knowledge (the number of services to expose with a route, the name of the k8s services etc) to openstack-operator, while the lib-common based solution would allow keeping the service operator specific knowledge to stay only in the service operators. (This is mostly due to the "call path". If openstack-operator creates the route then openstack-operator needs to know the information to create them, if lib-common creates the route then because lib-common is called by the service operator it is enough if the service operator knows this information)

  3. decouple the service deployment step from the external exposure of that service
    I think this goal refers to deployment pattern supported by the oc expose command. However I don't think we are actually supporting this pattern with the proposed change. For me the pattern is:
    i) create a deployment with a service
    ii) check that it works OK
    iii) use oc expose to publish the service to the outside world
    However the proposed change pre-creates the routes even before any service is deployed.
    Also note that openstack itself might not support this pattern today as openstack needs to know the public URL of the service (e.g. for the keystone endpoint or for the compute vnc configuration) even before the openstack service can be fully deployed.

The current flow and implementation in this patch are only there to keep the process/reconciliation simple. An alternative workflow idea I have is:

  • do not pre-create the route in the openstack-operator (as is done at the moment)
  • create the OSP service CR without setting endpointURL
  • let the service operators create their deployment + services
  • the service gets initially registered with the service URL instead of the route URL
  • service operators set a "create ingress" annotation on each created service for which a route/ingress should be created and whose service type is not LoadBalancer
  • for the default situation with an OCP route:
    • on reconcile, the openstack-operator checks for services having the "create ingress" annotation
    • the openstack-operator creates the route and sets endpointURL
    • the service operator updates its keystone endpoint using the endpointURL information
  • if a different ingress should be used (might need a switch to disable route creation), the user sets the endpointURL of the service and creates the ingress manually

Also, in this approach the openstack-operator would create certs after the initial creation of the services and set the cert secret name. The service then switches to TLS.
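
A very rough sketch of the annotation-driven flow above; the annotation key, URL scheme and helpers are made up for illustration and do not reflect the actual implementation.

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createIngressAnnotation is a hypothetical marker set by the service operators.
const createIngressAnnotation = "core.openstack.org/create-ingress"

// reconcileIngresses looks for annotated Services and creates a route/ingress
// for each; the resulting URL would then be fed back to the service CR as
// endpointURL so the service operator can update its keystone endpoint.
func reconcileIngresses(ctx context.Context, c kubernetes.Interface, namespace string) error {
	svcs, err := c.CoreV1().Services(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, svc := range svcs.Items {
		if svc.Annotations[createIngressAnnotation] != "true" {
			continue
		}
		if svc.Spec.Type == corev1.ServiceTypeLoadBalancer {
			continue // already externally reachable, no route needed
		}
		hostname, err := ensureRouteFor(svc) // placeholder for route creation
		if err != nil {
			return err
		}
		// next step (not shown): set endpointURL on the service CR override
		fmt.Printf("service %s exposed at http://%s\n", svc.Name, hostname)
	}
	return nil
}

func ensureRouteFor(svc corev1.Service) (string, error) {
	// placeholder; real code would create an OCP Route or another ingress here
	return svc.Name + "-openstack.apps.example.com", nil
}

func main() {}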

Please let me know if I missed other goals.
Additional goals:

  • reduce service operator dependencies (go.mod)
    • don't have a dependency from the service operators on the OCP-specific Route resource type; with this they can also be used independently upstream or with native k8s
    • don't have a dependency from the service operators on cert-manager; just provide a named secret to be used
  • allow customization of services via overrides; this also allows using a 3rd party LoadBalancer instead of MetalLB
  • move toward being able to use the service operators on native k8s
  • allow using a different ingress than the OCP route

@gibizer
Contributor

gibizer commented Sep 4, 2023

Thanks @stuggi for the discussion, both here and over a call. I think we are converging on a solution based on your latest proposal in #416 (comment). That solution can be massaged to drop all the service specific knowledge from the openstack-operator: first only pass down service labels to trigger k8s Service creation, then create routes in the openstack-operator based on annotated services, and then pass the resulting endpointURL down to the service operators.

@gibizer
Contributor

gibizer commented Sep 4, 2023

@stuggi I discovered that we need to include the novnc public URL in the nova-compute config[1]. So from the nova-operator perspective, when the endpointURL is set on NovaNoVNCProxy, the EDPM compute config Secret will be updated. Then we expect the dataplane-operator to trigger a rerun of the ansible that copies that updated config to the EDPM nodes. The last part of this is not yet implemented; the dataplane-operator does not watch the config. But we need that anyhow, for rabbit password rotation for example.

[1] https://docs.openstack.org/nova/latest/admin/remote-console-access.html#configuration

@stuggi
Contributor Author

stuggi commented Sep 12, 2023

closing in favor of #457

@stuggi stuggi closed this Sep 12, 2023
bogdando added a commit to bogdando/openstack-operator that referenced this pull request Sep 13, 2023
No longer relevant, see
openstack-k8s-operators#416 (comment)

Isol-net CR to become collapsed cells as well
  Not clear for cell-local vs global services replicas

Turn off unrelated services

TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41

Signed-off-by: Bohdan Dobrelia <[email protected]>
bogdando added a commit to bogdando/openstack-operator that referenced this pull request Sep 14, 2023
No longer relevant, see
openstack-k8s-operators#416 (comment)

Isol-net CRs to become collapsed cells as well
  Not clear for cell-local vs global services replicas

Turn off unrelated services (for my Nova dev specific needs)

TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41

Signed-off-by: Bohdan Dobrelia <[email protected]>