
Use new interface to ironic containers. #212

Closed
wants to merge 1 commit into from

Conversation

@imain (Contributor) commented Jun 10, 2019

We now set a static IP with a 'lifetime' on the provisioning interface.
We also now determine this IP from the interface itself in the containers
using the new interface.

@zaneb (Member) left a comment


Does this mean we can remove the proxies from operator.yaml now?

deploy/operator_ironic.yaml (resolved)
@dhellmann (Member) left a comment


I think I'd like to see this documented with a design doc over in the metal3-docs repo. I'm sure it's fine, but there are a bunch of new moving parts and I'd like to have something to point new contributors to so they can understand what is going on.

deploy/operator_ironic.yaml (resolved)
deploy/operator_ironic.yaml (resolved)
@hardys (Member) commented Jun 11, 2019

Yeah, I'd also like some more docs/context on this. At various times we've discussed IPAM and/or external DHCP with next-server for PXE, and without some more info I'm still not entirely clear how this solves the IPAM problem, or how it relates to the Ironic dnsmasq pool, so we can be sure the static configuration can never overlap with a subsequent DHCP/PXE deployment on scale-out.

@imain (Contributor, author) commented Jun 11, 2019

Does this mean we can remove the proxies from operator.yaml now?

Well, once this works, yes :) There's still a bug that is not letting the worker out of inspection.

@imain (Contributor, author) commented Jun 11, 2019

I think I'd like to see this documented with a design doc over in the metal3-docs repo. I'm sure it's fine, but there are a bunch of new moving parts and I'd like to have something to point new contributors to so they can understand what is going on.

OK I'll work on that next.

@hardys (Member) commented Jun 13, 2019

There has been some discussion about using an external DHCP server for the provisioning network, so while it's good to see this work progress, I wonder if we might have a quicker path to success if we just relied on the current ironic dnsmasq on the host, and configured it to point next-server to the ironic PXE deployed via the operator?

I'm also wondering about the ens3 here - on the bootstrap RHCOS node I needed to configure ens4 on the provisioning subnet, then get ironic to bind to that interface, is that not the case on the deployed nodes? (I've not yet tested to find out, sorry)

@hardys (Member) commented Jun 13, 2019

Also, what configures the second NIC via NetworkManager in this case? Can anyone point me to where we do that, please? Thanks!

@hardys (Member) commented Jun 13, 2019

I spotted an issue with the ironic containers which may explain why we can pass the public IP and still have the ironic service accessible via the provisioning network, see metal3-io/ironic-image#56

@hardys (Member) commented Jun 13, 2019

Also, what configures the second NIC via NetworkManager in this case? Can anyone point me to where we do that, please? Thanks!

nvm, I see now that's what static-ip-manager does :)

I wonder if it might be cleaner to inject some ignition config which sets up the IP via NetworkManager, or, as previously mentioned, this all gets simpler if (for a first pass at least) we can rely on an external DHCP server to provide the IPs.

When configuring the IP on the bootstrap VM I used NetworkManager, as the script-based approach appears to be deprecated in EL8, and so, I guess, also in the OOTPA images:

https://github.com/openshift-metal3/kni-installer/blob/523be53167833cc2f95ee0589b0181c6d8c69764/data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh#L20..L24

@hardys (Member) commented Jun 13, 2019

One other question - when we add the *downloader containers, we'll need to know the IP where the images can be downloaded. How can we make that persistent, and how will the actuator know what address to use, e.g. what if the pod gets rescheduled onto a different node, with a different provisioning IP?

@dhellmann (Member)

One other question - when we add the *downloader containers, we'll need to know the IP where the images can be downloaded. How can we make that persistent, and how will the actuator know what address to use, e.g. what if the pod gets rescheduled onto a different node, with a different provisioning IP?

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?

If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?
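(For illustration, a Service of the kind suggested above might look roughly like the sketch below. The name, label selector, and ports are all assumptions made up for this example, not values from this PR; the point is only that consumers would use a stable DNS name rather than a node-specific provisioning IP.)

```yaml
# Hypothetical sketch: expose the image-cache httpd behind a Service so
# k8s handles the endpoint moving when the pod is rescheduled.
apiVersion: v1
kind: Service
metadata:
  name: metal3-image-cache        # assumed name
spec:
  selector:
    name: metal3-baremetal-operator   # assumed pod label
  ports:
    - name: http
      port: 80
      targetPort: 6180            # assumed httpd port inside the pod
```

Clients inside the cluster would then fetch images from a URL like http://metal3-image-cache.<namespace>.svc/images/..., which stays valid across pod rescheduling.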

@hardys (Member) commented Jun 13, 2019

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?

If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?

Yes, having k8s manage it would be ideal, but I wasn't sure if that was feasible, since we need the images to be accessible via the provisioning network, and that isn't managed by k8s.

Having looked more closely, I think the solution proposed here is to always bring up the same IP for the provisioning NIC, regardless of where the pod runs, so we always access e.g. 172.22.0.2 for ironic and potentially the images. I can see how that could work, but I'm worried about what happens if e.g. there's a node/network outage and k8s moves the pod?

@dhellmann (Member)

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?
If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?

Yes, having k8s manage it would be ideal, but I wasn't sure if that was feasible, since we need the images to be accessible via the provisioning network, and that isn't managed by k8s.

I'm not convinced that's a strong requirement.

We need the separate provisioning network for the DHCP response, because we need that type of traffic to be isolated. I don't think the same requirement automatically applies to the image traffic, which, while somewhat large, is only part of the downloads needed to set up a node and is temporary. I also doubt upgrades are going to happen over the private network.

I would be very happy with a Service-based solution for now that lets us move forward to a point where we can see if that implementation bothers users.

@hardys (Member) commented Jun 13, 2019

I'm not convinced that's a strong requirement.

Perhaps I'm misunderstanding, but at the point of PXE provisioning the nodes we only have access to the provisioning network, so one way or another we need to be able to download the IPA kernel/ramdisk and also the RHCOS image to deploy. I'm not sure how we do that when the nodes aren't connected to the k8s control-plane/external network, unless the images are exposed via the provisioning network.

Are you thinking we configure a route on all the nodes to enable reaching a k8s managed DNS entry, where the images are hosted?

@dhellmann (Member)

I'm not convinced that's a strong requirement.

Perhaps I'm misunderstanding, but at the point of PXE provisioning the nodes we only have access to the provisioning network, so one way or another we need to be able to download the IPA kernel/ramdisk and also the RHCOS image to deploy. I'm not sure how we do that when the nodes aren't connected to the k8s control-plane/external network, unless the images are exposed via the provisioning network.

Are you thinking we configure a route on all the nodes to enable reaching a k8s managed DNS entry, where the images are hosted?

Based on the introspection data, I see IPs on both NICs for the hosts I have provisioned. So I think the IPA image is running a DHCP client for all of them (and that's what I would expect).

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

@hardys (Member) commented Jun 14, 2019

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

Sure, but my concern is earlier than that point, the process is roughly:

  1. DHCP on the provisioning nic (either dnsmasq or potentially external) points to the initial pxe config
  2. The initial pxe config chain-loads the per-node (ironic managed) ipxe configuration
  3. The per-node configuration points at an http location where we download the IPA kernel/ramdisk
  4. The IPA ramdisk boots, downloads the OS image and writes it to disk.

Currently all of these run on the provisioning network, so if we want to download the IPA kernel/ramdisk from the public network I think we'll need to modify the pxe configuration to activate both nics, and we won't be able to support external DHCP on the provisioning network unless the pxe server location is predictable.
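(As a sketch of how steps 1-2 are typically wired up in dnsmasq - modelled loosely on the ironic-image dnsmasq.conf; the interface name, address range, ports, and filenames here are illustrative assumptions, not the actual config from this PR:)

```conf
# Hypothetical provisioning-network DHCP/PXE handoff
interface=ens3                       # provisioning NIC (assumed name)
dhcp-range=172.22.0.10,172.22.0.100
enable-tftp
tftp-root=/tftpboot
# Step 1: legacy PXE clients fetch the iPXE binary over TFTP...
dhcp-boot=undionly.kpxe
# Step 2: ...then clients already running iPXE chain-load the per-node
# (ironic-managed) config over HTTP instead
dhcp-userclass=set:ipxe,iPXE
dhcp-boot=tag:ipxe,http://172.22.0.3:8088/boot.ipxe   # assumed IP/port
```

The "pxe server location is predictable" concern above maps to that final dhcp-boot URL: an external DHCP server would need to hand out the same next-server/boot URL.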

@imain (Contributor, author) commented Jun 14, 2019

There has been some discussion about using an external DHCP server for the provisioning network, so while it's good to see this work progress, I wonder if we might have a quicker path to success if we just relied on the current ironic dnsmasq on the host, and configured it to point next-server to the ironic PXE deployed via the operator?

I'm also wondering about the ens3 here - on the bootstrap RHCOS node I needed to configure ens4 on the provisioning subnet, then get ironic to bind to that interface, is that not the case on the deployed nodes? (I've not yet tested to find out, sorry)

OK, there are a few constraints here. I've been told (and seen the results) that dnsmasq has to share a filesystem with ironic/ironic-inspector. That is how the images get switched from the inspector image to whatever other image you want to boot: an entry for the MAC needs to be written into the dnsmasq config. Therefore, my understanding is, we need to run dnsmasq on the provisioning host (or at least somewhere it has access to the k8s shared volume).

The provisioning host cannot get its IP from DHCP because we are the ones offering a DHCP server. I tried it; either dnsmasq refuses to bind or it wipes the IP off the interface (I forget, I tried a lot of things :)).

So this leaves us where I am, which is that we need static IPs on the provisioning network in order to run a DHCP server.

The idea of doing it via NetworkManager is nice, but the method I'm using now gives us fault tolerance. Using the 'lifetime' option on an IP address, if the pod disappears for any reason the IP is no longer refreshed, and it is released within 10 seconds of the pod's death. A new pod can then start on another master without worry of collisions. There are other ways to manage the static IPs on the provisioning network, but this one makes the most sense imo.
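(The 'lifetime' mechanism described above can be sketched roughly as follows. The interface name, address, and timings are assumptions for illustration; the helper only prints the `ip` invocation, since actually applying it needs root and the real provisioning NIC.)

```shell
#!/bin/sh
# Sketch of a lease-like static IP: install the address with a short
# valid_lft/preferred_lft and re-apply it on a timer. If the refresher
# (the pod) dies, the kernel expires and drops the address on its own.
refresh_cmd() {
    # $1 = CIDR address, $2 = interface, $3 = lifetime in seconds
    echo "ip addr replace $1 dev $2 valid_lft $3 preferred_lft $3"
}

ADDR="172.22.0.3/24"                       # assumed provisioning IP
IFACE="${PROVISIONING_INTERFACE:-ens3}"    # assumed NIC name
refresh_cmd "$ADDR" "$IFACE" 10
# A real manager would execute this in a loop, e.g.:
#   while true; do eval "$(refresh_cmd "$ADDR" "$IFACE" 10)"; sleep 5; done
```

With a 10-second lifetime and a ~5-second refresh, the address outlives only brief hiccups; a replacement pod on another master can safely claim it shortly after the old pod stops refreshing.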

If you want to do other things with the provisioning network I think you'll have to set up a vlan to make that work.

@imain (Contributor, author) commented Jun 14, 2019

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

Sure, but my concern is earlier than that point, the process is roughly:

  1. DHCP on the provisioning nic (either dnsmasq or potentially external) points to the initial pxe config
  2. The initial pxe config chain-loads the per-node (ironic managed) ipxe configuration
  3. The per-node configuration points at an http location where we download the IPA kernel/ramdisk
  4. The IPA ramdisk boots, downloads the OS image and writes it to disk.

Currently all of these run on the provisioning network, so if we want to download the IPA kernel/ramdisk from the public network I think we'll need to modify the pxe configuration to activate both nics, and we won't be able to support external DHCP on the provisioning network unless the pxe server location is predictable.

I don't think it's possible to do this atm, although I could be wrong. My understanding is that we need to be able to update the dnsmasq configuration to point to the new ramdisk. I also thought we wanted it in the pod so we can have fault tolerance? Currently you can still download from the original httpd server, so I'm not sure how the routes are being set up there. Probably the non-provisioning interface is being activated via DHCP.

@imain (Contributor, author) commented Jun 14, 2019

https://github.com/metal3-io/metal3-docs/blob/master/design/how-ironic-works.md#what-connectivity-is-required

"DHCP (via the ironic host, for IP assignment and PXE instructions)"

It may be possible to do what you are suggesting Steve, but I suspect it would require some more serious meddling with how ironic currently works.

@imain (Contributor, author) commented Jun 14, 2019

Documentation: metal3-io/metal3-docs#38

@hardys (Member) commented Jun 17, 2019

I'm pretty sure that ironic updates the mac entries in the pxe server, which isn't part of the dnsmasq configuration - @dtantsur can you confirm that please?

When you look at the dnsmasq config in the container it's static, and we just need to point at an IP where the next stage of the pxe config can be loaded:

https://github.com/metal3-io/ironic-image/blob/master/dnsmasq.conf#L9

That said, @juliakreger mentioned some other constraints, so it may be that an external DHCP server isn't the most viable solution; I'm just trying to understand our options.

If we stick with the hosted dnsmasq, we seem to have two options - either go with this short-lease-expiry approach, or maintain state so that the DHCP leases/reservations are persisted in the dnsmasq config - I chatted with @dhellmann about that last week and he has some ideas, so it may be worth looking into and/or prototyping.

For now I'm not blocking this approach, but my main concern is that without fencing, we could conceivably have a network outage on the public/control-plane network which causes the baremetal-operator pod to get rescheduled, are we sure in that situation this approach will be robust?

One nice side-effect of this reuse of the .2 address, though, is that it uses the same IP as the bootstrap VM, ref openshift-metal3/kni-installer#100. But again I'm slightly concerned that if there was any error which prevented clean teardown of the VM we'd end up with IP conflicts; I guess we'd need to adopt the same lease-expiry approach there to be sure?

cc @celebdor too for his input.

@imain (Contributor, author) commented Jun 18, 2019

I'm pretty sure that ironic updates the mac entries in the pxe server, which isn't part of the dnsmasq configuration - @dtantsur can you confirm that please?

When you look at the dnsmasq config in the container it's static, and we just need to point at an IP where the next stage of the pxe config can be loaded:

https://github.com/metal3-io/ironic-image/blob/master/dnsmasq.conf#L9

That said, @juliakreger mentioned some other constraints, so it may be that an external DHCP server isn't the most viable solution; I'm just trying to understand our options.

Yes, sorry, you are right, we could separate them. I think I was confused because we write entries for tftpd, which is also managed by dnsmasq.

If we stick with the hosted dnsmasq, we seem to have two options - either go with this short-lease-expiry approach, or maintain state so that the DHCP leases/reservations are persisted in the dnsmasq config - I chatted with @dhellmann about that last week and he has some ideas, so it may be worth looking into and/or prototyping.

Yes agreed, and I'm happy with either of those solutions. The previous version I had was assigning IPs to each master, but there was no real IPAM for future changes so I abandoned it. If we can get IPAM doing that then that would be great.

Switching would be very easy as well. The container interface is such that we can just specify an interface and it will find the IP and set everything up.

For now I'm not blocking this approach, but my main concern is that without fencing, we could conceivably have a network outage on the public/control-plane network which causes the baremetal-operator pod to get rescheduled, are we sure in that situation this approach will be robust?

Yes, it is true. With some more thought, however, I realized we can't have two pods running anyway: we would end up with two DHCP servers regardless :/

One nice side-effect of this reuse of the .2 address though is it uses the same IP as the bootstrap VM ref openshift-metal3/kni-installer#100 - but again I'm slightly concerned that if there was any error which prevented clean teardown of the VM we'll end up with IP conflicts, I guess we'd need to adopt the same lease-expiry approach there to be sure?

Oh, we don't have to use .2. I thought the bootstrap VM was .1? Anyway, any IP will work so long as it's available. Shall I change it to .3?

cc @celebdor too for his input.

At any rate, I'm happy with any IPAM solution. I'm going to update this PR with the downloader images. I'm hoping it would be OK to get this merged at some point soonish so we can move forward with the pod. Switching to any other solution should be pretty straightforward in the future.

@hardys (Member) commented Jun 19, 2019

Oh, we don't have to use .2. I thought the bootstrap VM was .1? Anyway, any IP will work so long as it's available. Shall I change it to .3?

Yeah, currently the provisioning host is .1, and the provisioning interface on the bootstrap VM is unconfigured. In openshift-metal3/kni-installer#100 I bring that up as .2, so we should probably change either that PR or this one to avoid a potential conflict (although you're right that if the dnsmasq is running on both IPs we've got big problems anyway).

@imain (Contributor, author) commented Jun 20, 2019

I changed this to use .3

imain added a commit to imain/dev-scripts that referenced this pull request Jun 20, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@imain (Contributor, author) commented Jun 21, 2019

Does this mean we can remove the proxies from operator.yaml now?

Sorry Zane, missed this one. I think this replaces operator.yaml. Once this is in we can decide if we remove it or just remove the proxies.

@dhellmann (Member) left a comment


Maybe we can (separately) make a variant of the deployment configuration that would let us continue to run the operator outside of a cluster for development testing.

I think the only thing actually blocking this is the image that's being built out of your personal quay repo instead of one owned by the team.

configMapKeyRef:
name: ironic-bmo-configmap
key: cache_url
- name: rhcos-downloader
Member:

I don't think we want this in the baremetal-operator pod, long term. The use of RHCOS is OpenShift-specific, and it's also only going to be used for deploying images to build a cluster. We already have users interested in using the BMO for other things, so we're trying to keep that sort of knowledge out of the BMO. We eventually need to move this over to the pod that runs the cluster-api-provider-baremetal controller under OpenShift, since that is the thing that needs to know the IP of the server hosting this image.

Contributor Author:

Is this something you want addressed now? Honestly not sure how we'd do this another way atm.

Member:

Yes, I think this should be removed from this PR since it's OpenShift specific.

It should go somewhere where details of the OpenShift integration are maintained. That's openshift-metal3/dev-scripts short term, and the machine-api-operator longer term (but hopefully ASAP)

Member:

Maybe we should just copy this entire yaml to dev-scripts until the MAO change lands?

Member:

/cc @sadasu

deploy/operator_ironic.yaml (resolved)
valueFrom:
configMapKeyRef:
name: ironic-bmo-configmap
key: http_ip
key: provisioning_interface
- name: ironic
@dtantsur (Member) commented Jun 29, 2019:

Side note: we're splitting this container, see metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646

Member:

The change has merged, could you actually do the split please?

hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jul 1, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@imain (Contributor, author) commented Jul 5, 2019

OK this is working for me except for the bug Steve and Doug found where the masters are getting rebooted. If I only bring up a worker it works fine.

It would be nice to get this merged, as it does not affect anything in the deployment, and then we can work on getting the rest fixed/merged.

@hardys (Member) commented Jul 8, 2019

Ok I tested this locally and it's working well for me now since we rebased and picked up the CAPBM fix.

If we can rebase this and address the comments from @dtantsur to align with metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646 then I think this should be good to land?

ironic_endpoint: "http://172.22.0.3:6385/v1/"
ironic_inspector_endpoint: "http://172.22.0.3:5050/v1/"
cache_url: "http://172.22.0.1/images"
rhcos_image_url: "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190520.0/"
Member:

This is OpenShift specific, so I would prefer it didn't go anywhere in the baremetal-operator repository.

Member:

Also this image is now outdated, should be https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/420.8.20190624.0/

I guess that will be easier to maintain if we do move this yaml to dev-scripts, until the MAO integration lands (where I assume we can template this in based on knowledge of the release payload, cc @sadasu).


imain added a commit to imain/dev-scripts that referenced this pull request Jul 9, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@hardys (Member) commented Jul 9, 2019

Ok I moved this to openshift-metal3/dev-scripts#659 for testing, working locally for me

imain added a commit to imain/dev-scripts that referenced this pull request Jul 10, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
We now set a static IP with a 'lifetime' on the provisioning interface.
We also now determine this IP from the interface itself in the containers
using the new interface.  This also uses the new downloader images to
get the images from the bootstrap node.
@imain (Contributor, author) commented Jul 11, 2019

OK, I'll abandon this shortly.

@imain (Contributor, author) commented Jul 11, 2019

This is now part of:

openshift-metal3/dev-scripts#635

@imain imain closed this Jul 11, 2019
honza pushed a commit to honza/baremetal-operator that referenced this pull request Mar 25, 2022
ocp: add baremetal capability annotation