
Use new interface to ironic containers. #212

Closed
wants to merge 1 commit into from

Conversation

@imain (Contributor) commented Jun 10, 2019

We now set a static IP with a 'lifetime' on the provisioning interface.
We also now determine this IP from the interface itself in the containers
using the new interface.

@zaneb (Member) left a comment


Does this mean we can remove the proxies from operator.yaml now?

deploy/operator_ironic.yaml (resolved)
@dhellmann (Member) left a comment


I think I'd like to see this documented with a design doc over in the metal3-docs repo. I'm sure it's fine, but there are a bunch of new moving parts and I'd like to have something to point new contributors to so they can understand what is going on.

deploy/operator_ironic.yaml (resolved)
deploy/operator_ironic.yaml (resolved)
@hardys (Member) commented Jun 11, 2019

Yeah, I'd also like some more docs/context on this. At various times we've discussed IPAM and/or external DHCP with next-server for PXE, and without some more info I'm still not entirely clear how this solves the IPAM problem, or how it relates to the Ironic dnsmasq pool, so we can be sure the static configuration can never overlap with a subsequent DHCP/PXE deployment on scale-out.

@imain (Contributor, author) commented Jun 11, 2019

Does this mean we can remove the proxies from operator.yaml now?

Well, once this works, yes :) There's still a bug that is not letting the worker out of inspection.

@imain (Contributor, author) commented Jun 11, 2019

I think I'd like to see this documented with a design doc over in the metal3-docs repo. I'm sure it's fine, but there are a bunch of new moving parts and I'd like to have something to point new contributors to so they can understand what is going on.

OK I'll work on that next.

@hardys (Member) commented Jun 13, 2019

There has been some discussion about using an external DHCP server for the provisioning network, so while it's good to see this work progress, I wonder if we might have a quicker path to success if we just relied on the current ironic dnsmasq on the host, and configured it to point next-server to the ironic PXE deployed via the operator?

I'm also wondering about the ens3 here - on the bootstrap RHCOS node I needed to configure ens4 on the provisioning subnet, then get ironic to bind to that interface, is that not the case on the deployed nodes? (I've not yet tested to find out, sorry)

@hardys (Member) commented Jun 13, 2019

Also, what configures the second NIC via NetworkManager in this case? Can anyone point me to where we do that, please? Thanks!

@hardys (Member) commented Jun 13, 2019

I spotted an issue with the ironic containers which may explain why we can pass the public IP and still have the ironic service accessible via the provisioning network, see metal3-io/ironic-image#56

@hardys (Member) commented Jun 13, 2019

Also, what configures the second NIC via NetworkManager in this case? Can anyone point me to where we do that, please? Thanks!

nvm, I see now that's what static-ip-manager does :)

I wonder if it might be cleaner to inject some ignition config which sets up the IP via NetworkManager, or, as previously mentioned, this all gets simpler if (for a first pass at least) we can rely on an external DHCP server to provide the IPs.

When configuring the IP on the bootstrap VM I used NetworkManager, as the script-based approach appears to be deprecated in EL8, and so, I guess, also in the OOTPA images:

https://github.com/openshift-metal3/kni-installer/blob/523be53167833cc2f95ee0589b0181c6d8c69764/data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh#L20..L24

@hardys (Member) commented Jun 13, 2019

One other question - when we add the *downloader containers, we'll need to know the IP where the images can be downloaded. How can we make that persistent, and how will the actuator know what address to use, e.g. what if the pod gets rescheduled onto a different node, with a different provisioning IP?

@dhellmann (Member)

One other question - when we add the *downloader containers, we'll need to know the IP where the images can be downloaded. How can we make that persistent, and how will the actuator know what address to use, e.g. what if the pod gets rescheduled onto a different node, with a different provisioning IP?

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?

If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?
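(For illustration, a Service of the kind suggested above might look roughly like the sketch below. The name, label selector, and ports are all assumptions made up for this example, not values from this PR; the point is only that consumers would use a stable DNS name rather than a node-specific provisioning IP.)

```yaml
# Hypothetical sketch: expose the image-cache httpd behind a Service so
# k8s handles the endpoint moving when the pod is rescheduled.
apiVersion: v1
kind: Service
metadata:
  name: metal3-image-cache        # assumed name
spec:
  selector:
    name: metal3-baremetal-operator   # assumed pod label
  ports:
    - name: http
      port: 80
      targetPort: 6180            # assumed httpd port inside the pod
```

Clients inside the cluster would then fetch images from a URL like http://metal3-image-cache.<namespace>.svc/images/..., which stays valid across pod rescheduling.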

@hardys (Member) commented Jun 13, 2019

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?

If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?

Yes, having k8s manage it would be ideal, but I wasn't sure if that was feasible, since we need the images to be accessible via the provisioning network, and that isn't managed by k8s.

Having looked more closely, I think the solution proposed here is to always bring up the same IP for the provisioning NIC, regardless of where the pod runs, so we always access e.g. 172.22.0.2 for ironic and potentially the images. I can see how that could work, but I'm worried about what happens if e.g. there's a node/network outage and k8s moves the pod?

@dhellmann (Member)

I thought we were going to define a Service for that URL so k8s would deal with it moving for us?
If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API?

Yes, having k8s manage it would be ideal, but I wasn't sure if that was feasible, since we need the images to be accessible via the provisioning network, and that isn't managed by k8s.

I'm not convinced that's a strong requirement.

We need the separate provisioning network for the DHCP response, because we need that type of traffic to be isolated. I don't think the same requirement automatically applies to the image traffic, which, while somewhat large, is only part of the downloads needed to set up a node and is temporary. I also doubt upgrades are going to happen over the private network.

I would be very happy with a Service-based solution for now that lets us move forward to a point where we can see if that implementation bothers users.

@hardys (Member) commented Jun 13, 2019

I'm not convinced that's a strong requirement.

Perhaps I'm misunderstanding, but at the point of PXE provisioning the nodes we only have access to the provisioning network, so one way or another we need to be able to download the IPA kernel/ramdisk and also the RHCOS image to deploy. I'm not sure how we do that when the nodes aren't connected to the k8s control-plane/external network, unless the images are exposed via the provisioning network.

Are you thinking we configure a route on all the nodes to enable reaching a k8s managed DNS entry, where the images are hosted?

@dhellmann (Member)

I'm not convinced that's a strong requirement.

Perhaps I'm misunderstanding, but at the point of PXE provisioning the nodes we only have access to the provisioning network, so one way or another we need to be able to download the IPA kernel/ramdisk and also the RHCOS image to deploy. I'm not sure how we do that when the nodes aren't connected to the k8s control-plane/external network, unless the images are exposed via the provisioning network.

Are you thinking we configure a route on all the nodes to enable reaching a k8s managed DNS entry, where the images are hosted?

Based on the introspection data, I see IPs on both NICs for the hosts I have provisioned. So I think the IPA image is running a DHCP client for all of them (and that's what I would expect).

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

@hardys (Member) commented Jun 14, 2019

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

Sure, but my concern is earlier than that point, the process is roughly:

  1. DHCP on the provisioning nic (either dnsmasq or potentially external) points to the initial pxe config
  2. The initial pxe config chain-loads the per-node (ironic managed) ipxe configuration
  3. The per-node configuration points at an http location where we download the IPA kernel/ramdisk
  4. The IPA ramdisk boots, downloads the OS image and writes it to disk.

Currently all of these run on the provisioning network, so if we want to download the IPA kernel/ramdisk from the public network I think we'll need to modify the pxe configuration to activate both nics, and we won't be able to support external DHCP on the provisioning network unless the pxe server location is predictable.
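(As a sketch of how steps 1-2 are typically wired up in dnsmasq - modelled loosely on the ironic-image dnsmasq.conf; the interface name, address range, ports, and filenames here are illustrative assumptions, not the actual config from this PR:)

```conf
# Hypothetical provisioning-network DHCP/PXE handoff
interface=ens3                       # provisioning NIC (assumed name)
dhcp-range=172.22.0.10,172.22.0.100
enable-tftp
tftp-root=/tftpboot
# Step 1: legacy PXE clients fetch the iPXE binary over TFTP...
dhcp-boot=undionly.kpxe
# Step 2: ...then clients already running iPXE chain-load the per-node
# (ironic-managed) config over HTTP instead
dhcp-userclass=set:ipxe,iPXE
dhcp-boot=tag:ipxe,http://172.22.0.3:8088/boot.ipxe   # assumed IP/port
```

The "pxe server location is predictable" concern above maps to that final dhcp-boot URL: an external DHCP server would need to hand out the same next-server/boot URL.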

@imain (Contributor, author) commented Jun 14, 2019

There has been some discussion about using an external DHCP server for the provisioning network, so while it's good to see this work progress, I wonder if we might have a quicker path to success if we just relied on the current ironic dnsmasq on the host, and configured it to point next-server to the ironic PXE deployed via the operator?

I'm also wondering about the ens3 here - on the bootstrap RHCOS node I needed to configure ens4 on the provisioning subnet, then get ironic to bind to that interface, is that not the case on the deployed nodes? (I've not yet tested to find out, sorry)

OK, there are a few constraints here. I've been told (and seen the results) that dnsmasq has to share a filesystem with ironic/ironic-inspector. That is how the images get switched from the inspector image to whatever other image you want to boot: an entry for the MAC needs to be written into the dnsmasq config. Therefore, my understanding is, we need to run dnsmasq on the provisioning host (or at least somewhere it has access to the k8s shared volume).

The provisioning host cannot get its IP from DHCP because we are the ones offering a DHCP server. I tried it; either dnsmasq refuses to bind or it wipes the IP off the interface (I forget, I tried a lot of things :)).

So this leaves us where I am, which is that we need static IPs on the provisioning network in order to run a DHCP server.

The idea of doing it via NetworkManager is nice, but the method I'm using now gives us fault tolerance. Using the 'lifetime' option on an IP address, if the pod disappears for any reason the IP is no longer refreshed, and it is released within 10 seconds of the pod's death. A new pod can then start on another master without worry of collisions. There are other ways to manage the static IPs on the provisioning network, but this one makes the most sense imo.
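(The 'lifetime' mechanism described above can be sketched roughly as follows. The interface name, address, and timings are assumptions for illustration; the helper only prints the `ip` invocation, since actually applying it needs root and the real provisioning NIC.)

```shell
#!/bin/sh
# Sketch of a lease-like static IP: install the address with a short
# valid_lft/preferred_lft and re-apply it on a timer. If the refresher
# (the pod) dies, the kernel expires and drops the address on its own.
refresh_cmd() {
    # $1 = CIDR address, $2 = interface, $3 = lifetime in seconds
    echo "ip addr replace $1 dev $2 valid_lft $3 preferred_lft $3"
}

ADDR="172.22.0.3/24"                       # assumed provisioning IP
IFACE="${PROVISIONING_INTERFACE:-ens3}"    # assumed NIC name
refresh_cmd "$ADDR" "$IFACE" 10
# A real manager would execute this in a loop, e.g.:
#   while true; do eval "$(refresh_cmd "$ADDR" "$IFACE" 10)"; sleep 5; done
```

With a 10-second lifetime and a ~5-second refresh, the address outlives only brief hiccups; a replacement pod on another master can safely claim it shortly after the old pod stops refreshing.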

If you want to do other things with the provisioning network I think you'll have to set up a vlan to make that work.

@imain (Contributor, author) commented Jun 14, 2019

So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes.

Sure, but my concern is earlier than that point, the process is roughly:

  1. DHCP on the provisioning nic (either dnsmasq or potentially external) points to the initial pxe config
  2. The initial pxe config chain-loads the per-node (ironic managed) ipxe configuration
  3. The per-node configuration points at an http location where we download the IPA kernel/ramdisk
  4. The IPA ramdisk boots, downloads the OS image and writes it to disk.

Currently all of these run on the provisioning network, so if we want to download the IPA kernel/ramdisk from the public network I think we'll need to modify the pxe configuration to activate both nics, and we won't be able to support external DHCP on the provisioning network unless the pxe server location is predictable.

I don't think it's possible to do this atm, although I could be wrong. My understanding is that we need to be able to update the dnsmasq configuration to point to the new ramdisk. I also thought we wanted it in the pod so we can have fault tolerance? Currently you can still download from the original httpd server, so I'm not sure how the routes are being set up there. Probably the non-provisioning interface is being activated via DHCP.

@imain (Contributor, author) commented Jun 14, 2019

https://github.com/metal3-io/metal3-docs/blob/master/design/how-ironic-works.md#what-connectivity-is-required

"DHCP (via the ironic host, for IP assignment and PXE instructions)"

It may be possible to do what you are suggesting Steve, but I suspect it would require some more serious meddling with how ironic currently works.

@imain (Contributor, author) commented Jun 14, 2019

Documentation: metal3-io/metal3-docs#38

@hardys (Member) commented Jun 17, 2019

I'm pretty sure that ironic updates the mac entries in the pxe server, which isn't part of the dnsmasq configuration - @dtantsur can you confirm that please?

When you look at the dnsmasq config in the container it's static, and we just need to point at an IP where the next stage of the pxe config can be loaded:

https://github.com/metal3-io/ironic-image/blob/master/dnsmasq.conf#L9

That said, @juliakreger mentioned some other constraints, so it may be that an external DHCP server isn't the most viable solution; I'm just trying to understand our options.

If we stick with the hosted dnsmasq, we seem to have two options - either go with this short-lease-expiry approach, or maintain state so that the DHCP leases/reservations are persisted in the dnsmasq config - I chatted with @dhellmann about that last week and he has some ideas, so it may be worth looking into and/or prototyping.

For now I'm not blocking this approach, but my main concern is that without fencing, we could conceivably have a network outage on the public/control-plane network which causes the baremetal-operator pod to get rescheduled, are we sure in that situation this approach will be robust?

One nice side-effect of this reuse of the .2 address, though, is that it uses the same IP as the bootstrap VM, ref openshift-metal3/kni-installer#100. But again I'm slightly concerned that if there was any error which prevented clean teardown of the VM we'd end up with IP conflicts; I guess we'd need to adopt the same lease-expiry approach there to be sure?

cc @celebdor too for his input.

@imain (Contributor, author) commented Jun 18, 2019

I'm pretty sure that ironic updates the mac entries in the pxe server, which isn't part of the dnsmasq configuration - @dtantsur can you confirm that please?

When you look at the dnsmasq config in the container it's static, and we just need to point at an IP where the next stage of the pxe config can be loaded:

https://github.com/metal3-io/ironic-image/blob/master/dnsmasq.conf#L9

That said, @juliakreger mentioned some other constraints, so it may be that an external DHCP server isn't the most viable solution; I'm just trying to understand our options.

Yes, sorry, you are right, we could separate them. I think I was confused because we write entries for tftpd, which is also managed by dnsmasq.

If we stick with the hosted dnsmasq, we seem to have two options - either go with this short-lease-expiry approach, or maintain state so that the DHCP leases/reservations are persisted in the dnsmasq config - I chatted with @dhellmann about that last week and he has some ideas, so it may be worth looking into and/or prototyping.

Yes agreed, and I'm happy with either of those solutions. The previous version I had was assigning IPs to each master, but there was no real IPAM for future changes so I abandoned it. If we can get IPAM doing that then that would be great.

Switching would be very easy as well. The container interface is such that we can just specify an interface and it will find the IP and set everything up.

For now I'm not blocking this approach, but my main concern is that without fencing, we could conceivably have a network outage on the public/control-plane network which causes the baremetal-operator pod to get rescheduled, are we sure in that situation this approach will be robust?

Yes, it is true. With some more thought, however, I realized we can't have two pods running anyway: we would end up with two DHCP servers regardless :/

One nice side-effect of this reuse of the .2 address though is it uses the same IP as the bootstrap VM ref openshift-metal3/kni-installer#100 - but again I'm slightly concerned that if there was any error which prevented clean teardown of the VM we'll end up with IP conflicts, I guess we'd need to adopt the same lease-expiry approach there to be sure?

Oh, we don't have to use .2. I thought the bootstrap VM was .1? Anyway, any IP will work so long as it's available. Shall I change it to .3?

cc @celebdor too for his input.

At any rate, I'm happy with any IPAM solution. I'm going to update this PR with the downloader images. I'm hoping it would be OK to get this merged at some point soonish so we can move forward with the pod. Switching to any other solution should be pretty straightforward in the future.

@hardys (Member) commented Jun 19, 2019

Oh, we don't have to use .2. I thought the bootstrap VM was .1? Anyway, any IP will work so long as it's available. Shall I change it to .3?

Yeah, currently the provisioning host is .1, and the provisioning interface on the bootstrap VM is unconfigured. In openshift-metal3/kni-installer#100 I bring that up as .2, so we should probably change either that PR or this one to avoid a potential conflict (although you're right that if the dnsmasq is running on both IPs we've got big problems anyway).

@imain (Contributor, author) commented Jun 20, 2019

I changed this to use .3

imain added a commit to imain/dev-scripts that referenced this pull request Jun 20, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@imain (Contributor, author) commented Jun 21, 2019

Does this mean we can remove the proxies from operator.yaml now?

Sorry Zane, missed this one. I think this replaces operator.yaml. Once this is in we can decide if we remove it or just remove the proxies.

@dhellmann (Member) left a comment


Maybe we can (separately) make a variant of the deployment configuration that would let us continue to run the operator outside of a cluster for development testing.

I think the only thing actually blocking this is the image that's being built out of your personal quay repo instead of one owned by the team.

configMapKeyRef:
name: ironic-bmo-configmap
key: cache_url
- name: rhcos-downloader
Member:

I don't think we want this in the baremetal-operator pod, long term. The use of RHCOS is OpenShift-specific, and it's also only going to be used for deploying images to build a cluster. We already have users interested in using the BMO for other things, so we're trying to keep that sort of knowledge out of the BMO. We eventually need to move this over to the pod that runs the cluster-api-provider-baremetal controller under OpenShift, since that is the thing that needs to know the IP of the server hosting this image.

Contributor Author:

Is this something you want addressed now? Honestly not sure how we'd do this another way atm.

Member:

Yes, I think this should be removed from this PR since it's OpenShift specific.

It should go somewhere where details of the OpenShift integration are maintained. That's openshift-metal3/dev-scripts short term, and the machine-api-operator longer term (but hopefully ASAP)

Member:

Maybe we should just copy this entire yaml to dev-scripts until the MAO change lands?

Member:

/cc @sadasu

deploy/operator_ironic.yaml (resolved)
valueFrom:
configMapKeyRef:
name: ironic-bmo-configmap
key: http_ip
key: provisioning_interface
- name: ironic
@dtantsur (Member) commented Jun 29, 2019:

Side note: we're splitting this container, see metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646

Member:

The change has merged, could you actually do the split please?

hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jul 1, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@imain (Contributor, author) commented Jul 5, 2019

OK this is working for me except for the bug Steve and Doug found where the masters are getting rebooted. If I only bring up a worker it works fine.

It would be nice to get this merged, as it does not affect anything in the deployment, and then we can work on getting the rest fixed/merged.

@hardys (Member) commented Jul 8, 2019

Ok I tested this locally and it's working well for me now since we rebased and picked up the CAPBM fix.

If we can rebase this and address the comments from @dtantsur to align with metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646 then I think this should be good to land?

ironic_endpoint: "http://172.22.0.3:6385/v1/"
ironic_inspector_endpoint: "http://172.22.0.3:5050/v1/"
cache_url: "http://172.22.0.1/images"
rhcos_image_url: "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190520.0/"
Member:

This is OpenShift specific, so I would prefer it didn't go anywhere in the baremetal-operator repository.

Member:

Also this image is now outdated, should be https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/420.8.20190624.0/

I guess that will be easier to maintain if we do move this yaml to dev-scripts, until the MAO integration lands (where I assume we can template this in based on knowledge of the release payload, cc @sadasu).


imain added a commit to imain/dev-scripts that referenced this pull request Jul 9, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
@hardys (Member) commented Jul 9, 2019

Ok I moved this to openshift-metal3/dev-scripts#659 for testing, working locally for me

imain added a commit to imain/dev-scripts that referenced this pull request Jul 10, 2019
Switch to using the new ironic + baremetal operator pod.

Depends on: metal3-io/baremetal-operator#212
We now set a static IP with a 'lifetime' on the provisioning interface.
We also now determine this IP from the interface itself in the containers
using the new interface.  This also uses the new downloader images to
get the images from the bootstrap node.
@imain (Contributor, author) commented Jul 11, 2019

OK, I'll abandon this shortly.

@imain (Contributor, author) commented Jul 11, 2019

This is now part of:

openshift-metal3/dev-scripts#635

@imain imain closed this Jul 11, 2019
honza pushed a commit to honza/baremetal-operator that referenced this pull request Mar 25, 2022
ocp: add baremetal capability annotation