Use new interface to ironic containers. #212
Conversation
Does this mean we can remove the proxies from operator.yaml now?
I think I'd like to see this documented with a design doc over in the metal3-docs repo. I'm sure it's fine, but there are a bunch of new moving parts and I'd like to have something to point new contributors to so they can understand what is going on.
Yeah I'd also like some more docs/context on this - at various times we've discussed IPAM and/or external DHCP with next-server for PXE, and I'm still not entirely clear how this solves the IPAM problem without some more info, or how it relates to the Ironic dnsmasq pool, so we can be sure the static configuration can never overlap with a subsequent DHCP/PXE deployment on scale-out.
Well, once this works, yes :) There's still a bug that is not letting the worker out of inspection.
OK I'll work on that next.
There has been some discussion about using an external DHCP server for the provisioning network, so while it's good to see this work progress, I wonder if we might have a quicker path to success if we just relied on the current ironic dnsmasq on the host, and configured it to point next-server to the ironic PXE deployed via the operator? I'm also wondering about the ens3 here - on the bootstrap RHCOS node I needed to configure ens4 on the provisioning subnet, then get ironic to bind to that interface; is that not the case on the deployed nodes? (I've not yet tested to find out, sorry)
Also, what configures the second nic via NetworkManager in this case? Can anyone point me to where we do that please? Thanks!
I spotted an issue with the ironic containers which may explain why we can pass the public IP and still have the ironic service accessible via the provisioning network, see metal3-io/ironic-image#56
nvm, I see now that's what static-ip-manager does :) I wonder if it might be cleaner to inject some ignition config which sets up the IP via NetworkManager, or as previously mentioned this all gets simpler if (for a first pass at least) we can rely on an external DHCP server to provide the IPs. When configuring the IP on the bootstrap VM I used NetworkManager, as the script-based approach appears deprecated in el8, and so I guess also the ootpa images:
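For illustration, the NetworkManager idea could look roughly like the keyfile below, injected via ignition. This is only a sketch based on the discussion; the interface name and address are assumptions, not something from this PR:

```ini
# Hypothetical NetworkManager keyfile for the provisioning NIC.
# Interface name and address are taken from the conversation above.
[connection]
id=provisioning
type=ethernet
interface-name=ens3
autoconnect=true

[ipv4]
method=manual
address1=172.22.0.3/24

[ipv6]
method=ignore
```

This would be dropped into /etc/NetworkManager/system-connections/ by ignition; unlike the lifetime-based approach discussed later, the address would persist for the life of the host.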
One other question - when we add the *downloader containers, we'll need to know the IP where the images can be downloaded - how can we make that persistent, and how will the actuator know what address to use, e.g. what if the pod gets rescheduled onto a different node, with a different provisioning IP?
I thought we were going to define a Service for that URL so k8s would deal with it moving for us? If not that, do we want to set up mDNS for that endpoint, like we've done with the Ironic API? |
Yes, having k8s manage it would be ideal, but I wasn't sure if that was feasible since we need the images to be accessible via the provisioning network, and that isn't managed by k8s. Having looked more closely, I think the solution proposed here is to always bring up the same IP for the provisioning nic, regardless of where the pod runs, then we always access e.g. 172.22.0.2 for ironic and potentially the images. I can see how that could work, but I'm worried about what happens if e.g. there's a node/network outage and k8s moves the pod?
I'm not convinced that's a strong requirement. We need the separate provisioning network for the DHCP response, because we need that type of traffic to be isolated. I don't think the same requirement automatically applies to the image traffic which, while somewhat large, is also only part of the downloads needed to set up a node and is temporary. I doubt upgrades are going to happen over the private network, as well. I would be very happy with a Service-based solution for now that lets us move forward to a point where we can see if that implementation bothers users.
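If the Service route were taken, a minimal sketch might look like the following. All of the names, selector, and ports here are assumptions for illustration; none of them come from this PR:

```yaml
# Hypothetical Service giving the image cache a stable cluster address,
# so consumers don't need to know which node the pod landed on.
apiVersion: v1
kind: Service
metadata:
  name: image-cache          # assumed name
spec:
  selector:
    name: metal3-baremetal-operator   # assumed pod label
  ports:
  - name: http
    port: 80
    targetPort: 6180         # assumed httpd port inside the pod
```

Kubernetes would then keep the Service endpoint pointed at wherever the pod is running, which is the "k8s would deal with it moving for us" behaviour mentioned above.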
Perhaps I'm misunderstanding, but at the point of PXE provisioning the nodes we only have access to the provisioning network, so one way or another we need to be able to download the IPA kernel/ramdisk and also the RHCOS image to deploy; not sure how we do that when the nodes aren't connected to the k8s control-plane/external network unless the images are exposed via the provisioning network. Are you thinking we configure a route on all the nodes to enable reaching a k8s-managed DNS entry, where the images are hosted?
Based on the introspection data, I see IPs on both NICs for the hosts I have provisioned. So I think the IPA image is running a DHCP client for all of them (and that's what I would expect). So as far as I can tell, we have access to all of the networks at the point when we would try to download the image during provisioning, without adding any routes. |
Sure, but my concern is earlier than that point, the process is roughly:
1. DHCP (via the ironic host, for IP assignment and PXE instructions)
2. TFTP download of the PXE bootloader from the ironic host
3. Download of the IPA kernel/ramdisk for inspection/deployment
Currently all of these run on the provisioning network, so if we want to download the IPA kernel/ramdisk from the public network I think we'll need to modify the pxe configuration to activate both nics, and we won't be able to support external DHCP on the provisioning network unless the pxe server location is predictable.
OK there are a few constraints here. I've been told (and seen the results) that dnsmasq has to share a filesystem with ironic/ironic-inspector. That is how the images get switched from the inspector image to whatever other image you want to boot: it needs to write an entry for the MAC into the dnsmasq config. Therefore, my understanding is, we need to run dnsmasq on the provisioning host (or at least somewhere it has access to the k8s shared volume).

The provisioning host cannot get its IP from DHCP because we are offering a DHCP server. I tried it; it either refuses to bind or it wipes the IP off the interface (I forget, I tried a lot of things :)). So this leaves us where I am, which is that we need static IPs on the provisioning network to run a DHCP server.

The idea of doing it via NetworkManager is nice, but the method I'm using now gives us fault tolerance. Using the 'lifetime' option of an IP, if the pod disappears for any reason, the IP will not be refreshed and it will be released within 10 seconds of its death. A new pod can then start on another master without worry of collisions. There are other ways to manage the static IPs on the provisioning network but this one makes the most sense imo. If you want to do other things with the provisioning network I think you'll have to set up a vlan to make that work.
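The 'lifetime' mechanism described here can be sketched roughly as below. The helper name, interface, and address are assumptions for illustration, not the actual static-ip-manager code; the key detail is iproute2's valid_lft/preferred_lft, which makes the kernel expire the address unless it keeps being refreshed:

```shell
# Build the iproute2 command that assigns a short-lived address. In the real
# container a loop would re-run this every few seconds; if the pod dies, the
# kernel drops the address within ~10s and another master can claim it.
PROV_IP="172.22.0.3/24"    # assumed provisioning address
PROV_IF="ens3"             # assumed provisioning interface
LIFETIME=10                # seconds before the kernel expires the address

assign_cmd() {
  echo "ip addr replace ${PROV_IP} dev ${PROV_IF} valid_lft ${LIFETIME} preferred_lft ${LIFETIME}"
}

# Hypothetical refresh loop (requires root, so shown as a comment):
#   while true; do $(assign_cmd); sleep 5; done
```

`ip addr replace` is idempotent, so re-running it simply renews the lifetime rather than erroring on an existing address.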
I don't think it's possible to do this atm, although I could be wrong. My understanding is that we need to be able to update the dnsmasq configuration to point to the new ramdisk. I also thought we wanted it in the pod so we can have fault tolerance? Currently you can download from the original httpd server still, so I'm not sure how the routes are being set up there. Probably the non-provisioning interface is being activated via dhcp.
"DHCP (via the ironic host, for IP assignment and PXE instructions)" It may be possible to do what you are suggesting Steve, but I suspect it would require some more serious meddling with how ironic currently works.
Documentation: metal3-io/metal3-docs#38 |
I'm pretty sure that ironic updates the mac entries in the pxe server, which isn't part of the dnsmasq configuration - @dtantsur can you confirm that please? When you look at the dnsmasq config in the container it's static, and we just need to point at an IP where the next stage of the pxe config can be loaded: https://github.com/metal3-io/ironic-image/blob/master/dnsmasq.conf#L9

That said, @juliakreger mentioned some other constraints, so it may be that an external DHCP server isn't the most viable solution; I'm just trying to understand our options. If we stick with the hosted dnsmasq, we seem to have two options - either go with this short-lease-expiry approach, or maintain state so that the DHCP leases/reservations are persisted in the dnsmasq config - I chatted with @dhellmann about that last week and he has some ideas, so it may be worth looking into and/or prototyping.

For now I'm not blocking this approach, but my main concern is that without fencing, we could conceivably have a network outage on the public/control-plane network which causes the baremetal-operator pod to get rescheduled - are we sure in that situation this approach will be robust? One nice side-effect of this reuse of the .2 address though is that it uses the same IP as the bootstrap VM, ref openshift-metal3/kni-installer#100 - but again I'm slightly concerned that if there was any error which prevented clean teardown of the VM we'll end up with IP conflicts; I guess we'd need to adopt the same lease-expiry approach there to be sure? cc @celebdor too for his input.
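For context, the static part of a dnsmasq PXE setup only needs to hand out a pool and point clients at a fixed boot server; an external DHCP server would need something along these lines. The addresses and range below are assumptions for illustration, not the actual ironic-image config:

```
# Hypothetical dnsmasq fragment: serve a lease pool and point PXE clients
# at a fixed ironic/TFTP host. dhcp-boot syntax is filename,servername,address.
dhcp-range=172.22.0.10,172.22.0.100,12h
dhcp-boot=undionly.kpxe,,172.22.0.3
```

The second directive is dnsmasq's equivalent of setting next-server, which is the handoff being discussed here: as long as that address is predictable, the DHCP server itself could live anywhere.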
Yes sorry, you are right, we could separate them. I think I was confused because we write entries for tftpd which is managed by dnsmasq also.
Yes agreed, and I'm happy with either of those solutions. The previous version I had was assigning IPs to each master, but there was no real IPAM for future changes so I abandoned it. If we can get IPAM doing that then that would be great. Switching would be very easy as well. The container interface is such that we can just specify an interface and it will find the IP and set everything up.
Yes it is true. With some more thought, however, I realized we can't have two pods running regardless: we'd end up with two DHCP servers :/
Oh, we don't have to use .2. I thought the bootstrap vm was .1? Anyway any IP will work so long as it's available. Shall I change it to .3?
At any rate, I'm happy with any IPAM solution. I'm going to update this PR with the downloader images. I'm hoping it would be OK to get this merged at some point soonish so we can move forward with the pod. Switching to any other solution should be pretty straightforward in the future.
Yeah, currently the provisioning host is .1, and the provisioning interface on the bootstrap VM is unconfigured - in openshift-metal3/kni-installer#100 I bring that up as .2, so we probably should change either that PR or this one to avoid potential conflict (although you're right that if the dnsmasq is running on both IPs we've got big problems anyway)
I changed this to use .3 |
Switch to using the new ironic + baremetal operator pod. Depends on: metal3-io/baremetal-operator#212
Sorry Zane, missed this one. I think this replaces operator.yaml. Once this is in we can decide if we remove it or just remove the proxies. |
Maybe we can (separately) make a variant of the deployment configuration that would let us continue to run the operator outside of a cluster for development testing.
I think the only thing actually blocking this is the image that's being built out of your personal quay repo instead of one owned by the team.
    configMapKeyRef:
      name: ironic-bmo-configmap
      key: cache_url
- name: rhcos-downloader
I don't think we want this in the baremetal-operator pod, long term. The use of RHCOS is OpenShift-specific, and it's also only going to be used for deploying images to build a cluster. We already have users interested in using the BMO for other things, so we're trying to keep that sort of knowledge out of the BMO. We eventually need to move this over to the pod that runs the cluster-api-provider-baremetal controller under OpenShift, since that is the thing that needs to know the IP of the server hosting this image.
Is this something you want addressed now? Honestly, not sure how we'd do this another way atm.
Yes, I think this should be removed from this PR since it's OpenShift specific.
It should go somewhere where details of the OpenShift integration are maintained. That's openshift-metal3/dev-scripts short term, and the machine-api-operator longer term (but hopefully ASAP)
Maybe we should just copy this entire yaml to dev-scripts until the MAO change lands?
/cc @sadasu
deploy/operator_ironic.yaml (outdated diff)
    valueFrom:
      configMapKeyRef:
        name: ironic-bmo-configmap
-       key: http_ip
+       key: provisioning_interface
    - name: ironic
Side note: we're splitting this container, see metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646
The change has merged, could you actually do the split please?
OK, this is working for me except for the bug Steve and Doug found where the masters are getting rebooted. If I only bring up a worker it works fine. It would be nice to get this merged, as it does not affect anything in the deployment, and then we can work on getting the rest fixed/merged.
OK, I tested this locally and it's working well for me now since we rebased and picked up the CAPBM fix. If we can rebase this and address the comments from @dtantsur to align with metal3-io/ironic-image#62 and openshift-metal3/dev-scripts#646, then I think this should be good to land?
ironic_endpoint: "http://172.22.0.3:6385/v1/"
ironic_inspector_endpoint: "http://172.22.0.3:5050/v1/"
cache_url: "http://172.22.0.1/images"
rhcos_image_url: "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190520.0/"
This is OpenShift specific, so I would prefer it didn't go anywhere in the baremetal-operator repository
Also this image is now outdated, should be https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/420.8.20190624.0/
I guess that will be easier to maintain if we do move this yaml to dev-scripts, until the MAO integration lands (where I assume we can template this in based on knowledge of the release payload cc @sadasu )
OK, I moved this to openshift-metal3/dev-scripts#659 for testing; working locally for me
We now set a static IP with a 'lifetime' on the provisioning interface. We also now determine this IP from the interface itself in the containers using the new interface. This also uses the new downloader images to get the images from the bootstrap node.
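Determining the IP from the interface itself presumably boils down to parsing `ip -4 -o addr show` output inside the containers; a sketch of that parsing, with a hypothetical helper name:

```shell
# Extract the IPv4 address from one line of `ip -4 -o addr show dev <if>`
# output, e.g. "2: ens3    inet 172.22.0.3/24 brd 172.22.0.255 scope global ens3".
# Field 4 is the CIDR; cut drops the prefix length.
addr_from_line() {
  echo "$1" | awk '{print $4}' | cut -d/ -f1
}

# Hypothetical wiring inside a container entrypoint:
#   IRONIC_IP="$(addr_from_line "$(ip -4 -o addr show dev "$PROVISIONING_INTERFACE" | head -n1)")"
```

With this, the containers only need the interface name from the ConfigMap, and the address follows whatever the static-ip-manager has assigned.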
OK, I'll abandon this shortly.
This is now part of:
ocp: add baremetal capability annotation