-
Notifications
You must be signed in to change notification settings - Fork 15
Conversation
Is there a dev-scripts change to try yet that points to use this Ironic? |
I've got a hacked up local branch based on openshift-metal3/dev-scripts#564 - I'll clean that up and push it for testing - I've been working through some issues with podman (buggy errors unmounting the shared volume on pod delete) but will rebase this today. |
This is being used to test openshift-metal3/kni-installer#100
Ok this is getting closer - I had to stop using a pod because the podman/libpod version in the current RHCOS build appears buggy, and fails to unmount the shared volume on pod rm -f, probably a race or something as it works fine with individual containers. Still TODO is to inject the static NetworkManager configuration for the second nic, I think we can default this to 172.22.0.2 since the default DHCP range for the ironic dnsmasq script doesn't start until 172.22.0.10 I'm also planning to add logic to the *downloader containers to they try to download from a mirror accessible on the default gateway IP (e.g 192.168.111.1) which should mean we can run the downloader containers and httpd on the provisioning host as a cache (at least for development). @stbenjam I pushed my WIP dev-scripts branch to https://github.com/hardys/dev-scripts/tree/bootstrap_ironic2 - still needs additional work before I open a PR but it's sufficient for my local testing. |
Oh and I need to set PROVISIONING_INTERFACE when starting the ironic container, or ensure the configured interface uses the expected name |
Ok the provisioning nic is now started by startironic using nmcli, and I'm passing the interface name to the container - that exposed that we don't bind only to that interface ref metal3-io/ironic-image#56 I also switched the default Ironic endpoint - I guess if we make Ironic accessible via the public interface of the bootstrap VM we can potentially remove that completely, but for now I've pointed directly at the 172 address used in startironic.sh |
Ok another problem, ExecStart is supposed to block, but atm the script exits - coredns/keepalived work around this by creating the containers in ExecStartPre, but podman pod start won't work here due to bugs which break removal of the pod. |
f5807ab
to
e0a6a42
Compare
This is now working but I've seen the mariadb container die a couple of times, not exactly clear why yet but I'm wondering if the VM is overloaded so will re-try next week with more memory/CPU to see if that helps |
Ugh, even after increasing the RAM I'm seeing pods randomly die, and |
This is being used to test openshift-metal3/kni-installer#100
Ok the latest rebases seem to have fixed the issues I was seeing (although strangely libpod still hasn't been bumped) - this version now works for me locally. To get this ready to merge I need to resolve the FIXME in startironic where the release version is hard-coded, that should be possible to template in bootstrap.go, although I don't think the release version is currently in the template data. The other TODO item is to add a survey input for the provisioning network CIDR - we'll need to allow some way to override the default in the case where the subnet is already routable in the local environment. |
This is being used to test openshift-metal3/kni-installer#100
Urgh, spoke too soon, still hitting libpod issues, raised https://bugzilla.redhat.com/show_bug.cgi?id=1724660 |
On the next rebase (currently blocked until operator-framework/operator-sdk#1512 is merged), we'll get rhocs 420.8.20190624.0 - hopefully that has a newer libpod. |
My understanding is that we need a manual sync which will be tracked via the bz, hopefully that will happen soon then we can rebase on a new build that contains it. In the meantime I'm going to try rebuilding with coreos-assembler and installing the same (working) version I tested today via rpm-ostree override replace |
8e34abe
to
843c602
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just focused on the baseURI templating in this review - I suggest splitting that out into a separate commit, and maybe even proposing it to openshift/installer upstream now
return nil, err | ||
} | ||
} else { | ||
deployImage = releaseImage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to pass an empty string than confuse the meaning of deployImage - i.e. if it is set, it should be an rhcos image URL, not a release image pullspec
// so we can deploy the masters with the correct image from rhcos.json | ||
if installConfig.Platform.Name() == bmtypes.Name { | ||
var err error | ||
deployImage, err = rhcos.Baremetal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like
rhcosImageBaseURI = rhcos.BaseURI()
would be more clear
pkg/rhcos/baremetal.go
Outdated
) | ||
|
||
// Baremetal fetches the BaseURI where baremetal images can be downloaded from | ||
func Baremetal() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing specific to bare metal about this - I'd just call it rhcos.BaseURI()
COREOS_DOWNLOADER_IMAGE=${COREOS_DOWNLOADER_IMAGE:-"quay.io/openshift-metal3/rhcos-downloader:master"} | ||
|
||
# This image is templated in via the installer pkg/asset/ignition/bootstrap/bootstrap.go | ||
RHCOS_IMAGE_URL="{{.DeployImage}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it RHCOS_IMAGE_BASE_URL
or similar
@@ -42,6 +44,7 @@ type bootstrapTemplateData struct { | |||
PullSecret string | |||
ReleaseImage string | |||
Proxy *configv1.ProxyStatus | |||
DeployImage string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, RHCOSBaseURI
or similar would be more clear
You could leave a comment here that the baremetal platform is the only user currently
var deployImage string | ||
// baremetal requires some additional templateData to download the RHCOS image | ||
// so we can deploy the masters with the correct image from rhcos.json | ||
if installConfig.Platform.Name() == bmtypes.Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this to be bare metal specific - just make baseURI available for templating by all platforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @markmc this is all now working locally for me so I kicked off a CI run via openshift-metal3/dev-scripts#642 to replicate that, but in the meantime I will rework this to address your comments, and when you're happy we can break out that commit and propose upstream (unless @stbenjam plans to push a PR with this combination of commits soon, I don't want to complicate that..)
This is being used to test openshift-metal3/kni-installer#100
@markmc 29bd776 should address your comments I think, please re-review when you get a moment. @stbenjam Note the latest commit added which swaps out IronicURI for BootstrapProvisioningIP - if you're happy with that approach I'll use the data to template the IP for startironic.sh, and subsequently we can add similar interfaces to customize the DHCP pool for provisioning? The alternative would be to pass a CIDR, but then we'll have hard-coded assumptions about which IP in the CIDR is used for each thing, I was preferring to just expose the things which need IPs separately but open to feedback on the approach. |
Hmm, testing indicates I broke something when renaming as startironic.sh gets an empty string for |
I think this approach is good for now, but I'd prefer a CIDR eventually and let us pick the IP because we'll want to assign static IP's on the provisioning network to the masters for install-gather debugging. But there's probably all kinds of things to figure out there that it should probably be it's own task so we can deal with it later. |
This allows us to template in the baseURI from data/data/rhcos.json when needed for boostrap files (the baremetal platform requires this to download the appropriate image to provision the masters)
This adds ironic to the bootstrap VM for the baremetal IPI platform.
This allows for optional caching of the images on the host running the bootstrap VM, which is particularly useful when developing to save time and bandwidth, and potentially also for offline installs in future.
This shouldn't be hard-coded, so instead we replace the URL with the appropriate data from rhcos.json. Related: openshift-metal3#37
When running ironic and related services on the VM we need additional memory so increase it to 6G.
This means we can use the IP to configure both the nic on the bootstrap VM which Ironic uses for provisioning (in a future commit) and construct the ironic_uri
If the container exists unexpectedly for any reason, then ExecStop not called, and sometimes there are "storage for container removed" errors trying to re-start it, presumably because podman cleanup removes resources for exited containers.
This allows us to template in the baseURI from data/data/rhcos.json when needed for boostrap files (the baremetal platform will soon require this to download the appropriate image to provision the masters ref openshift-metal3/kni-installer#100
@@ -208,11 +210,25 @@ func (a *Bootstrap) getTemplateData(installConfig *types.InstallConfig, proxy *c | |||
logrus.Debugf("Using internal constant for release image %s", releaseImage) | |||
} | |||
|
|||
var rhcosImageBaseURI string | |||
if ri, ok := os.LookupEnv("OPENSHIFT_INSTALL_IMAGE_BASE_URI_OVERRIDE"); ok && ri != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@derekhiggins I think it would perhaps be better if we used this override instead of CACHEURL, because it's one less thing we'll need to hard-code (and ultimately template) in startironic.sh - what are your thoughts on instead making the https://github.com/openshift-metal3/rhcos-downloader/blob/master/get-resource.sh output similar to the original repo structure, or we can cache differently in dev-scripts, such that we just set OPENSHIFT_INSTALL_IMAGE_BASE_URI_OVERRIDE in dev-scripts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cancel that, @abhinavdahiya has indicated in openshift/installer#2035 that we shouldn't add a new variable like this
Since the kni-installer repo is no longer in use and we're now tracking this via openshift/installer#2060, I'm going to close this PR. I know you'll be posting this against openshift/installer when you're ready |
First step to resolve #68
Currently WIP but feedback welcome