Skip to content
This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

WIP add Ironic to the bootstrap VM #100

Closed
wants to merge 7 commits into from

Conversation

hardys
Copy link

@hardys hardys commented Jun 10, 2019

First step to resolve #68

Currently WIP but feedback welcome

@stbenjam
Copy link
Member

Is there a dev-scripts change to try yet that points to use this Ironic?

@hardys
Copy link
Author

hardys commented Jun 12, 2019

Is there a dev-scripts change to try yet that points to use this Ironic?

I've got a hacked up local branch based on openshift-metal3/dev-scripts#564 - I'll clean that up and push it for testing - I've been working through some issues with podman (buggy errors unmounting the shared volume on pod delete) but will rebase this today.

@hardys hardys force-pushed the bootstrap_ironic branch from a184fa4 to c0635d3 Compare June 12, 2019 17:53
hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jun 12, 2019
@hardys
Copy link
Author

hardys commented Jun 12, 2019

Ok this is getting closer - I had to stop using a pod because the podman/libpod version in the current RHCOS build appears buggy, and fails to unmount the shared volume on pod rm -f, probably a race or something as it works fine with individual containers.

Still TODO is to inject the static NetworkManager configuration for the second nic, I think we can default this to 172.22.0.2 since the default DHCP range for the ironic dnsmasq script doesn't start until 172.22.0.10

I'm also planning to add logic to the *downloader containers to they try to download from a mirror accessible on the default gateway IP (e.g 192.168.111.1) which should mean we can run the downloader containers and httpd on the provisioning host as a cache (at least for development).

@stbenjam I pushed my WIP dev-scripts branch to https://github.com/hardys/dev-scripts/tree/bootstrap_ironic2 - still needs additional work before I open a PR but it's sufficient for my local testing.

@hardys
Copy link
Author

hardys commented Jun 12, 2019

Oh and I need to set PROVISIONING_INTERFACE when starting the ironic container, or ensure the configured interface uses the expected name

@hardys hardys force-pushed the bootstrap_ironic branch from c0635d3 to b319acf Compare June 13, 2019 10:28
@hardys
Copy link
Author

hardys commented Jun 13, 2019

Ok the provisioning nic is now started by startironic using nmcli, and I'm passing the interface name to the container - that exposed that we don't bind only to that interface ref metal3-io/ironic-image#56

I also switched the default Ironic endpoint - I guess if we make Ironic accessible via the public interface of the bootstrap VM we can potentially remove that completely, but for now I've pointed directly at the 172 address used in startironic.sh

@hardys
Copy link
Author

hardys commented Jun 13, 2019

Ok another problem, ExecStart is supposed to block, but atm the script exits - coredns/keepalived work around this by creating the containers in ExecStartPre, but podman pod start won't work here due to bugs which break removal of the pod.

@hardys hardys force-pushed the bootstrap_ironic branch 2 times, most recently from f5807ab to e0a6a42 Compare June 14, 2019 15:42
@hardys
Copy link
Author

hardys commented Jun 14, 2019

This is now working but I've seen the mariadb container die a couple of times, not exactly clear why yet but I'm wondering if the VM is overloaded so will re-try next week with more memory/CPU to see if that helps

@hardys
Copy link
Author

hardys commented Jun 17, 2019

Ugh, even after increasing the RAM I'm seeing pods randomly die, and podman logs then says no such file or directory from overlayfs - all of this works fine on CentOS with a newer libpod version, so I'm suspecting it's the old 1.0.2-dev podman stuff causing this, but not yet figured out how to rebuild the coreos image with a newer libpod.

@hardys hardys force-pushed the bootstrap_ironic branch from e26cd91 to 3acca4e Compare June 25, 2019 13:04
hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jun 25, 2019
@hardys hardys force-pushed the bootstrap_ironic branch from 3acca4e to e400bee Compare June 25, 2019 15:05
@hardys
Copy link
Author

hardys commented Jun 25, 2019

Ok the latest rebases seem to have fixed the issues I was seeing (although strangely libpod still hasn't been bumped) - this version now works for me locally.

To get this ready to merge I need to resolve the FIXME in startironic where the release version is hard-coded, that should be possible to template in bootstrap.go, although I don't think the release version is currently in the template data.

The other TODO item is to add a survey input for the provisioning network CIDR - we'll need to allow some way to override the default in the case where the subnet is already routable in the local environment.

hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jun 26, 2019
@hardys
Copy link
Author

hardys commented Jun 27, 2019

Urgh, spoke too soon, still hitting libpod issues, raised https://bugzilla.redhat.com/show_bug.cgi?id=1724660

@stbenjam
Copy link
Member

On the next rebase (currently blocked until operator-framework/operator-sdk#1512 is merged), we'll get rhocs 420.8.20190624.0 - hopefully that has a newer libpod.

@hardys
Copy link
Author

hardys commented Jun 27, 2019

On the next rebase (currently blocked until operator-framework/operator-sdk#1512 is merged), we'll get rhocs 420.8.20190624.0 - hopefully that has a newer libpod.

My understanding is that we need a manual sync which will be tracked via the bz, hopefully that will happen soon then we can rebase on a new build that contains it.

In the meantime I'm going to try rebuilding with coreos-assembler and installing the same (working) version I tested today via rpm-ostree override replace

Copy link
Contributor

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just focused on the baseURI templating in this review - I suggest splitting that out into a separate commit, and maybe even proposing it to openshift/installer upstream now

return nil, err
}
} else {
deployImage = releaseImage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to pass an empty string than confuse the meaning of deployImage - i.e. if it is set, it should be an rhcos image URL, not a release image pullspec

// so we can deploy the masters with the correct image from rhcos.json
if installConfig.Platform.Name() == bmtypes.Name {
var err error
deployImage, err = rhcos.Baremetal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like

rhcosImageBaseURI = rhcos.BaseURI()

would be more clear

)

// Baremetal fetches the BaseURI where baremetal images can be downloaded from
func Baremetal() (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing specific to bare metal about this - I'd just call it rhcos.BaseURI()

COREOS_DOWNLOADER_IMAGE=${COREOS_DOWNLOADER_IMAGE:-"quay.io/openshift-metal3/rhcos-downloader:master"}

# This image is templated in via the installer pkg/asset/ignition/bootstrap/bootstrap.go
RHCOS_IMAGE_URL="{{.DeployImage}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it RHCOS_IMAGE_BASE_URL or similar

@@ -42,6 +44,7 @@ type bootstrapTemplateData struct {
PullSecret string
ReleaseImage string
Proxy *configv1.ProxyStatus
DeployImage string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, RHCOSBaseURI or similar would be more clear

You could leave a comment here that the baremetal platform is the only user currently

var deployImage string
// baremetal requires some additional templateData to download the RHCOS image
// so we can deploy the masters with the correct image from rhcos.json
if installConfig.Platform.Name() == bmtypes.Name {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this to be bare metal specific - just make baseURI available for templating by all platforms

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @markmc this is all now working locally for me so I kicked off a CI run via openshift-metal3/dev-scripts#642 to replicate that, but in the meantime I will rework this to address your comments, and when you're happy we can break out that commit and propose upstream (unless @stbenjam plans to push a PR with this combination of commits soon, I don't want to complicate that..)

@hardys hardys force-pushed the bootstrap_ironic branch from 843c602 to e019bc1 Compare July 16, 2019 12:51
hardys pushed a commit to hardys/dev-scripts that referenced this pull request Jul 16, 2019
@hardys hardys force-pushed the bootstrap_ironic branch from e019bc1 to ce9d9f6 Compare July 17, 2019 16:04
@hardys
Copy link
Author

hardys commented Jul 17, 2019

@markmc 29bd776 should address your comments I think, please re-review when you get a moment.

@stbenjam Note the latest commit added which swaps out IronicURI for BootstrapProvisioningIP - if you're happy with that approach I'll use the data to template the IP for startironic.sh, and subsequently we can add similar interfaces to customize the DHCP pool for provisioning?

The alternative would be to pass a CIDR, but then we'll have hard-coded assumptions about which IP in the CIDR is used for each thing, I was preferring to just expose the things which need IPs separately but open to feedback on the approach.

@hardys
Copy link
Author

hardys commented Jul 17, 2019

Hmm, testing indicates I broke something when renaming as startironic.sh gets an empty string for {{.RHCOSBaseURI}}

@stbenjam
Copy link
Member

stbenjam commented Jul 17, 2019

@stbenjam Note the latest commit added which swaps out IronicURI for BootstrapProvisioningIP - if you're happy with that approach I'll use the data to template the IP for startironic.sh, and subsequently we can add similar interfaces to customize the DHCP pool for provisioning?

The alternative would be to pass a CIDR, but then we'll have hard-coded assumptions about which IP in the CIDR is used for each thing, I was preferring to just expose the things which need IPs separately but open to feedback on the approach.

I think this approach is good for now, but I'd prefer a CIDR eventually and let us pick the IP because we'll want to assign static IP's on the provisioning network to the masters for install-gather debugging. But there's probably all kinds of things to figure out there that it should probably be it's own task so we can deal with it later.

Steven Hardy added 7 commits July 18, 2019 09:35
This allows us to template in the baseURI from data/data/rhcos.json
when needed for boostrap files (the baremetal platform requires this
to download the appropriate image to provision the masters)
This adds ironic to the bootstrap VM for the baremetal IPI platform.
This allows for optional caching of the images on the host running
the bootstrap VM, which is particularly useful when developing
to save time and bandwidth, and potentially also for offline installs
in future.
This shouldn't be hard-coded, so instead we replace the URL with the
appropriate data from rhcos.json.

Related: openshift-metal3#37
When running ironic and related services on the VM we need additional
memory so increase it to 6G.
This means we can use the IP to configure both the nic on the bootstrap
VM which Ironic uses for provisioning (in a future commit) and construct
the ironic_uri
If the container exists unexpectedly for any reason, then
ExecStop not called, and sometimes there are "storage for container removed"
errors trying to re-start it, presumably because podman cleanup
removes resources for exited containers.
@hardys hardys force-pushed the bootstrap_ironic branch from ce9d9f6 to 7c7d13d Compare July 18, 2019 09:47
hardys pushed a commit to hardys/installer that referenced this pull request Jul 18, 2019
This allows us to template in the baseURI from data/data/rhcos.json
when needed for boostrap files (the baremetal platform will soon require
this to download the appropriate image to provision the masters ref
openshift-metal3/kni-installer#100
@@ -208,11 +210,25 @@ func (a *Bootstrap) getTemplateData(installConfig *types.InstallConfig, proxy *c
logrus.Debugf("Using internal constant for release image %s", releaseImage)
}

var rhcosImageBaseURI string
if ri, ok := os.LookupEnv("OPENSHIFT_INSTALL_IMAGE_BASE_URI_OVERRIDE"); ok && ri != "" {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekhiggins I think it would perhaps be better if we used this override instead of CACHEURL, because it's one less thing we'll need to hard-code (and ultimately template) in startironic.sh - what are your thoughts on instead making the https://github.com/openshift-metal3/rhcos-downloader/blob/master/get-resource.sh output similar to the original repo structure, or we can cache differently in dev-scripts, such that we just set OPENSHIFT_INSTALL_IMAGE_BASE_URI_OVERRIDE in dev-scripts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancel that, @abhinavdahiya has indicated in openshift/installer#2035 that we shouldn't add a new variable like this

@russellb
Copy link
Member

Since the kni-installer repo is no longer in use and we're now tracking this via openshift/installer#2060, I'm going to close this PR. I know you'll be posting this against openshift/installer when you're ready

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically launch Ironic within the bootstrap VM to provision masters
5 participants