add design for managing IPs on secondary networks #39
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
<!-- | ||
This work is licensed under a Creative Commons Attribution 3.0 | ||
Unported License. | ||
|
||
http://creativecommons.org/licenses/by/3.0/legalcode | ||
--> | ||
|
||
# secondary-network-ipam | ||
|
||
## Status | ||
|
||
provisional | ||
|
||
## Table of Contents | ||
|
||
<!--ts--> | ||
* [secondary-network-ipam](#secondary-network-ipam) | ||
* [Status](#status) | ||
* [Table of Contents](#table-of-contents) | ||
* [Summary](#summary) | ||
* [Motivation](#motivation) | ||
* [Goals](#goals) | ||
* [Non-Goals](#non-goals) | ||
* [Proposal](#proposal) | ||
* [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) | ||
* [Risks and Mitigations](#risks-and-mitigations) | ||
* [Design Details](#design-details) | ||
* [Work Items](#work-items) | ||
* [Dependencies](#dependencies) | ||
* [Test Plan](#test-plan) | ||
* [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||
* [Version Skew Strategy](#version-skew-strategy) | ||
* [Drawbacks [optional]](#drawbacks-optional) | ||
* [Alternatives](#alternatives) | ||
* [References](#references) | ||
|
||
<!-- Added by: dhellmann, at: Mon Jun 17 12:54:02 EDT 2019 --> | ||
|
||
<!--te--> | ||
|
||
## Summary | ||
|
||
metal3 needs to manage IP addresses on the secondary network to ensure | ||
that supporting applications such as Ceph have persistent addresses on | ||
each host. | ||
|
||
## Motivation | ||
|
||
### Goals | ||
|
||
1. Configure secondary network interfaces on all hosts in the same way. | ||
1. Support PXE booting hosts for provisioning. | ||
1. Support static IPs on all hosts on secondary networks so the metal3 | ||
components are not locked to running on the master hosts. | ||
|
||
### Non-Goals | ||
|
||
1. Integrate with external IPAM solutions. | ||
1. Describe how to manage the IP of, or access to, the web server that | ||
hosts the image(s) to be provisioned. | ||
|
||
## Proposal | ||
|
||
### Implementation Details/Notes/Constraints | ||
|
||
Ceph, and potentially other supporting services that use the secondary | ||
network in some deployments, get confused if a client IP changes. We | ||
therefore want to ensure that those IPs do not change. | ||
|
||
We use dnsmasq to manage PXE booting servers during | ||
provisioning. dnsmasq will not bind to an interface managed by | ||
dhclient, so at least some of the hosts must have statically allocated | ||
IPs on the secondary network to allow us to run dnsmasq at all. This | ||
also means it is not sufficient to manage DHCP reservations to ensure | ||
a given host always receives the same IP. | ||
|
||
When we implement host discovery, we will want to allow discovered | ||
hosts to use part of the IP range on the provisioning network that is | ||
not used for static allocations so that a user does not have to clean | ||
up those static allocations for hosts they are not using in their | ||
cluster. | ||
|
||
To meet all of these requirements, we need to configure the secondary | ||
network interfaces on each host with a static IP address. | ||
|
||
### Risks and Mitigations | ||
|
||
We need to ensure the DHCP address range and static address range do | ||
not overlap. We should be able to ensure that with careful management | ||
of the CIDRs. | ||
|
||
[inwinstack/ipam](https://github.com/inwinstack/ipam) may not be | ||
stable or reliable, and we would have to either fix it, fork it, or | ||
build a replacement. | ||
|
||
## Design Details | ||
|
||
We need to divide the subnet range for the provisioning network | ||
between a set of addresses we can use for DHCP and a set for static | ||
IPs. | ||
|
||
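As a minimal sketch of the split described above, the provisioning subnet could be halved with Python's standard `ipaddress` module. The function name `split_provisioning_subnet`, the example CIDR `172.22.0.0/24`, and the choice of an even split are assumptions for illustration only; the proposal does not say how the range should be divided.

```python
import ipaddress


def split_provisioning_subnet(cidr):
    """Split a subnet into two equal halves: one for DHCP, one for static IPs."""
    network = ipaddress.ip_network(cidr)
    # prefixlen_diff=1 yields exactly two subnets, each half the original size.
    dhcp_range, static_range = network.subnets(prefixlen_diff=1)
    return dhcp_range, static_range


# Example CIDR chosen for illustration only.
dhcp_range, static_range = split_provisioning_subnet("172.22.0.0/24")

# The risk called out earlier: the two ranges must never share an address.
assert not dhcp_range.overlaps(static_range)

print(dhcp_range, static_range)  # prints "172.22.0.0/25 172.22.0.128/25"
```

Splitting on a CIDR boundary keeps the non-overlap check trivial, at the cost of only allowing power-of-two range sizes.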
We need the installer to allocate IPs for the master nodes as it | ||
provisions them, and to record that information in the Kubernetes | ||
database so those same IPs are not used for other hosts later. | ||
|
||
We need to store the subnet CIDR and existing allocations in the | ||
Kubernetes database somewhere so new IPs can be allocated when hosts | ||
are provisioned. | ||
|
||
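To illustrate what "the subnet CIDR and existing allocations" need to support, here is a rough in-memory sketch of an allocator. The `StaticPool` class and its methods are hypothetical and are not the inwinstack/ipam API; a real implementation would persist this state as Kubernetes resources rather than in a Python object.

```python
import ipaddress


class StaticPool:
    """In-memory stand-in for a pool record: a CIDR plus its allocations."""

    def __init__(self, cidr, allocations=None):
        self.network = ipaddress.ip_network(cidr)
        # Maps host name -> allocated address (e.g. masters set by the installer).
        self.allocations = {
            host: ipaddress.ip_address(address)
            for host, address in (allocations or {}).items()
        }

    def allocate(self, host):
        """Return the host's address, reserving the first free one if needed."""
        # A host that already has a reservation keeps the same IP.
        if host in self.allocations:
            return self.allocations[host]
        used = set(self.allocations.values())
        for address in self.network.hosts():
            if address not in used:
                self.allocations[host] = address
                return address
        raise RuntimeError(f"no free addresses left in {self.network}")

    def release(self, host):
        """Drop the reservation for a host, if it has one."""
        self.allocations.pop(host, None)
```

Seeding the pool with the masters' addresses, as the installer would, keeps later allocations for workers from colliding with them.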
The [inwinstack/ipam](https://github.com/inwinstack/ipam) controller | ||
provides `Pool` and `IP` resources for allocating IPs from address | ||
ranges. We should evaluate it to see if we can use it for managing the | ||
IP allocations. | ||
|
||
The machine-api-provider-baremetal controller is responsible for | ||
making decisions about how to configure a host, so it should request | ||
IPs for secondary networks, assign them to the interfaces, and pass | ||
the relevant data using the Ignition configuration data. It will need | ||
to create host-specific Ignition configuration resources because the | ||
configuration will be different for each host. It should also set the | ||
`Machine` as an owner of the `IP` so that the reservation is deleted | ||
when the `Machine` is deleted. | ||
|
||
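The ownership relationship described above could look roughly like the following manifest builder. The `ownerReferences` entry uses the standard Kubernetes metadata fields (`apiVersion`, `kind`, `name`, `uid`); everything else is an assumption for illustration, including the `build_ip_claim` helper, the placeholder `apiVersion` of the `IP` resource, and the `poolName` spec field. None of these have been checked against the inwinstack/ipam schema.

```python
def build_ip_claim(machine, pool_name, ip_api_version="example.invalid/v1"):
    """Return a manifest dict for an IP reservation owned by the given Machine."""
    metadata = machine["metadata"]
    return {
        "apiVersion": ip_api_version,  # hypothetical; use the real IPAM group/version
        "kind": "IP",
        "metadata": {
            "name": f"{metadata['name']}-{pool_name}",
            # An owner and its dependent must live in the same namespace.
            "namespace": metadata["namespace"],
            # Garbage collection removes the IP once the owning Machine is gone.
            "ownerReferences": [
                {
                    "apiVersion": machine["apiVersion"],
                    "kind": machine["kind"],
                    "name": metadata["name"],
                    "uid": metadata["uid"],
                }
            ],
        },
        "spec": {"poolName": pool_name},  # hypothetical field name
    }
```

Relying on owner references means the controller does not need its own cleanup logic for reservations when a `Machine` is deleted.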
### Work Items | ||
|
||
1. Ensure the IP ranges for secondary networks are captured by the | ||
installer and saved to the Kubernetes database as `Pool` resources. | ||
1. Ensure the installer registers the IP allocations for masters. | ||
1. Ensure the IPAM service is deployed along with the other metal3 | ||
components. | ||
1. Update the metal3 machine controller to allocate IPs and create | ||
host-specific ignition configurations containing the IPs. | ||
1. Create an image to hold the IPAM operator. | ||
1. Add the IPAM operator to the metal3 deployment. | ||
|
||
### Dependencies | ||
|
||
* [inwinstack/ipam](https://github.com/inwinstack/ipam) | ||
|
||
### Test Plan | ||
|
||
No special requirements | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
Add IPAM operator to deployment configuration | ||
|
||
### Version Skew Strategy | ||
|
||
N/A | ||
|
||
## Drawbacks [optional] | ||
|
||
This further complicates the configuration for the metal3 components | ||
by adding yet another container/Pod/Deployment. | ||
|
||
## Alternatives | ||
|
||
We could require an external DHCP and IPAM solution for the secondary | ||
networks, as we do for the primary network. This complicates | ||
deployments: services running outside of the cluster would need to | ||
know implementation details of the cluster so that the external DHCP | ||
server can pass PXE requests to the dnsmasq instance that is part of | ||
the metal3 deployment. That instance might change hosts and IPs if | ||
the pod is restarted. | ||
|
||
Well, I guess we could do this, but I think it would require writing a new dhcp option backend for ironic and having some kind of agent on the dhcp server to update the configurations. Currently dnsmasq and ironic need to have access to the same filesystem.
Yeah, this is an alternative so the details won't matter as much because we aren't going to do it.
Ironic is working with static configuration in this case, and as long as it points back to a vip that ironic can be on, then the world is a happy place. If not, then... we'll need to be able to set configuration.
AFAIK ironic only manages the pxe configuration - the dnsmasq configuration is static and in a different container to the ironic conductor?
Right I see, I was confused. It's really just the TFTP config we are updating. So we could separate those if we wanted. |
||
We could monitor the DHCP reservations given by dnsmasq and ensure | ||
they are configured to be persistent, then also use those IPs to set | ||
static addresses on the hosts during provisioning. This would leave a | ||
reservation to be cleaned up when a host is removed, which might be | ||
tricky for a discovered host that is never actually provisioned. | ||
|
||
dhellmann marked this conversation as resolved.
|
||
We could have the dnsmasq container (or another container) manage an IP using a | ||
"lifetime" setting, as described in [this alternative | ||
proposal](https://github.com/metal3-io/metal3-docs/pull/38). That | ||
approach leaves an opportunity for two hosts to try to have the same | ||
IP if fencing doesn't work properly or if a timeout is too long. | ||
|
||
## References | ||
|
||
None |
A few of these work items are OpenShift / CoreOS specific: the "installer" and "ignition" references.
The machine controller is not involved today in creating the ignition config (or cloud-init config, or whatever user data is in use), so maybe this should go somewhere else.
I expect management of secondary interfaces should be done by something else, like https://github.com/nmstate/kubernetes-nmstate
An IPAM component allocating addresses for secondary network interfaces could then create the CRs that specify that the interface should be configured with that IP, and applying the configuration would be done by kubernetes-nmstate.
Yeah, I was trying to make sure I didn't miss anything. Should I move this doc to an internal location?
No need for it to be internal. It just depends on whether we come up with something that's more generally useful, or is an OpenShift specific integration thing. If it's just for OpenShift integration, the openshift-metal3 github org has a docs repo as well that hasn't been used much yet.
Speaking of OpenShift specific solutions and how to do network config ... I wonder if the existing MachineConfig resource is enough, provided by the machine-config-operator. We can create those resources to drop new config files on nodes, or to replace existing config files.