-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for configurable Network Interfaces #2411
Conversation
Hi @brianlieberman. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Haven't looked at the code yet, but thanks for opening this! One question about the UX:
If a user needs to pre-allocate say 250 IPs to work with Azure CNI in a 250 max pods per node configuration, how would this scale? It might make sense to allow the user to specify an IP count instead? |
Yeah ,as currently implemented it doesn't scale to that level well from a UX perspective, there's a tradeoff since currently you can specify a configuration for each IPConfig (publicIP, privateIP, allocatePublicIP). In the example we leave them empty to just provision a default of a dynamic privateIP, but that's the tradeoff we might make in order to make something like an int we can iterate instead. Open to implementation suggestions for improvement. |
I like the idea of simply exposing a number of ipConfigs too. Definitely makes for a nicer UX. Maybe just have a couple integers? |
api/v1beta1/azuremachine_webhook.go
Outdated
@@ -136,6 +136,20 @@ func (m *AzureMachine) ValidateUpdate(oldRaw runtime.Object) error { | |||
) | |||
} | |||
|
|||
if !reflect.DeepEqual(m.Spec.SubnetName, old.Spec.SubnetName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this new immutability requirement for SubnetName
closely related to this change? Should it be a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The immutability isn't strictly changed with these changes, however it appeared to be an oversight I believe while looking at managing and validating the new interface level field?
azure/scope/machine.go
Outdated
spec.SubnetName = n.SubnetName | ||
|
||
// Check for control plane interface setup on interface 0 | ||
if m.Role() == infrav1.ControlPlane && i == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a guarantee that the first NetworkInterface
in the m.AzureMachine.Spec.NetworkInterfaces
slice will always be eth0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this as well. However, in my testing so far, I have been unable to cause the Linux interface order within the default CAPZ Ubuntu image to be anything other than matching the interface order in the Azure APIs. I have not had any luck so far on identifying the mechanism that is ensuring this though. I believe the ordering is consistent due to the behavior of the kernel module for the virtual network interface card, but I haven't specifically found the logic to say that definitively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other more reliable indication (e.g., n.Name == "eth0"
) that could be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a feature of the kernel driver, cloud-init, or netplan that allows us to take advantage of that convention? Do you happen to have a link to some docs or code that can lend clarity there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have a guarantee that the first network interface in the list from the user-specified yaml is the first interface in the Azure API. We are taking advantage of the inherent stability of Go slice element order to ensure that part doesn't get mixed up. The only part I can't say for certain is that the first interface in the slice will always be eth0 in the Linux OS. I do believe it to be the case, and it is the most reliable mechanism I have found so far. I have been unable to trigger (even after trying many times) eth0 in the Linux OS being anything other than the first NetworkInterface in the slice. I think this is likely the best way to signal to the OS the desired linux interface order, since the order in the API would then map to the ethX index in the OS 1 to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would definitely prefer a more reliable indication. Maybe we could make a helper function in the scope for getting eth0? That way it's less error prone and more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you are proposing? Utilizing the interface index appears to be the only current reliable way of ensuring your intended interface order in the OS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh re-reading your comment I think I understand. The intent is to improve the code readability. Makes sense to me to have a helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if we are able to guarantee ordinality, that the first interface reported from the linux API is going to be eth0. So a convenience func + UT that ensures that ordinality should be sufficient.
Thanks!
Reworked the IPConfig functionality to use integer quantities and updated the example in the PR description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this PR, something that touches this much code definitely takes a while to put together. Here are some suggestions on some tweaks to make.
api/v1beta1/azuremachine_types.go
Outdated
@@ -114,6 +114,8 @@ type AzureMachineSpec struct { | |||
// SubnetName selects the Subnet where the VM will be placed | |||
// +optional | |||
SubnetName string `json:"subnetName,omitempty"` | |||
|
|||
NetworkInterfaces []AzureNetworkInterface `json:"networkInterfaces,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an optional field? Just to clarify, if apply an existing template w/o the NetworkInterfaces
field do we want it to give an error or just create an empty slice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we're going to want to add a header comment for this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to making this optional
Can you also please add a clarifying comment about what happens with Network Interfaces are not specified?
api/v1beta1/types.go
Outdated
// Network Interfaces to attach to each VM | ||
// +optional | ||
type AzureNetworkInterface struct { | ||
// The subnet to place the interface in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The subnet to place the interface in | |
// The subnet to place the interface in. |
Could we end the comments with periods? Not sure if the linters will complain but it's more consistent in any case.
api/v1beta1/types.go
Outdated
|
||
// IP Configuration defines options to confiure a network interface. | ||
type AzureIPConfig struct { | ||
PrivateIP string `json:"privateIP,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add comments for these fields as well?
api/v1beta1/types.go
Outdated
ID string `json:"id,omitempty"` | ||
} | ||
|
||
// IP Configuration defines options to confiure a network interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// IP Configuration defines options to confiure a network interface. | |
// AzureIPConfig defines options to configure a network interface. |
api/v1beta1/types.go
Outdated
@@ -579,6 +579,37 @@ type SubnetSpec struct { | |||
SubnetClassSpec `json:",inline"` | |||
} | |||
|
|||
// Network Interfaces to attach to each VM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Network Interfaces to attach to each VM | |
// AzureNetworkInterface defineds a network interface to attach to each VM. |
Typically we want to start the documentation comment with the name of the type/function. I believe the linters will complain about it if it doesn't match.
Subnet: &compute.APIEntityReference{ | ||
ID: to.StringPtr("/subscriptions/123/resourceGroups/my-rg/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/subnet2"), | ||
}, | ||
//LoadBalancerBackendAddressPools: &[]compute.SubResource{{ID: to.StringPtr("/subscriptions/123/resourceGroups/my-rg/providers/Microsoft.Network/loadBalancers/capz-lb/backendAddressPools/backendPool")}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
@@ -17,6 +17,7 @@ limitations under the License. | |||
package v1alpha4 | |||
|
|||
import ( | |||
machineryConversion "k8s.io/apimachinery/pkg/conversion" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
machineryConversion "k8s.io/apimachinery/pkg/conversion" | |
apiMachineryConversion "k8s.io/apimachinery/pkg/conversion" |
Nit: for consistency with import name in other conversions.
@@ -87,6 +87,10 @@ type ( | |||
// SubnetName selects the Subnet where the VMSS will be placed | |||
// +optional | |||
SubnetName string `json:"subnetName,omitempty"` | |||
|
|||
// Network Interfaces to attach to the to a virtual machine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Network Interfaces to attach to the to a virtual machine | |
// NetworkInterfaces specifies the network interfaces to attach to the to a virtual machine. |
/ok-to-test |
api/v1beta1/types.go
Outdated
@@ -579,6 +579,42 @@ type SubnetSpec struct { | |||
SubnetClassSpec `json:",inline"` | |||
} | |||
|
|||
// AzureNetworkInterface defineds a network interface. | |||
// +optional | |||
type AzureNetworkInterface struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just call this NetworkInterface
. It's assumed that everything in here is in the context of Azure as this is the Azure provider and the types are used in AzureMachines
and AzureClusters
.
api/v1beta1/types.go
Outdated
@@ -579,6 +579,42 @@ type SubnetSpec struct { | |||
SubnetClassSpec `json:",inline"` | |||
} | |||
|
|||
// AzureNetworkInterface defineds a network interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// AzureNetworkInterface defineds a network interface. | |
// AzureNetworkInterface defines a network interface. |
api/v1beta1/types.go
Outdated
} | ||
|
||
// AzureIPConfig defines options to confiure a network interface. | ||
type AzureIPConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, preferrence forIPConfig
for consistency
api/v1beta1/types.go
Outdated
ID string `json:"id,omitempty"` | ||
} | ||
|
||
// AzureIPConfig defines options to confiure a network interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// AzureIPConfig defines options to confiure a network interface. | |
// AzureIPConfig defines options to configure a network interface. |
api/v1beta1/azuremachine_types.go
Outdated
@@ -114,6 +114,8 @@ type AzureMachineSpec struct { | |||
// SubnetName selects the Subnet where the VM will be placed | |||
// +optional | |||
SubnetName string `json:"subnetName,omitempty"` | |||
|
|||
NetworkInterfaces []AzureNetworkInterface `json:"networkInterfaces,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to making this optional
Can you also please add a clarifying comment about what happens with Network Interfaces are not specified?
api/v1beta1/types.go
Outdated
// The subnet to place the interface in. | ||
SubnetName string `json:"subnetName,omitempty"` | ||
|
||
// Number of private IP address to attach to the interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Number of private IP address to attach to the interface. | |
// Number of private IP addresses to attach to the interface. |
api/v1beta1/types.go
Outdated
// +optional | ||
PrivateIPConfigs int `json:"privateIPConfigs,omitempty"` | ||
|
||
// Number of public IP addresses to attach to the interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: comment name should always start with field name (not sure if linter will catch this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing applies to other type comments in the file
api/v1beta1/types.go
Outdated
} | ||
|
||
// AzureIPConfig defines options to confiure a network interface. | ||
type AzureIPConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this type being used? I can't find a reference to it
azure/scope/machine.go
Outdated
@@ -660,7 +716,7 @@ func (m *MachineScope) SetSubnetName() error { | |||
subnetName = subnet.Name | |||
} | |||
} | |||
if subnetCount == 0 || subnetCount > 1 || subnetName == "" { | |||
if (subnetCount == 0 || subnetCount > 1 || subnetName == "") && len(m.AzureMachine.Spec.NetworkInterfaces) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(m.AzureMachine.Spec.NetworkInterfaces) != 0
, this function would cause the AzureMachine subnet name to get set, which is not a valid per webhook validation as subnet name cannot be set if network interfaces are defined. Am I missing something?
(feel free to ignore this comment if you implement my above comment about deprecating AzureMachine.Spec.SubnetName)
azure/scope/machine.go
Outdated
AcceleratedNetworking: m.AzureMachine.Spec.AcceleratedNetworking, | ||
IPv6Enabled: m.IsIPv6Enabled(), | ||
EnableIPForwarding: m.AzureMachine.Spec.EnableIPForwarding, | ||
SubnetName: m.Subnet().Name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this is preferable, we're looping over all the subnets to find the one whose Name
matches the AzureMachine SubnetName
so it's equivalent but more costly than just using m.AzureMachine.Spec.SubnetName
azure/scope/machine.go
Outdated
continue | ||
} | ||
if i == 0 { | ||
spec = m.DefaultNICSpec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear to me why we're using the defaults for index 0 but not for the others. What's significant about the order the network interfaces are specified in? What's different between the default spec and the fields we're setting below in the else
statement?
azure/scope/machine.go
Outdated
for i, n := range m.AzureMachine.Spec.NetworkInterfaces { | ||
var spec *networkinterfaces.NICSpec | ||
if n.ID != "" { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this deviates significantly from the pattern we've established for other BYO existing resource in CAPZ. The way it works for other resources is that the reconciler will add a tag on each resource it creates to allow it to identify the resources it created vs. the ones the user pre-created. See the virtual networks as an example. There are pros and cons for both ways of doing it, however because this is creating an inconsistent user experience I would recommend putting BYO network interface as out of scope for this PR and tackling it in its own issue/PR so we can iterate on the design.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending few minor comments
/assign @nawazkh for another round of review
Docs are missing but will work on docs + e2e tests with @nawazkh as part of #467
@dthorsen thanks so much for doing the refactor and staying with us. Let's get this merged before January 10th so it can be included in v1.7.0. Could you please squash commits?
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jotted down my thoughts on another pass. Please take a look!
@@ -128,5 +132,6 @@ func (r *AzureMachineTemplate) Default(ctx context.Context, obj runtime.Object) | |||
} | |||
t.Spec.Template.Spec.SetDefaultCachingType() | |||
t.Spec.Template.Spec.SetDataDisksDefaults() | |||
t.Spec.Template.Spec.SetNetworkInterfacesDefaults() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we miss updating the validateUpdate function for the new NetworkInterface updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
azuremachinetemplate is immutable. And the new defaulting behavior should be permitted already through the immutability by the logic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pinning that logic!
DataDisks: []DataDisk{}, | ||
SSHPublicKey: "fake ssh key", | ||
SubnetName: "subnet1", | ||
AcceleratedNetworking: to.BoolPtr(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a test with AcceleratedNetworking: to.BoolPtr(true)
and SubnetName is ''
and NetworkInterface is not nil
?
PrivateIP string | ||
PublicIP bool | ||
PublicIPAddress string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead convert IPConfig struct as below? We will have one less field to manage and a nil pointer will intrinsicly imply weather it is assigned or not?
PrivateIP string | |
PublicIP bool | |
PublicIPAddress string | |
PrivateIP *string | |
PublicIPAddress *string |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@nawazkh I believe I addressed all of your comments. Let me know if you have any other comments before I squash and push commits. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mega job @dthorsen ! Thank you for patiently working on this and addressing all the nits 🙌🏼
/lgtm
Co-authored-by: Dane Thorsen <[email protected]>
Persisting lgtm |
/retest |
@brianlieberman: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
both of these test failures are known flakes getting worked on independently :( /retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: CecileRobertMichon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/kind feature
What this PR does / why we need it:
This PR adds the ability to configure one or more NetworkInterfaces for AzureMachines and AzureMachinePools, as well as pre-warming multiple IPConfigs on each interface.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #2327
TODOs:
Release note: