-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor scalesets NIC config #3188
Refactor scalesets NIC config #3188
Conversation
7e26d20
to
ccac24e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's much better than before, thanks. I wonder if we should add some more tests (unit tests might suffice?) to ensure we have IP Forwarding enabled and accelerated networking set to the correct value?
PrivateIPAddressVersion: compute.IPVersionIPv4, | ||
LoadBalancerBackendAddressPools: &backendAddressPools, | ||
nicConfigs := []compute.VirtualMachineScaleSetNetworkConfiguration{} | ||
for i, n := range vmssSpec.NetworkInterfaces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm mistaken: if len(vmssSpec.NetworkInterfaces) == 0
this would not add any network configuration. Wondering how to handle this best. To keep code here simple, should we maybe add networkInterfaces
to the defaulter webhook if none are specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the defaulting in the webhook is already being done so there should never be a case where len(vmssSpec.NetworkInterfaces) == 0
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, true. Wonder why this didn't catch when we upgraded CAPZ.. But maybe there were other issues. Looks fine like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when the amp.Spec.Template.SubnetName
and amp.Spec.Template.AcceleratedNetworking
is not defined in the spec, will we be setting empty subnetName and dereferencing an empty pointer in the defaulting webhook?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and dereferencing an empty pointer
where do you see dereferencing of a pointer in the webhook?
If accelerated networking is nil we use the value of the SKU to determine if it's supported https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/3188/files#diff-a6779b0b37d0dcdbe7dcd6f00150a22e12ac067a96269038452d27ba39349a90R603
For empty subnet name it won't be empty because it's defaulted to the Cluster subnet name if empty before it reaches this part of the code on every reconcile loop:
if m.AzureMachinePool.Spec.Template.NetworkInterfaces[0].SubnetName == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and dereferencing an empty pointer
where do you see dereferencing of a pointer in the webhook?
Sorry my bad, didn't post the question right.
My other concerns are addressed too. Thanks for pinning the info!
/lgtm
for i, n := range vmssSpec.NetworkInterfaces { | ||
nicConfig := compute.VirtualMachineScaleSetNetworkConfiguration{} | ||
nicConfig.VirtualMachineScaleSetNetworkConfigurationProperties = &compute.VirtualMachineScaleSetNetworkConfigurationProperties{} | ||
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i)) | |
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i)) | |
nicConfig.EnableIPForwarding = pointer.Bool(true) |
EnableIPForwarding is necessary for the CNI to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to maintain what's there, however I'm still not convinced this should be hardcoded. For VMs it's configurable: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/networkinterfaces/spec.go#L211
Can you expand on "is necessary for the CNI to work"? Which CNI? What IP family?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the comment here, it's only needed for Calico + Ipv6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not the expert in CNIs but I believe it is needed for Calico with VXLAN (which is required when you use windows nodes and Calico CNI) - also in IPv4 mode. At least not having IP Forwarding enabled was the root cause of our issues when we ugpraded CAPZ.
Adding this flag as a property on the networkInterfaces object would be nice, however if it's not enabled it might be difficult to figure out whats wrong for a user.
ipconfigs := []compute.VirtualMachineScaleSetIPConfiguration{} | ||
for j := 0; j < n.PrivateIPConfigs; j++ { | ||
ipconfig := compute.VirtualMachineScaleSetIPConfiguration{ | ||
Name: pointer.String(fmt.Sprintf("ipConfig" + strconv.Itoa(j))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the semantics of the IPConfiguration API resource. Is it namespaced below the nic config and the name does therefore not matter or do we need to include e.g. vmssSpec.Name
in the name to avoid name collisions?
Asking because previous code used the vmss name in the IP configuration too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question and tested this, IPConfig name is scoped to the NIC so we don't need VMSS name in there to avoid collision.
I don't have to change this (it works as-is) but the current naming is a bit awkward since all the IP configs end up having the scale set name in it which is not really relevant (especially in the scenario where you have more than 1 NIC per VMSS). This is better aligned with Azure naming conventions.
Also verified this won't be a breaking change. Because we never update the network config on existing scale sets, this change will only apply to new NICs created going forward.
1f2d87a
to
da2995c
Compare
16d0dc2
to
dbdf066
Compare
dbdf066
to
f76a955
Compare
Started taking a look at this |
/retest |
/test pull-cluster-api-provider-azure-e2e just to see if by any chance this repros #3227 |
...and it just did /retest |
/retest |
Using this PR, I was able to bring up a cluster with multiple IPConfigs and with Azure CNI. So everything looks good to me so far, just had a concern which I have mentioned above.
|
LGTM label has been added. Git tree hash: bc990f4f7d821f4e1a44e957c12c0b4b9e5bda7f
|
/assign @jackfrancis @mboersma |
LGTM (can't resolve the conversations since I'm not part of kubernetes-sigs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
You had me at "removes 128 lines of code."
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mboersma The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What type of PR is this?
/kind cleanup
What this PR does / why we need it: When #2411 merged it added some duplicate/dead code in the VMSS spec for network interfaces. #3243 fixed some functional issues, this PR cleans up the code to remove dead code.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #3224
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: