Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor scalesets NIC config #3188

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

CecileRobertMichon
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon commented Feb 17, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it: When #2411 merged it added some duplicate/dead code in the VMSS spec for network interfaces. #3243 fixed some functional issues, this PR cleans up the code to remove dead code.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3224

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 17, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2023
@CecileRobertMichon CecileRobertMichon force-pushed the ipv6-vmss branch 2 times, most recently from 7e26d20 to ccac24e Compare February 18, 2023 00:03
Copy link
Contributor

@mweibel mweibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's much better than before, thanks. I wonder if we should add some more tests (unit tests might suffice?) to ensure we have IP Forwarding enabled and accelerated networking set to the correct value?

PrivateIPAddressVersion: compute.IPVersionIPv4,
LoadBalancerBackendAddressPools: &backendAddressPools,
nicConfigs := []compute.VirtualMachineScaleSetNetworkConfiguration{}
for i, n := range vmssSpec.NetworkInterfaces {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken: if len(vmssSpec.NetworkInterfaces) == 0 this would not add any network configuration. Wondering how to handle this best. To keep code here simple, should we maybe add networkInterfaces to the defaulter webhook if none are specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the defaulting in the webhook is already being done so there should never be a case where len(vmssSpec.NetworkInterfaces) == 0:

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/exp/api/v1beta1/azuremachinepool_default.go#L136

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, true. Wonder why this didn't catch when we upgraded CAPZ.. But maybe there were other issues. Looks fine like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when the amp.Spec.Template.SubnetName and amp.Spec.Template.AcceleratedNetworking is not defined in the spec, will we be setting empty subnetName and dereferencing an empty pointer in the defaulting webhook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and dereferencing an empty pointer

where do you see dereferencing of a pointer in the webhook?

If accelerated networking is nil we use the value of the SKU to determine if it's supported https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/3188/files#diff-a6779b0b37d0dcdbe7dcd6f00150a22e12ac067a96269038452d27ba39349a90R603

For empty subnet name it won't be empty because it's defaulted to the Cluster subnet name if empty before it reaches this part of the code on every reconcile loop:

if m.AzureMachinePool.Spec.Template.NetworkInterfaces[0].SubnetName == "" {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and dereferencing an empty pointer

where do you see dereferencing of a pointer in the webhook?

Sorry my bad, didn't post the question right.

My other concerns are addressed too. Thanks for pinning the info!

/lgtm

azure/services/scalesets/scalesets.go Outdated Show resolved Hide resolved
for i, n := range vmssSpec.NetworkInterfaces {
nicConfig := compute.VirtualMachineScaleSetNetworkConfiguration{}
nicConfig.VirtualMachineScaleSetNetworkConfigurationProperties = &compute.VirtualMachineScaleSetNetworkConfigurationProperties{}
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i))
nicConfig.Name = pointer.String(vmssSpec.Name + "-nic-" + strconv.Itoa(i))
nicConfig.EnableIPForwarding = pointer.Bool(true)

EnableIPForwarding is necessary for the CNI to work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to maintain what's there, however I'm still not convinced this should be hardcoded. For VMs it's configurable: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/networkinterfaces/spec.go#L211

Can you expand on "is necessary for the CNI to work"? Which CNI? What IP family?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the comment here, it's only needed for Calico + Ipv6

Copy link
Contributor

@mweibel mweibel Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the expert in CNIs but I believe it is needed for Calico with VXLAN (which is required when you use windows nodes and Calico CNI) - also in IPv4 mode. At least not having IP Forwarding enabled was the root cause of our issues when we ugpraded CAPZ.

Adding this flag as a property on the networkInterfaces object would be nice, however if it's not enabled it might be difficult to figure out whats wrong for a user.

ipconfigs := []compute.VirtualMachineScaleSetIPConfiguration{}
for j := 0; j < n.PrivateIPConfigs; j++ {
ipconfig := compute.VirtualMachineScaleSetIPConfiguration{
Name: pointer.String(fmt.Sprintf("ipConfig" + strconv.Itoa(j))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the semantics of the IPConfiguration API resource. Is it namespaced below the nic config and the name does therefore not matter or do we need to include e.g. vmssSpec.Name in the name to avoid name collisions?

Asking because previous code used the vmss name in the IP configuration too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same question and tested this, IPConfig name is scoped to the NIC so we don't need VMSS name in there to avoid collision.

I don't have to change this (it works as-is) but the current naming is a bit awkward since all the IP configs end up having the scale set name in it which is not really relevant (especially in the scenario where you have more than 1 NIC per VMSS). This is better aligned with Azure naming conventions.

Also verified this won't be a breaking change. Because we never update the network config on existing scale sets, this change will only apply to new NICs created going forward.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2023
@CecileRobertMichon CecileRobertMichon changed the title [WIP] Enable ipv6 for vmss Refactor scalesets NIC config Mar 20, 2023
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 20, 2023
@CecileRobertMichon
Copy link
Contributor Author

@mweibel thanks for the review, I removed the ipv6 change from this PR since it's not quite working yet (it's blocked on #3221), so this is ready for review now.

@nawazkh
Copy link
Member

nawazkh commented Mar 20, 2023

Started taking a look at this
/assign

@CecileRobertMichon
Copy link
Contributor Author

/retest

@CecileRobertMichon
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e

just to see if by any chance this repros #3227

@CecileRobertMichon
Copy link
Contributor Author

just to see if by any chance this repros #3227

...and it just did

/retest

@CecileRobertMichon
Copy link
Contributor Author

/retest

@nawazkh
Copy link
Member

nawazkh commented Mar 21, 2023

Using this PR, I was able to bring up a cluster with multiple IPConfigs and with Azure CNI. So everything looks good to me so far, just had a concern which I have mentioned above.

❯ clusterctl describe cluster azure-cni-v1-30372
No default config file available
NAME                                                                   READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/azure-cni-v1-30372                                             True                     26m
├─ClusterInfrastructure - AzureCluster/azure-cni-v1-30372              True                     29m
├─ControlPlane - KubeadmControlPlane/azure-cni-v1-30372-control-plane  True                     26m
│ └─Machine/azure-cni-v1-30372-control-plane-zpxs8                     True                     26m
└─Workers
  └─MachineDeployment/azure-cni-v1-30372-md-0                          True                     24m
    └─2 Machines...                                                    True                     24m    See azure-cni-v1-30372-md-0-7459f94f6b-7kmg8, azure-cni-v1-30372-md-0-7459f94f6b-c4rxg

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: bc990f4f7d821f4e1a44e957c12c0b4b9e5bda7f

@CecileRobertMichon
Copy link
Contributor Author

/assign @jackfrancis @mboersma

@mweibel
Copy link
Contributor

mweibel commented Mar 22, 2023

LGTM

(can't resolve the conversations since I'm not part of kubernetes-sigs)

Copy link
Contributor

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

You had me at "removes 128 lines of code."

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2023
@k8s-ci-robot k8s-ci-robot merged commit 4622873 into kubernetes-sigs:main Mar 23, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Mar 23, 2023
@nawazkh
Copy link
Member

nawazkh commented Mar 23, 2023

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MachinePools/VMSS NetworkInterfaces wrong configuration
6 participants