-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support preexisting network interfaces for AzureMachines #3575
Conversation
4109c18
to
efa6079
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #3575 +/- ##
==========================================
+ Coverage 53.17% 53.55% +0.38%
==========================================
Files 183 185 +2
Lines 18371 18587 +216
==========================================
+ Hits 9769 9955 +186
- Misses 8054 8085 +31
+ Partials 548 547 -1
☔ View full report in Codecov by Sentry. |
The failing apidiff test is only in the experimental AzureMachinePool resource. It is only a struct name change. The fields are identical to before so this isn't truly a breaking API change. The change was necessary because the bring your own NIC feature is not applicable to MachinePools. VMSS doesn't support such a thing and it doesn't really make sense anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @dthorsen! I left a few comments below. Thanks so much for working on this feature 🚀
What about VMSS in Flexible Orchestration mode? https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#basic-setup |
I was unaware of Flexible Orchestration mode. Thank you for pointing that out @CecileRobertMichon. I will have to do some reading to better understand their usage. |
As far as I can tell all comments on this PR are addressed. It should be ready for merging unless anyone has additional comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Apologies for the late follow up here @dthorsen. We'll do our best to review this and get it merged before the release!
LGTM label has been added. Git tree hash: 2fb4ef03d8e53eb473393a4f23cd8724f7babcd7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits, over looks good to me!
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@dthorsen: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@dthorsen this seems too disruptive of a breaking change for a minor release, what is the impact on existing users who do not wish to use this new feature? Edit: also this should say v1.10.0 (or v1.11.0 if it doesn't make it to 1.10) |
@@ -105,6 +111,12 @@ func (s *Service) Delete(ctx context.Context) error { | |||
// Order of precedence (highest -> lowest) is: error that is not an operationNotDoneError (i.e. error deleting) -> operationNotDoneError (i.e. deleting in progress) -> no error (i.e. deleted) | |||
var result error | |||
for _, nicSpec := range specs { | |||
if managed, err := s.isManaged(ctx, nicSpec); err == nil && !managed { | |||
log.V(4).Info("Skipping network interface deletion in custom network interface mode") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an opportunity to make logging more visible and add a link to https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/3575/files#diff-a109f59250df65bd0d5487b914d0bce7b1a07717583e9414cfe2c5e11f4fcf13R45 for users that may not have read the release notes or docs
@CecileRobertMichon Regarding how disruptive this change is, it is a little difficult to gauge impact. This is only an issue for users that have hopped from v1.5.0 directly to v1.10.0 / probably v1.11.0 now. There is also precedent for this change in the past. Example here: https://capz.sigs.k8s.io/topics/custom-dns.html?highlight=owned#manage-dns-via-capz-tool
No, capz resource ownership tagging for network interfaces was introduced by an earlier PR done by someone else, released in v1.5.0. Ultimately, the impact would simply be that some network interfaces could be left in existence after AzureMachine deletion if they were created prior to v1.5.0, but deleted in v1.11.0 or later, and that is only if they do not tag the older network interfaces prior to upgrading to v1.11.0. If this change is deemed to disruptive for a minor release, then perhaps we could put it behind a feature flag in v1.11.0? |
Discussed this during office hours, this only impacts users using BYO group for cluster delete, but impacts all users for node deletion (scale down, rolling upgrade). We should make the logging for skipping deleting v(1) and point to the doc with the instructions for tagging existing NICs. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Just had a some small things.
// If you specify an existing Azure NetworkInterface, CAPZ will attach and detach the interface from the VM when the VM is created and deleted, | ||
// but CAPZ will not delete the NetworkInterface if the VM is deleted. | ||
// +optional | ||
Name *string `json:"name,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a string pointer instead of a string? I would assume we can tell it's unspecified by an empty string, meaning it can't be a network interface name.
|
||
// ResourceGroup optionally specifies the resource group for the NetworkInterface resource. | ||
// +optional | ||
ResourceGroup *string `json:"resourceGroup,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
// MachinePoolNetworkInterface defines a VMSS network interface profile. | ||
type MachinePoolNetworkInterface struct { | ||
// SubnetName specifies the subnet in which the new network interface will be placed. | ||
SubnetName string `json:"subnetName,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could drop the omitempty
tag if this is field not optional.
nicName := azure.GenerateNICName(m.Name(), isMultiNIC, i) | ||
nicSpecs = append(nicSpecs, m.BuildNICSpec(nicName, m.AzureMachine.Spec.NetworkInterfaces[i], isPrimary)) | ||
|
||
if nic.Name != nil && *nic.Name != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the earlier comment, I think we could probably make nic.Name
a string
instead of *string
in this case.
if spec == nil { | ||
return false, errors.New("cannot get network interface to check if it is managed: spec is nil") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we drop newline for spacing consistency? And same for the other if
blocks below in this function.
|
||
The pre-existing network interface can be in the same resource group or a different resource group in the same | ||
subscription as the target cluster. When deleting the `AzureMachine`, the network interface will only be | ||
deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing | |
deleted if they are "managed" by CAPZ, i.e. they were created during `AzureMachine` deployment. Pre-existing |
deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing | ||
network interfaces will *not* be deleted. If a resource group is specified, it must already exist. CAPZ will | ||
*not* create or delete a resource group that is specific to a network interface. If the resource group is | ||
omitted, it will default to the resourceGroup of the `AzureCluster`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omitted, it will default to the resourceGroup of the `AzureCluster`. | |
omitted, it will default to the resource group of the `AzureCluster`. |
@@ -0,0 +1,57 @@ | |||
# Custom Network Interfaces for AzureMachines | |||
|
|||
## Pre-existing Network Interfaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline for consistency with other docs in the repo
## Pre-existing Network Interfaces | |
## Pre-existing Network Interfaces | |
*not* create or delete a resource group that is specific to a network interface. If the resource group is | ||
omitted, it will default to the resourceGroup of the `AzureCluster`. | ||
|
||
## Custom Network Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Custom Network Interface | |
## Custom Network Interface | |
omitted, it will default to the resourceGroup of the `AzureCluster`. | ||
|
||
## Custom Network Interface | ||
Alternatively, if you specify a network interface name and optionally a resource group, but the network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, if you specify a network interface name and optionally a resource group, but the network | |
Alternatively, if you specify a network interface name and optionally a resource group but the network |
/close Please reopen once this is ready for review |
@CecileRobertMichon: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/kind feature
What this PR does / why we need it:
This PR adds support for using preexisting network interfaces for AzureMachines. This is useful for workloads that must always re-attach the same network interface when a Node is replaced.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #3255
Special notes for your reviewer:
TODOs:
Release note: