Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support preexisting network interfaces for AzureMachines #3575

Closed
wants to merge 1 commit into from

Conversation

dthorsen
Copy link
Contributor

@dthorsen dthorsen commented May 18, 2023

/kind feature

What this PR does / why we need it:
This PR adds support for using preexisting network interfaces for AzureMachines. This is useful for workloads that must always re-attach the same network interface when a Node is replaced.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3255

Special notes for your reviewer:

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Support preexisting network interfaces for AzureMachines
action required: all network interfaces created by CAPZ prior to version v1.5.0 
must be manually tagged with the resource ownership tag:
sigs.k8s.io_cluster-api-provider-azure_cluster_CLUSTERNAME: owned

@k8s-ci-robot k8s-ci-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2023
@k8s-ci-robot k8s-ci-robot requested review from Jont828 and marosset May 18, 2023 19:09
@dthorsen dthorsen force-pushed the byoNIC branch 2 times, most recently from 4109c18 to efa6079 Compare May 18, 2023 21:41
@codecov-commenter
Copy link

codecov-commenter commented May 18, 2023

Codecov Report

Patch coverage: 62.50% and project coverage change: +0.38 🎉

Comparison is base (4082e44) 53.17% compared to head (89267b8) 53.55%.

❗ Current head 89267b8 differs from pull request most recent head e3cdc20. Consider uploading reports for the commit e3cdc20 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3575      +/-   ##
==========================================
+ Coverage   53.17%   53.55%   +0.38%     
==========================================
  Files         183      185       +2     
  Lines       18371    18587     +216     
==========================================
+ Hits         9769     9955     +186     
- Misses       8054     8085      +31     
+ Partials      548      547       -1     
Impacted Files Coverage Δ
api/v1beta1/types.go 60.71% <ø> (ø)
azure/types.go 16.27% <ø> (ø)
exp/api/v1beta1/azuremachinepool_types.go 20.00% <ø> (ø)
...re/services/networkinterfaces/networkinterfaces.go 65.06% <48.57%> (-12.50%) ⬇️
azure/scope/machine.go 52.38% <100.00%> (+1.27%) ⬆️
exp/api/v1beta1/azuremachinepool_default.go 84.37% <100.00%> (ø)

... and 22 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dthorsen
Copy link
Contributor Author

The failing apidiff test is only in the experimental AzureMachinePool resource. It is only a struct name change. The fields are identical to before so this isn't truly a breaking API change. The change was necessary because the bring your own NIC feature is not applicable to MachinePools. VMSS doesn't support such a thing and it doesn't really make sense anyway.

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @dthorsen! I left a few comments below. Thanks so much for working on this feature 🚀

api/v1beta1/types.go Outdated Show resolved Hide resolved
azure/scope/machine_test.go Show resolved Hide resolved
azure/services/networkinterfaces/networkinterfaces.go Outdated Show resolved Hide resolved
config/capz/manager_image_patch.yaml Outdated Show resolved Hide resolved
config/capz/manager_pull_policy.yaml Outdated Show resolved Hide resolved
@CecileRobertMichon
Copy link
Contributor

VMSS doesn't support such a thing and it doesn't really make sense anyway.

What about VMSS in Flexible Orchestration mode? https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes#basic-setup

@dthorsen
Copy link
Contributor Author

I was unaware of Flexible Orchestration mode. Thank you for pointing that out @CecileRobertMichon. I will have to do some reading to better understand their usage.

@dthorsen
Copy link
Contributor Author

As far as I can tell all comments on this PR are addressed. It should be ready for merging unless anyone has additional comments.

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Apologies for the late follow up here @dthorsen. We'll do our best to review this and get it merged before the release!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 7, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2fb4ef03d8e53eb473393a4f23cd8724f7babcd7

Copy link
Member

@nawazkh nawazkh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, over looks good to me!

azure/services/networkinterfaces/networkinterfaces.go Outdated Show resolved Hide resolved
azure/services/networkinterfaces/networkinterfaces.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2023
@k8s-ci-robot k8s-ci-robot requested a review from willie-yao July 10, 2023 14:25
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

@dthorsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff e3cdc20 link false /test pull-cluster-api-provider-azure-apidiff
pull-cluster-api-provider-azure-test e3cdc20 link true /test pull-cluster-api-provider-azure-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon
Copy link
Contributor

CecileRobertMichon commented Jul 10, 2023

action required: all network interfaces created by CAPZ prior to version v1.5.0
must be manually tagged with the resource ownership tag

@dthorsen this seems too disruptive of a breaking change for a minor release, what is the impact on existing users who do not wish to use this new feature?

Edit: also this should say v1.10.0 (or v1.11.0 if it doesn't make it to 1.10)

@@ -105,6 +111,12 @@ func (s *Service) Delete(ctx context.Context) error {
// Order of precedence (highest -> lowest) is: error that is not an operationNotDoneError (i.e. error deleting) -> operationNotDoneError (i.e. deleting in progress) -> no error (i.e. deleted)
var result error
for _, nicSpec := range specs {
if managed, err := s.isManaged(ctx, nicSpec); err == nil && !managed {
log.V(4).Info("Skipping network interface deletion in custom network interface mode")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be an opportunity to make logging more visible and add a link to https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/3575/files#diff-a109f59250df65bd0d5487b914d0bce7b1a07717583e9414cfe2c5e11f4fcf13R45 for users that may not have read the release notes or docs

@dthorsen
Copy link
Contributor Author

dthorsen commented Jul 13, 2023

action required: all network interfaces created by CAPZ prior to version v1.5.0
must be manually tagged with the resource ownership tag

@dthorsen this seems too disruptive of a breaking change for a minor release, what is the impact on existing users who do not wish to use this new feature?

Edit: also this should say v1.10.0 (or v1.11.0 if it doesn't make it to 1.10)

@CecileRobertMichon Regarding how disruptive this change is, it is a little difficult to gauge impact. This is only an issue for users that have hopped from v1.5.0 directly to v1.10.0 / probably v1.11.0 now. There is also precedent for this change in the past. Example here: https://capz.sigs.k8s.io/topics/custom-dns.html?highlight=owned#manage-dns-via-capz-tool

Private DNS when created by CAPZ can be managed by CAPZ tool itself automatically. To give the flexibility to have BYO as well as managed DNS zone, an enhancement is made that causes all the managed zones created in the CAPZ version before the enhancement changes to be treated as unmanaged. The enhancement is captured in PR 1791

To manage the private DNS via CAPZ please tag it manually from azure portal.

No, capz resource ownership tagging for network interfaces was introduced by an earlier PR done by someone else, released in v1.5.0. Ultimately, the impact would simply be that some network interfaces could be left in existence after AzureMachine deletion if they were created prior to v1.5.0, but deleted in v1.11.0 or later, and that is only if they do not tag the older network interfaces prior to upgrading to v1.11.0.

If this change is deemed to disruptive for a minor release, then perhaps we could put it behind a feature flag in v1.11.0?

@CecileRobertMichon
Copy link
Contributor

Discussed this during office hours, this only impacts users using BYO group for cluster delete, but impacts all users for node deletion (scale down, rolling upgrade). We should make the logging for skipping deleting v(1) and point to the doc with the instructions for tagging existing NICs.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 5, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@Jont828 Jont828 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just had a some small things.

// If you specify an existing Azure NetworkInterface, CAPZ will attach and detach the interface from the VM when the VM is created and deleted,
// but CAPZ will not delete the NetworkInterface if the VM is deleted.
// +optional
Name *string `json:"name,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a string pointer instead of a string? I would assume we can tell it's unspecified by an empty string, meaning it can't be a network interface name.


// ResourceGroup optionally specifies the resource group for the NetworkInterface resource.
// +optional
ResourceGroup *string `json:"resourceGroup,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

// MachinePoolNetworkInterface defines a VMSS network interface profile.
type MachinePoolNetworkInterface struct {
// SubnetName specifies the subnet in which the new network interface will be placed.
SubnetName string `json:"subnetName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could drop the omitempty tag if this is field not optional.

nicName := azure.GenerateNICName(m.Name(), isMultiNIC, i)
nicSpecs = append(nicSpecs, m.BuildNICSpec(nicName, m.AzureMachine.Spec.NetworkInterfaces[i], isPrimary))

if nic.Name != nil && *nic.Name != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the earlier comment, I think we could probably make nic.Name a string instead of *string in this case.

if spec == nil {
return false, errors.New("cannot get network interface to check if it is managed: spec is nil")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we drop newline for spacing consistency? And same for the other if blocks below in this function.


The pre-existing network interface can be in the same resource group or a different resource group in the same
subscription as the target cluster. When deleting the `AzureMachine`, the network interface will only be
deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing
deleted if they are "managed" by CAPZ, i.e. they were created during `AzureMachine` deployment. Pre-existing

deleted if they are "managed" by capz, ie. they were created during `AzureMachine` deployment. Pre-existing
network interfaces will *not* be deleted. If a resource group is specified, it must already exist. CAPZ will
*not* create or delete a resource group that is specific to a network interface. If the resource group is
omitted, it will default to the resourceGroup of the `AzureCluster`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
omitted, it will default to the resourceGroup of the `AzureCluster`.
omitted, it will default to the resource group of the `AzureCluster`.

@@ -0,0 +1,57 @@
# Custom Network Interfaces for AzureMachines

## Pre-existing Network Interfaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline for consistency with other docs in the repo

Suggested change
## Pre-existing Network Interfaces
## Pre-existing Network Interfaces

*not* create or delete a resource group that is specific to a network interface. If the resource group is
omitted, it will default to the resourceGroup of the `AzureCluster`.

## Custom Network Interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Custom Network Interface
## Custom Network Interface

omitted, it will default to the resourceGroup of the `AzureCluster`.

## Custom Network Interface
Alternatively, if you specify a network interface name and optionally a resource group, but the network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Alternatively, if you specify a network interface name and optionally a resource group, but the network
Alternatively, if you specify a network interface name and optionally a resource group but the network

@CecileRobertMichon
Copy link
Contributor

/close

Please reopen once this is ready for review

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: Closed this PR.

In response to this:

/close

Please reopen once this is ready for review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Support bring your own Azure Network Interface
8 participants