
Use a list for Azure services in reconcilers #2146

Merged
merged 2 commits into from
Mar 10, 2022

Conversation

Contributor

@CecileRobertMichon commented Mar 4, 2022

What type of PR is this?
/kind cleanup

What this PR does / why we need it: Since we are reconciling (and deleting, in reverse order) Azure services one by one in the reconcilers, this is a small optimization: move the services into a list and range through it, instead of making an explicit Reconcile/Delete call for each service. This is possible now that we've refactored all services to implement azure.Reconciler. Also adds unit tests.
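A minimal sketch of the pattern this describes (the type, field names, and error wrapping below are illustrative assumptions, not the exact code from this PR):

// The cluster reconciler keeps its Azure services in an ordered slice and
// ranges over it instead of calling each service field explicitly.
type azureClusterService struct {
	scope    *scope.ClusterScope // assumed scope type
	services []azure.Reconciler
}

func (s *azureClusterService) Reconcile(ctx context.Context) error {
	for _, service := range s.services {
		if err := service.Reconcile(ctx); err != nil {
			return errors.Wrap(err, "failed to reconcile cluster services")
		}
	}
	return nil
}

// Delete walks the same list in reverse so dependent resources are removed
// before the resources they depend on.
func (s *azureClusterService) Delete(ctx context.Context) error {
	for i := len(s.services) - 1; i >= 0; i-- {
		if err := s.services[i].Delete(ctx); err != nil {
			return errors.Wrap(err, "failed to delete cluster services")
		}
	}
	return nil
}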

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Use a list for Azure services in AzureCluster and AzureMachine reconcilers

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/provider/azure Issues or PRs related to azure provider labels Mar 4, 2022
@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Mar 4, 2022
tagsSvc: tags.New(scope),
scope: scope,
services: []azure.Reconciler{
groups.New(scope),
Contributor Author

this is the same order as the Reconcile() calls below

inboundnatrules.New(machineScope),
networkinterfaces.New(machineScope, cache),
availabilitysets.New(machineScope, cache),
disks.New(machineScope),
Contributor Author

this is the same order as Reconcile() below, except for disks, which were not present in Reconcile before (disk reconcile is a no-op) but need to be deleted after the VM (hence they need to come before VMs in the list)

Contributor

What was our behavior before this change? Would the disk get deleted anyway?

Contributor Author

There was a disk delete call (https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2146/files#diff-5a6d880b9a0be00329894fc2f52f15827dd6bdfe6b60aaec660728661023f41fL142) but no disk reconcile call. Disk reconcile is a no-op, so adding it does not change behavior.

@CecileRobertMichon
Contributor Author

/retest

@CecileRobertMichon
Contributor Author

/assign @Jont828 @devigned

Contributor

@devigned left a comment

I love this changeset! This improves the readability of the reconciler so much.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2022
Contributor

@mboersma left a comment

/lgtm

Contributor

@jackfrancis left a comment

/lgtm

/assign @shysank

@@ -152,46 +102,14 @@ func (s *azureClusterService) Delete(ctx context.Context) error {
	ctx, _, done := tele.StartSpanWithLogger(ctx, "controllers.azureClusterService.Delete")
	defer done()

-	if err := s.groupsSvc.Delete(ctx); err != nil {
+	if err := s.services[0].Delete(ctx); err != nil {
Contributor

Could we set a variable like groupSvc before placing it in the list so we call that explicitly here?

Contributor

Or can we make service registration a little bit more elaborate instead of just adding to an array? Something like

type ReconcilerService struct {
	Name string
	azure.Reconciler
}

func Register(name string, r azure.Reconciler) {
}

This would help us easily add new APIs to control the reconciliation flow. For example, for the case above, we can have a GetService(name) which iterates over the services list and returns the one with the matching name.
Or add a Skip option to the ReconcilerService if we want to skip reconciliation, etc. wdyt?
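A rough sketch of that lookup, assuming the reconciler stored its services as a []ReconcilerService per the struct above (GetService and the services field are illustrative names, not existing CAPZ APIs):

// GetService returns the registered service with the matching name,
// or nil if nothing was registered under that name.
func (s *azureClusterService) GetService(name string) azure.Reconciler {
	for _, svc := range s.services {
		if svc.Name == name {
			return svc.Reconciler
		}
	}
	return nil
}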

Contributor Author

Even if we assign it to a variable, it wouldn't be accessible from here unless we add it as a separate field on the azureClusterService struct, since we only have access to the s receiver in this func. We could assign a new variable here to make it explicit that it's what we expect, but the variable would only be used once and it would still make the same assumption about the order, so I'm not sure that's an improvement. Like this:

groupSvc := s.services[0]
if err := groupSvc.Delete(ctx); err != nil {

Contributor Author

@shysank I like the idea of registering the services. How would you set order and iterate over all the services without a list?

Contributor

Could we make a pretty simple struct with an array and a map from a name to its element in the array? Something like

type ServiceWrapper struct {
    Services   []azure.Reconciler
    ServiceMap map[string]azure.Reconciler
}

// Add registers a service, preserving insertion order in the slice and
// indexing it by name in the map.
func (s *ServiceWrapper) Add(name string, service azure.Reconciler) {
    if s.ServiceMap == nil {
        s.ServiceMap = map[string]azure.Reconciler{}
    }
    s.Services = append(s.Services, service)
    s.ServiceMap[name] = service
}

func (s *ServiceWrapper) Get(name string) azure.Reconciler {
    return s.ServiceMap[name]
}
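Illustrative usage under that sketch (the wrapper itself is hypothetical; groups.New(scope) mirrors the service constructor shown elsewhere in this diff):

// Hypothetical construction of the service collection:
wrapper := &ServiceWrapper{}
wrapper.Add("groups", groups.New(scope))
// ...Add the remaining services in their reconcile order.

// The slice keeps registration order for the Reconcile/Delete loops,
// while the map allows direct lookups by name:
groupsSvc := wrapper.Get("groups")
_ = groupsSvc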

Contributor

@shysank Mar 7, 2022

How would you set order and iterate over all the services without a list?

There will still be some collection backing the services, just that it'd be something more than just []azure.Reconciler, something like []ReconcilerService (or the ServiceMap in @Jont828's example). If I want to add a new service, I'll just use the Register or Add API without having to care about the underlying collection. For ordering, we could start with the order in which the services were registered (and add priorities later if needed). We can extend this further by tagging services, from which we can extract groups. For example, there can be a group of services independent of each other, and services in that group can be reconciled concurrently while the group itself is reconciled serially (shameless plug of a PoC I did earlier :) ).

Anyway, I'm fine with merging this PR with the current design since there have been so many reviews already, and we can discuss this in a separate issue.

Contributor Author

Let me try something out. Concurrency is out of scope and we should discuss it separately, but I think I can at least get a service name added so we can log the service name when there's an error, per @Jont828's point below.
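A minimal sketch of that direction, assuming a Name() method on the service interface (the wrapping style shown here is illustrative):

for _, service := range s.services {
	if err := service.Reconcile(ctx); err != nil {
		return errors.Wrapf(err, "failed to reconcile AzureCluster service %s", service.Name())
	}
}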

controllers/azurecluster_reconciler.go (review thread: outdated, resolved)
controllers/azurecluster_reconciler_test.go (review thread: resolved)
controllers/azurecluster_reconciler_test.go (review thread: outdated, resolved)

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 9, 2022
@CecileRobertMichon
Contributor Author

Okay @Jont828 @shysank, check out the latest commits. It is a significantly larger refactor, but I think it addresses your concerns. I added a new interface called ServiceReconciler, which is a Reconciler that also has Name() and IsManaged(). The Name() func allows us to retrieve the service name for better error logging and for identifying the groups service, which avoids assuming ordering. After I did this I realized we were still assuming ordering by skipping the group delete in the AzureCluster reconciler (the loop was going from len-1 to 1 instead of to 0). So I added another commit on top that adds IsManaged, which allows us to check whether the group is managed by CAPZ directly from the reconciler. Feel free to tell me this is scope creep; I kept it unsquashed so we can extract the commit if needed.

I also included AzureMachinePool and AzureManagedControlPlane reconcilers in the changes since they use the same pattern.

I think this gets us closer to what we want and we can discuss adding "groups" of services (or priorities, or a dependency graph) if we want to move towards concurrency as a follow-up. The IsManaged interface is also something I had wanted to do previously in #1684 to allow us to do caching of the vnet tags and standardize the "is resource managed" story.
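Roughly, the interface described above could look like the following (a sketch; the exact IsManaged signature is an assumption based on this description and on the IsManaged implementations shown further down):

// ServiceReconciler is an Azure service that can be reconciled and deleted,
// identified by name, and queried for whether its resource is managed by CAPZ.
type ServiceReconciler interface {
	Name() string
	IsManaged(ctx context.Context) (bool, error)
	Reconciler
}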

type Reconciler interface {
	Reconcile(ctx context.Context) error
	Delete(ctx context.Context) error
}

// CredentialGetter is a Service which knows how to retrieve credentials for an Azure
// resource in a resource group.
type CredentialGetter interface {
Contributor Author

this wasn't used anymore

@@ -219,3 +226,9 @@ func (s *Service) isVnetLinkManaged(ctx context.Context, resourceGroupName, zone
tags := converters.MapToTags(zone.Tags)
return tags.HasOwned(s.Scope.ClusterName()), nil
}

// IsManaged always returns true.
// TODO: separate private DNS and VNet links so we can implement the IsManaged method for each.
Contributor Author

this one is a TODO for after #1715 is merged

@@ -149,3 +156,8 @@ func (s *Service) isIPManaged(ctx context.Context, ipName string) (bool, error)
tags := converters.MapToTags(ip.Tags)
return tags.HasOwned(s.Scope.ClusterName()), nil
}

// IsManaged always returns true as public IPs are managed on a one-by-one basis.
Contributor Author

This one is a little awkward. Each public IP can be either managed or unmanaged depending on its own tags, and there is no overall "this resource type is managed by CAPZ" notion, which makes it different from how the other services behave. Since we're not using this func anywhere (it's there to satisfy the interface), I opted to always return true, since that's the most cautious outcome: we assume public IPs are managed, but then we check each one individually (https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2146/files#diff-fe943e74189aa531d871d7b774c371755043f5bd1b7e8ab14ad3d2b5fc023301R148). Open to suggestions if anyone has a better idea.
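In code, that amounts to roughly the following (a sketch; the receiver and signature mirror the publicips snippets above, and the per-IP ownership check stays in Delete via isIPManaged):

// IsManaged always returns true: there is no service-level notion of
// "managed by CAPZ" for public IPs, so ownership is checked per IP
// (via its tags) during Delete instead.
func (s *Service) IsManaged(ctx context.Context) (bool, error) {
	return true, nil
}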

@CecileRobertMichon CecileRobertMichon changed the title Use a list for Azure services in AzureCluster and AzureMachine reconcilers Use a list for Azure services in reconcilers Mar 9, 2022
@CecileRobertMichon
Copy link
Contributor Author

looks like e2e tests are failing on delete... looking

@shysank
Contributor

shysank commented Mar 9, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2022
@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 9, 2022

should I squash or keep the commits separate?

@shysank
Contributor

shysank commented Mar 10, 2022

should I squash or keep the commits separate?

I'm fine with keeping the commits as is.

@Jont828
Contributor

Jont828 commented Mar 10, 2022

I like the interface changes!
/lgtm

@shysank
Contributor

shysank commented Mar 10, 2022

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shysank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2022
@k8s-ci-robot
Contributor

@CecileRobertMichon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff 2defff0 link false /test pull-cluster-api-provider-azure-apidiff
pull-cluster-api-provider-azure-e2e-windows-dockershim 2defff0 link unknown /test pull-cluster-api-provider-azure-e2e-windows-dockershim

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 348ef60 into kubernetes-sigs:main Mar 10, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Mar 10, 2022
@CecileRobertMichon CecileRobertMichon deleted the service-list branch February 17, 2023 23:26