Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make private dns reconcile/delete async #2007

Merged
merged 1 commit into from
Apr 4, 2022

Conversation

shysank
Copy link
Contributor

@shysank shysank commented Jan 26, 2022

What type of PR is this?
/kind cleanup
/kind bug

What this PR does / why we need it:
This pr refactors privatedns reconciliation to use async pattern set in #1541. A few things to note:

  • Private dns has 3 resources that needs to be reconciled: Zone, VirtualNetworkLink, and Records. This is done by treating each resource as a separate spec, and calling createResource/deleteResource independently for each of them.
  • Although there are 3 resources, only one condition: PrivateDnsReadyCondition is updated to indicate status of reconciliation.
  • Delete is NOT implemented for Records because it was not implemented in the current version. Implementing this is not trivial because deleting record requires a special parameter called RecordType apart from the usual suspects (resourceName, resourceGroupName, ownerResourceName). For create, I did a work around by reverse engineering the record type from RecordSetProperties set in parameters.
  • createOrUpdateAsync for Records is Synchronous operation. This is because azure api for records set does not have support for futures.
  • I have split client.go into 3 files for each of the sub resources mentioned above as it became too large and difficult to navigate.
  • This pr also fixes a bug about managing zones and links separately.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #1715 #1983 #1872

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Make private dns reconcile/delete async

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 26, 2022
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 26, 2022
@shysank shysank force-pushed the async_privatedns branch 2 times, most recently from 03a1542 to 8dc9b37 Compare January 26, 2022 03:50
Comment on lines 48 to 49
zoneClient async.Creator
vnetLinkClient async.Creator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this because deleteZone and deleteLink requires to do a GET to check if the resource is managed or not. It seems a bit weird to use Creator to do a GET, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider refactoring Creator

// Creator is a client that can create or update a resource asynchronously.
type Creator interface {
	FutureHandler
	CreateOrUpdateAsync(ctx context.Context, spec azure.ResourceSpecGetter, parameters interface{}) (result interface{}, future azureautorest.FutureAPI, err error)
        Getter
}

type Getter interface {
 Get(ctx context.Context, spec azure.ResourceSpecGetter) (result interface{}, err error)
}

so that we don't have to duplicate this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because, Get in this case is not part of async, that you created a new Getter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider refactoring Creator

I hadn't, but that's a great idea

I was mostly trying to refactor the Client interface to make it more specific since we only need Get now and everything else is being handled by async.Reconciler

type Client interface {
	Get(context.Context, string, string) (network.VirtualNetwork, error)
	CreateOrUpdate(context.Context, string, string, network.VirtualNetwork) error
	Delete(context.Context, string, string) error
	CheckIPAddressAvailability(context.Context, string, string, string) (network.IPAddressAvailabilityResult, error)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to follow virtual networks pattern because it seems more appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented your suggestion in f29b0ff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah we swapped ideas 🙂 I'll rebase it once that pr is merged.

Comment on lines -61 to -80
isManaged, err := s.isPrivateDNSManaged(ctx, s.Scope.ResourceGroup(), zoneSpec.ZoneName)
if err != nil && !azure.ResourceNotFound(err) {
return errors.Wrapf(err, "could not get private DNS zone state of %s in resource group %s", zoneSpec.ZoneName, s.Scope.ResourceGroup())
}
// If resource is not found, it means it should be created and hence setting isVnetLinkManaged to true
// will allow the reconciliation to continue
if err != nil && azure.ResourceNotFound(err) {
isManaged = true
}
if !isManaged {
log.V(1).Info("Skipping reconciliation of unmanaged private DNS zone", "private DNS", zoneSpec.ZoneName)
log.V(1).Info("Tag the DNS manually from azure to manage it with capz."+
"Please see https://capz.sigs.k8s.io/topics/custom-dns.html#manage-dns-via-capz-tool", "private DNS", zoneSpec.ZoneName)
return nil
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this logic to spec.Parameters since we already do a GET to fetch the existing resource. Same for vnet links.


// Parameters returns the parameters for the private dns zone.
func (z ZoneSpec) Parameters(existing interface{}) (params interface{}, err error) {
_, log, done := tele.StartSpanWithLogger(context.TODO(), "privatedns.ZoneSpec.Parameters")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to pass context.TODO to get a logger since we don't pass context here. Open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts @CecileRobertMichon?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given my comment below about the logging being temporary, I'd vote to keep this as a context.TODO() for now and just remove the context when we remove the logging in an upcoming release.

@shysank
Copy link
Contributor Author

shysank commented Jan 26, 2022

cc @CecileRobertMichon @Jont828

@shysank
Copy link
Contributor Author

shysank commented Jan 26, 2022

/test pull-cluster-api-provider-azure-e2e

@shysank
Copy link
Contributor Author

shysank commented Jan 27, 2022

/retest

@shysank shysank force-pushed the async_privatedns branch 4 times, most recently from 34814ce to 4ac9f38 Compare January 28, 2022 00:27
@shysank
Copy link
Contributor Author

shysank commented Jan 29, 2022

/test pull-cluster-api-provider-azure-e2e

@sonasingh46
Copy link
Contributor

@shysank --

This pr also fixes a bug about managing zones and links separately.

Is the link in bug in PR description correct? It opens up the same PR. Trying to see and learn about the bug that got fixed.

Copy link
Contributor

@Jont828 Jont828 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! I like that the Reconciler is split into helper functions, and in general it looks way easier to read!

azure/services/privatedns/client_records.go Outdated Show resolved Hide resolved
azure/services/privatedns/client_records.go Outdated Show resolved Hide resolved

// Parameters returns the parameters for the private dns zone.
func (z ZoneSpec) Parameters(existing interface{}) (params interface{}, err error) {
_, log, done := tele.StartSpanWithLogger(context.TODO(), "privatedns.ZoneSpec.Parameters")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts @CecileRobertMichon?

log.V(1).Info("Tag the DNS manually from azure to manage it with capz."+
"Please see https://capz.sigs.k8s.io/topics/custom-dns.html#manage-dns-via-capz-tool", "private DNS", zone.Name)
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return nil, nil at the end of this block anyway if the resource exists? If that's the case, then the check for isManaged would be redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but didn't want to change the current implementation. wdyt about doing it in a followup pr?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine in either case since if we don't update, we'd just be returning the existing resource anyway. In general, I've set up Parameters() to return nil, nil if we find an existing resource and also added any checks to the fields, i.e. for a subnet, if the VNet exists when it's not managed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we discussed about this earlier and created a followup for it. I'm fine with doing it here; I left it for now because we want to focus on other things. Happy to do it either way.

Copy link
Contributor Author

@shysank shysank Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to return nil,nil if zone exists. I have updated the pr description o reflect that. ptal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good.

log.V(2).Info("Skipping vnet link reconciliation for unmanaged vnet link", "vnet link",
l.Name, "private dns zone", l.ZoneName)
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for here, I'm not sure if we still need these checks if we return nil, nil to skip updating an existing spec anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, didn't want to change current implementation. wdyt about doing it in a followup pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

azure/services/privatedns/spec.go Outdated Show resolved Hide resolved
azure/services/privatedns/privatedns.go Outdated Show resolved Hide resolved
azure/services/privatedns/privatedns.go Outdated Show resolved Hide resolved
azure/services/privatedns/privatedns.go Outdated Show resolved Hide resolved
azure/services/privatedns/privatedns.go Outdated Show resolved Hide resolved
@shysank
Copy link
Contributor Author

shysank commented Mar 7, 2022

To accomodate #2093, I've made the following changes:

  1. Split PrivateDnsReadyCondition into PrivateDnsZoneReadyCondition, PrivateDnsLinkReadyCondition, PrivateDnsRecordReadyCondition. This is to make sure we only set the appropriate ready condition if the resource is managed by capz.
  2. For resources with multiple items (eg. vnetLinks), I have considered it to be Managed even if one of them is managed.
  3. This means that we'll need to a redundant get on the resource to check if it is managed or not. Since this is a common concern, we could perhaps move this into the async framework. Something like:{Create,Delete}Resource could return if the resource is managed or not? Or allow to register listeners during various stages of {Create,Delete}Resource?
  4. Split spec,client, and reconciler logic for zone,link, and records into their own separate files, have created a separate commit for each one of them hoping it would make it easier to review. Also, would make it easier to refactor into separate services if needed.

cc @CecileRobertMichon @Jont828

Copy link
Contributor

@Jont828 Jont828 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally wouldn't want to split up a ready condition into 3, but in this case I think it makes sense with how the code is split into 3 separate reconcilers, clients, and specs. Also in #2146 we changed the Reconcilers by adding a Name() and Managed() function. I think it makes sense to consider the resource managed if one of them is managed, which is what we're doing in 2146.

api/v1beta1/conditions_consts.go Outdated Show resolved Hide resolved
}

// isPrivateDNSManaged returns true if the private DNS has an owned tag with the cluster name as value,
// meaning that the DNS lifecycle is managed.
func (s *Service) isPrivateDNSManaged(ctx context.Context, resourceGroup, zoneName string) (bool, error) {
zone, err := s.client.GetZone(ctx, resourceGroup, zoneName)
func (s *Service) isPrivateDNSManaged(ctx context.Context, spec azure.ResourceSpecGetter) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying about the redundant Get() calls too, it looks like we have this pattern on several services like VNets, where we need to fetch the existing resource to see if it's managed or not. I'm open to amending the interface, did you have any ideas? Since the tags are all the same, I was thinking we could have some an interface function where it could cache the result for the next Get() call for Create(). Wdyt @shysank @CecileRobertMichon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined towards merging this one as is and updating the interface in a followup. Since it affects only private clusters, this shouldn't cause huge performance degradation imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes +1 to caching as a separate issue, the scope is big enough here. #2146 (the second commit for IsManaged) should also help with that. I had tried to incorporate caching as part of https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1684/files#diff-b74c6acb4629abe58f8286da9939563cc4404e09fe9966eb61dbde9b87274189R110 but ended up reverting it to keep the changes small.

@shysank
Copy link
Contributor Author

shysank commented Mar 21, 2022

/test pull-cluster-api-provider-azure-e2e

@shysank
Copy link
Contributor Author

shysank commented Mar 24, 2022

@CecileRobertMichon @Jont828 I think I have addressed all review comments. PTAL, whenever you get a chance.

@CecileRobertMichon
Copy link
Contributor

/assign @Jont828

}

if !managed {
log.V(1).Info("Skipping reconciliation of unmanaged private DNS zone", "private DNS", zoneSpec.ResourceName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be log.V(2) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't want to change the current implementation

log.V(1).Info("Skipping reconciliation of unmanaged private DNS zone", "private DNS", zoneSpec.ZoneName)

return nil

err = s.reconcileRecords(ctx, records)
s.Scope.UpdatePutStatus(infrav1.PrivateDNSRecordReadyCondition, serviceName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only want to update the put status if we actually have specs. In this case, we could only call UpdatePutStatus if the number of specs > 0. The same goes for the delete status too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check for zero specs in

. So, if it reaches this point there is always going to be atleast one record.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good. In that case I think it LGTM.

@Jont828
Copy link
Contributor

Jont828 commented Apr 1, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2022
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shysank can you please squash?

@shysank shysank force-pushed the async_privatedns branch from e2b6523 to e78ae0b Compare April 4, 2022 17:13
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 4, 2022
@k8s-ci-robot
Copy link
Contributor

@shysank: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff e78ae0b link false /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 4, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2022
@k8s-ci-robot k8s-ci-robot merged commit 82f169f into kubernetes-sigs:main Apr 4, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

async privatedns
5 participants