
Only show applicable conditions for AzureClusters and AzureMachines on async services #2093

Merged · 1 commit · Feb 28, 2022

Conversation

Jont828
Contributor

@Jont828 Jont828 commented Feb 15, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it: Only show applicable conditions for AzureClusters and AzureMachines. It doesn't make sense to set a condition to ready if it doesn't apply, i.e. when there is no spec for that service.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1868

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Only show applicable conditions for AzureClusters and AzureMachines

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. area/provider/azure Issues or PRs related to azure provider labels Feb 15, 2022
@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 15, 2022
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2022
@Jont828 Jont828 changed the title Only show applicable conditions for AzureClusters and AzureMachines Only show applicable conditions for AzureClusters and AzureMachines on async services Feb 17, 2022
@Jont828
Contributor Author

Jont828 commented Feb 17, 2022

@CecileRobertMichon Should we update the put status when we return early from a service in custom vnet mode? It looks like we currently do not update the delete status in VNets if it's not managed, so we could do the same with the other services. Wdyt?

@CecileRobertMichon
Contributor

Should we update the put status when we return early from a service in custom vnet mode? It looks like we currently do not update the delete status in VNets if it's not managed, so we could do the same with the other services. Wdyt?

I'm on the fence on this one. On one hand I could see us mark the VNet as "ready" if it's not managed once we check it exists. But at the same time if the CAPZ controller doesn't create/manage the VNet it probably shouldn't be showing its status either. Another way to look at this is to ask who is the user for these conditions and what are their use cases. Would it matter to them to see the VNet condition when using a pre-existing VNet? What would be less surprising for a user?

@sonasingh46 @shysank thoughts?

@Jont828
Contributor Author

Jont828 commented Feb 17, 2022

On the other hand it looks like we still reconcile the subnets whether or not the VNet is managed. It might look odd to only have a condition for subnets, but it's probably not a big deal.

@shysank
Contributor

shysank commented Feb 18, 2022

Would it matter to them to see the VNet condition when using a pre-existing VNet? What would be less surprising for a user?

I think adding a Ready condition for resources not managed by CAPZ is kind of misleading; on the other hand, there could be external (CI) systems relying on the Ready condition as a trigger. If we were to take inspiration from how an "externally managed cluster" is handled, the responsibility of updating the "Ready" condition is given to the external system. That's not exactly the same situation, since we don't do any reconciliation for externally managed clusters, but the idea of not updating the Ready condition for unmanaged resources can be replicated for consistency.

@CecileRobertMichon
Contributor

On the other hand it looks like we still reconcile the subnets whether or not the VNet is managed

We don't Create/Update the subnets, we only check that they are there. However, we do go through the CreateResource() func, which might make it tricky to differentiate between "nil error, resource was created" and "nil error, no action was taken".

We probably need to add an extra check around the UpdateStatus call to make sure we only do it if that resource is managed, whether it's the VNet or the subnets. I'm also leaning towards excluding unmanaged resources from status to avoid confusion about taking action. The rule should be: "if there was no PUT call made, don't set the service ready condition to true".
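The rule above can be sketched with toy types (all names here are hypothetical simplifications, not the real CAPZ scope and service interfaces): when the resource is unmanaged, no PUT was made, so the condition update is skipped entirely.

```go
package main

import "fmt"

// Condition records the outcome of the last PUT for a service (toy stand-in).
type Condition struct {
	Service string
	Ready   bool
	Err     error
}

// Scope is a toy stand-in for the CAPZ cluster scope's condition tracking.
type Scope struct {
	Conditions map[string]Condition
}

// UpdatePutStatus mirrors the real helper's behavior in spirit:
// a nil error marks the service's condition as ready.
func (s *Scope) UpdatePutStatus(service string, err error) {
	s.Conditions[service] = Condition{Service: service, Ready: err == nil, Err: err}
}

// ReconcileVNet applies the rule under discussion: if the VNet is unmanaged,
// return without touching the condition, so no misleading "ready" is shown.
func ReconcileVNet(s *Scope, managed bool, putErr error) error {
	if !managed {
		return putErr // no PUT was made, so do not set the condition
	}
	s.UpdatePutStatus("vnet", putErr)
	return putErr
}

func main() {
	s := &Scope{Conditions: map[string]Condition{}}
	_ = ReconcileVNet(s, false, nil) // unmanaged: condition map stays empty
	fmt.Println(len(s.Conditions))
	_ = ReconcileVNet(s, true, nil) // managed: condition set to ready
	fmt.Println(s.Conditions["vnet"].Ready)
}
```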

@Jont828
Contributor Author

Jont828 commented Feb 22, 2022

@CecileRobertMichon One thing that's different about the subnets is that even if they're not managed, we still want to Reconcile by updating the ID and CIDR blocks in the yaml spec. In that case, the subnet ready condition would signify that the spec has been updated. Do we want to keep the ready condition in this situation?

@CecileRobertMichon
Contributor

we still want to Reconcile by updating the ID and CIDR blocks in the yaml spec

We're only doing a GET API call in this case still, no PUT call. Updating the spec/status to reflect actual state isn't truly reconciling the resource in the sense that we're not making any changes to the Azure resource. I would vote to still opt out of setting a Ready condition in that case for now to simplify things.

@Jont828
Contributor Author

Jont828 commented Feb 23, 2022

@CecileRobertMichon I'm not sure if it's simpler to handle. In the event that a subnet isn't managed but there's an error in the Reconcile logic, do we still want to update the put status so the user can see the errors?

@CecileRobertMichon
Contributor

In the event that a subnet isn't managed but there's an error in the Reconcile logic, do we still want to update the put status so the user can see the errors?

If the subnet isn't managed and there's an error, we should return the error but not update the put status for the subnet (because we are not truly doing a "PUT"), so it's not accurate to say that the subnet failed to "create or update" and it could mislead users.

However, we need to make sure that the overall "Ready" condition is set to False in that case so that the Cluster doesn't show as "Ready". One thing we could do is set the generic NetworkInfrastructureReadyCondition to False in the azurecluster_controller whenever Reconcile returns an error (here: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/controllers/azurecluster_controller.go#L243) so that the summary Ready condition stays "False".

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 24, 2022
@Jont828 Jont828 force-pushed the ready-conditions branch 2 times, most recently from 91f897b to a1e3947 on February 24, 2022 at 22:58
		if err := s.DeleteResource(ctx, subnetSpec, serviceName); err != nil {
			if !azure.IsOperationNotDoneError(err) || result == nil {
				result = err
			}
		}
	}
	s.Scope.UpdateDeleteStatus(infrav1.SubnetsReadyCondition, serviceName, result)

	if s.Scope.IsVnetManaged() {
Contributor

We don't actually need this check, since we're already returning early on line 115 when the VNet isn't managed.

@@ -66,6 +66,9 @@ func (s *Service) Reconcile(ctx context.Context) error {
defer cancel()

vnetSpec := s.Scope.VNetSpec()
if vnetSpec == nil {
return nil
}

result, err := s.CreateResource(ctx, vnetSpec, serviceName)
s.Scope.UpdatePutStatus(infrav1.VNetReadyCondition, serviceName, err)
Contributor
Looks like we'd still be updating the status here when the VNet is unmanaged.

Contributor

One suggestion:

To avoid the extra GET from calling IsManaged below, we should be able to use the information we already have in Tags to determine if the VNet is managed:

on line 83:

managed := tags.HasOwned(s.Scope.ClusterName())

then below on line 90:

if managed {
    s.Scope.UpdatePutStatus(infrav1.VNetReadyCondition, serviceName, err)
}
return err

Contributor Author

@Jont828 Jont828 Feb 25, 2022
Sure, I'll remove that status update in that case. If I declare the variable inside the if block, then managed will be local to it, and if I move it out, I'll need to initialize it with a default value.

Also why not use IsVnetManaged() in this case when we use that everywhere else to check if the vnet is managed?

func (s *ClusterScope) IsVnetManaged() bool {
	return s.Vnet().ID == "" || s.Vnet().Tags.HasOwned(s.ClusterName())
}

Contributor

IsVnetManaged would work too, as long as it's run after line 82 so we've already set the tags.
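A sketch of why the ordering matters (toy types; the real infrav1.Tags helper and the exact tag key format are simplified here and should be treated as illustrative assumptions): IsVnetManaged only reads cached state, so it gives the right answer only after the GET has populated the VNet's ID and tags.

```go
package main

import "fmt"

// Tags is a toy copy of the tags map attached to an Azure resource.
type Tags map[string]string

// HasOwned reports whether the tags mark the resource as owned by the given
// cluster, mirroring the intent of the real tags.HasOwned helper.
// The tag key format below is an illustrative assumption.
func (t Tags) HasOwned(cluster string) bool {
	return t["sigs.k8s.io_cluster-api-provider-azure_cluster_"+cluster] == "owned"
}

// Vnet caches the state fetched by the earlier GET, including its tags.
type Vnet struct {
	ID   string
	Tags Tags
}

// IsVnetManaged mirrors the scope helper quoted above: a VNet with no ID yet
// (not created) or carrying the "owned" tag is considered managed. Because it
// reads only cached state, it must run after the GET has set ID and Tags.
func IsVnetManaged(v Vnet, cluster string) bool {
	return v.ID == "" || v.Tags.HasOwned(cluster)
}

func main() {
	owned := Vnet{
		ID:   "/subscriptions/1/vnets/a",
		Tags: Tags{"sigs.k8s.io_cluster-api-provider-azure_cluster_my-cluster": "owned"},
	}
	custom := Vnet{ID: "/subscriptions/1/vnets/b", Tags: Tags{}}
	fmt.Println(IsVnetManaged(owned, "my-cluster"), IsVnetManaged(custom, "my-cluster"))
}
```

Note that before the GET runs, `custom` would have an empty ID and be misreported as managed, which is exactly the ordering caveat raised here.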

@k8s-ci-robot
Contributor

@Jont828: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cluster-api-provider-azure-apidiff
Commit: 1278516
Required: false
Rerun command: /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Jont828
Contributor Author

Jont828 commented Feb 25, 2022

@CecileRobertMichon I've pushed some more changes, does this look ready to merge?

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

/lgtm
/assign @shysank

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2022
@shysank
Contributor

shysank commented Feb 28, 2022

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shysank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2022
@k8s-ci-robot k8s-ci-robot merged commit 8a61911 into kubernetes-sigs:main Feb 28, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Feb 28, 2022