Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Skip InboundNatRule reconciliation if no LB is configured #2066

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

alvaroaleman
Copy link
Member

Clusters might be externally managed in which case the apiserver
endpoint might be in a different Azure account or on a different
platform altogether. In this case, there are no inboundnatrules for the
LB to reconcile, so skip doing that if the LB name is empty.

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

If a cluster has no APIServer LoadBalancer configured, the InboundNATRule reconciliaton for machines will be skippped

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/azure Issues or PRs related to azure provider labels Feb 9, 2022
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 9, 2022
defer done()

if s.Scope.APIServerLBName() == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it possible to create a cluster without a apiserver load balancer? In other words, how does the above condition (empty string value for AzureCluster.Spec.NetworkSpec.APIServerLB.Name) ever occur?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis as mentioned in the PR description, this can happen if the cluster is managed externally, see https://capz.sigs.k8s.io/topics/externally-managed-azure-infrastructure.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The code lgtm. I am a bit confused about this line in the docs:

If the AzureCluster resource includes a “cluster.x-k8s.io/managed-by” annotation then the controller will skip any reconciliation.

If reconciliation is skipped, then why is this change needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster controllers skips reconciling it, however the azuremachine-related controllers still do:

if err := s.inboundNatRulesSvc.Reconcile(ctx); err != nil {
return errors.Wrap(err, "failed to create inbound NAT rule")
}

We want to use capz to only manage worker machines, which is how we ran into this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jont828 we were talking offline yesterday about your wanting this scenario to s.Scope.UpdatePutStatus even if it's a no-op (no LB present). Can you explain why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but it seemed to me that setting an InboundNATRulesReadyCondition will lead to confusion, because we have to set it to false which makes it look like there is an issue but there isn't. Happy to be convinced otherwise though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that setting the condition when there is no resource is confusing which is why we have #1868 open to address that. However in the meantime we need to set all conditions that are defined in https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/scope/machine.go#L550 to true otherwise the AzureMachine won't reach Ready condition as summary will include false conditions. However in this case it looks like InboundNATRulesReadyCondition is missing from the list of summary conditions (potentially a bug?).

Copy link
Member Author

@alvaroaleman alvaroaleman Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, updated the change to do the UpdatePutStatus even when there is no LB

@alvaroaleman
Copy link
Member Author

/assign @CecileRobertMichon

defer done()

if s.Scope.APIServerLBName() == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please also add a comment that explains this is to support the externally managed use case? I could see someone else reading the code later and having the exact same reaction as Jack did above

Copy link
Member Author

@alvaroaleman alvaroaleman Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, let me know what you think

Clusters might be externally managed in which case the apiserver
endpoint might be in a different Azure account or on a different
platform altogether. In this case, there are no inboundnatrules for the
LB to reconcile, so skip doing that if the LB name is empty.
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 14, 2022
@alvaroaleman
Copy link
Member Author

/retest

@alvaroaleman
Copy link
Member Author

@CecileRobertMichon who should I ping for final approval?

@CecileRobertMichon
Copy link
Contributor

/assign @jackfrancis @shysank

would like a second lgtm before I approve

@shysank
Copy link
Contributor

shysank commented Feb 15, 2022

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shysank

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2022
@alvaroaleman
Copy link
Member Author

Thank you all for taking the time to review this, I appreciate it :)

@k8s-ci-robot k8s-ci-robot merged commit 5e848fb into kubernetes-sigs:main Feb 15, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.1 milestone Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants