Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Skip InboundNatRule reconciliation if no LB is configured #2066

Merged
merged 1 commit into from
Feb 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion azure/services/inboundnatrules/inboundnatrules.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,19 @@ func New(scope InboundNatScope) *Service {

// Reconcile gets/creates/updates an inbound NAT rule.
func (s *Service) Reconcile(ctx context.Context) error {
ctx, _, done := tele.StartSpanWithLogger(ctx, "inboundnatrules.Service.Reconcile")
ctx, log, done := tele.StartSpanWithLogger(ctx, "inboundnatrules.Service.Reconcile")
defer done()

// Externally managed clusters might not have an LB
if s.Scope.APIServerLBName() == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it possible to create a cluster without a apiserver load balancer? In other words, how does the above condition (empty string value for AzureCluster.Spec.NetworkSpec.APIServerLB.Name) ever occur?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis as mentioned in the PR description, this can happen if the cluster is managed externally, see https://capz.sigs.k8s.io/topics/externally-managed-azure-infrastructure.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The code lgtm. I am a bit confused about this line in the docs:

If the AzureCluster resource includes a “cluster.x-k8s.io/managed-by” annotation then the controller will skip any reconciliation.

If reconciliation is skipped, then why is this change needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster controllers skips reconciling it, however the azuremachine-related controllers still do:

if err := s.inboundNatRulesSvc.Reconcile(ctx); err != nil {
return errors.Wrap(err, "failed to create inbound NAT rule")
}

We want to use capz to only manage worker machines, which is how we ran into this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jont828 we were talking offline yesterday about your wanting this scenario to s.Scope.UpdatePutStatus even if it's a no-op (no LB present). Can you explain why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but it seemed to me that setting an InboundNATRulesReadyCondition will lead to confusion, because we have to set it to false which makes it look like there is an issue but there isn't. Happy to be convinced otherwise though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that setting the condition when there is no resource is confusing which is why we have #1868 open to address that. However in the meantime we need to set all conditions that are defined in https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/scope/machine.go#L550 to true otherwise the AzureMachine won't reach Ready condition as summary will include false conditions. However in this case it looks like InboundNATRulesReadyCondition is missing from the list of summary conditions (potentially a bug?).

Copy link
Member Author

@alvaroaleman alvaroaleman Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, updated the change to do the UpdatePutStatus even when there is no LB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please also add a comment that explains this is to support the externally managed use case? I could see someone else reading the code later and having the exact same reaction as Jack did above

Copy link
Member Author

@alvaroaleman alvaroaleman Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, let me know what you think

log.V(4).Info("Skipping InboundNatRule reconciliation as the cluster has no LB configured")
// Until https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1868 is
// resolved, this needs to be set for the machine to be able to reach the ready condition:
// https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2066#discussion_r806150004
s.Scope.UpdatePutStatus(infrav1.InboundNATRulesReadyCondition, serviceName, nil)
return nil
}

ctx, cancel := context.WithTimeout(ctx, reconciler.DefaultAzureServiceReconcileTimeout)
defer cancel()

Expand Down
10 changes: 10 additions & 0 deletions azure/services/inboundnatrules/inboundnatrules_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,16 @@ func TestReconcileInboundNATRule(t *testing.T) {
)
},
},
{
name: "No LB, Nat rule reconciliation is skipped",
expectedError: "",
expect: func(s *mock_inboundnatrules.MockInboundNatScopeMockRecorder,
m *mock_inboundnatrules.MockclientMockRecorder,
r *mock_async.MockReconcilerMockRecorder) {
s.APIServerLBName().AnyTimes().Return("")
s.UpdatePutStatus(infrav1.InboundNATRulesReadyCondition, serviceName, nil)
},
},
{
name: "fail to get existing rules",
expectedError: "failed to get existing NAT rules: #: Internal Server Error: StatusCode=500",
Expand Down