Add Optional Override API Endpoint #1959

Closed
1 change: 1 addition & 0 deletions api/v1alpha3/azurecluster_conversion.go
@@ -52,6 +52,7 @@ func (src *AzureCluster) ConvertTo(dstRaw conversion.Hub) error { // nolint

dst.Spec.NetworkSpec.APIServerLB.FrontendIPsCount = restored.Spec.NetworkSpec.APIServerLB.FrontendIPsCount
dst.Spec.NetworkSpec.APIServerLB.IdleTimeoutInMinutes = restored.Spec.NetworkSpec.APIServerLB.IdleTimeoutInMinutes
dst.Spec.NetworkSpec.OverrideAPIEndpoint = restored.Spec.NetworkSpec.OverrideAPIEndpoint
dst.Spec.CloudProviderConfigOverrides = restored.Spec.CloudProviderConfigOverrides
dst.Spec.BastionSpec = restored.Spec.BastionSpec

1 change: 1 addition & 0 deletions api/v1alpha3/zz_generated.conversion.go

11 changes: 11 additions & 0 deletions api/v1alpha4/azurecluster_conversion.go
@@ -39,6 +39,7 @@ func (src *AzureCluster) ConvertTo(dstRaw conversion.Hub) error { // nolint

// Restore list of virtual network peerings
dst.Spec.NetworkSpec.Vnet.Peerings = restored.Spec.NetworkSpec.Vnet.Peerings
dst.Spec.NetworkSpec.OverrideAPIEndpoint = restored.Spec.NetworkSpec.OverrideAPIEndpoint

return nil
}
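Both ConvertTo hunks add the same one-line restore. A minimal sketch of that pattern, assuming the older version's ConvertFrom stashed the full hub object in an annotation (Cluster API ships helpers for this in its `util/conversion` package): when converting back up, fields the older schema cannot represent, here `OverrideAPIEndpoint`, are copied from the stashed copy. All types and the annotation key below are hypothetical stand-ins, not the real CAPZ types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIEndpoint is a hypothetical stand-in for clusterv1.APIEndpoint.
type APIEndpoint struct {
	Host string `json:"host"`
	Port int32  `json:"port"`
}

// HubNetworkSpec stands in for the v1beta1 (hub) spec, which has the new field.
type HubNetworkSpec struct {
	VnetName            string       `json:"vnetName"`
	OverrideAPIEndpoint *APIEndpoint `json:"overrideAPIEndpoint,omitempty"`
}

// OldCluster stands in for an older-version object. Annotations plays the role
// of ObjectMeta.Annotations; there is no OverrideAPIEndpoint field.
type OldCluster struct {
	Annotations map[string]string
	VnetName    string
}

// conversionDataKey is a hypothetical annotation key for the stashed hub object.
const conversionDataKey = "example.io/conversion-data"

// convertFrom downgrades hub -> old and stashes the full hub spec so fields
// the old schema cannot hold survive a round trip.
func convertFrom(src HubNetworkSpec) (OldCluster, error) {
	raw, err := json.Marshal(src)
	if err != nil {
		return OldCluster{}, err
	}
	return OldCluster{
		Annotations: map[string]string{conversionDataKey: string(raw)},
		VnetName:    src.VnetName,
	}, nil
}

// convertTo upgrades old -> hub, then restores hub-only fields from the stash.
func convertTo(src OldCluster) (HubNetworkSpec, error) {
	dst := HubNetworkSpec{VnetName: src.VnetName}
	data, ok := src.Annotations[conversionDataKey]
	if !ok {
		return dst, nil // no stash: the object was created in the old version
	}
	var restored HubNetworkSpec
	if err := json.Unmarshal([]byte(data), &restored); err != nil {
		return dst, err
	}
	// Same idea as: dst.Spec.NetworkSpec.OverrideAPIEndpoint = restored.Spec.NetworkSpec.OverrideAPIEndpoint
	dst.OverrideAPIEndpoint = restored.OverrideAPIEndpoint
	return dst, nil
}

func main() {
	hub := HubNetworkSpec{
		VnetName:            "vnet",
		OverrideAPIEndpoint: &APIEndpoint{Host: "apiserver.example.private", Port: 443},
	}
	old, err := convertFrom(hub)
	if err != nil {
		panic(err)
	}
	back, err := convertTo(old)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.OverrideAPIEndpoint.Host, back.OverrideAPIEndpoint.Port) // apiserver.example.private 443
}
```

Without the restore line, a v1beta1 object read through an older API version and written back would silently lose its override.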
@@ -79,3 +80,13 @@ func Convert_v1beta1_VnetSpec_To_v1alpha4_VnetSpec(in *infrav1beta1.VnetSpec, ou
func Convert_v1alpha4_VnetSpec_To_v1beta1_VnetSpec(in *VnetSpec, out *infrav1beta1.VnetSpec, s apiconversion.Scope) error {
return autoConvert_v1alpha4_VnetSpec_To_v1beta1_VnetSpec(in, out, s)
}

// Convert_v1alpha4_NetworkSpec_To_v1beta1_NetworkSpec is an autogenerated conversion function.
func Convert_v1alpha4_NetworkSpec_To_v1beta1_NetworkSpec(in *NetworkSpec, out *infrav1beta1.NetworkSpec, s apiconversion.Scope) error {
return autoConvert_v1alpha4_NetworkSpec_To_v1beta1_NetworkSpec(in, out, s)
}

// Convert_v1beta1_NetworkSpec_To_v1alpha4_NetworkSpec is an autogenerated conversion function.
func Convert_v1beta1_NetworkSpec_To_v1alpha4_NetworkSpec(in *infrav1beta1.NetworkSpec, out *NetworkSpec, s apiconversion.Scope) error {
return autoConvert_v1beta1_NetworkSpec_To_v1alpha4_NetworkSpec(in, out, s)
}
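The hand-written `Convert_v1beta1_NetworkSpec_To_v1alpha4_NetworkSpec` wrapper is needed because conversion-gen will not emit a complete public conversion when the source type has a field (`OverrideAPIEndpoint`) with no peer in the destination; the generated `autoConvert_*` function copies only the shared fields. A simplified sketch of that shape, using hypothetical stand-in types and function names rather than the generated code:

```go
package main

import "fmt"

// APIEndpoint is a hypothetical stand-in for clusterv1.APIEndpoint.
type APIEndpoint struct {
	Host string
	Port int32
}

// NewNetworkSpec stands in for the v1beta1 spec, which has the new field.
type NewNetworkSpec struct {
	VnetName            string
	OverrideAPIEndpoint *APIEndpoint
}

// OldNetworkSpec stands in for the v1alpha4 spec, which lacks it.
type OldNetworkSpec struct {
	VnetName string
}

// autoConvertNewToOld mimics a generated autoConvert_* function: it copies
// only the fields both versions share.
func autoConvertNewToOld(in *NewNetworkSpec, out *OldNetworkSpec) error {
	out.VnetName = in.VnetName
	// in.OverrideAPIEndpoint has no peer field in OldNetworkSpec.
	return nil
}

// convertNewToOld mimics the hand-written public wrapper. Dropping the field
// here is intentional: ConvertTo restores it on the way back up.
func convertNewToOld(in *NewNetworkSpec, out *OldNetworkSpec) error {
	return autoConvertNewToOld(in, out)
}

func main() {
	in := &NewNetworkSpec{VnetName: "vnet", OverrideAPIEndpoint: &APIEndpoint{Host: "lb.example.com", Port: 6443}}
	var out OldNetworkSpec
	if err := convertNewToOld(in, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.VnetName) // vnet
}
```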
31 changes: 11 additions & 20 deletions api/v1alpha4/zz_generated.conversion.go

5 changes: 5 additions & 0 deletions api/v1beta1/types.go
@@ -19,6 +19,7 @@ package v1beta1
import (
"github.com/pkg/errors"
"k8s.io/apimachinery/pkg/api/resource"
clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

const (
@@ -75,6 +76,10 @@ type NetworkSpec struct {
// +optional
APIServerLB LoadBalancerSpec `json:"apiServerLB,omitempty"`

// override API Endpoint passed back to Cluster API (hope you know what you are doing, good luck!)
// +optional
OverrideAPIEndpoint *clusterv1.APIEndpoint `json:"overrideAPIEndpoint,omitempty"`
Contributor:

The AzureClusterSpec already has a ControlPlaneEndpoint field. Right now it can't be used because it gets overwritten by the controller. Have you considered using that instead of adding a separate "override" field? We could change the logic in the controller to only set the endpoint if/when it's empty so that it doesn't overwrite any values set by the user.

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/api/v1beta1/azurecluster_types.go#L50
https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/controllers/azurecluster_controller.go#L243
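A minimal sketch of the suggested controller change, assuming the controller currently assigns the load balancer address unconditionally: fill `ControlPlaneEndpoint` only when it is empty. `reconcileEndpoint` and the stand-in types are hypothetical; the real logic at the second link may be structured differently.

```go
package main

import "fmt"

// APIEndpoint is a hypothetical stand-in mirroring clusterv1.APIEndpoint.
type APIEndpoint struct {
	Host string
	Port int32
}

// IsZero reports whether no endpoint has been set.
func (e APIEndpoint) IsZero() bool { return e.Host == "" && e.Port == 0 }

// AzureClusterSpec keeps only the field relevant here.
type AzureClusterSpec struct {
	ControlPlaneEndpoint APIEndpoint
}

// reconcileEndpoint is a hypothetical helper: it fills the endpoint from the
// provisioned load balancer only when the user has not set one.
func reconcileEndpoint(spec *AzureClusterSpec, lbHost string, lbPort int32) {
	if !spec.ControlPlaneEndpoint.IsZero() {
		return // respect the user-supplied value
	}
	spec.ControlPlaneEndpoint = APIEndpoint{Host: lbHost, Port: lbPort}
}

func main() {
	userSet := AzureClusterSpec{ControlPlaneEndpoint: APIEndpoint{Host: "api.example.com", Port: 443}}
	reconcileEndpoint(&userSet, "lb.example.com", 6443)
	fmt.Println(userSet.ControlPlaneEndpoint.Host) // api.example.com

	empty := AzureClusterSpec{}
	reconcileEndpoint(&empty, "lb.example.com", 6443)
	fmt.Println(empty.ControlPlaneEndpoint.Host) // lb.example.com
}
```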

Contributor Author:

My concern there was other parts of cluster-api reacting to that field changing. I didn't want to set up a race condition between the Azure provider and the rest of cluster-api. Do you think that's a concern?

Contributor:

That's a fair point. We should test what impact setting the APIEndpoint too early has on the CAPI controllers, i.e. before the LB exists and is reachable. This goes back to kubernetes-sigs/cluster-api#3715: it would be a lot easier if there were a Status field for the endpoint and a separate Spec field for configuration. I'm a bit concerned about having two Spec fields that contain duplicated data, one that's not truly configurable and one that acts as an "override" of the other.

Contributor Author:

I'm wondering if the endpoint should be its own kind. Any config in the overall state machine should really be separated out into discrete documents that get created or deleted, never patched. Just my opinion.

Contributor Author:

So I got this to work: I deployed a VM running socat (for now) in front of the API server LBs. Along with this patch, I was able to route through the VM in the middle, and it worked. This gets around the hairpin issue where the LBs don't allow backend pool nodes to reach the LB's own frontend IP. Socat should be replaced with a proper HAProxy or similar. Also, this configuration depends on the IPs being known prior to deployment of the workload cluster.

$ clusterctl describe cluster testhub
NAME                                                        READY  SEVERITY  REASON  SINCE  MESSAGE                                                              
/testhub                                                    True                     7m10s                                                                       
├─ClusterInfrastructure - AzureCluster/testhub              True                     16m                                                                         
├─ControlPlane - KubeadmControlPlane/testhub-control-plane  True                     7m10s                                                                       
│ └─Machine/testhub-control-plane-7884b                     True                     14m                                                                         
└─Workers                                                                                                                                                        
  └─MachineDeployment/testhub-md-0                          True                     6s                                                                          
    └─3 Machines...                                         True                     5m58s  See testhub-md-0-64cbfcd97b-mrvpx, testhub-md-0-64cbfcd97b-vqpms, ...

I did notice other options for HTTP load balancing in Azure. Does the Go SDK not support them? Are they too expensive to depend on directly?

This might be more of an argument for allowing this sort of override, since using a WAF for the API endpoint would be outside the scope of cluster-api.

Contributor:

@dmlb2000 I brought this up in CAPI office hours this morning: kubernetes-sigs/cluster-api#3715 (comment)

@fabriziopandini pointed out we should try to fix the CAPI behavior so it doesn't react to the Spec.ControlPlaneEndpoint changes alone and instead uses the InfrastructureReady Condition to know that Cluster infra is ready and it should proceed. If there is indeed a race condition that's something we could fix in CAPI.
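A sketch of the ordering described here, using simplified stand-in types rather than the real Cluster API condition helpers: a consumer proceeds once the infrastructure provider reports `InfrastructureReady`, not as soon as `Spec.ControlPlaneEndpoint` becomes non-empty. `readyToUseEndpoint` is a hypothetical name.

```go
package main

import "fmt"

// APIEndpoint and Condition are simplified stand-ins for the Cluster API types.
type APIEndpoint struct {
	Host string
	Port int32
}

type Condition struct {
	Type   string
	Status bool
}

type Cluster struct {
	ControlPlaneEndpoint APIEndpoint
	Conditions           []Condition
}

// isConditionTrue reports whether condType is present and true.
func isConditionTrue(c *Cluster, condType string) bool {
	for _, cond := range c.Conditions {
		if cond.Type == condType {
			return cond.Status
		}
	}
	return false
}

// readyToUseEndpoint gates on the provider's InfrastructureReady signal rather
// than on the endpoint merely being non-empty, so a user-supplied endpoint set
// early is not acted on before the load balancer exists.
func readyToUseEndpoint(c *Cluster) bool {
	return isConditionTrue(c, "InfrastructureReady") && c.ControlPlaneEndpoint.Host != ""
}

func main() {
	c := &Cluster{ControlPlaneEndpoint: APIEndpoint{Host: "api.example.com", Port: 443}}
	fmt.Println(readyToUseEndpoint(c)) // false: endpoint set, infra not ready yet

	c.Conditions = append(c.Conditions, Condition{Type: "InfrastructureReady", Status: true})
	fmt.Println(readyToUseEndpoint(c)) // true
}
```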

@dlipovetsky mentioned another reason for using Spec vs Status: Spec allows us to make the field immutable once it has been set, whereas a Status field would be mutable.

So in summary I think the right course of action here would be to:

  1. change this PR to allow setting Spec.ControlPlaneEndpoint so the user can configure it, and add webhook validation to make sure it is immutable (if it was already set to something, it should not be changed; only allow changes when the previous value was empty). If it is not set, the AzureCluster controller will set it as before.
  2. verify that it doesn't break any cluster-api order of operations. If it does, we can work together to open a PR in CAPI that changes whatever consumes the control plane endpoint so it doesn't use it too early, and instead waits for the infrastructure provider to signal, via the InfrastructureReady condition, that it is ready and has provisioned the cluster.

Contributor Author:

@CecileRobertMichon I agree with all of this save one point: I'd feel better opening another issue/pull request and leaving this one alone until the new approach is fully worked out. Thoughts?

Contributor:

@dmlb2000 a separate pull request sounds good as long as we keep this one on hold


// NodeOutboundLB is the configuration for the node outbound load balancer.
// +optional
NodeOutboundLB *LoadBalancerSpec `json:"nodeOutboundLB,omitempty"`
5 changes: 5 additions & 0 deletions api/v1beta1/zz_generated.deepcopy.go

8 changes: 8 additions & 0 deletions azure/scope/cluster.go
@@ -649,6 +649,10 @@ func (s *ClusterScope) AdditionalTags() infrav1.Tags {

// APIServerPort returns the APIServerPort to use when creating the load balancer.
func (s *ClusterScope) APIServerPort() int32 {
netSpec := s.AzureCluster.Spec.NetworkSpec
if netSpec.OverrideAPIEndpoint != nil {
return netSpec.OverrideAPIEndpoint.Port
}
if s.Cluster.Spec.ClusterNetwork != nil && s.Cluster.Spec.ClusterNetwork.APIServerPort != nil {
return *s.Cluster.Spec.ClusterNetwork.APIServerPort
}
@@ -657,6 +661,10 @@ func (s *ClusterScope) APIServerPort() int32 {

// APIServerHost returns the hostname used to reach the API server.
func (s *ClusterScope) APIServerHost() string {
netSpec := s.AzureCluster.Spec.NetworkSpec
if netSpec.OverrideAPIEndpoint != nil && len(netSpec.OverrideAPIEndpoint.Host) > 0 {
return netSpec.OverrideAPIEndpoint.Host
}
if s.IsAPIServerPrivate() {
return azure.GeneratePrivateFQDN(s.GetPrivateDNSZoneName())
}
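The precedence these two methods implement can be checked in isolation. The sketch below reproduces the order with simplified inputs instead of `ClusterScope`: the private-FQDN and public-IP branches of `APIServerHost` are collapsed into a `fallbackHost` parameter, and because the diff elides the final fallback of `APIServerPort`, the default port is passed in as a parameter rather than assumed. Function names are hypothetical.

```go
package main

import "fmt"

// APIEndpoint is a hypothetical stand-in mirroring clusterv1.APIEndpoint.
type APIEndpoint struct {
	Host string
	Port int32
}

// resolvePort mirrors the order in APIServerPort: override, then the
// Cluster's ClusterNetwork.APIServerPort, then a caller-supplied default.
func resolvePort(override *APIEndpoint, clusterPort *int32, defaultPort int32) int32 {
	if override != nil {
		return override.Port // used even when zero, as in the diff
	}
	if clusterPort != nil {
		return *clusterPort
	}
	return defaultPort
}

// resolveHost mirrors APIServerHost: a non-empty override host wins;
// otherwise use the host the private/public logic would have produced.
func resolveHost(override *APIEndpoint, fallbackHost string) string {
	if override != nil && len(override.Host) > 0 {
		return override.Host
	}
	return fallbackHost
}

func main() {
	clusterPort := int32(8443)
	ov := &APIEndpoint{Host: "apiserver.example.private", Port: 443}

	fmt.Println(resolvePort(ov, &clusterPort, 6443))                    // 443
	fmt.Println(resolvePort(nil, &clusterPort, 6443))                   // 8443
	fmt.Println(resolveHost(&APIEndpoint{Port: 443}, "lb.example.com")) // lb.example.com
}
```

The sketch makes one asymmetry visible: an override with an empty host falls through to the normal host logic, but an override with a zero port is returned as-is. The CRD marks both `host` and `port` as required, which limits but does not rule out an explicit `port: 0`.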
15 changes: 15 additions & 0 deletions azure/scope/cluster_test.go
@@ -96,6 +96,21 @@ func TestAPIServerHost(t *testing.T) {
},
want: "apiserver.example.private",
},
{
name: "override host returned to cluster api.",
azureCluster: infrav1.AzureCluster{
Spec: infrav1.AzureClusterSpec{
SubscriptionID: fakeSubscriptionID,
NetworkSpec: infrav1.NetworkSpec{
OverrideAPIEndpoint: &clusterv1.APIEndpoint{
Host: "apiserver.example.private",
Port: 443,
},
},
},
},
want: "apiserver.example.private",
},
}

for _, tc := range tests {
@@ -1788,6 +1788,21 @@ spec:
description: LBType defines an Azure load balancer Type.
type: string
type: object
overrideAPIEndpoint:
description: override API Endpoint passed back to Cluster API
(hope you know what you are doing, good luck!)
properties:
host:
description: The hostname on which the API server is serving.
type: string
port:
description: The port on which the API server is serving.
format: int32
type: integer
required:
- host
- port
type: object
privateDNSZoneName:
description: PrivateDNSZoneName defines the zone name for the
Azure Private DNS.