Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Optional Override API Endpoint #1959

Conversation

dmlb2000
Copy link
Contributor

@dmlb2000 dmlb2000 commented Jan 9, 2022

This adds an optional field to the Network Spec to override what
is being sent back to Cluster API as the API Endpoint for K8S.
This is more of a, Hope you know what you are doing! feature.

Signed-off-by: David ML Brown Jr [email protected]

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:
This enables more advanced BYO infrastructure between the K8S management cluster and Azure.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1958

Special notes for your reviewer:

First time really touching the Go part of your code. I'm also not as familiar with Go, so if I'm missing something more fundamental help would be appreciated.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add Optional Override API Endpoint to Network spec in Azure Cluster definition.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @dmlb2000. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 9, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign fabriziopandini after the PR has been reviewed.
You can assign the PR to them by writing /assign @fabriziopandini in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jan 9, 2022
@mboersma
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 10, 2022
@dmlb2000 dmlb2000 force-pushed the fix-1958-add-override-api-endpoint branch 3 times, most recently from 839f55e to 5a131a5 Compare January 11, 2022 03:38
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2022
@dmlb2000 dmlb2000 force-pushed the fix-1958-add-override-api-endpoint branch from 5a131a5 to a91c06f Compare January 11, 2022 04:47
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2022
@dmlb2000 dmlb2000 force-pushed the fix-1958-add-override-api-endpoint branch from a91c06f to ee000b4 Compare January 11, 2022 04:56
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2022
This adds an optional field to the Network API to override what
is being sent back to Cluster API as the API Endpoint for K8S.
This is more of a, Hope you know what you are doing! feature.

Signed-off-by: David ML Brown Jr <[email protected]>
@dmlb2000 dmlb2000 force-pushed the fix-1958-add-override-api-endpoint branch from ee000b4 to 55b91ba Compare January 11, 2022 05:20
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2022
@dmlb2000
Copy link
Contributor Author

/retest

@dmlb2000
Copy link
Contributor Author

Not sure what the deal is with the error log on the e2e job. Help with parsing that error would be helpful.

@dmlb2000
Copy link
Contributor Author

/retest

@CecileRobertMichon
Copy link
Contributor

/hold

until @dmlb2000 can verify this is a valid workaround for their use case

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2022
@@ -75,6 +76,10 @@ type NetworkSpec struct {
// +optional
APIServerLB LoadBalancerSpec `json:"apiServerLB,omitempty"`

// override API Endpoint passed back to Cluster API (hope you know what you are doing, good luck!)
// +optional
OverrideAPIEndpoint *clusterv1.APIEndpoint `json:"overrideAPIEndpoint,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AzureClusterSpec already has a ControlPlaneEndpoint field. Right now it can't be used because it gets overwritten by the controller. Have you considered using that instead of adding a separate "override" field? We could change the logic in the controller to only set the endpoint if/when it's empty so that it doesn't overwrite any values set by the user.

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/api/v1beta1/azurecluster_types.go#L50
https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/controllers/azurecluster_controller.go#L243

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern there was other parts of cluster-api reacting to that key changing. I didn't want to setup a race condition between the Azure provider and the rest of cluster-api. Do you think that's a concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. We should test what the impact is of the APIEndpoint getting set too early on the CAPI controllers, before the LB exists and is reachable. This goes back to kubernetes-sigs/cluster-api#3715, it would make it a lot easier if there was a Status field for the endpoint and we could have a separate Spec field for configuration. I'm a bit concerned about having two separate Spec fields that contain duplicated data, one that's not truly configurable and one as an "override" of the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the endpoint should be it's own kind, any config in the overall state machine should really be separated out into discrete documents that get created/deleted never patched. Just my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I got this to work, I deployed a VM with an socat (for now) in front of the API Server LBs. Along with this patch I was able to use the VM in the middle and it worked... This gets around that hair pin issue with the LBs not allowing the backend pool nodes to talk to the IP. Socat should be replaced with a proper HAProxy or similar, also this configuration does depend on IPs being known prior deployment of the workload cluster.

$ clusterctl describe cluster testhub
NAME                                                        READY  SEVERITY  REASON  SINCE  MESSAGE                                                              
/testhub                                                    True                     7m10s                                                                       
├─ClusterInfrastructure - AzureCluster/testhub              True                     16m                                                                         
├─ControlPlane - KubeadmControlPlane/testhub-control-plane  True                     7m10s                                                                       
│ └─Machine/testhub-control-plane-7884b                     True                     14m                                                                         
└─Workers                                                                                                                                                        
  └─MachineDeployment/testhub-md-0                          True                     6s                                                                          
    └─3 Machines...                                         True                     5m58s  See testhub-md-0-64cbfcd97b-mrvpx, testhub-md-0-64cbfcd97b-vqpms, ...

I did notice other options for HTTP load balancing in Azure, does the go SDK not support them? Are they too expensive to depend on directly?

Might be more of an argument for allowing this sort of override as using a WAF for the API endpoint would be something outside the scope of cluster-api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlb2000 I brought this up in CAPI office hours this morning kubernetes-sigs/cluster-api#3715 (comment)

@fabriziopandini pointed out we should try to fix the CAPI behavior so it doesn't react to the Spec.ControlPlaneEndpoint changes alone and instead uses the InfrastructureReady Condition to know that Cluster infra is ready and it should proceed. If there is indeed a race condition that's something we could fix in CAPI.

@dlipovetsky mentioned another reason for using Spec vs Status is that Spec allows us to make the field immutable if it had previously been set whereas a Status field would be mutable.

So in summary I think the right course of action here would be to:

  1. change this PR to allow setting the Spec.ControlPlaneEndpoint so the user can configure it, and add a webhook validation to make sure it is immutable (if it was already set to something it should not be changed, only allow changes when previous values were ""). If it is not set, the AzureController will set it as before.
  2. verify that it doesn't mess with any cluster-api order of operations. If it does, we can work together to open a PR in CAPI that fixes whatever uses the control plane endpoint to not use it too early and wait for the infrastructure provider to signal it is ready and has provisioned the cluster with the InfrastructureReady condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CecileRobertMichon I agree with all of this save one point... I'd feel better opening up another issue/pull request and just leave this one alone until the new path completely resolves itself. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlb2000 a separate pull request sounds good as long as we keep this one on hold

@CecileRobertMichon
Copy link
Contributor

/close

in favor of #1978

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: Closed this PR.

In response to this:

/close

in favor of #1978

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional override for API Endpoint
4 participants