
HA: improvements to the load balancer documentation #1685

Closed
neolit123 opened this issue Jul 25, 2019 · 38 comments
Labels
area/HA help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/design Categorizes issue or PR as related to design. kind/documentation Categorizes issue or PR as related to documentation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@neolit123
Member

neolit123 commented Jul 25, 2019

i'm starting to see more reports of people having issues with setting up LB in a cloud provider, see:
kubernetes/website#14258
https://stackoverflow.com/questions/56768956/how-to-use-kubeadm-init-configuration-parameter-controlplaneendpoint/57121454#57121454
i also saw a report on the VMware internal slack the other day.

the common problems are:

  • confusion about L4 vs L7 load balancers, L4 should be sufficient.
  • not having SSL/TLS on the LB, which makes the api-server health checks fail
  • possibly related - using an older version of kubeadm that does not have the config-map retry logic

ideally we should document more LB aspects/best practices in our HA doc, even if LBs are out of scope for kubeadm:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

some of the comments here need discussion:
https://stackoverflow.com/a/57121454

  1. It fails on the first master node where "kubeadm init" is executed because it tries to communicate with itself through the load balancer.
  2. On all the other master nodes where "kubeadm join" is executed, there's a 1/N chance of failure when the load balancer selects the node itself and not any of the (N-1) nodes that are already in the cluster.

for 1. the primary CP node should be doing exactly that, as there are no other nodes.
2. on the other hand is odd: when a new CP node is joining, it should start serving on the LB only after it has finished the control plane join process.

cc @ereslibre @aojea @fabriziopandini @RockyFu267
@kubernetes/sig-cluster-lifecycle

@rcythr related to your PR:
kubernetes/website#15372

@neolit123 neolit123 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/backlog Higher priority than priority/awaiting-more-evidence. area/HA kind/documentation Categorizes issue or PR as related to documentation. kind/design Categorizes issue or PR as related to design. labels Jul 25, 2019
@neolit123 neolit123 added this to the v1.16 milestone Jul 25, 2019
@rcythr

rcythr commented Jul 25, 2019

I've seen several people discussing this on slack over the last few days. The problem seems to primarily be lack of hairpin routing on Azure.

Here is the relevant Azure documentation line about the limitation of their LB, and a workaround.

"Unlike public Load Balancers which provide outbound connections when transitioning from private IP addresses inside the virtual network to public IP addresses, internal Load Balancers do not translate outbound originated connections to the frontend of an internal Load Balancer as both are in private IP address space. This avoids potential for SNAT port exhaustion inside unique internal IP address space where translation is not required. The side effect is that if an outbound flow from a VM in the backend pool attempts a flow to frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail. If the flow did not map back to the same VM in the backend pool which created the flow to the frontend, the flow will succeed...There are several common workarounds for reliably achieving this scenario (originating flows from a backend pool to the backend pools respective internal Load Balancer frontend) which include either insertion of a proxy layer behind the internal Load Balancer or using DSR style rules. Customers can combine an internal Load Balancer with any 3rd party proxy or substitute internal Application Gateway for proxy scenarios limited to HTTP/HTTPS. While you could use a public Load Balancer to mitigate, the resulting scenario is prone to SNAT exhaustion and should be avoided unless carefully managed." (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview).

So the current HA advice of using the LB as the control plane endpoint is broken for Azure.

It's not the most elegant solution, but I wonder if a piece of networking duct tape (i.e. an iptables rule) could help people avoid setting up their own load balancer on Azure.
Edit: after reading the Stack Overflow post I see the poster solved it by doing exactly this.

I'll have to look a bit more at Microsoft's workaround suggestion.
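
For reference, the kind of iptables duct tape meant here is a DNAT rule on each control-plane node that redirects locally originated traffic destined for the internal LB's frontend IP back to the local api-server. This is only a hedged sketch: the address is a placeholder, and a concrete, scripted variant of this shows up later in the thread.

# hypothetical: <LB_FRONTEND_IP> is the internal load balancer's private frontend IP
iptables -t nat -A OUTPUT -p tcp -d <LB_FRONTEND_IP> --dport 6443 -j DNAT --to-destination 127.0.0.1:6443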

@rcythr

rcythr commented Jul 25, 2019

It looks like this isnt just a problem with Azure. From AWS "Internal load balancers do not support hairpinning or loopback. When you register targets by instance ID, the source IP addresses of clients are preserved. If an instance is a client of an internal load balancer that it's registered with by instance ID, the connection succeeds only if the request is routed to a different instance. Otherwise, the source and destination IP addresses are the same and the connection times out." (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html)

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jul 25, 2019
@neolit123
Member Author

we have both Azure and AWS experts in k8s land, so we might be able to get more feedback on this.

@neolit123 neolit123 removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 25, 2019
@seh

seh commented Jul 25, 2019

To clarify, AWS EC2 NLBs do support hairpin connections provided that the targets are registered by IP address, as opposed to by instance ID.

The documentation quoted above doesn’t say that explicitly, but I can confirm that it works.

@detiber
Member

detiber commented Jul 25, 2019

For AWS:

  • AWS Classic ELBs work regardless of Public/Private networking.
  • NLBs have the limitation mentioned above by @seh.
  • ALBs should be avoided, since the admin kubeconfig relies on client TLS cert authentication

@justaugustus should be able to provide some more detail on Azure based on his work with the Azure provider for Cluster API

@dsexton

dsexton commented Jul 25, 2019

To clarify, AWS EC2 NLBs do support hairpin connections provided that the targets are registered by IP address, as opposed to by instance ID.

The documentation quoted above doesn’t say that explicitly, but I can confirm that it works.

It is important to note that autoscaling groups register by instance id. We transitioned from NLBs to ELBs because of this.

@seh

seh commented Jul 25, 2019

They do if you tell them to. We use ASGs, but we run an initialization procedure that registers the instance by its IP address with a discoverable target group, and unregisters it as the machine shuts down (sometimes inconvenient when rebooting).
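
A hedged sketch of what such an init step could look like, assuming a target group created with target-type ip; the target group name is a placeholder and this is not the exact procedure described above:

# placeholder names; requires a target group created with --target-type ip
TG_ARN=$(aws elbv2 describe-target-groups --names apiserver-tg --query 'TargetGroups[0].TargetGroupArn' --output text)
MY_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# register this control-plane node by IP on boot...
aws elbv2 register-targets --target-group-arn "$TG_ARN" --targets "Id=$MY_IP"
# ...and deregister it again on shutdown
aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets "Id=$MY_IP"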

@vmendi

vmendi commented Jul 26, 2019

It's not the most elegant solution, but I wonder if a piece of networking duct tape (i.e. an iptables rule) could help people avoid setting up their own load balancer on Azure.
Edit: after reading the Stack Overflow post I see the poster solved it by doing exactly this.

I'll have to look a bit more at Microsoft's workaround suggestion.

Hi! I'm the poster from Stack Overflow. We abandoned the iptables hack and went for HAProxy. We are still working on it though. It complicates everything, and this is our first time having to configure such a piece of software (how do we add and remove nodes? autoscaling? VRRP?). For us it feels like we are in "Kubernetes the hard way" land, even if kubeadm tries its best to help :)

We are also wondering if going for Azure Application Gateway (L7) would work?

Thanks for paying attention to this guys!

@ereslibre
Contributor

ereslibre commented Jul 26, 2019

confusion about L4 vs L7 load balancers, L4 should be sufficient.

An L4 load balancer should be sufficient in most cases, but since there is no heartbeat or activity sent at all during kubectl logs -f and kubectl exec -it, the LB will close the connection after a default or configured timeout (usually the client or server timeout, if using haproxy) if there's no activity. This is how it looks, just tried using kind:

~ > kubectl exec -it nginx-554b9c67f9-r9ls6 bash
root@nginx-554b9c67f9-r9ls6:/# ⏎
~ >
~ > kubectl logs -f nginx-554b9c67f9-r9ls6
...
127.0.0.1 - - [26/Jul/2019:21:20:28 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://localhost:8000/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "-"
F0726 22:21:48.325924    1750 helpers.go:114] error: unexpected EOF
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog.stacks(0x2ce2301, 0x3, 0xc00057e100, 0x44)
	vendor/k8s.io/klog/klog.go:900 +0xb1
k8s.io/kubernetes/vendor/k8s.io/klog.(*loggingT).output(0x2ce2360, 0xc000000003, 0xc0003f0000, 0x2b05bc4, 0xa, 0x72, 0x0)
	vendor/k8s.io/klog/klog.go:815 +0xe6
k8s.io/kubernetes/vendor/k8s.io/klog.(*loggingT).printDepth(0x2ce2360, 0x3, 0x2, 0xc0007d17c8, 0x1, 0x1)
	vendor/k8s.io/klog/klog.go:718 +0x12b
k8s.io/kubernetes/vendor/k8s.io/klog.FatalDepth(...)
	vendor/k8s.io/klog/klog.go:1295
k8s.io/kubernetes/pkg/kubectl/cmd/util.fatal(0xc00073c020, 0x15, 0x1)
	pkg/kubectl/cmd/util/helpers.go:92 +0x1d2
k8s.io/kubernetes/pkg/kubectl/cmd/util.checkErr(0x1baef60, 0xc0000a2060, 0x19e5c98)
	pkg/kubectl/cmd/util/helpers.go:171 +0x90f
k8s.io/kubernetes/pkg/kubectl/cmd/util.CheckErr(...)
	pkg/kubectl/cmd/util/helpers.go:114
k8s.io/kubernetes/pkg/kubectl/cmd/logs.NewCmdLogs.func2(0xc000710a00, 0xc00026a960, 0x1, 0x3)
	pkg/kubectl/cmd/logs/logs.go:147 +0x1da
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc000710a00, 0xc00026a930, 0x3, 0x3, 0xc000710a00, 0xc00026a930)
	vendor/github.com/spf13/cobra/command.go:760 +0x2ae
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00044e780, 0x19e5e80, 0xc0000c6000, 0x5)
	vendor/github.com/spf13/cobra/command.go:846 +0x2ec
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	vendor/github.com/spf13/cobra/command.go:794
main.main()
	cmd/kubectl/kubectl.go:50 +0x1eb

goroutine 18 [chan receive]:
k8s.io/kubernetes/vendor/k8s.io/klog.(*loggingT).flushDaemon(0x2ce2360)
	vendor/k8s.io/klog/klog.go:1035 +0x8b
created by k8s.io/kubernetes/vendor/k8s.io/klog.init.0
	vendor/k8s.io/klog/klog.go:404 +0x6c

goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
	GOROOT/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	GOROOT/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	GOROOT/src/os/signal/signal_unix.go:29 +0x41

goroutine 6 [select]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x19e7da0, 0x12a05f200, 0x0, 0x1, 0xc0000407e0)
	staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:164 +0x181
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0x19e7da0, 0x12a05f200, 0xc0000407e0)
	staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by k8s.io/kubernetes/pkg/kubectl/util/logs.InitLogs
	pkg/kubectl/util/logs/logs.go:51 +0x96
~ >

To reproduce, the only thing needed is to create an HA cluster with kind and call kubectl logs or kubectl exec on a pod.

I don't have a good solution if there's no activity at all. At SUSE we fixed this in the past by using HAProxy as an L7 load balancer, so we could add per-API-endpoint timeouts (thus, exec and logs would have no timeouts), but it certainly isn't the best approach, as this solution comes with maintenance and documentation overhead, since the LB would be terminating the TLS connections.

I think this is worth some investigation to understand what could be done and/or documented regarding the load balancer.
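
To illustrate the behavior described above, here is a minimal sketch of an L4 (TCP passthrough) haproxy.cfg for the api-server; the backend addresses and timeout values are illustrative assumptions, not kind's actual configuration. With plain client/server timeouts like these, an idle kubectl exec / kubectl logs -f stream is cut once the timeout elapses without traffic:

# hypothetical haproxy.cfg fragment for an api-server L4 load balancer
defaults
    mode tcp
    timeout connect 10s
    timeout client  50s   # idle exec/logs streams are closed after this
    timeout server  50s

frontend apiserver
    bind *:6443
    default_backend control-plane

backend control-plane
    balance roundrobin
    # illustrative control-plane node addresses
    server cp1 10.0.0.11:6443 check
    server cp2 10.0.0.12:6443 check
    server cp3 10.0.0.13:6443 check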

@aojea
Member

aojea commented Jul 26, 2019

I don't have a good solution if there's no activity at all. At SUSE we fixed this in the past by using HAProxy as an L7 load balancer, so we could add per-API-endpoint timeouts (thus, exec and logs would have no timeouts), but it certainly isn't the best approach, as this solution comes with maintenance and documentation overhead, since the LB would be terminating the TLS connections.

who is closing the connection? you can always set keepalives to keep the connection up
or is the problem that it balances the connection to another backend? then we should use sticky sessions

@ereslibre
Contributor

who is closing the connection?

HAProxy is closing it.

or is the problem that it balances the connection to another backend? then we should use sticky sessions

There's no need, you can reach any apiserver safely, as far as I can tell.

@aojea
Member

aojea commented Jul 27, 2019

have you considered the "option clitcpka" and "option srvtcpka" on haproxy?

@ereslibre
Contributor

have you considered the "option clitcpka" and "option srvtcpka" on haproxy?

Really worth taking a look (and maybe adding them as defaults in the haproxy created by kind?). If this works as expected we could also document this for kubeadm if the user wants to use haproxy.

I will have a look and report back, thanks @aojea!

@ereslibre
Contributor

I will have a look and report back, thanks @aojea!

No luck: tcpka, or clitcpka and srvtcpka, still lead to the same behavior, haproxy closing the connection after the default 50s configured by kind. I think we should investigate this a little deeper. Increasing timeouts is always an option, but there's never a timeout that will fit everyone.
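
For reference, the options tried above would look roughly like this (values illustrative); TCP keepalives are kernel-level probes that carry no application data, so haproxy's inactivity timeouts still fire, which would explain the unchanged behavior:

defaults
    mode tcp
    option clitcpka       # TCP keepalives towards the client
    option srvtcpka       # TCP keepalives towards the server
    timeout client 50s    # still expires: keepalive probes do not count as traffic
    timeout server 50s    # raising these is the blunt workaround mentioned above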

@aojea
Member

aojea commented Jul 27, 2019

nice explanation about haproxy here https://stackoverflow.com/a/32635324/7794348

@crenique

crenique commented Jul 28, 2019

We are also wondering if going for Azure Application Gateway (L7) would work?


For this Azure Application Gateway (L7) topic,
we'd like to see whether the application gateway (with internal frontend private ip) can be used as a control-plane endpoint instead of HAProxy.

1) HAProxy approach
We tried this 3-layer setup to make an HA multi-master control plane work on Azure.

  Azure internal L4 loadbalancer (HAProxy virtual IP) <- control-plane endpoint
                      |
                      |
        HAProxy vm scaleset (2 nodes)  (L4 load balancer)
                      |
                      |
         k8s master vm scaleset (3 nodes)

Azure doesn't seem to support VRRP floating IPs, so we had to create redundant HAProxy nodes under an L4 load balancer to get a virtual IP.

This works, but it complicates our infrastructure.

2) Azure application gateway (L7) approach ?

If we can use Azure Application Gateway (an internal L7 load balancer) as the control-plane endpoint, it would make things simpler, like this:

        Azure internal L7 application gateway (auto-scale)  <- control-plane endpoint
                      |
                      |
         k8s master vm scaleset (3 nodes)

But it seems a bit complicated to set up SSL certificate keys for end-to-end api-server SSL communication.
Are there any references on how to set up SSL certificates for an internal L7 load balancer, so that api-server HTTPS requests to the k8s masters work through the L7 load balancer?

@neolit123
Member Author

thanks for the comments and discussion on this,
please have a look at this pending PR, so that we can consolidate the kubeadm HA LB information that users should know about.

kubernetes/website#15411

WRT Azure LBs it seems that the cloud provider needs to make some adjustments to make it easier for users.

@BenTheElder
Member

why doesn't kubectl logs / the logs API have ping/pong / keepalive, or perform a reconnect?

@neolit123 neolit123 removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 13, 2019
@neolit123
Member Author

adding help-wanted label.

@neolit123 neolit123 added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 13, 2019
@ereslibre
Contributor

Completely forgot that I was going to link my PR on this issue: kubernetes/kubernetes#81179. It has already been closed because this will be implemented directly in golang (possibly golang/net#55). Whenever that happens I will check if all use cases are covered.

The golang solution is definitely the right place to handle the generic use case, at least for sending occasional HTTP/2 ping frames over the wire to the server end. In my PR I stumbled across some issues, mostly regarding how golang hides the real transport being used behind the RoundTripper interface, and bundles the http/2 logic as a private implementation inside net/http.
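
For context, a minimal Go sketch of the transport-level knob that golang/net#55 is about, assuming the ReadIdleTimeout/PingTimeout fields that eventually landed in golang.org/x/net/http2 (this is not the client-go wiring from the closed PR):

package main

import (
	"crypto/tls"
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	// Send an HTTP/2 PING frame once the connection has been idle for
	// ReadIdleTimeout, and close the connection if the ping is not answered
	// within PingTimeout. Values are illustrative.
	t := &http2.Transport{
		TLSClientConfig: &tls.Config{},
		ReadIdleTimeout: 30 * time.Second,
		PingTimeout:     15 * time.Second,
	}
	client := &http.Client{Transport: t}
	_ = client // use for long-lived watch/exec/logs style requests
}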

@neolit123 neolit123 modified the milestones: v1.17, Next Nov 13, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2020
@neolit123
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2020
@neolit123
Member Author

neolit123 commented May 30, 2020

so this issue ended up accumulating a collection of different (or related) LB problems including lack of hairpin in Azure (maybe this works nowadays), issues in haproxy, improvements in golang with respect to sending ping frames for http2 (this seems to have merged), and potential improvement of exposing the ping interval in client-go (kubernetes/kubernetes#81179 (comment) seems viable still).

also we still are not covering the following exactly in our docs:

confusion about L4 vs L7 load balancers, L4 should be sufficient.
not having SSL/TLS on the LB, which makes the api-server health checks fail
possibly related - using an older version of kubeadm that does not have the config-map retry logic

but we do have a dedicated LB doc here now:
https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md

so basically with the combination of our k/website docs:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

we are telling the users to use the guide we give them for VIP/LB, or they are on their own when setting up an LB in a CP.

i don't think there is anything substantially actionable for the kubeadm team in this ticket at this point, but if you think there is something, let's log separate concise tickets.

or maybe just send PRs for the new doc.

thanks for the discussion.
/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon
Member

@neolit123 I'm running into this exact issue (the hairpin routing one) trying to set up private clusters with Cluster API on Azure: kubernetes-sigs/cluster-api-provider-azure#974 (comment)

Has anyone in this thread found a workaround to get kubeadm init to work when using an Azure load balancer? If not, should I open a new issue to track this?

Slack thread: https://kubernetes.slack.com/archives/C2P1JHS2E/p1603230818143600

@pschulten

@CecileRobertMichon I chose "the very hacky iptables duct tape" (with a heavy heart) but had no issues the last couple of months (AWS/NLB)

@neolit123
Member Author

@pschulten can you please outline your iptables routing hack in a comment here?

@CecileRobertMichon
we have LB related documentation here:
https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md

we could add a section in there with solutions for the hairpin problem, but this feels like something someone who experienced the problem should contribute, and not the kubeadm maintainers, since none of the active kubeadm maintainers have experienced this. so happy to reopen this ticket and assign you or someone else, but one of the reasons it was closed was that nobody stepped up to write the docs...

@mbert you might have an opinion about this too.

@pschulten

Sure, it's just something someone mentioned in a related issue.
service:

[Unit]
Description=Routes the IP of the NLB in the same subnet to loopback address (because internal NLB is unable to do hairpinning)
DefaultDependencies=no
After=docker.service

[Service]
Type=oneshot
ExecStart=/opt/bin/internal-nlb-hack.sh

[Install]
WantedBy=multi-user.target

impl (nlbname injected from the outside):

#!/bin/bash
# nlb_name is injected from the outside (e.g. via templating)
nlbname=${nlb_name}
# figure out which AZ this instance is in, via the EC2 instance metadata service
az=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# resolve the NLB id from its name, then look up the NLB's private IP for this AZ
nlbid=$(docker run --rm -e AWS_REGION=eu-central-1 -e AWS_DEFAULT_REGION=eu-central-1 mesosphere/aws-cli elbv2 describe-load-balancers --names "$nlbname" --query 'LoadBalancers[*].LoadBalancerArn' --output text | sed 's#.*/\(.*\)$#\1#')
nlbip=$(docker run --rm -e AWS_REGION=eu-central-1 -e AWS_DEFAULT_REGION=eu-central-1 mesosphere/aws-cli ec2 describe-network-interfaces --filters Name=description,Values="*$nlbid" Name=availability-zone,Values=$az --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' --output text)

if [ -z "$nlbid" ] || [ -z "$nlbip" ]; then
  printf "Unable to find fronting NLB ip for AZ: %s" "$az" | systemd-cat --priority=err
  exit 1
fi

# DNAT locally originated traffic destined for the NLB frontend back to loopback,
# working around the missing hairpin support of internal NLBs
printf "iptables -t nat -A OUTPUT -p all -d %s -j DNAT --to-destination 127.0.0.1" "$nlbip" | systemd-cat --priority=warning
iptables -t nat -A OUTPUT -p all -d "$nlbip" -j DNAT --to-destination 127.0.0.1

@neolit123
Member Author

thanks!

@CecileRobertMichon
Member

CecileRobertMichon commented Oct 21, 2020

@neolit123 I'm hesitant to document "solutions" at this point since all the possibilities are hacky workarounds. Ideally kubeadm would let us optionally either a) use the local api endpoint for the API Server check or b) skip that check altogether (I prefer option a).

I'm thinking of using the iptables workaround for now to unblock CAPZ but I'm happy to contribute a proposal/implementation if the above is something the maintainers would consider. I was mostly trying to see if this was already possible, but looks like it's not?

@detiber
Member

detiber commented Oct 21, 2020

Creating a NAT rule for the external IP is indeed probably the best way to resolve the issue. I'm not sure there is a good way to automate it for all the various combinations that would need to be supported, though.

For example:

  • Is the host system using iptables, ebtables, etc?
  • How to discover what the external IP should be for Azure LB, AWS NLB, etc?

I do think external orchestration systems (such as Cluster API) could automate these bits because they would know more about the systems being orchestrated.

It probably would help to add some generic documentation around this.

All that said, I do agree with @CecileRobertMichon that it makes sense to allow kubeadm to use the local endpoints, especially since that would open up the possibility of having a workflow where having the LB configured is no longer a pre-requisite, and can be added as a post-installation step.

@neolit123
Member Author

I'm hesitant to document "solutions" at this point since all the possibilities are hacky workarounds. Ideally kubeadm would let us optionally either a) use the local api endpoint for the API Server check or b) skip that check altogether (I prefer option a).

as mentioned on slack and on kubernetes-sigs/cluster-api-provider-azure#974 (comment) i do not disagree with the idea.

@cloudcafetech

cloudcafetech commented Aug 14, 2021

@neolit123

Can you please help me troubleshoot HA Kubernetes using kubeadm with a corporate F5 load balancer?
I checked but could not find a proper solution for HA Kubernetes using kubeadm with a corporate F5 load balancer; most cases use haproxy. With HAProxy it runs smoothly.

Q1. Is any special configuration required for the F5 load balancer to set up HA Kubernetes using kubeadm?

Note: I am getting a response from the F5 load balancer (drk8s.pkonline.com) on port 6443 using the nc command (nc -v -w 1 drk8s.pkonline.com 6443).

  • Error
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0812 22:25:59.895513 2968264 request.go:1107] Request Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","creationTimestamp":null},"data":{"ClusterConfiguration":"apiServer:\n  extraArgs:\n    authorization-mode: Node,RBAC\n  timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta2\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: drk8s.pkonline.com:6443\ncontrollerManager: {}\ndns:\n  type: CoreDNS\netcd:\n  local:\n    dataDir: /var/lib/etcd\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.20.10\nnetworking:\n  dnsDomain: cluster.local\n  podSubnet: 192.168.0.0/16\n  serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n  rk8ctrlplatnprod2.wkctls.local:\n    advertiseAddress: 172.26.32.255\n    bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta2\nkind: ClusterStatus\n"}}
I0812 22:25:59.895586 2968264 round_trippers.go:425] curl -k -v -XPOST  -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: kubeadm/v1.20.0 (linux/amd64) kubernetes/af46c47" 'https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s'
I0812 22:25:59.898186 2968264 round_trippers.go:445] POST https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s  in 2 milliseconds
I0812 22:25:59.898199 2968264 round_trippers.go:451] Response Headers:
I0812 22:26:00.398430 2968264 request.go:1107] Request Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","creationTimestamp":null},"data":{"ClusterConfiguration":"apiServer:\n  extraArgs:\n    authorization-mode: Node,RBAC\n  timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta2\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: drk8s.pkonline.com:6443\ncontrollerManager: {}\ndns:\n  type: CoreDNS\netcd:\n  local:\n    dataDir: /var/lib/etcd\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.20.10\nnetworking:\n  dnsDomain: cluster.local\n  podSubnet: 192.168.0.0/16\n  serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n  rk8ctrlplatnprod2.wkctls.local:\n    advertiseAddress: 172.26.32.255\n    bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta2\nkind: ClusterStatus\n"}}
I0812 22:26:00.398524 2968264 round_trippers.go:425] curl -k -v -XPOST  -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: kubeadm/v1.20.0 (linux/amd64) kubernetes/af46c47" 'https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s'
I0812 22:26:00.401086 2968264 round_trippers.go:445] POST https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s  in 2 milliseconds
I0812 22:26:00.401112 2968264 round_trippers.go:451] Response Headers:
I0812 22:26:00.898459 2968264 request.go:1107] Request Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","creationTimestamp":null},"data":{"ClusterConfiguration":"apiServer:\n  extraArgs:\n    authorization-mode: Node,RBAC\n  timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta2\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: drk8s.pkonline.com:6443\ncontrollerManager: {}\ndns:\n  type: CoreDNS\netcd:\n  local:\n    dataDir: /var/lib/etcd\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.20.10\nnetworking:\n  dnsDomain: cluster.local\n  podSubnet: 192.168.0.0/16\n  serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n  rk8ctrlplatnprod2.wkctls.local:\n    advertiseAddress: 172.26.32.255\n    bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta2\nkind: ClusterStatus\n"}}
I0812 22:26:00.898564 2968264 round_trippers.go:425] curl -k -v -XPOST  -H "Accept: application/json, */*" -H "Content-Type: application/json" -H "User-Agent: kubeadm/v1.20.0 (linux/amd64) kubernetes/af46c47" 'https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s'
I0812 22:26:00.901332 2968264 round_trippers.go:445] POST https://drk8s.pkonline.com:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s  in 2 milliseconds
I0812 22:26:00.901347 2968264 round_trippers.go:451] Response Headers:
I0812 22:26:01.398453 2968264 request.go:1107] Request Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","creationTimestamp":null},"data":{"ClusterConfiguration":"apiServer:\n  extraArgs:\n    authorization-mode: Node,RBAC\n  timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta2\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: drk8s.pkonline.com:6443\ncontrollerManager: {}\ndns:\n  type: CoreDNS\netcd:\n  local:\n    dataDir: /var/lib/etcd\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.20.10\nnetworking:\n  dnsDomain: cluster.local\n  podSubnet: 192.168.0.0/16\n  serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n  rk8ctrlplatnprod2.wkctls.local:\n    advertiseAddress: 172.26.32.255\n    bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta2\nkind: ClusterStatus\n"}}

I0812 21:37:09.262180 2932684 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.0 (linux/amd64) kubernetes/af46c47" 'https://drk8s.pkonline.com:6443/healthz?timeout=10s'
I0812 21:37:09.264855 2932684 round_trippers.go:445] GET https://drk8s.pkonline.com:6443/healthz?timeout=10s  in 2 milliseconds
I0812 21:37:09.264872 2932684 round_trippers.go:451] Response Headers:

@neolit123
Member Author

neolit123 commented Aug 14, 2021 via email

@cloudcafetech

cloudcafetech commented Aug 14, 2021

Thanks for reply.

I tried on the same host with --control-plane-endpoint set to that host's IP:6443 and it installs successfully. So the issue should be with the F5 LB only.

We don't provide support in the kubeadm issue tracker anymore. Please check
the README.md for some useful links.

Any pointer would be appreciated; I have been struggling for the last few days :(

@don-theojosh


@cloudcafetech Did you ever find the solution?

@robinjanke

I still have some issues using kubeadm init. I executed the command iptables -t nat -A OUTPUT -p all -d "$nlbip" -j DNAT --to-destination 127.0.0.1 on my Hetzner Cloud master node beforehand. Also, when I do a ping, it seems like I get the response from localhost. Does any of you have an idea?
