HA: improvements to the load balancer documentation #1685
I've seen several people discussing this on Slack over the last few days. The problem seems to primarily be a lack of hairpin routing on Azure. Here is the relevant Azure documentation about the limitation of their LB, and a workaround:

"Unlike public Load Balancers which provide outbound connections when transitioning from private IP addresses inside the virtual network to public IP addresses, internal Load Balancers do not translate outbound originated connections to the frontend of an internal Load Balancer as both are in private IP address space. This avoids potential for SNAT port exhaustion inside unique internal IP address space where translation is not required. The side effect is that if an outbound flow from a VM in the backend pool attempts a flow to frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail. If the flow did not map back to the same VM in the backend pool which created the flow to the frontend, the flow will succeed...There are several common workarounds for reliably achieving this scenario (originating flows from a backend pool to the backend pools respective internal Load Balancer frontend) which include either insertion of a proxy layer behind the internal Load Balancer or using DSR style rules. Customers can combine an internal Load Balancer with any 3rd party proxy or substitute internal Application Gateway for proxy scenarios limited to HTTP/HTTPS. While you could use a public Load Balancer to mitigate, the resulting scenario is prone to SNAT exhaustion and should be avoided unless carefully managed." (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview)

So the current HA advice of using the LB as the control plane endpoint is broken for Azure. It's not the most elegant solution, but I wonder if a piece of networking duct tape (i.e. an iptables rule) could help people avoid setting up their own load balancer on Azure. I'll have to look a bit more at Microsoft's workaround suggestion. |
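For illustration, a minimal sketch of the kind of iptables "duct tape" suggested above, assuming the rule is installed on each control plane node so that traffic the node sends to the LB frontend is short-circuited to its own local apiserver (the IP and port below are placeholders):

```bash
# Hypothetical values: replace with the internal LB frontend IP and apiserver port.
LB_IP=10.0.0.100
API_PORT=6443

# Redirect locally originated traffic destined for the LB frontend back to the
# local apiserver, so this node never depends on hairpin support in the LB.
iptables -t nat -A OUTPUT -p tcp -d "$LB_IP" --dport "$API_PORT" \
  -j DNAT --to-destination "127.0.0.1:$API_PORT"
```

Traffic from other nodes and from outside the VNet still goes through the LB as usual; only the node's own connections to the frontend are rewritten.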
It looks like this isn't just a problem with Azure. From the AWS documentation: "Internal load balancers do not support hairpinning or loopback. When you register targets by instance ID, the source IP addresses of clients are preserved. If an instance is a client of an internal load balancer that it's registered with by instance ID, the connection succeeds only if the request is routed to a different instance. Otherwise, the source and destination IP addresses are the same and the connection times out." (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html) /assign |
we have both azure and aws experts in k8s land, so we might be able to get more feedback on this. |
To clarify, AWS EC2 NLBs do support hairpin connections provided that the targets are registered by IP address, as opposed to by instance ID. The documentation quoted above doesn’t say that explicitly, but I can confirm that it works. |
For AWS:
@justaugustus should be able to provide some more detail on Azure based on his work with the Azure provider for Cluster API |
It is important to note that autoscaling groups register by instance id. We transitioned from NLBs to ELBs because of this. |
They do if you tell them to. We use ASGs, but we run an initialization procedure that registers the instance by its IP address with a discoverable target group, and unregisters it as the machine shuts down (sometimes inconvenient when rebooting). |
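As a rough sketch of the initialization procedure described above (the region, target group name, and query expressions are assumptions, not the poster's actual setup), a node can look up its own private IP from instance metadata and register itself into an IP-type target group on boot, deregistering again on shutdown:

```bash
#!/bin/bash
# Assumed values: adjust the region and target group name to your environment.
REGION=eu-central-1
TG_ARN=$(aws elbv2 describe-target-groups --region "$REGION" \
  --names my-apiserver-targets \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

# This node's private IP, taken from EC2 instance metadata.
MY_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Register by IP address (requires the target group to use target-type "ip"),
# which is what lets hairpin connections through the NLB succeed.
aws elbv2 register-targets --region "$REGION" \
  --target-group-arn "$TG_ARN" --targets "Id=$MY_IP"

# On shutdown (e.g. from a systemd unit's ExecStop), run the reverse:
#   aws elbv2 deregister-targets --region "$REGION" \
#     --target-group-arn "$TG_ARN" --targets "Id=$MY_IP"
```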
Hi! I'm the poster from Stack Overflow. We abandoned the iptables hack and went for HAProxy. We are still working on it, though. It complicates everything, and this is our first time having to configure such a piece of software (how do we add and remove nodes? autoscaling? VRRP?). For us it feels like we are in "Kubernetes the hard way" land, even if kubeadm tries its best to help :) We are also wondering whether going for Azure Application Gateway (L7) would work. Thanks for paying attention to this, guys! |
An L4 load balancer should be sufficient in most cases, but since there is no heartbeat or activity sent at all during long-lived idle connections, the load balancer ends up closing them.
To reproduce, the only thing needed is to create an HA cluster fronted by such a load balancer and keep an idle connection open. I don't have a good solution if there's no activity at all. At SUSE we fixed this in the past by using HAProxy as an L7 load balancer, so we could set specific API endpoint timeouts. I think this is worth some investigation to understand what could be done and/or documented regarding the load balancer. |
who is closing the connection? you can always set keepalives to keep the connection up |
HAProxy is closing it.
There's no need, you can reach any apiserver safely, as far as I can tell. |
have you considered the "option clitcpka" and "option srvtcpka" on haproxy? |
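For reference, a sketch of where those options would sit in a TCP-mode haproxy.cfg fronting the apiservers (addresses, names, and the overall layout here are placeholders, not a recommended configuration):

```
frontend kube-apiserver
    bind *:6443
    mode tcp
    option clitcpka              # TCP keepalives towards clients
    default_backend kube-apiservers

backend kube-apiservers
    mode tcp
    option srvtcpka              # TCP keepalives towards the apiservers
    balance roundrobin
    server cp1 10.0.0.11:6443 check
    server cp2 10.0.0.12:6443 check
    server cp3 10.0.0.13:6443 check
```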
Really worth taking a look (and maybe add them as defaults in the haproxy created by kind?). If this works as expected we could document it as well. I will have a look and report back, thanks @aojea! |
No luck, |
nice explanation about haproxy here https://stackoverflow.com/a/32635324/7794348 |
For this Azure Application Gateway (L7) topic:

1) HAProxy approach
Azure doesn't seem to support VRRP floating IPs, so we had to create redundant HAProxy nodes behind an L4 load balancer to get a virtual IP. This works, but it complicates our infrastructure.

2) Azure Application Gateway (L7) approach?
If we could use Azure Application Gateway (an internal L7 load balancer) as the control-plane endpoint, it would make things simpler. But it seems a bit complicated to set up the SSL certificate keys for end-to-end apiserver SSL communication. |
thanks for the comments and discussion on this, WRT Azure LBs it seems that the cloud provider needs to make some adjustments to make it easier for users. |
why doesn't kubectl logs / the logs API have ping/pong / keepalive, or perform a reconnect? |
adding help-wanted label. |
Completely forgot that I was going to link my PR on this issue: kubernetes/kubernetes#81179. It has been closed already because this will be implemented directly in golang (possibly golang/net#55). Whenever that happens I will check whether all use cases are covered. The golang solution is definitely the right place to handle the generic use case, at least for sending periodic HTTP/2 ping frames over the wire to the server end. On my PR I stumbled across some issues, mostly regarding how golang hides the real transport being used behind the RoundTripper interface, and how it bundles the http/2 logic as a private implementation inside net/http. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/lifecycle frozen |
so this issue ended up accumulating a collection of different (or related) LB problems, including: lack of hairpin in Azure (maybe this works nowadays), issues in haproxy, improvements in golang with respect to sending ping frames for http2 (this seems to have merged), and a potential improvement of exposing the ping interval in client-go (kubernetes/kubernetes#81179 (comment) seems viable still). also we still are not covering the following exactly in our docs:
but we do have a dedicated LB doc now, so basically with the combination of our k/website docs we are telling users to use the guide we give them for the VIP/LB, or they are on their own when setting up an LB for the control plane. i don't think there is anything substantially actionable for the kubeadm team in this ticket at this point. but if you think there is something, let's log separate concise tickets, or maybe just send PRs for the new doc. thanks for the discussion. |
@neolit123: Closing this issue. |
@neolit123 I'm running into this exact issue (the hairpin routing one) trying to set up private clusters with Cluster API on Azure: kubernetes-sigs/cluster-api-provider-azure#974 (comment) Has anyone in this thread found a workaround to get kubeadm init to work when using an Azure load balancer? If not, should I open a new issue to track this? Slack thread: https://kubernetes.slack.com/archives/C2P1JHS2E/p1603230818143600 |
@CecileRobertMichon I chose "the very hacky iptables duct tape" (with a heavy heart) but have had no issues over the last couple of months (AWS/NLB). |
@pschulten can you please outline your iptables routing hack in a comment here? @CecileRobertMichon we could add a section in there with solutions for the hairpin problem, but this feels like something that someone who experienced the problem should contribute, and not the kubeadm maintainers, since none of the active kubeadm maintainers have experienced it. so happy to reopen this ticket and assign you or someone else, but one of the reasons it was closed was that nobody stepped up to write the docs... @mbert you might have an opinion about this too. |
sure, it's just something someone mentioned in a related issue.
impl (nlbname injected from the outside):

```bash
#!/bin/bash
nlbname=${nlb_name}
# Availability zone of this instance, from EC2 metadata.
az=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# Resolve the NLB's ID from its name, then find the private IP of the NLB's
# network interface in this AZ.
nlbid=$(docker run --rm -e AWS_REGION=eu-central-1 -e AWS_DEFAULT_REGION=eu-central-1 mesosphere/aws-cli elbv2 describe-load-balancers --names "$nlbname" --query 'LoadBalancers[*].LoadBalancerArn' --output text | sed 's#.*/\(.*\)$#\1#')
nlbip=$(docker run --rm -e AWS_REGION=eu-central-1 -e AWS_DEFAULT_REGION=eu-central-1 mesosphere/aws-cli ec2 describe-network-interfaces --filters Name=description,Values="*$nlbid" Name=availability-zone,Values=$az --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' --output text)
# Bail out if we could not resolve the NLB's private IP in this AZ.
if [ -z "$nlbip" ]; then
  printf "Unable to find fronting NLB ip for AZ: %s" "$az" | systemd-cat --priority=err
  exit 1
fi
# Redirect locally originated traffic for the NLB's private IP back to this
# node, sidestepping the missing hairpin support.
printf "iptables -t nat -A OUTPUT -p all -d %s -j DNAT --to-destination 127.0.0.1" "$nlbip" | systemd-cat --priority=warning
iptables -t nat -A OUTPUT -p all -d "$nlbip" -j DNAT --to-destination 127.0.0.1
```
thanks! |
@neolit123 I'm hesitant to document "solutions" at this point since all the possibilities are hacky workarounds. Ideally kubeadm would let us optionally either a) use the local api endpoint for the API Server check or b) skip that check altogether (I prefer option a). I'm thinking of using the iptables workaround for now to unblock CAPZ but I'm happy to contribute a proposal/implementation if the above is something the maintainers would consider. I was mostly trying to see if this was already possible, but looks like it's not? |
Creating a NAT rule for the external IP is indeed probably the best way to resolve the issue. I'm not sure there is a good way to automate it for all the various combinations that would need to be supported, though. For example:
I do think external orchestration systems (such as Cluster API) could automate these bits because they would know more about the systems being orchestrated. It probably would help to add some generic documentation to the docs around this. All that said, I do agree with @CecileRobertMichon that it makes sense to allow kubeadm to use the local endpoints, especially since that would open up the possibility of having a workflow where having the LB configured is no longer a pre-requisite, and can be added as a post-installation step. |
as mentioned on slack and on kubernetes-sigs/cluster-api-provider-azure#974 (comment) i do not disagree with the idea. |
Can you please help me troubleshoot HA Kubernetes set up with kubeadm behind a corporate F5 load balancer? Q1. Is any special configuration required on the F5 load balancer to set up HA Kubernetes using kubeadm? Note: I am getting a response from the F5 load balancer (drk8s.pkonline.com) on port 6443 using the nc command. |
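In case it helps narrow this down, a couple of quick checks (hostname and port taken from the comment above) that distinguish plain TCP reachability from an actual TLS handshake with a kube-apiserver behind the LB:

```bash
# Plain TCP check: does the F5 frontend accept connections on 6443?
nc -vz drk8s.pkonline.com 6443

# TLS check: does a handshake complete and does a kube-apiserver answer?
# Even an HTTP 401/403 body here shows the apiserver is reachable through the LB.
curl -kv https://drk8s.pkonline.com:6443/version
```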
Hi, it's hard to say what the problem might be right away.
We don't provide support in the kubeadm issue tracker anymore. Please check the README.md for some useful links.
Thanks for the reply. I tried on the same host with --control-plane-endpoint=<same host IP>:6443 and it installs successfully, so the issue should be with the F5 LB only.
Please help with any pointers; I've been struggling for the last few days :( |
@cloudcafetech Did you ever find the solution? |
I still have some issues using |
i'm starting to see more reports of people having issues with setting up LB in a cloud provider, see:
kubernetes/website#14258
https://stackoverflow.com/questions/56768956/how-to-use-kubeadm-init-configuration-parameter-controlplaneendpoint/57121454#57121454
i also saw a report on the VMware internal slack the other day.
the common problems are:
ideally we should document some more LB aspects/best practices of this in our HA doc, even if LBs are out of scope for kubeadm:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
some of the comments here need discussion:
https://stackoverflow.com/a/57121454
for 1., the primary CP node should be doing exactly that, as there are no other nodes.
2, on the other hand, is odd: when a new CP node is joining, it should start serving on the LB only after it has finished the control plane join process.
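For context on what our HA doc asks users to do, the control plane endpoint is pointed at the LB (or VIP) either on the command line or via the kubeadm config; a minimal sketch, where the DNS name, port, and API version are placeholders that depend on the environment and kubeadm release:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# The load balancer (or VIP) fronting all kube-apiservers; control plane nodes
# and kubelets reach the apiservers through this address.
controlPlaneEndpoint: "lb.example.com:6443"
```

Equivalently on the command line: kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs.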
cc @ereslibre @aojea @fabriziopandini @RockyFu267
@kubernetes/sig-cluster-lifecycle
@rcythr related to your PR:
kubernetes/website#15372