
Requests from within the cluster timeout #2208

Closed
asankov-cb opened this issue Apr 20, 2021 · 5 comments
Labels: kind/external (upstream bugs), kind/support (Categorizes issue or PR as a support question.)

Comments

@asankov-cb

What happened:
I am running a kind cluster. The cluster has 2 services which periodically communicate with a backend over both HTTP and gRPC. However, all of the outbound requests fail with a timeout.

What you expected to happen:
Requests are successful.

How to reproduce it (as minimally and precisely as possible):
To do this you would need access to an enterprise product, which is not yet GA. I am opening this issue more as a question, looking for advice about what I need to look for and whether someone else has had a similar problem.

While googling I came across #717. I have had issues with resource limits before, so they are already set pretty high. Anyhow, I bumped them up once again (CPUs: 10, Memory: 12 GB, Swap: 1.5 GB, Disk image size: 80 GB / 61 GB used), and I can still reproduce the issue.

I am opening this issue with kind because we are not able to reproduce the problem in any other environment (EKS, AKS, etc.).
Anything else we need to know?:
Here I have some log messages. The first one is from a failing HTTP request to the backend:

time="2021-04-20T10:14:44Z" level=warning msg="failed to update cluster status: Get \"<URL_OBFUSCATED>": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

and the other one is from a failing gRPC request:

time="2021-04-20T10:14:56Z" level=error msg="failed to send resource updated message: rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: EOF\""

Environment:

  • kind version: (use kind version):
    • kind version 0.11.0-alpha
    • but also with kind version 0.10.0
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-14T05:14:17Z", GoVersion:"go1.15.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.6.0)

Server:
 Containers: 92
  Running: 57
  Paused: 0
  Stopped: 35
 Images: 144
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.25-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 10
 Total Memory: 11.7GiB
 Name: docker-desktop
 ID: NW2M:Y7PR:B26O:BX7O:2PR4:7RM2:BISS:QURF:B3MB:LJIK:7AC5:A3MN
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
asankov-cb added the kind/bug label on Apr 20, 2021
BenTheElder added the kind/support label and removed the kind/bug label on Apr 20, 2021
@BenTheElder
Member

To do this you would need access to an enterprise product, which is still not-GA. I am opening this issue more as a question and looking for advice about what I need to look for, and whether someone else has had a similar problem.

There's really not a lot we can do here to debug your external service timing out. There are many factors ...

kind version 0.11.0-alpha

I don't recommend using pre-release versions of kind unless you're developing Kubernetes itself.

If you do, however, it's helpful to use a version built with the Makefile, so we get the git information, which is not possible with go get etc.
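
For illustration, something along these lines (a sketch; the exact Makefile targets may differ between kind versions):

# Build from a git checkout so the binary carries the commit metadata:
git clone https://github.com/kubernetes-sigs/kind
cd kind
make build          # should produce ./bin/kind
./bin/kind version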

@asankov-cb
Author

There's really not a lot we can do here to debug your external service timing out. There are many factors ...

I understand that. As I said, I was just wondering whether this kind of error rings a bell for someone. If not, I guess we can close this, and I will go back to debugging the networking of my local installation.

I don't recommend using pre-release versions of kind unless you're developing Kubernetes itself.

I see, but I also tried with kind 0.10.0 and got the same errors.

@aojea
Contributor

aojea commented Apr 21, 2021

It is a connectivity problem, so you have to find where the connectivity is broken:
  • Are you able to connect to the external service from the VM where kind is running?
  • If yes, are you able to reach the service from one of the kind nodes (containers)?
  • If yes, run a traceroute from the pod and check whether the external service receives the request but the reply never gets back to the pod, ...
Something along the lines of the commands sketched below.
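
A rough sketch of those checks, assuming Docker Desktop (so the "VM" is the Docker Desktop VM), the default node container name, and a placeholder backend hostname:

# 1. From the VM where kind runs (with Docker Desktop, a host-network container is the closest equivalent):
docker run --rm --network host curlimages/curl -v --max-time 10 https://backend.example.com/

# 2. From a kind node container ("kind-control-plane" is the default name; curl may need installing in the node image):
docker exec -it kind-control-plane curl -v --max-time 10 https://backend.example.com/

# 3. From a pod, trace the route and see whether replies make it back:
kubectl run net-debug --rm -it --restart=Never --image=busybox -- \
  traceroute backend.example.com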

@asankov-cb
Author

This was due to a bug in Docker Desktop. They introduced some kind of proxy which did not use the local cert store and caused a lot of issues, this being just one of them. It was fixed relatively quickly. I am currently on Docker Desktop 3.3.3 and the problem no longer reproduces.
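
In hindsight, the docker info output above already hinted at it (HTTP Proxy / HTTPS Proxy: http.docker.internal:3128). For anyone checking for something similar, the proxy Docker Desktop injects can be read directly (a quick sketch; if a handshake that works on the host fails from behind this path, the proxy is a likely suspect):

# Show any proxy configured in the Docker engine's environment:
docker info --format 'HTTP proxy: {{.HTTPProxy}} / HTTPS proxy: {{.HTTPSProxy}}'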

@BenTheElder
Member

Phew! Thanks for following up!
