
[BUG] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied #612

Closed
farzadmf opened this issue May 19, 2021 · 12 comments
@farzadmf

What did you do

  • How was the cluster created?

    • k3d cluster create --kubeconfig-update-default=false
  • What did you do afterwards?
    kubectl get nodes gives the following output (with a noticeable delay):

NAME                       STATUS     ROLES                  AGE   VERSION
k3d-k3s-default-server-0   NotReady   control-plane,master   36s   v1.20.6+k3s1

kubectl get pods gives the following output (with an even longer delay):

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)

and this:

Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)

docker logs k3d-k3s-default-server-0 2>&1 | tail -10 gives the following output:

I0519 23:21:18.750200       7 cache.go:39] Caches are synced for AvailableConditionController controller
I0519 23:21:18.750565       7 shared_informer.go:247] Caches are synced for cluster_authentication_trust_controller 
I0519 23:21:18.750789       7 shared_informer.go:247] Caches are synced for crd-autoregister 
I0519 23:21:18.751149       7 apf_controller.go:266] Running API Priority and Fairness config worker
I0519 23:21:19.511461       7 node.go:172] Successfully retrieved node IP: 172.19.0.2
I0519 23:21:19.511490       7 server_others.go:143] kube-proxy node IP is an IPv4 address (172.19.0.2), assume IPv4 operation
I0519 23:21:19.512050       7 server_others.go:186] Using iptables Proxier.
I0519 23:21:19.512221       7 server.go:650] Version: v1.20.6+k3s1
I0519 23:21:19.512534       7 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 393216
F0519 23:21:19.512556       7 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
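The fatal line follows directly from the preceding one: kube-proxy computed a conntrack target of 393216 and then tried to write it to /proc, which is read-only inside the unprivileged container. A minimal sketch of where that number comes from, assuming kube-proxy's documented defaults (--conntrack-max-per-core=32768, --conntrack-min=131072); this is an illustration, not kube-proxy's actual source:

```shell
# Sketch: kube-proxy's conntrack target is max(cores * per-core, min),
# using the assumed defaults per-core=32768 and min=131072.
conntrack_target() {
  cores=$1
  per_core=32768
  min=131072
  want=$((cores * per_core))
  if [ "$want" -gt "$min" ]; then echo "$want"; else echo "$min"; fi
}

conntrack_target 12   # the 12-CPU host from this report -> 393216
```

On this 12-CPU machine that exceeds the host's current value, so kube-proxy attempts the write and dies with the permission error above.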

OS operations (e.g. shutdown/reboot)? Not directly, but I've been running into this situation for a while.

What did you expect to happen

I expected the cluster to be created successfully.

Which OS & Architecture

  • Manjaro Linux / amd64; output of uname -r: 5.11.19-1-MANJARO

Which version of k3d

  • output of k3d version:
k3d version v4.4.4
k3s version v1.20.6-k3s1 (default)

Which version of docker

  • output of docker version and docker info
Client:
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.16.3
 Git commit:        370c28948e
 Built:             Mon Apr 12 14:10:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.3
  Git commit:       8728dd246c
  Built:            Mon Apr 12 14:10:25 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.5.0
  GitCommit:        8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
 runc:
  Version:          1.0.0-rc94
  GitCommit:        2c7861bc5e1b3e756392236553ec14a78a09f8bf
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)

Server:
 Containers: 2
  Running: 1
  Paused: 0
  Stopped: 1
 Images: 121
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
 runc version: 2c7861bc5e1b3e756392236553ec14a78a09f8bf
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.11.19-1-MANJARO
 Operating System: Manjaro Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.06GiB
 Name: wrklenovo
 ID: DLVR:63PR:OMGG:5XIZ:VX4B:GIPR:6NV2:LYLN:WYJN:NWHM:6L44:WRSA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No kernel memory limit support
WARNING: No oom kill disable support
@farzadmf farzadmf added the bug Something isn't working label May 19, 2021

admun commented May 20, 2021

I am seeing the same issue.

running: k3d cluster create k3s-local -v /dev/mapper:/dev/mapper -a 3 -p 8081:80@loadbalancer

k3d --version
k3d version v4.4.3
k3s version v1.20.6-k3s1 (default)

docker --version
Docker version 20.10.6, build 370c289

docker version
Client: Docker Engine - Community
Version: 20.10.6
API version: 1.41
Go version: go1.13.15
Git commit: 370c289
Built: Fri Apr 9 22:47:35 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.6
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 8728dd2
Built: Fri Apr 9 22:45:20 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.4
GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc:
Version: 1.0.0-rc93
GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
docker-init:
Version: 0.19.0
GitCommit: de40ad0

cat /etc/os-release
NAME=Fedora
VERSION="34 (Workstation Edition)"
ID=fedora
VERSION_ID=34
VERSION_CODENAME=""
PLATFORM_ID="platform:f34"
PRETTY_NAME="Fedora 34 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:34"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/34/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=34
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=34
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation

@iwilltry42 iwilltry42 self-assigned this May 20, 2021
@iwilltry42 iwilltry42 added this to the Backlog milestone May 20, 2021
iwilltry42 (Member) commented May 20, 2021

Hi @farzadmf , thanks for opening this issue!
However, it seems like this is a duplicate of #607, for which I've just created an FAQ entry: https://k3d.io/faq/faq/#nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied

Your kernel version didn't match, so I quickly looked up the 5.11.19 release and verified that the "breaking commit" also landed there (search the changelog for nf_conntrack). I updated the FAQ and the original issue accordingly.
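To check whether your own kernel is at or above the release where the breaking commit landed, you can compare versions with sort -V; this sketch takes 5.11.19 as the first affected release in this series per the comment above (the FAQ lists the full set of affected kernels):

```shell
# Return success if the running kernel version is >= the given version.
kernel_at_least() {
  have=$1; want=$2
  # sort -V orders versions numerically; if the smaller of the two is
  # "$want", then "$have" is at least "$want".
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n1)" = "$want" ]
}

ver=$(uname -r | cut -d- -f1)   # e.g. "5.11.19" from "5.11.19-1-MANJARO"
if kernel_at_least "$ver" 5.11.19; then
  echo "kernel $ver may be affected"
fi
```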

Feel free to reopen, if you think this is something different or you need more support 👍

farzadmf (Author) commented

Thank you so much @iwilltry42 for the FAQ link; it's working now 👍

Also, looking at the FAQ, I see that k3s-io/k3s#3337 has been merged, so I thought it should be fixed in k3d as well given that I'm using the v1.20.6-k3s1 tag of rancher/k3s. Isn't that the case?


jtyr commented May 20, 2021

This seems to work:

k3d cluster create \
  --k3s-server-arg '--kube-proxy-arg=conntrack-max-per-core=0' \
  --k3s-agent-arg '--kube-proxy-arg=conntrack-max-per-core=0' \
  --image rancher/k3s:v1.20.7-k3s1
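The reason conntrack-max-per-core=0 helps: with the value at 0, kube-proxy leaves nf_conntrack_max untouched, so it never attempts the /proc write that fails in the unprivileged container. An illustration of that guard (not kube-proxy's actual code), using the assumed default per-core value of 32768:

```shell
# Sketch of kube-proxy's behavior around the conntrack sysctl:
# per-core = 0 means "don't touch the sysctl at all".
maybe_set_conntrack() {
  per_core=$1; cores=$2
  if [ "$per_core" -eq 0 ]; then
    echo "skip: leaving net.netfilter.nf_conntrack_max as-is"
    return 0
  fi
  echo "would set net.netfilter.nf_conntrack_max to $((per_core * cores))"
}

maybe_set_conntrack 0 12      # workaround: no write is attempted
maybe_set_conntrack 32768 12  # default on a 12-CPU host: tries to write 393216
```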

iwilltry42 (Member) commented

> Thank you so much @iwilltry42 for the FAQ link; it's working now 👍
>
> Also, looking at the FAQ, I see that k3s-io/k3s#3337 has been merged, so I thought it should be fixed in k3d as well given that I'm using the v1.20.6-k3s1 tag of rancher/k3s. Isn't that the case?

@farzadmf , actually the team just had to release a new version after they merged that PR: https://github.com/k3s-io/k3s/releases/tag/v1.20.7%2Bk3s1 (btw: it was backported to some earlier versions as well). You can choose the image that k3d uses via --image rancher/k3s:v1.20.7-k3s1, and the default image will change with the next k3d release to the k3s image that's considered stable by that time.

> k3d cluster create \
>   --k3s-server-arg '--kube-proxy-arg=conntrack-max-per-core=0' \
>   --k3s-agent-arg '--kube-proxy-arg=conntrack-max-per-core=0' \
>   --image rancher/k3s:v1.20.7-k3s1

@jtyr , with --image rancher/k3s:v1.20.7-k3s1 you should not need the kube-proxy args anymore. Could you give it a try without those?


jtyr commented May 21, 2021

I can confirm that I can create a cluster with only the additional parameter --image rancher/k3s:v1.20.7-k3s1 (neither --k3s-server-arg nor --k3s-agent-arg is required anymore).


admun commented May 21, 2021

I just created a cluster with the new 1.20.7 k3s, but I'm seeing an error in the cluster:

k3d cluster create k3s-local -v /dev/mapper:/dev/mapper -a 3 -p 8081:80@loadbalancer --image rancher/k3s:v1.20.7-k3s1

Not sure if this is an upstream issue or my local setup (Fedora 34, Docker):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "docker.io/rancher/pause:3.1": failed to pull image "docker.io/rancher/pause:3.1": failed to pull and unpack image "docker.io/rancher/pause:3.1": failed to resolve reference "docker.io/rancher/pause:3.1": failed to do request: Head "https://registry-1.docker.io/v2/rancher/pause/manifests/3.1": dial tcp: lookup registry-1.docker.io: no such host

iwilltry42 (Member) commented

@admun this seems to be an issue with the network in your setup.
On Fedora, this might be related to firewalld blocking traffic on the docker bridge network.
There's something about this in other issues as well.


admun commented May 21, 2021

I actually have firewalld turned off... will debug further.


ppicom commented Aug 7, 2021

This also affects the installation of Rancher on a single node using Docker.

iwilltry42 (Member) commented

@ppicom , can you elaborate? Which part of this issue is causing your problem, and is it solved by one of the suggestions above?

PCatinean commented

@ppicom @iwilltry42 I had the same issue on a single node docker installation and this was the workaround -> rancher/rancher#33300 (comment)
