[BUG] Cluster doesn't shutdown gracefully #1117

Closed
arikmaor opened this issue Aug 2, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@arikmaor
Contributor

arikmaor commented Aug 2, 2022

When stopping a k3d cluster, the pods don't exit gracefully.
This can cause stateful services such as mongodb and other databases to fail to restart.

To simplify, I'm describing the behavior with a simple script.

What did you do

  • How was the cluster created?
    k3d cluster create test

  • What did you do afterwards?

    • kubectl apply -f test-pod.yaml
    • kubectl logs -f sleep-test
    • on a different terminal: k3d cluster stop test
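
The steps above can be scripted roughly as follows. This is a sketch, not part of the original report: the function name `reproduce_ungraceful_stop` and the `FOLLOW_SECONDS` variable are hypothetical, it assumes `k3d` and `kubectl` are on the PATH, and it assumes `test-pod.yaml` (listed below) is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper that runs the reproduction steps in a single terminal.
reproduce_ungraceful_stop() {
  cluster=${1:-test}

  k3d cluster create "$cluster" || return 1
  kubectl apply -f test-pod.yaml || return 1

  # Wait until the pod is ready so there are logs to follow.
  kubectl wait --for=condition=Ready pod/sleep-test --timeout=120s || return 1

  # Follow the logs in the background instead of in a second terminal.
  kubectl logs -f sleep-test &
  logs_pid=$!

  # Let a few "sleeping..." lines appear, then stop the cluster.
  sleep "${FOLLOW_SECONDS:-15}"
  k3d cluster stop "$cluster"

  # Returns the exit status of the log follower.
  wait "$logs_pid"
}
```

Calling `reproduce_ungraceful_stop test` should show whether the pod prints its cleanup messages before the log stream ends.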

These are the relevant files:

Dockerfile

FROM alpine
COPY main.sh main.sh
CMD sh main.sh

main.sh

WAIT_SECONDS=${WAIT_SECONDS:=5}

function cleanup() {
  echo sleeping $WAIT_SECONDS seconds before exit...
  sleep $WAIT_SECONDS
  echo done waiting!
  exit
}

trap cleanup SIGTERM SIGHUP SIGINT

while true; echo sleeping...; do sleep 5; done

test-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: sleep-test
spec:
  containers:
    - name: main
      image: MY_COMPANIES_REGISTRY/sleep-test

What did you expect to happen

I expected the same behavior as when stopping a Docker container:

$ docker run -d --name sleep-test sleep-test
c09251a7678c1cdcd5c40211b61adb13ff521dc0edac14c7dc3a70a9a5a5e3f8
$ docker stop sleep-test
sleep-test
$ docker logs sleep-test
sleeping...
sleeping...
sleeping...
sleeping 5 seconds before exit...
done waiting!

Instead I'm getting a crash:

sleeping...
sleeping...
sleeping...
error: unexpected EOF

To be clear, this is only a toy script to demonstrate the problem. The problem is critical for databases such as mongodb, which sometimes do not recover automatically from an ungraceful shutdown.

Which OS & Architecture

name: docker
endpoint: /var/run/docker.sock
version: 20.10.17
ostype: linux
os: Docker Desktop
arch: aarch64
cgroupversion: "2"
cgroupdriver: cgroupfs
filesystem: extfs

Which version of k3d

k3d version v5.4.3
k3s version v1.23.6-k3s1 (default)

Which version of docker

Client:
 Cloud integration: v1.0.24
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:04:45 2022
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.10.1 (82475)
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:01 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.8.2)
  compose: Docker Compose (Docker Inc., v2.6.1)
  extension: Manages Docker extensions (Docker Inc., v0.2.7)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 67
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.104-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 5.8GiB
 Name: docker-desktop
 ID: NIIU:S4SM:XLJM:LFAA:C5YD:LA2N:GHHW:7ZSG:BVA7:N7YW:RWJL:F6PK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

arikmaor added the bug (Something isn't working) label Aug 2, 2022
@arikmaor
Contributor Author

arikmaor commented Aug 3, 2022

I get the same problem when using docker stop instead of k3d cluster stop to stop the container.
So perhaps the problem is in the k3s image, which doesn't send SIGTERM to the pods before exiting?
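
One way to probe this hypothesis is to rule out the stop timeout first. `docker stop` sends SIGTERM and follows up with SIGKILL after a grace period (10 seconds by default), which `--time` extends. A minimal sketch, assuming the node container has k3d's default name `k3d-test-server-0` (verify with `docker ps`); the helper name `stop_node_with_timeout` is hypothetical:

```shell
# Hypothetical helper: stop a k3d node container with a longer grace period.
# $1 = container name (assumed here: k3d-test-server-0)
# $2 = seconds to wait after SIGTERM before Docker sends SIGKILL
stop_node_with_timeout() {
  docker stop --time "$2" "$1"
}
```

For example, `stop_node_with_timeout k3d-test-server-0 60`. If the pod still ends without printing its cleanup messages, the grace period is probably not the cause, which would be consistent with SIGTERM never reaching the pods.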

@arikmaor
Contributor Author

arikmaor commented Aug 4, 2022

I managed to get things working and submitted a PR for you.

The proper solution IMO is to drain the node before letting the container close by adding this to k3d-entrypoint.sh:

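# Run k3s in the background so this script keeps running and can handle signals.
/bin/k3s "$@" &
k3s_pid=$!

# A previous drain leaves the node cordoned, so uncordon it once the API responds.
until kubectl uncordon "$HOSTNAME"; do sleep 3; done

function cleanup() {
  echo Draining node...
  kubectl drain "$HOSTNAME" --force --delete-emptydir-data
  echo Sending SIGTERM to k3s...
  kill -15 "$k3s_pid"
  echo Waiting for k3s to close...
  wait "$k3s_pid"
}

trap cleanup SIGTERM SIGINT SIGQUIT SIGHUP

# A trapped signal interrupts this wait; cleanup runs, then execution continues below.
wait "$k3s_pid"
echo Bye!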

This implements what Kubernetes recommends before shutting down a node for maintenance, a situation similar to shutting down the host machine or stopping the container.

The way I handle the signals is explained in the Docker documentation.
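
The signal handling itself can be tried without a cluster. The following is a minimal sketch of the same pattern (background process, trap, wait), not the code from the PR; the names `run_with_graceful_stop` and `forward_term` are hypothetical, and any long-running command stands in for `/bin/k3s`.

```shell
#!/bin/sh
# Sketch of a signal-forwarding wrapper (all names are hypothetical).

got_term=0

forward_term() {
  got_term=1
  echo "wrapper: forwarding SIGTERM to $child_pid"
  kill -TERM "$child_pid" 2>/dev/null
}

run_with_graceful_stop() {
  got_term=0

  # "$@" stands in for the long-running process (/bin/k3s in the entrypoint).
  "$@" &
  child_pid=$!
  trap forward_term TERM

  # A trapped signal makes wait return early with a status above 128.
  wait "$child_pid"
  status=$?
  if [ "$got_term" -eq 1 ]; then
    # Wait again so the child can finish its own cleanup.
    wait "$child_pid"
    status=$?
  fi

  trap - TERM
  echo "wrapper: child exited with status $status"
  return "$status"
}
```

To try it, run `run_with_graceful_stop sh -c 'trap "echo child: cleaning up; exit 0" TERM; while :; do sleep 1; done'` and send SIGTERM to the wrapper from another terminal. In the entrypoint shown above, the drain step runs in the handler before the signal is forwarded.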
