
[BUG] k3d v5 client, create a new cluster: error "Failed to patch CoreDNS ConfigMap", cli rolls back and deletes cluster #779

Closed
jrhoward opened this issue Oct 9, 2021 · 7 comments
Labels: bug (Something isn't working), priority/high

@jrhoward

jrhoward commented Oct 9, 2021

What did you do

Upgraded the k3d CLI to v5.0.0 and created a new cluster. A non-fatal error occurred, but the CLI rolled back and deleted the cluster.

Short description:

ERRO[0018] Failed Cluster Start: failed to inject host IP: failed to inject host record "172.24.0.1 host.k3d.internal" into CoreDNS ConfigMap: Failed to patch CoreDNS ConfigMap to include entry '172.24.0.1 host.k3d.internal' (see debug logs)
ERRO[0018] Failed to create cluster >>> Rolling Back
INFO[0018] Deleting cluster 'k3s-default'

This may be related to #492.

I switched back to the latest v4 CLI and cluster creation worked.

  • How was the cluster created?
    • k3d cluster create
  • What did you do afterwards?
    • Nothing. I temporarily disabled SELinux; cluster creation still failed.

What did you expect to happen

A new k3d cluster running, with a warning about the CoreDNS error. Instead, creation failed, rolled back, and deleted the cluster.

Screenshots or terminal output

k3d cluster create --verbose
DEBU[0000] Runtime Info:
&{Name:docker Endpoint:/var/run/docker.sock Version:20.10.8 OSType:linux OS:...Arch:x86_64 CgroupVersion:2 CgroupDriver:systemd Filesystem:UNKNOWN} 
DEBU[0000] Additional CLI Configuration:
cli:
  api-port: ""
  env: []
  k3s-node-labels: []
  k3sargs: []
  ports: []
  registries:
    create: ""
  runtime-labels: []
  volumes: [] 
DEBU[0000] Configuration:
agents: 0
image: docker.io/rancher/k3s:latest
network: ""
options:
  k3d:
    disableimagevolume: false
    disableloadbalancer: false
    disablerollback: false
    loadbalancer:
      configoverrides: []
    timeout: 0s
    wait: true
  kubeconfig:
    switchcurrentcontext: true
    updatedefaultkubeconfig: true
  runtime:
    agentsmemory: ""
    gpurequest: ""
    serversmemory: ""
registries:
  config: ""
  use: []
servers: 1
subnet: ""
token: "" 
DEBU[0000] ========== Simple Config ==========
{TypeMeta:{Kind:Simple APIVersion:k3d.io/v1alpha3} Name: Servers:1 Agents:0 ExposeAPI:{Host: HostIP: HostPort:} Image:docker.io/rancher/k3s:latest Network: Subnet: ClusterToken: Volumes:[] Ports:[] Options:{K3dOptions:{Wait:true Timeout:0s DisableLoadbalancer:false DisableImageVolume:false NoRollback:false NodeHookActions:[] Loadbalancer:{ConfigOverrides:[]}} K3sOptions:{ExtraArgs:[] NodeLabels:[]} KubeconfigOptions:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true} Runtime:{GPURequest: ServersMemory: AgentsMemory: Labels:[]}} Env:[] Registries:{Use:[] Create:<nil> Config:}}
========================== 
DEBU[0000] ========== Merged Simple Config ==========
{TypeMeta:{Kind:Simple APIVersion:k3d.io/v1alpha3} Name: Servers:1 Agents:0 ExposeAPI:{Host: HostIP: HostPort:40055} Image:docker.io/rancher/k3s:latest Network: Subnet: ClusterToken: Volumes:[] Ports:[] Options:{K3dOptions:{Wait:true Timeout:0s DisableLoadbalancer:false DisableImageVolume:false NoRollback:false NodeHookActions:[] Loadbalancer:{ConfigOverrides:[]}} K3sOptions:{ExtraArgs:[] NodeLabels:[]} KubeconfigOptions:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true} Runtime:{GPURequest: ServersMemory: AgentsMemory: Labels:[]}} Env:[] Registries:{Use:[] Create:<nil> Config:}}
========================== 
DEBU[0000] generated loadbalancer config:
ports:
  6443.tcp:
  - k3d-k3s-default-server-0
settings:
  workerConnections: 1024 
DEBU[0000] ===== Merged Cluster Config =====
&{TypeMeta:{Kind: APIVersion:} Cluster:{Name:k3s-default Network:{Name:k3d-k3s-default ID: External:false IPAM:{IPPrefix:zero IPPrefix IPsUsed:[] Managed:false} Members:[]} Token: Nodes:[0xc00017a600 0xc00017b500] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0004ccbc0 ServerLoadBalancer:0xc0002ee4d0 ImageVolume:} ClusterCreateOpts:{DisableImageVolume:false WaitForServer:true Timeout:0s DisableLoadBalancer:false GPURequest: ServersMemory: AgentsMemory: NodeHooks:[] GlobalLabels:map[app:k3d] GlobalEnv:[] Registries:{Create:<nil> Use:[] Config:<nil>}} KubeconfigOpts:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true}}
===== ===== ===== 
DEBU[0000] ===== Processed Cluster Config =====
&{TypeMeta:{Kind: APIVersion:} Cluster:{Name:k3s-default Network:{Name:k3d-k3s-default ID: External:false IPAM:{IPPrefix:zero IPPrefix IPsUsed:[] Managed:false} Members:[]} Token: Nodes:[0xc00017a600 0xc00017b500] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0004ccbc0 ServerLoadBalancer:0xc0002ee4d0 ImageVolume:} ClusterCreateOpts:{DisableImageVolume:false WaitForServer:true Timeout:0s DisableLoadBalancer:false GPURequest: ServersMemory: AgentsMemory: NodeHooks:[] GlobalLabels:map[app:k3d] GlobalEnv:[] Registries:{Create:<nil> Use:[] Config:<nil>}} KubeconfigOpts:{UpdateDefaultKubeconfig:true SwitchCurrentContext:true}}
===== ===== ===== 
DEBU[0000] '--kubeconfig-update-default set: enabling wait-for-server 
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-k3s-default'            
INFO[0000] Created volume 'k3d-k3s-default-images'      
INFO[0000] Starting new tools node...                   
DEBU[0000] Detected CgroupV2, enabling custom entrypoint (disable by setting K3D_FIX_CGROUPV2=false) 
DEBU[0000] Created container k3d-k3s-default-tools (ID: f8aa535ddf3de0bc7f83226382023913f86dd89008fa8c43bb77859bb937f35f) 
DEBU[0000] Node k3d-k3s-default-tools Start Time: 2021-10-09 20:07:33.260979924 +1100 AEDT m=+0.592069496 
INFO[0000] Starting Node 'k3d-k3s-default-tools'        
DEBU[0001] Truncated 2021-10-09 09:07:34.148847951 +0000 UTC to 2021-10-09 09:07:34 +0000 UTC 
INFO[0001] Creating node 'k3d-k3s-default-server-0'     
DEBU[0001] DockerHost:                                  
DEBU[0001] Created container k3d-k3s-default-server-0 (ID: cc3712aec0f1fb50c3ce9fe7539d847f2c19105c7833fdc5787f5db0e37d349c) 
DEBU[0001] Created node 'k3d-k3s-default-server-0'      
INFO[0001] Creating LoadBalancer 'k3d-k3s-default-serverlb' 
DEBU[0001] Created container k3d-k3s-default-serverlb (ID: e85d6c4d72c4e588ceb8fcb61698816d1d88fc07cd641e44a3e3403f95c0aebc) 
DEBU[0001] Created loadbalancer 'k3d-k3s-default-serverlb' 
INFO[0001] Using the k3d-tools node to gather environment information 
DEBU[0001] no netlabel present on container /k3d-k3s-default-tools 
DEBU[0001] failed to get IP for container /k3d-k3s-default-tools as we couldn't find the cluster network 
INFO[0001] HostIP: using network gateway...             
INFO[0001] Starting cluster 'k3s-default'               
INFO[0001] Starting servers...                          
DEBU[0001] ENABLING CGROUPSV2 MAGIC!!!                  
DEBU[0001] Node k3d-k3s-default-server-0 Start Time: 2021-10-09 20:07:34.501919974 +1100 AEDT m=+1.833009491 
DEBU[0001] Deleting node k3d-k3s-default-tools ...      
INFO[0002] Starting Node 'k3d-k3s-default-server-0'     
INFO[0002] Deleted k3d-k3s-default-tools                
DEBU[0002] Truncated 2021-10-09 09:07:35.549121744 +0000 UTC to 2021-10-09 09:07:35 +0000 UTC 
DEBU[0002] Waiting for node k3d-k3s-default-server-0 to get ready (Log: 'k3s is up and running') 
DEBU[0008] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-default-server-0' 
INFO[0008] Starting agents...                           
INFO[0008] Starting helpers...                          
DEBU[0008] Node k3d-k3s-default-serverlb Start Time: 2021-10-09 20:07:41.226850924 +1100 AEDT m=+8.557940421 
INFO[0008] Starting Node 'k3d-k3s-default-serverlb'     
DEBU[0009] Truncated 2021-10-09 09:07:42.412852822 +0000 UTC to 2021-10-09 09:07:42 +0000 UTC 
DEBU[0009] Waiting for node k3d-k3s-default-serverlb to get ready (Log: 'start worker processes') 
DEBU[0015] Finished waiting for log message 'start worker processes' from node 'k3d-k3s-default-serverlb' 
INFO[0015] Injecting record '172.24.0.1 host.k3d.internal'... 
DEBU[0015] Executing command '[sh -c echo '172.24.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-k3s-default-server-0' 
DEBU[0015] Executing command '[sh -c echo '172.24.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-k3s-default-serverlb' 
DEBU[0016] Exec process in node 'k3d-k3s-default-server-0' exited with '0' 
DEBU[0016] Exec process in node 'k3d-k3s-default-serverlb' exited with '0' 
DEBU[0016] Successfully added host record "172.24.0.1 host.k3d.internal" to /etc/hosts in all nodes 
DEBU[0016] Executing command '[sh -c patch=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\shost.k3d.internal$/!p' -e '$a172.24.0.1 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$patch"]' in node 'k3d-k3s-default-server-0' 
DEBU[0018] error patching the CoreDNS ConfigMap to include entry '172.24.0.1 host.k3d.internal': Exec process in node 'k3d-k3s-default-server-0' failed with exit code '137'
Logs:  
ERRO[0018] Failed Cluster Start: failed to inject host IP: failed to inject host record "172.24.0.1 host.k3d.internal" into CoreDNS ConfigMap: Failed to patch CoreDNS ConfigMap to include entry '172.24.0.1 host.k3d.internal' (see debug logs) 
ERRO[0018] Failed to create cluster >>> Rolling Back
INFO[0018] Deleting cluster 'k3s-default'               
DEBU[0019] Cluster Details: &{Name:k3s-default Network:{Name:k3d-k3s-default ID:fae434a32d2b7af7a8729c2bc83923130157a101a00e4290c5c48e9fef9ae623 External:false IPAM:{IPPrefix:172.24.0.0/16 IPsUsed:[] Managed:false} Members:[]} Token:BQXuUzrmXHtIWzEDiRrC Nodes:[0xc00017a600 0xc00017b500] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0004ccbc0 ServerLoadBalancer:0xc0002ee4d0 ImageVolume:k3d-k3s-default-images} 
DEBU[0019] Deleting node k3d-k3s-default-serverlb ...   
INFO[0019] Deleted k3d-k3s-default-serverlb             
DEBU[0019] Deleting node k3d-k3s-default-server-0 ...   
INFO[0020] Deleted k3d-k3s-default-server-0             
INFO[0020] Deleting cluster network 'k3d-k3s-default'   
INFO[0020] Deleting image volume 'k3d-k3s-default-images' 
FATA[0020] Cluster creation FAILED, all changes have been rolled back! 
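For context, the `kubectl patch` command in the debug output above builds a JSON merge patch for the `NodeHosts` key of the CoreDNS ConfigMap: it drops any stale line for the hostname, appends the new record, and wraps the result in `{"data": {"NodeHosts": ...}}`. A rough Python sketch of the payload that shell pipeline constructs (`build_coredns_patch` is a hypothetical helper for illustration, not k3d's implementation):

```python
import json

def build_coredns_patch(node_hosts: str, ip: str, hostname: str) -> str:
    """Rebuild the ConfigMap's NodeHosts value: drop any stale line for
    `hostname`, append the fresh record, and wrap the result in the
    JSON merge patch that `kubectl patch cm coredns -p` expects."""
    lines = [line for line in node_hosts.splitlines()
             if not line.endswith(f" {hostname}")]
    lines.append(f"{ip} {hostname}")
    return json.dumps({"data": {"NodeHosts": "\n".join(lines)}})

# e.g. starting from the node record k3s wrote itself:
patch = build_coredns_patch("172.24.0.2 k3d-k3s-default-server-0",
                            "172.24.0.1", "host.k3d.internal")
print(patch)
```

Note that exit code 137 in the log above typically means the exec process was killed (128 + SIGKILL) before `kubectl` could apply any such patch, so the payload construction itself is unlikely to be what failed here.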


Which OS & Architecture

  • linux/amd64?

Which version of k3d

k3d version v5.0.0
k3s version latest (default)

Which version of docker

Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:44 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:30 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
@jrhoward jrhoward added the bug Something isn't working label Oct 9, 2021
@jrhoward jrhoward changed the title [BUG] k3d v5 client, create a new cluster: error "Failed to patch CoreDNS ConfigMap", cli rolls back and deleted cluster [BUG] k3d v5 client, create a new cluster: error "Failed to patch CoreDNS ConfigMap", cli rolls back and deletes cluster Oct 9, 2021
@rodnymolina

Same here (with debug on):

admin@test-1:~$ k3d --version
k3d version v5.0.0
k3s version v1.21.5-k3s1 (default)

admin@test-1:~$ docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:38 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:50 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.8
  GitCommit:        7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc:
  Version:          1.0.0
  GitCommit:        v1.0.0-0-g84113ee
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
admin@test-1:~$
admin@test-1:~$ k3d cluster create mycluster --verbose
...
INFO[0029] Starting Node 'k3d-mycluster-serverlb'
DEBU[0030] Truncated 2021-10-09 22:52:06.790073703 +0000 UTC to 2021-10-09 22:52:06 +0000 UTC
DEBU[0030] Waiting for node k3d-mycluster-serverlb to get ready (Log: 'start worker processes')
DEBU[0036] Finished waiting for log message 'start worker processes' from node 'k3d-mycluster-serverlb'
INFO[0036] Injecting record '172.19.0.1 host.k3d.internal'...
DEBU[0036] Executing command '[sh -c echo '172.19.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-mycluster-server-0'
DEBU[0036] Executing command '[sh -c echo '172.19.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-mycluster-serverlb'
DEBU[0037] Exec process in node 'k3d-mycluster-server-0' exited with '0'
DEBU[0037] Exec process in node 'k3d-mycluster-serverlb' exited with '0'
DEBU[0037] Successfully added host record "172.19.0.1 host.k3d.internal" to /etc/hosts in all nodes
DEBU[0037] Executing command '[sh -c patch=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\shost.k3d.internal$/!p' -e '$a172.19.0.1 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$patch"]' in node 'k3d-mycluster-server-0'
DEBU[0039] Exec process in node 'k3d-mycluster-server-0' exited with '0'
DEBU[0039] Successfully added host record "172.19.0.1 host.k3d.internal" to the CoreDNS ConfigMap
DEBU[0039] Found network {Name:k3d-mycluster ID:fcc248a0c892005ddbe6902b4695444753e73eadb0dafdcd0d0d0645bd095e05 Created:2021-10-09 22:51:36.788745486 +0000 UTC Scope:local Driver:bridge EnableIPv6:false IPAM:{Driver:default Options:map[] Config:[{Subnet:172.19.0.0/16 IPRange: Gateway:172.19.0.1 AuxAddress:map[]}]} Internal:false Attachable:false Ingress:false ConfigFrom:{Network:} ConfigOnly:false Containers:map[e780cee6010f830bad156164a71da714a9745e7f49361b8bf99fc6cdcf9e2b4c:{Name:k3d-mycluster-server-0 EndpointID:d200936dac9eb3bcc4cb23e6aee40101de66831dc4ce3232275822abe03f212e MacAddress:02:42:ac:13:00:02 IPv4Address:172.19.0.2/16 IPv6Address:} f4cc16ea006f3b6880e3a9310a73ebc698e507f4174258b9753fd4faff590019:{Name:k3d-mycluster-serverlb EndpointID:2b6cd1b18df31d781d63cfc4d9b3ab35cd80a34563553bc6a6c198dfd36e6ed1 MacAddress:02:42:ac:13:00:03 IPv4Address:172.19.0.3/16 IPv6Address:}] Options:map[com.docker.network.bridge.enable_ip_masquerade:true] Labels:map[app:k3d] Peers:[] Services:map[]}
DEBU[0039] Adding 2 network members to coredns
DEBU[0039] Executing command '[sh -c patch=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\sk3d-mycluster-server-0$/!p' -e '$a172.19.0.2 k3d-mycluster-server-0' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$patch"]' in node 'k3d-mycluster-server-0'
DEBU[0042] error patching the CoreDNS ConfigMap to include entry '172.19.0.2 k3d-mycluster-server-0': Exec process in node 'k3d-mycluster-server-0' failed with exit code '1'
Logs: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get configmaps coredns)
 rror from server (ServiceUnavailable): the server is currently unable to handle the request (get configmaps coredns)
ERRO[0042] Failed Cluster Start: failed to patch CoreDNS with network members: failed to add host entry "172.19.0.2 k3d-mycluster-server-0" into CoreDNS: Failed to patch CoreDNS ConfigMap to include entry '172.19.0.2 k3d-mycluster-server-0' (see debug logs)
ERRO[0042] Failed to create cluster >>> Rolling Back
INFO[0042] Deleting cluster 'mycluster'
DEBU[0042] Cluster Details: &{Name:mycluster Network:{Name:k3d-mycluster ID:fcc248a0c892005ddbe6902b4695444753e73eadb0dafdcd0d0d0645bd095e05 External:false IPAM:{IPPrefix:172.19.0.0/16 IPsUsed:[] Managed:false} Members:[]} Token:pOOGBBKLKdyCoYeXHwwh Nodes:[0xc00039cf00 0xc00039d080] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0002b2100 ServerLoadBalancer:0xc000310fb0 ImageVolume:k3d-mycluster-images}
DEBU[0042] Deleting node k3d-mycluster-serverlb ...
INFO[0042] Deleted k3d-mycluster-serverlb
DEBU[0042] Deleting node k3d-mycluster-server-0 ...
INFO[0042] Deleted k3d-mycluster-server-0
INFO[0042] Deleting cluster network 'k3d-mycluster'
INFO[0042] Deleting image volume 'k3d-mycluster-images'
FATA[0042] Cluster creation FAILED, all changes have been rolled back!

@mbprtpmnr

mbprtpmnr commented Oct 10, 2021

Also, more or less the same problem here...

[mbprtpmnr@linux k8s]$ k3d --version 
k3d version v5.0.0
k3s version v1.21.5-k3s1 (default)
[mbprtpmnr@linux k8s]$ docker version 
Client:
 Version:           20.10.9
 API version:       1.41
 Go version:        go1.17.1
 Git commit:        c2ea9bc90b
 Built:             Mon Oct  4 19:13:02 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.9
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.1
  Git commit:       79ea9d3080
  Built:            Mon Oct  4 19:12:03 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.5.7
  GitCommit:        8686ededfc90076914c5238eb96c883ea093a8ba.m
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

[mbprtpmnr@linux k8s]$ k3d cluster create --verbose
...
DEBU[0032] Created container k3d-k3s-default-serverlb (ID: 91b668d34f6dfd071282a23fc29e0ec30f21bf323f641d3fd05f9f5c9df9b807) 
DEBU[0032] Created loadbalancer 'k3d-k3s-default-serverlb' 
INFO[0032] Using the k3d-tools node to gather environment information 
DEBU[0032] no netlabel present on container /k3d-k3s-default-tools 
DEBU[0032] failed to get IP for container /k3d-k3s-default-tools as we couldn't find the cluster network 
INFO[0032] HostIP: using network gateway...             
INFO[0032] Starting cluster 'k3s-default'               
INFO[0032] Starting servers...                          
DEBU[0032] ENABLING CGROUPSV2 MAGIC!!!                  
DEBU[0032] Node k3d-k3s-default-server-0 Start Time: 2021-10-10 10:05:43.154910615 +0300 EEST m=+32.442576541 
DEBU[0032] Deleting node k3d-k3s-default-tools ...      
INFO[0033] Starting Node 'k3d-k3s-default-server-0'     
INFO[0034] Deleted k3d-k3s-default-tools                
DEBU[0034] Truncated 2021-10-10 07:05:45.508659117 +0000 UTC to 2021-10-10 07:05:45 +0000 UTC 
DEBU[0034] Waiting for node k3d-k3s-default-server-0 to get ready (Log: 'k3s is up and running') 
DEBU[0044] Finished waiting for log message 'k3s is up and running' from node 'k3d-k3s-default-server-0' 
INFO[0044] Starting agents...                           
INFO[0044] Starting helpers...                          
DEBU[0044] Node k3d-k3s-default-serverlb Start Time: 2021-10-10 10:05:55.462919763 +0300 EEST m=+44.750585668 
INFO[0046] Starting Node 'k3d-k3s-default-serverlb'     
DEBU[0053] Truncated 2021-10-10 07:06:03.594779097 +0000 UTC to 2021-10-10 07:06:03 +0000 UTC 
DEBU[0053] Waiting for node k3d-k3s-default-serverlb to get ready (Log: 'start worker processes') 
DEBU[0059] Finished waiting for log message 'start worker processes' from node 'k3d-k3s-default-serverlb' 
INFO[0059] Injecting record '172.18.0.1 host.k3d.internal'... 
DEBU[0059] Executing command '[sh -c echo '172.18.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-k3s-default-serverlb' 
DEBU[0059] Executing command '[sh -c echo '172.18.0.1 host.k3d.internal' >> /etc/hosts]' in node 'k3d-k3s-default-server-0' 
DEBU[0060] Exec process in node 'k3d-k3s-default-serverlb' exited with '0' 
DEBU[0060] Exec process in node 'k3d-k3s-default-server-0' exited with '0' 
DEBU[0060] Successfully added host record "172.18.0.1 host.k3d.internal" to /etc/hosts in all nodes 
DEBU[0060] Executing command '[sh -c patch=$(kubectl get cm coredns -n kube-system --template='{{.data.NodeHosts}}' | sed -n -E -e '/[0-9\.]{4,12}\shost.k3d.internal$/!p' -e '$a172.18.0.1 host.k3d.internal' | tr '\n' '^' | busybox xargs -0 printf '{"data": {"NodeHosts":"%s"}}'| sed -E 's%\^%\\n%g') && kubectl patch cm coredns -n kube-system -p="$patch"]' in node 'k3d-k3s-default-server-0' 
DEBU[0064] error patching the CoreDNS ConfigMap to include entry '172.18.0.1 host.k3d.internal': Exec process in node 'k3d-k3s-default-server-0' failed with exit code '137'
Logs:  
ERRO[0064] Failed Cluster Start: failed to inject host IP: failed to inject host record "172.18.0.1 host.k3d.internal" into CoreDNS ConfigMap: Failed to patch CoreDNS ConfigMap to include entry '172.18.0.1 host.k3d.internal' (see debug logs) 
ERRO[0064] Failed to create cluster >>> Rolling Back    
INFO[0064] Deleting cluster 'k3s-default'               
DEBU[0065] Cluster Details: &{Name:k3s-default Network:{Name:k3d-k3s-default ID:4cfd589a47fd433943e5ce1af5ba10dfc34279a96025ce694f63cd52962d566d External:false IPAM:{IPPrefix:172.18.0.0/16 IPsUsed:[] Managed:false} Members:[]} Token:VJbEMEgsjsKlgUsxIcTN Nodes:[0xc00016a600 0xc00016b500] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0000aa640 ServerLoadBalancer:0xc0002e6260 ImageVolume:k3d-k3s-default-images} 
DEBU[0065] Deleting node k3d-k3s-default-serverlb ...   
INFO[0067] Deleted k3d-k3s-default-serverlb             
DEBU[0067] Deleting node k3d-k3s-default-server-0 ...   
INFO[0069] Deleted k3d-k3s-default-server-0             
INFO[0069] Deleting cluster network 'k3d-k3s-default'   
INFO[0069] Deleting image volume 'k3d-k3s-default-images' 
FATA[0069] Cluster creation FAILED, all changes have been rolled back! 

@iwilltry42 iwilltry42 self-assigned this Oct 10, 2021
@iwilltry42 iwilltry42 added this to the v5.0.1 milestone Oct 10, 2021
@iwilltry42
Member

Hi @jrhoward , thanks for opening this issue!
It's directly related to #772, so we'll be following up over there to keep discussions/information in a single thread.
Feel free to add your insights over there 👍

@shiddy

shiddy commented Feb 18, 2022

I believe that I am hitting the same error with

k3d version v5.3.0
k3s version v1.22.6-k3s1 (default)

running on a RancherOS install on x86

when I run k3d cluster create --network host I get the following error

INFO[0000] [SimpleConfig] Hostnetwork selected - disabling injection of docker host into the cluster, server load balancer and setting the api port to the k3s default
INFO[0000] Prep: Network
INFO[0000] Re-using existing network 'host' (0c59f1592a52c9c68a5b8fbfa9dc43cb23efb9116b9029c3bd38ce335286dabd)
INFO[0000] Created image volume k3d-k3s-default-images
INFO[0000] Starting new tools node...
ERRO[0000] Failed to run tools container for cluster 'k3s-default'
INFO[0001] Creating node 'k3d-k3s-default-server-0'
INFO[0001] Using the k3d-tools node to gather environment information
INFO[0001] Starting new tools node...
ERRO[0001] Failed to run tools container for cluster 'k3s-default'
ERRO[0001] failed to gather environment information used for cluster creation: failed to run k3d-tools node for cluster 'k3s-default': failed to create node 'k3d-k3s-default-tools': runtime failed to create node 'k3d-k3s-default-tools': failed to create container for node 'k3d-k3s-default-tools': docker failed to create container 'k3d-k3s-default-tools': Error response from daemon: invalid IP address in add-host: "host-gateway"
ERRO[0001] Failed to create cluster >>> Rolling Back
INFO[0001] Deleting cluster 'k3s-default'
INFO[0001] Deleting 2 attached volumes...
WARN[0001] Failed to delete volume 'k3d-k3s-default-images' of cluster 'failed to find volume 'k3d-k3s-default-images': Error: No such volume: k3d-k3s-default-images': k3s-default -> Try to delete it manually
FATA[0001] Cluster creation FAILED, all changes have been rolled back!

Happy to run with --trace, but I don't want to bog down the report with a massive log. Let me know if that's desired.

@iwilltry42
Member

@shiddy your error looks familiar. Can you please check the docker version on your host and try to upgrade it?

@shiddy

shiddy commented Feb 18, 2022

sure thing

before attempting an upgrade:

docker --version
Docker version 19.03.11, build 42e35e61f3

using this doc:
https://rancher.com/docs/rancher/v2.6/en/installation/resources/installing-docker/

I attempted to install another Docker Engine version (20.10):
link here: https://releases.rancher.com/install-docker/20.10.sh

but from the output it does not appear that Docker 20.10 is available for my OS:

rancher@rancher$ curl https://releases.rancher.com/install-docker/20.10.sh > deinstall.sh
rancher@rancher$ /bin/bash deinstall.sh
# Executing docker install script, commit: 7cae5f8b0decc17d6571f9f52eb840fbc13b2737
Warning: the "docker" command appears to already exist on this system.

If you already have Docker installed, this script can cause trouble, which is
why we're displaying this warning and provide the opportunity to cancel the
installation.

If you installed the current Docker package using this script and are using it
again to update Docker, you can safely ignore this message.

You may press Ctrl+C now to abort this script.
+ sleep 20
+ sudo -E sh -c 'sleep 3;ros engine list --update'
disabled docker-1.12.6
disabled docker-1.13.1
disabled docker-17.03.1-ce
disabled docker-17.03.2-ce
disabled docker-17.06.1-ce
disabled docker-17.06.2-ce
disabled docker-17.09.0-ce
disabled docker-17.09.1-ce
disabled docker-17.12.0-ce
disabled docker-17.12.1-ce
disabled docker-18.03.0-ce
disabled docker-18.03.1-ce
disabled docker-18.06.0-ce
disabled docker-18.06.1-ce
disabled docker-18.06.2-ce
disabled docker-18.06.3-ce
disabled docker-18.09.0
disabled docker-18.09.1
disabled docker-18.09.2
disabled docker-18.09.3
disabled docker-18.09.4
disabled docker-18.09.5
disabled docker-18.09.6
disabled docker-18.09.7
disabled docker-18.09.8
disabled docker-18.09.9
disabled docker-19.03.0
disabled docker-19.03.1
current  docker-19.03.11
disabled docker-19.03.2
disabled docker-19.03.3
disabled docker-19.03.4
disabled docker-19.03.5
disabled docker-19.03.7
disabled docker-19.03.8
disabled docker-19.03.9
++ sudo ros engine list
++ awk '{print $2}'
++ grep 19.03.11
++ tail -n 1
+ engine_version=docker-19.03.11
+ '[' docker-19.03.11 '!=' '' ']'
+ sudo -E sh -c 'ros engine switch -f docker-19.03.11'
INFO[0001] Project [os]: Starting project
INFO[0002] [0/19] [docker]: Starting
INFO[0002] Recreating docker
INFO[0002] [1/19] [docker]: Started
INFO[0002] Project [os]: Project started

I'm not sure Docker 19 is supported with k3d; I think this may be my problem. Give me another hour or so to get more context and see whether I can force an update to Docker 20.
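That guess matches the earlier error: the `host-gateway` special value in `--add-host` (which produced "invalid IP address in add-host" above) was introduced in Docker Engine 20.10, so a 19.03.x daemon rejects it. A minimal version check, assuming the daemon version string comes from `docker version --format '{{.Server.Version}}'` (`supports_host_gateway` is a hypothetical helper, not part of k3d):

```python
def supports_host_gateway(engine_version: str) -> bool:
    """True when a Docker Engine version is >= 20.10, the first release
    whose daemon accepts the special --add-host value `host-gateway`."""
    major, minor = (int(p) for p in engine_version.split(".")[:2])
    return (major, minor) >= (20, 10)

# The RancherOS engine reported above:
print(supports_host_gateway("19.03.11"))  # → False
```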

@shiddy

shiddy commented Feb 18, 2022

After looking through the docs, it appears to be a non-trivial change to get RancherOS working with Docker Engine 20. I'll attempt an install on another Linux distribution that supports it and will open another issue if the problem persists. Otherwise this ticket can be closed.
