[BUG] v5.0.0 Sporadic cluster creation failures when trying to inject host IP #772

Closed · josephprem opened this issue Oct 5, 2021 · 8 comments
Labels: bug (Something isn't working), priority/high
Assignees: iwilltry42
Milestone: v5.0.1

josephprem commented Oct 5, 2021

Related Issues

Original Issue

Cluster creation fails when trying to patch the CoreDNS ConfigMap.

I am not sure how asynchronous the deployment of CoreDNS is; the CoreDNS ConfigMap may not be available yet when the function that patches it is called.

Moreover, --no-host-ip (disableHostIPInjection) was removed in v5.0.0, so there is no consistent way to work around this issue.

What did you do

  • How was the cluster created?
    • k3d cluster create --config config.yaml
      config.yaml:
---
kind: Simple
apiVersion: k3d.io/v1alpha3
name: mydemo
servers: 1
agents: 0
volumes:
- volume: /.workrc/clusters/mydemo/data:/data
options:
  k3d:
    wait: true
    timeout: 2m0s
    disableLoadbalancer: true
    disableImageVolume: false
    disableRollback: true
  k3s:
    extraArgs:
    - arg: --service-cidr=10.96.0.0/12
      nodeFilters:
      - server:*
    - arg: --cluster-cidr=10.244.0.0/16
      nodeFilters:
      - server:*
    - arg: --disable=servicelb
      nodeFilters:
      - server:*
    - arg: --disable=traefik
      nodeFilters:
      - server:*
    - arg: --disable-network-policy
      nodeFilters:
      - server:*

What did you expect to happen

Expected cluster creation to be successful.

Screenshots or terminal output

[INFO] Preparing a recipe. Hope it comes out well !!!
[INFO] Generating k3d config file
[INFO] Generating inventory
[INFO] Generating kast vars
[INFO] Recipe inputs are available at /home/prem/.workrc/clusters/mydemo/config
INFO[0000] Using config file /opt/.workrc/clusters/mydemo/config/k3d_config.yml (k3d.io/v1alpha3#simple) 
WARN[0000] No node filter specified                     
WARN[0000] Failed to stat file/directory/named volume that you're trying to mount: '/home/prem/.workrc/clusters/mydemo/data' in '/home/prem/.workrc/clusters/mydemo/data:/data' -> Please make sure it exists 
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-mydemo'                 
INFO[0000] Created volume 'k3d-mydemo-images'           
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-mydemo-tools'             
INFO[0001] Creating node 'k3d-mydemo-server-0'          
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway...             
INFO[0001] Starting cluster 'mydemo'                    
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-mydemo-server-0'          
INFO[0001] Deleted k3d-mydemo-tools                     
INFO[0006] Starting agents...                           
INFO[0006] Starting helpers...                          
INFO[0006] Injecting record '172.27.0.1 host.k3d.internal'... 
ERRO[0008] Failed Cluster Start: failed to inject host IP: failed to inject host record "172.27.0.1 host.k3d.internal" into CoreDNS ConfigMap: Failed to patch CoreDNS ConfigMap to include entry '172.27.0.1 host.k3d.internal' (see debug logs) 
FATA[0008] Cluster creation FAILED, rollback deactivated. 

Which OS & Architecture

  • Linux, Ubuntu 20.04

Which version of k3d

$ k3d version
k3d version v5.0.0
k3s version v1.21.5-k3s1 (default)

Which version of docker

  • output of docker version and docker info
docker version
Client: Docker Engine - Community
Version:           20.10.9
API version:       1.41
Go version:        go1.16.8
Git commit:        c2ea9bc
Built:             Mon Oct  4 16:08:29 2021
OS/Arch:           linux/amd64
Context:           default
Experimental:      true

Server: Docker Engine - Community
Engine:
 Version:          20.10.9
 API version:      1.41 (minimum version 1.12)
 Go version:       go1.16.8
 Git commit:       79ea9d3
 Built:            Mon Oct  4 16:06:37 2021
 OS/Arch:          linux/amd64
 Experimental:     false
containerd:
 Version:          1.4.11
 GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
 Version:          1.0.2
 GitCommit:        v1.0.2-0-g52b36a2
docker-init:
 Version:          0.19.0
 GitCommit:        de40ad0
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 9
  Running: 1
  Paused: 0
  Stopped: 8
 Images: 151
 Server Version: 20.10.9
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.11.0-37-generic
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31GiB
 Name: prem-Precision-5540
 ID: R7EG:UM2L:ZE4N:OMKT:BMUU:D4PU:4YLG:DMLT:C6KT:XD4J:WGWY:2WDR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
josephprem added the bug (Something isn't working) label on Oct 5, 2021
bors bot pushed a commit to infinyon/fluvio that referenced this issue Oct 5, 2021
Install version 4.X.X for now, since 5.0.0 new release removed the --no-hostip flag and it seems to be failing in the CI with an error similar to k3d-io/k3d#772
josephprem commented Oct 6, 2021

I guess that when a cluster is created with agents, waiting for the agents to join buys enough time for the ConfigMap patch to succeed.
For the moment, this is the code I run after cluster creation:

# Wait for the CoreDNS ConfigMap to exist, then inject the 'host.k3d.internal'
# record pointing at the docker network gateway (owarn/log are local logging helpers).
function _patch_coredns() {
    local _br_gateway_ip
    _br_gateway_ip=$(docker network inspect "k3d-${CLUSTER_NAME}" --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}')
    owarn "Wait for the CoreDNS ConfigMap to be available before trying to patch"
    until kubectl -n kube-system get cm coredns &>/dev/null; do
        sleep 2
    done
    kubectl -n kube-system get cm coredns -o yaml > coredns.yaml
    # only patch if NodeHosts exists and the record is not already present
    if grep -q "NodeHosts" coredns.yaml && ! grep -q "host.k3d.internal" coredns.yaml; then
        owarn "Patching CoreDNS ConfigMap to include '${_br_gateway_ip} host.k3d.internal'"
        sed -e "s/NodeHosts: |/NodeHosts: |\n    ${_br_gateway_ip} host.k3d.internal/g" coredns.yaml > coredns-patched.yaml
        kubectl -n kube-system apply -f coredns-patched.yaml 2>/dev/null
        rm coredns.yaml coredns-patched.yaml
    else
        log "CoreDNS ConfigMap already patched"
    fi
    return 0
}
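
For completeness, wiring this in after cluster creation might look like the following minimal sketch (assuming CLUSTER_NAME is set and kubectl already points at the new cluster):

export CLUSTER_NAME=mydemo              # name used in the k3d config above
k3d cluster create --config config.yaml
_patch_coredns                          # workaround function from above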

benjaminjb commented Oct 7, 2021

I'm hitting this too, as long as I set disableLoadbalancer to true / pass in --no-lb.

(And while trying to debug it, I ran into this: #750.)

iwilltry42 self-assigned this on Oct 7, 2021
iwilltry42 added this to the v5.0.1 milestone on Oct 7, 2021
iwilltry42 commented:

@josephprem, thanks for opening this issue!
Can you please add some logs with the --trace flag added to the create command?
As @benjaminjb mentioned (and also provided the fix for it), please use https://github.com/rancher/k3d/releases/tag/v5.0.1-rc.0 for this so it doesn't break.
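
For reference, a command along these lines should produce the requested trace output (assuming the same config file as in the report; --trace is k3d's global debug-logging flag):

k3d cluster create --config config.yaml --trace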

josephprem commented:

Hi @iwilltry42, running with --trace or --verbose unfortunately crashes with a seg fault. I will try v5.0.1-rc.0 and capture the logs.

iwilltry42 commented:

Hey there, please also give https://github.com/rancher/k3d/releases/tag/v5.0.1-rc.1 a try.
I'd love to get some feedback on the changes. (also applies to #779, tagging @jrhoward, @rodnymolina & @mbprtpmnr here)

jrhoward commented Oct 11, 2021

Using the same OS/Arch, it worked perfectly. I repeated it a couple of times with no issues.

Version:

k3d-linux-amd64 version
k3d version v5.0.1-rc.1
k3s version v1.21.5-k3s2 (default)

Output:

k3d-linux-amd64 cluster create
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-k3s-default'            
INFO[0000] Created volume 'k3d-k3s-default-images'      
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-k3s-default-tools'        
INFO[0001] Creating node 'k3d-k3s-default-server-0'     
INFO[0001] Creating LoadBalancer 'k3d-k3s-default-serverlb' 
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway...             
INFO[0001] Starting cluster 'k3s-default'               
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-k3s-default-server-0'     
INFO[0002] Deleted k3d-k3s-default-tools                
INFO[0008] Starting agents...                           
INFO[0008] Starting helpers...                          
INFO[0009] Starting Node 'k3d-k3s-default-serverlb'     
INFO[0015] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0015] Injecting '172.21.0.1 host.k3d.internal' into /etc/hosts of all nodes... 
INFO[0023] Cluster 'k3s-default' created successfully!  
INFO[0023] You can now use it like this:                


kubectl cluster-info
Kubernetes control plane is running at https://0.0.0.0:40633
CoreDNS is running at https://0.0.0.0:40633/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://0.0.0.0:40633/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

@iwilltry42 looking good!!

iwilltry42 commented:

I hope that others experience the same improvement as @jrhoward, so I'll go ahead and close this issue for now.
I'll push the v5.0.1 release in a minute 👍

josephprem commented:

Thanks a lot @iwilltry42, I was busy lately. I just tried today and can confirm that all five of my tries worked without issues:

# run 1
INFO[0015] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0015] Injecting '172.18.0.1 host.k3d.internal' into /etc/hosts of all nodes... 
# run 2
INFO[0006] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0006] Injecting '172.19.0.1 host.k3d.internal' into /etc/hosts of all nodes... 
# run 3
INFO[0006] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0006] Injecting '172.20.0.1 host.k3d.internal' into /etc/hosts of all nodes...
# run 4
INFO[0006] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0006] Injecting '172.21.0.1 host.k3d.internal' into /etc/hosts of all nodes... 
# run 5
INFO[0006] Injecting record for host.k3d.internal into CoreDNS configmap... 
INFO[0006] Injecting '172.22.0.1 host.k3d.internal' into /etc/hosts of all nodes...
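
To double-check that the record actually landed, the injected entry should show up under the NodeHosts key of the CoreDNS ConfigMap (the same key the workaround above patches); a quick check along these lines should work:

kubectl -n kube-system get configmap coredns -o jsonpath='{.data.NodeHosts}'
# expected to contain a line like: 172.22.0.1 host.k3d.internal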

iwilltry42 unpinned this issue on Oct 22, 2021