K3s Upgrade Plan, 5 of 6 nodes upgraded, one node was terminated with badrequest #200

Open
braucktoon opened this issue Apr 24, 2022 · 0 comments

Version
v0.9.1

Platform/Architecture
pi@dairy:~ $ uname -a
Linux dairy 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux

Describe the bug
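5 of 6 nodes upgraded to the latest K3s (v1.23.5+k3s1) successfully. The upgrade job for the remaining node (dairy, an agent) keeps failing in its init containers, and k logs against its pods returns BadRequest.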

To Reproduce
Run the out-of-the-box (OOB) plans to upgrade to the latest K3s. All 3 server nodes were upgraded first, then the worker nodes were processed, and one of the workers failed.

Expected behavior
All 6 nodes are upgraded successfully.

Actual behavior
5 of 6 nodes were upgraded. dairy is still on v1.22.7+k3s1 and remains cordoned (Ready,SchedulingDisabled).

Additional context
Davids-iMac:K3s$ k get nodes
NAME             STATUS                     ROLES                       AGE   VERSION
dairy            Ready,SchedulingDisabled   <none>                      61d   v1.22.7+k3s1
gail             Ready                      <none>                      55d   v1.23.5+k3s1
glenn            Ready                      <none>                      61d   v1.23.5+k3s1
katy-kat         Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1
squirrelly-dan   Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1
wayne            Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1
Davids-iMac:K3s$ k get pods,jobs -n system-upgrade
NAME READY STATUS RESTARTS AGE
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-462hs 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-7jkgd 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-7pr27 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-c59jp 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-cq767 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-gsc85 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-kj9lm 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-ntvt8 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-plmtr 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh 0/1 Init:1/2 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r9vm7 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-vs4zd 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-wnlck 0/1 Init:Error 0 15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-xvtkz 0/1 Init:Error 0 15h
pod/system-upgrade-controller-8677c8fb4-62cr7 1/1 Running 0 14d

NAME COMPLETIONS DURATION AGE
job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 0/1 15h 15h
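
With this many retried pods, a quick tally of the STATUS column makes it easier to compare the listing against the job's pod counts. This is only a sketch: `tally_status` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...) with headers suppressed.

```shell
# Hypothetical helper: count pods per STATUS value.
# Reads `kubectl get pods --no-headers` output on stdin.
tally_status() {
  # Column 3 is STATUS in the default layout (an assumption of this sketch).
  awk 'NF >= 3 { count[$3]++ } END { for (s in count) print count[s], s }' \
    | LC_ALL=C sort -k2
}

# Usage (not run here):
#   kubectl get pods -n system-upgrade --no-headers | tally_status
```

Note that this also counts the system-upgrade-controller pod, since it lives in the same namespace.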

Davids-iMac:K3s$ k describe job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade
Name:                     apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
Namespace:                system-upgrade
Selector:                 controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
Labels:                   objectset.rio.cattle.io/hash=80f52c1aa7257a6b5bd08982446fceff8c1a2394
                          plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
                          upgrade.cattle.io/controller=system-upgrade-controller
                          upgrade.cattle.io/node=dairy
                          upgrade.cattle.io/plan=k3s-agent
                          upgrade.cattle.io/version=v1.23.5-k3s1
Annotations:              batch.kubernetes.io/job-tracking:
                          objectset.rio.cattle.io/applied:
                            H4sIAAAAAAAA/+xY227bOBN+lf/nteTKiXOQgb3wxu7WaOMYddpFUQQBTY5srilSS47sGIbffTGUfGoSN+3uRS+CALFIcQ6c+b7hUCuWA3LJkbP2inFjLHJU1nga2vFfINADNpyyDc...
                          objectset.rio.cattle.io/id: system-upgrade-controller
                          objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
                          objectset.rio.cattle.io/owner-name: k3s-agent
                          objectset.rio.cattle.io/owner-namespace: system-upgrade
                          upgrade.cattle.io/ttl-seconds-after-finished: 900
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Sat, 23 Apr 2022 17:34:53 -0400
Active Deadline Seconds:  900s
Pods Statuses:            1 Active / 0 Succeeded / 14 Failed
Pod Template:
  Labels:           controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
                    job-name=apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
                    plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
                    upgrade.cattle.io/controller=system-upgrade-controller
                    upgrade.cattle.io/node=dairy
                    upgrade.cattle.io/plan=k3s-agent
                    upgrade.cattle.io/version=v1.23.5-k3s1
  Service Account:  system-upgrade
  Init Containers:
   prepare:
    Image:      rancher/k3s-upgrade:v1.23.5-k3s1
    Port:       <none>
    Host Port:  <none>
    Args:
      prepare
      k3s-server
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
   drain:
    Image:      rancher/kubectl:v1.21.9
    Port:       <none>
    Host Port:  <none>
    Args:
      drain
      dairy
      --pod-selector
      !upgrade.cattle.io/controller
      --ignore-daemonsets
      --delete-local-data
      --force
      --skip-wait-for-delete-timeout
      60
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
  Containers:
   upgrade:
    Image:      rancher/k3s-upgrade:v1.23.5-k3s1
    Port:       <none>
    Host Port:  <none>
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
  Volumes:
   host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
   pod-info:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
Events:  <none>

Davids-iMac:K3s$ k logs job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade
Found 15 pods, using pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm
Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm" is terminated

Davids-iMac:K3s$ k logs pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh -n system-upgrade
Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh" is waiting to start: PodInitializing
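
Both BadRequest responses above come from asking for the main "upgrade" container, which never starts because an init container fails first. The failure detail is more likely to be in the init container logs. A minimal sketch for pulling those, assuming the init container names (prepare, drain) shown in the job description; `init_logs` is a hypothetical helper name, not something shipped with system-upgrade-controller.

```shell
# Hypothetical helper: print logs for each init container of an upgrade pod.
# Container names (prepare, drain) are taken from the job description above.
init_logs() {
  il_pod="$1"
  il_ns="${2:-system-upgrade}"
  for il_c in prepare drain; do
    echo "--- $il_c ---"
    # -c selects a specific container; without it, kubectl picked the
    # "upgrade" container in the output above, which has no logs yet.
    kubectl logs "$il_pod" -c "$il_c" -n "$il_ns" || true
  done
}

# Usage (not run here):
#   init_logs pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm
```

The `|| true` keeps the loop going when one container has no logs, for example when drain never ran because prepare failed.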

Plan:

# These plans are adapted from work by Dax McDonald (https://github.com/daxmc99) and Hussein Galal (https://github.com/galal-hussein)
# in support of Rancher v2 managed k3s upgrades. See Also: https://rancher.com/docs/k3s/latest/en/upgrades/automated/
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes)
  version: v1.23.5+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/control-plane, operator: Exists}
  serviceAccountName: system-upgrade
  tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
  cordon: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes)
  version: v1.23.5+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/control-plane, operator: DoesNotExist}
  serviceAccountName: system-upgrade
  prepare:
    # Defaults to the same "resolved" tag that is used for the upgrade container, NOT latest
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
    skipWaitForDeleteTimeout: 60 # 1.18+ (honor pod disruption budgets up to 60 seconds per pod then moves on)
  upgrade:
    image: rancher/k3s-upgrade
