
[BUG] Running out of disk space during upgrade from v0.6 or v0.7 where the default disks are 32 GB #2161

Closed
przemyslavic opened this issue Mar 26, 2021 · 1 comment
przemyslavic commented Mar 26, 2021

Describe the bug
In versions 0.6 and 0.7 there was no dedicated repository machine; epirepo was installed on the Kubernetes master VM, whose disk is 32 GB by default. When upgrading to develop, the machine runs out of disk space, which causes Docker to delete images at random to free up space. The upgrade process is then aborted with an error when it tries to tag images that no longer exist.
It looks like we need to clean up old and unnecessary images and packages to free disk space before downloading the new requirements. Otherwise the disk has to be extended before the upgrade, which will not be an easy solution.
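A minimal sketch of such a pre-upgrade cleanup could look like the following. This is an illustration only, not the actual epicli implementation; the choice of commands (image prune, yum cache cleanup) is an assumption about what would reclaim space on these nodes.

```shell
# Hypothetical pre-upgrade cleanup sketch (illustration only -- the
# commands are assumptions, not the actual epicli implementation).

# Reclaim space from unused Docker images, if Docker is present.
if command -v docker >/dev/null 2>&1; then
    docker image prune -af || true
fi

# Drop cached packages left behind by the old epirepo
# (RHEL/CentOS family; Debian/Ubuntu would use 'apt-get clean').
if command -v yum >/dev/null 2>&1; then
    yum clean all || true
fi

# Report how much space is left on the root filesystem, in kilobytes.
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
echo "available_kb=${avail_kb}"
```

Note that `docker image prune -af` removes all images not referenced by a container, so it would have to run only after the old cluster components are confirmed safe to rebuild from the new epirepo.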

How to reproduce
Steps to reproduce the behavior:

  1. Deploy a 0.6 (or 0.7) cluster with the Kubernetes master component enabled - run epicli apply from the v0.6/v0.7 branch
  2. Upgrade the cluster to the develop branch - run epicli upgrade from the develop branch

Expected behavior
The cluster is upgraded successfully.

Environment

  • Cloud provider: [all]
  • OS: [all]

epicli version: [epicli --version]

Additional context

2021-03-25T17:28:16.6926428Z 17:28:16 INFO cli.engine.ansible.AnsibleCommand - TASK [image_registry : Tag k8s.gcr.io/kube-scheduler:v1.15.10 image with ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com:5000/k8s.gcr.io/kube-scheduler:v1.15.10] ***
2021-03-25T17:28:18.2723196Z 17:28:18 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["docker", "tag", "k8s.gcr.io/kube-scheduler:v1.15.10", "ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com:5000/k8s.gcr.io/kube-scheduler:v1.15.10"], "delta": "0:00:00.057406", "end": "2021-03-25 17:28:18.042186", "msg": "non-zero return code", "rc": 1, "start": "2021-03-25 17:28:17.984780", "stderr": "Error response from daemon: No such image: k8s.gcr.io/kube-scheduler:v1.15.10", "stderr_lines": ["Error response from daemon: No such image: k8s.gcr.io/kube-scheduler:v1.15.10"], "stdout": "", "stdout_lines": []}
[ec2-user@ec2-xx-xx-xx-xx ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.8G     0  1.8G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G   18M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/nvme0n1p2   30G   25G  5.5G  82% /
tmpfs           373M     0  373M   0% /run/user/1000
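The "No such image" failure in the log above only surfaces mid-upgrade, at tag time. A pre-flight existence check could fail fast instead. A minimal sketch (assumed, not part of the epicli codebase):

```shell
# Illustrative pre-flight check (not from the epicli codebase): verify
# that an image is still present locally before attempting to tag it,
# so a deleted image is reported up front instead of surfacing as
# "No such image" in the middle of the upgrade.
image="k8s.gcr.io/kube-scheduler:v1.15.10"

if command -v docker >/dev/null 2>&1 \
   && docker image inspect "$image" >/dev/null 2>&1; then
    status=present
else
    status=missing
fi
echo "status=$status"
```

Run against each image in the registry manifest, this would turn a random mid-task abort into a clear list of images that need to be re-pulled.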

DoD checklist

  • Changelog updated (if affected version was released)
  • COMPONENTS.md updated / doesn't need to be updated
  • Automated tests passed (QA pipelines)
    • apply
    • upgrade
  • Case covered by automated test (if possible)
  • Idempotency tested
  • Documentation updated / doesn't need to be updated
  • All conversations in PR resolved
przemyslavic (Collaborator, Author) commented:

The fix works in the sense that it removes unnecessary files, packages and images. For some older clusters whose repository disks are 32 GB it is still insufficient (the OS disks have to be extended there), but it brings additional improvements anyway, so it is worth applying.
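For the clusters where cleanup alone is not enough, a free-space check before the upgrade would tell operators up front that the OS disk needs extending. A sketch, where the 10 GB threshold is an assumed value and not an epicli default:

```shell
# Illustrative threshold check (the 10 GB figure is an assumption, not
# an epicli default): warn before upgrading when the root filesystem is
# low on space, since 32 GB disks may need to be extended first.
min_kb=$((10 * 1024 * 1024))   # ~10 GB expressed in kilobytes
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')

if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "WARNING: only ${avail_kb} KB free; consider extending the OS disk"
else
    echo "OK: ${avail_kb} KB free"
fi
```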

@mkyc mkyc closed this as completed Apr 1, 2021