503 when node is down
When a MinIO node is down, a PUT operation should not return 503. To test this, we are going to reproduce the failure in version RELEASE.2023-01-02T09-40-09Z:
- Deploy tenant:
createcluster
installoperator
installtenant
k apply -f ~/ubuntu.yaml -n tenant-lite
- Change the image to RELEASE.2023-01-02T09-40-09Z in the tenant spec.
- Make sure you are using RELEASE.2023-01-02T09-40-09Z by looking at the pod image. Normally I delete the StatefulSet to get new servers with the proper image.
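The image check can also be scripted instead of eyeballed. This is only a sketch: the helper name pods_with_wrong_image is made up, and it assumes the MinIO container is the first container in each pod.

```shell
# Hypothetical helper: reads "POD IMAGE" pairs on stdin and prints the pods
# whose image tag differs from the expected release.
pods_with_wrong_image() (
  expected=$1
  awk -v tag="$expected" '
    NF >= 2 {
      n = split($2, parts, ":")   # the tag is whatever follows the last colon
      if (parts[n] != tag) print $1
    }'
)

# Intended use against the cluster (assumes the tenant-lite namespace and that
# containers[0] is the MinIO server; other pods such as ubuntu get listed too):
# kubectl get pods -n tenant-lite \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].image}{"\n"}{end}' \
#   | pods_with_wrong_image RELEASE.2023-01-02T09-40-09Z
```

If a MinIO pod shows up in the output, it is still running another image and needs to be recreated.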
- In the Ubuntu pod, install mc and perform some PUTs, which should work. Also register the cluster:
apt update
apt install -y wget
apt install -y curl
apt install -y vim
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin
mc alias set monmar1323508pm/ https://minio.tenant-lite.svc.cluster.local console console123
touch a.txt
echo "a" > a.txt
mc mb monmar1323508pm/bucket
mc cp a.txt monmar1323508pm/bucket
mc license register --api-key <token> monmar1323508pm
Expected to see this working:
root@ubuntu:/# mc cp a.txt myminio/bucket
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59 B/s 0s
root@ubuntu:/#
- Then cordon one MinIO node via Lens, but select a node where the Ubuntu pod is not running so that you can still use the Ubuntu pod for experimenting:
In terminal:
Cesars-MacBook-Pro:~ cniackz$ kubectl cordon kind-worker
node/kind-worker cordoned
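To pick the node without Lens, something like the following could work. It is a sketch with assumptions: the helper name cordon_candidates is made up, the pod is assumed to be called ubuntu, and control-plane nodes are recognized only by name (as in a kind cluster).

```shell
# Hypothetical helper: reads node names on stdin and prints those that are
# neither the node hosting the Ubuntu pod nor a control-plane node.
cordon_candidates() (
  ubuntu_node=$1
  awk -v skip="$ubuntu_node" '$1 != skip && $1 !~ /control-plane/ { print $1 }'
)

# Intended use (assumes a pod named "ubuntu" in the tenant-lite namespace):
# ubuntu_node=$(kubectl get pod ubuntu -n tenant-lite -o jsonpath='{.spec.nodeName}')
# kubectl get nodes -o name | sed 's|^node/||' | cordon_candidates "$ubuntu_node"
```

Any node printed here is safe to cordon without losing the Ubuntu pod.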
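- Verify that one MinIO server is down because its node is unavailable. Cordoning only marks the node unschedulable, so the MinIO pod on it has to be deleted or evicted first; it then stays Pending with a scheduling message like: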
0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
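A quick way to see which server is affected is to filter the pod list for Pending pods. This is a sketch: pending_pods is a made-up helper name, and it relies on the default kubectl column order.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints pods stuck in Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Intended use (assumes the tenant-lite namespace from above):
# kubectl get pods -n tenant-lite --no-headers | pending_pods
# kubectl describe pod <name> -n tenant-lite   # shows the scheduling message
```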
- From Ubuntu pod, try doing the put again.
root@ubuntu:/# mc cp a.txt myminio/bucket
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
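Since the cp above never returns, it helps to put a time limit on it so a stuck PUT is reported instead of hanging the shell. A minimal sketch, assuming timeout from GNU coreutils is available in the pod; classify_put is a made-up name and the 30-second limit is arbitrary.

```shell
# Hypothetical helper: runs a command with a time limit and reports whether it
# finished, failed, or hung. timeout(1) exits with status 124 when the limit
# is reached.
classify_put() (
  limit=$1
  shift
  rc=0
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo ok ;;
    124) echo stuck ;;
    *)   echo failed ;;
  esac
)

# Intended use from the Ubuntu pod:
# classify_put 30 mc cp a.txt myminio/bucket
```

This separates a PUT that hangs from one that fails quickly with an error.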
- As a result, I can't put files; the cp gets stuck. But am I getting a 503? Well, this is what I get:
root@ubuntu:/# mc admin trace myminio
2023-03-13T16:53:26.366 [200 OK] s3.GetBucketLocation storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/?location= 10.244.4.9 771µs ↑ 77 B ↓ 128 B
2023-03-13T16:53:26.376 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9 879µs ↑ 77 B ↓ 0 B
2023-03-13T16:53:26.380 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9 594µs ↑ 77 B ↓ 0 B
2023-03-13T16:53:26.387 [404 Not Found] s3.GetBucketObjectLockConfig storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/?object-lock= 10.244.4.9 172µs ↑ 77 B ↓ 330 B
2023-03-13T16:53:26.393 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9
- But I am not getting the 503, so how can I reproduce it?
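To avoid scanning a long trace by eye, the output can be filtered for 503 lines. This is a sketch: trace_503 is a made-up helper, and it assumes a 503 would be printed in the same bracketed form as the [200 OK] and [404 Not Found] entries above.

```shell
# Hypothetical helper: reads `mc admin trace` lines on stdin and prints only
# the ones whose bracketed status code is 503. Returns success even when
# nothing matches, so it can be used in scripts that stop on errors.
trace_503() {
  grep -F '[503 ' || true
}

# Intended use from the Ubuntu pod (the trace runs until interrupted):
# mc admin trace myminio | trace_503
```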
- By the way, I cannot put, but I can still read:
root@ubuntu:/# mc ls myminio/bucket
[2023-03-13 17:03:48 UTC] 2B STANDARD a.txt
- In affected versions, you will see a lock on the files you are trying to cp while the node is down:
root@ubuntu:/# mc support top locks monmar1323508pm2
Time Type Resource
12 minutes WRITE .minio.sys/leader.lock
1 minutes WRITE bucket/e.txt
1 minutes WRITE .minio.sys/buckets/bucket/.usage-cache.bin
1 minutes WRITE bucket/c.txt
38 seconds WRITE bucket/a.txt <-------------------------------------- A lock here
30 seconds WRITE .minio.sys/buckets/.usage-cache.bin
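Checking for a lock on one specific file can be scripted too. A sketch only: has_write_lock is a made-up helper, and it assumes the column layout shown above (amount, unit, type, resource), which may differ from the real terminal output.

```shell
# Hypothetical helper: reads `mc support top locks` output on stdin and
# succeeds if a WRITE lock is listed for the given resource. Assumes each row
# looks like "<n> <unit> <type> <resource>".
has_write_lock() {
  awk -v res="$1" '$3 == "WRITE" && $4 == res { found = 1 } END { exit !found }'
}

# Intended use from the Ubuntu pod:
# mc support top locks monmar1323508pm2 | has_write_lock bucket/a.txt && echo "lock still held"
```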
- Uncordon the node and repeat the cp
$ kubectl uncordon kind-worker
As a result:
a. The cp will fail due to the lock.
b. If using the February version, there will be no lock and the cp will go through.
root@ubuntu:/# mc cp a.txt monmar1323508pm2/bucket/z.txt
/a.txt: 2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47 B/s 0s
root@ubuntu:/#
Newer versions no longer create the lock on the files you copy during the cordon, so cp works as expected and you do not get a stuck request or a 503.
- AR Cesar: Tomorrow, Tue Mar 14, I will talk to Kannappan and walk through these manually tested scenarios; a test will then be created accordingly.