minio operator: Look into re-adding readiness probe into Operator #741
https://github.com/miniohq/engineering/issues/741
- Recreate kind cluster
kind delete cluster
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 30080
        hostPort: 30080
        listenAddress: "127.0.0.1"
        protocol: TCP
      - containerPort: 80
        hostPort: 10080
        protocol: TCP
      - containerPort: 443
        hostPort: 10443
        protocol: TCP
      - containerPort: 9000
        hostPort: 19000
        protocol: TCP
      - containerPort: 9090
        hostPort: 19090
        protocol: TCP
      - containerPort: 9443
        hostPort: 19443
        protocol: TCP
  - role: worker
  - role: worker
  - role: worker
  - role: worker
EOF
- Change console service: spec.type is NodePort and spec.ports[0].nodePort is 30080
spec:
  ports:
    - name: http
      port: 9090
      nodePort: 30080
    - name: https
      port: 9443
      nodePort: 30965
  selector:
    app: console
  type: NodePort
- Comment out destroy_kind from testing/deploy-tenant.sh
- Install operator only
(cd "testing/.." && TAG=minio/operator:noop make docker) # will not change your shell's current directory
kind load docker-image minio/operator:noop
kubectl apply -k "testing/../testing/dev"
kubectl wait --namespace minio-operator \
--for=condition=ready pod \
--selector name=minio-operator \
--timeout=120s
- Get the JWT and log in at http://localhost:30080/login with the token
k -n minio-operator get secret console-sa-secret -o jsonpath="{.data.token}" | base64 -d | pbcopy
- In the UI, set up tenant "storage-lite" under namespace "tenant-lite" WITHOUT TLS and with no CPU or memory defaults. It is convenient to save the access details, e.g. Access Key: g3mHo4JOYTZOxt9d Secret Key: 8htxt5nEfzMPeGw8pTYZIlhXpS0jvuSb
- Change image and readiness probe of tenant:
spec:
  image: docker.io/allanrogerreid/minio:latest
  ## Specification for MinIO Pool(s) in this Tenant.
  readiness:
    httpGet:
      port: 9000
      path: /minio/health/ready
      scheme: HTTPS
    initialDelaySeconds: 5
    periodSeconds: 1
- Obtain login details for the tenant console and log in at http://localhost:9090/login, if needed or possible. If not, continue to the next step
export TENANT_CONFIG_SECRET=$(kubectl -n tenant-lite get tenants storage-lite -o jsonpath="{.spec.configuration.name}")
export USER=$(kubectl -n tenant-lite get secrets "$TENANT_CONFIG_SECRET" -o go-template='{{index .data "config.env"|base64decode }}' | grep 'export MINIO_ROOT_USER="' | sed -e 's/export MINIO_ROOT_USER="//g' | sed -e 's/"//g')
PASSWORD=$(kubectl -n tenant-lite get secrets "$TENANT_CONFIG_SECRET" -o go-template='{{index .data "config.env"|base64decode }}' | grep 'export MINIO_ROOT_PASSWORD="' | sed -e 's/export MINIO_ROOT_PASSWORD="//g' | sed -e 's/"//g')
Create port-forward to tenant hl service
kubectl port-forward svc/storage-lite-hl -n tenant-lite 9000:9000
Create alias and bucket with appropriate access level
sudo mc alias set minios3 http://localhost:9000 $USER $PASSWORD --insecure
sudo mc mb minios3/data
sudo mc admin policy remove minios3 readwrite
sudo mc anonymous list minios3/data
sudo mc anonymous set public minios3/data
sudo mc cp /Users/allanreid/Documents/MinIO/demo/pics/42489388690_1f48eec1cc_o.jpg minios3/data/42489388690_1f48eec1cc_o.jpg --insecure
Validate http e.g. http://localhost:9000/data/42489388690_1f48eec1cc_o.jpg
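The two credential lookups above repeat the same grep/sed pipeline. A minimal sketch of a shared helper, assuming config.env keeps the `export NAME="value"` layout that the grep patterns above rely on; the function name config_env_value is hypothetical:

```shell
#!/usr/bin/env bash
# config_env_value NAME
# Reads the body of config.env on stdin and prints the value of the line
#   export NAME="value"
# Assumes the value is double-quoted, as in the grep patterns used above.
config_env_value() {
    local name=$1
    sed -n "s/^export ${name}=\"\(.*\)\"\$/\1/p"
}
```

It could be used in place of the grep/sed chain, e.g. kubectl -n tenant-lite get secrets "$TENANT_CONFIG_SECRET" -o go-template='{{index .data "config.env"|base64decode }}' | config_env_value MINIO_ROOT_USER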
- Deploy an Ingress controller
#Ingress NGINX
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
#Wait
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=90s
- Deploy Ingress
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  labels:
    app.kubernetes.io/component: controller
  name: ingress-class-nginx-demo
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx
EOF
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-nginx-demo
  namespace: tenant-lite
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 8m
spec:
  ingressClassName: ingress-class-nginx-demo
  defaultBackend:
    service:
      name: storage-lite-hl
      port:
        number: 9000
  rules:
    - host: localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: storage-lite-hl
                port:
                  number: 9000
EOF
Since port 10080 is the host port (mapped to container port 80 in the kind config), the object is reachable from a browser. Validate access after temporarily killing the port-forward, e.g. http://localhost:10080/data/42489388690_1f48eec1cc_o.jpg
Test with /Users/allanreid/Documents/MinIO/curl_benchmark.sh http://localhost:10080/data/42489388690_1f48eec1cc_o.jpg, where curl_benchmark.sh is:
#!/bin/bash
printf 'namelookup\tconnect\t\tappconnect\tpretransfer\tredirect\tstarttransfer\ttotal\t\tresponse_code\n'
for i in {1..10}
do
  curl --max-time 60 -w "@$HOME/Documents/MinIO/curl-format" -o /dev/null -s "$1"
  sleep 0.1
done
curl-format
%{time_namelookup}\t
%{time_connect}\t
%{time_appconnect}\t
%{time_pretransfer}\t
%{time_redirect}\t
%{time_starttransfer}\t
%{time_total}\t
%{response_code}\n
Delete pods and try to access the resource
for i in {0..1}; do k -n tenant-lite delete pod/storage-lite-pool-0-$i; done
$HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg
Observe multiple failures after the pods are deleted
➜ operator git:(reimplement-tenant-readiness) ✗ for i in {0..3}; do; k -n tenant-lite delete pod/storage-lite-pool-0-$i; done;
pod "storage-lite-pool-0-0" deleted
pod "storage-lite-pool-0-1" deleted
pod "storage-lite-pool-0-2" deleted
pod "storage-lite-pool-0-3" deleted
➜ operator git:(reimplement-tenant-readiness) ✗ $HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg
namelookup connect appconnect pretransfer redirect starttransfer total response_code
0.003785 0.004011 0.000000 0.004031 0.000000 0.014025 0.014237 503
0.006083 0.006372 0.000000 0.006399 0.000000 0.013171 0.013366 503
0.009248 0.009754 0.000000 0.009799 0.000000 11.040953 11.042670 200
0.003961 0.004183 0.000000 0.004202 0.000000 14.750503 14.751972 200
0.004403 0.004615 0.000000 0.004639 0.000000 0.009746 0.010382 200
0.005521 0.005810 0.000000 0.005852 0.000000 0.008265 0.008529 503
0.006155 0.006401 0.000000 0.006440 0.000000 6.058780 6.059284 200
0.005109 0.005338 0.000000 0.005358 0.000000 6.082359 6.083249 200
0.007787 0.008237 0.000000 0.008280 0.000000 0.016614 0.018058 200
0.005651 0.005899 0.000000 0.005927 0.000000 0.010749 0.011922 200
➜ operator git:(reimplement-tenant-readiness) ✗ for i in {4..7}; do; k -n tenant-lite delete pod/storage-lite-pool-0-$i; done;
pod "storage-lite-pool-0-4" deleted
pod "storage-lite-pool-0-5" deleted
pod "storage-lite-pool-0-6" deleted
pod "storage-lite-pool-0-7" deleted
➜ operator git:(reimplement-tenant-readiness) ✗ $HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg
namelookup connect appconnect pretransfer redirect starttransfer total response_code
0.004187 0.004411 0.000000 0.004427 0.000000 0.009661 0.009821 503
0.005210 0.005466 0.000000 0.005491 0.000000 0.011484 0.011635 503
0.006529 0.006855 0.000000 0.006887 0.000000 11.012152 11.012786 200
0.004837 0.005067 0.000000 0.005089 0.000000 11.013581 11.014536 200
0.007122 0.007630 0.000000 0.007675 0.000000 0.018681 0.019035 503
0.007907 0.008389 0.000000 0.008435 0.000000 1.024184 1.027035 200
0.005949 0.006196 0.000000 0.006222 0.000000 1.022314 1.023611 200
0.007658 0.008130 0.000000 0.008172 0.000000 11.017359 11.018144 200
0.004194 0.004406 0.000000 0.004426 0.000000 1.013662 1.014561 200
0.007685 0.008014 0.000000 0.008044 0.000000 10.062520 10.063366 200
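Tallying the response_code column makes runs like the ones above easier to compare. A minimal sketch, assuming the benchmark output format shown above (one row per request, status code in the last column); the function name summarize_codes is hypothetical:

```shell
#!/usr/bin/env bash
# summarize_codes
# Reads curl_benchmark.sh output on stdin and prints "<code> <count>" per
# distinct response code. Rows whose last field is not numeric (the header
# line, shell prompts) are ignored.
summarize_codes() {
    awk '$NF ~ /^[0-9]+$/ { count[$NF]++ }
         END { for (code in count) print code, count[code] }' | sort
}
```

Usage: $HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg | summarize_codes. For the first run above this would report seven 200 responses and three 503 responses.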
- Add readiness to tenant, restart pods
readiness:
  exec:
    command:
      - /bin/sh
      - -c
      - if minio --version | grep -iq "minio"; then exit 0; else exit 1; fi
  initialDelaySeconds: 20
  periodSeconds: 5
for i in {0..3}; do k -n tenant-lite delete pod/storage-lite-pool-0-$i; done
for i in {4..7}; do k -n tenant-lite delete pod/storage-lite-pool-0-$i; done
k -n tenant-lite get pods --selector=v1.min.io/pool=pool-0
Observe no failures
➜ operator git:(reimplement-tenant-readiness) ✗ $HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg
namelookup connect appconnect pretransfer redirect starttransfer total response_code
0.005172 0.005393 0.000000 0.005416 0.000000 11.014145 11.014810 200
0.011349 0.011725 0.000000 0.011796 0.000000 11.020158 11.020735 200
0.007300 0.007566 0.000000 0.007610 0.000000 0.017514 0.018624 200
0.007877 0.008131 0.000000 0.008161 0.000000 1.014856 1.015400 200
0.004957 0.005171 0.000000 0.005195 0.000000 0.009952 0.010459 200
0.006830 0.007064 0.000000 0.007089 0.000000 0.013108 0.014128 200
0.003999 0.004216 0.000000 0.004234 0.000000 3.031393 3.032047 200
0.011215 0.011489 0.000000 0.011523 0.000000 0.017061 0.018782 200
0.005624 0.005876 0.000000 0.005904 0.000000 0.011993 0.012633 200
0.006657 0.006927 0.000000 0.006964 0.000000 10.018073 10.018586 200
➜ operator git:(reimplement-tenant-readiness) ✗ for i in {4..7}; do; k -n tenant-lite delete pod/storage-lite-pool-0-$i; done;
pod "storage-lite-pool-0-4" deleted
pod "storage-lite-pool-0-5" deleted
pod "storage-lite-pool-0-6" deleted
pod "storage-lite-pool-0-7" deleted
➜ operator git:(reimplement-tenant-readiness) ✗ $HOME/Documents/MinIO/curl_benchmark.sh http://home.k8s.local:10080/data/42489388690_1f48eec1cc_o.jpg
namelookup connect appconnect pretransfer redirect starttransfer total response_code
0.005810 0.006040 0.000000 0.006062 0.000000 11.017189 11.018060 200
0.010540 0.010842 0.000000 0.010885 0.000000 1.025492 1.026766 200
0.010946 0.011272 0.000000 0.011330 0.000000 1.030395 1.030933 200
0.009641 0.009934 0.000000 0.009977 0.000000 11.017343 11.017963 200
0.018904 0.019159 0.000000 0.019221 0.000000 0.030695 0.031390 200
0.015107 0.015332 0.000000 0.015382 0.000000 11.023996 11.024813 200
0.010994 0.011294 0.000000 0.011348 0.000000 0.017466 0.026548 200
0.011554 0.011783 0.000000 0.011815 0.000000 0.019325 0.020135 200
0.011058 0.011323 0.000000 0.011353 0.000000 0.021561 0.023456 200
0.011851 0.012105 0.000000 0.012131 0.000000 0.020735 0.022377 200
Some unknown activity in the MinIO server, which completes somewhere between 10 and 20 s of initialDelaySeconds, is what allows the pod to actually handle requests and return resource data.
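The 10 to 20 s window could be narrowed by polling the object right after a pod is deleted and recording how long it takes until the first 200. A rough sketch, not something that was run as part of these notes; the function names seconds_until_success and object_is_served are hypothetical:

```shell
#!/usr/bin/env bash
# seconds_until_success MAX_TRIES DELAY CMD [ARGS...]
# Re-runs CMD until it exits 0, then prints the elapsed whole seconds.
# Returns 1 (and prints nothing) if CMD never succeeds within MAX_TRIES.
seconds_until_success() {
    local max_tries=$1 delay=$2
    shift 2
    local start try=0
    start=$(date +%s)
    while [ "$try" -lt "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            echo $(( $(date +%s) - start ))
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 1
}

# object_is_served URL
# Succeeds only when URL answers with HTTP 200 within 5 seconds.
object_is_served() {
    [ "$(curl -s -o /dev/null --max-time 5 -w '%{response_code}' "$1")" = "200" ]
}
```

Usage, immediately after deleting a pod: seconds_until_success 60 1 object_is_served http://localhost:10080/data/42489388690_1f48eec1cc_o.jpg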
Question: How can a MinIO server pod in a tenant pool test itself, i.e. within the confines of its own container, to know whether it is able to handle requests for resources?
Current working attempt, based on the observation that a readiness probe with initialDelaySeconds=20 at startup leads to all 200 responses:
readiness:
  exec:
    command:
      - /bin/sh
      - -c
      - if minio --version | grep -iq "minio"; then exit 0; else exit 1; fi
  initialDelaySeconds: 20
  periodSeconds: 5
The currently available health endpoints include the following:
curl http://localhost:9000/minio/health/cluster -v
However, these endpoints depend on other pods and are therefore not viable. See the following log snippet from pod/storage-lite-pool-0-1:
Waiting for atleast 1 remote servers with valid configuration to be online
Following servers are currently offline or unreachable [http://storage-lite-pool-0-0.storage-lite-hl.tenant-lite.svc.cluster.local:9000/export0 is unreachable: Post "http://storage-lite-pool-0-0.storage-lite-hl.tenant-lite.svc.cluster.local:9000/minio/bootstrap/v1/verify": lookup storage-lite-pool-0-0.storage-lite-hl.tenant-lite.svc.cluster.local on 10.96.0.10:53: no such host http://storage-lite-pool-0-2.storage-lite-hl.tenant-lite.svc.cluster.local:9000/export0 is unreachable: Post "http://storage-lite-pool-0-2.storage-lite-hl.tenant-lite.svc.cluster.local:9000/minio/bootstrap/v1/verify": lookup storage-lite-pool-0-2.storage-lite-hl.tenant-lite.svc.cluster.local on 10.96.0.10:53: no such host http://storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/export0 is unreachable: Post "http://storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/minio/bootstrap/v1/verify": lookup storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local on 10.96.0.10:53: no such host]
This leads to a deadlock since all pods are waiting mutually for their counterparts to become ready. No state changes.
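The deadlock can be illustrated with a toy model. It assumes, in line with the "no such host" errors in the log above, that a pod's peers can only reach it once it is Ready, and that a pod's probe passes only when at least a given number of peers are reachable. The function name simulate_ready is hypothetical and the model is a deliberate simplification, not MinIO's actual bootstrap logic:

```shell
#!/usr/bin/env bash
# simulate_ready TOTAL QUORUM READY_AT_START
# Toy model: one more pod can turn Ready whenever at least QUORUM pods are
# already Ready. Prints how many pods end up Ready once nothing changes.
simulate_ready() {
    local total=$1 quorum=$2 ready=$3
    while [ "$ready" -lt "$total" ] && [ "$ready" -ge "$quorum" ]; do
        ready=$((ready + 1))
    done
    echo "$ready"
}

simulate_ready 4 1 0   # all pods restarted at once: stays at 0 (deadlock)
simulate_ready 4 1 1   # one pod already Ready: all 4 become Ready
```

In this model, a probe that depends on peers being Ready can never bootstrap a pool from zero Ready pods, which matches the lack of state changes observed here.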
/health/live and /health/ready always return success, regardless of whether the pod can serve requests.
/health/cluster/read returns a result based on a quorum from among the nodes online (see similar /health/cluster above).
readiness:
  httpGet:
    port: 9000
    path: /minio/health/cluster
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 1
The same quorum issue applies to the following attempt at a local curl. The MinIO server checks its pool for other nodes online, causing a deadlock if they are not ready. Therefore the following readiness probe fails.
readiness:
  exec:
    command:
      - /bin/sh
      - -c
      - |-
        ready=$(curl http://localhost:9000/data/42489388690_1f48eec1cc_o.jpg -s -o /dev/null -w "%{response_code}"); if [ "${ready}" -lt 500 ]; then exit 0; else exit 1; fi
  initialDelaySeconds: 20
  periodSeconds: 5
Logging into the Minio server container, there is an /export[0..n] directory depending on the number of volumes chosen. A test can be made to validate that these directories (resources) exist to determine readiness. This failed, since it does not in fact determine readiness.
readiness:
  exec:
    command:
      - ls
      - /export0
  initialDelaySeconds: 1
  periodSeconds: 1
In cmd/healthcheck-handler.go in minio, modify:
// ReadinessCheckHandler checks whether the object layer can serve reads.
// Returns 503 until it can, 200 afterwards.
func ReadinessCheckHandler(w http.ResponseWriter, r *http.Request) {
	ctx := newContext(r, w, "ReadinessCheckHandler")

	if shouldProxy() {
		// Service not initialized yet: report unavailable and return early
		// rather than calling ReadHealth on an uninitialized object layer.
		w.Header().Set(xhttp.MinIOServerStatus, unavailable)
		writeResponse(w, http.StatusServiceUnavailable, nil, mimeNone)
		return
	}

	objLayer := newObjectLayerFn()

	// Borrowed from
	// https://github.com/etcd-io/etcd/blob/main/etcdctl/ctlv3/command/ep_command.go#L118
	ctx, cancel := context.WithTimeout(ctx, defaultContextTimeout)
	defer cancel()

	result := objLayer.ReadHealth(ctx)
	if !result {
		writeResponse(w, http.StatusServiceUnavailable, nil, mimeNone)
		return
	}
	writeResponse(w, http.StatusOK, nil, mimeNone)
}
To non-TLS tenant add probe:
readiness:
  httpGet:
    port: 9000
    path: /minio/health/ready
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 1
Observe that pods become ready after 5 seconds, once the object layer has been tested. However, MinIO still sees the pods as unavailable even though an internal curl works, e.g.
k -n tenant-lite exec -it storage-lite-pool-0-0 -- /bin/sh
curl http://localhost:9000/minio/health/ready --insecure -v
The same holds for a public resource, e.g. from the same pod:
k -n tenant-lite delete pod/storage-lite-pool-0-0
k -n tenant-lite exec -it storage-lite-pool-0-0 -- /bin/sh
curl http://localhost:9000/minio/health/ready --insecure -v
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> GET /minio/health/ready HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Length: 0
< Content-Security-Policy: block-all-mixed-content
< Server: MinIO
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Vary: Origin
< X-Amz-Request-Id: 173D6BCB755A4797
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< Date: Wed, 25 Jan 2023 02:19:18 GMT
<
* Connection #0 to host localhost left intact
or
k -n tenant-lite delete pod/storage-lite-pool-0-0
k -n tenant-lite exec -it storage-lite-pool-0-0 -- /bin/sh
curl http://localhost:9000/test/156157869_9e929563d3_o.jpg
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
From another pod:
k -n tenant-lite delete pod/storage-lite-pool-0-1
k -n tenant-lite exec -it storage-lite-pool-0-0 -- /bin/sh
curl http://localhost:9000/minio/health/ready --insecure -v
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> GET /minio/health/ready HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Length: 0
< Content-Security-Policy: block-all-mixed-content
< Server: MinIO
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Vary: Origin
< X-Amz-Request-Id: 173D6BCB755A4797
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< Date: Wed, 25 Jan 2023 02:19:18 GMT
<
* Connection #0 to host localhost left intact
or
k -n tenant-lite delete pod/storage-lite-pool-0-1
k -n tenant-lite exec -it storage-lite-pool-0-0 -- /bin/sh
curl http://localhost:9000/test/156157869_9e929563d3_o.jpg
.
.
long timeout...
.
.
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.