
Empty objects under storageClaims bypass TiDB’s validation #4613

Open

hoyhbx opened this issue Jul 6, 2022 · 2 comments

hoyhbx (Contributor) commented Jul 6, 2022

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-06T18:55:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  40m

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-1       Bound    pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-2       Bound    pvc-d41691d8-feb5-4e21-b282-1ece1851cffa   10Gi       RWO            standard       40m
tikv-advanced-tidb-tikv-0   Bound    pvc-42e652d6-2400-4ae8-b790-cef8466e4566   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-1   Bound    pvc-5af08c43-e02d-433c-896a-b85ad568d1ca   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-2   Bound    pvc-652761b6-9fff-4080-b13d-9e364062cddc   100Gi      RWO            standard       39m

What’s the status of the TiDB cluster pods?

$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl   1/1     Running   0          41m   10.244.2.2   test-worker3   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          41m   10.244.3.4   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          41m   10.244.2.4   test-worker3   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          41m   10.244.1.4   test-worker2   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          39m   10.244.3.7   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          39m   10.244.2.7   test-worker3   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          39m   10.244.1.7   test-worker2   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          40m   10.244.2.6   test-worker3   <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          40m   10.244.3.6   test-worker    <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          40m   10.244.1.6   test-worker2   <none>           <none>
tidb-controller-manager-6cc68f8949-52vwt   1/1     Running   0          41m   10.244.3.2   test-worker    <none>           <none>
tidb-scheduler-dd569b6b4-hj294             2/2     Running   0          41m   10.244.1.2   test-worker2   <none>           <none>

What did you do?

We created a CR with TiFlash configured as follows and deployed it with kubectl apply -f.

The CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
    - {}
    - {}
    - {}
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true

Note that we assigned empty objects to the items under storageClaims.

What did you expect to see?
We expected the input to be rejected, since storageClaims is essentially empty. Additionally, we expected to see the error message "storageClaims should be configured at least one item." as this line suggests.

What did you see instead?
The TiFlash pods were never created, but we did not see the error message mentioned above. In fact, the operator logged no error at all indicating why the TiFlash pods failed to be created.

Possible reason
We believe the validation here is not comprehensive: even when storageClaims is populated with empty objects, the CR bypasses this if statement, because the length of spec.StorageClaims is 3 in our case (see the sketch below).
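
For context, the guard we are referring to behaves roughly like the sketch below. This is a paraphrase, not the actual operator source; the function name currentCheck is ours, and the error text is quoted from the message we expected to see:

package validation

import (
    "fmt"

    "github.com/pingcap/tidb-operator/pkg/apis/pingcap/v1alpha1"
)

// currentCheck mirrors the length-only guard. A slice of three empty
// objects ({}, {}, {}) has len(claims) == 3, so the guard passes even
// though no claim carries a storage request.
func currentCheck(claims []v1alpha1.StorageClaim) error {
    if len(claims) < 1 {
        return fmt.Errorf("storageClaims should be configured at least one item")
    }
    return nil
}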

In addition, kubectl shows that spec.storageClaims[INDEX].resources.requests.storage is effectively a required field, yet it is labelled as optional in types.go at this line, which appears to be incorrect.
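
For reference, the declaration in question looks roughly like this; this is a paraphrase, and the doc comments may differ from the actual source:

// Paraphrased from pkg/apis/pingcap/v1alpha1/types.go (corev1 is
// k8s.io/api/core/v1). Per this issue, the +optional marker on
// Resources is misleading: a storage request under it is effectively
// required for the PVCs to be created.
type StorageClaim struct {
    // Resources represents the minimum resources the volume should have.
    // +optional
    Resources corev1.ResourceRequirements `json:"resources,omitempty"`
    // Name of the StorageClass required by the claim.
    // +optional
    StorageClassName *string `json:"storageClassName,omitempty"`
}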

mikechengwei (Contributor) commented

Do you mean the operator needs to validate the storageClaims configuration? If it is not currently configured, it will use the default storage.

hoyhbx (Contributor, Author) commented Jul 15, 2022

Hi @mikechengwei, thanks for your reply. Yes, you are absolutely correct: the operator validates storageClaims by checking the length of the slice, and if the length is smaller than 1, it uses the default storage. But the problem is triggered when storageClaims contains several items and the items themselves are empty. For example:

    storageClaims:
    - {}
    - {}
    - {}

In this case, the CR bypasses the validation, and the TiFlash pods never come up. A per-item check would catch this; see the sketch below.
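
A minimal sketch of such a stricter check, assuming the v1alpha1.StorageClaim type from types.go; validateStorageClaims is a hypothetical name, and where to call it in the sync loop would be up to the maintainers:

package validation

import (
    "fmt"

    "github.com/pingcap/tidb-operator/pkg/apis/pingcap/v1alpha1"
    corev1 "k8s.io/api/core/v1"
)

// validateStorageClaims rejects an empty slice and, unlike the current
// length-only guard, also rejects empty items: every claim must request
// a storage quantity, since the generated volumeClaimTemplates cannot
// produce a usable PVC without one.
func validateStorageClaims(claims []v1alpha1.StorageClaim) error {
    if len(claims) < 1 {
        return fmt.Errorf("storageClaims should be configured at least one item")
    }
    for i, claim := range claims {
        if _, ok := claim.Resources.Requests[corev1.ResourceStorage]; !ok {
            return fmt.Errorf("storageClaims[%d] must set resources.requests.storage", i)
        }
    }
    return nil
}

With such a check, the CR above would be rejected with an explicit error instead of silently never creating the TiFlash pods.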
