podManagementPolicy silently changes to Parallel if the value is invalid #4649

Open

hoyhbx (Contributor) opened this issue Jul 27, 2022 · 0 comments

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-07T21:33:51Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  9m4s

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-5d85aab1-3a5b-4bd2-a096-83be8a6e4c63   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-1       Bound    pvc-22015e6c-601d-409e-b480-07458ce53711   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-2       Bound    pvc-aa1bfdda-b2be-44ed-a721-a17ef5d6140c   10Gi       RWO            standard       8m37s
tikv-advanced-tidb-tikv-0   Bound    pvc-a702225b-1b7f-4363-8975-04f9170f5853   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-1   Bound    pvc-fc633404-97d4-4359-9921-61faaf26b4c8   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-2   Bound    pvc-69b6c9eb-6dcf-4ae6-91b8-912551c3c1d4   100Gi      RWO            standard       7m11s

What's the status of the TiDB cluster pods?

NAME                                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-zmg5d   1/1     Running   0          9m21s   10.244.3.2   test-worker2   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          9m20s   10.244.2.3   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          9m20s   10.244.3.4   test-worker2   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          9m20s   10.244.1.5   test-worker3   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          6m48s   10.244.2.6   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          6m48s   10.244.3.7   test-worker2   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          6m48s   10.244.1.8   test-worker3   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          7m54s   10.244.2.5   test-worker    <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          7m54s   10.244.3.6   test-worker2   <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          7m54s   10.244.1.7   test-worker3   <none>           <none>
tidb-controller-manager-5c775c65f5-24sw4   1/1     Running   0          9m36s   10.244.1.3   test-worker3   <none>           <none>
tidb-scheduler-5665b7f8fb-j4864            2/2     Running   0          9m36s   10.244.1.2   test-worker3   <none>           <none>

What did you do?

The field spec.pd.podManagementPolicy silently falls back to Parallel if the user supplies any value other than OrderedReady.
The CR file we used is shown below; note that it contains a typo in spec.pd.podManagementPolicy: OrderredReady instead of OrderedReady.

CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
    podManagementPolicy: OrderredReady
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true

What did you expect to see?
We expected the input to be rejected with a clear error message, or to fall back to a safe value while logging an error.

What did you see instead?
We saw that spec.pd.podManagementPolicy was silently changed to Parallel, with no error message.

Additional comment
We inspected the source code and found that podManagementPolicy silently falls back to Parallel when it is any string other than OrderedReady (see this function). We believe that when a user provides an invalid podManagementPolicy, the operator should log that the value is invalid before adopting its default.
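The behavior described above, and the change this issue proposes, can be sketched as follows. The function and type names here are illustrative, not the operator's actual identifiers; the string values mirror the apps/v1 PodManagementPolicyType constants:

```go
package main

import "log"

// PodManagementPolicyType mirrors the apps/v1 type of the same name.
type PodManagementPolicyType string

const (
	OrderedReadyPodManagement PodManagementPolicyType = "OrderedReady"
	ParallelPodManagement     PodManagementPolicyType = "Parallel"
)

// currentBehavior reproduces the silent fallback: any value other than
// OrderedReady, including a typo, becomes Parallel with no diagnostic.
func currentBehavior(p PodManagementPolicyType) PodManagementPolicyType {
	if p == OrderedReadyPodManagement {
		return OrderedReadyPodManagement
	}
	return ParallelPodManagement
}

// proposedBehavior keeps the same default but logs when the value is not a
// recognized policy, so a typo such as "OrderredReady" is no longer silent.
func proposedBehavior(p PodManagementPolicyType) PodManagementPolicyType {
	switch p {
	case OrderedReadyPodManagement, ParallelPodManagement:
		return p
	case "":
		return ParallelPodManagement // unset: defaulting silently is fine
	default:
		log.Printf("invalid podManagementPolicy %q, falling back to %q", p, ParallelPodManagement)
		return ParallelPodManagement
	}
}

func main() {
	log.Println(currentBehavior("OrderredReady"))  // Parallel, no warning
	log.Println(proposedBehavior("OrderredReady")) // Parallel, after a warning
}
```

An alternative to logging would be rejecting the value outright via CRD enum validation, which would surface the typo at kubectl apply time rather than in operator logs.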
