
[TiDB] podManagementPolicy silently changes to Parallel if the value is invalid #150

Closed
unw9527 opened this issue Jul 22, 2022 · 4 comments
unw9527 commented Jul 22, 2022

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-07T21:33:51Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  9m4s

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-5d85aab1-3a5b-4bd2-a096-83be8a6e4c63   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-1       Bound    pvc-22015e6c-601d-409e-b480-07458ce53711   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-2       Bound    pvc-aa1bfdda-b2be-44ed-a721-a17ef5d6140c   10Gi       RWO            standard       8m37s
tikv-advanced-tidb-tikv-0   Bound    pvc-a702225b-1b7f-4363-8975-04f9170f5853   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-1   Bound    pvc-fc633404-97d4-4359-9921-61faaf26b4c8   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-2   Bound    pvc-69b6c9eb-6dcf-4ae6-91b8-912551c3c1d4   100Gi      RWO            standard       7m11s

What's the status of the TiDB cluster pods?

NAME                                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-zmg5d   1/1     Running   0          9m21s   10.244.3.2   test-worker2   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          9m20s   10.244.2.3   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          9m20s   10.244.3.4   test-worker2   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          9m20s   10.244.1.5   test-worker3   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          6m48s   10.244.2.6   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          6m48s   10.244.3.7   test-worker2   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          6m48s   10.244.1.8   test-worker3   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          7m54s   10.244.2.5   test-worker    <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          7m54s   10.244.3.6   test-worker2   <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          7m54s   10.244.1.7   test-worker3   <none>           <none>
tidb-controller-manager-5c775c65f5-24sw4   1/1     Running   0          9m36s   10.244.1.3   test-worker3   <none>           <none>
tidb-scheduler-5665b7f8fb-j4864            2/2     Running   0          9m36s   10.244.1.2   test-worker3   <none>           <none>

What did you do?

The field spec.pd.podManagementPolicy silently falls back to Parallel if the user supplies any value other than OrderedReady.
The CR file we used is shown below. Note that it contains a typo in spec.pd.podManagementPolicy: OrderredReady instead of OrderedReady.

CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
    podManagementPolicy: OrderredReady
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true

What did you expect to see?
We expected the input to be rejected with a clear error message, or to fall back to a safe value with a warning logged.

What did you see instead?
We saw that spec.pd.podManagementPolicy was silently changed to Parallel, with no error message.

Additional comment
We inspected the source code and found that podManagementPolicy silently falls back to Parallel if it is any string other than OrderedReady (see this function). We believe that if a user provides an invalid podManagementPolicy, the operator should log a warning indicating that the value is invalid before adopting a default.
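The fallback can be reproduced in isolation. The sketch below mirrors the behavior we observed; the function and variable names are illustrative, not the operator's actual code:

```go
package main

import "fmt"

// getPodManagementPolicy mirrors the observed operator behavior (names are
// illustrative): anything other than "OrderedReady" silently becomes
// "Parallel", so a misspelled value produces no warning at all.
func getPodManagementPolicy(spec string) string {
	if spec == "OrderedReady" {
		return "OrderedReady"
	}
	return "Parallel"
}

func main() {
	// The typo "OrderredReady" is treated the same as an empty value.
	fmt.Println(getPodManagementPolicy("OrderredReady")) // Parallel
}
```

Because the invalid string and the intentional default take the same code path, there is no point at which the operator can distinguish a typo from an omitted field.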

@unw9527 unw9527 added the bug Something isn't working label Jul 22, 2022
@tylergu tylergu changed the title [TiDB] podManagementPolicy silently falls back to a pre-defined constant if it is invalid [TiDB] podManagementPolicy silently changes to Parallel if the value is invalid Jul 22, 2022
unw9527 commented Jul 27, 2022

Issued: pingcap/tidb-operator#4649

unw9527 commented Jul 27, 2022

PR created: pingcap/tidb-operator#4650

unw9527 commented Aug 3, 2022

Confirmed: pingcap/tidb-operator#4650 (comment)

But as discussed with @tylergu, their proposed solution is not optimal. We can use an enum to restrict users' choices to the two valid inputs only. I have updated the PR.
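A hedged sketch of the enum-based approach (field and marker names are illustrative; see pingcap/tidb-operator#4650 for the actual change): a kubebuilder enum marker makes the generated CRD schema reject invalid values at admission time. The helper below mimics what that generated OpenAPI enum check enforces.

```go
package main

import "fmt"

// Illustrative only: in the CRD types, the field would carry an enum marker,
// e.g.
//
//	// +kubebuilder:validation:Enum=Parallel;OrderedReady
//	PodManagementPolicy string `json:"podManagementPolicy,omitempty"`
//
// validatePodManagementPolicy models the resulting server-side check.
func validatePodManagementPolicy(v string) error {
	switch v {
	case "Parallel", "OrderedReady":
		return nil
	default:
		return fmt.Errorf("invalid podManagementPolicy %q: must be Parallel or OrderedReady", v)
	}
}

func main() {
	// The typo from this bug report is now rejected instead of silently changed.
	if err := validatePodManagementPolicy("OrderredReady"); err != nil {
		fmt.Println(err)
	}
}
```

With the enum in the CRD schema, `kubectl apply` fails fast on the typo rather than the operator quietly substituting Parallel.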

tianyin commented Aug 3, 2022

awesome

@unw9527 unw9527 closed this as completed Aug 17, 2022