Setting unsatisfiable affinity requirements leads to broken TiDB cluster #4960

Open
hoyhbx opened this issue Apr 7, 2023 · 1 comment

Comments

@hoyhbx
Contributor

hoyhbx commented Apr 7, 2023

Hey TiDB developers,

We found that if spec.tidb.affinity is set to an affinity requirement that the current cluster cannot satisfy, at least one TiDB pod fails: the StatefulSet controller restarts the pod with the updated affinity requirement, and the Kubernetes scheduler then cannot find a node for it. More importantly, the TiDB operator cannot recover the cluster from this failure, because it always waits for all pods to become ready before applying the next update (as already reported in #4946).

A concrete example is to run TiDB in a cluster with only 3 nodes and set:

spec:
  tidb:
    replicas: 5
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - test-cluster
            topologyKey: kubernetes.io/hostname

The TiDB pod keeps failing to be scheduled because no node in the cluster can satisfy the affinity rule.
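
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not operator code: the function name is hypothetical, and it assumes every node is otherwise schedulable and that the anti-affinity selector matches all TiDB pods, so each node can host at most one of them.

```python
from typing import Tuple


def placement_under_hostname_anti_affinity(replicas: int, nodes: int) -> Tuple[int, int]:
    """Return (schedulable, pending) pod counts.

    Assumption: a required anti-affinity rule with
    topologyKey kubernetes.io/hostname allows at most one
    matching pod per node, and nothing else blocks scheduling.
    """
    if replicas < 0 or nodes < 0:
        raise ValueError("replicas and nodes must be non-negative")
    schedulable = min(replicas, nodes)
    return schedulable, replicas - schedulable


# The case from this issue: 5 replicas on a 3-node cluster.
print(placement_under_hostname_anti_affinity(5, 3))  # (3, 2)
```

With 5 replicas and 3 nodes, at least 2 pods can never be placed while the rule is in force, which is why the rolling update stalls.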

We are not sure whether this should count as a bug or how to fix it, because it is difficult for the operator to tell whether an affinity requirement is unsatisfiable before updating the StatefulSet: the scheduling logic is implemented in the scheduler. However, the consequence is severe. The TiDB pods cannot start, and we found no way to recover from the failure (neither resetting the affinity nor restarting the operator works). We also found that affinity is not the only property that can break the TiDB StatefulSet; many other properties, such as priorityClassName, can harm the reliability of the cluster when set incorrectly.

We are opening this issue separately to discuss the best practice for handling this problem, and what functionality Kubernetes should provide to make this validation easier. Is there a way to prevent the bad operation from happening in the first place, or a way for tidb-operator to automatically recognize that the StatefulSet is stuck and recover from it? If you know of a practical code fix for this issue, we are happy to send a PR.
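
As a starting point for the "recognize that the StatefulSet is stuck" idea, here is a minimal Python sketch. It assumes the pods are available as plain dictionaries in the shape of the Kubernetes Pod API (for example the `items` of `kubectl get pods -o json`), and it relies on the scheduler marking an unplaceable pod with a `PodScheduled` condition whose status is `False` and whose reason is `Unschedulable`. The function name and the pod names in the sample data are hypothetical; this is not how tidb-operator works today.

```python
from typing import Any, Dict, List


def unschedulable_pods(pods: List[Dict[str, Any]]) -> List[str]:
    """Return the names of Pending pods the scheduler could not place."""
    stuck = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                stuck.append(pod["metadata"]["name"])
                break
    return stuck


# Hypothetical sample data: one healthy pod, one stuck pod.
example = [
    {"metadata": {"name": "test-cluster-tidb-3"},
     "status": {"phase": "Running",
                "conditions": [{"type": "PodScheduled", "status": "True"}]}},
    {"metadata": {"name": "test-cluster-tidb-4"},
     "status": {"phase": "Pending",
                "conditions": [{"type": "PodScheduled", "status": "False",
                                "reason": "Unschedulable"}]}},
]
print(unschedulable_pods(example))  # ['test-cluster-tidb-4']
```

A check like this only detects the symptom after the update has been applied; it does not tell the operator in advance whether a new affinity rule is satisfiable.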

@csuzhangxc
Member

As you have said, "it is difficult for the operator to tell whether the affinity requirement is unsatisfiable before updating the statefulset."

In the blocking cases above, can you try deleting the old StatefulSet (without deleting the pods, so that they become orphans) and letting TiDB-Operator recreate a new StatefulSet with the correct spec?
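
One way to try this suggestion is kubectl's orphan cascade mode, which deletes the StatefulSet object but leaves its pods running. The Python sketch below only builds the command line and does not run it. The helper name, the StatefulSet name `test-cluster-tidb`, and the namespace `tidb` are assumptions for illustration, so check the real names with `kubectl get statefulset` first. The `--cascade=orphan` flag needs kubectl 1.20 or newer; older versions used `--cascade=false`.

```python
from typing import List


def orphan_delete_command(statefulset: str, namespace: str) -> List[str]:
    """Build a kubectl command that deletes a StatefulSet but keeps its pods."""
    return [
        "kubectl", "delete", "statefulset", statefulset,
        "--namespace", namespace,
        "--cascade=orphan",  # leave the pods behind as orphans
    ]


# Hypothetical names; replace with the ones from your cluster.
print(" ".join(orphan_delete_command("test-cluster-tidb", "tidb")))
```

Whether the operator then recreates the StatefulSet cleanly and adopts the orphaned pods is exactly what the comment above asks to be tested, so treat this as an experiment rather than a confirmed fix.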
