What did you do to encounter the bug?
We first created a MongoDB cluster using a standard CR.
Then we tried to add ephemeral containers to the MongoDB Pods by specifying them in the spec.statefulSet.spec.template.spec.ephemeralContainers field.
The MongoDB cluster became unhealthy because the ephemeralContainers spec was invalid, so we tried to recover by deleting spec.statefulSet.spec.template.spec.ephemeralContainers from the CR. However, the operator is not able to recover the cluster after this manual revert. It always waits for all members to reach the desired state before applying the updated Pod template (with the ephemeral container removed), but the StatefulSet never becomes ready because every Pod created from the current template is rejected (see the FailedCreate events below). The result is an infinite loop.
To recover, we had to delete the cluster and redeploy it.
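The loop described above can be illustrated with a toy model. This is a hypothetical sketch, not the operator's reconciliation code: the types `cluster` and `podTemplate` and the functions `operatorStep`, `statefulSetControllerStep`, and `simulate` are invented for illustration. It assumes that one of the three Pods was deleted while the invalid template was in place (as in the SuccessfulDelete event for test-cluster-2 under "What happened instead?") and that the operator only pushes the CR's template to the StatefulSet once all members are ready.

```go
package main

import "fmt"

// podTemplate is a hypothetical stand-in for the StatefulSet Pod template.
type podTemplate struct {
	Valid bool // false if Pods built from this template are rejected on create
}

// cluster is a hypothetical model of a 3-member StatefulSet where one Pod
// was deleted while the invalid template was in place.
type cluster struct {
	Members  int
	Ready    int
	Template podTemplate
}

// statefulSetControllerStep models one StatefulSet controller pass: a missing
// Pod is recreated only if the current template is valid.
func statefulSetControllerStep(c *cluster) {
	if c.Ready < c.Members && c.Template.Valid {
		c.Ready++
	}
}

// operatorStep models the gating described in this issue: the desired
// template from the CR is applied only once all members are ready.
func operatorStep(c *cluster, desired podTemplate, waitForReady bool) {
	if waitForReady && c.Ready < c.Members {
		return // keep waiting; the reverted template is never applied
	}
	c.Template = desired
}

// simulate runs both loops for a number of rounds and reports whether every
// member ends up ready.
func simulate(waitForReady bool, rounds int) bool {
	c := &cluster{Members: 3, Ready: 2, Template: podTemplate{Valid: false}}
	reverted := podTemplate{Valid: true} // CR after removing ephemeralContainers
	for i := 0; i < rounds; i++ {
		operatorStep(c, reverted, waitForReady)
		statefulSetControllerStep(c)
	}
	return c.Ready == c.Members
}

func main() {
	fmt.Println("gated on readiness, recovered:", simulate(true, 10))
	fmt.Println("not gated, recovered:", simulate(false, 10))
}
```

With the readiness gate, the model never recovers no matter how many rounds run, because readiness depends on the template that the gate refuses to replace. The ungated variant is only there to show where the cycle comes from; it is not a proposed fix for the operator.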
Steps to reproduce the behavior:
Deploy the MongoDB cluster with spec:
    apiVersion: mongodbcommunity.mongodb.com/v1
    kind: MongoDBCommunity
    metadata:
      namespace: mongodb
      name: test-cluster
    spec:
      automationConfig:
        processes:
          - disabled: false
            name: test-cluster-1
      members: 3
      type: ReplicaSet
      version: "4.4.0"
      security:
        authentication:
          modes: ["SCRAM"]
      users:
        - name: my-user
          db: admin
          passwordSecretRef: # a reference to the secret that will be used to generate the user's password
            name: my-user-password
          roles:
            - name: clusterAdmin
              db: admin
            - name: userAdminAnyDatabase
              db: admin
          scramCredentialsSecretName: my-scram
      statefulSet:
        spec:
          template:
            spec:
              containers:
                - name: mongod
                  resources:
                    limits:
                      cpu: '1'
                      memory: 1000M
                    requests:
                      cpu: '1'
                      memory: 1000M
                - name: mongodb-agent
                  resources:
                    limits:
                      cpu: '1'
                      memory: 1000M
                    requests:
                      cpu: '1'
                      memory: 1000M
Add an ephemeral container to the StatefulSet Pod template by applying:
    apiVersion: mongodbcommunity.mongodb.com/v1
    kind: MongoDBCommunity
    metadata:
      namespace: mongodb
      name: test-cluster
    spec:
      automationConfig:
        processes:
          - disabled: false
            name: test-cluster-1
      members: 3
      type: ReplicaSet
      version: "4.4.0"
      security:
        authentication:
          modes: ["SCRAM"]
      users:
        - name: my-user
          db: admin
          passwordSecretRef: # a reference to the secret that will be used to generate the user's password
            name: my-user-password
          roles:
            - name: clusterAdmin
              db: admin
            - name: userAdminAnyDatabase
              db: admin
          scramCredentialsSecretName: my-scram
      statefulSet:
        spec:
          template:
            spec:
              containers:
                - name: mongod
                  resources:
                    limits:
                      cpu: '1'
                      memory: 1000M
                    requests:
                      cpu: '1'
                      memory: 1000M
                - name: mongodb-agent
                  resources:
                    limits:
                      cpu: '1'
                      memory: 1000M
                    requests:
                      cpu: '1'
                      memory: 1000M
              ephemeralContainers:
                - name: ACTOCONTAINER
                  resources:
                    limits:
                      cpu: 800m
What did you expect?
The operator should be able to recover the cluster after the manual revert.
What happened instead?
The operator is stuck and cannot make any progress even after manually reverting the CR.
Normal SuccessfulCreate 48m statefulset-controller create Pod test-cluster-0 in StatefulSet test-cluster successful
Normal SuccessfulCreate 46m statefulset-controller create Pod test-cluster-1 in StatefulSet test-cluster successful
Normal SuccessfulCreate 46m statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster successful
Normal SuccessfulDelete 2m15s statefulset-controller delete Pod test-cluster-2 in StatefulSet test-cluster successful
Warning FailedCreate 93s (x2 over 2m14s) statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster failed error: Pod "test-cluster-2" is invalid: [spec.ephemeralContainers[0][0].name: Invalid value: "ACTOKEY": a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), spec.ephemeralContainers[0][0].image: Required value, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource for containers, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource for containers, spec.ephemeralContainers[0].resources: Forbidden: cannot be set for an Ephemeral Container, spec.ephemeralContainers: Forbidden: cannot be set on create]
Warning FailedCreate 52s (x13 over 2m14s) statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster failed error: Pod "test-cluster-2" is invalid: [spec.ephemeralContainers[0][0].name: Invalid value: "ACTOKEY": a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), spec.ephemeralContainers[0][0].image: Required value, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource for containers, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource for containers, spec.ephemeralContainers[0].resources: Forbidden: cannot be set for an Ephemeral Container, spec.ephemeralContainers: Forbidden: cannot be set on create]
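The first error in these events is the container-name check, and the event quotes the pattern it uses. The sketch below applies that pattern to the names from this report; the function name `isValidContainerName` is hypothetical. The 63-character cap does not appear in the event message; it is the length limit Kubernetes applies to RFC 1123 labels. The other errors (missing image, resources set on an ephemeral container, and "spec.ephemeralContainers: Forbidden: cannot be set on create") are not modeled here. The last one means Pods created from this template are rejected whenever ephemeralContainers is non-empty, whatever the container is named.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label is the pattern quoted in the FailedCreate event, anchored so
// the whole name must match.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidContainerName reports whether name is a lowercase RFC 1123 label
// of at most 63 characters.
func isValidContainerName(name string) bool {
	return len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	for _, name := range []string{"ACTOKEY", "ACTOCONTAINER", "mongodb-agent", "debug-1"} {
		fmt.Printf("%-14s valid=%v\n", name, isValidContainerName(name))
	}
}
```

Both uppercase names from this report fail the check, while names such as mongodb-agent pass.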
tylergu changed the title on Mar 22, 2023, from "[BUG] mongodb-kubernetes-operator: pod deleted and system unable to recover when invalid ephemeral container is specified" to "[BUG] mongodb-kubernetes-operator: operator stuck and unable to recover mongodb if the podTemplate is rejected by the statefulSet controller", and again the same day to "[BUG] mongodb-kubernetes-operator: operator stuck and unable to recover the mongodb if the podTemplate is rejected by the statefulSet controller".
Operator Information
Operator version: 0.7.4
MongoDB version: 4.4.0
Kubernetes Cluster Information
kubectl version --short --output=yaml