What did you do to encounter the bug?
We first created a MongoDB cluster using a standard CR.
Then we tried to add ephemeralContainers to the MongoDB Pods by specifying them in the spec.statefulSet.spec.template.ephemeralContainers field.
The MongoDB cluster became unhealthy because the ephemeralContainers spec was invalid, so we tried to recover by deleting spec.statefulSet.spec.template.ephemeralContainers. However, the operator is not able to recover the cluster after we manually revert the CR. It always waits for all members to reach the desired state before proceeding to update the StatefulSet, but the StatefulSet is never going to become ready because the spec it currently holds is rejected when the StatefulSet controller tries to create Pods from it. The operator therefore waits forever and the revert is never applied.
To fix this problem, we had to delete the cluster and redeploy it.
Note that the bug can be triggered by any invalid input that is rejected by the StatefulSet controller; it is not limited to ephemeralContainers.
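The deadlock described above can be illustrated with a toy model. This is only a sketch and is not the operator's actual code: the `state` type and the `reconcile`, `stepStatefulSet`, and `recovers` functions are hypothetical names invented for this example. It assumes the operator holds back StatefulSet updates until all members are ready, as reported, and compares that with a variant that applies the reverted spec regardless of readiness.

```go
package main

import "fmt"

// state is a hypothetical, simplified view of the StatefulSet.
type state struct {
	specValid bool // whether the Pod template currently stored in the StatefulSet is accepted
	ready     int  // members that are running and ready
	want      int  // desired number of members
}

// stepStatefulSet models one tick of the StatefulSet controller:
// it can create a missing Pod only if the stored template is valid.
func stepStatefulSet(s state) state {
	if s.specValid && s.ready < s.want {
		s.ready++
	}
	return s
}

// reconcile models one operator reconcile. With waitForReady set, it
// refuses to touch the StatefulSet until every member is ready, which is
// the behavior described in this report.
func reconcile(s state, desiredValid bool, waitForReady bool) state {
	if waitForReady && s.ready < s.want {
		return s // requeue without applying the (reverted) spec
	}
	s.specValid = desiredValid
	return s
}

// recovers starts from the broken situation (invalid template stored,
// one Pod deleted and not recreated) after the user has reverted the CR,
// and reports whether the cluster becomes fully ready within ten rounds.
func recovers(waitForReady bool) bool {
	s := state{specValid: false, ready: 2, want: 3}
	for i := 0; i < 10; i++ {
		s = reconcile(s, true, waitForReady)
		s = stepStatefulSet(s)
	}
	return s.ready == s.want
}

func main() {
	fmt.Println("wait-for-ready recovers:", recovers(true))
	fmt.Println("apply-regardless recovers:", recovers(false))
}
```

In this model the wait-for-ready variant never leaves the broken state, because readiness depends on a spec change that is itself gated on readiness. Whether applying the spec regardless of readiness is safe for the real operator is a separate design question that the sketch does not answer.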
Steps to reproduce the behavior:
Deploy the MongoDB cluster with spec:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  namespace: mongodb
  name: test-cluster
spec:
  automationConfig:
    processes:
    - disabled: false
      name: test-cluster-1
  members: 3
  type: ReplicaSet
  version: "4.4.0"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
  - name: my-user
    db: admin
    passwordSecretRef: # a reference to the secret that will be used to generate the user's password
      name: my-user-password
    roles:
    - name: clusterAdmin
      db: admin
    - name: userAdminAnyDatabase
      db: admin
    scramCredentialsSecretName: my-scram
  statefulSet:
    spec:
      template:
        spec:
          containers:
          - name: mongod
            resources:
              limits:
                cpu: '1'
                memory: 1000M
              requests:
                cpu: '1'
                memory: 1000M
          - name: mongodb-agent
            resources:
              limits:
                cpu: '1'
                memory: 1000M
              requests:
                cpu: '1'
                memory: 1000M
Add ephemeralContainer to the statefulset template by applying:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  namespace: mongodb
  name: test-cluster
spec:
  automationConfig:
    processes:
    - disabled: false
      name: test-cluster-1
  members: 3
  type: ReplicaSet
  version: "4.4.0"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
  - name: my-user
    db: admin
    passwordSecretRef: # a reference to the secret that will be used to generate the user's password
      name: my-user-password
    roles:
    - name: clusterAdmin
      db: admin
    - name: userAdminAnyDatabase
      db: admin
    scramCredentialsSecretName: my-scram
  statefulSet:
    spec:
      template:
        spec:
          containers:
          - name: mongod
            resources:
              limits:
                cpu: '1'
                memory: 1000M
              requests:
                cpu: '1'
                memory: 1000M
          - name: mongodb-agent
            resources:
              limits:
                cpu: '1'
                memory: 1000M
              requests:
                cpu: '1'
                memory: 1000M
          ephemeralContainers:
          - name: ACTOCONTAINER
            resources:
              limits:
                cpu: 800m
What did you expect?
The operator should be able to recover the cluster after the manual revert.
What happened instead?
The operator is stuck and cannot make any progress even after manually reverting the CR.
Normal SuccessfulCreate 48m statefulset-controller create Pod test-cluster-0 in StatefulSet test-cluster successful
Normal SuccessfulCreate 46m statefulset-controller create Pod test-cluster-1 in StatefulSet test-cluster successful
Normal SuccessfulCreate 46m statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster successful
Normal SuccessfulDelete 2m15s statefulset-controller delete Pod test-cluster-2 in StatefulSet test-cluster successful
Warning FailedCreate 93s (x2 over 2m14s) statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster failed error: Pod "test-cluster-2" is invalid: [spec.ephemeralContainers[0][0].name: Invalid value: "ACTOKEY": a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), spec.ephemeralContainers[0][0].image: Required value, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource for containers, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource for containers, spec.ephemeralContainers[0].resources: Forbidden: cannot be set for an Ephemeral Container, spec.ephemeralContainers: Forbidden: cannot be set on create]
Warning FailedCreate 52s (x13 over 2m14s) statefulset-controller create Pod test-cluster-2 in StatefulSet test-cluster failed error: Pod "test-cluster-2" is invalid: [spec.ephemeralContainers[0][0].name: Invalid value: "ACTOKEY": a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), spec.ephemeralContainers[0][0].image: Required value, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ACTOKEY]: Invalid value: "ACTOKEY": must be a standard resource for containers, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource type or fully qualified, spec.ephemeralContainers[0][0].resources.requests[ilvwddmkyk]: Invalid value: "ilvwddmkyk": must be a standard resource for containers, spec.ephemeralContainers[0].resources: Forbidden: cannot be set for an Ephemeral Container, spec.ephemeralContainers: Forbidden: cannot be set on create]
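The name check in the events above can be reproduced locally. The following is a minimal sketch, not the Kubernetes validation code itself: it reuses the regular expression quoted in the FailedCreate message, anchored so that it must match the whole name, and adds the 63-character limit that applies to RFC 1123 labels. The function name `validContainerName` is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label is the pattern quoted in the FailedCreate event, anchored
// with ^ and $ so that the entire name has to match.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validContainerName reports whether name is a lowercase RFC 1123 label:
// at most 63 characters, lowercase alphanumerics or '-', and starting and
// ending with an alphanumeric character.
func validContainerName(name string) bool {
	return len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	// Names taken from this report plus the two examples in the error text.
	for _, name := range []string{"ACTOKEY", "ACTOCONTAINER", "my-name", "123-abc"} {
		fmt.Printf("%-14s valid=%v\n", name, validContainerName(name))
	}
}
```

The uppercase names from the report fail this check, while the two examples given in the error message pass. The other errors in the event (missing image, unknown resource names, resources not allowed on ephemeral containers, ephemeralContainers not allowed on create) are not covered by this sketch.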
This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.
This issue was closed because it became stale and did not receive further updates. If the issue is still affecting you, please re-open it, or file a fresh Issue with updated information.
Operator Information
Operator version: 0.7.4
MongoDB version: 4.4.0
Kubernetes Cluster Information
kubectl version --short --output=yaml