
Multiple ReplicaSet Race Condition #2188

Closed
mjallday opened this issue Apr 2, 2021 · 8 comments · Fixed by #2196
Labels: kind/bug


mjallday commented Apr 2, 2021

We're trying to spin up a Camel K integration and there's some sort of race condition creating ReplicaSets as it starts. Here's how we're doing it:

apiVersion: camel.apache.org/v1
kind: Integration
metadata:
  name: processor-mj
  labels:
    tenant: tnt2
  namespace: camel-k
spec:
  serviceAccountName: camel-k
  dependencies: []
  configuration:
    - type: env
      value: AWS_REGION=us-west-2
    - type: env
      value: AWS_REGION2=us-west-2
    - type: env
      value: AWS_REGION3=us-west-2
    - type: env
      value: AWS_REGION4=us-west-2
    - type: env
      value: AWS_REGION5=us-west-2
    - type: env
      value: AWS_REGION6=us-west-2
    - type: env
      value: AWS_REGION7=us-west-2
    - type: env
      value: AWS_REGION8=us-west-2
  replicas: 1
  traits:
    deployment:
      configuration:
        enabled: true
    tracing:
      configuration:
        enabled: true
        endpoint: http://observability.local:14268/api/traces
        service-name: big-file-tnt2
    knative:
      configuration:
        enabled: false
    knative-service:
      configuration:
        enabled: false
        minScale: 1
        maxScale: 10
        class: kpa.autoscaling.knative.dev
    knative-eventing:
      configuration:
        enabled: false
  sources:
    - content: |
        
        beans {
        }

      name: process.groovy
status: { }

which we're applying with this command: kubectl -n camel-k apply -f config.yaml

When this happens, I see the integration created successfully:

kubectl -n camel-k get it/processor-mj
NAME           PHASE     KIT                        REPLICAS
processor-mj   Running   kit-c1iph3cfsqtehrtd8eqg   0

and the deployment looks like this

kubectl -n camel-k describe deployment/processor-mj
Name:                   processor-mj
Namespace:              camel-k
CreationTimestamp:      Fri, 02 Apr 2021 15:04:05 +1300
Labels:                 camel.apache.org/generation=2
                        camel.apache.org/integration=processor-mj
Annotations:            deployment.kubernetes.io/revision: 53
Selector:               camel.apache.org/integration=processor-mj
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           camel.apache.org/integration=processor-mj
  Service Account:  camel-k
...
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  processor-mj-6c66666f47 (1/1 replicas created)
NewReplicaSet:   <none>
Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  8m8s                 deployment-controller  Scaled up replica set processor-mj-74b4ffbc7c to 1
  Normal  ScalingReplicaSet  8m5s (x2 over 8m6s)  deployment-controller  Scaled up replica set processor-mj-7f9758bb7f to 1
  Normal  ScalingReplicaSet  8m5s (x2 over 8m6s)  deployment-controller  Scaled down replica set processor-mj-7f9758bb7f to 0
  Normal  ScalingReplicaSet  8m4s (x2 over 8m8s)  deployment-controller  Scaled up replica set processor-mj-85cf4dfb45 to 1
  Normal  ScalingReplicaSet  8m4s (x2 over 8m6s)  deployment-controller  Scaled down replica set processor-mj-85cf4dfb45 to 0
  Normal  ScalingReplicaSet  8m3s (x2 over 8m3s)  deployment-controller  (combined from similar events): Scaled down replica set processor-mj-784db95457 to 0
  Normal  ScalingReplicaSet  8m2s (x3 over 8m5s)  deployment-controller  Scaled up replica set processor-mj-6c66666f47 to 1
  Normal  ScalingReplicaSet  8m2s (x3 over 8m5s)  deployment-controller  Scaled down replica set processor-mj-6c66666f47 to 0
  Normal  ScalingReplicaSet  8m2s                 deployment-controller  Scaled up replica set processor-mj-784db95457 to 1
  Normal  ScalingReplicaSet  8m1s (x3 over 8m6s)  deployment-controller  Scaled up replica set processor-mj-7469ff45c7 to 1
  Normal  ScalingReplicaSet  8m1s (x3 over 8m5s)  deployment-controller  Scaled down replica set processor-mj-7469ff45c7 to 0
  Normal  ScalingReplicaSet  8m1s                 deployment-controller  Scaled down replica set processor-mj-784db95457 to 0

and then there's a ton of replica sets created

NAME                                    DESIRED   CURRENT   READY   AGE
processor-mj-6c66666f47                 1         1         1       10m
processor-mj-7469ff45c7                 0         0         0       10m
processor-mj-74b4ffbc7c                 0         0         0       10m
processor-mj-784db95457                 0         0         0       10m
processor-mj-79f7b567c9                 0         0         0       10m
processor-mj-7f9758bb7f                 0         0         0       10m
processor-mj-85cf4dfb45                 0         0         0       10m
processor-mj-85ddb99755                 0         0         0       10m

Is this expected behavior? The Integration file doesn't appear to let us control how scaling works. I'm looking for it to create a single ReplicaSet with a single pod, or, if I scale it, to maintain a single ReplicaSet with multiple pods, but it seems to spam ReplicaSets for a while before settling down.

If I look at the ReplicaSet, I see a bunch of scaling events, but I'm unsure why:

kubectl -n camel-k describe rs/processor-mj-7f9758bb7f

Name:           processor-mj-7f9758bb7f
Namespace:      camel-k
Selector:       camel.apache.org/integration=processor-mj,pod-template-hash=7f9758bb7f
Labels:         camel.apache.org/integration=processor-mj
                pod-template-hash=7f9758bb7f
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 42
                deployment.kubernetes.io/revision-history: 6,9,25,30,36,40
Controlled By:  Deployment/processor-mj
Replicas:       0 current / 0 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           camel.apache.org/integration=processor-mj
                    pod-template-hash=7f9758bb7f
  Service Account:  camel-k
...
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-d62vz
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-d62vz
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-cgqdw
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-cgqdw
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-vckpf
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-vckpf
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-cqsrc
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-cqsrc
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-9zttb
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-9zttb
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-7ssl6
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-7ssl6
  Normal  SuccessfulCreate  13m   replicaset-controller  Created pod: processor-mj-7f9758bb7f-8m779
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: processor-mj-7f9758bb7f-8m779

Any hints on how we can stop this behavior?


mjallday commented Apr 2, 2021

This is camel-k:1.3.1.


mjallday commented Apr 2, 2021

The behavior goes away when we remove the configuration block which contains

  configuration:
    - type: env
      value: AWS_REGION=us-west-2

for the Integration object. It creates one ReplicaSet for each configuration entry.

Is there a better way to pass environment variables?

astefanutti (Member) commented

I suspect there is another controller that applies conflicting changes to the Deployment. That would explain the deployment.kubernetes.io/revision: 42 annotation. For each Deployment revision, a new ReplicaSet is created, and the previous one is scaled down.

To identify the other controller that could possibly be changing the Deployment, could you please share its definition, i.e. the output of kubectl get deployment processor-mj -o yaml?

Also, if it is possible for you, it could be interesting to test on the master branch, as we now use server-side apply to manage the Deployment (#2039).
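
As a hint for that investigation, the managedFields metadata records which manager last wrote each field of the Deployment, so it can reveal a conflicting controller. Depending on your kubectl version the managed fields may be hidden by default, in which case something like this should show them:

kubectl -n camel-k get deployment processor-mj -o yaml --show-managed-fields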

fshaikh-vgs commented

Here is the output YAML. As mentioned by @mjallday, if we get rid of the env variables (first snippet below), the system starts with just one ReplicaSet. Zero or one values in configuration lead to one ReplicaSet, two values lead to two ReplicaSets, and it scales 1:1 from there.

- type: env
  value: AWS_REGION=us-west-2

Here is the output of kubectl get deployment processor-mj -o yaml (sanitized):

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "30"
  creationTimestamp: "2021-04-01T20:35:58Z"
  generation: 33
  labels:
    camel.apache.org/generation: "1"
    camel.apache.org/integration: processor-mj
  name: processor-mj
  namespace: camel-k
  ownerReferences:
  - apiVersion: camel.apache.org/v1
    blockOwnerDeletion: true
    controller: true
    kind: Integration
    name: processor-mj
    uid: 68f161fb-6b35-47fc-ac3f-de77ed6138b1
  resourceVersion: "114176131"
  selfLink: /apis/apps/v1/namespaces/camel-k/deployments/processor-mj
  uid: 1f8102a7-a3ef-4af9-85bf-2d3a9a139988
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      camel.apache.org/integration: processor-mj
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        camel.apache.org/integration: processor-mj
    spec:
      containers:
      - args:
        - ...
        command:
        - /bin/sh
        - -c
        env:
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: SAMPLE_ENV_NAME
          value: "SAMPLE_ENV_VALUE"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: quay.io/...
        imagePullPolicy: IfNotPresent
        name: integration
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/camel/sources/i-source-000
          name: i-source-000
          readOnly: true
        - mountPath: /etc/camel/conf
          name: application-properties
          readOnly: true
        workingDir: /deployments
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: camel-k
      serviceAccountName: camel-k
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: content
            path: process.groovy
          name: processor-mj-source-000
        name: i-source-000
      - configMap:
          defaultMode: 420
          items:
          - key: application.properties
            path: application.properties
          name: processor-mj-application-properties
        name: application-properties
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-04-01T20:36:07Z"
    lastUpdateTime: "2021-04-01T20:36:07Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-04-01T20:35:59Z"
    lastUpdateTime: "2021-04-02T09:27:01Z"
    message: ReplicaSet "processor-mj-d457745f9" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 33
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

astefanutti (Member) commented

@fshaikh-vgs thanks for the details.

If I understand your use case correctly, this is the expected behaviour of the Kubernetes Deployment controller. When the Integration is updated with an extra environment variable, the Camel K operator adds it to the Deployment. Kubernetes then triggers a rollout, which creates a new ReplicaSet. This is described in detail in https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment.
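
You can see those revisions accumulating with, e.g.:

kubectl -n camel-k rollout history deployment/processor-mj

Each revision listed there corresponds to one of the ReplicaSets you observed.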

To avoid that behaviour, it is possible to pause the Deployment, apply a sequence of updates without creating a new ReplicaSet for each one, and then resume the Deployment. This is documented in https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#pausing-and-resuming-a-deployment. That being said, I'm not sure it will work for the Deployment created and managed by the Camel K operator and owned by the Integration.
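
For reference, with the Deployment from this issue, that sequence would look something like:

kubectl -n camel-k rollout pause deployment/processor-mj
# ... apply the updates ...
kubectl -n camel-k rollout resume deployment/processor-mj

While paused, changes to the pod template are recorded but no new rollout is triggered until the resume.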

viacheslav-fomin-main commented

@astefanutti thanks for the quick response! But the Integration is not updated with an extra environment variable; all the variables are provided at once, in one file. Why can't Camel K apply them once?

astefanutti (Member) commented

Ah, I misunderstood. Thanks for the clarification, now I think I've got it right.

So I think it's fixed by #2039, which introduces server-side apply to manage the Integration Deployment.

The underlying issue lies in the way we process environment variables as a map[string]string: Go explicitly randomises map iteration order, which ultimately leads to new Deployment revisions, hence ReplicaSets, for all possible orderings. Server-side apply is clever enough to handle the ordering differences, as it uses the environment variable name as the merge key.

Even though that is fixed by #2039, as a safety measure I think it's still important that we avoid processing these environment variables as a map[string]string, with the resulting randomisation.
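
For illustration, here is a minimal Go sketch of the failure mode and of the deterministic alternative (hypothetical variable names, not the operator's actual code):

package main

import (
	"fmt"
	"sort"
)

// A plain map, as the operator held the variables; Go deliberately
// randomises map iteration order between runs.
var envVars = map[string]string{
	"AWS_REGION":  "us-west-2",
	"AWS_REGION2": "us-west-2",
	"AWS_REGION3": "us-west-2",
}

func main() {
	// Naive iteration: the resulting env list (and hence the generated
	// pod template) can differ on every reconciliation, which the
	// Deployment controller sees as an update, triggering a rollout.
	for name, value := range envVars {
		fmt.Printf("%s=%s\n", name, value)
	}

	// Deterministic alternative: sort the keys first, so the generated
	// Deployment spec is identical across reconciliations.
	names := make([]string, 0, len(envVars))
	for name := range envVars {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%s\n", name, envVars[name])
	}
}

With the sorted iteration the pod template is byte-for-byte stable, so the Deployment controller sees no change and creates no new ReplicaSet.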

astefanutti (Member) commented

It should be fixed with #2039 and #2196. Thanks a lot for the report.
