
Introduce new scaling logic with fix orphan pod issue #1214

Merged: 3 commits merged on Oct 5, 2020

TsuyoshiUshio (Contributor) commented Oct 3, 2020

This PR changes the scaling logic for ScaledJob.
It should resolve the issues listed below. I'd like to share the PR for review first.

Old Logic

The number of newly created jobs is queueLength - runningCount.

New Logic

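The number of newly created jobs is effectiveMaxScale, computed as:

    // Cap new jobs so that running + new jobs never exceed maxReplicaCount.
    var effectiveMaxScale int64
    if queueLength+runningJobCount > scaledJob.MaxReplicaCount() {
        effectiveMaxScale = scaledJob.MaxReplicaCount() - runningJobCount
    } else {
        effectiveMaxScale = queueLength
    }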
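A minimal, runnable Go sketch comparing the two rules, assuming queueLength counts only messages that no running job has taken yet. The function names are hypothetical, and the zero clamp in newJobsToCreate is an added assumption (it keeps a lowered maxReplicaCount from producing a negative count); it is not part of the logic described above.

```go
package main

import "fmt"

// oldJobsToCreate sketches the old rule: queueLength - runningCount.
// (Hypothetical helper name, for illustration only.)
func oldJobsToCreate(queueLength, runningJobCount int64) int64 {
	return queueLength - runningJobCount
}

// newJobsToCreate sketches the new rule: one job per queued message,
// but running + new jobs never exceed maxReplicaCount.
// (Hypothetical helper name, for illustration only.)
func newJobsToCreate(queueLength, runningJobCount, maxReplicaCount int64) int64 {
	var effectiveMaxScale int64
	if queueLength+runningJobCount > maxReplicaCount {
		effectiveMaxScale = maxReplicaCount - runningJobCount
	} else {
		effectiveMaxScale = queueLength
	}
	// Assumption, not from the PR text: never request a negative number of jobs.
	if effectiveMaxScale < 0 {
		effectiveMaxScale = 0
	}
	return effectiveMaxScale
}

func main() {
	// One long-running job, one new message waiting in the queue.
	fmt.Println(oldJobsToCreate(1, 1))      // 0: the new message waits for the running job
	fmt.Println(newJobsToCreate(1, 1, 100)) // 1: a job is created for the new message

	// Near the cap: 98 jobs running, 5 messages queued, maxReplicaCount 100.
	fmt.Println(newJobsToCreate(5, 98, 100)) // 2: capped so the total stays at 100
}
```

The old rule subtracts running jobs from a queue length that (under the stated assumption) no longer contains their messages, so a single long-running job can block a new message; the new rule only uses runningJobCount to enforce the maxReplicaCount cap.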

Limitation

ServiceBusScaler uses *queueEntity.CountDetails.ActiveMessageCount as the queue length. However, this is not an accurate queue length, because the value includes locked messages. If a client receives a message and doesn't complete it, the message is locked and other clients can't consume it, yet ActiveMessageCount still counts it. I tried to use ActiveMessageCount - LockedMessageCount instead, but so far I haven't found a way to fetch the locked message count.
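To make the limitation concrete, the sketch below feeds the new rule two queue lengths for the same situation. It assumes (hypothetically) three running jobs that each hold exactly one locked message and two messages that nobody has received yet; jobsToCreate is a hypothetical name for the rule above, shown without any zero clamp.

```go
package main

import "fmt"

// jobsToCreate restates the new ScaledJob rule (hypothetical helper name).
func jobsToCreate(queueLength, runningJobCount, maxReplicaCount int64) int64 {
	if queueLength+runningJobCount > maxReplicaCount {
		return maxReplicaCount - runningJobCount
	}
	return queueLength
}

func main() {
	const running = 3 // assumption: each running job holds one locked message
	const waiting = 2 // messages no client has received yet

	// Queue length that excludes locked messages: the intended input.
	fmt.Println(jobsToCreate(waiting, running, 100)) // 2

	// Queue length taken from a count that also includes the 3 locked messages.
	activeMessageCount := int64(waiting + running)
	fmt.Println(jobsToCreate(activeMessageCount, running, 100)) // 5: three extra jobs
}
```

Under these assumptions every in-flight message is counted once by runningJobCount and again in the queue length, so the scaler over-provisions by the number of locked messages until the cap is reached.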

What I did

  • Introduce new Scaled Job Logic
  • Fix the orphan pod issue

Fixes #1207 (comment), #1186, and #1211

Signed-off-by: Tsuyoshi Ushio <[email protected]>

zroubalik (Member) left a comment

LGTM, thanks!

dron-alterpost commented

Thanks for the fix! When will it be in the 2.0-RC Helm chart?

silenceper pushed a commit to silenceper/keda that referenced this pull request Oct 10, 2020
* Introduce new scaling logic with fix orphan pod issue

Signed-off-by: Tsuyoshi Ushio <[email protected]>

* update yamls

Signed-off-by: Tsuyoshi Ushio <[email protected]>

* Remove to fit the coding style

Signed-off-by: Tsuyoshi Ushio <[email protected]>

MoKassem commented Dec 14, 2022

@TsuyoshiUshio
Which scalingStrategy should I use to get this behaviour? I'm using "accurate" and still face the same issue: when a job is already running and a new message arrives in the queue, KEDA doesn't scale up a new job.

Actually, I tried all the scaling strategies and still can't get a new job created when a long-running job is executing and a new RabbitMQ message is received.

spec:
  jobTargetRef:
    parallelism: 1                            
    completions: 1
    activeDeadlineSeconds: 21600
    template:
      spec:
        tolerations:
        - key: "node_pool"
          operator: "Equal"
          value: "routing_small"
          effect: "NoSchedule"
        containers:
        - name: axl-routing-sm
          image: us-west1-docker.pkg.dev/axlehire-prod/axl-dcr/axl-routing-controller:0.5.52-kubernetes
          imagePullPolicy: Always
          resources:
            requests:
              memory: "4Gi"
              cpu: 8
            limits:
              cpu: 16
              memory: "8Gi"
        restartPolicy: Never
    backoffLimit: 0  
  pollingInterval: 5                    # Optional. Default: 30 seconds
  minReplicaCount: 0
  maxReplicaCount: 100                  # Optional. Default: 100
  successfulJobsHistoryLimit: 100       # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 100           # Optional. Default: 100. How many failed jobs should be kept.
  scalingStrategy:
    strategy: "accurate"                # Optional. Default: default. Which Scaling Strategy to use. 
    pendingPodConditions:               # Optional. A parameter to calculate pending job count per the specified pod conditions
    - "Pending"
    - "ContainerCreating"
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: routing-kubernetes
      mode: QueueLength
      value: "1"
    authenticationRef:
      name: keda-trigger-auth-axl-rabbitmq
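For comparison with this PR's rule: as I understand the KEDA ScaledJob documentation, the "accurate" strategy uses the same cap as above but, in the uncapped case, subtracts pendingJobCount (jobs whose pods match pendingPodConditions) from maxScale. The sketch below is a hedged restatement of that documented formula, not KEDA's source; the function name is hypothetical, and maxScale is assumed to equal the reported queue length because the trigger's value is "1". Verify the formula against the KEDA version you run.

```go
package main

import "fmt"

// accurateJobsToCreate restates the documented "accurate" formula
// (hypothetical helper name; check your KEDA version's docs).
//   maxScale:        jobs wanted by the trigger (assumed = queue length for value "1")
//   runningJobCount: jobs currently running
//   pendingJobCount: jobs whose pods match pendingPodConditions
func accurateJobsToCreate(maxScale, runningJobCount, pendingJobCount, maxReplicaCount int64) int64 {
	if maxScale+runningJobCount > maxReplicaCount {
		return maxReplicaCount - runningJobCount
	}
	return maxScale - pendingJobCount
}

func main() {
	// One long-running job that is not pending, one new message reported.
	fmt.Println(accurateJobsToCreate(1, 1, 0, 100)) // 1

	// Same situation, but the running job is counted as pending.
	fmt.Println(accurateJobsToCreate(1, 1, 1, 100)) // 0
}
```

Under this formula, a new job is created for the scenario above only if the trigger actually reports the new message in its queue length and the running job is not counted as pending; those two inputs decide the result.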
