Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eager scaling strategy for ScaledJob does not work as documented (or intended?) #6416

Open
chinery opened this issue Dec 11, 2024 · 9 comments · May be fixed by #6419
Open

eager scaling strategy for ScaledJob does not work as documented (or intended?) #6416

chinery opened this issue Dec 11, 2024 · 9 comments · May be fixed by #6419
Labels
bug Something isn't working

Comments

@chinery
Copy link

chinery commented Dec 11, 2024

Report

This form prompts me to be clear and concise, and I will try to be very clear but fear that will not be very concise (apologies)

I was trying to understand the difference between the default and eager scaling strategies of ScaledJob (see https://keda.sh/docs/2.16/reference/scaledjob-spec/#scalingstrategy)

In short

  • I do not think the documentation on the eager strategy is correct, either about the behaviour of default or the behaviour of eager
  • I think the implementation of eager may be bugged, but it's hard to tell what the intention is since I believe it is already given by default

The documented behaviour

  • the ScaledJob spec has this phrase “The number of the scale” – the number of jobs that will be created on a given poll
  • a key point is that the scaling behaviour differs from a ScaledObject: where for example in a Deployment, if you have 5 items on the queue (in progress), then you need to set the number of replicas to 5 (setting it to less would shut down running pods). But jobs are not managed after creation, so if there are 5 jobs running and 5 jobs on the queue, then (normally) the correct number of jobs to create is zero (except for when jobs are consumed from the queue, in which case the accurate strategy is required)
  • when scaling strategy is set to default, this is calculated as maxScale - runningJobCount,
    where maxScale = min(scaledJob.MaxReplicaCount(), divideWithCeil(queueLength, targetAverageValue))
  • the section about the eager scaling strategy does not exactly explain how it differs, only that it makes up for an issue you might find with default. there is an example listed, where the maximum replicas is 10, the target average value is 1, and there is the following sequence: submit 3 jobs, poll, submit another 3 jobs, poll, and gives this table

With the default scaling strategy, we are supposed to see the metrics changes in the following table:

initial incoming 3 messages after poll incoming 3 messages after poll
queueLength 0 3 3 6 6
runningJobs 0 0 3 3 3
  • the final column, to my understanding, is incorrect. After the second poll, using the formulas above:
    maxScale = min(10, ceil(6 / 1)) = 6
    so "the number of the scale" = 3
    so 3 new jobs will be created, meaning the total of running jobs is now 6
    which is working as intended.
  • The second table in that section goes on to show that it is actually the eager strategy which has 6 running jobs after the poll – I'll come to what eager actually does in a later section but I believe this is incorrect also.

The intended behaviour

The documentation also suggests reading the initial suggestion here: #5114

I don't want to offend or misconstrue anyone here, so please don't take any of this as criticism, just trying to untangle the web – please correct me if I've misunderstood anything.

It seems to me that @junekhan may have confused the behaviour of "the number to scale", and thought that it would scale like a Deployment (where in the example above, a scale of 3 would mean only 3 running jobs after poll, instead of 3 new jobs). My evidence is this comment:

junekhan commented on Oct 27, 2023
Getting back to ScaledJob, let's imagine a case with 3 running pods and another 3 messages standing in line, and each of them takes 3 hours or even longer to run. Does it sound better if we empty the queue and run 6 pods in parallel within our affordable limit which is 10 replicas?

But this is the behaviour of default. @zroubalik replies and says this behaviour should be added. The pull request is later made by @junekhan and documentation added by @zroubalik.

It's possible that some miscommunication happened here, so I also wanted to work out what the eager strategy does, in case I misunderstood the intention, and it is simply the documentation that needs updating.

The actual behaviour

Here I will try to narrate a sequence of logic through the code that explains how the two strategies work. I hope you can follow it – I have tried to just include the relevant detail with function names, parameter names, return value names, code/pseudocode behaviour, and some commentary (in italics). The function names link to the code with line numbers. I will also include the example values from earlier.

  • in checkScalers
    isActive, isError, scaleTo, maxScale := h.isScaledJobActive(ctx, obj)
    • in isScaledJobActive
      isActive, queueLength, maxValue, maxFloatValue := scaledjob.IsScaledJobActive(scalersMetrics, scaledJob.Spec.ScalingStrategy.MultipleScalersCalculation, scaledJob.MinReplicaCount(), scaledJob.MaxReplicaCount())
      • in IsScaledJobActive
        sum/max/avg over each metric:
            queueLength = metric.QueueLength
            maxValue = metric.MaxValue
        
        (but where do metrics get their values for MaxValue/QueueLength, an aside:)
        - in CalculateQueueLengthAndMaxValue
        for each metric:
            queueLength += metricValue
        targetAverageValue = getTargetAverageValue(metricSpecs)
        averageLength := queueLength / targetAverageValue
        maxValue = min(averageLength, maxReplicaCount)
        
        (getTargetAverageValue gets the target value from the trigger, so for our example targetAverageValue=1, queueLength=6, maxReplicaCount=10, and so maxValue=6. worth noting that queueLength does not divide by targetAverageValue, it is the raw length)
      • (back inside IsScaledJobActive)
        maxValue = min(maxValue, maxReplicaCount)
        return isActive, ceilToInt64(queueLength), ceilToInt64(maxValue), maxValue
    • (so IsScaledJobActive returns queueLength=6 and maxValue=6)
    • (and isScaledJobActive returns them in this order: isActive, isError, queueLength, maxValue)
  • (checkScalers assigns these to isActive, isError, scaleTo, maxScale, so scaleTo=queueLength=6, maxScale=maxValue=6)
    h.scaleExecutor.RequestJobScale(ctx, obj, isActive, isError, scaleTo, maxScale)
    • RequestJobScale
      effectiveMaxScale, scaleTo := e.getScalingDecision(scaledJob, runningJobCount, scaleTo, maxScale, pendingJobCount, logger)
      • getScalingDecision
        (this is where it forks based on scaling strategy)
        effectiveMaxScale, scaleTo = NewScalingStrategy(logger, scaledJob).GetEffectiveMaxScale(maxScale, runningJobCount-minReplicaCount, pendingJobCount, scaledJob.MaxReplicaCount(), scaleTo)
        and the definition of GetEffectiveMaxScale: GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount, scaleTo int64) (int64, int64)
        (example: maxScale=6, runningJobCount=3, minReplicaCount=0, pendingJobCount=0, scaledJob.MaxReplicaCount()=10, scaleTo=6)
        • default
          return maxScale - runningJobCount, scaleTo
          (so this returns (3, 6))
        • eager
          return min(maxReplicaCount-runningJobCount-pendingJobCount, maxScale), maxReplicaCount
          (so this returns (min(7, 6), 10)=(6, 10))
      • return effectiveMaxScale, scaleTo
    • (finally RequestJobScale calls e.createJobs)
    • e.createJobs(ctx, logger, scaledJob, scaleTo, effectiveMaxScale)
      with signature: createJobs(..., scaleTo int64, maxScale int64) (so effectiveMaxScale is now maxScale)
      • and this does:
      if maxScale <= 0: return
      if scaleTo > maxScale: scaleTo = maxScale
      generate scaleTo jobs
      
      so for our example values,
      - default: maxScale = 3, scaleTo = 6, so this generates 3 jobs
      - eager: maxScale = 6, scaleTo = 10, so this generates 6 jobs

After the second poll in our example, the eager strategy will have 9 jobs. On the third poll, assuming no new jobs, it will create 1 more job and hit the maximum, since that is maxReplicaCount-runningJobCount.

I am not sure what scaleTo is doing in this calculation. It is set to the queue length, unmodified by the targetAverageValue, maxReplicas, or runningJobs. I can't immediately see any scenario where scaleTo < maxScale, meaning that it will always just use the value of maxScale for the number of jobs to create.

Regardless my conclusion for the behaviour of the eager strategy is that it does as @JorTurFer asked in the discussion, which is that it scales up until it hits the maximum whenever the queue is non zero. But the rate of scaling depends on the number of items in the queue. I'm still not sure if this is the intended behaviour – I think this could be achieved more efficiently with a scale strategy like
if maxScale > 0 return maxReplicaCount else 0
and there wouldn't be a slow ramp up, but perhaps that is desirable.

Expected Behavior

Expected default to have 3 running jobs, and eager to have 6 running jobs

Actual Behavior

default has 6 running jobs, eager has 9 running jobs

Steps to Reproduce the Problem

See above

Logs from KEDA operator

No response

KEDA Version

2.16.0

Kubernetes Version

None

Platform

None

Scaler Details

No response

Anything else?

No response

@chinery chinery added the bug Something isn't working label Dec 11, 2024
@junekhan
Copy link
Contributor

@chinery Thanks for your feedback and your findings! It's tough to recall the tricky computation and follow up your analysis. I'm not sure if the default has changed in the latest version or not, but it was an obstacle to me in that particular version 2.14.

Do you mind checking these test cases:

If they don't have 100% coverage, could you provide the missing cases to prove your point?

@chinery
Copy link
Author

chinery commented Dec 11, 2024

Hi @junekhan

I'm not sure if the default has changed in the latest version or not, but it was an obstacle to me in that particular version 2.14.

I don't see any changes to the default since 2.14, other than the introduction of scaleTo which is ignored anyway because is it always greater than or equal to maxScale

Do you mind checking these test cases:
https://github.com/kedacore/keda/pull/5872/files#diff-7fa05d55ebaeabf8480957f1657206cb11cc8eeca2bf98019ace9c6230bfad8cR92

The changes for TestDefaultScalingStrategy, TestCustomScalingStrategy, and TestAccurateScalingStrategy simply add a value for the scaleTo input, and ignore the scaleTo output with a helper function.

https://github.com/kedacore/keda/pull/5872/files#diff-7fa05d55ebaeabf8480957f1657206cb11cc8eeca2bf98019ace9c6230bfad8cR145

For TestEagerScalingStrategy, it obviously passes, but I don't know what the intention is behind the design of the asserts. Here is the first example (I've included the signature for GetEffectiveMaxScale)

GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount, scaleTo int64) (int64, int64)

maxScale, scaleTo := strategy.GetEffectiveMaxScale(4, 3, 0, 10, 1)
assert.Equal(t, int64(4), maxScale)
assert.Equal(t, int64(10), scaleTo)

The inputs are:
maxScale=4, runningJobCount=3, pendingJobCount=0, maxReplicaCount=10, scaleTo=1

The first return value, maxScale, is the one that determines how many jobs to create – before this PR, there was only one return value.

There are 4 jobs on the queue, and 3 already running. So to maximise the number of jobs, the result should be to create 1 new job. So I think the test should say

assert.Equal(t, int64(1), maxScale)

However, the assert.Equals says maxScale it should be equal to 4. But perhaps I misunderstand the purpose of eager, and you can clarify.

The second assert states that scaleTo should be equal to 10. But the value passed in for scaleTo is 1. I'm not sure what the test should be as I don't understand the purpose of scaleTo as an input or output, but in practice since scaleTo > maxScale, it will be ignored anyway on line 131 of scale_jobs.go.

https://github.com/kedacore/keda/pull/5872/files#diff-5ec504eb045c4a11082657ffab9f2666a3787bf09df8798bdfff416d75fec6fcR123

For the e2e test, the behaviour in the test seems to mimic the description in the documentation, in other words, the desired behaviour of eager appears to be to scale to the number of jobs, not to overshoot

RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
assert.True(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 4, iterationCount, 1),
"job count should be %d after %d iterations", 4, iterationCount)

RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
assert.True(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 8, iterationCount, 1),
"job count should be %d after %d iterations", 8, iterationCount)

RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
assert.True(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 10, iterationCount, 1),
"job count should be %d after %d iterations", 10, iterationCount)

i.e., push 4 messages, scale to 4, push 4 messages, scale to 8, push 4 messages, scale to 10

I am not a KEDA (or Go) developer but this seems to be a limitation of WaitForScaledJobCount which returns true as soon as the target is met, it does not test that the scaling stops. And with 1 second intervals between test iterations, but a 5 second poll on the rmq trigger, then it makes sense that the tests will pass.

Please try either of the following examples which I think illustrate my point

Test 1: only push 4 messages

func testEagerScaling1(t *testing.T, kc *kubernetes.Clientset) {
	iterationCount := 20
	RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
	assert.True(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 4, iterationCount, 1),
	"job count should be %d after %d iterations", 4, iterationCount)
	
	assert.False(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 8, iterationCount, 1),
	"job count should still be 4 after %d iterations", iterationCount)
	
	assert.False(t, WaitForScaledJobCount(t, kc, scaledJobName, testNamespace, 10, iterationCount, 1),
	"job count should still be 4 after %d iterations", 10, iterationCount)
}

or Test 2: using WaitForJobCountUntilIteration which does not return early

func testEagerScaling2(t *testing.T, kc *kubernetes.Clientset) {
	iterationCount := 20
	RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
	assert.True(t, WaitForJobCountUntilIteration(t, kc, testNamespace, 4, iterationCount, 1),
	"job count should be %d after %d iterations", 4, iterationCount)
	
	RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
	assert.True(t, WaitForJobCountUntilIteration(t, kc, testNamespace, 8, iterationCount, 1),
	"job count should be %d after %d iterations", 8, iterationCount)
	
	RMQPublishMessages(t, rmqNamespace, connectionString, queueName, 4)
	assert.True(t, WaitForJobCountUntilIteration(t, kc, testNamespace, 10, iterationCount, 1),
	"job count should be %d after %d iterations", 10, iterationCount)
}

@junekhan
Copy link
Contributor

Thank you @chinery for your input!

There are 4 jobs on the queue, and 3 already running. So to maximise the number of jobs, the result should be to create 1 new job. So I think the test should say

assert.Equal(t, int64(1), maxScale)

I think assert.Equal(t, int64(4), maxScale) is correct tho since eager gets maxScale as below:

 func (s eagerScalingStrategy) GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount, _ int64) (int64, int64) {
	return min(maxReplicaCount-runningJobCount-pendingJobCount, maxScale), maxReplicaCount
}

With the input maxScale=4, runningJobCount=3, pendingJobCount=0, maxReplicaCount=10, scaleTo=1, the above statement translates to return min(10 - 3 - 0, 4), 10


The second assert states that scaleTo should be equal to 10. But the value passed in for scaleTo is 1. I'm not sure what the test should be as I don't understand the purpose of scaleTo as an input or output, but in practice since scaleTo > maxScale, it will be ignored anyway on line 131 of scale_jobs.go.

That's true. I intended to return scaleTo with a value that is greater than maxScale to guarantee that maxScale's value is eventually adopted with respect to the protocol inside func (e *scaleExecutor) createJobs


I am not a KEDA (or Go) developer but this seems to be a limitation of WaitForScaledJobCount which returns true as soon as the target is met, it does not test that the scaling stops. And with 1 second intervals between test iterations, but a 5 second poll on the rmq trigger, then it makes sense that the tests will pass.

This point is valuable. I will improve the test case with your proposal Test 1: only push 4 messages


But perhaps I misunderstand the purpose of eager, and you can clarify.

I literally want to have the scaling style I described below, which is NOT fulfilled by default:

junekhan commented #5114 (comment)
Getting back to ScaledJob, let's imagine a case with 3 running pods and another 3 messages standing in line, and each of them takes 3 hours or even longer to run. Does it sound better if we empty the queue and run 6 pods in parallel within our affordable limit which is 10 replicas?

@chinery
Copy link
Author

chinery commented Dec 12, 2024

I think assert.Equal(t, int64(4), maxScale) is correct tho since eager gets maxScale as below:

 func (s eagerScalingStrategy) GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount, _ int64) (int64, int64) {
	return min(maxReplicaCount-runningJobCount-pendingJobCount, maxScale), maxReplicaCount
}

You are describing the output of the code, so the test is guaranteed to pass, but I am asking why the value should be 4. Tests should be written separately from the implementation.

I literally want to have the scaling style I described below, which is NOT fulfilled by default:

junekhan commented #5114 (comment)
Getting back to ScaledJob, let's imagine a case with 3 running pods and another 3 messages standing in line, and each of them takes 3 hours or even longer to run. Does it sound better if we empty the queue and run 6 pods in parallel within our affordable limit which is 10 replicas?

I've changed the e2e test to use the default strategy and changed the numbers to match your comment exactly, as well as using WaitForJobCountUntilIteration so the test checks the job doesn't over-scale. This test fails with the eager strategy but passes with default. Please try running this file, I assure you it does what you are describing.

Perhaps when you tried default, you were consuming (acking) messages off the RabbitMQ queue immediately on receipt rather than when the job was finished? (In which case, accurate scaling would have helped you out.)

@junekhan
Copy link
Contributor

You are describing the output of the code, so the test is guaranteed to pass, but I am asking why the value should be 4. Tests should be written separately from the implementation.

Because 4 tasks are waiting, and I want to run them eagerly without exceeding the maxReplicaCount.

Perhaps when you tried default, you were consuming (acking) messages off the RabbitMQ queue immediately on receipt rather than when the job was finished? (In which case, accurate scaling would have helped you out.)

This has nothing to do with the trigger. This guy experienced it in SQS as well #5881

(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → Keda triggers 1 job/pod and processes it. Let’s say the consumer places another message in the queue while 1st job is still running then Keda will not trigger another job until it completes 1st job. So we would expect Keda to process subsequent jobs even if existing jobs are in progress.

@chinery
Copy link
Author

chinery commented Dec 12, 2024

The trigger is absolutely relevant – whether or not the queueLength includes running/locked jobs is vital. I have not tested SQS but it's possible this other user also should have been using accurate.

Please could you run the test file I linked in my last message which confirms that the default strategy scales exactly as you wish with RabbitMQ. After 3 messages, 3 jobs are running. After another 3 messages, 6 jobs are running. Please let me know if you get different results or you disagree with the design of the test for some other reason.

Maybe it would be good to hear from some other developers too, e.g. it seems @TsuyoshiUshio wrote the accurate/custom scalers.

@junekhan junekhan linked a pull request Dec 12, 2024 that will close this issue
7 tasks
@chinery
Copy link
Author

chinery commented Dec 12, 2024

Repeating from my comment on the PR junekhan just created

I have uploaded the output of the default test case here (pass): https://gist.github.com/chinery/5845b73a1f5504e7e46d53a9eeec1cb0?permalink_comment_id=5334111#gistcomment-5334111

and the two eager test cases I posted above (fail): https://gist.github.com/chinery/b6caf218b0924e97cb36a49a970fbea1

@junekhan
Copy link
Contributor

Perhaps when you tried default, you were consuming (acking) messages off the RabbitMQ queue immediately on receipt rather than when the job was finished? (In which case, accurate scaling would have helped you out.)

It's definitely not the case. If you have pondered a step further, you probably wouldn't have such speculation. As I pointed out my case as each takes 3 hours or even longer to run. Acking on receiving has an immediate impact on the system that is not ignorable.

That's why I am certain that it's much simpler for people to follow the test case and the outcome than to read lengthy passages and reason in mind

@chinery
Copy link
Author

chinery commented Dec 12, 2024

I appreciate my messages are verbose, and I'm sorry if there's a language barrier that means I don't fully follow your meaning (I am still not sure whether you ack on receive or not, either can work for 3hr+ jobs, but that was a side point).

Sometimes following lengthy passages is necessary to understand – a test alone cannot do anything without rationale because the asserts of the test must match the intended outcome (not just the implemented one).

Regardless, I've provided a test which I believe shows that default implements the behaviour that eager is documented as implementing, and shown the pass output in a test environment, so I am happy for that to be the basis of the discussion going forward. If you think the test is wrong or flawed somehow please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: To Triage
Development

Successfully merging a pull request may close this issue.

2 participants