Improve Azure Event Hub scaling #5125
Merged
troydn force-pushed the fix-azure-eh-scaling branch from 499ab66 to e17f322 (October 24, 2023 20:09)
@tomkerkhove, could you ask someone with Event Hub expertise to review this?
@tomkerkhove FYI
tomkerkhove approved these changes (Jan 2, 2024)
I think this is ready to merge. Could you resolve the merge conflicts, please, @troydn? 🙏
Signed-off-by: Troy <[email protected]>
troydn force-pushed the fix-azure-eh-scaling branch from e17f322 to ff0d4ee (January 3, 2024 22:27)
/run-e2e azure
toniiiik pushed a commit to toniiiik/keda that referenced this pull request (Jan 15, 2024)
* Update Azure EventHub scaling
  Signed-off-by: Troy <[email protected]>
* Update changelog
  Signed-off-by: Troy <[email protected]>
* Remove unused context
  Signed-off-by: Troy <[email protected]>
---------
Signed-off-by: Troy <[email protected]>
Signed-off-by: anton.lysina <[email protected]>
This PR improves the Event Hub unprocessed event count calculation to prevent negative and large values.
There are two issues in the current implementation:
Stale partition runtime information results in very large values:
The scaling calculation compares the SequenceNumber from the partition runtime information with the SequenceNumber in the checkpoint store.
The partition information can be stale relative to the checkpoint, for example when the partition runtime information reports 10 and the checkpoint is at 15. Because the smaller partition value is treated as a wraparound of the circular buffer, the result is 9223372036854775802 when it should be 0.
When the sum of unprocessed events across all partitions exceeds the Int64 max value, the result is negative:
When there are many unprocessed events (mostly caused by the first issue), the per-partition counts sum to more than the Int64 max value, and the total overflows to a negative number.
Using the first example: Partition 1 (9223372036854775802) + Partition 2 (100) = -9223372036854775714.
In this case the result should be the Int64 max value, which in turn yields the maximum value for `lagRelatedToPartitionCount`: (partitionCount * threshold).
To solve the first issue, I introduced a new parameter, `stalePartitionInfoThreshold`, which configures the stale partition information threshold. It sets the range used to decide whether the partition information is stale or whether the Event Hub has gone through almost the full circular buffer.
Using the first example to explain the new implementation:
`stalePartitionInfoThreshold`: 10000
Visualization: (image not included)
Fixes #4250