Apache Kafka Scaler: Implementation for Excluding Persistent Lag #3965

Merged
merged 2 commits into from
Dec 8, 2022

josephangbc
Contributor

Summary

Add implementation for excluding consumer lag from partitions with persistent lag.

Use Case

When a consumer cannot process or consume from a partition (for example, because of errors), the committed offset does not change, so the consumer lag on that partition keeps increasing and never decreases. As a result, KEDA scales the workload towards maxReplicaCount.

If a partition's lag is deemed persistent, excluding it lets KEDA scale appropriately based on the consumer lag observed on the other topics and partitions, unaffected by lag that scaling cannot resolve.

Logic

On each polling cycle, check whether the current consumer offset is the same as the previous consumer offset.

Different: return endOffset - consumerOffset (no change from the current implementation)
Same: return 0 (to exclude this partition's consumer lag from the total lag)
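
The check described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `partitionKey`, `lagTracker`, `newLagTracker`, and `lag` are hypothetical names, and reporting the real lag the first time a partition is seen is an assumption the description does not spell out.

```go
package main

// partitionKey identifies one topic partition (hypothetical type for this sketch).
type partitionKey struct {
	topic     string
	partition int32
}

// lagTracker remembers the consumer offset seen for each partition
// on the previous polling cycle (hypothetical type for this sketch).
type lagTracker struct {
	previousOffsets map[partitionKey]int64
}

func newLagTracker() *lagTracker {
	return &lagTracker{previousOffsets: make(map[partitionKey]int64)}
}

// lag returns the lag to report for one partition in the current polling cycle.
// If the consumer offset did not move since the previous cycle, the lag is
// treated as persistent and 0 is returned so it is excluded from the total.
// Assumption: a partition seen for the first time reports its real lag.
func (t *lagTracker) lag(key partitionKey, consumerOffset, endOffset int64) int64 {
	previous, seen := t.previousOffsets[key]
	// Record the current offset for comparison in the next cycle.
	t.previousOffsets[key] = consumerOffset
	if seen && previous == consumerOffset {
		return 0
	}
	return endOffset - consumerOffset
}
```

With this sketch, a partition whose offset stays at 10 while the end offset grows reports its real lag on the first cycle and 0 on the following cycles, and it starts reporting endOffset - consumerOffset again as soon as the offset advances.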

Checklist

  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Relates to #3904
Relates to kedacore/keda-docs#984

@josephangbc josephangbc requested a review from a team as a code owner December 6, 2022 19:14
@zroubalik
Member

zroubalik commented Dec 7, 2022

/run-e2e kafka*
Update: You can check the progress here

@zroubalik
Member

@JosephABC the e2e test failed :(

@josephangbc
Contributor Author

josephangbc commented Dec 7, 2022

I think the Kafka e2e tests were taking longer than 20 minutes and timed out.

I changed the AssertReplicaCountNotChangeDuringTimePeriod duration from 3 minutes to 1 minute. The testPersistentLag test passed in 150 seconds on a local cluster.

Can we increase the timeout for the go test in run-all.sh?

@josephangbc
Contributor Author

Made changes to the Kafka test for persistent lag to reduce the testing duration. The total testing duration for kafka_test.go was 1109.934s on a local cluster. Hope this helps the e2e tests pass.

@zroubalik
Member

zroubalik commented Dec 7, 2022

/run-e2e kafka*
Update: You can check the progress here

@zroubalik
Member

@JosephABC thanks, I started the e2e tests. Could you please rebase this PR and merge the changes? We merged another PR in the meantime that caused conflicts; they should be simple to resolve, though.

@zroubalik
Member

@JosephABC I can see that you again have a lot of commits here, which is why the DCO check fails. Could you fix that?

@josephangbc josephangbc reopened this Dec 8, 2022
@zroubalik
Member

zroubalik commented Dec 8, 2022

/run-e2e kafka*
Update: You can check the progress here
