-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix/sanitize exponential retry defaults #30286
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dmitryax
approved these changes
Jan 4, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Pinging code owners @Aneurysm9 @rapphil
rapphil
approved these changes
Jan 5, 2024
fatsheep9146
approved these changes
Jan 5, 2024
Unrelated to this PR, but we need to revisit the timeout settings for this component since differently from other components based on the exporter helper, this component implements retries that are run in the context of the deadline. |
cparkins
pushed a commit
to AmadeusITGroup/opentelemetry-collector-contrib
that referenced
this pull request
Jan 10, 2024
…lts (open-telemetry#30286) **Description:** This PR sanitizes the default retry settings by using default constructor and setting the single `InitialInterval`, instead of user initializing entire structure manually. The current default settings do not become exponential due to being upper bound at 200ms by `MaxInterval: 200 * time.Millisecond,`. This effectively caps the wait intervals between requests at a very low interval, thus becomes more of a linear retry than exponential. And due to the distributed nature of the Otel Collector, coupled with each metric export being handled by go-routine, this caused a massive churn of requests when remote endpoint is unavailable. *Before/Current* ``` first backoff 673.174543ms elapsed 83ns next backoff 25.476545ms elapsed 44.95825ms next backoff 134.374813ms elapsed 217.756375ms next backoff 120.396777ms elapsed 475.136125ms next backoff 130.842218ms elapsed 607.402958ms next backoff 170.769874ms elapsed 836.674125ms next backoff 201.503411ms elapsed 938.644667ms next backoff 166.184906ms elapsed 1.186788375s next backoff 143.509999ms elapsed 1.296595125s next backoff 190.37078ms . . . elapsed 59.34433225s next backoff 298.715693ms elapsed 59.533727458s next backoff 136.844597ms total requests made: 299 ``` *After this change* ``` first backoff 343.109367ms elapsed 166ns next backoff 55.833932ms elapsed 74.545375ms next backoff 151.276125ms elapsed 192.138833ms next backoff 364.656003ms elapsed 744.091666ms next backoff 564.927763ms elapsed 1.75722475s next backoff 1.248209052s elapsed 4.077209583s next backoff 3.374196626s elapsed 8.769952708s next backoff 8.449184533s elapsed 23.303548708s next backoff 7.341400254s elapsed 43.388838083s next backoff 22.327511322s elapsed 1m14.147837833s next backoff 43.127363668s elapsed 1m55.730420666s next backoff 34.674349715s elapsed 2m16.703792375s next backoff 40.140627107s elapsed 2m31.856713458s next backoff 27.45308936s elapsed 2m49.376459416s next backoff 29.853555518s elapsed 3m14.625332666s next backoff 17.209739491s elapsed 3m37.914321833s next backoff 15.614405319s elapsed 4m14.614882458s next backoff 35.458886397s elapsed 4m33.424847s next backoff -1ns total requests made: 18 ``` **Link to tracking Issue:** No issues opened from my side. **Testing:** Unit testes already in place. **Documentation:** N/A
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
This PR sanitizes the default retry settings by using default constructor and setting the single
InitialInterval
, instead of user initializing entire structure manually.The current default settings do not become exponential due to being upper bound at 200ms by
MaxInterval: 200 * time.Millisecond,
. This effectively caps the wait intervals between requests at a very low interval, thus becomes more of a linear retry than exponential. And due to the distributed nature of the Otel Collector, coupled with each metric export being handled by go-routine, this caused a massive churn of requests when remote endpoint is unavailable.Before/Current
After this change
Link to tracking Issue:
No issues opened from my side.
Testing:
Unit testes already in place.
Documentation:
N/A