Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins/logs: Fix broken retry logic #4493

Merged

Conversation

ashutosh-narkar
Copy link
Member

We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: #4486

Signed-off-by: Ashutosh Narkar [email protected]

@ashutosh-narkar
Copy link
Member Author

Post fix: We can see the delay increasing after each unsuccessful attempt to upload decision logs

{"headers":{"Content-Encoding":["gzip"],"Content-Type":["application/json"],"User-Agent":["Open Policy Agent/0.39.0-dev (darwin, amd64)"]},"level":"debug","method":"POST","msg":"Sending request.","time":"2022-03-25T12:26:16-07:00","url":"http://localhost:8001/logs"}
{"level":"error","msg":"log upload failed: Post \"http://localhost:8001/logs\": dial tcp [::1]:8001: connect: connection refused.","plugin":"decision_logs","time":"2022-03-25T12:26:16-07:00"}
{"level":"debug","msg":"Waiting 0s before next upload/retry.","plugin":"decision_logs","time":"2022-03-25T12:26:16-07:00"}
{"headers":{"Content-Encoding":["gzip"],"Content-Type":["application/json"],"User-Agent":["Open Policy Agent/0.39.0-dev (darwin, amd64)"]},"level":"debug","method":"POST","msg":"Sending request.","time":"2022-03-25T12:26:16-07:00","url":"http://localhost:8001/logs"}
{"level":"error","msg":"log upload failed: Post \"http://localhost:8001/logs\": dial tcp [::1]:8001: connect: connection refused.","plugin":"decision_logs","time":"2022-03-25T12:26:16-07:00"}
{"level":"debug","msg":"Waiting 185.461446ms before next upload/retry.","plugin":"decision_logs","time":"2022-03-25T12:26:16-07:00"}
{"headers":{"Content-Encoding":["gzip"],"Content-Type":["application/json"],"User-Agent":["Open Policy Agent/0.39.0-dev (darwin, amd64)"]},"level":"debug","method":"POST","msg":"Sending request.","time":"2022-03-25T12:26:17-07:00","url":"http://localhost:8001/logs"}
{"level":"error","msg":"log upload failed: Post \"http://localhost:8001/logs\": dial tcp [::1]:8001: connect: connection refused.","plugin":"decision_logs","time":"2022-03-25T12:26:17-07:00"}
{"level":"debug","msg":"Waiting 224.251515ms before next upload/retry.","plugin":"decision_logs","time":"2022-03-25T12:26:17-07:00"}

@anderseknert
Copy link
Member

The fix looks reasonable to me, but don't we have any tests on this? We should have one if not.

srenatus
srenatus previously approved these changes Mar 25, 2022
We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: open-policy-agent#4486

Signed-off-by: Ashutosh Narkar <[email protected]>
@ashutosh-narkar
Copy link
Member Author

The fix looks reasonable to me, but don't we have any tests on this? We should have one if not.

I did an end-to-end test with the change and posted the results here. Not sure what a unit test would be here unless we move private counter on the plugin struct and check for it's value which does not seem necessary here.

@ashutosh-narkar ashutosh-narkar merged commit c08b81d into open-policy-agent:main Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS CA too many open files - decisionlog upload
3 participants