Report uploads now use timeouts and retriees with backoff #34

quintesse · 2024-07-31T09:27:18Z

Fixes mwtele 225 and 226

NB: This needs the very latest version of the client to work (at least 2.0.3-SNAPSHOT) so this code won't work properly until either 2.0.3 has been released or you manually build the latest client and make the agent use it.

Fixes MWTELE-226

Fixes MWTELE-225

kittylyst

I want QE to weigh in on this one before I approve - does it actually address the issue?

As it stands, 413 and 415 (which are client-side more-or-less-unrecoverable errors) are treated the same way as 503 (the code which was the original point of the issue).

FWIW, the spec says this about 413:

"If the condition is temporary, the server SHOULD generate a Retry-After header field to indicate that it is temporary and after what time the client MAY try again."

We emphatically do not check that header, even assuming it's created (and propagated back to us)

For 503:

"The server MAY send a Retry-After header field to suggest an appropriate amount of time for the client to wait before retrying the request."

Given the normative language, my reading of it is that neither 413 or 415 should be treated as transient conditions, but arguably 503 should be (and that given the lack of guidance from the server, our backoff strategy is acceptable) - therefore they should be handled differently. As it stands, this PR does not do that.

quintesse · 2024-07-31T15:15:05Z

Indeed, this PR simply copies the similar behavior that currently exists in the Insights client code. Treating different errors differently from the PoV of the BackoffWrapper code wouldn't be too hard, but isn't currently implemented. We would have to define how to distinguish the errors that are "repeatable" from the ones that aren't, though.

mocenas · 2024-08-01T12:34:55Z

@kittylyst
This thing does address the problem with no retry. Original motivation was to retry in case that server is overloaded or has whatever TEMPORARY problem. Since http client cannot really know if server side problem is temporary or not, it makes sense to me to retry for all 5xx codes.

I agree that it would make sense to not backoff in case of not-recoverable errors, basically for all 4xx codes. If you want to extend BackoffWrapper to distinguish recoverable and non-recoverable errors. That is IMHO OK. As @quintesse stated, this PR copies the same behaviour as used in client-runtime. So it should also be addressed there.
Also this PR should handle 403 code.

As I already stated somewhere, this PR IMHO has too long timeout (1 minute) and too many retries (5 retries exponentially backing off) which combined can take ~6-7 minutes to timeout, in extreme case. That is IMHO too much.

To conclude, this PR solves the original problems, but IMHO can be tweaked to do it better.

kittylyst · 2024-08-01T12:42:07Z

@mocenas I understand where you're coming from, but I think I'm "agree to disagree" about this. 😺

However, I think we're all in agreement that this is an improvement, and we are under some time pressure for this.

So, :lgtm: and we can tweak this in a future release.

kittylyst

LGTM

quintesse added 2 commits July 31, 2024 11:23

The agent's apache http client now supports timeouts

96a7608

Fixes MWTELE-226

Now using retry with backoff when uploading reports

ee10d4b

Fixes MWTELE-225

kittylyst requested changes Jul 31, 2024

View reviewed changes

kittylyst approved these changes Aug 1, 2024

View reviewed changes

kittylyst merged commit fb3b09f into RedHatInsights:main Aug 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report uploads now use timeouts and retriees with backoff #34

Report uploads now use timeouts and retriees with backoff #34

quintesse commented Jul 31, 2024

kittylyst left a comment

quintesse commented Jul 31, 2024

mocenas commented Aug 1, 2024

kittylyst commented Aug 1, 2024

kittylyst left a comment

Report uploads now use timeouts and retriees with backoff #34

Report uploads now use timeouts and retriees with backoff #34

Conversation

quintesse commented Jul 31, 2024

kittylyst left a comment

Choose a reason for hiding this comment

quintesse commented Jul 31, 2024

mocenas commented Aug 1, 2024

kittylyst commented Aug 1, 2024

kittylyst left a comment

Choose a reason for hiding this comment