Retry artifact upload/download if a 500 is hit #442

Closed
konradpabjan wants to merge 1 commit from konradpabjan/artifact-retry-on-500s

Conversation

@konradpabjan
Contributor

konradpabjan commented May 1, 2020

Overview

We should be retrying if a 500 is encountered during either artifact upload or download.

Should fix: actions/download-artifact#33

Looking at telemetry, v1 triggers these exact same 500s very infrequently (but they do happen), so this isn't something new that suddenly started coming up. The v1 actions treat 500s as retryable: anything between 400 and 499 fails fast, while everything else is retried: https://github.com/actions/runner/blob/6c70d53eead402ba5d53676d6ed649a04e219c9b/src/Runner.Plugins/Artifact/FileContainerServer.cs#L483
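For illustration, the v1 policy described above could be sketched roughly as follows. This is only a sketch and not the runner's or the toolkit's actual code: the function names are hypothetical, and treating 2xx responses as "done, no retry" is an assumption added here.

```typescript
// Hypothetical helper, not part of any real API.
// Assumption: a 2xx response counts as success and is never retried.
function isSuccessStatusCode(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 300
}

// Mirrors the v1 policy as described in this PR:
// 400-499 fails fast, every other non-success status is retried.
function shouldRetryV1Style(statusCode: number): boolean {
  if (isSuccessStatusCode(statusCode)) {
    return false // nothing to retry
  }
  if (statusCode >= 400 && statusCode <= 499) {
    return false // client errors fail fast
  }
  return true // e.g. 500 and 503 are retried
}
```

Under this policy a 500 returned during upload or download would be retried, while a 404 would fail immediately.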

Testing

After a lot of runs, I managed to actually hit a 500 during download. You can see in the log that it is retried and the download is successful: https://github.com/konradpabjan/artifact-test/runs/622943741?check_suite_focus=true#step:5:223

@webknjaz

webknjaz commented May 3, 2020

Looks like there's also some low-level socket timeout error (ETIMEDOUT) that needs to be taken into account too: actions/download-artifact#33 (comment).
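A minimal sketch of how such a socket-level failure might be detected, assuming the error object carries a Node.js-style string `code` property. The helper name is hypothetical, and only `ETIMEDOUT` is checked because it is the only code mentioned in this thread.

```typescript
// Hypothetical helper, not part of the toolkit.
// Assumption: low-level socket failures surface as error objects with a
// string `code` property (as Node.js system errors do), e.g. 'ETIMEDOUT'.
function isRetryableSocketError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) {
    return false
  }
  const code = (error as {code?: unknown}).code
  return code === 'ETIMEDOUT'
}
```

A caller could combine this check with a status-code check so that both HTTP-level and socket-level failures feed the same retry path.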

@konradpabjan
Contributor Author

After some offline discussions, we're going to be fixing this by remapping some error codes on the server so that a 503 is returned instead of a 500. The 503 should trigger the retry logic.
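To illustrate why the remapping helps, here is a simplified sketch of a retry loop that only retries on a 503. It is not the client's actual implementation: the names are hypothetical, the retryable list is assumed to contain just 503, and the loop is synchronous with no backoff to keep the example short.

```typescript
// Assumption: only 503 is treated as retryable here; the real client's
// list of retryable status codes may be longer.
const RETRYABLE_STATUS_CODES: number[] = [503]

interface RetryResult {
  statusCode: number // status code of the last attempt
  attempts: number // how many requests were sent
}

// Hypothetical helper: `sendRequest` stands in for one upload or download
// call and returns its HTTP status code. The request is always sent at
// least once. A real client would await the call and wait between attempts.
function requestWithRetry(
  sendRequest: () => number,
  maxAttempts: number
): RetryResult {
  let statusCode = sendRequest()
  let attempts = 1
  while (
    attempts < maxAttempts &&
    RETRYABLE_STATUS_CODES.indexOf(statusCode) !== -1
  ) {
    statusCode = sendRequest()
    attempts++
  }
  return {statusCode, attempts}
}
```

With a loop like this, a timeout that the server reports as a 500 ends the operation after one attempt, while the same timeout reported as a 503 is retried until it succeeds or the attempt limit is reached.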

@konradpabjan konradpabjan marked this pull request as draft May 14, 2020 11:49
@konradpabjan
Contributor Author

A fix is rolling out behind the scenes to make 500s that are caused by timeouts return as a 503, which will be retried.

@joshmgross joshmgross deleted the konradpabjan/artifact-retry-on-500s branch July 27, 2020 14:09
Development

Successfully merging this pull request may close these issues.

[FR] Retry on HTTP 500