@actions/artifact keep-alive configuration #433

Closed · wants to merge 5 commits

Conversation

konradpabjan (Contributor)

Keep Alive

I was doing some stress testing and came across a really weird error that caused two files not to upload and some connection resets not to work correctly. You can see the run here: https://github.com/konradpabjan/artifact-test/runs/615227935?check_suite_focus=true#step:3:7251
https://github.com/konradpabjan/artifact-test/runs/615227935?check_suite_focus=true#step:3:7474
TL;DR: two files don't upload, and resetting the connections doesn't work correctly.

This hasn't been reported as part of the v2-preview (which was suspicious), and I haven't seen the ECONNRESET error since switching over to multiple http-clients, so I did a little digging into why this could have happened. I had a suspicion that keep-alive was the problem, since there was a stack trace saying the connection had already been reset.

Turns out, there is a keep-alive configuration setting in @actions/http-client that needs to be set (this is independent of any headers): https://github.com/actions/http-client/blob/f6aae3dda4f4c9dc0b49737b36007330f78fd53a/index.ts#L143

I'm making all the http-clients use keep-alive. I'm also cleaning up some of the code to clarify the distinction between headers and requestOptions.
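
For reference, here's a minimal sketch of what constructing a client with this option looks like (the user-agent string and helper function are illustrative, not the actual @actions/artifact code):

import {HttpClient} from '@actions/http-client'

// requestOptions.keepAlive is a client-level setting: the client holds on to a
// keep-alive agent and reuses sockets across calls. It is not the Keep-Alive header.
const client = new HttpClient('artifact-example', [], {keepAlive: true})

async function getStatus(url: string): Promise<number | undefined> {
  const response = await client.get(url)
  await response.readBody() // drain the body so the socket can be reused
  return response.message.statusCode
}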

Add 500 as a retryable status code

I'm also adding 500 as a retryable status code. I've managed to hit a few of these and we would normally fail the file upload: https://github.com/konradpabjan/artifact-test/runs/622212761?check_suite_focus=true#step:9:6690. Digging in, these 500s are always some SQL error that happens pretty randomly.

Looking at Kusto, v1 triggers these exact same 500s from time to time, so this isn't something new that all of a sudden started coming up (a follow-up issue might be good). The v1 action does treat 500s as retryable: anything between 400 and 499 fails fast, while everything else is retried: https://github.com/actions/runner/blob/6c70d53eead402ba5d53676d6ed649a04e219c9b/src/Runner.Plugins/Artifact/FileContainerServer.cs#L483
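
To make the intent concrete, here's a rough sketch of the retry check being described (the helper name and exact code list are illustrative, not the precise @actions/artifact implementation):

// 4xx responses fail fast; 500 joins the set of transient codes that get retried.
const retryableStatusCodes = new Set([408, 429, 500, 502, 503, 504])

function isRetryableStatusCode(statusCode?: number): boolean {
  if (!statusCode) {
    return false
  }
  return retryableStatusCodes.has(statusCode)
}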

Testing

You can see retries working for downloads when a 500 is hit: https://github.com/konradpabjan/artifact-test/runs/622943741?check_suite_focus=true#step:5:223

I tried hitting a 500 for uploads, but after 3+ hours I got nothing. Here are some successful runs:
https://github.com/konradpabjan/artifact-test/actions/runs/89232043
https://github.com/konradpabjan/artifact-test/actions/runs/89221845
https://github.com/konradpabjan/artifact-test/actions/runs/89203550

headers: [],
// keep-alive is configured at the http-client level and is used by each client when making calls; this is independent
// of the keep-alive header that is used to let the remote server know how the connection should be treated
keepAlive: true
konradpabjan (Contributor, Author)

This setting needs to be set, and it's independent of the keep-alive header.

konradpabjan (Contributor, Author)

This setting makes the http-client keep an agent behind the scenes that actually makes the calls: https://github.com/actions/http-client/blob/f6aae3dda4f4c9dc0b49737b36007330f78fd53a/index.ts#L143
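
Roughly speaking (a sketch of the underlying Node behavior, not the http-client source), enabling keepAlive means the client hangs on to an agent like this and reuses it for every call:

import * as https from 'https'

// One agent with keep-alive enabled is reused across requests, so sockets
// stay open between calls instead of being torn down after each one.
const agent = new https.Agent({keepAlive: true})

https.get('https://example.com', {agent}, res => {
  res.resume() // drain the response so the socket goes back into the pool
})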

const body: string = await response.readBody()

if (isSuccessStatusCode(response.message.statusCode) && body) {
return JSON.parse(body)
}
displayHttpDiagnostics(response)
this.disposeAllConnections()
Contributor

Why do you dispose of all clients here? Don't we need those clients later? Are you recreating clients somewhere?

If those clients are single-use, why not just create an HTTP client here and dispose of it when you are done? Why bother with HttpManager?

konradpabjan (Contributor, Author) commented Apr 27, 2020

Right below this line an error is thrown, so the download essentially stops and no more HTTP calls get made (the clients don't get used anywhere else if this gets hit). I'm actually running some tests right now where we don't dispose at all; disposal might (should?) happen automatically when Node terminates.
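
For context, a minimal sketch of the disposal pattern under discussion (the class shape and names are illustrative, not the actual HttpManager in @actions/artifact):

import {HttpClient} from '@actions/http-client'

// Holds several keep-alive clients so work can fan out across them, and lets
// callers destroy every underlying agent once no more calls will be made.
class SimpleHttpManager {
  private readonly clients: HttpClient[]

  constructor(clientCount: number, userAgent: string) {
    this.clients = Array.from(
      {length: clientCount},
      () => new HttpClient(userAgent, [], {keepAlive: true})
    )
  }

  getClient(index: number): HttpClient {
    return this.clients[index]
  }

  disposeAllConnections(): void {
    for (const client of this.clients) {
      client.dispose() // destroys the client's keep-alive agent
    }
  }
}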

konradpabjan marked this pull request as draft on April 27, 2020.
konradpabjan (Contributor, Author) commented Apr 27, 2020

Putting this up as a draft for now; will revisit if need be. I don't think this should be a blocker for releasing v2 tomorrow, considering no one else has reported the issue and I've only really hit this once.
