@actions/artifact keep-alive configuration #433
Conversation
headers: [],
// keep-alive is configured at the http-client level and is used by each client when making calls; this is independent
// of the keep-alive header that is used to let the remote server know how the connection should be treated
keepAlive: true
This setting needs to be set, and it's independent of the keep-alive header.
This setting makes the http-client save an agent behind the scenes that actually makes the calls: https://github.com/actions/http-client/blob/f6aae3dda4f4c9dc0b49737b36007330f78fd53a/index.ts#L143
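For context, roughly what that linked line amounts to, as a hedged sketch (the caching shape is paraphrased, not copied from `@actions/http-client`):

```ts
import * as http from 'http'

// a sketch of what the http-client does behind the scenes when
// requestOptions.keepAlive is true: it lazily creates a single agent with
// keepAlive enabled and reuses it for every call, so TCP sockets stay open
// between requests instead of being opened per call
let cachedAgent: http.Agent | undefined

function getAgent(useKeepAlive: boolean): http.Agent {
  if (!cachedAgent) {
    cachedAgent = new http.Agent({keepAlive: useKeepAlive})
  }
  return cachedAgent
}
```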
const body: string = await response.readBody()

if (isSuccessStatusCode(response.message.statusCode) && body) {
  return JSON.parse(body)
}
displayHttpDiagnostics(response)
this.disposeAllConnections()
Why do you dispose of all clients here? Don't we need those clients later? Are you recreating clients somewhere?
If those clients are single-use, why not just create an HTTP client here and dispose of it when you are done? Why bother with HttpManager?
Right below this line an error is thrown, so the download essentially stops and no more HTTP calls get made (the clients don't get used anywhere else if this gets hit). I'm actually running some tests right now where we don't dispose at all; disposal might (should?) happen automatically when node terminates.
Putting this as a draft for now, will revisit if need be. I don't think this should be a blocker for releasing.
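For readers following along, here is a rough sketch of the manager/disposal shape being discussed. The class and method names mirror the conversation; the body is illustrative, not the shipped implementation:

```ts
import {HttpClient} from '@actions/http-client'

// sketch: one keep-alive client per concurrent file transfer
class HttpManager {
  private clients: HttpClient[]

  constructor(clientCount: number) {
    this.clients = new Array(clientCount)
      .fill(undefined)
      .map(() => new HttpClient('actions/artifact', [], {keepAlive: true}))
  }

  getClient(index: number): HttpClient {
    return this.clients[index]
  }

  // with keepAlive: true the agents hold sockets open, which can keep the
  // node process alive; disposing destroys those sockets once we know no
  // more calls will be made (e.g. right before throwing a fatal error)
  disposeAllConnections(): void {
    for (const client of this.clients) {
      client.dispose()
    }
  }
}
```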
Keep Alive
I was doing some stress testing, and I came upon a really weird error that caused two files not to upload and some resets not to work correctly. You can see the run here: https://github.com/konradpabjan/artifact-test/runs/615227935?check_suite_focus=true#step:3:7251
https://github.com/konradpabjan/artifact-test/runs/615227935?check_suite_focus=true#step:3:7474
TL;DR: two files don't upload, and resetting the connections doesn't work correctly.
This hasn't been reported as part of the `v2-preview` (which was suspicious), and I haven't seen the `ECONNRESET` error ever since switching over to multiple http-clients, so I did a little digging into why this could have happened. I had a suspicion that `keep-alive` was the problem since there was a stack trace that said the connection was already reset.
Turns out, there is a `keep-alive` configuration setting as part of `@actions/http-client` that needs to be set (this is independent of any headers): https://github.com/actions/http-client/blob/f6aae3dda4f4c9dc0b49737b36007330f78fd53a/index.ts#L143
I'm making all the http-clients use keep-alive. I'm also cleaning up some of the code to clarify the difference between `headers` and `requestOptions`.
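To make the `headers` vs `requestOptions` distinction concrete, a minimal hedged sketch (the URL and header value are just examples):

```ts
import {HttpClient} from '@actions/http-client'

// requestOptions (third constructor argument) configure the local client:
// keepAlive here controls socket reuse on our side and never goes over the wire
const client = new HttpClient('actions/artifact', [], {keepAlive: true})

// headers, by contrast, are sent with each request to tell the remote
// server how to treat the connection, independently of the option above
async function demo(): Promise<void> {
  const res = await client.get('https://example.com', {Connection: 'Keep-Alive'})
  console.log(res.message.statusCode)
}
```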
Add 500 as a retryable status code
I'm also adding `500` as a retryable status code. I've managed to hit a few of these, and we would normally fail the file upload: https://github.com/konradpabjan/artifact-test/runs/622212761?check_suite_focus=true#step:9:6690. Digging in, these 500s are always some SQL error that happens pretty randomly.
Looking at Kusto, `v1` triggers these exact same 500s from time to time, so this isn't something new that all of a sudden started coming up (a follow-up issue might be good). The `v1` actions do treat 500s as retryable: anything between 400-499 fails fast, while everything else is retried: https://github.com/actions/runner/blob/6c70d53eead402ba5d53676d6ed649a04e219c9b/src/Runner.Plugins/Artifact/FileContainerServer.cs#L483
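For illustration, a sketch of what a retry predicate with 500 included could look like (the function name and the exact set of codes are assumptions, not necessarily what shipped):

```ts
import {HttpCodes} from '@actions/http-client'

// assumption: retry on 500 plus the usual transient gateway errors
function isRetryableStatusCode(statusCode: number | undefined): boolean {
  if (!statusCode) {
    return false
  }
  const retryableCodes: number[] = [
    HttpCodes.InternalServerError, // 500 - the random service-side SQL errors
    HttpCodes.BadGateway, // 502
    HttpCodes.ServiceUnavailable, // 503
    HttpCodes.GatewayTimeout // 504
  ]
  return retryableCodes.includes(statusCode)
}
```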
Testing
You can see retries working for downloads when a 500 is hit: https://github.com/konradpabjan/artifact-test/runs/622943741?check_suite_focus=true#step:5:223
I tried hitting a 500 for uploads, but after 3+ hours I got nothing. Here are some successful runs:
https://github.com/konradpabjan/artifact-test/actions/runs/89232043
https://github.com/konradpabjan/artifact-test/actions/runs/89221845
https://github.com/konradpabjan/artifact-test/actions/runs/89203550