-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix withTimeout
helper function
#1293
Conversation
If the download keeps being stuck in the background, will the NodeJS runtime then be allowed to shut down when the action is done with all its foreground tasks? Can you verify this? |
Yikes, an excellent point, I've made a commit on top of this PR where I added a very long task with a timeout, and spawned a test run with this. Sure enough it continues with the foreground tasks but then gets stuck at the end of the step with the long task after finishing those 🙁. So, unless there's something I'm missing (more input from JS experts would be welcome), it sounds like we can't add a timeout on top of the existing code as we're trying to do here and we really will need the proper solution of having a configurable timeout or cancellation mechanism threaded into the Actions cache library code which is unfortunate since it means having proper handling of outages in the Actions cache now requires work on another team. |
Would it help to call |
This works, but is a little scary because we might also terminate other async tasks that are still pending. I had a conversation with @henrymercer about this, and he spotted that actions/toolkit#1140 added a timeout option for the downloading, which seems like a better idea than our workaround. I've bumped the version of the library we pull in so that we have that PR and started using that new option. The timeout is per-segment rather than for the whole cache, but as we are bounding our caches to 1GB they will always fit in one segment (which is 2GB on a 64-bit machine) so this isn't an issue. For the uploading, there is no such option. I propose we leave the workaround in place there. It's not ideal that we won't terminate the workflow until we hit the timeout for the run, but at least with our workaround the code will run to completion before we hang which means that results will get uploaded to the Code Scanning, the database will make it to MRVA if applicable, etc. This should still hopefully be a rare case, and we can monitor with our telemetry to see if it becomes a concern and then perhaps think about more aggressive solutions like using In the future, of course, the ideal solution would be for the Actions library to just expose a timeout option in the upload case too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion around documenting some of what we learnt. Unfortunately accepting this requires rebuilding the Action and reapproving the PR, so feel free to skip it.
* Note that this does NOT cancel the original promise, so that promise will | ||
* continue in the background even after the timeout has expired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Note that this does NOT cancel the original promise, so that promise will | |
* continue in the background even after the timeout has expired. | |
* Important: This does NOT cancel the original promise, so that promise will continue in the | |
* background even after the timeout has expired. If the original promise hangs, then this will | |
* prevent the process terminating. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that is indeed better. Will do this in a follow-up to avoid dismissing the approvals here.
Noticed while looking at logs during the rollout of TRAP caching.
It turns out that I had slightly misunderstood the semantics of
Promise.race
. While it returns the value of the first promise that finishes, it does not actually halt the other promise (there is, as far as I now understand things, no way to do this in JavaScript). This means we were eventually calling the timeout callback even if the task completed. This isn't a big deal because the timeout callback just logged an informational message. Still, it's not ideal and creates confusing logs since we log a message about the download timing out some time after it has in fact already finished.This PR fixes that, and adds a unit test that would've caught this. It's still not ideal that when we do timeout we don't halt the download in the background but I think that is okay since if we've timed out it probably means that has got stuck somehow so isn't using network resources. Actually cancelling the download would require re-writing parts of the Actions cache library to support cancellation, which is outside of our team's scope (although it's certainly a sensible feature request we should consider filing with the Actions team).
Merge / deployment checklist