Fix `withTimeout` helper function #1293

edoardopirovano · 2022-10-11T09:10:25Z

Noticed while looking at logs during the rollout of TRAP caching.

It turns out that I had slightly misunderstood the semantics of `Promise.race`. While it settles with the outcome of the first promise to settle, it does not actually halt the other promise (as far as I now understand, there is no way to do this in JavaScript). This means we were eventually calling the timeout callback even when the task had already completed. This isn't a big deal, because the timeout callback only logged an informational message. Still, it's not ideal: it produces confusing logs, since we log a message about the download timing out some time after it has in fact already finished.
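To make the failure mode concrete, here is a minimal sketch of the problematic pattern. The names `delay` and `naiveWithTimeout` and the `(timeoutMs, promise, onTimeout)` parameter order are assumptions for illustration, not the action's actual code. The timer branch calls `onTimeout` unconditionally, so the callback still fires after the task has already won the race.

```typescript
// Resolve after `ms` milliseconds.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical reconstruction of the buggy helper: race the task against a
// timer whose branch always calls onTimeout once the timer fires.
async function naiveWithTimeout<T>(
  timeoutMs: number,
  promise: Promise<T>,
  onTimeout: () => void,
) {
  const timeoutTask = async (): Promise<undefined> => {
    await delay(timeoutMs);
    // Bug: this still runs after `promise` has settled, because
    // Promise.race does not stop the promise that lost the race.
    onTimeout();
    return undefined;
  };
  return await Promise.race([promise, timeoutTask()]);
}
```

The race itself returns the task's value promptly. The stray callback only shows up later, which matches the "timed out" log line appearing after the download had finished.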

This PR fixes that and adds a unit test that would have caught it. It's still not ideal that, when we do time out, we don't halt the download in the background, but I think that is okay: if we've timed out, the download has probably got stuck somehow, so it isn't using network resources. Actually cancelling the download would require rewriting parts of the Actions cache library to support cancellation, which is outside our team's scope (although it's certainly a sensible feature request that we should consider filing with the Actions team).
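One way to avoid the late callback, sketched here under the same assumed `(timeoutMs, promise, onTimeout)` signature (this is not necessarily how the PR implements it), is to keep a handle to the timer and clear it once the race settles. As the description notes, this still does not cancel the original promise; it only stops the spurious callback.

```typescript
// Sketch of a withTimeout that cannot call onTimeout after the task settles.
// Returns the task's result, or undefined if the timeout wins.
// NOTE: the original promise is NOT cancelled and keeps running.
async function withTimeout<T>(
  timeoutMs: number,
  promise: Promise<T>,
  onTimeout: () => void,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutTask = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => {
      onTimeout();
      resolve(undefined);
    }, timeoutMs);
  });
  try {
    return await Promise.race([promise, timeoutTask]);
  } finally {
    // Whichever side won (or if the task rejected), stop the timer so the
    // callback cannot fire late and the timer no longer holds the event loop.
    clearTimeout(timer);
  }
}
```

A unit test in the spirit of the one described checks both directions: a fast task must return its value and never trigger the callback, and a task that never settles must return `undefined` after triggering the callback exactly once.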

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Confirm the readme has been updated if necessary.
Confirm the changelog has been updated if necessary.

jbj · 2022-10-11T09:21:19Z

It's still not ideal that when we do timeout we don't halt the download in the background but I think that is okay since if we've timed out it probably means that has got stuck somehow so isn't using network resources.

If the download keeps being stuck in the background, will the NodeJS runtime then be allowed to shut down when the action is done with all its foreground tasks? Can you verify this?

edoardopirovano · 2022-10-11T10:07:28Z

If the download keeps being stuck in the background, will the NodeJS runtime then be allowed to shut down when the action is done with all its foreground tasks? Can you verify this?

Yikes, an excellent point. I've made a commit on top of this PR that adds a very long task with a timeout, and spawned a test run with it. Sure enough, the step continues with the foreground tasks, but after finishing them it gets stuck at the end of the step waiting on the long task 🙁.
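A pending promise on its own does not keep Node.js alive. What keeps the process running is the open handle behind it: for a hung download that is presumably the socket, and in the test run above the long task's own handle. The sketch below reproduces this with a hypothetical `simulateStuckDownload` that uses a long, referenced timer as a stand-in for a stalled socket. After the timeout wins the race, the stand-in handle is still referenced, so the event loop (and the step) would stay alive until it completes.

```typescript
// Hypothetical stand-in for a hung download: a long, ref'd timer plays the
// role of a stalled socket that keeps Node's event loop alive.
function simulateStuckDownload(ms: number) {
  let handle!: ReturnType<typeof setTimeout>;
  const promise = new Promise<string>((resolve) => {
    handle = setTimeout(() => resolve("downloaded"), ms);
  });
  return { promise, handle };
}

// Minimal race-with-timeout that clears its own timer afterwards.
async function raceWithTimeout<T>(timeoutMs: number, promise: Promise<T>) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Calling `unref()` on a handle would let the process exit without waiting for it, but that is only possible when you own the handle, which is not the case for sockets created inside the cache library.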

So, unless I'm missing something (more input from JS experts would be welcome), it sounds like we can't add a timeout on top of the existing code as we're trying to do here. We really will need the proper solution: a configurable timeout or cancellation mechanism threaded into the Actions cache library code. That's unfortunate, since it means that properly handling Actions cache outages now requires work from another team.

jbj · 2022-10-11T11:17:14Z

Would it help to call process.exit at the end of runWrapper?

edoardopirovano · 2022-10-11T16:11:59Z

Would it help to call process.exit at the end of runWrapper?

This works, but is a little scary because we might also terminate other async tasks that are still pending.
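For reference, jbj's suggestion amounts to something like the sketch below. The shape of this `runWrapper` is hypothetical (the action's real entry point is not shown in this thread), and the exit function is injectable only so the behaviour can be tested without killing the test process. The comment marks exactly the risk mentioned: `process.exit` ends all pending work, not just the hung download.

```typescript
// Hypothetical entry-point wrapper: run the step, then force the process to
// exit even if background handles (e.g. a hung download) are still open.
async function runWrapper(
  run: () => Promise<void>,
  exit: (code: number) => void = (code) => process.exit(code),
): Promise<void> {
  let exitCode = 0;
  try {
    await run();
  } catch (e) {
    console.error(e);
    exitCode = 1;
  }
  // Caveat: this also terminates any *other* async work still pending,
  // such as a request that has not finished flushing.
  exit(exitCode);
}
```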

I had a conversation with @henrymercer about this, and he spotted that actions/toolkit#1140 added a timeout option for downloads, which seems like a better idea than our workaround. I've bumped the version of the library we pull in so that we have that PR, and started using the new option. The timeout is per segment rather than for the whole cache, but since we bound our caches to 1GB, they will always fit in one segment (which is 2GB on a 64-bit machine), so this isn't an issue.
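The per-segment argument can be checked with a little arithmetic. The sketch below takes the sizes from the comment above (a 1GB cache cap and a 2GB segment on 64-bit machines, treated as binary units for illustration); the helper names are hypothetical. With a single segment, the worst-case download time equals the per-segment timeout, so a per-segment limit bounds the whole download.

```typescript
// Sizes assumed from the discussion: 1GB cache cap, 2GB segments on 64-bit.
const GiB = 1024 ** 3;
const MAX_CACHE_BYTES = 1 * GiB;
const SEGMENT_BYTES_64BIT = 2 * GiB;

// Number of segments needed to download `sizeBytes` (at least one).
function segmentCount(sizeBytes: number, segmentBytes: number): number {
  return Math.max(1, Math.ceil(sizeBytes / segmentBytes));
}

// Worst-case wall-clock bound when the timeout applies to each segment.
function worstCaseDownloadMs(
  sizeBytes: number,
  segmentBytes: number,
  perSegmentTimeoutMs: number,
): number {
  return segmentCount(sizeBytes, segmentBytes) * perSegmentTimeoutMs;
}
```

If the cap were ever raised above the segment size, the bound would grow with the number of segments, which is when a per-segment timeout stops being equivalent to a whole-download timeout.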

For uploads, there is no such option, so I propose we leave the workaround in place there. It's not ideal that we won't terminate the workflow until it hits the timeout for the run, but at least with our workaround the code runs to completion before we hang, which means that results still get uploaded to Code Scanning, the database makes it to MRVA if applicable, etc. This should hopefully be a rare case, and we can monitor it with our telemetry; if it becomes a concern, we can consider more aggressive solutions, such as calling `process.exit()` or spawning a new process for the upload that we can kill independently.
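The "separate process" idea works because, unlike a promise, a child process can actually be terminated. Below is a minimal sketch using Node's `child_process.execFile`, whose `timeout` option makes the parent send the kill signal (SIGTERM by default) if the child overruns. The helper name and the inline `-e` script standing in for an upload are hypothetical; a real implementation would run a dedicated upload script.

```typescript
import { execFile } from "child_process";

// Hypothetical: run an "upload" in a separate Node process so that it can be
// killed if it exceeds `timeoutMs`, instead of hanging the parent.
function runUploadInChildProcess(
  script: string,
  timeoutMs: number,
): Promise<{ ok: boolean; killed: boolean }> {
  return new Promise((resolve) => {
    execFile(process.execPath, ["-e", script], { timeout: timeoutMs }, (error) => {
      // `error.killed` is set when the child was terminated by the timeout.
      resolve({ ok: error === null, killed: error?.killed === true });
    });
  });
}
```

The trade-off is that the upload's inputs and results must cross a process boundary (for example via files or arguments), which is more plumbing than awaiting a promise in-process.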

In the future, of course, the ideal solution would be for the Actions library to just expose a timeout option in the upload case too.

aeisenberg

This looks reasonable.

henrymercer

Small suggestion around documenting some of what we learnt. Unfortunately, accepting this requires rebuilding the Action and re-approving the PR, so feel free to skip it.

henrymercer · 2022-10-11T17:28:55Z

src/util.ts

+ * Note that this does NOT cancel the original promise, so that promise will
+ * continue in the background even after the timeout has expired.


Suggested change

- * Note that this does NOT cancel the original promise, so that promise will
- * continue in the background even after the timeout has expired.
+ * Important: This does NOT cancel the original promise, so that promise will continue in the
+ * background even after the timeout has expired. If the original promise hangs, then this will
+ * prevent the process terminating.

Thanks, that is indeed better. Will do this in a follow-up to avoid dismissing the approvals here.

Fix withTimeout helper function

6e1dab2

edoardopirovano requested a review from a team as a code owner October 11, 2022 09:10

Further update to reflect a conversation with @henrymercer

adb2896

aeisenberg approved these changes Oct 11, 2022


henrymercer approved these changes Oct 11, 2022


edoardopirovano merged commit 44edb7c into main Oct 11, 2022

edoardopirovano deleted the edoardo/fix-with-timeout branch October 11, 2022 20:29

edoardopirovano mentioned this pull request Oct 11, 2022

Improve documentation comment #1296

Merged


This was referenced Oct 14, 2022

Merge main into releases/v2 #1303

Closed

Merge main into releases/v2 #1311

Merged

Merge releases/v2 into releases/v1 #1314

Merged
