Workers: Curtail the browser's license to kill #1004

lars-t-hansen · 2016-04-06T05:20:01Z

(Background: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29039; #851)

The current "kill a worker" algorithm states that it may be run at any time, and the rest of the workers spec seems to take that into account properly. Thus a correct implementation of workers can pick a random worker in a random domain every second and just gun it down for no good reason at all. This license seriously impacts the utility of workers and is almost certainly wider license than the browser needs.

I expect the need for the /some/ wording comes from the desire to kill runaway workers without a slow-script dialog and also from the need to close tabs and throw things out of the history; I also expect the current wording was simply expedient.

I would like this wording to be narrowed considerably:

In the context of current browsers gunning down workers will lead to pages that don't work but is probably otherwise "benign". But in the context of the evolving spec on shared memory for JavaScript (http://tc39.github.io/ecmascript_sharedmem/shmem.html) gunning down workers will also lead to scenarios where workers that hold locks in shared memory disappear without releasing those locks or cleaning up state, and that will lead to yet other workers deadlocking. (The lack of a termination signaling mechanism does not help here but is a separate bug.)

I don't have any concrete proposal for a new wording, and when workers communicate synchronously through shared memory and don't return to their event loops regularly it will be very difficult to determine if a worker is 'runaway'. Could we at least enumerate in the spec the situations under which the kill a worker algorithm can be run, so that we can discuss specific cases?

domenic · 2016-04-06T05:23:44Z

It's not clear to me what this bug is meant to accomplish. If there are times when workers should be prohibited from being killed, then we should enumerate them (pending implementer agreement). But if there are no such times, then the current wording seems to reflect reality well.

I'm happy to evaluate a concrete proposal (or rather, to take a concrete proposal to the worker teams of various browsers and see if they are OK with the normative changes proposed), but I'm not clear what normative changes are being asked for here.

annevk · 2016-04-06T06:36:08Z

@domenic I think what @lars-t-hansen is asking for that instead of enumerating when they should not be killed, we should more clearly enumerate when they are allowed to be killed, instead of basically allowing that all the time, because nobody could be bothered to enumerate the reasons (it was more expedient, as lth says, most likely).

Reasons I can think of:

OOM (already covered by a general disclaimer at the start of the specification)
"Owner" document unloads
"Owner" worker is terminated

I'm not sure why slow-script dialog would be a reason. That only seems a concern if the worker shares the event loop with a document, which is not technically allowed.

lars-t-hansen · 2016-04-06T08:55:55Z

What @annevk said, which is a nice clarification.

domenic · 2016-04-06T13:01:01Z

I guess if implementers are OK with disallowing killing at all other times, then we could change to a safelist instead of the current approach (which is effectively a currently-empty blocklist). I am skeptical this is the case though; in general UAs have preferred the freedom to protect the user's experience and change their heuristics here. See, for example, service workers, where these heuristics are still evolving to my knowledge, and explicitly not interoperable.

Not sure who the implementers are who work on workers at other browsers, but I believe @kinu would be for Chrome. @kinu, would your team be interested in a spec change that explicitly enumerates the places user agents are allowed to kill workers, instead of letting them kill them at any time? Would that be something you could conform to now and going forward?

annevk · 2016-04-06T13:48:20Z

Note that service workers have a totally different lifecycle. That they are a subclass here and there is a bit of a lie, since the processing requirements are vastly different (though not always written down as such). I would expect this thread to only cover requirements for dedicated and shared workers.

domenic · 2016-04-06T14:09:25Z

Well, they still use "kill a worker", I thought.

annevk · 2016-04-06T16:35:18Z

Yeah I know, I'm not really sure their reuse makes sense once you start looking at the details, though I have heard we share some code between shared workers and service workers, so who knows. E.g., while workers are based around message channels, service workers do not use those.

annevk · 2016-06-20T12:36:58Z

Okay, having discussed this within Mozilla, these are the ways a (dedicated/shared) worker can be shutdown:

Document shutdown
Worker closes itself (.close())
Parent closes worker (.terminate())
GC

GC happens if there is nothing (other than network, I suspect) that can observe the worker being terminated. E.g., SharedArrayBuffer would prevent GC.

It seems currently the specification doesn't really address GC. (Perhaps we can leave it since it's generally not supposed to be observable... @bakulf?) And on the other hand, permits way more shutdown than at least Mozilla actually performs.

I think we should move the specification in this direction. We need to get more predictable JavaScript "threads" if we want folks to actually rely on this for their computing needs.

bakulf · 2016-06-22T09:13:32Z

GC behavior is not observable: workers are terminated when there are no APIs keeping them alive for some networking (XHR, WebSocket), cross-thread event listeners (MessagePort, BroadcastChannel, etc) or pending async operations, and, of course, if there is not JS code running.

This could be consider an internal optimization also if not part of the spec.

duanyao · 2016-06-22T10:03:11Z

I think GC of worker should be spec'ed, because it determines whether developers have to always terminate workders they created to prevent resource leak -- this can be tricky on error conditions.

annevk · 2016-06-22T11:03:41Z

I see, yes, we should not mention it in that case. So that leaves us with three very simple cases, which should all already be covered.

There’s a new sheriff in town and workers can now no longer be killed. “Kill” was a poor term to use anyway and also not needed as worker termination already covered all angles we care about except for one. Now, workers can solely be terminated due to close(), terminate(), or disappearing documents. They can of course continued to be GC’d as long as that is not observable, not even from the network. Fixes #1004.

There’s a new sheriff in town and workers can now no longer be killed. “Kill” was a poor term to use anyway and also not needed as worker termination already covered all angles we care about except for one. Now, workers can solely be terminated due to close(), terminate(), or disappearing documents. Fixes #1004.

There’s a new sheriff in town and workers can now no longer be killed. “Kill” was a poor term to use anyway and also not needed as worker termination already covered all angles we care about except for one. Now, workers can solely be terminated due to close(), terminate(), or disappearing documents. Fixes whatwg#1004.

domenic added normative change needs implementer interest Moving the issue forward requires implementers to express interest labels Apr 6, 2016

annevk added the topic: workers label Jun 20, 2016

annevk mentioned this issue Jun 20, 2016

Workers: Dedicated execution thread + related syntactic matters #851

Closed

annevk mentioned this issue Jun 22, 2016

Remove overly permissive "kill a worker" algorithm #1457

Merged

lars-t-hansen mentioned this issue Jun 30, 2016

Agents, vats, continents, oh my: Rephrase agents properly in terms of job queues and realms tc39/proposal-ecmascript-sharedmem#27

Closed

lars-t-hansen mentioned this issue Apr 12, 2017

Do Agent Cluster requirements have TC39 consensus? tc39/ecma262#886

Closed

annevk closed this as completed in #1457 Apr 19, 2017

annevk mentioned this issue Apr 19, 2017

Define GC for workers in more detail #2553

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workers: Curtail the browser's license to kill #1004

Workers: Curtail the browser's license to kill #1004

lars-t-hansen commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

lars-t-hansen commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

annevk commented Jun 20, 2016 •

edited

Loading

bakulf commented Jun 22, 2016

duanyao commented Jun 22, 2016

annevk commented Jun 22, 2016

Workers: Curtail the browser's license to kill #1004

Workers: Curtail the browser's license to kill #1004

Comments

lars-t-hansen commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

lars-t-hansen commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

domenic commented Apr 6, 2016

annevk commented Apr 6, 2016

annevk commented Jun 20, 2016 • edited Loading

bakulf commented Jun 22, 2016

duanyao commented Jun 22, 2016

annevk commented Jun 22, 2016

annevk commented Jun 20, 2016 •

edited

Loading