Skip to content
This repository has been archived by the owner on Aug 29, 2021. It is now read-only.

Avoid reordering execution of module bodies before top-level await is reached #47

Closed
littledan opened this issue Feb 17, 2019 · 36 comments · Fixed by #61
Closed

Avoid reordering execution of module bodies before top-level await is reached #47

littledan opened this issue Feb 17, 2019 · 36 comments · Fixed by #61

Comments

@littledan
Copy link
Member

Let's continue discussion on #44 (comment) here. To summarize, @zenparsing noticed a potential reordering issue in #33 , and @bergus and I were chatting about possible ways to avoid it.

@littledan
Copy link
Member Author

@bergus, What do you see as the problem with draining the job queue between siblings? I see it as a midpoint between inserting additional implicit dependencies and not controlling things at all.

@littledan
Copy link
Member Author

About a "synchronous" Promise.all, as proposed in #44 (comment) , that would seem to go against the spirit of what many expressed in
#43

@domenic
Copy link
Member

domenic commented Feb 17, 2019

Can someone summarize the issue? Extracting it from the comment thread is hard, and I'd like to first get on the same page as to whether there's actually an issue.

@bergus
Copy link

bergus commented Feb 17, 2019

@domenic: @zenparsing brought the example

// a-1.js
console.log('a-1');
// a-2.js
console.log('a-2');
// a.js
import './a-1.js';
import './a-2.js';
console.log('a');
// b.js
console.log('b');
// main.js
import './a.js';
import './b.js';
console.log('main');

with the current ES semantics this will result in

a-1
a-2
a
b
main

and we of course want to keep this ordering. However if we naively just add Promise.all to await all dependencies, we get to evaluate

const a1 = Promise.resolve()    .then(() => console.log('a-1'));
const a2 = Promise.resolve()    .then(() => console.log('a-2'));
const a  = Promise.all([a1, a2]).then(() => console.log('a'));
const b  = Promise.resolve()    .then(() => console.log('b'));
           Promise.all([a, b])  .then(() => console.log('main'));

and the ordering will become

a-1
a-2
b
a
main

which is undesirable.

@littledan
Copy link
Member Author

Right, @bergus summarizes it well. To sum it up, the microtask queue may get gummed up just working through all the microtasks from module evaluation itself, leading a mid-level module in a heavier subtree on the "left" side have its body evaluated after a lighter "right" subtree, even in the absence of top-level await.

@domenic
Copy link
Member

domenic commented Feb 17, 2019

Thanks.

I'm not sure this is a problem in HTML, because every script evaluation (including module scripts) there is done by wrapping it in various things, the end of which will run a microtask checkpoint if the stack is empty (which it would be in this case, I believe). Concretely, https://html.spec.whatwg.org/multipage/webappapis.html#run-a-module-script step 9.

Note that there are a lot of important setup and teardown steps there; in general it's important for any synchronous block of code to be run via the host so that the host can maintain its invariants in these sorts of ways.

So as long as we respect that existing infrastructure, this will not be an issue in the browser. I haven't checked if any of the various spec PRs open do or do not respect the existing infrastructure.

@domenic
Copy link
Member

domenic commented Feb 17, 2019

Note that this is also taken care of if all script work is queued via EnqueueJob(); see https://html.spec.whatwg.org/multipage/webappapis.html#enqueuejob(queuename,-job,-arguments) step 7.8. So e.g. if you queued separate evaluation jobs for a and then b, the queue would be completely emptied by the time the b job came around, and thus presumably 'a' would have been logged to the console.

@bergus
Copy link

bergus commented Feb 17, 2019

@littledan I just think it would be ugly when the module loading component would take control over the job queues, this doesn't look sound from an architectural point of view. But it's not my decision, you would have to discuss this with the TC editors.

@domenic Ah, good point that the HTML spec could solve this, but I'd rather have this behaviour mapped out in the ES spec alone, not relying on the host to run the job queue in the expected order.
However I don't understand what you mean by queuing script work via EnqueueJob. Script evaluation should not be a promise job, should it? And even if it was, those jobs are put on the microtask queue, and the "script clean up" step of the first de-queued job would then perform a microtask checkpoint which would run all the other jobs in the microtask queue - afaics not really a difference wrt. evaluation order from not doing cleanup and just continuing the dequeuing.

I'd rather see the weakly-deterministic-with-yielding strategy from your Determinism vs ASAP document, with dynamic import() featuring the same role as the opt-in second entry point for concurrent evaluation.

@littledan
Copy link
Member Author

@domenic If there's a way that we can handle this logic via HTML, that could be a good solution. I don't understand how the current HTML spec handles this, however.

Step 7.2 of the "run a module script" algorithm notes, the Evaluate algorithm "will recursively evaluate all of the module's dependencies." It's exactly this recursive evaluation that we might need to intersperse microtask queue checkpoints into, but I don't see where the recursion in the ES spec jumps back to this HTML algorithm.

I'm not sure what you're suggesting by EnqueueJob. The only jobs for module evaluation in #33 are Promise reaction jobs due to waiting on dependencies. This hits the HTML EnqueueJob algorithm, but I don't see how that leads to the ordering we want. Is there some other way we should enqueue additional jobs to get another ordering?

@domenic
Copy link
Member

domenic commented Feb 18, 2019

Step 7.2 of the "run a module script" algorithm notes, the Evaluate algorithm "will recursively evaluate all of the module's dependencies." It's exactly this recursive evaluation that we might need to intersperse microtask queue checkpoints into, but I don't see where the recursion in the ES spec jumps back to this HTML algorithm.

Currently, Evaluate() only executes one big sync chunk of code. Every sync chunk of code needs to be wrapped in setup/cleanup steps, so in the current setup, nothing more is needed.

What I was alluding to above, was that if whatever changes the top-level await proposal makes also respect this invariant---that every sync chunk of code gets wrapped in a setup/cleanup pair---then I think everything will work as expected.

Similarly, I was pointing out that if sync chunks of code are executed by using EnqueueJob, then you'd get this wrapping for free.

Does this help?

I might still be missing something, as I haven't immersed myself in the problem fully like you have. I am just pattern matching: we need to run microtask checkpoints after each chunk, and HTML has well-established mechanisms for ensuring microtask checkpoints are run. I'm not sure if it solves the problem fully, but I was hopeful. Thus, for example:

if you queued separate evaluation jobs for a and then b, the queue would be completely emptied by the time the b job came around, and thus presumably 'a' would have been logged to the console.

I'm not sure how that lines up with #33, having not reviewed it in detail.

@littledan
Copy link
Member Author

@domenic I see; this is an interesting frame. Maybe there's some way that we could give HTML a chance to clean up after launching a module load. My first shot at HTML integration with top-level await doesn't do that, but I'll think about host hooks that we might add for that purpose.

At the same time, given comments people are making elsewhere about job queue related topics, I imagine there would be interest in ensuring a cross-environment guarantee that this reordering doesn't occur.

Another solution to this problem is to not yield to the job queue when importing a module that does not have a top-level await, as I was suggesting in #43. I think that'd solve this problem as well.

littledan added a commit to littledan/proposal-top-level-await that referenced this issue Feb 22, 2019
littledan added a commit to littledan/proposal-top-level-await that referenced this issue Feb 22, 2019
littledan added a commit to littledan/proposal-top-level-await that referenced this issue Feb 22, 2019
@littledan
Copy link
Member Author

littledan commented Feb 27, 2019

Fundamentally, I see two approaches to solve this problem. Both of these approaches seem doable to me; I don't see fatal flaws with either, and I don't have an extremely strong preference. I'm wondering what you all think.

cc @GeorgNeis @domenic @MylesBorins @guybedford @mathiasbynens

@domenic
Copy link
Member

domenic commented Feb 27, 2019

  • HTML cleans up after JS all the time with microtask queue checkpoints, so this is not unusual at all.

To expand on this. Microtask checkpoints get run whenever HTML is done running "a block of script". For example, given two sync classic scripts

<script>
Promise.resolve().then(() => console.log("promise 1"));
console.log("script 1");
</script>
<script>
Promise.resolve().then(() => console.log("promise 2"));
console.log("script 2");
</script>

the result will be script 1 / promise 1 / script 2 / promise 2. (Try it: https://honeysuckle-ermine.glitch.me/. Note that iframe-based runners like jsbin will NOT show this effect, since the stack is not empty.)

So it makes perfect sense to me that we would flush it between modules as well.

Honestly, it is kind of an oversight that we haven't been doing that so far, I would say. We didn't do so because it was easier for the HTML spec to treat module graphs as atomic single-blocks-of-script. But I think that approach was against the spirit of microtask checkpoints, and is showing its cracks here.

@zenparsing
Copy link
Member

Curious: if you flush microtasks after executing a module body (leaving TLA aside) does that make the module evaluation algorithm reentrant?

@littledan
Copy link
Member Author

littledan commented Feb 27, 2019

@zenparsing Are you asking, is it possible for some other module evaluation algorithm to start running? I believe the answer is, with the HTML host environment, no. import() always calls back into the ES algorithms after queueing a task, see HTML's definition of HostImportModuleDynamically. I was thinking, we could make a tweak to the requirements for HostImportModuleDynamically to require that all host environments behave, fundamentally, like this, in some way or other.

@littledan
Copy link
Member Author

cc @jakearchibald

@jakearchibald
Copy link

The flushing model will change current behaviour right?

1.js

import './2.js';
console.log('1');

2.js

console.log('2');
Promise.resolve().then(() => console.log('2-promise'));

https://static-misc.glitch.me/module-microtask-test/

The above currently logs 2, 1, 2-promise, and we're talking about changing it to 2, 2-promise, 1? Is it ok to change something like that?

Fwiw, I tell devs that microtask checkpoints happen whenever the JS stack empties (plus a specific point in the event loop). I don't know how bulletproof that rule is supposed to be, but the current behaviour breaks that rule, whereas the flushing proposal 'fixes' it.

I guess I agree with @domenic that we should have been flushing the job queue between module evaluation all along.

@littledan
Copy link
Member Author

@jakearchibald Yes, I agree that it'd be an observable change; see previous discussion at #43 .

OK, if folks are pretty positive on #51, maybe we should settle on that.

@guybedford
Copy link
Collaborator

guybedford commented Feb 28, 2019

I've been going back and fourth on this, but after some consideration, I'm actually strongly against #51.

The reason for this is that bundling tools like Rollup combine modules together in a way that assumes execution between successive modules in execution order will not yield to the event loop execute microtasks.

If we execute microtasks, we will create observable differences between the built version of a library, and the original source version.

This one change will severely impact Rollup and similar JS optimiziation tools, because there is now a very different execution semantic in play for the build vs the original source.

Even before I saw this argument, I was leaning towards #49 as well because , just as @jakearchibald mentions, JS developers think very differently to standards here, because we only see the event loop trigger as when we reach the end of the our code.

I've yet to fully understand the arguments against #49, but I hope these new arguments can be considered here.

@guybedford
Copy link
Collaborator

(corrected usage of "event loop" - see I'm still thinking like an early Node.js dev :P)

@zenparsing
Copy link
Member

I'm trying to better understand the browser semantics at play here. Let's say I do this:

var script = document.createElement('script')
script.innerText = `
  console.log('script-a'); 
  Promise.resolve().then(() => console.log('script-promise'));
  console.log('script-b');
`;
(
  console.log('outside-a'), 
  document.head.appendChild(script),
  console.log('outside-b')
);

I get

outside-a
script-a
script-b
outside-b
script-promise

Is that correct?

@littledan
Copy link
Member Author

@zenparsing See clean up after running script:

If the JavaScript execution context stack is now empty, perform a microtask checkpoint. (If this runs scripts, these algorithms will be invoked reentrantly.)

Because there is JS code running, the microtask queue checkpoint doesn't happen in your example.

@domenic
Copy link
Member

domenic commented Feb 28, 2019

@guybedford we're likely to end up yielding to the event loop anyway, regardless of top-level await, for the performance reasons discussed in this document. At least in browsers. Startup jank is an increasing problem and one modules are well-positioned to solve.

JS developers think very differently to standards here, because we only see the event loop trigger as when we reach the end of the our code.

I read @jakearchibald's point the opposite way, that JS developers expect reaching the end of a module script to behave the same as reaching the end of a sync script, and yield. I guess it depends on what you mean by "our code", but I interpreted that to be a single module, not an entire graph of modules.

@zenparsing
Copy link
Member

Thanks @littledan. I wonder how this might play out for a hypothetical JS environment which allows for more direct access to module compilation and evaluation.

Strawman example:

const module = System.compileModule(`
  export const a = 1;
`);
let moduleGraph;
// Populate moduleGraph
System.linkAndEvaluate(moduleGraph);

Now we would presumably have JS on the call stack below module evaluation. Would we still flush jobs after every module evaluation in that graph?

@guybedford
Copy link
Collaborator

@guybedford we're likely to end up yielding to the event loop anyway, regardless of top-level await, for the performance reasons discussed in this document

Fascinating. I agree this is a huge problem today, and that would be a huge solution. If this is the case why not go all the way then and have this proposal yield the event loop for all modules so we can adopt evenly amongst platforms? It would be a win for all.
.

@domenic
Copy link
Member

domenic commented Feb 28, 2019

I've filed an issue to discuss the more general yielding-to-the-event-loop question at whatwg/html#4400. BTW if anyone wants to be added to the @whatwg/modules team, ping me on IRC or something. (Or maybe https://github.com/orgs/whatwg/teams/modules has a "ask to join" button? I dunno.)

@littledan
Copy link
Member Author

@zenparsing I'd suggest that we include requirements in the specification that the module loading/execution algorithm not be used re-entrantly like that. I think some aspects of it may already be broken in such an embedding environment (e.g., the way errors are propagated).

@littledan
Copy link
Member Author

littledan commented Feb 28, 2019

OK, between the consistency in terms of using promises the same way all the time (as strongly encouraged by TC39) and the potential of reducing jank on page load, I think we have sufficient motivation to land #51 for now, with the hope that we'll be able to work through some further details in whatwg/html#4400 and whatwg/html#4352 . If we encounter issues with this approach, we can revisit #49.

@zenparsing
Copy link
Member

@littledan I'm sure you can appreciate that I don't find that answer particularly satisfying. 😉

@zenparsing
Copy link
Member

I described the example above as a "hypothetical" environment, but in fact you can already do something very similar in node with the --experimental-modules and --experimental-vm-modules flags.

Node VM docs

When you call module.evaluate() for a vm.SourceTextModule object, it runs evaluation synchronously, so in that case you do have JS on the call stack. Deeper still, in node the ES module loader itself is written in JS.

Do you think we can support those scenarios and also flush the promise job queue in the middle of evaluation?

@littledan
Copy link
Member Author

littledan commented Mar 1, 2019

OK, between #51 (comment) and #47 (comment) , sounds like I misunderstood the concerns here, which are concretely about compatibility with the web and Node.js. I will revert #51 for now and just land some parts of it that were common with #49 and no one raised concerns about.

I have to spend some time rethinking how these algorithms would work if invoked reentrantly. I was just looking at HTML and hoping that other environments would also refrain from this case.

To answer your question, no, I don't think we can re-entrantly flush the job queue/perform a microtask checkpoint (or re-entrantly queue a task and continue the following steps from the task).

For Node's module integration, I am wondering, do they need to be synchronous, or could they return a Promise?

@littledan littledan reopened this Mar 1, 2019
littledan added a commit to littledan/proposal-top-level-await that referenced this issue Mar 1, 2019
littledan added a commit to littledan/proposal-top-level-await that referenced this issue Mar 1, 2019
@littledan
Copy link
Member Author

littledan commented Mar 1, 2019

Apologies for the noise; let's keep discussing here. You can find the two (rebased) alternatives at #57 (which is the new #51) and #49.

@domenic
Copy link
Member

domenic commented Mar 1, 2019

Sad to hear the hard work here was reverted. I remain confident the best path for landing top-level await in browsers is via a microtask (or perhaps full event loop) flush.

As for the question about whether Node should allow sync module script execution, I strongly hope they do not. But if they do, they are welcome to either run the microtask checkpoint or not, as this is a host hook. For example, one could argue that vms are not like iframes, and so in Node, they might count "the topmost thing on the stack is the VM running script" as an "empty stack" for the purposes of performing a microtask checkpoint.

@zenparsing
Copy link
Member

For Node's module integration, I am wondering, do they need to be synchronous, or could they return a Promise?

Node exposes a thin wrapper (ModuleWrap) around v8 Module objects and this wrapper is used by the vm module and the internal ESM module loader. So, currently, both in vm and the loader, JS is on the stack when evaluation occurs.

If there is JS on the call stack (even "internal" node JS) then flushing the job queue will mess with our run-to-completion semantics.

It may be possible to modify node's internals such that a (non-JS) job to call "evaluate" is enqueued and then evaluate is only called directly from c++, but I'm not sure.

@guybedford what are your thoughts?

@littledan
Copy link
Member Author

Talking with @MylesBorins , it sounded like that API was experimental and due to be rethought and possibly replaced, which is why it's behind multiple flags. When that happens, I hope it's possible to consider making the API not synchronous, but instead return a Promise and wait an event loop tick before doing work. But, in particular, it doesn't sound like this is a concern about maintaining compatibility with the current Node ecosystem, or a new API that's about to ship.

Stepping back, about alignment between Node, tools, browsers, and other environments: I really want to find a common solution between all JavaScript implementations and environments for this problem if we can. I'm not sure if they are all in scope with respect to whatwg/html#4400 , but it's clear to me that there's broad interest in TC39 in helping unify things. So, @zenparsing , thank you for bringing background about Node concerns, and thank you to @guybedford , @Rich-Harris, @lukastaegert for bringing the context about bundlers.

@littledan
Copy link
Member Author

Another alternative to avoid reordering is to differentiate loading modules which might contain await with an import await statement, as proposed in #60 . Then, ordinary import statements are always synchronous and never reordered. However, significant concerns are raised in that thread.

littledan added a commit to littledan/proposal-top-level-await that referenced this issue Mar 19, 2019
This patch is a variant on tc39#49 which determines which module subgraphs
are to be executed synchronously based on syntax (whether the module
contains a top-level await syntactically) and the dependency graph
(whether it imports a module which contains a top-level await,
recursively). This fixed check is designed to be more predictable and
analyzable.

Abstract module record changes:
- The [[Async]] field stores whether this module or dependencies are
  async. It is expected to be initialized by the Linking phase.
- The [[ExecutionPromise]] field stores the Promise related to the
  evaluation of a module whose [[Async]] field is *true*.
- Evaluate() returns a Promise for [[Async]] modules, a completion
  record for sync modules which throw, or undefined otherwise.

Cyclic Module Record changes:
- A new [[ModuleAsync]] field stores whether this particular module
  is asynchronous (dependencies aside).
- The ExecuteModule method on Cyclic Module Records takes an
  argument for the Promise capability, but only if that particular
  module is [[ModuleAsync]].
- The Link/Instantiate phase is used to propagate the [[Async]]
  field up the module graph to dependencies.
- When there's a cycle, with some modules sync and some async, the
  whole cycle is considered async, with the Promise of each module
  set to the entrypoint of the cycle, although the cycle-closing
  edge will not actually be awaited (since this would be a deadlock).

Source Text Module Record changes:
- The check for whether a module contains a top-level await locally
  is in a ContainsAwait algorithm (TBD writing this out, but it
  should be static since await may not appear in a direct eval)
- Module execution works as before if ContainsAwait is false, and
  works like an async function if ContainsAwait is true.

Closes tc39#47, tc39#48, tc39#43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants