Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detecting an empty vat promise queue (end of "crank") #45

Closed
dckc opened this issue Aug 28, 2019 · 27 comments · Fixed by #2225
Closed

detecting an empty vat promise queue (end of "crank") #45

dckc opened this issue Aug 28, 2019 · 27 comments · Fixed by #2225
Assignees
Labels
SwingSet package: SwingSet

Comments

@dckc
Copy link
Member

dckc commented Aug 28, 2019

This test seems to be failing on xs (now that my tape work-alike is actually working):

https://github.com/Agoric/SwingSet/blob/7cd651107a31ba84d9b9f11610d72b5477035fd9/test/test-queue-priority.js#L5

{"actual":[1,2,3,5,6,4],"expected":[1,2,3,4,5,6]}

What's the justification for that test? Is the queue priority specified?

In what way does SwingSet rely on the queue priority?

cc @warner @phoddie

@dtribble
Copy link
Member

JavaScript specifies that the promise queue is higher priority than the timer queue. As a result, JS is guaranteed that it will not take outside events until the promise queue is empty.

@phoddie
Copy link

phoddie commented Aug 28, 2019

@dtribble, you note:

JavaScript specifies that the promise queue is higher priority than the timer queue

Since timers are not defined by JavaScript, I assume you are referring more generally to events delivered by the host using a mechanism other than promises (e.g. timer callbacks). Some non-definitive web searching appears to confirm that this may be the case in browsers. However, I was unable to find text in ECMA-262 that defines this behavior. That said ECMA-262 is huge, so I may well have overlooked that. This morning I only focused on Jobs and Job Queues section.

If you wouldn't mind, would you help me with the normative reference for this behavior? Thank you!

@erights
Copy link
Member

erights commented Aug 28, 2019

Attn @littledan What's the status of this?

@littledan
Copy link

@erights I think this PR is ready to land, but there may be an editorial alternative from @jmdyck coming soon.

@erights
Copy link
Member

erights commented Sep 2, 2019

Link to PR please?

@littledan
Copy link

tc39/ecma262#1597

@erights
Copy link
Member

erights commented Sep 2, 2019

Yes, that's what I was looking for. Thanks.

@dckc
Copy link
Member Author

dckc commented Sep 28, 2019

@littledan writes Sep 19:

We already have consensus on this layering change at TC39 ...

@phoddie that puts the ball in your court, yes? Would it help for me to open an issue in the Moddable repo?

Has anyone added test262 tests for this? At what point in the process does that usually happen? In a few recent cases I was a little concerned that fixes went into the Moddable repo without corresponding tests, but then it occurred to me that the tests should probably go into test262.

@phoddie
Copy link

phoddie commented Sep 29, 2019

When the specification in ECMA-262 settles down, we can look at revised behavior. Making the above example work as expected is straightforward. However, I expect there are more subtle behaviors implied by the changes under discussion. Good coverage in test262 would help make the intentions clear.

Finding an efficient implementation for what I understand the behavior to be looks to be non-trivial on microcontrollers where watch-dogs trigger if a single turn of the event loop runs for very long.

@erights
Copy link
Member

erights commented Sep 29, 2019

What happens when these watch-dogs bark? Where can I read about this? I worry about the same consistency issue I explain at https://github.com/Agoric/proposal-oom-fails-fast

@phoddie
Copy link

phoddie commented Sep 29, 2019

Ha. Sorry for not explaining more. Let me try.

The watchdogs detect software hangs - infinite loops, deadlocks, etc. When they trigger, the usual action is to reboot the microcontroller. These are used to ensure an embedded device doesn't become unresponsive as a result of a bug. The threshold can be quite low - 100ms. There is considerable variation in the details across microcontrollers, and often there are many configuration options. The ESP32 watchdog is reasonably representative.

Unlike memory exhaustion, there's no consistency issue, as the watchdog reboots the entire device.

My concern is implementing the behavior under consideration for run loops where a promise resolves a promise that resolves another promise in a long chain. Each step may be short enough to avoid the watchdog, but in total they are not. To avoid that, it appears necessary to exit the run loop before servicing the all promises and resume on the next turn without allow timers/etc to execute. That's possible.

An easier work around is to "feed" the watchdog on each promise, effectively tell it "we didn't hang, we're just busy, so reset your timeout". But, there are problems with that. If the RTOS has work it only does when the event loop returns control to it, that would be deferred indefinitely. On an ESP8266, for example, the network stack runs at that time. Also, in my experience, feeding the watchdog has a bad habit of introducing subtle bugs that causing the watchdog to then fail to detect real issues.

Anyway... I don't think there's an issue here from a JavaScript or SES perspective. I just believe there are implementation concerns that will take some care to arrive at a result that is both correct and efficient on the embedded device XS targets.

@littledan
Copy link

Sorry, to clarify, the layering change does not say that Promises are higher priority than other jobs. That continues to be defined by the host environment. IMO it would be nice for the ecosystem to converge on this kind of scheduling where possible, but I understand if that's not appropriate for all environments.

@phoddie
Copy link

phoddie commented Sep 29, 2019

Thanks, @littledan. We're holding off on any implementation until the dust settles. To be clear, I'm not suggesting that we can't support the various event loop rules being discussed, only that there are some interesting challenges to implement them well in our target environments.

@warner warner transferred this issue from Agoric/SwingSet Dec 1, 2019
@warner warner added the SwingSet package: SwingSet label Dec 1, 2019
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
R4R: Json and Stringer structs in querier instead of CLI
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
* update outdated packages

* update ertp to 0.0.9
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
This rewrites most everything: the way objects+promises are referenced (both
in vats and the kernel), the way messages manage their result promises, the
kernel-side run-queue, the kernel-side promise tables, the way comms messages
are formatted, the entire comms layer, and all the comms tests. The kernel
now supports pipelining sends all the way into the deciding vat, and the
comms vat will pipeling those sends to the remote deciding machine.

It does not yet implement "forwarding" (replacing one promise with a
different one), nor do promise references get deleted after their promise has
been resolved.

The comms vat must be created with the `enablePipelining` option set to
`true` to get the pipelining behavior.

closes Agoric#88
closes Agoric#34
closes Agoric#79
closes Agoric#45
@dckc
Copy link
Member Author

dckc commented Dec 6, 2019

preempting queued callbacks looks tricky with glib

I found that the xs lin platform uses the glib main loop API. At first, this made me think addressing this issue is a matter of specifying G_PRIORITY_HIGH when queuing promise jobs.

But g_main_context_iteration () works like this:

Runs a single iteration for the given main loop. This involves checking to see if any event sources are ready to be processed, then if no events sources are ready and may_block is TRUE, waiting for a source to become ready, then dispatching the highest priority events sources that are ready.

Recall the test code:

  const log = [];
  setImmediate(() => log.push(1));
  setImmediate(() => {
    log.push(2);
    Promise.resolve().then(() => log.push(4));
    log.push(3);
  });
  setImmediate(() => log.push(5));
  setImmediate(() => log.push(6));

  t.deepEqual(log, [1, 2, 3, 4, 5, 6]);

By the time the push(4) is queued, glib has already prepared to dispatch 5 and 6 and using G_PRIORITY_HIGH doesn't preempt that.

I was going to say this can't be done with glib, but they do break down g_main_context_iteration in such a way that parts of it can be replaced:

Customizing the main loop iteration
Single iterations of a GMainContext can be run with g_main_context_iteration(). In some cases, more detailed control of exactly how the details of the main loop work is desired, for instance, when integrating the GMainLoop with an external main loop. In such cases, you can call the component functions of g_main_context_iteration() directly. These functions are g_main_context_prepare(), g_main_context_query(), g_main_context_check() and g_main_context_dispatch().

So we could make a customized version of g_main_context_dispatch that handles this sort of preemption. But it's around 100 lines of pretty tricky looking code.

@michaelfig
Copy link
Member

michaelfig commented Dec 6, 2019

Can we do something like:

const waitForPromises = inXS
  ? cb => setTimeout(cb, 1) // or other XS primitive?
  : cb => setImmediate(cb);

because AFAICT, setImmediate is only used by SwingSet to detect when no more promises will run. Or maybe some other primitive to execute only when the promise queue is done, not necessarily futzing with the relative priorities of the existing methods?

@dckc dckc changed the title queue priority specified? relied on? detecting an empty vat promise queue (end of "crank") Dec 7, 2019
@dckc
Copy link
Member Author

dckc commented Dec 7, 2019

yes, @michaelfig , something like that. @warner and I chatted about this and now I think I understand the requirements better. It need not involve setTimeout at all. I updated the issue title as such.

@phoddie
Copy link

phoddie commented Dec 7, 2019 via email

@dckc
Copy link
Member Author

dckc commented Dec 7, 2019

Thanks, @phoddie . Something like that may very well work.

But the core of the issue is not really priority of promises vs. host callbacks. It's about detecting when the promise queue of a "vat" under the control of a "kernel" is empty. The "kernel" was using setTimeout() to queue something that would only come back after the "vat" was done with all the promises it wanted to queue for itself.

@phoddie
Copy link

phoddie commented Dec 7, 2019 via email

@dckc
Copy link
Member Author

dckc commented Dec 7, 2019

Browsers presumably work like node. The original test passes in chrome (with const setImmediate = f => setTimeout(f, 0)).

@phoddie
Copy link

phoddie commented Dec 7, 2019 via email

@dckc
Copy link
Member Author

dckc commented Dec 7, 2019

Yeah; I put "vat" and "kernel" in quotes because they're somewhat local terms of art.

There are some docs in https://github.com/Agoric/agoric-sdk/tree/master/packages/SwingSet/docs but I'm not sure even those are up to date.

No need to apologize; I think you may well have helped. But standing aside is fine too.

@dckc
Copy link
Member Author

dckc commented Jan 19, 2020

On xs, I'm getting:

# Exception: replay: cannot coerce undefined to object!

I traced it to empty playbackSyscalls at s = playbackSyscalls.shift():

function replay(name, ...args) {
const s = playbackSyscalls.shift();
if (djson.stringify(s.d) !== djson.stringify([name, ...args])) {

Is it possible this queue priority stuff is to blame?

@dckc
Copy link
Member Author

dckc commented Jan 15, 2021

fixed in #2194

@dckc dckc closed this as completed Jan 15, 2021
@dckc
Copy link
Member Author

dckc commented Jan 18, 2021

double-check that the setImmediate queue is empty too in xsnap
(from discussion with @warner)

@dckc dckc reopened this Jan 18, 2021
@dckc
Copy link
Member Author

dckc commented Jan 18, 2021

Oh yeah... we turned off setImmediate in the current xsnap. I wonder if/how liveslots is going to work without it. I'll press on and see...

@dckc
Copy link
Member Author

dckc commented Jan 23, 2021

In #2225 (comment) @kriskowal writes:

I remain weary that waitUntilQuiescent rests on the same slippery slope of ever deepening nested loops and priority queues as Node.js and the web. I fear that this creates a cottage industry for people who bear the deep knowledge of the hierarchy of micro-tasks and promise resolution, IO callbacks, timers, ticks, setImmediate, mutation observers, message passing, and now the notion of quiescence.

One can legitimately lose their mind sorting them out https://github.com/kriskowal/asap

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants