-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overhaul vat-timer: virtualized, durable, upgradable, Clock, TimerBrand #5847
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments so far. #5862 is also a review comment.
I have not yet dug into the meat of this. Looking forward to doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just more comments so far. Not done reviewing yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There are still things I'd like to understand better. But we can discuss them after this is merged (and once I get back on Wednesday).
Questions for reviewers:
There is a general problem of how to know when "background" async work has finished, before it is safe to either check for consequences, or to initiate further async activity that effectively polls mutable state that the background work might or might not have finished modifying. The old Zoe This is not something the I have not yet changed the TimerService API to use Brand-bearing timestamps, but the cutover will be just a few lines in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor proofreading comments.
I'm done with vat-timer and its primary tests, but have a bunch of other code to review still.
That reasoning sounds right to me.
When the throw or rejection is not indicating that something bad happened, not indicating that a possible bug symptom may have just occurred, then I prefer that the thrown thing or the rejection reason not be an error, for all the reasons you state above. I like your suggestion that it should be strongly analogous, while testably distinct, from the rejection reasons with which we indicate upgrade.
This seems right.
In the scenario you mention, "...
Sure. Goldens are fragile, and the notifier spec is inherently non-deterministic.
I'm not following this yet.
Good. There's no need to do that in this same PR. |
It looks like 1: zoe's |
Today's discussion pointed out some changes that I need to make:
if (timeStamp <= latestTick) {
return resolve(priceInQuote(amountIn, brandOut, latestTick));
}
// else push `resolve` into a map that is checked within the `startTicker` handler We walked through the simplest I see calls to I see |
That commit (ce2347b) implements the Notifier/Iterator changes. For clarity: the "finish state" of a cancelled TimerNotifier has a I'm still working on the |
I think I understand the fencepost error. In the old version (on current trunk):
The main problem was that
So the test was asking for the price as of time 3n, but that was a bit ambiguous as to whether it wanted the price before or after the In the new version:
As a result, when the I hit a similar problem in Most of the other tests are working now that I've completed the mechanical changes of the |
I walked through |
0551e25
to
d955681
Compare
In Agoric/agoric-sdk#5847 , Zoe's `tools/manualTimer.js` is updated to align with the new SwingSet TimerService. In this new version, when the ManualTimer is configured with `eventLoopIteration`, an `await tick()` will properly drain the promise queue before proceeding. In addition, the TimerService will fire a `setWakeup(now)` immediately, rather than requiring a `tick()` first. This changes the behavior of tests which were not correctly synchronizing before, especially the timer-based fakePriceAuthority. Where previously you could set a price sequence of `[20, 55]` and use a single `await tick()` to get to `time=1` and `price=20`, in the new version, you use no ticks, and start out with `time=0` and `price=20`. A single `await tick()` gets you `time=1` and `price=55`. This requires changes to the unit tests, in general removing one `tick()` and decrementing the expected timestamp by one.
54d56f4
to
b34946c
Compare
9477a0c
to
2bbc9f3
Compare
I re-rebased because some zoe durability changes just landed.. wanted to make sure they didn't conflict. Sorry for the churn. |
@warner, this new code should be converted to arrow function style. WebStorm has good support for converting. Would you like me to push a commit that fixes all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
The main thing it needs is conversion of the function
s to arrow-style (which I'll happily help with).
I have a couple of question/suggestions; nearly all on comments, though one is on the eventLoopIteration()
added back in test-vaultFactory.js.
Yeah, I guess so. I find the arrow style less readable, so I haven't been using it in kernel code, but I'd like to see how it looks, and maybe that will get me over it. Let's convert everything in |
I pushed b6d8ab6
I agree about readability, but there were other considerations that we decided mattered even more than readability. |
@FUDCo and I walked through this code today, so I think I have a virtual r+ from him. We identified a couple of things to be cleaned up, and I did some refactoring to save a few duplicate |
In Agoric/agoric-sdk#5847 , Zoe's `tools/manualTimer.js` is updated to align with the new SwingSet TimerService. In this new version, when the ManualTimer is configured with `eventLoopIteration`, an `await tick()` will properly drain the promise queue before proceeding. In addition, the TimerService will fire a `setWakeup(now)` immediately, rather than requiring a `tick()` first. This changes the behavior of tests which were not correctly synchronizing before, especially the timer-based fakePriceAuthority. Where previously you could set a price sequence of `[20, 55]` and use a single `await tick()` to get to `time=1` and `price=20`, in the new version, you use no ticks, and start out with `time=0` and `price=20`. A single `await tick()` gets you `time=1` and `price=55`. This requires changes to the unit tests, in general removing one `tick()` and decrementing the expected timestamp by one.
3201368
to
c9f4a17
Compare
vat-timer is now fully virtualized, durablized, and upgradeable. RAM usage should be O(N) in the number of: * pending Promise wakeups (`wakeAt`, `delay`) * active Notifier promises (`makeNotifier`) * active Iterator promises (`makeNotifier()[Symbol.asyncIterator]`) Pending promises will be disconnected (rejected) during upgrade, as usual. All handlers and Promises will fire with the most recent timestamp available, which (under load) may be somewhat later than the scheduled wakeup time. Until cancellation, Notifiers will always report a scheduled time (i.e. `start` plus some multiple of the interval). The opaque `updateCount` used in Notifier updates is a counter starting from 1n. When a Notifier is cancelled, the final/"finish" value is the timestamp of cancellation, which may or may not be a multiple of the interval (and might be a duplicate of the last non-final value). Once in the cancelled state, `getUpdateSince(anything)` yields `{ value: cancellationTimestamp, updateCount: undefined }`, and the corresponding `iterator.next()` resolves to `{ value: cancellationTimestamp, done: true }`. Neither will ever reject their Promises (except due to upgrade). Asking for a wakeup in the past or present will fire immediately. Most API calls will accept an arbitrary Far object as a CancelToken, which can be used to cancel the wakeup/repeater. `makeRepeater` is the exception. This does not change the device-timer API or implementation, however vat-timer now only uses a single device-side wakeup, and only exposes a single handler object, to minimize the memory usage and object retention by the device (since devices do not participate in GC). This introduces a `Clock` which can return time values without also providing scheduling authority, and a `TimerBrand` which can validate time values without providing clock or scheduling authority. Timestamps are not yet Branded, but the scaffolding is in place. `packages/SwingSet/tools/manual-timer.js` offers a manually-driven timer service, which can help with unit tests. closes #4282 refs #4286 closes #4296 closes #5616 closes #5668 closes #5709 refs #5798
The Zoe package provides a manually-driven timer service, for use by unit tests in both zoe and other contract-centric packages. To minimize API drift, this `buildManualTimer` is now a wrapper around the one provided by SwingSet. The wrapper continues to provide the same `tick()` and `tickN()` controls, but there are a few differences: * The signature is now `buildManualTimer(log, startValue, options)`, and the previous `timeStep` positional argument now goes in the options bag. * `options.eventLoopIteration`: To help unit tests defer looking for timer consequences until after any triggered activity has settled, `tick()` returns a Promise. Previously this waited for the return promise of the user-provided `wake()` handlers, but that was never completely reliable: if the handler used Promises and `.then` to trigger more work, but did not `await` or otherwise couple its return Promise with that work, it might fire while callbacks were still ready on the event queue, leading to a race between the test code doing `await tick()` (before polling for state changes) and the callbacks that changed that state. In this version, building the manualTimer with the function from `zoe/tools/eventLoopIteration.js` should reliably flush the promise queue before `tick()`'s promise fires. * If `options.eventLoopIteration` is *not* provided, `tick()` does not wait at all. Various tests using explicit `eventLoopIteration()` calls were changed to configure manualTimer instead. * Some of the logged messages have been removed (`&& running a task`), because they could not made to work reliably. * The new timer service may fire events in slightly different orders than before, so some test expectations were updated. * The `updateCount` provided by the TimerNotifier has changed (previously it was a counter, now it is a timestamp). This is opaque to callers, so nothing should be inspecting it, but a few tests were assuming it would increment sequentially.
The "priceAuthority quoteAtTime" test was designed to exercise the API that waits for a particular time to arrive, then emits a price quote for that timestamp. Previously, the fakePriceAuthority used two uncoordinated timer calls. The authority used `makeRepeater()` to periodically update `currentPriceIndex` and `latestTick`. And the `quoteAtTime()` API method used its own `setWakeup()`, to wait for a time to arrive, then sample `currentPriceIndex` to build the quote. The relative firing order of these nominally-simultaneous wakeup events is not specified by the TimerService API, and apparently changed in the new implementation. (Now that Zoe's manualTimer is built on top of the real vat-timer implementation, the opportunity for future divergence should be reduced). With the old implementation, the `quoteAtTime(3n)` alarm was firing first. It sampled `currentPriceIndex` before the `makeRepeater` wakeup could change it. So it built a quote from the price for `time=2n`, where `currentPriceIndex = 1` (i.e. price=55) and a timestamp of 3n. The new implementation of fakePriceAuthority removes the uncoordinated `setWakeup`. When `quoteAtTime` is given a time in the future, it adds the timestamp and a `resolve` callback to a list named `timeClients`. Then the single `makeRepeater` handler checks this list and fires anything no longer in the future, *after* it has updated `currentPriceIndex`. In addition, this implementation starts its `makeRepeater()` at time 1, not time 0. The new manualTimer responds to `delay=0` by firing the repeater immediately, whereas the old one would only fire during a `tick()` call. This caused the new implementation to increment `currentPriceIndex` one time too many, as it was counting wakeups, not the changes to the timestamp received by the wakeups. As an extra defense, the `startTicker` handler was changed to ignore a wakeup with `time === 0`. When run against the new implementation, the unit test sees the price=55 happen at time `2n`, not `3n`, which I think was the original intent. A similar issue caused `test-callSpread.js` to need a different option expiration time.
The test-vaultFactory.js "price falls precipitously" test exercises the asset price changing in four steps: * initial conditions: t=0, price=2200 * tick(): t=1, price=19180 * tick(): t=2, price=1650 * tick(): t=3, price=150 A loan is taken out at t=0. The drop to price=1650 is not quite enough to trigger liquidation. The drop to price=150 does cause liquidation, moreover the price is so low that it falls underwater entirely, so all of the collateral is sold, the client gets nothing back, and the vault manager must tap the reserves to avoid insolvency. Previously, this test used *four* calls to `tick()`, asserting that liquidation did not happen after any of the first three. It appears that this only passed because the `await tick()` was unable to completely wait for all triggered activity to complete (it merely waited on the `wake()` result promise, and did not do a full `setImmediate` / `eventLoopIteration`). When I inserted `await eventLoopIteration()` calls after `tick()` in the original version, the test failed, as liquidation was happening (and completing) after the *third* `tick()`. Now that our `manualTimer.tick()` can be configured to completely flush the promise queue, I'm removing the extra `tick()` call, and the "liquidation has not happened" assertion that was being made too early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go fer it.
In Agoric/agoric-sdk#5847 , Zoe's `tools/manualTimer.js` is updated to align with the new SwingSet TimerService. In this new version, when the ManualTimer is configured with `eventLoopIteration`, an `await tick()` will properly drain the promise queue before proceeding. In addition, the TimerService will fire a `setWakeup(now)` immediately, rather than requiring a `tick()` first. This changes the behavior of tests which were not correctly synchronizing before, especially the timer-based fakePriceAuthority. Where previously you could set a price sequence of `[20, 55]` and use a single `await tick()` to get to `time=1` and `price=20`, in the new version, you use no ticks, and start out with `time=0` and `price=20`. A single `await tick()` gets you `time=1` and `price=55`. This requires changes to the unit tests, in general removing one `tick()` and decrementing the expected timestamp by one.
const buildManualTimer = (log = nolog, startValue = 0n, options = {}) => { | ||
const { | ||
timeStep = 1n, | ||
eventLoopIteration = () => 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I looked over the timer PR test change I didn’t see anything surprising except that eventLoopIteration has to be passed in as an arg. I remember skimming it and thinking “okay” but I just took another look and I’m confused why it has to be passed in as a function instead of a flag. E.g. iterateEventLoop: boolean = false
.
@warner what motivates receiving an arbitrary function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a handful of "swingset" unit tests (as opposed to "unit" unit tests) which use the manual timer too, which means they import manualTimer.js
into code that runs inside a real vat. And setImmediate
is not available to vat code (the kernel doesn't populate it in the Compartments that houses the vat code) to keep vats from retaining agency after claiming their crank/delivery has finished. So manualTimer.js
can't provide that functionality by itself.
We experimented with a version that used if (globalThis.setImmediate)
to try and work with both, but it ended up being unsatisfying, and it wasn't clear that all tests would want (or could even tolerate) the automatic wait-for-flush behavior all the time, so it seemed better to give tests a choice.
That might be worth revisiting, now that we figured out the other test failures (which turned out to be the result of different assumptions about timers that are born ready, and races between "simultaneous" events).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I'll humbly ask that we make the common case simpler (without real vat) even if it makes the utility more complex. Consider:
- add default option
iterateEventLoop = false
- make
eventLoopIteration
option default to undefined - when
iterateEventLoop
is true, manualTimer does that automatically if it can - if it can't, it uses
eventLoopIteration
option and throws an error if it's not set, informing the dev that with a real vat it must be passed in
vat-timer is now fully virtualized, durablized, and upgradeable. RAM
usage should be O(N) in the number of:
wakeAt
,delay
)makeNotifier
)makeNotifier()[Symbol.asyncIterator]
)Pending promises will be disconnected (rejected) during upgrade, as
usual.
All handlers and Promises will fire with the most recent timestamp
available, which (under load) may be somewhat later than the scheduled
wakeup time.
Until cancellation, Notifiers will always report a scheduled time
(i.e.
start
plus some multiple of the interval). The opaqueupdateCount
used in Notifier updates is a counter starting from 1n.When a Notifier is cancelled, the final/"finish" value is the
timestamp of cancellation, which may or may not be a multiple of the
interval (and might be a duplicate of the last non-final value). Once
in the cancelled state,
getUpdateSince(anything)
yields{ value: cancellationTimestamp, updateCount: undefined }
, and thecorresponding
iterator.next()
resolves to{ value: cancellationTimestamp, done: true }
. Neither will ever reject theirPromises (except due to upgrade).
Asking for a wakeup in the past or present will fire immediately.
Most API calls will accept an arbitrary Far object as a CancelToken,
which can be used to cancel the wakeup/repeater.
makeRepeater
is theexception.
This does not change the device-timer API or implementation, however
vat-timer now only uses a single device-side wakeup, and only exposes
a single handler object, to minimize the memory usage and object
retention by the device (since devices do not participate in GC).
This introduces a
Clock
which can return time values without alsoproviding scheduling authority, and a
TimerBrand
which can validatetime values without providing clock or scheduling
authority. Timestamps are not yet Branded, but the scaffolding is in
place.
packages/SwingSet/tools/manual-timer.js
offers a manually-driventimer service, which can help with unit tests.
closes #4282
refs #4286
closes #4296
closes #5616
closes #5668
closes #5709
refs #5798