overhaul vat-timer: virtualized, durable, upgradable, Clock, TimerBrand #5847

Merged
merged 4 commits into from
Aug 19, 2022

@warner (Member) commented Jul 28, 2022

vat-timer is now fully virtualized, durablized, and upgradeable. RAM
usage should be O(N) in the number of:

  • pending Promise wakeups (wakeAt, delay)
  • active Notifier promises (makeNotifier)
  • active Iterator promises (makeNotifier()[Symbol.asyncIterator])

Pending promises will be disconnected (rejected) during upgrade, as
usual.

All handlers and Promises will fire with the most recent timestamp
available, which (under load) may be somewhat later than the scheduled
wakeup time.

Until cancellation, Notifiers will always report a scheduled time
(i.e. start plus some multiple of the interval). The opaque
updateCount used in Notifier updates is a counter starting from 1n.
When a Notifier is cancelled, the final/"finish" value is the
timestamp of cancellation, which may or may not be a multiple of the
interval (and might be a duplicate of the last non-final value). Once
in the cancelled state, getUpdateSince(anything) yields { value: cancellationTimestamp, updateCount: undefined }, and the
corresponding iterator.next() resolves to { value: cancellationTimestamp, done: true }. Neither will ever reject their
Promises (except due to upgrade).
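A minimal pure model of the Notifier rules above may help make them concrete. This is a sketch, not vat-timer's code: `notifierUpdate`, `start` (the time of the first scheduled event), and `cancelledAt` are hypothetical names, timestamps are assumed to be plain BigInts, and the part of `getUpdateSince()` that waits for the next event is omitted.

```javascript
// Hypothetical model of a TimerNotifier's latest update record.
// start:       time of the first scheduled event (BigInt)
// interval:    time between events (BigInt, > 0n)
// now:         current time (BigInt)
// cancelledAt: cancellation time, or undefined while still active
const notifierUpdate = (start, interval, now, cancelledAt = undefined) => {
  if (cancelledAt !== undefined) {
    // Finish state: report the cancellation time, with no updateCount.
    return { value: cancelledAt, updateCount: undefined };
  }
  if (now < start) {
    // No event has happened yet, so there is nothing to report.
    return undefined;
  }
  // BigInt division truncates, which floors for non-negative operands.
  const eventsSoFar = (now - start) / interval;
  return {
    value: start + eventsSoFar * interval, // always a scheduled time
    updateCount: eventsSoFar + 1n, // 1n for the first event, never falsy
  };
};
```

For example, with `start = 10n` and `interval = 5n`, a query at `17n` reports the scheduled time `15n` with `updateCount: 2n`, while a query after cancellation at `17n` reports `17n` with no `updateCount`. Starting the count at `1n` keeps a live `updateCount` truthy.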

Asking for a wakeup in the past or present will fire immediately.

Most API calls will accept an arbitrary Far object as a CancelToken,
which can be used to cancel the wakeup/repeater. makeRepeater is the
exception.

This does not change the device-timer API or implementation. However,
vat-timer now uses only a single device-side wakeup and exposes only
a single handler object, to minimize memory usage and object
retention by the device (since devices do not participate in GC).
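The single-device-wakeup arrangement can be sketched as an in-memory model in which many client wakeups share one registration with the device. This is a simplified illustration under stated assumptions, not vat-timer's durable implementation: `makeScheduler`, `wakeAt`, `onDeviceWake`, and the `device.setWakeup(time)` stand-in (assumed to replace any earlier registration) are hypothetical. Handlers for past or present times fire right away, and every handler receives the latest timestamp rather than its scheduled time.

```javascript
// Hypothetical model: many client wakeups share one device-side wakeup.
// `device.setWakeup(time)` is a stand-in that replaces any earlier
// registration; it is not the real device-timer API.
const makeScheduler = (device, getNow) => {
  const pending = []; // { when, handler }, kept sorted by `when`
  let deviceWakeup; // the one time currently registered with the device

  const byWhen = (a, b) => {
    if (a.when < b.when) return -1;
    if (a.when > b.when) return 1;
    return 0;
  };

  // Keep the device pointed at the earliest pending wakeup.
  const reschedule = () => {
    const next = pending.length > 0 ? pending[0].when : undefined;
    if (next !== deviceWakeup) {
      deviceWakeup = next;
      if (next !== undefined) device.setWakeup(next);
    }
  };

  const wakeAt = (when, handler) => {
    const now = getNow();
    if (when <= now) {
      handler(now); // past or present: fire right away
      return;
    }
    pending.push({ when, handler });
    pending.sort(byWhen);
    reschedule();
  };

  // Invoked when the single device wakeup fires.
  const onDeviceWake = () => {
    const now = getNow();
    deviceWakeup = undefined;
    while (pending.length > 0 && pending[0].when <= now) {
      // Handlers get the latest time, which may be later than `when`.
      pending.shift().handler(now);
    }
    reschedule();
  };

  return { wakeAt, onDeviceWake };
};
```

A real vat would keep `pending` in durable storage and would likely deliver handler calls asynchronously; the sketch does neither.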

This introduces a Clock which can return time values without also
providing scheduling authority, and a TimerBrand which can validate
time values without providing clock or scheduling
authority. Timestamps are not yet Branded, but the scaffolding is in
place.
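The split between the three authorities can be sketched with frozen facets and `WeakSet` membership checks. The names `makeTimerFacets`, `getCurrentTimestamp`, `getClock`, `getTimerBrand`, `isMyClock`, and `isMyTimerService` are illustrative assumptions here, and the scheduling methods are left out. It also shows why an `isMyClock()` method on the brand can matter: a deceptive Clock can return the genuine brand from its own `getTimerBrand()`, but only objects the brand's creator registered will pass `isMyClock()`.

```javascript
// Hypothetical facets; names and shapes are illustrative only.
const makeTimerFacets = getNow => {
  const clocks = new WeakSet();
  const services = new WeakSet();

  // Can validate, but can neither read the time nor schedule.
  const timerBrand = Object.freeze({
    isMyClock: alleged => clocks.has(alleged),
    isMyTimerService: alleged => services.has(alleged),
  });

  // Can read the time, but cannot schedule.
  const clock = Object.freeze({
    getCurrentTimestamp: () => getNow(),
    getTimerBrand: () => timerBrand,
  });
  clocks.add(clock);

  // Full authority (scheduling methods omitted from this sketch).
  const timerService = Object.freeze({
    getCurrentTimestamp: () => getNow(),
    getClock: () => clock,
    getTimerBrand: () => timerBrand,
  });
  services.add(timerService);

  return { timerService, clock, timerBrand };
};
```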

packages/SwingSet/tools/manual-timer.js offers a manually-driven
timer service, which can help with unit tests.

closes #4282
refs #4286
closes #4296
closes #5616
closes #5668
closes #5709
refs #5798

@warner warner self-assigned this Jul 28, 2022
@warner warner added the SwingSet package: SwingSet label Jul 28, 2022
@erights (Member) left a comment

Just some comments so far. #5862 is also a review comment.

I have not yet dug into the meat of this. Looking forward to doing so.

@erights erights self-requested a review August 4, 2022 17:16
@erights (Member) left a comment

Just more comments so far. Not done reviewing yet.

@erights (Member) left a comment

LGTM!

There are still things I'd like to understand better. But we can discuss them after this is merged (and once I get back on Wednesday).

@warner (Member Author) commented Aug 11, 2022

Questions for reviewers:

  • Somebody said that the new timerBrand did not need an isMyClock() method, but now I'm not so sure. If you hold a branded Timestamp and someone hands you a Clock, would you ever want to verify that the Clock produces the right kind of timestamps? It seems to me that if you aren't going to be handed a TimerService too, there's no way to detect a deceptive Clock without this extra method.
  • when Promise-flavored timer/repeater/notifier is cancelled, this rejects any Promise with Error('TimerCancelled')
    • should I use a non-Error rejection object instead, like { name: 'TimerCancelled' } ? We do that with Promises that are rejected because of a vat upgrade, so clients can distinguish those cancellations from actual rejections, but I'm not sure that's appropriate here
    • that Error causes a vat-timer.js stack trace to be included in the rejection (userspace does not see it, because of our censorship, but it's noisy in unit tests, and implies something went wrong within vat-timer, when cancellation is entirely controlled by clients). Would it be appropriate for me to create that Error object once, at top-level module scope, and re-use the error singleton whenever a rejection is needed? OTOH I suppose that would capture the stack of liveslots and the supervisor when they first import the timer vat code, which is even less relevant.
  • All APIs (except for Notifiers and their iterators) now fire with the most recent timestamp, not the time they were originally scheduled for (these would only diverge if the timer device and timer vat suffered delays in communication).
    • Notifiers (and their iterators) fire with the most recent nominal event prior to the query. They simulate an "eager" Notifier that is being triggered every interval ticks, from which the client is fetching the most recent update. If you start a Notifier with delay=0 and manage to submit your getUpdateSince(undefined) before any time has passed, you'll get a Promise that fulfills promptly to the current time.
    • I think this is mostly the same as the previous behavior, but the Zoe manual timer behaved slightly differently, so I wanted to make sure this specified behavior seems correct
  • I replaced Zoe's manualTimer.js with a logging/awaiting-augmented wrapper around a new SwingSet-provided manual-timer.js (which only has advanceTo(newTime))
    • some of the "&& scheduling event for.." log messages were too hard to support, so I removed them from the test cases that expected them
    • the delivery order of some nominally-simultaneous events seems to have changed, so a few "golden transcript" -style tests were modified to match
    • the fakePriceAuthority needed to start its Repeater one tick later, to get the same results as before. I'm not entirely sure how this works, so I'm hoping someone else can look at it and make sure I didn't break anything

There is a general problem of how to know when "background" async work has finished, before it is safe to either check for consequences, or to initiate further async activity that effectively polls mutable state that the background work might or might not have finished modifying. The old Zoe manualTimer.js offered some synchronization tools, but the new (real) timer API is bigger and doesn't lend itself to as complete a solution. I've added enough promise-tracking awaits into the Zoe wrapper to support the existing tests (with some small changes), but in general, new tests should probably use await eventLoopIteration() (or some other setImmediate-based wrapper, not just a few await Promise.resolve()s) after calling tick(), if they want to ensure that all the wakeups thus triggered have finally rolled to a halt.
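The difference between a few `await Promise.resolve()` calls and a `setImmediate`-based wait can be demonstrated with a small Node.js sketch. The helper below is written in the same spirit as the `eventLoopIteration` mentioned above, but it is an assumption, not a copy of that file; `startBackgroundWork` is a hypothetical stand-in for work triggered by a `tick()` that nobody awaits.

```javascript
// Resolves on a later macrotask, i.e. only after the promise (microtask)
// queue has fully drained. Requires Node.js (`setImmediate`).
const eventLoopIteration = () => new Promise(resolve => setImmediate(resolve));

const state = { done: false };

// Hypothetical background work: a chain of promise "hops" that nobody awaits.
const startBackgroundWork = (hops = 10) => {
  let p = Promise.resolve();
  for (let i = 0; i < hops; i += 1) {
    p = p.then(() => undefined);
  }
  p.then(() => {
    state.done = true;
  });
};

const demo = async () => {
  startBackgroundWork();
  await Promise.resolve();
  await Promise.resolve();
  const afterMicrotasks = state.done; // still false: the chain is mid-way
  await eventLoopIteration();
  const afterIteration = state.done; // true: the queue has drained
  return { afterMicrotasks, afterIteration };
};
```

A fixed number of `await Promise.resolve()` calls only covers a fixed number of hops, so it breaks as soon as the triggered work gets one step longer; waiting for the next macrotask does not depend on that count.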

This is not something the manualTimer can do internally, because several zoe/etc test/swingsetTests/ use the manual timer inside a vat, where setImmediate is not available. I changed tickN() to accept eventLoopIteration as an argument, which it will call and await if present, so the "unit" unit-tests can do that, while the "swingset" unit-tests do not.

I have not yet changed the TimerService API to use Brand-bearing timestamps, but the cutover will be just a few lines in vat-timer.js, and I'm guessing we should do that very soon. The swingset unit tests will require more work to adapt, but it's mostly mechanical.

@erights erights self-requested a review August 11, 2022 23:12
@Chris-Hibbert (Contributor) left a comment

Some minor proofreading comments.

I'm done with vat-timer and its primary tests, but have a bunch of other code to review still.

@erights (Member) commented Aug 12, 2022

Questions for reviewers:

  • Somebody said that the new timerBrand did not need an isMyClock() method, but now I'm not so sure. If you hold a branded Timestamp and someone hands you a Clock, would you ever want to verify that the Clock produces the right kind of timestamps? It seems to me that if you aren't going to be handed a TimerService too, there's no way to detect a deceptive Clock without this extra method.

That reasoning sounds right to me.

  • when Promise-flavored timer/repeater/notifier is cancelled, this rejects any Promise with Error('TimerCancelled')

    • should I use a non-Error rejection object instead, like { name: 'TimerCancelled' } ? We do that with Promises that are rejected because of a vat upgrade, so clients can distinguish those cancellations from actual rejections, but I'm not sure that's appropriate here
    • that Error causes a vat-timer.js stack trace to be included in the rejection (userspace does not see it, because of our censorship, but it's noisy in unit tests, and implies something went wrong within vat-timer, when cancellation is entirely controlled by clients). Would it be appropriate for me to create that Error object once, at top-level module scope, and re-use the error singleton whenever a rejection is needed? OTOH I suppose that would capture the stack of liveslots and the supervisor when they first import the timer vat code, which is even less relevant.

When a throw or rejection does not indicate that something bad happened, i.e. it is not a possible bug symptom, then I prefer that the thrown thing or the rejection reason not be an error, for all the reasons you state above. I like your suggestion that it should be strongly analogous to, while testably distinct from, the rejection reasons with which we indicate upgrade. harden({ name: 'TimerCancelled' }) seems like a fine choice to me.
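A sketch of the non-Error cancellation reason discussed here, assuming Node.js. `harden` is a global only inside SES/Endo, so this sketch substitutes `Object.freeze` (enough for a flat record); `timerCancelled` and `classifyRejection` are hypothetical names. Because the record is not an `Error`, it carries no stack trace, so a single module-level instance could be shared across rejections without capturing an irrelevant stack.

```javascript
// Stand-in for SES `harden`; shallow-freezing is enough for a flat record.
const harden = Object.freeze;

// Hypothetical: one shared, non-Error rejection reason for cancellation.
const timerCancelled = harden({ name: 'TimerCancelled' });

// Hypothetical client-side check on a rejection reason.
const classifyRejection = reason => {
  if (reason instanceof Error) {
    return 'error'; // something actually went wrong
  }
  if (reason && reason.name === 'TimerCancelled') {
    return 'cancelled'; // the client asked for this
  }
  return 'other'; // e.g. some other non-Error record
};

// Hypothetical usage: a cancelled wakeup promise.
const wakeup = Promise.reject(timerCancelled);
wakeup.catch(reason => console.log(classifyRejection(reason))); // logs "cancelled"
```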

  • All APIs (except for Notifiers and their iterators) now fire with the most recent timestamp, not the time they were originally scheduled for (these would only diverge if the timer device and timer vat suffered delays in communication).

This seems right.

  • Notifiers (and their iterators) fire with the most recent nominal event prior to the query. They simulate an "eager" Notifier that is being triggered every interval ticks, from which the client is fetching the most recent update. If you start a Notifier with delay=0 and manage to submit your getUpdateSince(undefined) before any time has passed, you'll get a Promise that fulfills promptly to the current time.
  • I think this is mostly the same as the previous behavior, but the Zoe manual timer behaved slightly differently, so I wanted to make sure this specified behavior seems correct

In the scenario you mention, "...delay=0...before any time has passed...", would the old notifier wait a tick first? Neither behavior seems incorrect to me, but the new behavior seems better.

  • I replaced Zoe's manualTimer.js with a logging/awaiting-augmented wrapper around a new SwingSet-provided manual-timer.js (which only has advanceTo(newTime))
    • some of the "&& scheduling event for.." log messages were too hard to support, so I removed them from the test cases that expected them
    • the delivery order of some nominally-simultaneous events seems to have changed, so a few "golden transcript" -style tests were modified to match

Sure. Goldens are fragile, and the notifier spec is inherently non-deterministic.

  • the fakePriceAuthority needed to start its Repeater one tick later, to get the same results as before. I'm not entirely sure how this works, so I'm hoping someone else can look at it and make sure I didn't break anything

There is a general problem of how to know when "background" async work has finished, before it is safe to either check for consequences, or to initiate further async activity that effectively polls mutable state that the background work might or might not have finished modifying. The old Zoe manualTimer.js offered some synchronization tools, but the new (real) timer API is bigger and doesn't lend itself to as complete a solution. I've added enough promise-tracking awaits into the Zoe wrapper to support the existing tests (with some small changes), but in general, new tests should probably use await eventLoopIteration() (or some other setImmediate-based wrapper, not just a few await Promise.resolve()s) after calling tick(), if they want to ensure that all the wakeups thus triggered have finally rolled to a halt.

This is not something the manualTimer can do internally, because several zoe/etc test/swingsetTests/ use the manual timer inside a vat, where setImmediate is not available. I changed tickN() to accept eventLoopIteration as an argument, which it will call and await if present, so the "unit" unit-tests can do that, while the "swingset" unit-tests do not.

I'm not following this yet.

I have not yet changed the TimerService API to use Brand-bearing timestamps, but the cutover will be just a few lines in vat-timer.js, and I'm guessing we should do that very soon. The swingset unit tests will require more work to adapt, but it's mostly mechanical.

Good. There's no need to do that in this same PR.

@warner (Member Author) commented Aug 12, 2022

It looks like (1) zoe's buildManualTimer is being used by dapps too, so any API changes must do argument-sensing and be backwards compatible, and (2) the "fakePriceAuthority has to start at 1 instead of 0" problem that I saw in zoe will also occur in the dapp tests that have copied that code. So I think I have to dig into the dapps and understand that fencepost problem before I can land this without disruption, sigh.
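One way to do that argument-sensing for the third parameter is sketched below. `normalizeManualTimerArgs` is a hypothetical helper, not Zoe's code, and treating a BigInt third argument as the legacy `timeStep` is an assumption about how old callers would be recognized.

```javascript
// Hypothetical normalizer: accepts both the old positional `timeStep`
// and a new options bag in the third argument slot.
const normalizeManualTimerArgs = (
  log = () => {},
  startTime = 0n,
  optionsOrTimeStep = {},
) => {
  const options =
    typeof optionsOrTimeStep === 'bigint'
      ? { timeStep: optionsOrTimeStep } // old style: (log, start, 5n)
      : optionsOrTimeStep; // new style: (log, start, { timeStep: 5n })
  const { timeStep = 1n, eventLoopIteration } = options;
  return { log, startTime, timeStep, eventLoopIteration };
};
```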

@warner (Member Author) commented Aug 12, 2022

Today's discussion pointed out some changes that I need to make:

  • change TimerNotifier to use an updateCount that starts at 1n and increments sequentially from there, by using something like BigInt(Math.floor((now - delay) / interval)+1)
    • @erights was weirded out by updateCount not being sequential
    • this would also simplify the conditionals in getUpdateSince
    • also remove the Error('invalid updateCount for timer Notifier') clause
    • if the supplied updateCount is not undefined, then we check whether it matches the current value, or is different: don't do more than that
    • the +1 is to ensure that a live updateCount is not falsy, to guard against clients who accidentally use if (!updateCount) instead of if (updateCount === undefined) to sense the finish condition
  • change TimerNotifier's canceller to put the notifier into a "finish" state, rather than an erroring state
    • cancellation should record the current time, to compute a final update value
    • this makes all outstanding and all future getUpdateSince promises fire with { value: lastTime, updateCount: lastUpdateCount }
  • zoe's buildManualTimer will change from the old buildManualTimer(log, startTime=0n, timeStep=1n) to buildManualTimer(log=dummy, startTime=0n, options={ timeStep=1n, eventLoopIteration })
    • if eventLoopIteration is provided, tick() will finish with await eventLoopIteration()
    • this removes the { doWait: true } I was using
    • unit tests will need to change their buildManualTimer calls to provide eventLoopIteration, but then they won't need to do their own call after tick()
  • zoe's fakePriceAuthority contains a race because it has multiple uncoordinated calls into the timer, specifically startTicker vs quoteAtTime
    • the plan is to make quoteAtTime follow the ticker (the notifier driven by startTicker), rather than using E(timer).setWakeup itself
    • so quoteAtTime looks like:
if (timeStamp <= latestTick) {
  return resolve(priceInQuote(amountIn, brandOut, latestTick));
}
// else push `resolve` into a map that is checked within the `startTicker` handler
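The planned `buildManualTimer(log, startTime, options)` behavior can be illustrated with a toy timer. This is not Zoe's or SwingSet's code: `buildToyManualTimer` and its `repeatEvery` method are hypothetical, and a real wrapper would delegate scheduling to SwingSet's manual timer. The point is the last step of `tick()`: when `options.eventLoopIteration` is supplied, `tick()` awaits it, so work triggered by the wakeups has settled before the test continues.

```javascript
// Hypothetical toy; not Zoe's buildManualTimer or SwingSet's manual-timer.js.
const buildToyManualTimer = (log = () => {}, startTime = 0n, options = {}) => {
  const { timeStep = 1n, eventLoopIteration } = options;
  let now = startTime;
  const handlers = [];

  const timer = {
    getCurrentTimestamp: () => now,
    // Simplified stand-in for a repeater: wake(now) on every tick.
    repeatEvery: handler => {
      handlers.push(handler);
    },
    tick: async msg => {
      now += timeStep;
      if (msg) log(`tick ${now}: ${msg}`);
      for (const handler of handlers) handler.wake(now);
      if (eventLoopIteration) {
        // Wait until everything the wakeups triggered has settled.
        await eventLoopIteration();
      }
    },
    tickN: async (n, msg) => {
      for (let i = 0; i < n; i += 1) {
        await timer.tick(msg);
      }
    },
  };
  return timer;
};
```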

We walked through the simplest test-fakePriceAuthority.js test (priceAuthority quoteAtTime), and we think the fourth call to tick() was a workaround for the lack of complete eventLoopIteration synchronization: the time = 3n update (which is odd, not even, so the price should be 55) should be fired by the third call to tick(). The failures I've seen in my branches are caused by the race between the quoteAtTime call to E(timer).setWakeup and the startTicker use of E(timer).setRepeater. And setRepeater(0) now fires earlier, which doesn't change latestTick but does start incrementing currentPriceIndex earlier.

I see calls to buildManualTimer in three external repositories: dapp-card-store, dapp-oracle, and dapp-otc. None of them use the timeStep argument, so it should be relatively safe to convert that argument slot into an options bag.

I see fakePriceAuthority mentioned in two external repos: dapp-oracle and dapp-treasury. The dapp-oracle copy is significantly different than the one in zoe: I see things like makeLinearPriceAuthority and makeSinglePriceAuthority. I don't know if that makes it newer or older than the zoe one, but it will probably require its own analysis. I haven't looked at the dapp-treasury copy yet, but it looks like it's in an example, not the main code.

@warner (Member Author) commented Aug 15, 2022

That commit (ce2347b) implements the Notifier/Iterator changes. For clarity: the "finish state" of a cancelled TimerNotifier has a value of the cancellation time, not the latest update (which wouldn't exist anyway if the Notifier was cancelled before the first event happened).

I'm still working on the manualTimer and priceAuthority changes, so zoe tests are probably borked.

@warner (Member Author) commented Aug 15, 2022

I think I understand the fencepost error. In the old version (on current trunk):

  • Zoe's manualTimer.makeRepeater(delay=0) does not wake right away (while time = 0). Instead, the first wake() occurs after the test calls tick() for the first time, and the handlers get a wake(time=1).
  • fakePriceAuthority has a handler that is subscribed to that makeRepeater(delay=0)
    • the first time handler.wake is called, it leaves currentPriceIndex alone (it remains as 0)
    • all subsequent times, it increments currentPriceIndex
    • it records the current timestamp in latestTick, then updates a Notifier named ticker with the timestamp
      • ticker is consumed by generateQuotes
  • some of the API is pretty immediate: quoteGiven/quoteWanted grabs the current timestamp and uses currentPriceIndex to look up the current price, then returns a quote
  • quoteWhenGTE/etc push a resolver onto comparisonQueue, and each time handler.wake fires, it resolves anything in the queue that is matched by the new price
  • quoteAtTime uses its own E(timer).setWakeup(timeStamp) to wake a handler at the requested time. It then takes the handler's time argument and uses priceInQuote() to build a quote from the current currentPriceIndex and time

The main problem was that quoteAtTime is scheduling a wakeup for the same time as the makeRepeater, and the invocation order of two such "simultaneous" alarms is not a well-specified part of the TimerService API. The test-fakePriceAuthority.js test named priceAuthority quoteAtTime was accidentally dependent upon a specific ordering of events:

  • the test calls E(priceAuthority).quoteAtTime(when=3n), which schedules a wakeAt(3n)
  • tick() is called several times, all of which fire the makeRepeater handler in startTicker
    • the tick() that advances time from 0n to 1n leaves currentPriceIndex at 0, and sets latestTick = 1n
    • the tick() that advances time from 1n to 2n updates currentPriceIndex to 1, and sets latestTick = 2n
  • the wakeAt(3n) handler fires first, is handed time=3n, and priceInQuote() is called with time=3n
    • it polls currentPriceIndex and gets 1, which tells it to use the second time in the [20, 55] repeating list
    • the quoteAtTime() return promise resolves to a quote with time=3n and a price based on 55
  • then the makeRepeater() handler fires with time=3n, which updates currentPriceIndex to 2, and sets latestTick = 3n

So the test was asking for the price as of time 3n, but that was a bit ambiguous as to whether it wanted the price before or after the startTicker handler had a chance to increase the currentPriceIndex. The test behavior depended upon getting the price before the increment.

In the new version:

  • I changed the makeRepeater to start at time=1, not 0, to avoid the extra wake(time=0) call from causing the currentPriceIndex to be incremented one time too many
    • I also changed the handler to return immediately if time===0n, as a backup
  • to avoid an ambiguity, I changed quoteAtTime to remove the uncoordinated setWakeup() call
    • instead, I added a timeClients list
    • quoteAtTime pushes a [when, resolver] pair onto that list
    • the startTicker handler updates currentPriceIndex and latestTick, then walks timeClients to find entries where when <= latestTick (i.e. the requested time has been reached), and calls the resolver
    • this ensures that the quoteAtTime action happens consistently after the currentPriceIndex has been updated
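The reworked ordering can be modeled in a few lines. This is a simplified, hypothetical model, not the fakePriceAuthority source: `makeTickerModel`, the quote shape `{ price, timestamp }`, and the assumption that the repeater delivers `wake(1n)`, `wake(2n)`, and so on are all illustrative. Pending `quoteAtTime(when)` requests are resolved from the ticker handler only after `currentPriceIndex` has been updated, and only once `when <= latestTick`.

```javascript
// Hypothetical model of the reworked ticker; not the fakePriceAuthority code.
const makeTickerModel = prices => {
  let currentPriceIndex = 0;
  let latestTick; // undefined until the first wake
  const timeClients = []; // pending [when, resolve] pairs
  const priceNow = () => prices[currentPriceIndex % prices.length];

  // Handler for the repeater started by startTicker.
  const wake = time => {
    if (time === 0n) return; // backup guard against a wake(time=0)
    if (latestTick !== undefined) currentPriceIndex += 1; // first wake keeps index 0
    latestTick = time;
    // Resolve requests whose time has been reached, after the index update.
    for (let i = timeClients.length - 1; i >= 0; i -= 1) {
      const [when, resolve] = timeClients[i];
      if (when <= latestTick) {
        timeClients.splice(i, 1);
        resolve({ price: priceNow(), timestamp: latestTick });
      }
    }
  };

  const quoteAtTime = when =>
    new Promise(resolve => {
      if (latestTick !== undefined && when <= latestTick) {
        resolve({ price: priceNow(), timestamp: latestTick });
      } else {
        timeClients.push([when, resolve]);
      }
    });

  return { wake, quoteAtTime };
};
```

With prices `[20, 55]`, `quoteAtTime(2n)` resolves after `wake(2n)` with price 55 and timestamp `2n`, while `quoteAtTime(1n)` yields 20. Because quotes are only produced from inside the ticker handler, there is no second, uncoordinated alarm whose relative order could change the answer.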

As a result, when the priceAuthority quoteAtTime test wants a price of 55, it really needs to ask for the price at time 2n, not 3n. I updated the test to use quoteAtTime(2n), and to expect to see 2n in the quote's timestamp.

I hit a similar problem in zoe/test/unitTests/contracts/test-callSpread.js, which exercises timed options. I had to change the expiration: 3n to 2n so they would sample the expected price.

Most of the other tests are working now that I've completed the mechanical changes of the buildManualTimer() signature, and passing in eventLoopIteration on the tests that need it.

@warner warner requested a review from FUDCo August 15, 2022 22:37
@warner (Member Author) commented Aug 16, 2022

I walked through test-vaultFactory.js with @Chris-Hibbert just now. It looks like the old version was doing more tick()s than it should have needed, because the old tick() was unable to completely wait for the promise queue to drain. So the test had evolved to meet the bugs in the infrastructure. The new tick() (in that test, and several others) is configured to use eventLoopIteration() to fully drain the queue, and so it needs one fewer tick. I'll put a note in the commit comment with the details.

@warner warner force-pushed the 5668-vat-timer branch 3 times, most recently from 0551e25 to d955681 Compare August 16, 2022 05:15
warner added a commit to Agoric/dapp-oracle that referenced this pull request Aug 16, 2022
In Agoric/agoric-sdk#5847 , Zoe's
`tools/manualTimer.js` is updated to align with the new SwingSet
TimerService. In this new version, when the ManualTimer is configured
with `eventLoopIteration`, an `await tick()` will properly drain the
promise queue before proceeding. In addition, the TimerService will
fire a `setWakeup(now)` immediately, rather than requiring a `tick()`
first.

This changes the behavior of tests which were not correctly
synchronizing before, especially the timer-based
fakePriceAuthority. Where previously you could set a price sequence of
`[20, 55]` and use a single `await tick()` to get to `time=1` and
`price=20`, in the new version, you use no ticks, and start out with
`time=0` and `price=20`. A single `await tick()` gets you `time=1` and
`price=55`.

This requires changes to the unit tests, in general removing one
`tick()` and decrementing the expected timestamp by one.
@warner warner force-pushed the 5668-vat-timer branch 2 times, most recently from 54d56f4 to b34946c Compare August 16, 2022 07:13
@warner warner marked this pull request as ready for review August 16, 2022 07:13
@warner warner requested a review from turadg as a code owner August 16, 2022 07:13
@warner warner added this to the Mainnet 1 RC0 milestone Aug 16, 2022
@Tartuffo Tartuffo removed this from the Mainnet 1 RC0 milestone Aug 16, 2022
@warner warner force-pushed the 5668-vat-timer branch 2 times, most recently from 9477a0c to 2bbc9f3 Compare August 16, 2022 17:11
@warner (Member Author) commented Aug 16, 2022

I re-rebased because some zoe durability changes just landed, and I wanted to make sure they didn't conflict. Sorry for the churn.

@Chris-Hibbert (Contributor)

@warner, this new code should be converted to arrow function style. WebStorm has good support for converting. Would you like me to push a commit that fixes all the functions here?

@Chris-Hibbert (Contributor) left a comment

This looks good.

The main thing it needs is conversion of the functions to arrow-style (which I'll happily help with).

I have a couple of questions/suggestions, nearly all on comments, though one is on the eventLoopIteration() added back in test-vaultFactory.js.

@warner (Member Author) commented Aug 16, 2022

@warner, this new code should be converted to arrow function style. WebStorm has good support for converting. Would you like me to push a commit that fixes all the functions here?

Yeah, I guess so. I find the arrow style less readable, so I haven't been using it in kernel code, but I'd like to see how it looks, and maybe that will get me over it. Let's convert everything in vat-timer.js, manual-timer.js, test-vat-timer.js, and test-manual-timer.js. I think that would avoid single files having a mix of styles, while not increasing the scope of the PR.

@Chris-Hibbert (Contributor)

@warner, this new code should be converted to arrow function style. WebStorm has good support for converting. Would you like me to push a commit that fixes all the functions here?

Yeah, I guess so. ... Let's convert everything in vat-timer.js, manual-timer.js, test-vat-timer.js, and test-manual-timer.js.. I think that would avoid single files having a mix of styles, while not increasing the scope of the PR.

I pushed b6d8ab6

I find the arrow style less readable, so I haven't been using it in kernel code, but I'd like to see how it looks, and maybe that will get me over it.

I agree about readability, but there were other considerations that we decided mattered even more than readability.

@warner (Member Author) commented Aug 19, 2022

@FUDCo and I walked through this code today, so I think I have a virtual r+ from him. We identified a couple of things to be cleaned up, and I did some refactoring to save a few duplicate getNow() device calls. @erights said to not wait for him. And I think @turadg has seen everything he needs to know about the test changes. So I think I'm ready to land. I'll rebase (to fix that test-priceAggregator.js merge conflict) and squash back down into a small number of commits.

warner added a commit to Agoric/dapp-oracle that referenced this pull request Aug 19, 2022
@warner warner force-pushed the 5668-vat-timer branch 2 times, most recently from 3201368 to c9f4a17 Compare August 19, 2022 05:16
warner added 4 commits August 18, 2022 23:04
vat-timer is now fully virtualized, durablized, and upgradeable. RAM
usage should be O(N) in the number of:

* pending Promise wakeups (`wakeAt`, `delay`)
* active Notifier promises (`makeNotifier`)
* active Iterator promises (`makeNotifier()[Symbol.asyncIterator]`)

Pending promises will be disconnected (rejected) during upgrade, as
usual.

All handlers and Promises will fire with the most recent timestamp
available, which (under load) may be somewhat later than the scheduled
wakeup time.

Until cancellation, Notifiers will always report a scheduled time
(i.e. `start` plus some multiple of the interval). The opaque
`updateCount` used in Notifier updates is a counter starting from 1n.
When a Notifier is cancelled, the final/"finish" value is the
timestamp of cancellation, which may or may not be a multiple of the
interval (and might be a duplicate of the last non-final value). Once
in the cancelled state, `getUpdateSince(anything)` yields `{ value:
cancellationTimestamp, updateCount: undefined }`, and the
corresponding `iterator.next()` resolves to `{ value:
cancellationTimestamp, done: true }`. Neither will ever reject their
Promises (except due to upgrade).

Asking for a wakeup in the past or present will fire immediately.

Most API calls will accept an arbitrary Far object as a CancelToken,
which can be used to cancel the wakeup/repeater. `makeRepeater` is the
exception.

This does not change the device-timer API or implementation, however
vat-timer now only uses a single device-side wakeup, and only exposes
a single handler object, to minimize the memory usage and object
retention by the device (since devices do not participate in GC).

This introduces a `Clock` which can return time values without also
providing scheduling authority, and a `TimerBrand` which can validate
time values without providing clock or scheduling
authority. Timestamps are not yet Branded, but the scaffolding is in
place.
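
One way to picture the attenuation is as three objects derived from the same state, each with less authority than the last. The method names below (`getClock`, `getCurrentTimestamp`, `getTimerBrand`, `isMyTimerService`, `isMyClock`) are illustrative assumptions and may not match the exact interfaces, and `Object.freeze` stands in for `Far` so the sketch runs outside a vat.

```js
// Hypothetical sketch of attenuation from a full timer service down to
// a read-only Clock and a validation-only TimerBrand. Method names are
// illustrative; Object.freeze stands in for Far() outside a vat.
const makeTimerFacets = getNow => {
  const brand = Object.freeze({
    // Validation only: recognizes objects from this timer, but
    // cannot read the time or schedule anything.
    isMyTimerService: candidate => candidate === service,
    isMyClock: candidate => candidate === clock,
  });
  const clock = Object.freeze({
    // Reading only: no scheduling authority.
    getCurrentTimestamp: () => getNow(),
    getTimerBrand: () => brand,
  });
  const service = Object.freeze({
    getCurrentTimestamp: () => getNow(),
    getClock: () => clock,
    getTimerBrand: () => brand,
    // ...scheduling methods (wakeAt, delay, makeNotifier, ...) would go here
  });
  return service;
};
```

A contract that only needs to read the time can be handed the clock, and one that only needs to validate that a clock or timer belongs to this service can be handed the brand; neither can schedule wakeups.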

`packages/SwingSet/tools/manual-timer.js` offers a manually-driven
timer service, which can help with unit tests.

closes #4282
refs #4286
closes #4296
closes #5616
closes #5668
closes #5709
refs #5798

The Zoe package provides a manually-driven timer service, for use by
unit tests in both zoe and other contract-centric packages. To
minimize API drift, this `buildManualTimer` is now a wrapper around
the one provided by SwingSet. The wrapper continues to provide the
same `tick()` and `tickN()` controls, but there are a few differences:

* The signature is now `buildManualTimer(log, startValue, options)`,
  and the previous `timeStep` positional argument now goes in the
  options bag.

* `options.eventLoopIteration`: To help unit tests defer looking for
  timer consequences until after any triggered activity has settled,
  `tick()` returns a Promise. Previously this waited for the return
  promise of the user-provided `wake()` handlers, but that was never
  completely reliable: if the handler used Promises and `.then` to
  trigger more work, but did not `await` or otherwise couple its return
  Promise with that work, it might fire while callbacks were still ready
  on the event queue, leading to a race between the test code doing
  `await tick()` (before polling for state changes) and the callbacks
  that changed that state. In this version, building the manualTimer
  with the function from `zoe/tools/eventLoopIteration.js` should
  reliably flush the promise queue before `tick()`'s promise fires.

* If `options.eventLoopIteration` is *not* provided, `tick()` does not
  wait at all. Various tests using explicit `eventLoopIteration()`
  calls were changed to configure manualTimer instead.

* Some of the logged messages (`&& running a task`) have been removed,
  because they could not be made to work reliably.

* The new timer service may fire events in slightly different orders
  than before, so some test expectations were updated.

* The `updateCount` provided by the TimerNotifier has
  changed (previously it was a counter, now it is a timestamp). This is
  opaque to callers, so nothing should be inspecting it, but a few tests
  were assuming it would increment sequentially.
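
The new calling convention and the flush behavior can be sketched with a toy timer. `buildToyManualTimer` and `onTick` are hypothetical (this is not Zoe's `buildManualTimer`), and the `eventLoopIteration` helper here assumes a host with `setImmediate`, such as Node.js; it is written in the spirit of `zoe/tools/eventLoopIteration.js` rather than copied from it.

```js
// Hypothetical flush helper: resolves in a later macrotask, by which point
// the promise (microtask) queue has been fully drained, including any
// callbacks chained along the way. Assumes setImmediate exists (Node.js).
const eventLoopIteration = () => new Promise(resolve => setImmediate(resolve));

// Toy manual timer showing the options-bag signature. NOT Zoe's
// buildManualTimer; only the parameter layout mirrors the text.
const buildToyManualTimer = (log = () => {}, startValue = 0n, options = {}) => {
  const { timeStep = 1n, eventLoopIteration: flush = undefined } = options;
  let now = startValue;
  const handlers = [];
  return {
    getCurrentTimestamp: () => now,
    onTick: handler => {
      handlers.push(handler);
    },
    tick: async () => {
      now += timeStep;
      log(`tick: now=${now}`);
      // Handler return values are ignored; waiting on them was the
      // unreliable approach described above.
      for (const handler of handlers) handler(now);
      // Without a flush helper, tick() does not wait for triggered work.
      if (flush) await flush();
    },
  };
};
```

Migrating a call site then looks like changing `buildManualTimer(log, 0n, 5n)` to `buildManualTimer(log, 0n, { timeStep: 5n, eventLoopIteration })`.
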

The "priceAuthority quoteAtTime" test was designed to exercise the API
that waits for a particular time to arrive, then emits a price quote
for that timestamp.

Previously, the fakePriceAuthority used two uncoordinated timer
calls. The authority used `makeRepeater()` to periodically update
`currentPriceIndex` and `latestTick`. And the `quoteAtTime()` API
method used its own `setWakeup()`, to wait for a time to arrive, then
sample `currentPriceIndex` to build the quote.

The relative firing order of these nominally-simultaneous wakeup
events is not specified by the TimerService API, and it apparently
changed in the new implementation. (Now that Zoe's manualTimer is
built on top of the real vat-timer implementation, the opportunity for
future divergence should be reduced.)

With the old implementation, the `quoteAtTime(3n)` alarm was firing
first. It sampled `currentPriceIndex` before the `makeRepeater` wakeup
could change it. So it built a quote from the price for `time=2n`,
where `currentPriceIndex = 1` (i.e. price=55) and a timestamp of 3n.

The new implementation of fakePriceAuthority removes the uncoordinated
`setWakeup`. When `quoteAtTime` is given a time in the future, it adds
the timestamp and a `resolve` callback to a list named
`timeClients`. Then the single `makeRepeater` handler checks this list
and fires anything no longer in the future, *after* it has updated
`currentPriceIndex`.

In addition, this implementation starts its `makeRepeater()` at time
1, not time 0. The new manualTimer responds to `delay=0` by firing the
repeater immediately, whereas the old one would only fire during a
`tick()` call. This caused the new implementation to increment
`currentPriceIndex` one time too many, as it was counting wakeups, not
the changes to the timestamp received by the wakeups. As an extra
defense, the `startTicker` handler was changed to ignore a wakeup with
`time === 0`.
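
The coordination fix can be sketched as a single handler that updates the price state before releasing waiting quotes. This is a toy, not the actual fakePriceAuthority: only `timeClients`, `currentPriceIndex`, `latestTick`, and `quoteAtTime` come from the text, while the quote shape, the wrap-around price lookup, and the choice to advance the index only when the timestamp changes are assumptions.

```js
// Hypothetical sketch of the pattern described above: ONE repeater handler
// updates price state first, then releases quoteAtTime() callers whose
// requested time has arrived.
const makeToyPriceAuthority = prices => {
  let currentPriceIndex = 0;
  let latestTick = 0n;
  let timeClients = []; // entries: { when, resolve }

  // Assumed quote shape; price lookup wraps around (also an assumption).
  const makeQuote = () => ({
    price: prices[currentPriceIndex % prices.length],
    timestamp: latestTick,
  });

  // The single repeater wake handler.
  const wake = time => {
    if (time === 0n) return; // extra defense: ignore a wakeup at time 0
    // Illustrative choice: count timestamp changes, not wakeups.
    if (time > latestTick) {
      latestTick = time;
      currentPriceIndex += 1;
    }
    // Only after updating state, fire clients no longer in the future.
    const ready = timeClients.filter(c => c.when <= time);
    timeClients = timeClients.filter(c => c.when > time);
    for (const { resolve } of ready) resolve(makeQuote());
  };

  const quoteAtTime = when =>
    new Promise(resolve => {
      if (when <= latestTick) resolve(makeQuote());
      else timeClients.push({ when, resolve });
    });

  return { wake, quoteAtTime };
};
```

Because the index is updated before `timeClients` is scanned, a quote requested for time `2n` sees the price that belongs to time `2n`; there is no longer a second, independently scheduled wakeup to race against.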

When run against the new implementation, the unit test sees the
price=55 happen at time `2n`, not `3n`, which I think was the original
intent.

A similar issue caused `test-callSpread.js` to need a different option
expiration time.

The test-vaultFactory.js "price falls precipitously" test exercises
the asset price changing in four steps:

* initial conditions: t=0, price=2200
* tick(): t=1, price=19180
* tick(): t=2, price=1650
* tick(): t=3, price=150

A loan is taken out at t=0. The drop to price=1650 is not quite enough
to trigger liquidation. The drop to price=150 does cause liquidation;
moreover, the price is so low that the vault falls underwater entirely,
so all of the collateral is sold, the client gets nothing back, and the
vault manager must tap the reserves to avoid insolvency.

Previously, this test used *four* calls to `tick()`, asserting that
liquidation did not happen after any of the first three. It appears
that this only passed because the `await tick()` was unable to
completely wait for all triggered activity to complete (it merely
waited on the `wake()` result promise, and did not do a full
`setImmediate` / `eventLoopIteration`). When I inserted `await
eventLoopIteration()` calls after `tick()` in the original version,
the test failed, as liquidation was happening (and completing) after
the *third* `tick()`.

Now that our `manualTimer.tick()` can be configured to completely
flush the promise queue, I'm removing the extra `tick()` call, and the
"liquidation has not happened" assertion that was being made too
early.
@warner warner added the automerge:no-update (expert!) Automatically merge without updates label Aug 19, 2022
@warner warner changed the title vat-timer upgrade overhaul vat-timer: virtualized, durable, upgradable, Clock, TimerBrand Aug 19, 2022
Contributor

@FUDCo FUDCo left a comment

Go fer it.

@mergify mergify bot merged commit 130feae into master Aug 19, 2022
@mergify mergify bot deleted the 5668-vat-timer branch August 19, 2022 06:55
warner added a commit to Agoric/dapp-oracle that referenced this pull request Aug 19, 2022
In Agoric/agoric-sdk#5847, Zoe's
`tools/manualTimer.js` is updated to align with the new SwingSet
TimerService. In this new version, when the ManualTimer is configured
with `eventLoopIteration`, an `await tick()` will properly drain the
promise queue before proceeding. In addition, the TimerService will
fire a `setWakeup(now)` immediately, rather than requiring a `tick()`
first.

This changes the behavior of tests which were not correctly
synchronizing before, especially the timer-based
fakePriceAuthority. Where previously you could set a price sequence of
`[20, 55]` and use a single `await tick()` to get to `time=1` and
`price=20`, in the new version, you use no ticks, and start out with
`time=0` and `price=20`. A single `await tick()` gets you `time=1` and
`price=55`.

This requires changes to the unit tests, in general removing one
`tick()` and decrementing the expected timestamp by one.
```js
const buildManualTimer = (log = nolog, startValue = 0n, options = {}) => {
  const {
    timeStep = 1n,
    eventLoopIteration = () => 0,
    // … (remainder of the function is outside this review excerpt)
```
Member

When I looked over the timer PR test changes I didn’t see anything surprising, except that `eventLoopIteration` has to be passed in as an arg. I remember skimming it and thinking “okay”, but I just took another look and I’m confused why it has to be passed in as a function instead of a flag, e.g. `iterateEventLoop: boolean = false`.

@warner what motivates receiving an arbitrary function?

Member Author

@warner warner Aug 19, 2022

There are a handful of "swingset" unit tests (as opposed to "unit" unit tests) which use the manual timer too, which means they import `manualTimer.js` into code that runs inside a real vat. And `setImmediate` is not available to vat code (the kernel doesn't populate it in the Compartments that house the vat code), to keep vats from retaining agency after claiming their crank/delivery has finished. So `manualTimer.js` can't provide that functionality by itself.

We experimented with a version that used `if (globalThis.setImmediate)` to try and work with both, but it ended up being unsatisfying, and it wasn't clear that all tests would want (or could even tolerate) the automatic wait-for-flush behavior all the time, so it seemed better to give tests a choice.

That might be worth revisiting, now that we figured out the other test failures (which turned out to be the result of different assumptions about timers that are born ready, and races between "simultaneous" events).

Member

Ah. I'll humbly ask that we make the common case (without a real vat) simpler, even if it makes the utility more complex. Consider:

  • add a default option `iterateEventLoop = false`
  • make the `eventLoopIteration` option default to `undefined`
  • when `iterateEventLoop` is true, manualTimer iterates the event loop automatically if it can
  • if it can't, it uses the `eventLoopIteration` option, and throws an error if that isn't set, informing the dev that it must be passed in when running inside a real vat
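
The proposal above amounts to a small option-resolution step. This is a hypothetical sketch of that suggestion (the helper name `resolveFlush` is invented, and the option names follow the comment), not the implementation that was merged.

```js
// Hypothetical sketch of the proposed option handling: decide how (or
// whether) tick() should wait for the event loop to settle.
const resolveFlush = ({ iterateEventLoop = false, eventLoopIteration = undefined } = {}) => {
  if (!iterateEventLoop) {
    return undefined; // common default: tick() does not wait
  }
  if (typeof globalThis.setImmediate === 'function') {
    // Plain unit test outside a vat: flush automatically.
    return () => new Promise(resolve => globalThis.setImmediate(resolve));
  }
  if (typeof eventLoopIteration === 'function') {
    // Inside a real vat, setImmediate is unavailable; use the caller's helper.
    return eventLoopIteration;
  }
  throw new Error(
    'iterateEventLoop needs setImmediate; inside a real vat, pass options.eventLoopIteration',
  );
};
```

The common case then needs only `{ iterateEventLoop: true }`, while vat-hosted tests keep the escape hatch of supplying their own function.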

Labels
automerge:no-update (expert!) Automatically merge without updates SwingSet package: SwingSet
6 participants