feat(notifier): subscribeEach/subscribeLatest iterators retry when broken by vat upgrade #7401

gibson042 · 2023-04-12T21:57:44Z

Description

"latest" iterators look for rejections that indicate failure due to vat upgrade and, upon encountering one, retry the getUpdateSince call. If the source is durable, the retry will succeed and provide the next iteration result. If the source is not durable, the retry will fail, and the error from that failure will be propagated in place of the vatUpgraded rejection (which is perhaps unfortunate, but seems to be the only way of guaranteeing that a real fail result with unfortunate timing is not replaced by a vatUpgraded rejection).
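The retry behavior described above can be sketched roughly as follows. This is a minimal illustration using plain promises, not the code in subscribe.js: `isUpgradeDisconnection`, `retryOnUpgrade`, the mock source, and the assumed rejection shape (`name: 'vatUpgraded'` plus a numeric `incarnationNumber`) are all hypothetical stand-ins.

```javascript
// ASSUMPTION: an upgrade rejection is a plain object with this shape.
const isUpgradeDisconnection = reason =>
  reason !== null &&
  typeof reason === 'object' &&
  reason.name === 'vatUpgraded' &&
  typeof reason.incarnationNumber === 'number';

// Call `getUpdate` again whenever it rejects because of a vat upgrade.
// Any other rejection (including one from the retry) is propagated as-is.
const retryOnUpgrade = async getUpdate => {
  for (;;) {
    try {
      return await getUpdate();
    } catch (reason) {
      if (!isUpgradeDisconnection(reason)) throw reason;
      // Upgrade disconnection: loop around and ask again.
    }
  }
};

// Hypothetical mock source: the first call is broken by an upgrade.
// A durable source answers the retry; a non-durable one fails it.
const makeMockSource = ({ durable }) => {
  let calls = 0;
  return {
    getUpdateSince: async () => {
      calls += 1;
      if (calls === 1) {
        throw { name: 'vatUpgraded', upgradeMessage: 'mock', incarnationNumber: 1 };
      }
      if (!durable) throw new Error('source did not survive upgrade');
      return { value: 'next result', updateCount: 2n };
    },
  };
};
```

With a durable mock, `retryOnUpgrade(() => source.getUpdateSince())` settles to the next result; with a non-durable mock, the caller sees the retry's error rather than the vatUpgraded rejection.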

"each" iterators behave similarly, but additionally fail if multiple values are published before they can successfully reconnect because they cannot tolerate gaps in the sequence.
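The gap rule for "each" iterators could be expressed along these lines. This is only a sketch under the assumption that every published value carries a `publishCount` drawn from a gap-free sequence of bigints; `classifyReconnect` and its return labels are hypothetical names, not part of the notifier package.

```javascript
// lastSeenCount: publishCount of the last value the iterator delivered.
// resumedCount: publishCount of the first value obtained after reconnecting.
const classifyReconnect = (lastSeenCount, resumedCount) => {
  if (typeof lastSeenCount !== 'bigint' || typeof resumedCount !== 'bigint') {
    throw new TypeError('publish counts must be bigints');
  }
  // Nothing new was published while disconnected.
  if (resumedCount <= lastSeenCount) return 'already-seen';
  // Exactly one new value: the sequence is intact and iteration can resume.
  if (resumedCount === lastSeenCount + 1n) return 'contiguous';
  // Two or more values were published: at least one was missed.
  return 'gap';
};

console.log(classifyReconnect(4n, 5n)); // contiguous
console.log(classifyReconnect(4n, 7n)); // gap
```

Only the `'gap'` outcome forces an "each" iterator to fail; a "latest" iterator has no such constraint because it is allowed to skip intermediate values.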

Security Considerations

A malicious source can fake vatUpgraded rejections, which the "latest" iterators will dutifully accept and respond to by retrying as long as incarnationNumber indicates forward progress. Something like this is necessary to handle the edge case where a second upgrade occurs in between the consumer receiving a rejection and the producer receiving the retry, but we could put in a retry count limit if this is perceived as an actual risk.
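To make the forward-progress condition and the possible retry limit concrete, here is a hedged sketch. `makeRetryPolicy` and `maxRetries` are hypothetical and do not exist in the PR; the rejection shape (`name: 'vatUpgraded'`, numeric `incarnationNumber`) is likewise an assumption.

```javascript
// Returns a predicate that decides whether a rejection justifies a retry.
// A retry is allowed only when incarnationNumber strictly increases and
// the (optional, hypothetical) retry budget has not been used up.
const makeRetryPolicy = ({ maxRetries = Infinity } = {}) => {
  let lastIncarnation = -Infinity;
  let retries = 0;
  return reason => {
    if (!reason || reason.name !== 'vatUpgraded') return false;
    if (typeof reason.incarnationNumber !== 'number') return false;
    // No forward progress: a repeated or older incarnation is not retried.
    if (!(reason.incarnationNumber > lastIncarnation)) return false;
    // Budget exhausted: stop retrying even though progress was reported.
    if (retries >= maxRetries) return false;
    lastIncarnation = reason.incarnationNumber;
    retries += 1;
    return true;
  };
};

const shouldRetry = makeRetryPolicy({ maxRetries: 2 });
console.log(shouldRetry({ name: 'vatUpgraded', incarnationNumber: 1 })); // true
console.log(shouldRetry({ name: 'vatUpgraded', incarnationNumber: 1 })); // false
```

Without a limit, a source that keeps reporting ever-larger incarnation numbers can keep a consumer retrying indefinitely, which is the risk described above.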

Scaling Considerations

I don't think this affects scaling.

Documentation Considerations

These relatively fine implementation details are covered by JSDoc comments.

Testing Considerations

The rapid-upgrade edge case is not covered by testing, but I consider that acceptable. If reviewers disagree, I think I can add a new file with manually-constructed mock producers.

mhofman · 2023-04-12T23:33:11Z

> subscribeEach iterators continue to fail in that scenario, because they cannot guarantee absence of gaps.

That makes sense. But I'm somewhat worried about what that means for these consumers, especially since it seems they may be using subscribeEach through layers of utils without knowing. That said, I don't see the alternative.

> seems to be the only way of guaranteeing that a real fail result with unfortunate timing is not replaced by a vatUpgraded rejection).

Could we annotate the rejection with the upgraded rejection?

gibson042 · 2023-04-16T18:05:50Z

Updated to opportunistically reconnect subscribeEach iterators when possible per #5185 (comment).

gibson042 · 2023-04-16T20:54:58Z

This PR introduces a locally-reproducible failure for the "Watch interest accrue" test in test-stakeFactory.js that might be real. d.waitDays(90) is followed by d.checkRUNDebt(195n), which compares E(vault).getCurrentDebt() against 195e6 and comes up just outside the acceptable tolerance of 0.2e6:

    name: AssertionError
    assertion: deepEqual
    values:
      'Difference (- actual, + expected):': |2-
          {
            brand: Object @Alleged: IST brand {},
        -   value: 194796320n,
        +   value: 195000000n,
          }
    at: >-
      approxEqual
      (packages/inter-protocol/test/stakeFactory/test-stakeFactory.js:507:7)

      Object.checkRUNDebt
      (packages/inter-protocol/test/stakeFactory/test-stakeFactory.js:742:7)

      async packages/inter-protocol/test/stakeFactory/test-stakeFactory.js:991:3

I'm not sure how E(vault).getCurrentDebt() depends upon subscribeEach or subscribeLatest, but I can confirm that the reported value in that test is affected by the count of awaits they entail (e.g., removing the await null in reconnectAsNeeded would reduce that 194.796320e6 to 194.730620e6). The situation reads to me like sensitivity to artificial testing details, and it can be "fixed" by increasing the value of epsilon from micro.unit / 5n to e.g. micro.unit / 4n. Before doing that, I'd like confirmation from @dckc and/or @turadg that such a change would maintain the intent of the stakeFactory tests (which were added in entirety by #4741), and also, if possible, a suggested explanation of what epsilon values are appropriate in case further changes are warranted in the future.
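For reference, the arithmetic behind the two epsilon values can be checked directly. This sketch assumes micro.unit is 1_000_000n (consistent with the 0.2e6 tolerance quoted above) and that the comparison is a simple absolute difference; the actual approxEqual helper may differ in detail.

```javascript
const unit = 1_000_000n; // ASSUMPTION: value of micro.unit
const actual = 194_796_320n; // value reported by the failing test
const expected = 195_000_000n; // 195n scaled by unit

const abs = x => (x < 0n ? -x : x);
const difference = abs(actual - expected);

const oldEpsilon = unit / 5n; // 200_000n
const newEpsilon = unit / 4n; // 250_000n

console.log(difference); // 203680n
console.log(difference <= oldEpsilon); // false
console.log(difference <= newEpsilon); // true
```

The observed difference exceeds the old tolerance by 3_680n and fits within the proposed one.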

turadg · 2023-04-18T21:25:51Z

@gibson042 thanks for the detailed question. @dckc and I conferred and agree with your conclusion. For appropriate epsilon value, go with whatever works because StakeFactory is not scheduled for any release.

michaelfig

Looking good! Just a few comments that shouldn't block approval.

michaelfig · 2023-04-25T03:17:22Z

packages/internal/src/upgrade-api.js

+  });
+harden(makeUpgradeDisconnection);
+
+// TODO: Simplify once we have @endo/patterns (or just export the shape).


We do have @endo/patterns now.

gibson042 · 2023-04-25

My first attempt didn't quite work, so I'm deferring this until we've got a little more time to spend on it.

michaelfig · 2023-04-25T03:36:46Z

packages/notifier/src/subscribe.js

+      // rejections here too to avoid invalid unhandled rejection issues later.
+      void E.when(publishCountP, sink, sink);
+      void E.when(nextCellP, sink, sink);


I'm uncertain about why publishCountP is sunk, but tailP (originally pubList at this point) is not. Can you elaborate (either by replying here, or in a code comment) as to why these two calls are necessary?

I suspect that there may be a clearer way to avoid introducing any unhandled rejections with minimal sinks. I just don't see it right now.

gibson042 · 2023-04-25

Any rejection of tailP would be handled in reconnectAsNeeded.
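For readers unfamiliar with the sinking pattern under discussion, here is a minimal sketch using plain promises. The diff uses `E.when(p, sink, sink)`; this illustration substitutes `Promise.prototype.then`, and `sinkRejection` is a hypothetical helper name.

```javascript
// A no-op handler: attaching it marks a rejection as handled without
// changing what later consumers of the same promise observe.
const sink = () => {};

// Attach the no-op handlers and discard the derived promise.
const sinkRejection = promise => {
  void promise.then(sink, sink);
};

const brokenP = Promise.reject(new Error('disconnected'));
sinkRejection(brokenP);

// The original promise is still rejected, so a consumer that looks at it
// later sees the same reason, but the runtime no longer reports it as an
// unhandled rejection in the meantime.
brokenP.catch(reason => console.log('consumer saw:', reason.message));
```

A promise whose rejection is already handled elsewhere needs no sink, which is the distinction drawn in the reply above for tailP.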

michaelfig · 2023-04-25T03:37:51Z

packages/notifier/src/subscribe.js

+      const firstCellP = reconnectAsNeeded(() => E(topic).subscribeAfter());
+      return makeEachIterator(topic, firstCellP);


This is really clear!


gibson042 requested review from turadg, mhofman and michaelfig April 12, 2023 21:57

gibson042 force-pushed the gibson-5185-reconnect-subscribers branch 3 times, most recently from 548ed4c to 36ee17b Compare April 13, 2023 16:40

gibson042 mentioned this pull request Apr 13, 2023

Notifier observers to discriminate between vat failure and upgrade #5185

Closed

gibson042 force-pushed the gibson-5185-reconnect-subscribers branch 2 times, most recently from aad76d2 to 0e926a7 Compare April 16, 2023 18:00

gibson042 changed the title ~~feat(notifier): subscribeLatest iterators retry when broken by vat upgrade~~ feat(notifier): subscribeEach/subscribeLatest iterators retry when broken by vat upgrade Apr 16, 2023

michaelfig approved these changes Apr 25, 2023


gibson042 added 11 commits April 25, 2023 13:06

feat(notifier): subscribeLatest iterators retry when broken by vat up…

e96a0ee

…grade subscribeEach iterators continue to fail in that scenario, because they cannot guarantee absence of gaps. Fixes #5185

refactor: Move DisconnectionObject generation/testing into "internal"

ef7fc68

chore(notifier): Annotate subscriber rejections that follow disconnec…

a6b275a

…tion

chore(notifier): Tolerate failure of an opportunistic assert.note()

bdbbe7d

feat(notifier): Opportunistic eachIterator recovery from upgrade disc…

229d7b2

…onnection Depends upon PublishKit `publishCount` being a gap-free sequence of bigints.

style(notifier): Rename some bindings to clarify their use

9e84193

test(notifier): Reduce nesting for better actual vs. expected diffs

10debb3

chore(notifier): Include all relevant promises in automatic rejection…

19aa90c

… sinking

test(inter-protocol): Increase await-sensitive tolerance in test-stak…

092e7f3

…eFactory

test(inter-protocol): Refactor approxEqual into a scale-sensitive ass…

61b85e1

…ertSimilarAmount

chore(notifier): Clarify makeEachIterator rejection suppression

e6d65d0

gibson042 force-pushed the gibson-5185-reconnect-subscribers branch from 2ec199f to e6d65d0 Compare April 25, 2023 17:08

gibson042 added the automerge:no-update (expert!) Automatically merge without updates label Apr 25, 2023

mergify bot merged commit bf2a9ff into master Apr 25, 2023

mergify bot deleted the gibson-5185-reconnect-subscribers branch April 25, 2023 17:56

erights mentioned this pull request Jan 29, 2024

refactor(internal): Doing a TODO for UpgradeDisconnectionShape #8831

Merged
