You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When the kernel notifies the vat of the resolution of a promise that is resolved to data (or rejected with data), in some cases promise cross-references contained in the resolution value can trigger a pathological chain of promise retirements followed by subscribe syscalls followed by re-notifications of the resolution, that continue ad infinitum. And, of course, infinite loops are bad.
Description of the Design
The cure is to atomically notify the vat of the resolution of a promise together with the (previously existing) resolutions of any other promises contained in the resolution value's reference graph that are unknown to the vat (i.e., not in the vat's c-list) at the time of notification. Currently the notifications for the other promises in the reference graph are triggered by the vat noticing, in course of processing the initial notification, that some of the promises referenced in the data are unknown, and in response issuing subscribe syscalls for those promises. This results in the notifications for these dependent promises being delivered in later cranks, in turn resulting in the cascading failure mentioned above, since other intermediate promises may have been retired in the meantime.
Delivering a collection of promise resolutions in a single notification requires a change to the kernel->vat notification interface. A complication is that we currently have three different kinds of resolution (resolve-to-data, resolve-to-presence, and reject) realized by three different notification calls, but since an aggregate resolution notification referencing multiple promises might reference promises that fall into more than one of these cases, the three cases will need to be unified. (Such unification is a cleanup notion we've been considering for a while, but largely on aesthetic grounds. Since there wasn't any pressing need driving it, it has been low enough on our priority list to never make it to an actionable state.)
More specifically, a notify delivery to a vat currently only contains the kpid of the single promise whose resolution is being notified; the implementation then interrogates the kernel's known state for that promise to decide what kind of action to take. This unfolds into one of three calls, notifyFulfillToPresence, notifyFulfillToData, or notifyReject, according to the state of the promise, extracting the remaining necessary parameters directly from the kernel's record of the promise itself. Each of the three individual notification functions takes different parameters specific to the kind of resolution that it is performing; in particular, both notifyFulfillToData and notifyReject represent the resolution value as a capdata object, whereas notifyFulfillToPresence simply passes the id of the presence directly, since that form is easier to handle and considerably more compact than the capdata representation.
Each of the three handlers both handles the specific case it implements and then does some post notification cleanup (such as retiring the now-resolved promise, which is specifically the thing that's getting us into the trouble motivating this refactoring). The actual case-specific handling logic can most likely be retained with minimal adaptation, but the overall processing of the aggregate notification will need to first handle the resolutions of each the notified promises and only when this is complete do the subsequent cleanup for all of them.
The plan for implementing this is as follows:
Change both sides of the kernel->vat notification interface to allow the passage of a list of kpids in place of the single kpid passed now. The kernel-side notification code will behave as it did before by passing a list of length 1. The vat-side notify handler will iterate over the list but simply invoke the existing, otherwise unchanged, individual case handlers as it currently does.
Split the individual case handlers and the code that iterates over the notified promise list so that we do all the notifications in one pass then all the post-notification cleanups in a second pass.
Modify the kernel side of the notification (i.e., the code that issues the notify request) to walk the reference graph of the promise that it is given, remembering all the promises it finds that are not presently in the target vat's c-list. This entire list of promise ids will then be included in the notify request it sends.
Security Considerations
Since the intended semantics of notification remain unchanged, this refactoring should not introduce any new security considerations. However, the infinite request sequence that can be triggered using the older (i.e., current) implementation represents a possible denial of service attack vector. If we implement this refactoring successfully and it works the way we intend, that potential attack vector should be eliminated. We should regard that elimination as a primary goal of this effort and a requirement for its successful completion.
Both @warner and I have an intuition that the mechanism outlined here may only address one side of the recursive-runaway-on-promise-resolution problem, namely the kernel->vat side. We think there may be a corresponding issue in the vat->kernel side, wherein a single vat generating a set of circularly linked promise resolutions and then communicating them outward may trigger a similar runaway. We do not understand this case nearly as well, not even well enough to know if such a problem actually exists -- what we have at this point is just a suspicion. It is our hope that by implementing the fix described here we will improve our understanding of the problem space. Additionally, it may turn out that some of the examples we have in hand of runaway promise resolutions are actually manifestations of a vat->kernel problem; if so, we will discover this.
Test Plan
Obviously, all the existing tests of promise resolution, promise retirement, and notification should continue to pass successfully. In addition, I have a collection of swingset-runner demos that manifest the problematic behavior we are attempting to address here. Once we have verified that these examples no longer produce a runaway, they will be converted to unit tests and added to our test repertoire.
The text was updated successfully, but these errors were encountered:
What is the Problem Being Solved?
When the kernel notifies the vat of the resolution of a promise that is resolved to data (or rejected with data), in some cases promise cross-references contained in the resolution value can trigger a pathological chain of promise retirements followed by
subscribe
syscalls followed by re-notifications of the resolution, that continue ad infinitum. And, of course, infinite loops are bad.Description of the Design
The cure is to atomically notify the vat of the resolution of a promise together with the (previously existing) resolutions of any other promises contained in the resolution value's reference graph that are unknown to the vat (i.e., not in the vat's c-list) at the time of notification. Currently the notifications for the other promises in the reference graph are triggered by the vat noticing, in course of processing the initial notification, that some of the promises referenced in the data are unknown, and in response issuing
subscribe
syscalls for those promises. This results in the notifications for these dependent promises being delivered in later cranks, in turn resulting in the cascading failure mentioned above, since other intermediate promises may have been retired in the meantime.Delivering a collection of promise resolutions in a single notification requires a change to the kernel->vat notification interface. A complication is that we currently have three different kinds of resolution (resolve-to-data, resolve-to-presence, and reject) realized by three different notification calls, but since an aggregate resolution notification referencing multiple promises might reference promises that fall into more than one of these cases, the three cases will need to be unified. (Such unification is a cleanup notion we've been considering for a while, but largely on aesthetic grounds. Since there wasn't any pressing need driving it, it has been low enough on our priority list to never make it to an actionable state.)
More specifically, a
notify
delivery to a vat currently only contains the kpid of the single promise whose resolution is being notified; the implementation then interrogates the kernel's known state for that promise to decide what kind of action to take. This unfolds into one of three calls,notifyFulfillToPresence
,notifyFulfillToData
, ornotifyReject
, according to the state of the promise, extracting the remaining necessary parameters directly from the kernel's record of the promise itself. Each of the three individual notification functions takes different parameters specific to the kind of resolution that it is performing; in particular, bothnotifyFulfillToData
andnotifyReject
represent the resolution value as a capdata object, whereasnotifyFulfillToPresence
simply passes the id of the presence directly, since that form is easier to handle and considerably more compact than the capdata representation.Each of the three handlers both handles the specific case it implements and then does some post notification cleanup (such as retiring the now-resolved promise, which is specifically the thing that's getting us into the trouble motivating this refactoring). The actual case-specific handling logic can most likely be retained with minimal adaptation, but the overall processing of the aggregate notification will need to first handle the resolutions of each the notified promises and only when this is complete do the subsequent cleanup for all of them.
The plan for implementing this is as follows:
Change both sides of the kernel->vat notification interface to allow the passage of a list of kpids in place of the single kpid passed now. The kernel-side notification code will behave as it did before by passing a list of length 1. The vat-side notify handler will iterate over the list but simply invoke the existing, otherwise unchanged, individual case handlers as it currently does.
Split the individual case handlers and the code that iterates over the notified promise list so that we do all the notifications in one pass then all the post-notification cleanups in a second pass.
Modify the kernel side of the notification (i.e., the code that issues the
notify
request) to walk the reference graph of the promise that it is given, remembering all the promises it finds that are not presently in the target vat's c-list. This entire list of promise ids will then be included in thenotify
request it sends.Security Considerations
Since the intended semantics of notification remain unchanged, this refactoring should not introduce any new security considerations. However, the infinite request sequence that can be triggered using the older (i.e., current) implementation represents a possible denial of service attack vector. If we implement this refactoring successfully and it works the way we intend, that potential attack vector should be eliminated. We should regard that elimination as a primary goal of this effort and a requirement for its successful completion.
Both @warner and I have an intuition that the mechanism outlined here may only address one side of the recursive-runaway-on-promise-resolution problem, namely the kernel->vat side. We think there may be a corresponding issue in the vat->kernel side, wherein a single vat generating a set of circularly linked promise resolutions and then communicating them outward may trigger a similar runaway. We do not understand this case nearly as well, not even well enough to know if such a problem actually exists -- what we have at this point is just a suspicion. It is our hope that by implementing the fix described here we will improve our understanding of the problem space. Additionally, it may turn out that some of the examples we have in hand of runaway promise resolutions are actually manifestations of a vat->kernel problem; if so, we will discover this.
Test Plan
Obviously, all the existing tests of promise resolution, promise retirement, and notification should continue to pass successfully. In addition, I have a collection of swingset-runner demos that manifest the problematic behavior we are attempting to address here. Once we have verified that these examples no longer produce a runaway, they will be converted to unit tests and added to our test repertoire.
The text was updated successfully, but these errors were encountered: