extend GC over comms protocol #3306

Closed
warner opened this issue Jun 12, 2021 · 0 comments · Fixed by #3364
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

warner commented Jun 12, 2021

What is the problem being solved

Our #3106 GC plan covers inter-vat intra-SwingSet references, but doesn't address inter-SwingSet references (which travel through the comms vat). The next step is to extend the existing dropExport/retireExport/retireImport plan through the comms vat.

From the kernel's point of view, the comms vat is just another vat, so it sends and expects the same GC deliveries/syscalls as any other. From the comms vat's point of view, the kernel is just another remote (albeit much more synchronous, which reduces the management overhead somewhat).

The task is to build refcount tracking into the comms tables, then react to inbound GC messages (from either the kernel or remotes) by changing these refcounts. This may cause the comms vat to emit outbound GC messages (either to the kernel as a syscall, or to another remote as a protocol message).

Design

(these are just some quick protocol notes, I'll edit them into a proper design later)

  • incoming drop/retire messages are either "informed" if the sender was fully aware of any cross-on-the-wire re-introductions, or "ignorant" if not
    • comms-kernel messages are always informed
    • comms-comms messages are informed if the inbound message's acknum is greater than or equal to the seqnum recorded in the outbound clist entry for the last re-introduction, else they are ignorant
  • comms will add an isReachable flag to all clist entries
    • on import entries, this describes the state of the importer
    • on export entries, it merely records whether we've sent a drop or not, and might not actually be necessary
  • comms will add a (reachable, recognizable) refcount tuple to all comms object table entries
    • just like the kernel does:
      • reachable is the sum of the isReachable flags from all importers, plus one for each resolved promise or auxdata that references the object
      • recognizable is the count of all importing clist entries, plus resolved promises and auxdata
  • when comms receives an informed dropExport, it clears the isReachable flag for that importer's clist, which decrements the reachable refcount
    • if reachable hits zero, comms sends a dropExport to the exporter and clears the isReachable flag on the exporter's clist entry (again, maybe not really necessary)
  • when comms receives an informed retireExport, it deletes the importer's clist entry, which decrements the recognizable refcount
    • if recognizable hits zero, comms sends a retireExport to the exporter and deletes the clist entry
  • when comms receives an informed(??) retireImport (which will always be from an exporter):
    • comms translates the retireImport from sender-space to local (comms) space, then deletes the exporting c-list entry
    • the reachable count must already be zero, else the sender of retireImport did something wrong
    • comms locates all importers, then deletes the object table entry
    • comms translates a retireImport to each importer, then deletes their clist entry
    • comms sends the translated retireImport to the importer
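
The informed/ignorant test and the dropExport/retireExport bookkeeping above could be sketched roughly like this. It is only an in-memory illustration, not the real comms tables: `isInformed`, `makeCommsTables`, `addImporter`, `handleDropExport` and `handleRetireExport` are hypothetical names, resolved promises and auxdata are left out of the refcounts, and the exporter's own clist entry is not modeled.

```javascript
// Hypothetical sketch of the comms refcount bookkeeping described above.
// Every name here is an illustrative assumption, not the real comms vat API.

// comms-kernel messages are always informed; comms-comms messages are
// informed only if their acknum covers our last re-introduction.
function isInformed(source, ackNum, lastReintroSeqNum) {
  if (source === 'kernel') {
    return true;
  }
  return ackNum >= lastReintroSeqNum;
}

function makeCommsTables() {
  // lref -> { reachable, recognizable, importers: Map<id, { isReachable }> }
  const objects = new Map();
  const outbound = []; // GC messages that comms would send to the exporter

  function addImporter(lref, importerID) {
    if (!objects.has(lref)) {
      objects.set(lref, { reachable: 0, recognizable: 0, importers: new Map() });
    }
    const entry = objects.get(lref);
    entry.importers.set(importerID, { isReachable: true });
    entry.reachable += 1;
    entry.recognizable += 1;
  }

  // Caller is assumed to have checked isInformed() already.
  function handleDropExport(lref, importerID) {
    const entry = objects.get(lref);
    const clist = entry && entry.importers.get(importerID);
    if (!clist || !clist.isReachable) {
      return; // unknown ref, or already dropped
    }
    clist.isReachable = false;
    entry.reachable -= 1;
    if (entry.reachable === 0) {
      outbound.push({ type: 'dropExport', lref });
    }
  }

  function handleRetireExport(lref, importerID) {
    const entry = objects.get(lref);
    if (!entry || !entry.importers.has(importerID)) {
      return; // not in the c-list: just ignore it
    }
    // Assumption of this sketch: a retire implies a drop if none was seen.
    handleDropExport(lref, importerID);
    entry.importers.delete(importerID);
    entry.recognizable -= 1;
    if (entry.recognizable === 0) {
      outbound.push({ type: 'retireExport', lref });
      objects.delete(lref);
    }
  }

  return { objects, outbound, addImporter, handleDropExport, handleRetireExport };
}

const tables = makeCommsTables();
tables.addImporter('lo1', 'remoteA');
tables.addImporter('lo1', 'remoteB');
tables.handleDropExport('lo1', 'remoteA'); // reachable 2 -> 1, nothing sent
tables.handleDropExport('lo1', 'remoteB'); // reachable 1 -> 0, dropExport queued
```

In this sketch the exporter only hears about a drop once the last reachable importer goes away, and likewise for retire and the recognizable count.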

If comms receives a retireExport or retireImport for a ref that is not in the c-list, it should just ignore it. There are two reasons/phases where this might happen. The "normal" one is during a race between the importer sending retireExport and the exporter sending retireImport. We could choose to track this race in the same way we handle the retirement of promises:

  • importer sends retireExport, and tracks the (sent seqnum, rref) pair in an ordered list
    • if a message arrives that effectively acks that seqnum, delete the pair: the window for a race has closed
    • if a re-introduction arrives before that point, delete the pair: the race has been superseded by a replacement object
    • if a retireImport arrives before that point, ignore it: there was a race, no big deal
    • if a retireImport arrives after that point (i.e. neither the clist nor the ordered retireExport-sent list knows the rref): this is the weird case, we might decide to kill the connection, or log-but-ignore, or just ignore
  • follow the same pattern when the exporter sends retireImport
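
A rough sketch of the importer-side tracking described above, under the assumption that the c-list can be modeled as a plain set of rrefs. `makeRetireTracker` and its method names are hypothetical, and what to do in the "unexpected" case is left to the caller, since that choice is still open.

```javascript
// Hypothetical sketch: remember each retireExport we sent so that a racing
// retireImport can be told apart from a genuinely unexpected one.
function makeRetireTracker() {
  const clist = new Set(); // rrefs currently in the c-list
  let pending = []; // ordered [{ seqNum, rref }] for retireExports we sent

  return {
    addToClist(rref) {
      clist.add(rref);
      // a (re-)introduction supersedes any pending race for this rref
      pending = pending.filter(p => p.rref !== rref);
    },
    sendRetireExport(seqNum, rref) {
      clist.delete(rref);
      pending.push({ seqNum, rref });
    },
    receiveAck(ackSeqNum) {
      // the peer has seen everything up to ackSeqNum: those windows are closed
      pending = pending.filter(p => p.seqNum > ackSeqNum);
    },
    receiveRetireImport(rref) {
      if (clist.has(rref)) {
        clist.delete(rref);
        return 'retired';
      }
      if (pending.some(p => p.rref === rref)) {
        return 'race-ignored'; // crossed on the wire, no big deal
      }
      return 'unexpected'; // caller may log, ignore, or kill the connection
    },
  };
}

const tracker = makeRetireTracker();
tracker.addToClist('ro7');
tracker.sendRetireExport(3, 'ro7');
const duringWindow = tracker.receiveRetireImport('ro7'); // inside the race window
tracker.receiveAck(3);
const afterWindow = tracker.receiveRetireImport('ro7'); // window has closed
```

The exporter side would presumably keep the mirror-image list for the retireImports it sends.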

The algorithm for comms is mostly simpler than the kernel's because:

  • there are no queued messages: no run-queue, and no per-promise message queues (because everything is immediately pipelined)
  • we aren't trying to spread GC actions out among multiple cranks, so we don't need to record the upcoming work in a durable fashion: the equivalents of maybeFreeKrefs and the durable gcActions set can both be ephemeral
  • drops and retires appear in separate messages, so we don't need the processNextGCAction code that prioritizes one over the other, and actions won't be negated by an earlier re-introduction

However, it's slightly more complex because of the need to distinguish between informed and ignorant inbound messages, and the possibility that we choose to log or kill-connection when a retireImport/retireExport arrives after we know the race window has closed.

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Jun 12, 2021
@warner warner self-assigned this Jun 12, 2021
warner added a commit that referenced this issue Jun 14, 2021
@warner warner added this to the Testnet: Stress Test Phase milestone Jun 18, 2021
warner added a commit that referenced this issue Jun 19, 2021
This is the big switch to activate comms GC. All the new code is in
`gc-comms.js`, which creates the "gcKit".

`dispatch()` was changed to assemble GC actions arriving from the kernel into
a single "GC" structure, with all three kinds of actions.

* this anticipates a future kernel change that lumps all three GC-related
  dispatch types into a single message, and mirrors the remote protocol
  which does exactly that
* we first filter out any kfrefs (vrefs) for meta-objects, like the
  "controller" (root object) and the receivers, because we aren't prepared
  to drop these
* after all dispatch operations (message send, promise resolution
  notification, and all GC operations), we call the new `processGC()` to
  perform any necessary GC activity

`gc-comms.js` follows the same pattern as `delivery.js`:

* two inbound conversion functions (`gcFromKernel` and `gcFromRemote`) to
  handle GC requests arriving from either source
* two outbound conversion functions (`gcToKernel` and `gcToRemote`) to emit
  syscalls or remote-protocol messages that express GC requests
* the end-of-crank `processGC()` function
  * this calls `processMaybeFree()` to check on everything whose refcount
    touched zero during the crank
  * that returns a set of actions which need to be taken
  * the actions are sorted into the kernel or remote that is affected, then
    request messages are generated for syscalls and/or transmission

refs #3306
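
For readers who want a picture of the end-of-crank step, here is a rough sketch of the shape it might take. It is not the code from `gc-comms.js`: the signatures of `processMaybeFree` and `processGC` shown here, the `refcounts` and `exporterOf` maps, and the `dropSent` flag are all assumptions made for illustration.

```javascript
// Hypothetical sketch of an end-of-crank GC pass. Data shapes and
// signatures are assumptions, not the real gc-comms.js implementation.

// Look at every lref whose refcount touched zero during the crank and
// decide which GC actions are really needed now.
function processMaybeFree(maybeFree, refcounts, exporterOf) {
  const actions = [];
  for (const lref of maybeFree) {
    const counts = refcounts.get(lref);
    if (!counts) {
      continue; // already deleted earlier
    }
    const target = exporterOf.get(lref); // 'kernel' or a remote ID
    if (counts.reachable === 0 && !counts.dropSent) {
      counts.dropSent = true;
      actions.push({ target, type: 'dropExport', lref });
    }
    if (counts.recognizable === 0) {
      actions.push({ target, type: 'retireExport', lref });
      refcounts.delete(lref);
    }
  }
  return actions;
}

// Sort the actions by the kernel or remote they affect; the caller would
// then turn each group into syscalls or a remote-protocol message.
function processGC(maybeFree, refcounts, exporterOf) {
  const byTarget = new Map();
  for (const action of processMaybeFree(maybeFree, refcounts, exporterOf)) {
    if (!byTarget.has(action.target)) {
      byTarget.set(action.target, []);
    }
    byTarget.get(action.target).push(action);
  }
  maybeFree.clear(); // ephemeral: nothing is carried over to the next crank
  return byTarget;
}
```

The `maybeFree` set is cleared at the end because, per the design notes, comms does not spread GC work across cranks.
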
warner added a commit that referenced this issue Jun 20, 2021
When the comms vat resolves a promise, it begins a two-part retirement
process. Right after it sends the `resolve` message to the remote, it deletes
the outbound half of the c-list, because this promise ID is partially
retired, and the comms vat won't be referencing it in any outbound messages
again.

The inbound half is retained until the other side acknowledges the resolution
message (i.e. some inbound message arrives which demonstrates awareness of
the `notify`, by including an `ackSeqNum` at least as large as the `seqNum`
of the outbound `notify`).

This changes the code to retain both sides of the c-list entry until the ack
is received. I think this winds up being slightly cleaner, and most
importantly it retains our ability to map local-ref to remote-ref, which
enables an upcoming change to `deleteRemoteMapping` / `deleteKernelMapping`
to only take a single argument (lref, not lref+rref), which is a lot cleaner,
and avoids some error cases.

refs #3306
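
A rough sketch of the "retain both halves until the ack" idea, assuming a simplified c-list made of two plain maps. `makePromiseClist` and its methods are hypothetical names, and the real c-list does more (for example, flipping ref polarity between directions, which is ignored here).

```javascript
// Hypothetical sketch: keep both directions of a resolved promise's c-list
// entry until the peer acks the resolution. Names are illustrative only.
function makePromiseClist() {
  const toRemote = new Map(); // lref -> rref (outbound half)
  const fromRemote = new Map(); // rref -> lref (inbound half)
  const awaitingAck = []; // ordered [{ seqNum, lref }]

  // Because the outbound half is still present, lref alone is enough.
  function deleteRemoteMapping(lref) {
    const rref = toRemote.get(lref);
    toRemote.delete(lref);
    fromRemote.delete(rref);
  }

  return {
    add(lref, rref) {
      toRemote.set(lref, rref);
      fromRemote.set(rref, lref);
    },
    // Record the seqNum of the outbound message that carried the resolution.
    sentResolution(seqNum, lref) {
      awaitingAck.push({ seqNum, lref });
    },
    // Retire every promise whose resolution the peer has now seen.
    receivedAck(ackSeqNum) {
      while (awaitingAck.length > 0 && awaitingAck[0].seqNum <= ackSeqNum) {
        deleteRemoteMapping(awaitingAck.shift().lref);
      }
    },
    mapToRemote: lref => toRemote.get(lref),
    mapFromRemote: rref => fromRemote.get(rref),
  };
}
```

Both lookups keep working between `sentResolution()` and the matching `receivedAck()`, which is what lets the delete function take a single argument.
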