serializing resolved promises without allocating an external identity #2381

warner · 2021-02-10T05:28:39Z

What is the Problem Being Solved?

This isn't anything to work on in the near-term, but @erights and I brainstormed some ideas for marshal that I wanted to capture.

The big improvement we could make is to avoid allocating kernel-visible identifiers for the ancillary promises that appear in a serialized argument/resolution graph. These are the vpids which get created when a previously-resolved Promise is referenced by the arguments (since we've retired the vpid we previously used for that Promise). From the kernel's point of view, this promise ID is referenced by the argument graph, gets resolved to some data, and gets retired, all in a single syscall (or pair of syscalls). This creates an entry in the kernel promise table which ought to be short-lived, but requires a GC pass or immediate refcounting to get rid of.

When we serialize the arguments for a message send, or the value of a resolution, we're serializing an object graph like this:

So far, serialization only looks at the "normal edges" of this graph (colored black), which consist of properties looked up on each object. It stops when it encounters pass-by-reference items, such as Promises (triangles) and Remotables (circles).
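
A minimal sketch of that single-edge walk, for illustration only. It is not the marshal implementation: `serializeBasic`, the `remotables` set, the `slotFor` callback, and the example identifiers `o+1` and `p+5` are all hypothetical, and only the `{ body, slots }` capdata shape is taken from the discussion here.

```javascript
// Hypothetical sketch: walk only the "normal" property edges of a value and
// stop at pass-by-reference nodes (Promises and Remotables).
const QCLASS = '@qclass';

// Assumption: the caller tells us which objects are Remotables via a Set,
// and `slotFor` returns the identity string for a pass-by-reference object.
function serializeBasic(root, remotables, slotFor) {
  const slots = [];
  const slotIndex = new Map(); // pass-by-reference object -> index into slots

  function encode(val) {
    if (val instanceof Promise || remotables.has(val)) {
      // Exit node: record its identity, but do not look inside it.
      if (!slotIndex.has(val)) {
        slotIndex.set(val, slots.length);
        slots.push(slotFor(val));
      }
      return { [QCLASS]: 'slot', index: slotIndex.get(val) };
    }
    if (Array.isArray(val)) {
      return val.map(item => encode(item));
    }
    if (val !== null && typeof val === 'object') {
      const out = {};
      for (const [name, child] of Object.entries(val)) {
        out[name] = encode(child); // follow a normal (property) edge
      }
      return out;
    }
    return val; // primitives are copied as-is
  }

  // Note: this sketch does not handle cycles in the pass-by-copy data.
  return { body: JSON.stringify(encode(root)), slots };
}

// Usage: one Remotable stand-in and one unresolved Promise.
const alice = { name: 'alice' };
const later = new Promise(() => {}); // never settles
const ids = new Map([[alice, 'o+1'], [later, 'p+5']]);
const capdata = serializeBasic(
  { who: alice, later, n: 3 },
  new Set([alice]),
  val => ids.get(val),
);
```

The walk never inspects `alice` or `later`; both are reduced to slot references, which is why anything behind them (resolutions, auxdata) is invisible at this layer.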

marshal doesn't follow (or even know about) potential resolutions of those Promises, partly because JavaScript gives us no (synchronous) way to know that a Promise is currently resolved. And we haven't yet defined the "auxiliary data" which Remotables might reference (colored blue); that will happen in #2069. Unresolved Promises need an identity that the kernel knows about, because the message which resolves that promise will arrive in some future turn or crank, and it needs a way to indicate which promise is being resolved. Both unresolved Promises and Remotables need an identity because someone might send a message to them, and the kernel needs to know where to deliver those messages. However, the pass-by-copy data of the message arguments does not have an identity: our remote object model says that data is selfless.

When liveslots serializes the arguments of a message, it used to follow the selfless object graph up to the self-ish nodes and stop there. @FUDCo's work in #2358 changes this to also examine the resolved-promise edges, and to include a batch of ancillary promise resolutions along with the main message. If we think of the resolution of a promise as a second type of edge in the extended object graph, then the extended form of serialize() that liveslots builds out of the basic marshal follows both kinds of edges, but emits an aggregate result: the capdata of the selfless graph, plus a collection of promiseID -> capdata resolutions for all known-resolved Promises reachable from that combined graph. It does this two-level serialization up front, before emitting a syscall, because those promise identifiers will be retired the moment that syscall hits the kernel. If we waited any longer to serialize, cycles that traverse the promise-resolution edges would never converge: we'd keep retiring and forgetting the identifier we need to express the full cycle at once.
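
The two-level walk might be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #2358: `serializeWithResolutions`, the `knownResolutions` table (Promise -> resolved value), and the `allocateVpid` callback are hypothetical names, and Remotables are left out to keep the sketch short.

```javascript
// Hypothetical sketch: serialize the primary value, then keep serializing the
// resolutions of every known-resolved Promise reached along the way.
function serializeWithResolutions(root, knownResolutions, allocateVpid) {
  const vpids = new Map(); // Promise -> vpid assigned during this batch
  const pending = []; // resolved Promises whose resolution is not yet emitted

  function serializeOne(value) {
    const slots = [];
    const encode = val => {
      if (val instanceof Promise) {
        if (!vpids.has(val)) {
          vpids.set(val, allocateVpid());
          if (knownResolutions.has(val)) {
            pending.push(val); // follow the promise-resolution edge later
          }
        }
        const vpid = vpids.get(val);
        let index = slots.indexOf(vpid);
        if (index === -1) {
          index = slots.push(vpid) - 1;
        }
        return { '@qclass': 'slot', index };
      }
      if (Array.isArray(val)) {
        return val.map(item => encode(item));
      }
      if (val !== null && typeof val === 'object') {
        return Object.fromEntries(
          Object.entries(val).map(([name, child]) => [name, encode(child)]),
        );
      }
      return val;
    };
    return { body: JSON.stringify(encode(value)), slots };
  }

  const primary = serializeOne(root);
  const resolutions = []; // [vpid, capdata] pairs
  // Each Promise is queued at most once, so a cycle through resolution edges
  // terminates: the second visit reuses the vpid already in `vpids`.
  while (pending.length > 0) {
    const p = pending.shift();
    resolutions.push([vpids.get(p), serializeOne(knownResolutions.get(p))]);
  }
  return { primary, resolutions };
}

// Usage: two resolved Promises whose resolutions point at each other.
let resolveP1;
const p1 = new Promise(resolve => {
  resolveP1 = resolve;
});
const r2 = { next: p1 };
const p2 = Promise.resolve(r2);
const r1 = { next: p2 };
resolveP1(r1);

let counter = 0;
const result = serializeWithResolutions(
  { start: p1 },
  new Map([[p1, r1], [p2, r2]]),
  () => `p+${(counter += 1)}`,
);
```

Because the whole batch is built before anything is handed to the kernel, the identifier for `p1` is still available when the resolution of `p2` refers back to it. Note that every resolved Promise still costs one vpid here, which is the overhead this issue is about.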

When #2069 adds auxdata to each Remotable/Presence, that acts as a third type of edge, and we'll need to update liveslots again to follow these edges too. The result of serializing the message arguments (or primary promise resolution) will be three items: the capdata of the primary message, the batch of resolved promises (each with its own capdata), and the table of auxdata (mapping an objectID to its auxiliary capdata, iff we know the kernel does not yet know this auxdata).

Multi-Entry Serialization

Usually we think about serialization as taking a single starting point (the "entry") and walking an object graph until we've seen every node that isn't an "exit" (of which there may be several).

But now I'm thinking about the liveslots serialization process as emitting data which has many entry edges. There is the primary one (the args of a message send, or the resolutionData of a primary promise resolution), but we also discover all the Remotables that are reachable from that primary entry. Each one could be referenced in other messages, so each one needs an identity, distinct from the particular argument graph which first revealed it. Each unresolved promise needs an identity, as described above. So the serialized data is saying "here's how you'd reconstruct the argument graph, but by the way, I couldn't help but notice this new batch of identity-bearing Remotables and unresolved Promises that you should be aware of".

And, most notably, any resolved Promises do not need an identity. Since we use @dtribble's optimization of retiring promises as soon as they are resolved (he observed in Midori that most promises are only mentioned once), once liveslots knows that a Promise is resolved, it needs to remember the resolution (in a WeakMap) to break cycles, but it doesn't need to remember an identity for it. Resolved Promises might have resolution data which points to other resolved Promises, so we need a way to break those cycles, but that's scoped to the one object graph, just like two objects which point to each other. And we have an ibid mechanism to deal with those cycles.
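
Remembering resolutions without remembering identities could look something like the sketch below. The names `watchPromise` and `knownResolution` are hypothetical, and ignoring rejections is a simplification made only for this example.

```javascript
// Hypothetical sketch: remember the resolution of each Promise we encounter,
// since JavaScript offers no synchronous "is this resolved?" query.
const resolutionOf = new WeakMap(); // Promise -> { value }
const watched = new WeakSet(); // Promises that already have our .then attached

function watchPromise(p) {
  if (watched.has(p)) {
    return; // avoid attaching a second callback to the same Promise
  }
  watched.add(p);
  p.then(
    value => {
      // Box the value so "resolved to undefined" differs from "unknown".
      resolutionOf.set(p, { value });
    },
    () => {}, // rejections are ignored in this sketch
  );
}

function knownResolution(p) {
  // Returns undefined until the .then callback above has run.
  return resolutionOf.get(p);
}
```

The table is keyed by the Promise object itself, so no identifier is ever allocated, and the WeakMap lets an entry be collected once the Promise is unreachable. Even for an already-resolved Promise, `knownResolution` reports nothing until a later turn, which is the asynchronous limitation described earlier.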

Description of the Design

So the idea would be:

- build a super-duper serializer (maybe as a layer above marshal, maybe incorporated into marshal itself), using code that currently lives in liveslots
- this serializer is aware of the full three-types-of-edges object graph, including promise resolutions and auxdata
  - this requires a WeakMap that maps from each Promise object to its resolution data, and a .then on each Promise it encounters to update the table if/when it becomes resolved
- when asked to serialize an entry point, the super-duper serializer will emit the primary capdata, and a table of newly-exported Remotables and their auxiliary capdata. Both capdatas can mention object IDs (remotables or imported Presences) and unresolved promise IDs
- when any of these object graphs encounters a known-resolved promise, it gets serialized as a special QCLASS: 'resolved-promise', which contains the resolution data as an additional property, just like how an Object or Array contains more data. It reads the data from the WeakMap to know what to serialize. These nodes are given a local counter-based identifier (using the same ibid numberspace as other container-like objects), and if we encounter the same resolved Promise multiple times within a single object graph, the second and subsequent references are serialized as QCLASS: 'ibid', ibid: counter instead of re-serializing the contents. These ibid counters are scoped to the capdata, and do not incur entries in the c-lists or in the kernel tables.
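
One way the proposed encoding could be sketched is shown below. Everything here is an assumption made for illustration: the function name `serializeInline`, the `resolution` property name, the pre-order numbering of ibid counters, and the example identifier `p+7`. Remotables and auxdata are omitted, and `knownResolutions` can be any table with `has` and `get` (a Map here, a WeakMap in the design above).

```javascript
// Hypothetical sketch: inline known-resolved Promises into the capdata and use
// ibid counters, instead of allocating a kernel-visible vpid for each one.
const QCLASS = '@qclass';

function serializeInline(root, knownResolutions, slotFor) {
  const slots = [];
  const ibids = new Map(); // container or resolved Promise -> local counter

  function encode(val) {
    if (val === null || typeof val !== 'object') {
      return val; // primitives are copied as-is
    }
    if (ibids.has(val)) {
      // Second and later references point back at the first occurrence.
      return { [QCLASS]: 'ibid', ibid: ibids.get(val) };
    }
    if (val instanceof Promise && !knownResolutions.has(val)) {
      // Unresolved: still needs a kernel-visible identity in `slots`.
      const id = slotFor(val);
      let index = slots.indexOf(id);
      if (index === -1) {
        index = slots.push(id) - 1;
      }
      return { [QCLASS]: 'slot', index };
    }
    // Containers and resolved Promises share one counter numberspace.
    ibids.set(val, ibids.size);
    if (val instanceof Promise) {
      return {
        [QCLASS]: 'resolved-promise',
        resolution: encode(knownResolutions.get(val)),
      };
    }
    if (Array.isArray(val)) {
      return val.map(item => encode(item));
    }
    return Object.fromEntries(
      Object.entries(val).map(([name, child]) => [name, encode(child)]),
    );
  }

  return { body: JSON.stringify(encode(root)), slots };
}

// Usage: a cycle of two resolved Promises plus one unresolved Promise.
let resolveFirst;
const first = new Promise(resolve => {
  resolveFirst = resolve;
});
const secondData = { next: first };
const second = Promise.resolve(secondData);
const firstData = { next: second };
resolveFirst(firstData);
const unresolved = new Promise(() => {});

const inlined = serializeInline(
  { start: first, again: first, later: unresolved },
  new Map([[first, firstData], [second, secondData]]),
  () => 'p+7',
);
```

In this sketch only the unresolved Promise consumes a slot. The cycle between the two resolved Promises is closed by an ibid reference inside the body, so nothing about them would need to appear in the c-lists or kernel tables.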

warner added enhancement New feature or request marshal package: marshal labels Feb 10, 2021

Tartuffo added migrate-icebox and removed migrate-icebox labels Nov 17, 2022

erights self-assigned this Dec 24, 2022
