Large String device #46
Comments
Dean and I came up with this "blob device" last night, but now I have to think about whether it actually meets our goals. The hope here is that we can keep large strings (i.e. the source code being used to instantiate a dynamic Vat) out of the vat transcript, which is our current performance bottleneck (adding new vats isn't likely to happen fast enough to cause problems for a while, but at least in theory we'd like to handle very large chunks of code without incurring an eternal storage obligation for all of it). The idea was that the special static "vat creating vat" could use access to both the blob device and the vat-making device to do:
But the other (and more critical) goal is to maintain deterministic execution. I'm trying to come up with rules for our devices to maintain this property.

So far, none of our devices actually return anything when invoked. The Mailbox device (which is really the only one we use outside of tests) handles outgoing messages by adding them to a non-kernel-visible "outbox" table, and handles externally-provoked incoming messages by queueing a message to a receiver object. So none of the

So the rule might be "device call return values must be a pure function of the inputs", for which

But maybe we should just merge the vat-making-device with this blob-store, instead of trying to find the right set of restrictions on the blob-store to maintain our deterministic execution properties.
I proposed including a ByteArray of some kind. Note that the existing JS types for ByteArrays don't allow for an immutable one, so this would not appear as a full ByteArray.
I was thinking about this again today. @FUDCo 's recent #1331 vat-config changes make it possible to statically provide a bundle to the kernel (e.g. the ZCF bundle), and then vat-zoe can ask the kernel to create a new dynamic vat loaded with that bundle, using only a short reference name rather than including the whole bundle in a message (thus adding it to at least one transcript, maybe two if the dynamic vat creation mechanism involves an internal vat). However Zoe then immediately tells the new ZCF dynamic vat to load a specific contract bundle. These bundles aren't interned like the statically-defined ZCF bundle, so it doesn't help amortize costs of multiple instantiations of that particular contract. To improve that, of course, we need to be able to add new items at runtime, and reference them with short identifiers. I think we have three or four requirements:
If the blobs show up in the transcript, we're spending secondary storage for their contents (every time they're referenced), but at least they aren't held in RAM forever. If we can keep them out of the transcript too, then we reduce the secondary storage requirements to

I had been thinking of a blob device that operates with hashes of data (so successfully fetching

So now I'm thinking of the hierarchical identifiers (#455) and the way their data is managed, and how we might apply that here. Our plan for #455 is to restrict the mutable data store to just the one vat, so the liveslots layer can safely cache results without worrying about some other vat changing it while it isn't looking. But we could introduce a shared immutable store without that concern, as long as we ensure that vats can never reach for data that isn't there. And we can accomplish that by introducing a new kind of reference, the blobcap.

Blobcap

A "blobcap" is a c-list-managed reference to a chunk of immutable data, which the kernel references by hash, but which vats reference with a

The kernel tables currently map

The c-lists for each vat currently map

The vat (as a whole, i.e. liveslots) will somehow get one function that lets it submit data and allocate a new blobcap (

Liveslots would then translate the blobref into a Blob representative, in the same way it turns

The Blob representative could be held within the vat's heap for arbitrarily long periods of time before something decides it wants to use the data. Therefore we want the data to be retrieved as lazily as possible. This is the same question we have for #455: the "slowcap" approach would require the kernel to know, just before each delivery, what auxiliary data the vat will need, and provide it preemptively, but if the vat doesn't actually want to use the data right away, the vat would have to hold it in RAM for a long time, because the kernel would have no way of telling when the right moment would be to pass it back.
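The blobcap idea above (kernel keyed by hash, each vat holding only a c-list-local reference) can be sketched roughly as follows. This is a minimal illustration, not the SwingSet implementation: `KernelBlobStore`, its method names, the `blob-N` reference format, and the choice of SHA-256 are all assumptions made for the example.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical kernel-side store: immutable data keyed by SHA-256 hash (hex),
// plus one c-list per vat mapping vat-local blobrefs to hashes.
class KernelBlobStore {
  constructor() {
    this.blobs = new Map(); // hash -> data
    this.clists = new Map(); // vatID -> { toHash, toRef, next }
  }

  clistFor(vatID) {
    if (!this.clists.has(vatID)) {
      this.clists.set(vatID, { toHash: new Map(), toRef: new Map(), next: 1 });
    }
    return this.clists.get(vatID);
  }

  // Map a hash into a vat's c-list, allocating a vat-local ref on first use.
  exportToVat(vatID, hash) {
    if (!this.blobs.has(hash)) throw new Error(`unknown blob ${hash}`);
    const clist = this.clistFor(vatID);
    if (!clist.toRef.has(hash)) {
      const ref = `blob-${clist.next}`; // hypothetical ref format
      clist.next += 1;
      clist.toRef.set(hash, ref);
      clist.toHash.set(ref, hash);
    }
    return clist.toRef.get(hash);
  }

  // A vat submits data and receives a vat-local blobref.
  addBlob(vatID, data) {
    const hash = createHash('sha256').update(data).digest('hex');
    if (!this.blobs.has(hash)) this.blobs.set(hash, data);
    return this.exportToVat(vatID, hash);
  }

  // A vat can only read through refs present in its own c-list,
  // so it can never reach for data that isn't there.
  readBlob(vatID, ref) {
    const hash = this.clistFor(vatID).toHash.get(ref);
    if (hash === undefined) throw new Error(`${vatID} has no ${ref}`);
    return this.blobs.get(hash);
  }

  // Translate a ref when a blobcap is sent from one vat to another.
  transfer(fromVat, ref, toVat) {
    const hash = this.clistFor(fromVat).toHash.get(ref);
    if (hash === undefined) throw new Error(`${fromVat} has no ${ref}`);
    return this.exportToVat(toVat, hash);
  }
}

// Usage: vat-zoe registers a bundle, then passes the blobcap to vat-zcf.
const kernel = new KernelBlobStore();
const refInZoe = kernel.addBlob('vat-zoe', 'contract source');
const refInZcf = kernel.transfer('vat-zoe', refInZoe, 'vat-zcf');
console.log(kernel.readBlob('vat-zcf', refInZcf));
```

Because vats never see the hash, only their own c-list entry, a vat cannot name a blob it was never given, and the data itself is stored once no matter how many vats hold a reference.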
Async vs Sync

So the vat code needs a way to ask liveslots (through the Blob representative) for the data at the last moment, and liveslots (i.e. the vat as a whole) needs a way to ask the kernel or other secondary storage for the contents (syscall or

In #455 we decided that contract authors cannot safely work with the reentrancy and interleaving hazards presented by async access to large mutable data. But here, where the data is immutable, I think we could handle async access just fine. The real constraint will be determinism. A synchronous blob read can only resolve to one thing, at one time (immediately). An async read could resolve in the next turn, or twelve turns later. Some vat workers can support blocking syscalls, others only support async ones. We might reveal this to vats through different methods available on the Blob object (e.g.
There might be a way to merge this and #455, maybe. In #455 it's important for the vat to know that nobody else can change the data, but maybe that could be accomplished by having the vat keep the blobcap closely-held, whereas here you register the blobcap and then send it to someone. One is mutable, the other really wants to be immutable. #455 is about sending a blobcap to yourself in the future, this is about sending it to someone else. The underlying access method (syscall, If we only pay attention to this issue, not #455, then I can think of a few options:
Since the blobcap internally incorporates the hash of the data as the data key, I think deterministic replay of asynchronous reads should be reasonably easy. When the
Huh, interesting. So it goes into the transcript, but we deliver something slightly different than the transcript (data vs hash). I can see how that would help. I was thinking about the nondeterminism of reads which resolved without an additional delivery: vat code does

Making the read response show up as a future delivery would address that problem, but I think we'd need to introduce a new kind of delivery. Cool, I'll noodle on that some more. Thanks!
I don't think you have to do anything special at all on the read side -- it's just a message send somewhere that returns a promise. The promise resolution happens whenever the data is available. The only magic is how the data delivery is represented in the transcript.
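One way to picture the "hash in the transcript, data in the delivery" idea from the last few comments is the sketch below. It is only an illustration under assumptions: the `blobData` delivery type, the transcript entry shape, and the helper names are invented for the example and are not kernel APIs.

```javascript
const { createHash } = require('node:crypto');

const sha256 = data => createHash('sha256').update(data).digest('hex');

// Hypothetical content-addressed store shared by live execution and replay.
const blobStore = new Map();
function storeBlob(data) {
  const hash = sha256(data);
  blobStore.set(hash, data);
  return hash;
}

// Live execution: the vat receives the data, the transcript records only the hash.
function deliverBlobData(vat, transcript, hash) {
  const data = blobStore.get(hash);
  if (data === undefined) throw new Error(`missing blob ${hash}`);
  transcript.push({ type: 'blobData', hash }); // small, fixed-size entry
  vat.dispatch({ type: 'blobData', data });
}

// Replay: expand each recorded hash back into the same data before delivery.
function replay(vat, transcript) {
  for (const entry of transcript) {
    const data = blobStore.get(entry.hash);
    if (data === undefined || sha256(data) !== entry.hash) {
      throw new Error(`blob ${entry.hash} missing or corrupted`);
    }
    vat.dispatch({ type: entry.type, data });
  }
}

// A stand-in vat that just remembers what it was given.
const makeVat = () => {
  const seen = [];
  return { seen, dispatch: delivery => seen.push(delivery.data) };
};

const transcript = [];
const liveVat = makeVat();
const bundleHash = storeBlob('x'.repeat(10000));
deliverBlobData(liveVat, transcript, bundleHash);

const replayedVat = makeVat();
replay(replayedVat, transcript);
console.log(JSON.stringify(transcript).length, replayedVat.seen[0].length);
```

Since the read result arrives as an ordinary recorded delivery, the turn in which the vat sees the data is fixed by the transcript, while the transcript itself stays small regardless of blob size.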
Oh one other idea before I forget: we'll want an exogenous blobstore-add operation; a way for something outside the kernel to add a bundle to the store. The initial static vat loads could be replaced by a blobstore-add followed by an externally-introduced (
Provide a device that supports synchronous access to large strings, bytearrays, or hierarchies of same (for module tree), where the data is stored outside the heap of the Vat. These device capabilities allow large data values to be passed between vats by reference.
Use Cases
With this mechanism, dynamic vat creation can refer to the source to be launched as a large-string reference.
Operations
For string or byte array
length() - return the length of the string/bytearray
slice(begin, end) - make a shallow copy of a range of the string
hash() - cryptographic hash of the data
stream(begin, end) - is there a streaming abstraction to get the data in chunks?
Notes
These are analogous to attaching shared immutable pages to multiple domains in KeyKOS.
These can be supported in remoting: the Comms vat can check with the other side whether it already has a large string with the associated cryptographic hash. If so, it passes a reference to that. Otherwise, it passes the data and establishes such a thing.
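The operations listed above, together with the hash check described for remoting, could look roughly like this. It is a sketch under assumptions rather than a proposed API: `makeLargeString`, `encodeForPeer`, the `chunkSize` parameter, the use of SHA-256, and the choice to have `slice` return another handle are all made up for illustration.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical immutable large-string handle; the data lives in the closure,
// standing in for storage outside the vat's heap.
function makeLargeString(data) {
  return Object.freeze({
    length: () => data.length,
    // Returns a new handle over the requested range (an assumption).
    slice: (begin, end) => makeLargeString(data.slice(begin, end)),
    hash: () => createHash('sha256').update(data, 'utf8').digest('hex'),
    // Yields the range [begin, end) in chunks of at most chunkSize characters.
    *stream(begin = 0, end = data.length, chunkSize = 4) {
      for (let i = begin; i < end; i += chunkSize) {
        yield data.slice(i, Math.min(i + chunkSize, end));
      }
    },
    toString: () => data,
  });
}

// Comms-style dedupe: send the data only if the peer lacks that hash.
// peerHashes is a Set standing in for "asking the other side".
function encodeForPeer(blob, peerHashes) {
  const hash = blob.hash();
  if (peerHashes.has(hash)) return { hash };
  peerHashes.add(hash);
  return { hash, data: blob.toString() };
}

const src = makeLargeString('hello large string');
console.log(src.length()); // 18
console.log(src.slice(6, 11).toString()); // large
console.log([...src.stream(0, 10, 4)]); // [ 'hell', 'o la', 'rg' ]

const peerHashes = new Set();
const first = encodeForPeer(src, peerHashes); // carries the data
const second = encodeForPeer(src, peerHashes); // carries only the hash
console.log('data' in first, 'data' in second); // true false
```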