Type-driven serialization as an IR-to-IR pass #164

Closed
nomeata opened this issue Feb 11, 2019 · 7 comments

Comments

@nomeata
Collaborator

nomeata commented Feb 11, 2019

I want to note down some thoughts about serialization of data for messages.

So far, the backend implements dynamic serialization: the heap representation of our values is rich enough to traverse and serialize them generically. Each heap object has a tag; from the tag, the RTS knows how large the object is, which entries are pointers (which need to be followed), and which are references (which need to be put in an elembuf for transportation).
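
To make the dynamic approach concrete, here is a rough sketch in Python rather than in the backend's own code. The tag names, the `HeapObj` layout and the output encoding are all invented for illustration and do not reflect the actual RTS; the point is only that the traversal is driven by the tag found on each object, not by any static type.

```python
# Hypothetical sketch of tag-driven (dynamic) serialization.
# Tags, layout and encoding are assumptions, not the real RTS format.
from dataclasses import dataclass


@dataclass
class HeapObj:
    tag: str      # "int", "tuple" or "ref" in this toy model
    fields: list  # payload; its meaning depends on the tag


def serialize_dynamic(obj, data, elems):
    """Walk a heap object generically, guided only by its tag."""
    if obj.tag == "int":
        data.append(("int", obj.fields[0]))
    elif obj.tag == "tuple":
        data.append(("tuple", len(obj.fields)))
        for child in obj.fields:  # pointers: follow them
            serialize_dynamic(child, data, elems)
    elif obj.tag == "ref":
        # References cannot be copied as plain data: store the reference
        # in the elembuf stand-in and write its index into the data.
        elems.append(obj.fields[0])
        data.append(("ref", len(elems) - 1))
    else:
        raise ValueError(f"unknown tag {obj.tag!r}")


def demo_dynamic():
    data, elems = [], []
    value = HeapObj("tuple", [HeapObj("int", [7]), HeapObj("ref", ["actor-A"])])
    serialize_dynamic(value, data, elems)
    return data, elems
```

Note that nothing here consults the message's declared type, which is why every field present on the heap ends up on the wire.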

But at a meeting in Zurich, we decided that a serialization based on the concrete type signature of the message would be better suited. I don’t recall all the reasons anymore, but some are:

  • We don’t want to serialize record fields that have been “removed” by subtyping.
  • We want the deserialization code to safely trap if the data does not match the expected type.
  • We may want just a databuf (no elembuf) for messages that contain no references, and perhaps a plain int32 for a message whose type in AS is just Int32.

So we want type-driven serialization. So far so good.
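
As an illustration of the first two points, here is a minimal Python sketch of type-driven serialization. The type encoding (strings for primitives, `("record", {...})` for objects), the little-endian wire format and the `Trap` exception are assumptions made up for this example, not a proposed format.

```python
# Illustrative only: type encoding and wire format are assumptions.
class Trap(Exception):
    """Stands in for a trap on malformed input."""


def serialize(ty, value):
    if ty == "Int32":
        return value.to_bytes(4, "little", signed=True)
    if ty == "Text":
        raw = value.encode("utf-8")
        return len(raw).to_bytes(4, "little") + raw
    if isinstance(ty, tuple) and ty[0] == "record":
        fields = ty[1]
        # Only fields named in the type are written; any extra fields
        # on the value (hidden by subtyping) are silently dropped.
        return b"".join(serialize(fields[n], value[n]) for n in sorted(fields))
    raise TypeError(f"unsupported type {ty!r}")


def _take(buf, n):
    if len(buf) < n:
        raise Trap("truncated input")
    return buf[:n], buf[n:]


def _decode(ty, buf):
    if ty == "Int32":
        raw, rest = _take(buf, 4)
        return int.from_bytes(raw, "little", signed=True), rest
    if ty == "Text":
        raw, rest = _take(buf, 4)
        body, rest = _take(rest, int.from_bytes(raw, "little"))
        try:
            return body.decode("utf-8"), rest
        except UnicodeDecodeError:
            raise Trap("invalid UTF-8")
    if isinstance(ty, tuple) and ty[0] == "record":
        out = {}
        for name in sorted(ty[1]):
            out[name], buf = _decode(ty[1][name], buf)
        return out, buf
    raise TypeError(f"unsupported type {ty!r}")


def deserialize(ty, buf):
    value, rest = _decode(ty, buf)
    if rest:
        raise Trap("trailing bytes")
    return value
```

Serializing `{"id": 5, "name": "x", "secret": "hidden"}` at the type `("record", {"id": "Int32", "name": "Text"})` writes only `id` and `name`, and feeding a truncated buffer to `deserialize` raises `Trap` instead of producing a bogus value.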

I propose that we implement that as an IR-to-IR pass, separate from the current backend. Reasons include

  • the backend is already very large,
  • it is easier to cross-check the code, thanks to Check_ir and Interpret_ir,
  • it is maybe also easier to test serialization and deserialization (including deserializing bad data!) if we can invoke both transformations from (possibly privileged) AS, without actually having to send messages,
  • we do some more dogfooding of our own language and its type system
  • we will identify missing library functionality that we want to provide to our users (e.g. to deal with binary data)
  • easier to spread the work among more developers
  • in a way, the async translation is already a first step of the “translate AS types in messages to DFINITY types” journey; this just follows that precedent.

Steps to do that might be:

  1. Introduce AS types that closely mirror the DFINITY types, in particular:

    • a databuf, which is a reference, the externalized form of [Word8], together with conversions to bytes that wrap data.internalize and data.externalize.
    • an elembuf, which is the externalized form of [Reference], together with conversions to [Reference] that wrap elem.internalize and elem.externalize.
    • an abstract type Reference that is a supertype of shared func …, actor …, databuf and elembuf. Also, we need downcast operations, which likely can only be unchecked.

    We might choose to not expose these types (or not all of them) to source ActorScript. They could all just be PrimT, and the operations PrimE, for convenience.

  2. (optional) better unboxed array types

    Using [Word8] and [Reference] is not great, as it involves an additional copying step when externalizing/internalizing them. Also, [Word8] is horribly inefficient in terms of memory representation. So, as an extension, we might want to introduce

    • a type bytes that is a packed array of bytes (quite like Text, but it seems prudent to keep the Text type, with its guarantee of utf8-encoded text, separate).
    • a separate PackedReferenceArray that has a packed heap representation that matches what elem.internalize expects, to avoid lots of additional copying.
    • (pipe dream) come up with a good idea for how to make unboxed array types more generic and pleasant to work with (without a zoo of duplicated types and operations)
  3. Create an IR-to-IR pass that

    • updates the types of all actor messages to the image of the type translation (just like the async translation does already).

    • at each method call with an argument of type t, inject a call to serialize_t, and similarly to deserialize_t at the header of a message.

    • Functions serialize_t and deserialize_t are added to the top-level of the module. They are likely mutually recursive (due to recursive types).

      Hopefully we can generate combinators for the various type constructors to avoid too much repetition. But at least for objects, I think we can’t do much better than having a separate combinator for each set of field names that occur (we can be polymorphic in each concrete field, but not in the set of fields).

    • These combinators likely need some nice additional abstractions or data types for efficiently filling a growing buffer, tracking references etc.

  4. Simplify the backend, as now all message sends simply pass references (or maybe primitive number types) as message arguments.
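
To illustrate step 3, here is a rough Python sketch of what per-type-constructor combinators over a growing buffer could look like. Everything in it is hypothetical: `Writer` stands in for a databuf/elembuf pair, the combinator names are invented, and the encoding (one tag byte for options, a 4-byte index for references) is an assumption. Only the serializing direction is shown; deserializers would be built dually.

```python
# Hypothetical combinators; names and encoding are assumptions.
class Writer:
    """A growing buffer: bytes go to data, references go to elems."""

    def __init__(self):
        self.data = bytearray()  # stands in for a databuf
        self.elems = []          # stands in for an elembuf


def ser_ref(w, v):
    # A reference is written as its index in the elems table.
    w.data += len(w.elems).to_bytes(4, "little")
    w.elems.append(v)


def ser_opt(inner):
    def go(w, v):
        if v is None:
            w.data.append(0)
        else:
            w.data.append(1)
            inner(w, v)
    return go


def ser_pair(first, second):
    def go(w, v):
        first(w, v[0])
        second(w, v[1])
    return go


def ser_lazy(thunk):
    # Delays the lookup so that recursive types can refer to themselves.
    return lambda w, v: thunk()(w, v)


# serialize_t for the recursive type  List = ?(Reference, List)
ser_ref_list = ser_opt(ser_pair(ser_ref, ser_lazy(lambda: ser_ref_list)))
```

Calling `ser_ref_list(w, ("actorA", ("actorB", None)))` on a fresh `Writer` fills `w.data` with the structure and `w.elems` with the two references, which is the split between databuf and elembuf described above. A separate combinator per set of field names would still be needed for objects, as noted in the list.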

The architecture of this anticipates what we would have to do once we have the IDL defined; hopefully only the concrete wire format needs to be adjusted, so I believe it is reasonable to start this work already now.

@rossberg
Contributor

All sounds good to me. Go! :)

For the record, the most important reason for basing serialisation on the type signature is to decouple serialisation from internal representation, in order to be able to evolve the latter without breaking all existing actors.

@nomeata
Collaborator Author

nomeata commented Feb 11, 2019

> decouple serialisation from internal representation,

Good point. That raises the question of whether we will have abstract types that define their own serialization interface. For example, the standard library probably wants to provide a map<K,V> type constructor. Will that library be able to define its serialization format somehow?

@nomeata
Collaborator Author

nomeata commented Feb 11, 2019

As warm-up and preparation, I will create a magic function show : <A>(A -> Text) that prints a textual representation of a value at type A. This way I don’t have to deal with elembuf and databuf yet, but can build the infrastructure for such type-driven code. And show might be useful for debugging later.
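
A small Python sketch of the idea, for illustration only: the function takes the type as an explicit first argument, and both the type encoding and the output syntax are assumptions rather than what the magic `show` would actually print. Text is quoted without escaping, to keep the sketch short.

```python
# Hypothetical type-driven show; output syntax is an assumption.
def show(ty, value):
    if ty == "Int":
        return str(value)
    if ty == "Bool":
        return "true" if value else "false"
    if ty == "Text":
        return '"' + value + '"'  # no escaping in this sketch
    kind, arg = ty
    if kind == "opt":
        return "null" if value is None else "?" + show(arg, value)
    if kind == "array":
        return "[" + ", ".join(show(arg, v) for v in value) + "]"
    if kind == "record":
        # Only fields listed in the type are printed.
        parts = [f"{n} = {show(arg[n], value[n])}" for n in sorted(arg)]
        return "{" + "; ".join(parts) + "}"
    raise TypeError(f"unsupported type {ty!r}")
```

The traversal has the same shape as a type-driven serializer, just with text as the output, so the infrastructure for generating one function per type can be exercised without touching elembuf or databuf.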

@crusso
Contributor

crusso commented Feb 11, 2019

All sounds good.

I wonder if we could build the serialization methods in a Shared type somehow, perhaps by deeming that any class that implements Serializable is Shared.

@crusso
Contributor

crusso commented Feb 11, 2019

For show, I guess you'll have to restrict yourself to its static instantiation, bailing on the serialization of open types?

@nomeata
Collaborator Author

nomeata commented Feb 11, 2019

> For show, I guess you'll have to restrict yourself to its static instantiation, bailing on the serialization of open types?

Yes, same as for serialization (at least for now).

@nomeata
Collaborator Author

nomeata commented Feb 14, 2019

Now tracked at https://dfinity.atlassian.net/browse/AST-11

@nomeata nomeata closed this as completed Feb 14, 2019