
Closures - MVP #66

Merged
merged 27 commits into master from closures
Jan 28, 2018

Conversation


@ballercat ballercat commented Jan 13, 2018

Early work to enable parsing and emitting of closures (#65). The goal of this POC is to figure out the practical challenges in enabling this type of functionality and to lay the groundwork for more "side_module" features of this kind to come.

To create a closure, arrow function syntax is used. This fits really well with existing JS syntax and creates a clear separation in logic:

// Toy example
export function getClosure(): i32 {
  let x: i32 = 0;
  return () : i32 => {
     const y: i32 = x;
     x += 1;
     return x;
  }
}

Limitations of POC

  • Multiple closures will not be supported (yet)
  • Object (memory) closures are post-POC
  • The side module will not be very clever and will most likely never clean up its own memory space.
  • The side closure module cannot be overwritten

High-Level Implementation Details

The basic idea behind the implementation is to not dynamically manage memory inside the main module, even with closures. This means that there will be no implicit runtime memory regions, maintaining the design direction of no surprises. A closure does need an object to represent the environment it's closing over; however, this will be accomplished with a side module that has its own memory object, completely independent of the main module. The two will be linked with a shared Table.
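To illustrate the intended wiring, here is a minimal, hedged JS sketch. The names mainBuffer and closureBuffer are placeholders for the compiled main and side modules, and the import names are assumptions rather than the actual ABI:

  // Sketch only: two instances, each with its own Memory, sharing one Table.
  const table = new WebAssembly.Table({ initial: 10, element: 'anyfunc' });

  Promise.all([
    // Side module: owns the memory that backs closure environments.
    WebAssembly.instantiate(closureBuffer, { env: { table } }),
    // Main module: shares only the table, never the side module's memory.
    WebAssembly.instantiate(mainBuffer, { env: { table } }),
  ]).then(([side, main]) => {
    // Calls through the shared table can now cross the module boundary.
    console.log(Object.keys(main.instance.exports));
  });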

Closure representation inside the binary

Declaration:

The closure is compiled to a regular old function declaration, with the additional side effect of its function index being encoded into the Element section (in the binary). Because function pointers are already supported, the compiler does this for free. The closure value will be an expression returning a 64-bit value. The original function declaration AST nodes will be moved to top-level module scope, with a 32-bit base memory pointer argument __ptr__ prepended to the argument list.

For example

(x: i32) : i32 => { ... };

becomes a top level function definition in the binary

function <parentFunctionName>--closure#(__ptr__: i32, x: i32) { ... }

The -- is used so that the name cannot be accidentally overwritten by the end-user. With this approach we only ever create a single function for every closure definition, and many potential __ptr__ environments at runtime.

Call-Site:

Closure call sites will compile to an indirect_call, although a regular call operation may be possible (unlikely) since we can statically analyze which real function is being called. Either option will work, but both require some creative encoding: the closure has to be encoded as a 64-bit value, with the low 32 bits being a table pointer (or function index) and the high 32 bits being the base environment pointer. Since wasm-MVP only supports a 32-bit address space in both tables and memory, this should work out of the box! This is a pretty neat way to get around the fact that WASM functions cannot return multiple values. The only downside is that a closure pointer cannot be directly exported to JS, so a wrapper function would need to be used.

The call sites will incur an additional instruction cost: shifting bits to call the correct function index with the correct memory offset, plus additional 32-bit truncations for the indirect_call.
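To make the encoding concrete, a small JS sketch of how such a 64-bit closure value could be packed and unpacked. BigInt stands in for the 64-bit shifts and 32-bit truncations the generated code would perform, and the helper names are made up for illustration:

  // Low 32 bits: table/function index. High 32 bits: environment base pointer.
  const packClosure = (funcIndex, basePtr) =>
    (BigInt(basePtr) << 32n) | BigInt(funcIndex);

  const unpackClosure = closure => ({
    funcIndex: Number(closure & 0xffffffffn), // fed to the indirect_call
    basePtr: Number(closure >> 32n),          // passed along as __ptr__
  });

  // Example: function index 3, environment stored at base offset 16.
  console.log(unpackClosure(packClosure(3, 16))); // { funcIndex: 3, basePtr: 16 }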

Closure body encoding:
Special closure helper functions, <type>closure[GS]et and makeClosure, will be implemented and will likely be emulated via JS for the first POC. Every closed-over variable access inside the closure will become a function call, which will provide the values stored inside a completely different memory space.

So a closure which looks like this:

  function getClosure(): ClosureType {
    let x: i32 = 0;
    return (z: i32): i32 => {
      const y: i32 = x;
      x += 1;
      return y;
    };
  }

becomes the following AST (subject to change)

    FunctionArguments FUNCTION_ARGUMENTS
      Pair :
        Identifier<i32> __ptr__ local/index(0)
        Type i32
      Identifier z
      Type<i32> i32
    FunctionResult<i32> i32
    Block {
      Declaration<i32> y local/index(1),type/const(true)
        FunctionCall<i32> i32closureGet__ function/index(1)
          BinaryExpression<i32> +
            Constant<i32> 0
            Identifier<i32> __ptr__ local/index(0)
      FunctionCall i32closureSet__ function/index(2)
        BinaryExpression<i32> +
          Constant<i32> 0
          Identifier<i32> __ptr__ local/index(0)
        BinaryExpression<i32> +
          FunctionCall<i32> i32closureGet__ function/index(1)
            BinaryExpression<i32> +
              Constant<i32> 0
              Identifier<i32> __ptr__ local/index(0)
          Constant<i32> 1
      ReturnStatement return
        Identifier<i32> y local/index(1),type/const(true)

It should be clear what is happening here: every use of x, the closed-over variable, becomes a lookup call, where the __ptr__ argument is used as an offset into the object store inside the side-module memory space. The Constant<i32> 0s are offsets from the base pointer. Tada! Closure 🎊
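For reference, a hedged JS sketch of what the emulated helpers might look like in the first POC. Only the names makeClosure, i32closureGet__, and i32closureSet__ come from the AST and task list; the bump allocator, the Int32Array backing store, and the exact signatures are assumptions:

  // A plain typed array standing in for the side module's separate memory.
  const heap = new Int32Array(1024);
  let nextOffset = 0;

  const closureImports = {
    // makeClosure(size): reserve `size` bytes and return the base pointer.
    makeClosure: size => {
      const base = nextOffset;
      nextOffset += size;
      return base;
    },
    // i32closureGet__(ptr): read a closed-over i32 at byte offset ptr.
    i32closureGet__: ptr => heap[ptr >> 2],
    // i32closureSet__(ptr, value): write a closed-over i32 at byte offset ptr.
    i32closureSet__: (ptr, value) => {
      heap[ptr >> 2] = value;
    },
  };

Supplying something like this on the imports object is also what should make it trivial for an end-user to swap in their own closure manager, as discussed further down the thread.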

Tasks

  • Arrow syntax parser
  • Closure parser
  • Closure hoisting
  • Implicit imports of <type>closure[GS]et callbacks and makeClosure(size)
  • Closure plugin
  • Closure plugin ABI (Element section encoding)
  • Explicit syntax for a closure Type
  • Binary encoding of 64-bit closure expressions
  • Call site implementation
  • Add example

@ballercat ballercat changed the title from [WIP]: Closures to [WIP, POC]: Closures on Jan 13, 2018

xtuc commented Jan 15, 2018

That sounds pretty solid!

This is a pretty neat way to get around the fact that WASM functions cannot return multiple values

I recall seeing someone emulating multiple return values (I believe it was a Forth to WASM compiler).

Closure call sites will compile to an indirect_call

Do you know if there is an additional call cost compared to a regular call? I guess it has to search/fetch by type.

The side module will not be very clever and will most likely never clean up its own memory space

I know it's just a POC, but this will be problematic in the future. I don't know how your memory management is done, but instead of being separate (and 👍 isolated) it could be part of your "memory chunk pool"?

side_module

What is meant by side_module exactly? Do you mean a WebAssembly module instance?

@ballercat
Owner Author

Hey @xtuc,

Thanks for the feedback! I wonder how Forth does it (probably by writing to memory).

Do you know if there is an additional call cost compared to a regular call? I guess it has to search/fetch by type.

indirect_call is almost guaranteed to incur a penalty because of the runtime type check. I'm pretty sure it's possible to optimize it into a regular call in a lot of cases, but that's not something I'm super concerned with at the moment.

RE: memory & side_module

Yes, I am talking about a separate module instance. The main module and the extension would share a table with two split memory spaces.

it could be part of your "memory chunk pool"?

I don't love the pattern of an implicit, sectioned-off chunk of memory just for the runtime. I know this is a pattern used by other languages compiled to WASM, but it does not fit the basic goals of the project. I'd like to stay as unopinionated as possible about the management of memory in the module.

I don't know how your memory management is done

It isn't 🙃 the user is in charge of the memory, just like with raw WASM. We only provide better syntax to do so.

All that being said, this was all theorycrafting, and the PR, once ready, should be interesting. I think POC-ing something like this will give me more data on how to move forward. I'm pretty close now, but there's no working wasm code just yet; I think I might emulate extension module calls via JS imports for the very first POC.

The potential benefit of using an extension mechanism, instead of building the functionality into a monolithic main module, is that it would be possible for the end-user to overwrite the behavior completely. The ABI is going to be dead simple, so it would be trivial for someone to swap in their own shared memory chunk pool closure manager instead.


xtuc commented Jan 15, 2018

It isn't 🙃 the user is in charge of the memory, just like with raw WASM. We only provide better syntax to do so.

Yes I know, sorry that wasn't clear (and not even English). I meant whether you need to allocate/free a module during runtime. It might not be a problem atm since you're not going to free it in this POC.

I thought the WASM format only allowed one module (itself)? This is interesting because the spec tests (in WAST this time) use sub-modules all the time.

The table you mentioned is a WebAssembly.Table? You would pass it from JS to multiple modules, right?

Edit: According to import { table: Table } from 'env'; yes. This is an incredibly clear way of doing an importObject lookup!

@@ -49,3 +49,23 @@ test("functions", t => {
t.is(exports.testFunctionPointers(), 4, "plain function pointers");
});
});

test.only("closures", t => {

@xtuc xtuc Jan 15, 2018


Just a reminder, you have an only here and a few skips later.

@ballercat
Owner Author

Hey, no worries. Your English is great!

I thought the WASM format only allowed one module (itself)? This is interesting because the spec tests (in WAST this time) use sub-modules all the time.

Yes, and the specs use a superset of the text format which supports this. I always thought this was confusing because it means .wast !== .wat, but yeah.

The table you mentioned is a WebAssembly.Table? You would pass it from JS to multiple modules, right?

Precisely! While the spec says you may only have a single module compiled into a binary, it has no limitations on multiple modules sharing a Table, at least none that I've seen, and the documentation on MDN seems to support the idea that this is possible. Tentatively, I'm planning on using JavaScript to pass the table around and using WebAssembly.Table.set to basically dynamically link the two together.
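A rough sketch of that tentative plan, assuming sideInstance and mainInstance are the two already-instantiated modules and someClosureFn is an export of the side module (all placeholder names):

  // Sketch only: JS creates the table, both modules import it via `env`,
  // and JS patches a side-module export into a known slot.
  const table = new WebAssembly.Table({ initial: 2, element: 'anyfunc' });

  // ...instantiate sideInstance and mainInstance with { env: { table } }...

  table.set(0, sideInstance.exports.someClosureFn);
  // The main module can now call indirectly through slot 0 even though the
  // function lives in the side module and touches only the side memory.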

@ballercat
Owner Author

Update:

The closure implementation is now working via a plugin system for closure handlers/callbacks. An interesting fact I discovered about the Chrome/V8 wasm compiler: passing WebAssembly functions as imports to another module kicks off type validation when the two modules are linked. This is not the case when the imported functions are JS wrappers, hinting that V8 handles wasm-to-wasm imports/exports differently. I have a feeling this means that imports bound in this manner don't suffer from the WASM -> JS call penalty, so I used this approach instead of the indirect_call approach through a shared table.

A consequence of this validation is that tests for type imports/exports can be verified by V8 itself:

const importWASM = getIR(imports);
const sourceWASM = getIR(source);
return WebAssembly.instantiate(importWASM.buffer()).then(deps => {
  return WebAssembly.instantiate(sourceWASM.buffer(), {
    env: { ...deps.instance.exports },
  });
});
Pretty neat.

So the closure plugin will work through the imports object and can easily be overwritten by the end-user.

@ballercat ballercat changed the title from [WIP, POC]: Closures to Closures - MVP on Jan 27, 2018

coveralls commented Jan 27, 2018


Coverage increased (+0.2%) to 95.867% when pulling 2a9d388 on closures into 7a703ff on master.

@ballercat ballercat merged commit 5465d72 into master Jan 28, 2018