[DISCUSS][POC] Emscripten JS Runtime as WASI-like Library Provider #11075

Closed
tqchen opened this issue May 4, 2020 · 12 comments

@tqchen

tqchen commented May 4, 2020

Emcc now works nicely for producing standalone wasm code. In many cases, however, that wasm code calls into system libraries, and we cannot run it unless a libc and related functionality are provided.

WASI is one direction for solving that problem. However, the level of library support in existing WASI variants still falls behind emscripten's. Emscripten's library also has the advantage of being portable to both node and browser environments.

Recently we started looking into building our js interface directly on top of the standard WASM js API. We started to wonder whether it is possible to interact with the wasm module generated by emscripten directly via the standard wasm js interface. The answer is positive (with some hacking).

Library Provider Interface

export interface LibraryProvider {
  /** The imports that can be passed to WebAssembly instance creation. */
  imports: Record<string, any>;
  /**
   * Callback function to notify the provider the created instance.
   * @param inst The created instance.
   */
  start: (inst: WebAssembly.Instance) => void;
}

In particular, we ended up with the above interface (in typescript). To use it, we can run

const lib: LibraryProvider = new SomeLibraryProvider();
let inst = new WebAssembly.Instance(new WebAssembly.Module(wasmSource), lib.imports);
lib.start(inst);

The above interface is deliberately aligned with node's WASI interface. Note that imports is more general than node.WASI's wasiImport, as it can contain more import fields as well as provider-defined memory and other resources.
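
To make the interface concrete, here is a minimal sketch of a hand-written provider in plain JavaScript. Everything in it is an assumption made for illustration: RecordingProvider is a hypothetical class, and the wasm bytes are a tiny hand-assembled module (not emcc output) that imports env.log and exports run.

```javascript
// Hand-assembled wasm module (an assumption for this demo, not emcc output):
// it imports env.log(i32) and exports run(), which calls env.log(42).
const wasmSource = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01, // one defined function of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// Hypothetical provider with the { imports, start } shape described above.
class RecordingProvider {
  constructor() {
    this.logged = [];
    this.instance = null;
    // The import object handed to WebAssembly instance creation.
    this.imports = { env: { log: (value) => this.logged.push(value) } };
  }

  // Called by the user once the instance exists.
  start(inst) {
    this.instance = inst;
  }
}

const lib = new RecordingProvider();
const inst = new WebAssembly.Instance(new WebAssembly.Module(wasmSource), lib.imports);
lib.start(inst);
inst.exports.run();
console.log(lib.logged); // logs [ 42 ]
```

The caller owns instantiation; the provider only supplies imports and is told about the instance afterwards.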

Library Provider via Emscripten

Specifically, in the context of Emscripten, for a given mylib.cc, we want to generate two files via emcc.

  • mylib.wasm The wasm module
  • mylib.wasi.js The necessary library provider for mylib.

The desired usage example is as follows:

const fs = require("fs");
const EmccWASI = require("/path/to/mylib.wasi.js");
const wasmSource = fs.readFileSync("/path/mylib.wasm");

const wasi = new EmccWASI();
let inst = new WebAssembly.Instance(new WebAssembly.Module(wasmSource), wasi.imports);
wasi.start(inst);

Steps to Generate the Library Provider

After some hacking and investigation, we found that it is possible to generate the library provider API through emscripten's pre-js feature.

We first define the following preload js, which allows us to defer emscripten's instantiation. We then wrap emscripten's successCallback as the start function.

// preload.js
var __wasmLib = {};

function __wasmLibInstantiateWasm(imports, successCallback) {
    __wasmLib.imports = imports;
    __wasmLib.successCallback = successCallback;
}

function __wasmLibStart(wasmInstance) {
    __wasmLib.successCallback(wasmInstance);
}

__wasmLib.start = __wasmLibStart;

var Module = {
    "instantiateWasm": __wasmLibInstantiateWasm,
    "wasmLibraryProvider": __wasmLib
};
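
The deferral can be illustrated without emscripten at all. The sketch below repeats the shape of preload.js and drives it with fakeRuntimeInit, a hypothetical stand-in for the part of the emscripten runtime that calls Module.instantiateWasm(imports, successCallback). The import names and the empty wasm module are also assumptions for the demo.

```javascript
// Same shape as preload.js: capture the imports and the callback, create nothing yet.
const wasmLib = {};
const Module = {
  instantiateWasm(imports, successCallback) {
    wasmLib.imports = imports;
    wasmLib.successCallback = successCallback;
  },
  wasmLibraryProvider: wasmLib,
};
wasmLib.start = (wasmInstance) => wasmLib.successCallback(wasmInstance);

// Hypothetical stand-in for the emscripten runtime: it builds its import
// object and hands it to the hook together with a "receive instance" callback.
const runtimeState = { exports: null };
function fakeRuntimeInit(moduleConfig) {
  const imports = { env: {}, wasi_snapshot_preview1: {} };
  moduleConfig.instantiateWasm(imports, (instance) => {
    runtimeState.exports = instance.exports;
  });
}

fakeRuntimeInit(Module);
// At this point the hook has run, but no wasm instance exists yet.
const exportsBeforeStart = runtimeState.exports; // still null

// The user creates the instance (an empty module here) and notifies the provider.
const emptyModule = new WebAssembly.Module(
  new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])
);
const instance = new WebAssembly.Instance(emptyModule, wasmLib.imports);
wasmLib.start(instance);
console.log(runtimeState.exports === instance.exports);
```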

To make sure that we can repeatedly create multiple new instances of the WASI-like module via new EmccWASI(), we run an additional step to decorate the generated js file. The idea is to create a class that captures all the Module creation code and exposes the imports and start function.

# decorate_as_wasi.py
import sys

template_head = """
function EmccWASI() {
"""

template_tail = """
    this.Module = Module;
    this.start = Module.wasmLibraryProvider.start;
    this.imports = Module.wasmLibraryProvider.imports;
    this.wasiImport = this.imports["wasi_snapshot_preview1"];
}

if (typeof module !== "undefined" && module.exports) {
  module.exports = EmccWASI;
}
"""

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage <file-in> <file-out>")
        sys.exit(1)
    # Wrap the emcc-generated js in the EmccWASI constructor.
    with open(sys.argv[1]) as fi:
        result = template_head + fi.read() + template_tail
    with open(sys.argv[2], "w") as fo:
        fo.write(result)
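
As a rough picture of what the decorated file looks like, the sketch below wraps a stand-in for the generated code in a constructor, followed by the same tail as template_tail. EmccWASISketch and the inline "runtime" call are hypothetical. The sketch assumes the hook is invoked synchronously while the wrapped code runs, which is what the template's immediate read of imports relies on.

```javascript
function EmccWASISketch() {
  // --- stand-in for the emcc-generated code pasted between head and tail ---
  var wasmLib = {};
  var Module = {
    instantiateWasm: function (imports, successCallback) {
      wasmLib.imports = imports;
      wasmLib.successCallback = successCallback;
    },
    wasmLibraryProvider: wasmLib,
  };
  wasmLib.start = function (wasmInstance) {
    wasmLib.successCallback(wasmInstance);
  };
  // Assumed behaviour: the runtime calls the hook during construction.
  Module.instantiateWasm({ env: {}, wasi_snapshot_preview1: {} }, function () {});

  // --- same lines as template_tail ---
  this.Module = Module;
  this.start = Module.wasmLibraryProvider.start;
  this.imports = Module.wasmLibraryProvider.imports;
  this.wasiImport = this.imports["wasi_snapshot_preview1"];
}

// Because every `var` lives inside the constructor, each instance gets its own state.
const a = new EmccWASISketch();
const b = new EmccWASISketch();
a.imports.env.marker = "only on a";
console.log(a.Module !== b.Module, b.imports.env.marker);
```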

Putting everything together, we can invoke the following commands to generate the library provider module.

# invoke emcc with the special preload.js
emcc -o mylib.js -s STANDALONE_WASM=1 mylib.cc --pre-js preload.js
# decorate the generated js into a wasi-like interface.
python decorate_as_wasi.py mylib.js mylib.wasi.js

For a reference to the complete project using this approach, see https://github.com/tqchen/tvm/tree/web/web

Discussions

The main reason I want to open this issue is to get some feedback and discussion about this approach. In particular, it would be great to discuss whether there are better ways to design the LibraryProvider.

Obviously the way we achieve the library provider here is quite hacky. It would also be great to hear the developers' view of this approach, i.e. whether it is possible to have direct support in emcc itself, and what the best interface should look like.

Thanks @sbc100 for the initial discussions

@sbc100
Collaborator

sbc100 commented May 4, 2020

Thanks @tqchen ... we have been having internal discussions about this kind of refactoring of the JS code for a while now.

Even before WASI, and even independently of WASI, we think it could be useful to ship the emscripten JS harness code as a standalone thing, either as some kind of universal runner for any emscripten-generated wasm file, or at least to avoid duplicating the JS harness code on a page with multiple modules.

Note that as of today the emscripten-generated JS code is very much tailored to a specific application, so there is work to be done to allow it to be tailored instead to multiple binaries, or indeed to any binary (a universal harness).

I think this approach you have laid out is an interesting start.

A couple of questions:

  1. Have you looked at the MODULARIZE option? How does it compare to what you are doing in decorate_as_wasi.py?

  2. Why is it interesting to you to move the actual wasm creation out of the JS harness? i.e. why not leave the new WebAssembly.Instance inside the emscripten harness code? Is there a particular reason, or is it just a general desire for separation of concerns?

@tqchen
Author

tqchen commented May 4, 2020

Thanks @sbc100. I think the main difference between MODULARIZE and EmccWASI is the interface they expose (wasm creation inside vs outside), so it relates to your question 2.

There are two choices:

  • C0: Isolate the Library and give control to the user (detailed in this POC)
  • C1: Let the emscripten runtime create the wasm instance.

The main advantages of C0, in the context of our project, are:

It offers more separation of concerns and allows mixing and matching of library providers. For example, we can maintain the same API style when the user switches among library providers; going from EmccWASI to nodejs's WASI() would only be a one-line change.

// Async mode
// use WASI from nodejs
 tvmjs.instantiate(wasmSource, new WASI()).then(...);
// use library provider generated by emscripten
 tvmjs.instantiate(wasmSource, new EmccWASI()).then(...);
// Sync mode
const inst = new tvmjs.Instance(new WebAssembly.Module(wasmSource), new EmccWASI());
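
A frontend that accepts either kind of provider could be sketched roughly as follows. This is not the tvmjs implementation: instantiateSync is a hypothetical helper, the two providers are stand-ins (a real node WASI object has further requirements on the module before start can be called), and the wasm bytes are a hand-assembled module that exports answer().

```javascript
// Hypothetical helper: accepts a { imports, start } provider, or a node-style
// { wasiImport, start } object, and drives instantiation itself.
function instantiateSync(wasmSource, provider) {
  const imports =
    provider.imports !== undefined
      ? provider.imports
      : { wasi_snapshot_preview1: provider.wasiImport };
  const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmSource), imports);
  provider.start(instance);
  return instance;
}

// Hand-assembled module (demo assumption): no imports, exports answer() -> 42.
const answerWasm = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type: () -> i32
  0x03, 0x02, 0x01, 0x00, // one function of type 0
  0x07, 0x0a, 0x01, 0x06, 0x61, 0x6e, 0x73, 0x77, 0x65, 0x72, 0x00, 0x00, // export "answer"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // body: i32.const 42
]);

// Stand-in providers that only record whether start was called.
const emccStyle = { imports: { env: {} }, started: false, start() { this.started = true; } };
const nodeStyle = { wasiImport: {}, started: false, start() { this.started = true; } };

// Swapping providers is a one-argument change for the caller.
const viaEmccStyle = instantiateSync(answerWasm, emccStyle).exports.answer();
const viaNodeStyle = instantiateSync(answerWasm, nodeStyle).exports.answer();
console.log(viaEmccStyle, viaNodeStyle); // logs 42 42
```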

We want to be able to run further project-dependent function customization on the imports before we call start (see https://github.com/tqchen/tvm/blob/web/web/src/environment.ts#L127). These functions depend on our own runtime instance objects to maintain data structures. While it is certainly possible to use a callback into the emscripten Module to do that, it is a bit more convoluted, especially when we have a context object.

Finally, we need to create an RPC server that communicates with python via WebSocket. We will compile wasm modules and upload them to the RPC server, which then initializes a wasm instance and serves benchmark requests coming from the python side. We will need the separated API to do so.
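
One way to picture this kind of customization is the sketch below. It is not the code at the linked environment.ts: RuntimeContext, extendImports and the host_register import are hypothetical names, and the provider is a plain object standing in for EmccWASI.

```javascript
// Hypothetical per-runtime context that owns project data structures.
class RuntimeContext {
  constructor() {
    this.handles = [];
  }
  register(value) {
    this.handles.push(value);
    return this.handles.length - 1; // index used as an opaque handle
  }
}

// Merge project-specific functions into the provider's env imports
// without mutating the provider's own import object.
function extendImports(baseImports, extraEnv) {
  return { ...baseImports, env: { ...(baseImports.env || {}), ...extraEnv } };
}

// Stand-in for a provider such as EmccWASI.
const provider = {
  imports: { env: { abort: () => {} }, wasi_snapshot_preview1: {} },
  start: () => {},
};

const ctx = new RuntimeContext();
const customImports = extendImports(provider.imports, {
  // The closure keeps the context object, so no callback into Module is needed.
  host_register: (value) => ctx.register(value),
});

// The customized imports would be passed to WebAssembly.Instance before start is called.
const handle = customImports.env.host_register(7);
console.log(handle, ctx.handles);
```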

From a design point of view, I now think there are three clearly separated concepts:

  • library: WASI, EmccWASI provide the "backend library support" to the wasm.
  • wasm: The wasm itself.
  • ffi(frontend): The frontend that interacts with the wasm.

It would be great if each of the components started to converge on common choices, so that interfacing becomes easier. In the context of JS, wasm itself is a standard, and the ffi is standardized as the wasm JS API.

The library seems to be the only missing piece, and it is somewhat coupled with the ffi, i.e. there is no "standard WASI JS api" or "standard wasm JS library API". Having something like a LibraryProvider/WASI style might provide a step toward that standardization.

Per-project support versus universal support is a separate concern. Right now we actually love the fact that emscripten generates a minimal per-project library, as long as that library comes with a standard interface.

@sbc100
Collaborator

sbc100 commented May 4, 2020

I think your arguments for taking the C0 approach are fairly compelling. It lines up nicely with the approach that node's WASI took.

@kripken
Member

kripken commented May 4, 2020

Very interesting @tqchen !

Would it be fair to summarize the key idea here as: Node.js has a WASI API now, usable as in the example there,

const fs = require('fs');
const { WASI } = require('wasi');
const wasi = new WASI({
  args: process.argv
});
const importObject = { wasi_snapshot_preview1: wasi.wasiImport };
(async () => {
  const wasm = await WebAssembly.compile(fs.readFileSync('./binary.wasm'));
  const instance = await WebAssembly.instantiate(wasm, importObject);
  wasi.start(instance);
})();

And the idea here is to allow emcc to emit JS + wasm that can replace the WASI in that example, but otherwise be a drop-in replacement for it?

And under the hood, emcc is emitting not just the wasm (which would be needed in both cases) but also a JS file which supports the wasm, supplying WASI and other APIs to it as needed?

(Apologies if I've misunderstood something!)

@tqchen
Author

tqchen commented May 5, 2020

@kripken yes, exactly. Basically emcc generates two files: binary.wasm and binary.wasi.js

To rephrase your example, we could use binary.wasi.js as follows:

const fs = require('fs');
const EmccWASI = require('./binary.wasi.js');
const wasi = new EmccWASI();
const importObject = { wasi_snapshot_preview1: wasi.wasiImport };
(async () => {
  const wasm = await WebAssembly.compile(fs.readFileSync('./binary.wasm'));
  const instance = await WebAssembly.instantiate(wasm, importObject);
  wasi.start(instance);
})();

Of course, one great advantage of emcc is that it provides more than the wasiImport in the wasi API, and can contain other libraries that are not yet defined by wasi (perhaps the term wasi is confusing here, as it can provide more standard libraries than wasi).

So in our approach the LibraryProvider (WASI-like) API is changed from { wasiImport, start } to { imports, start }, where imports can contain env and other fields.

So the API we described above is used as follows:

// The same code would work in the browser
// by making EmccWASI available by default.
const fs = require('fs');
const EmccWASI = require('./binary.wasi.js');
const wasi = new EmccWASI();
const importObject = wasi.imports;
(async () => {
  const wasm = await WebAssembly.compile(fs.readFileSync('./binary.wasm'));
  const instance = await WebAssembly.instantiate(wasm, importObject);
  wasi.start(instance);
})();

As a matter of fact, I made both the imports and wasiImport fields available so that either can be used. But I guess imports is more general, in case there are other imports that are not under wasi_snapshot_preview1.

It would be great to hear your thoughts on what such a standard library provider interface should look like and how it should evolve in the future.

@kripken
Member

kripken commented May 5, 2020

Makes sense, thanks!

And yeah, the implementation described above (using the instantiateWasm hook) sounds good. I think it could be simplified to use MODULARIZE, as @sbc100 mentioned. It would still use your proposed external API, but internally using MODULARIZE would be simpler. I think it might be doable by just appending some JS before and after, without any emcc changes.

But regardless it might be interesting to think about adding an emcc flag for this - we try to avoid new flags in general, but given Node.js is providing such an API, it will likely become common and familiar for people. OTOH perhaps that API is still changing as it's experimental?

I don't have strong feelings about the API exposing all the imports or also exposing just the wasi ones separately. But it seems like the user could always get the wasi ones from the full ones by looking at imports.wasi_snapshot_previewX?

@tqchen
Author

tqchen commented May 5, 2020

The node wasi API still seems to be experimental. But my guess is that start + imports is overall quite a good choice, and that node's API will stay that way, or at least keep that style and only change wasiImport to something else.

Exposing all the imports sounds good. There is no cost to doing both, by returning an object with { start, imports, wasiImport }; it depends on the taste of sticking to one API versus being as compatible as possible.
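
A sketch of how the combined shape could be derived from a provider that only exposes imports, following the observation that the wasi functions live under a wasi_snapshot_previewX key. toCompatibleProvider is a hypothetical helper, and the prefix match is an assumption about how such keys are named.

```javascript
// Hypothetical helper: expose { start, imports, wasiImport } from { start, imports }.
function toCompatibleProvider(provider) {
  // Assumption: the wasi namespace is the import key starting with this prefix.
  const wasiKey = Object.keys(provider.imports).find((key) =>
    key.startsWith("wasi_snapshot_preview")
  );
  return {
    start: (inst) => provider.start(inst),
    imports: provider.imports,
    wasiImport: wasiKey === undefined ? undefined : provider.imports[wasiKey],
  };
}

// Stand-in provider with one wasi function and one env function.
const fdWrite = () => 0;
const base = {
  imports: { env: { abort: () => {} }, wasi_snapshot_preview1: { fd_write: fdWrite } },
  startedWith: null,
  start(inst) {
    this.startedWith = inst;
  },
};

const compat = toCompatibleProvider(base);
compat.start("fake-instance"); // any value works for this stand-in
console.log(Object.keys(compat), compat.wasiImport.fd_write === fdWrite);
```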

@sbc100
Collaborator

sbc100 commented May 5, 2020

I think perhaps with the STANDALONE_WASM flag we could generate code with this layout? That would avoid the need for another flag.

@kripken
Member

kripken commented May 7, 2020

Interesting @sbc100 ... yeah, that seems like it might fit. In standalone mode indeed the idea is to not really have any JS, and just have wasm. So providing an API like this one where there is really just the wasm, pretty close to node's WASI API which is like that, sounds appealing.

One issue, though, is that we do want the code to also run itself - that is, if you run emcc and get JS + wasm, and run the JS in node, it should run the program, like it does now. But maybe for libraries and not executables it would be a natural fit. Or, maybe when MODULARIZE is used (but that option is currently being reworked).

While there's a lot to think about here, I think experimentation doesn't need to wait for all those things.

@tqchen
Author

tqchen commented May 7, 2020

One potential idea is to emit two js files: binary.js (same as the original emcc output, which can be used to run the program) and binary.wasi.js (the library part).

@sbc100
Collaborator

sbc100 commented May 7, 2020

Would the running code be basically fixed? Could we have a simple tools/wasi-runner.js and run the output of emscripten via that?

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Jun 2, 2021
@stale stale bot closed this as completed Jul 9, 2021

3 participants