Define a start function. #398

Closed
ghost opened this issue Oct 10, 2015 · 35 comments
@ghost

ghost commented Oct 10, 2015

While wasm can export functions, a use case might be running a stand-alone wasm application and if so then it seems appropriate to define a cold-start function. This could be used in self contained wasm code that does not depend on wrapper user code to launch it. What if the wrapper code itself were wasm code - this needs an entry point. What do people think, should a main cold-start entry point be defined in the spec, or is this out of scope?

@jcbeyler

I think it makes sense for two reasons: first of all, as you say in a non web-based world, it makes sense to have such a method to be able to go there.

Second, if we have that, it can help perform bigger tests than what the assert system allows (especially since the asserts are now considered only to use constants :)).

However, if this is out of scope of the MVP or not is a different question. In my mind, I think you have defined it well:

  • In the case of a web based runtime, the main function would be ignored
  • In the case of a non-web based runtime, the main method would be called

You would have to add a bit more information: since there can be multiple modules, it seems important to state that multiple main methods would be semantically wrong, so that at most one point of entry is defined.

@lukewagner
Member

This is actually something we've talked about a few times, including on the Web. ES6 modules already have a main-like notion: after the whole ES6 module graph has been fetched, the top-level script (code inside the module not inside any function) is executed; so the top-level code is basically main. Since wasm modules are designed to integrate with ES6 modules (module instances can be nodes in the module dependency graph), it would be symmetric to have wasm modules run a main function at the same logical point where an ES6 module would run top-level code.

We could leave the exact timing of the execution of main up to the host; the invariant would just be that main executed after Eval.init and before any Eval.invoke. This would let main serve as a form of constructor for the instance, establishing invariants, etc, which I think is useful.

@jcbeyler

I have then so many questions because I'm very interested in this issue.

Does this then belong in the design document?
Could we scope it out and get it in the MVP?
How/when should it be defined it can be called?
Could/Should a wasm module have a main that is ignored if not in a non-web system?

@titzer

titzer commented Oct 12, 2015

The idea of performance hints seems promising. Maybe we could generalize to
a set of perf-function optimization levels such as RUN_ONCE, COLD, UNKNOWN,
HOT, and leave the usage of the hints up to engines? E.g. I foresee the
RUN_ONCE and possibly even COLD category triggering an interpreter, UNKNOWN
triggers dynamic optimizations, and HOT triggers immediate max optimization.


@lukewagner
Member

@titzer I think "cold start" has thrown you off ;-)
@jcbeyler I added an item on my design-repo todo list to attempt to add this to the MVP (if no one does first).

How/when should it be defined it can be called?

I think we'd want to leave it open so that the host environment can call it whenever, just so long as main is called before calling the first (non-main) export. We could mandate this in ml-proto with bookkeeping and runtime checks.
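
A minimal sketch of the kind of bookkeeping and runtime check described above, written in TypeScript rather than OCaml. `GuardedInstance`, `start`, and `invoke` are hypothetical names chosen for illustration; they are not taken from ml-proto.

```typescript
// Hypothetical model of a module instance: a start function plus named exports.
type Func = (...args: number[]) => number | void;

class GuardedInstance {
  private started = false;
  private readonly startFn: (() => void) | undefined;
  private readonly exportsTable: Map<string, Func>;

  constructor(startFn: (() => void) | undefined, exportsTable: Map<string, Func>) {
    this.startFn = startFn;
    this.exportsTable = exportsTable;
  }

  // The host may call this at any time after initialization, but only once.
  start(): void {
    if (this.started) throw new Error("start function already ran");
    this.started = true;
    if (this.startFn !== undefined) this.startFn();
  }

  // Runtime check: no export may run before the start function has run.
  invoke(name: string, ...args: number[]): number | void {
    if (!this.started) throw new Error(`export "${name}" invoked before start`);
    const fn = this.exportsTable.get(name);
    if (fn === undefined) throw new Error(`unknown export "${name}"`);
    return fn(...args);
  }
}

// Usage: the start function establishes an invariant that an export relies on.
let base = 0;
const guarded = new GuardedInstance(
  () => { base = 10; },
  new Map<string, Func>([["addBase", (x: number) => x + base]]),
);
```

Calling `guarded.invoke("addBase", 1)` before `guarded.start()` throws in this sketch; after `start()` it sees the initialized `base`.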

On a random note, @sunfishcode pointed out that main isn't a good name: a lot of user code runs before main; main takes argc/argv; likely symbol collision. Probably we'd not want to bless a magic name but instead have some new optional attribute on export declarations.

@kg
Contributor

kg commented Oct 12, 2015

#119 contains some previous discussion on the platform-specific bits of this issue. Worth thinking carefully about which stuff should be in the wasm module's main and which stuff should run in the host (native or JS or otherwise).

I think anything browser-specific shouldn't be specced to live in the module's function table (a web-only main definition should be something the application author decides to do), but it might make sense to be able to attach platform-specific goo like IDL or startup JS to modules in a way that each runtime knows how to deal with.

@jcbeyler

@lukewagner : if I wanted to add this to the MVP, how would I go about it and am I welcome to try? :)

I like the export declaration attribute to do this in a "transparent" manner.

@lukewagner
Member

@kg We're talking about something not browser-specific here.

@jcbeyler You are indeed most welcome. First step is to add to the design docs, probably Modules.md (intro section) and Modules.md#integration-with-es6-modules to mention this extension. Thinking a bit more about the specifics, I would probably decouple the start function from exports and instead say that modules have (in their header) an optional "start function" identified by index into the main function table (thus, no specific name imposed). This way, your start function isn't necessarily exposed for arbitrary calls by the outside world (which exports are). Then, if you're feeling brave, you could add it to spec/ml-proto along with some tests :)
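
To make the decoupling concrete, here is a rough TypeScript model of a module whose header carries an optional start index separate from its exports. The `ModuleDef` shape and the helper names are assumptions made for this sketch, not the actual AST or binary format.

```typescript
// Hypothetical, simplified module shape; field names are illustrative only.
interface ModuleDef {
  functions: Array<() => void>;     // the module's function table
  exports: Record<string, number>;  // export name -> function index
  start?: number;                   // optional: at most one start function, by index
}

// Validation: the start index, if present, must point into the function table.
function validateStart(m: ModuleDef): boolean {
  return m.start === undefined ||
    (Number.isInteger(m.start) && m.start >= 0 && m.start < m.functions.length);
}

// Only exported indices are reachable by name from the outside world.
function isCallableFromOutside(m: ModuleDef, index: number): boolean {
  return Object.keys(m.exports).some((name) => m.exports[name] === index);
}

const trace: string[] = [];
const exampleModule: ModuleDef = {
  functions: [() => { trace.push("init"); }, () => { trace.push("work"); }],
  exports: { work: 1 },
  start: 0, // function 0 is the start function but is not exported
};
```

In this model function 0 runs at startup yet has no exported name, so outside code cannot call it arbitrarily.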

@ghost
Author

ghost commented Oct 13, 2015

Seems to be no need to distinguish between a cold and warm start etc as the wasm code can easily note the state once started so the same function could be used for a cold and warm start.

@jfbastien
Member

I'm not sure I understand this proposal. Could you clarify the sequence of events leading to initialization of a program (what the VM does, which developer code is called).

It sounds similar to _start, but different too. How is it different from how Linux usually does things?

How does it interact with dynamic linking?

@lukewagner
Member

Could you clarify the sequence of events leading to initialization of a program (what the VM does, which developer code is called).

If a module declares a start function in its module header, that start function would be executed after the instance is initialized and before the host calls any exports (if any). A Unix-style shell program could use _start as its start function and have no other exports. On the web, a wasm module's start function would be called where the ES6 top-level code would be executed had that module been an ES6 module.

It sounds similar to _start, but different too. How is it different from how Linux usually does things?

This function would not be supplied any arguments nor have any return value. If a host environment wanted to provide argv/argc, it would expose this through some host builtin module imports such that the wasm start function could then make a proper main call. Another difference is that wasm would allow the host to call other functions after the start function if the host wanted.
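
A small sketch of that arrangement in TypeScript, assuming a host builtin module that exposes arguments through two imports. `HostArgsImports`, `argCount`, `argAt`, and `makeStart` are invented names for illustration; no host defines them.

```typescript
// Hypothetical host builtin module; the names argCount/argAt are illustrative.
interface HostArgsImports {
  argCount(): number;
  argAt(i: number): string;
}

// A conventional entry point that wants its arguments.
function main(argv: string[]): number {
  return argv.length; // placeholder body: report the number of arguments
}

// Builds a start function: no parameters, no return value.
// It pulls arguments from the host imports and then makes a proper main call.
function makeStart(host: HostArgsImports, onExit: (code: number) => void): () => void {
  return () => {
    const argv: string[] = [];
    for (let i = 0; i < host.argCount(); i++) argv.push(host.argAt(i));
    onExit(main(argv));
  };
}

// Usage with a fake host that supplies two arguments.
const fakeArgs = ["prog", "-v"];
let exitCode = -1;
const startWithArgs = makeStart(
  { argCount: () => fakeArgs.length, argAt: (i) => fakeArgs[i] ?? "" },
  (code) => { exitCode = code; },
);
startWithArgs();
```

The start function itself stays argument-free; everything host-specific flows through the imports.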

How does it interact with dynamic linking?

dylibs could have start functions too that would be called when you'd expect (at load-time or before returning from dlopen). I haven't studied them carefully, but this all looks quite symmetric to the behavior of __attribute__((constructor))/DT_INIT with dynamic linking.

@jfbastien
Member

OK, so it sounds like there are two things worth borrowing from UNIX-y systems:

  • _start
  • .init_array (the C++ equivalent is redundant with the C one).

Would both be sufficient to cover what's proposed?

@ghost changed the title from "Define a cold start function." to "Define a start function." on Oct 13, 2015
@lukewagner
Member

@jfbastien Yes, that is what we're talking about. It seems worth discussing whether we actually need a separate start function and array of functions that run before start; it seems simpler to just give the module a single start function and let it do the rest.

@jfbastien
Member

Agreed that the moral equivalent of _start is all that's needed. Even with dynamic linking we can invoke it and let library code take care of calling .init_array as it wishes.

FWIW Windows does interesting stuff with DllMain: it has attach and detach events for threads as well as process.

@lukewagner
Member

Yeah, it's definitely worth considering having a "thread start function" when we have threads.

@jcbeyler

OK, sounds good to me, and I agree that _start captures the idea of what we want. I like that the name is not exposed and that the start function is instead defined as an attribute of the module. I'll start on this later this week most likely :)

@jfbastien
Member

Would you define a list of attributes, or just the one? That's probably something @dschuff should participate in, since it has to do with dynamic linking.

@jcbeyler

I was going to define one attribute for the module. We do have to think about ordering: since the file can contain multiple modules, it seems you would want the order in which each module's _start runs to be deterministic.

@lukewagner
Member

@jfbastien I was thinking this would just be another module top-level declaration (sibling to (export ...), (memory ...), etc). Perhaps (start $func) with Ast.module_ containing a new start : var option field (var is the current type alias for an index with source info).

@jcbeyler The .wast file format consumed by ml-proto should be considered a host-specific detail, not part of the wasm spec (that's why parser.mly and script.ml are in host/). From the wasm spec's pov, there is only module instance (you can see this in the interfaces of spec/eval.mli). For ml-proto, I think the right thing to do is to run the start function (via a new Eval.start function) right after Eval.init (which happens when the (module ...) statement is (sequentially) executed).

@jfbastien
Member

I'm not sure I like having a top-level statement in the module, versus having a function attribute.

@lukewagner
Member

If you're just talking about TextFormat.md, then sure, I'm not talking about that above. But in the AST/s-expr format, I think a top-level statement is subjectively better because:

  • symmetric with current definition of exports (top-level (export "foo" $foo))
  • making it a var option field in AST makes it clear that there is at most 1, which is one less thing to check
  • in the binary encoding, I think we'll want all interesting annotation-like info up front, before the function defs since maximum info up front aids streaming compilation. While binary encoding doesn't need to mirror AST structure (the mapping from AST to binary could hoist things out of func defs to build the up-front tables), I like having symmetry where feasible since it reduces the conceptual gap.

@jfbastien
Member

OK I see your point: (export "foo" $foo) refers to $foo which can have attributes if we add them, so we need something akin to export. sgtm then :)

@kg
Contributor

kg commented Oct 14, 2015

making it a var option field in AST makes it clear that there is at most 1, which is one less thing to check

Is it wise to limit it to 1? Wouldn't that prevent static linking of libraries that each have their own _start? I guess the static linker could just glue the function bodies together... Shouldn't matter when we have dynamic linking.

@lukewagner
Member

We discussed this briefly above. Agreed static linker should be able to do the gluing.
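
As a rough illustration of the gluing, assuming each statically linked library contributes at most one start function and that link order decides execution order (the thread does not fix an order, so that part is an assumption). `glueStarts` is a hypothetical helper, not part of any real linker.

```typescript
// Hypothetical linker step: each library contributes at most one start function.
type StartFn = () => void;

// Glue the per-library start functions into a single one, run in link order.
function glueStarts(starts: Array<StartFn | undefined>): StartFn | undefined {
  const present = starts.filter((s): s is StartFn => s !== undefined);
  if (present.length === 0) return undefined; // linked module has no start function
  return () => { for (const s of present) s(); };
}

// Usage: two libraries with start functions and one without.
const order: string[] = [];
const linkedStart = glueStarts([
  () => { order.push("libA"); },
  undefined,
  () => { order.push("libB"); },
]);
if (linkedStart !== undefined) linkedStart();
```

The linked module still has at most one start function, so the "at most 1" property of the header field is preserved.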

@dschuff added this to the MVP milestone on Oct 22, 2015
@jcbeyler

Started this in the design first:
https://github.com/jcbeyler/design/commit/c78ac628b88fda1e37f2ca9fb967c87d34bab1a7

I've started getting "unrusty" with OCaml but would rather get the details of the design handled to ensure I'm doing the right thing in the spec/ml-proto :)

So criticize and comment away, I've kept it simple for now and letting you tell me what you would like to see: more details, etc.

@lukewagner
Member

@jcbeyler Sorry for the delay. I think that looks great! Could you create a PR in this repo?

@jcbeyler

jcbeyler commented Dec 8, 2015

And now it's me that apologizes. It seems I totally missed this conversation moving forward.

Here is the PR #495

@sunfishcode
Member

With #495 merged, this is fixed.

@dschuff
Member

dschuff commented Apr 1, 2016

Copy/pasted from #495

We've started some discussion about how this should integrate with emscripten in emscripten-core/emscripten#4218

It's not really clear from the current text in the design repo how it should work with the web embedding. e.g. when you instantiate the module, is the VM expected to call the start function automatically (if so, when? before Wasm.instantiateModule returns?), or should that be done by the enclosing JS code?

I think it should be put under the control of the JS code. For example (as a strawman) the instance object returned by Wasm.instantiateModule could have a way to determine which function is the start function, and then the JS would have to call that function before calling anything else on the instance. Or just have a method with the same name on every instance that is called by the JS, which itself calls the start function. Or whatever. The point is that it would allow the JS to instantiate the module, then modify the linear memory and do whatever JS wrapper stuff it would need to, then call the wasm start function (which would presumably then call its own global initializer functions).

@ghost
Author

ghost commented Apr 1, 2016

@dschuff What I had in mind was supporting code that could run both in a web browser and stand-alone, so that the start function would be called in a manner that works in both contexts, independent of any scripting language in the embedder. If the embedder is doing a lot of preparation work then this would not fit this model; rather, the preparation is intended to be done by the wasm instance when starting.

@dschuff
Member

dschuff commented Apr 1, 2016

You could specify it in a way that would work for both use cases. Actually you'd want to support standalone applications in both web and non-web environments, plus "library"-type applications (where the expected use case would be that the wasm module has a bunch of exports and JS just calls what it wants). e.g. you could say that compliant implementations must ensure that the start function is called before any other export. For standalone non-web implementations, that could still have multiple interpretations; e.g. an AOT compiler could compile the whole thing to a linux executable with the start function being called by some main() wrapper function. Or if you had an interpreter shell, it would call the start function itself. In the browser the VM could instantiate the module without calling the start function itself, but throw an error if the JS code tried to call an export before calling that.

@ghost
Author

ghost commented Apr 2, 2016

@dschuff I don't see why the app should be controlled differently when embedded in a browser versus being embedded elsewhere? So long as both environments support the required APIs then the app would be expected to run. If the wasm app is designed to be able to bootstrap itself from the start function then why shouldn't it do so when run in a browser too?

I think the higher-level design of wasm is going in the wrong direction: it is too web-developer-centric and depends on a web scripting embedder even for download and pre-assembly. Rather, there should be a wasm mode of operation that does not depend on a web scripting embedder but still works in web browsers.

What are the challenges that block the wasm code bootstrapping itself from the start function?

@dschuff
Member

dschuff commented Apr 4, 2016

Allowing the instantiation (i.e. allocation, validation, possibly compilation) to be separated from actually running the module's code has advantages even if the module requires no special outside initialization. For example if the compilation is expected to take a long time, it could use an asynchronous API to avoid blocking the main JS thread. (Furthermore, allowing allocation to be split from validation and compilation would allow re-use of compiled module code for more than one instance of the same module).

I totally agree that an app should just run if the environment supports all of the APIs it needs. If the app is designed to bootstrap itself, then the environment (be it web, node, or non-JS) would just instantiate it and then call the start function. Arguably the wording in the design doc allows this already, and it is outside the scope of the spec. I'm just advocating that we take it into account when designing the web embedding, so that we can achieve both goals: things that can work everywhere do work everywhere, and things that need or benefit from different handling in the web environment can work also.

As to your question, @kripken can say more, but in the emscripten case, emscripten needs to modify the application's linear memory (for example, to preload file contents into memory) before the application's code starts. If there is just one call that allocates the linear memory, compiles, and runs the module, then that will not be possible. But as I mentioned in my first paragraph I still think there are benefits to allowing it to be split, even independently of that.
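
The emscripten-style sequence (instantiate, preload linear memory, then start) might look roughly like this. The `Instance` shape and `instantiateWithoutStart` are hypothetical and only model the ordering; they are not an actual engine or emscripten API.

```typescript
// Hypothetical two-step embedding API; none of these names come from a real engine.
interface Instance {
  memory: Uint8Array;   // the instance's linear memory
  start(): void;        // runs the module's start function
}

// Step 1: allocate memory and prepare the instance, but run no module code yet.
function instantiateWithoutStart(
  memoryBytes: number,
  startBody: (memory: Uint8Array) => void,
): Instance {
  const memory = new Uint8Array(memoryBytes);
  return { memory, start: () => startBody(memory) };
}

// Step 2 (wrapper code): preload data into linear memory, then start.
let seenByStart = 0;
const inst = instantiateWithoutStart(16, (mem) => { seenByStart = mem[0] ?? 0; });
inst.memory.set([42, 43], 0);   // e.g. preloaded file contents
inst.start();                   // the start function observes the preloaded bytes
```

If allocation, compilation, and running were a single call, the `memory.set` step would have no place to go, which is the problem described above.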

@kripken
Member

kripken commented Apr 4, 2016

The spec says this about the start function:

If the module has a start node defined, the function it refers should be called by the loader after the instance is initialized and before the exported functions are called.

Which suggests it can be called lazily, some time after initialization, but before anything else is called. This is fine except that

  1. It's observable, which seems very strange. The start method might call out to print, for example, and it might print at different times on different browsers? E.g. if 20 frames pass until an export is called, it could be anywhere in between? This seems bad.
  2. The laziness is limited by when exported functions are called, which looks like an attempt at keeping things deterministic, however, if we also take into account memory that was exported, then this isn't good enough. We would have to add "before exported memory is modified", which seems onerous.

My suggestion is that the start function execute immediately after initialization, i.e., synchronously. In practice, that means emscripten wouldn't use a start method, it would just export what it needs for initialization, and then it can call it at the right time, which might be later e.g. if the application preloads files or has other async dependencies, as mentioned above by @dschuff.
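
The observability concern in point 1 can be reproduced with a toy host model. `runHost` and its two policies are invented for this sketch; the point is only that the same module produces differently ordered output depending on when the host chooses to run start.

```typescript
// Hypothetical host model comparing eager and lazy start; names are illustrative.
function runHost(policy: "eager" | "lazy"): string[] {
  const output: string[] = [];
  const start = () => { output.push("start prints"); };
  const exportedFn = () => { output.push("export runs"); };

  // Instantiation: an eager host runs the start function right away.
  if (policy === "eager") start();

  // Some frames pass before the page calls an export.
  output.push("frame 1");
  output.push("frame 2");

  // A lazy host defers start until just before the first export call.
  if (policy === "lazy") start();
  exportedFn();
  return output;
}

const eagerOutput = runHost("eager");
const lazyOutput = runHost("lazy");
```

Both policies satisfy "start before any export", yet `eagerOutput` and `lazyOutput` differ in where "start prints" lands relative to the frames.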

@lukewagner
Member

We haven't implemented the start function yet in SM, but my expectation was that it would simply be called at the end of Wasm.instantiateModule. When we iterate the JS API and split this one function into two, compile (with sync and async variants) and instantiate, then it seems natural for the start function to be called synchronously by the latter (the timing of which is fully under the control of the user). With ES6 module integration, the start function would naturally be called when the loader spec says to execute the top-level script of the ES6 module (after all modules in the import graph have been fetched, imports/exports resolved, called async in topological order).
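
A sketch of what such a split could look like, with the start function run synchronously by `instantiate`. The functions `compile`, `compileAsync`, and `instantiate` below are plain TypeScript stand-ins with assumed shapes, not the JS API that was eventually specified.

```typescript
// Hypothetical shape of a split API; these are plain functions, not engine entry points.
interface CompiledModule {
  start?: () => void;
  exports: Record<string, () => number>;
}

// compile: validation and code generation would happen here; no module code runs.
function compile(source: CompiledModule): CompiledModule {
  return source;
}

// An async variant could let the caller avoid blocking while compilation runs.
function compileAsync(source: CompiledModule): Promise<CompiledModule> {
  return Promise.resolve(compile(source));
}

// instantiate: runs the start function synchronously, then hands back the exports.
function instantiate(m: CompiledModule): Record<string, () => number> {
  if (m.start !== undefined) m.start();
  return m.exports;
}

let counter = 0;
const compiled = compile({
  start: () => { counter = 100; },
  exports: { next: () => ++counter },
});
// Nothing has run yet; the caller decides when instantiate (and thus start) happens.
const instanceExports = instantiate(compiled);
```

Because `instantiate` is the only step that runs module code, the timing of the start function is fully under the caller's control.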

Being able to do async initialization is an important use case, though. For lack of support, apparently many node.js modules have an implicit race condition between the completion of an async operation fired off during their module initialization and the first time one of their exports is called. To address this, iiuc, the tentative JS plan is to allow the new await operator to appear in the top-level of a module, allowing the completion of module initialization to block on async operations. As a future feature, we could define something analogous for wasm modules (allowing the start function to kick off a thread and do some parallel work before signalling that it was done). In that case, I think the return value from instantiate would be a Promise.

8 participants