JS interop: no-frills approach #279
Comments
Forgot to mention: there is a third solution, and that is to make information about exposed fields (and/or methods) part of the canonicalization algorithm. Details TBD, but what it would boil down to is that one way or another we'd have to associate the information how fields are exposed with the Wasm types such that the type system "can see it", and then |
Personally, I would not mind the no-frills approach at all. It certainly is cleanest, and avoids the danger of building in ad-hoc solutions whose practical value is dubious for anything but toy examples. That said, the RTT design is supposed to provide an easy solution to the example scenario. In case that solution is not clear to everybody reading this, let me spell it out. The modules define imports for the respective RTTs, which they consistently use to create structs:
The JS API has a class
The field names in the struct descriptor become accessors on the observable struct object, the prototype is either the one given with the constructor (as for This further assumes the API shorthand that an instance of type WA.Struct can be dually used as an RTT type in a global descriptor (and others) and as a value of that type, whose default initialisation is itself. With this, the following code would work as expected:
Furthermore, there is no problem with merging the modules in a bundler, since the RTTs are distinct imports. To be sure, this does not prevent the modules from mixing up or returning the wrong kind of struct objects if they're buggy (the Wasm type system does not enforce that), but that's the norm in JavaScript. |
I realized that the no-frills approach is what various existing tools do today in the wasm MVP space. For example Emscripten does this for binding C++ to JS in its WebIDL binder tool. Some example code:
struct Parent {
  int attr;
};
The following bindings code is emitted in cpp:
int EMSCRIPTEN_KEEPALIVE emscripten_bind_Parent_get_attr_0(Parent* self) {
  return self->attr;
}
And the emitted JS looks like this:
Parent.prototype.get_attr = function() {
  return _emscripten_bind_Parent_get_attr_0(this.ptr); // ptr is the address in linear memory of the object
};
We have many years' experience of this approach and it works well, so it sounds good to me. With wasm GC it would work even better as the exported getters/setters would pass in typed references and not just |
We could perhaps make the no-frills approach more idiomatic by adding one small frill: interpreting I know @jakobkummerow has expressed concerns about the expense of imperatively setting up all the prototypes like this, but my guess is that only a small minority of types would be meant to be exposed to JS, so only relatively few prototypes would be initialized. The fact that existing binding tools already do something similar suggests that this wouldn't be a problem. |
I don't know, that would closely entangle the way Wasm code has to define its types with the JavaScript side. Essentially, you'd have to generate JS-specific Wasm, and it only works for JS. It's much preferable to have a mechanism that separates embedding concerns through some suitable abstraction. (It also isn't clear how this would interact with subtyping or an efficient representation of structs inside Wasm.) |
I do see the benefits of a no-frills approach in that it doesn't require much change in how the JS API already works. But I also worry that we will eventually want to allow more idiomatic interaction in JS. Will it be possible to set up a no-frills JS API such that it could be extended in the future?
For example, even if Wasm objects are exported like this and fields should only be accessible by exported functions, the JS API still needs to specify how they appear to JS. It'd be nice if this could be done in a way that's relatively compatible with future changes (is it sufficient to have them be sealed empty objects with a null and immutable prototype? Spec-wise, it'd presumably have an internal slot to track the underlying wasm struct) And for post-MVP GC, eventually Wasm structs might be share-able across threads. Keeping that scenario in mind, it would be good to ensure structs could still be compatible somehow with being reflected as JS shared structs in the future so they can share the underlying concurrency technology. In the past, it seems like Wasm's design has generally evolved towards providing the JS API with most of the same abilities as Wasm (eventually). With BigInt integration, JS can provide inhabitants for all of the core types. And with the type reflection proposal's |
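As a rough illustration of the "sealed empty objects with a null and immutable prototype" idea raised above, plain JavaScript can model what such an object would look like to JS code. This is only an analogy built from ordinary objects: `makeOpaqueStandIn` is a hypothetical helper, and a real Wasm struct would presumably be an exotic object with an internal slot rather than anything constructed this way.

```javascript
"use strict";

// Hypothetical stand-in for an opaque Wasm struct as seen from JS.
// Assumption: ordinary-object semantics are close enough for illustration.
function makeOpaqueStandIn() {
  // Null prototype and no own properties. Sealing also makes the object
  // non-extensible, which prevents changing its prototype later.
  return Object.seal(Object.create(null));
}

const obj = makeOpaqueStandIn();

const observations = {
  prototype: Object.getPrototypeOf(obj),             // null
  ownKeys: Reflect.ownKeys(obj),                     // []
  canAddField: Reflect.set(obj, "x", 1),             // false
  canSwapPrototype: Reflect.setPrototypeOf(obj, {}), // false
};
console.log(observations);
```

Whether a real design would also make `[[Get]]` throw or return `undefined` is a separate choice that this sketch does not model.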
I realized that the "no-frills" approach even allows idiomatic JS syntax with some (possibly toolchain-generated) wrapper code, at the cost of some indirection:
class Size {
#wasm_size; // private fields must be declared in the class body before use
constructor(width, height) {
this.#wasm_size = module.exports.make_size(width, height);
}
// Example for a getter:
width() {
return module.exports.size_get_width(this.#wasm_size);
}
height() {
return module.exports.size_get_height(this.#wasm_size);
}
// Example for an arbitrary custom method:
area() {
return this.width() * this.height();
}
}
So even for idiomatic JavaScript, the only thing we really need is an An open question is whether this would be fast enough. I suppose engines could inline those accessors, which should allow a near-optimal performance ceiling. Supporting that is extra work that engine implementers have to do; that said, supporting any form of direct The example above kind of assumes that the JavaScript side controls the creation and lifetime of objects. If we assume that the Wasm side controls object creation, and exposes (some of) them to JavaScript, then things get slightly more complicated but still manageable: in that case, to avoid repeated wrapper object creation, the Wasm object would get an |
Wondering how the class Element {
constructor() {
this._ref = module.exports.new_element();
}
}
class Box {
constructor(element) {
this._ref = module.exports.new_box(element?._ref || null);
}
get element() {
let elementWrapper = module.exports.box_get_element_wrapper(this._ref);
if (!elementWrapper) {
let ref = module.exports.box_get_element(this._ref);
if (ref) {
elementWrapper = Object.create(Element.prototype);
elementWrapper.constructor = Element;
elementWrapper._ref = ref;
module.exports.box_set_element_wrapper(this._ref, elementWrapper);
}
}
return elementWrapper;
}
set element(element) {
module.exports.box_set_element(this._ref, element._ref);
module.exports.box_set_element_wrapper(this._ref, element);
}
} |
Something like that, yeah, modulo details. One key point: the Wasm type for (type $Size (struct
(field $wrapper externref)
(field $width f64)
(field $height f64)
))
class Size {
#wasm_ref; // private field for the Wasm struct reference; must be declared before use
constructor(width_or_wasm, height) {
if (typeof width_or_wasm === "number") {
this.#wasm_ref = module.exports.make_size(width_or_wasm, height);
module.exports.size_set_wrapper(this.#wasm_ref, this);
} else {
// Assume {width_or_wasm} is an existing Wasm object.
this.#wasm_ref = width_or_wasm;
}
}
static fromWasm(wasm) {
let wrapper = module.exports.size_get_wrapper(wasm);
if (wrapper === null) {
wrapper = new Size(wasm);
module.exports.size_set_wrapper(wasm, wrapper);
}
return wrapper;
}
toWasm() {
return this.#wasm_ref;
}
}
class Rect {
#wasm_ref; // private field for the Wasm struct reference; must be declared before use
constructor(top, left, width, height) {
let topleft = new Point(top, left);
let size = new Size(width, height);
this.#wasm_ref = module.exports.make_rect(topleft.toWasm(), size.toWasm());
}
toWasm() {
return this.#wasm_ref;
}
get size() {
return Size.fromWasm(module.exports.rect_get_size(this.toWasm()));
}
set size(new_size) {
module.exports.rect_set_size(this.toWasm(), new_size.toWasm());
}
} Or somesuch, but at least that's the high-level concept :-) |
Yes, tooling can do sort of anything, but it might still be worth asking which basic capabilities should be present to support that efficiently. One possibility could be having WasmGCObject (or whatever the superclass of JS references to WasmGC-backed objects is) have a prototype like:
On top of this you can then easily add back names or more meaningful accessors externally from the module.
The only problem that I see with the explicit-accessor approach is if you want to expose only a subset of members, but that is problematic in basically any case (since we are saying that accessors are a property of the type, but the same type can be used both in contexts where it's internal and in contexts where it's part of the interface). |
Regarding the wrappers, I am seeing a few aspects that may eventually become relevant: One is that there naturally is a break-even point between generating static glue code (lots of repetition but different names, which is probably fine for modules with a small interface) and only attaching metadata and using a shared library of sorts to generate the necessary glue from it (fixed-size library + compact descriptions, eventually becoming smaller and easier to handle). Also, generated static glue code is tied to the respective embedding, while metadata is reusable on a per-embedding or per-language basis, and I am wondering how long the former will be considered practical. Another is that in a more connected world, each language interfacing with a GC module may need its own wrapper, say for example where GC objects are both accessed by JS (using a wrapper like above) and a separate Wasm module (needing another wrapper). There, a single Furthermore, the need for wrappers forms a kind of barrier where metadata must be known anyhow to reasonably optimize a module graph, think meta-dce, and it would likely be more practical if toolchains like Binaryen could reason about what's needed and what's not without, for example, having to parse glue code in various languages to find out. Hard to say how much of this is relevant in an MVP, of course, but I'd assume that these aspects will become relevant rather quickly once there is a usable MVP, ultimately leading down the metadata path for its many benefits. |
To clarify: the no-frills approach requires no round trips and no glue code. Exported accessor functions (plus opaque references) enable full expressiveness. The approach is therefore language/embedder independent.
Relying on exported functions (as accessors) is precisely what allows Binaryen to reason about what's needed. (This was, in fact, the original reason why @tlively suggested this approach.)
Nothing in the no-frills approach suggests that anyone should have a reason to parse glue code in any language. |
Partners I've talked to including j2wasm, sheets, and Dart have all indicated that having a minimal JS API in the MVP would be fine for them. At the same time, there's clearly interest in continuing to flesh out ideas of how a richer and more ergonomic JS API could work. How would folks feel about splitting off the rich JS API into a separate post-MVP proposal with its own repo? That way discussion of a richer API could continue without blocking progress or causing uncertainty for the MVP. |
I've added an agenda item for discussing the JS API to our next meeting: WebAssembly/meetings#982 |
I won't be able to make it to the upcoming GC subgroup meeting, but I think it's reasonable to move the richer JS API to a separate proposal (while ensuring what's in the initial minimal API doesn't preclude future extensions). |
I also find the no-frills approach to be a practical, clean and unambiguous way to define object access, provided the performance is satisfactory. I like the way it encapsulates WasmGC types and object layout. Even without any object access features beyond this, we should still define as part of the interop how WasmGC objects behave in various JS operations. I have requested support for |
We agreed at the subgroup meeting yesterday to split out a JS API for customizing accessors and prototypes for GC objects as a post-MVP proposal (https://github.com/WebAssembly/gc-js-customization), so for the MVP we'll be going with something that looks like the no-frills approach described here, at least for structs. We discussed a few options for how arrays might be accessed or created more directly from JS and we might choose to add some frills for arrays in the MVP if we can get a performance benefit from doing so. I'll close this issue for now, but feel free to reopen it if you have anything to add. Options for arrays should probably be discussed in fresh issues. |
Some of us have been working on fleshing out the details of what exactly a "no-frills" approach would mean in practice: https://docs.google.com/document/d/17hCQXOyeSgogpJ0I0wir4LRmdvu4l7Oca6e1NkbVN8M/edit# As a quick summary:
These restrictions aren't meant to be there forever; they're meant to give us an initial, basic, "good-enough" solution, so we can finalize and ship the WasmGC proposal while taking the time we need to come up with richer JS interop. The module producers we talked to agreed that this basic feature set is enough to address their immediate needs. Please see the document for details and FAQ. |
This looks good to me, except for one point:
Remember that we have established in previous discussion that anyref is the type that modules will want to (and should) use for "abstract" imports, where they do not want to commit to whether these are implemented in Wasm or by host code (while externref should only be used for the much narrower use case of specific "host" imports). So disallowing anyref for calls would break the most common use case. |
I suppose we disagree on what "the most common use case" is for the MVP. In my understanding, it is a single-module Wasm app that needs to interact with a JavaScript embedding, and it is well served by this proposal. The future may hold many other scenarios. I fully expect that we'll allow many more types on the boundary. Eventually. |
@jakobkummerow, how much complexity would it add to the implementation to allow |
@tlively : Quite a bit. We're prepared to do it eventually, but realistically it won't get prioritized in the next several months at least. (Background: for startup speed reasons, there's a single signature-independent JS→Wasm call adaptor that's used before signature-specialized wrappers are compiled for hot functions. For the status quo, this adaptor is already ~1000 lines of handwritten assembly per supported platform; it needs to be assembly because it needs to muck with the stack and registers in ways that no compiler supports. Teaching it to do HeapNumber-to-i31ref conversions, and/or other type checks/conversions, is doable but will be annoying. For the time being, we have our hands full with demonstrating performance benefits; features that aren't required for an MVP will have to wait.) |
I was thinking about this more today and I realized that wasm-split would benefit from allowing arbitrary types on the JS boundary. The placeholder trampoline functions that wasm-split uses to download, compile, and instantiate the secondary module need to forward arguments to and results from secondary functions through JS. In principle we could have wasm-split add extra wrappers to translate to and from externref (or anyref, if we only partially relax the proposed restrictions), but then those wrappers would still be executed even after the secondary module is loaded. If we did the conversions and downcasts implicitly on the JS boundary instead, those extra costs would only be paid for the single placeholder function call that loads the secondary module. |
Wrt supporting the same set of reference types for accessing functions, globals, and tables:
Wrt type reflection: It shouldn't be too hard to design a syntactic AST representation for recursive types, in the same manner as we did for other types. The main question to answer is how to express internal recursive references, but that is primarily a question of choice of concrete syntax, not so much semantics. An interesting question that came up yesterday is whether and how type reflection should handle type equivalence. This problem already exists today: function types have structural equivalence, but that isn't reflected in their AST-style representation of the type reflection API; the same is true for all other types that we represent as objects(*). Users are free to create multiple AST objects that represent the same type, and that seems rather hard to avoid without compromising usability of the API. Recursive types do not really add anything new to this problem, equivalent types still wouldn't imply equal AST object identities. (*) In fact, it's already true in the bare Wasm 1.0 API, where things like TableDescriptor and MemoryDescriptor are rudimentary types with structural equivalence. Consequently, if we want to enable reflection on type equivalence, then I think we'd need to provide a dedicated API function for comparing types. This function could then canonicalise its argument internally (and probably cache that), but that would be an implementation detail. We may want a similar function for reflecting on subtyping. But these are extensions that can be considered independently from the GC proposal. And of course, such type comparison functions can already be implemented in user space (though I wouldn't expect the average dev to be able to do so correctly). |
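A user-space comparison of the kind mentioned above could look roughly like this for function types. The sketch assumes the `{parameters, results}` object shape that the type reflection proposal uses for function types and only handles value types written as strings. `functionTypesEqual` is a hypothetical helper name, and recursive or GC types would need real canonicalization, which this does not attempt.

```javascript
// Hypothetical helper: structural equality of two function type descriptors.
// Assumes each descriptor has `parameters` and `results` arrays of strings.
function functionTypesEqual(a, b) {
  const sameList = (xs, ys) =>
    xs.length === ys.length && xs.every((x, i) => x === ys[i]);
  return sameList(a.parameters, b.parameters) && sameList(a.results, b.results);
}

// Two distinct AST objects describing the same type, and one different type.
const t1 = { parameters: ["i32", "f64"], results: ["i32"] };
const t2 = { parameters: ["i32", "f64"], results: ["i32"] };
const t3 = { parameters: ["i32"], results: ["i32"] };

console.log(t1 === t2);                  // false: object identity differs
console.log(functionTypesEqual(t1, t2)); // true: structurally equivalent
console.log(functionTypesEqual(t1, t3)); // false
```

This shows the point made in the comment: equivalent types do not imply equal object identities, so any equivalence check has to be a separate function.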
Can anyone identify any problems with allowing arbitrary GC types everywhere on the boundary with JS and performing an implicit downcast in the (The actual implementation of this in V8 might not happen immediately, but that's ok. We would implement it before shipping.) |
An update on [[Get]]: we've discovered an issue, namely that resolving a promise involves a property lookup for
The conceptual simplicity/consistency of (2) is probably preferable over the exceptionalism of (1), so unless someone has a better idea (or different opinion), we'll update the design sketch and V8's implementation accordingly. [1] We could make [[GetPrototypeOf]] also not throw to get even closer to that, but IIUC that's an independent decision because we could specify [[Get]] to return without looking at the prototype chain. |
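For readers unfamiliar with the promise issue: when a promise is resolved with an object, ECMAScript reads that object's `then` property to check whether it is a thenable. The sketch below uses a Proxy whose property reads always throw as a stand-in for an object with a throwing [[Get]]. The Proxy and the name `opaque` are assumptions for illustration and say nothing about how an engine actually implements Wasm objects.

```javascript
const lookups = [];

// Stand-in for an object whose [[Get]] throws for every property.
const opaque = new Proxy({}, {
  get(_target, key) {
    lookups.push(String(key));
    throw new TypeError("property access not allowed");
  },
});

// Resolving with the object triggers a lookup of "then". Because the lookup
// throws, the promise is rejected instead of being fulfilled with the object.
const p = new Promise((resolve) => resolve(opaque));
p.catch((err) => console.log("rejected:", err.message));

// The lookup happens synchronously inside resolve().
console.log(lookups);
```

With a throwing [[Get]], such an object could therefore never be passed through a promise, which is the kind of surprise the comment above is concerned with.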
That's fine with me.
Going with option (2) also seems like a good idea. |
Not currently, though this of course only works with implicit RTTs, so may be expensive to generalise later, e.g., when we introduce generic types. So a meta-level concern with this might be creating expectations that force us on a costly path in the future. |
At the subgroup meeting today we decided to move forward with allowing arbitrary GC types on the boundary with an implicit downcast (preceded by an I'll make a PR adding the contents of the no-frills design doc to this repo, then we can make the change above, then we can close this issue and open new issues for any remaining problems that might come up. |
MVP-JS.md is now updated to reflect the no-frills approach and the spec documents have been updated as well, so I'll close this issue. |
This expands on @tlively 's idea here: #275 (comment)
I've been thinking about the situation created by multiple (semi-cooperative?) modules that use semantically-distinct types which get canonicalized per the isorecursive type system rules, but for which the involved modules would like to set up different JS interop behavior.
For a concrete example, suppose we have:
If we imagine an "idiomatic" JS interop scheme, then that would likely mean some way of setting up named properties (so JS code can use my_point.x and my_size.width) and/or prototypes (RTT_for_$size.prototype = { area() { return this.width * this.height; } } etc). It is easy to see that this leads to collisions. In the "benign" case, we may simply end up with an object whose .x and .width properties are aliasing each other; but trouble arises when both types use the same name for different fields. If prototypes are assigned whole-sale (i.e. .prototype = { area() {...} }), they'll just clobber each other ("last one wins"); if they are assembled piecemeal (i.e. .prototype.area = function() { ... }), we'll again have problems if individual names clash; if we permit divergence between JS-exposed prototypes and Wasm-level typing (as some design explorations have suggested, e.g. by storing prototypes on RTTs, which would make these RTTs distinguishable in JS but indistinguishable in Wasm), then it seems unavoidable that this divergence would be observable from Wasm-module-embedding code and lead to some pretty weird behavior (such as prototypes "magically" changing or getting lost).
Whether the JS annotations are set up imperatively or declaratively doesn't make much of a difference for this problem.
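The wholesale and piecemeal failure modes can be reproduced with plain JavaScript objects. In this sketch, `sharedRtt` and `piecemeal` are hypothetical stand-ins for the single canonicalized type that both modules end up sharing. No real JS API for attaching prototypes to Wasm types is implied.

```javascript
// Hypothetical: after canonicalization, $point and $size are the same type,
// so both modules configure the same underlying object.
const sharedRtt = { prototype: {} };

// Wholesale assignment: the last writer wins.
sharedRtt.prototype = { sum() { return this.a + this.b; } };  // module A ($point)
sharedRtt.prototype = { area() { return this.a * this.b; } }; // module B ($size)

const value = Object.create(sharedRtt.prototype, {
  a: { value: 3 },
  b: { value: 4 },
});
console.log(typeof value.sum); // "undefined": module A's method was clobbered
console.log(value.area());     // 12

// Piecemeal assignment: methods coexist unless their names clash.
const piecemeal = { prototype: {} };
piecemeal.prototype.describe = () => "point"; // module A
piecemeal.prototype.describe = () => "size";  // module B reuses the name
console.log(piecemeal.prototype.describe());  // "size"
```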
One solution to these concerns is to stick our heads in the sand and hope that these cases won't happen much in practice. The primary mitigating factor is likely the fact that (at least with Java-like source languages) it is exceedingly unlikely for two semantically-different classes to have identically-typed vtables, and that alone prevents their isorecursive canonicalization. So maybe it's fine in practice.
Thomas' idea is the other solution. By not exposing named fields or prototypes and instead fully relying on exported functions, we can avoid the whole problem. To be specific: the Wasm modules would, with some module-producer-defined naming convention, export free functions that act as accessors, e.g.:
Both functions, in this case, will compile to the same code, but that's not a problem. If the two modules want to export identically-named functions with different behavior, then that's also not a problem.
Obviously, JS code would have to write moduleA.exports.$point_get_x(my_point) instead of my_point.x, which is undoubtedly not idiomatic in JS. The verbosity can be mitigated a bit with shorthands, like let get_x = moduleA.exports.$point_get_x; get_x(my_point);
From a higher-level point of view, this approach means that we reconcile the differences in object model / type system between Wasm and other languages (such as JS) by exposing Wasm objects "as they are": when interacting with them, code has to adapt to Wasm conventions and mental models. This is in contrast to alternative designs where Wasm objects would put in an effort to pretend to act like some other language's objects, which is fraught with peril because Wasm cannot possibly hope to do a faithful job of that in all conceivable other languages.
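From the JS side, the accessor style described above might look like the following. Since no real module is available here, `moduleA` is mocked with a plain object and the struct is modelled as a frozen, prototype-less object whose fields live in a private WeakMap. The export name `$point_get_x` follows the example in the text, while `make_point` and `$point_get_y` are hypothetical exports added for the sketch.

```javascript
// Mock of an instantiated module. In a real embedding this would come from
// WebAssembly.instantiate and the structs would be opaque Wasm GC references.
const moduleA = (() => {
  const fields = new WeakMap(); // hides the "struct" contents from JS callers
  return {
    exports: {
      make_point(x, y) { // hypothetical constructor export
        const ref = Object.freeze(Object.create(null));
        fields.set(ref, { x, y });
        return ref;
      },
      $point_get_x(ref) { return fields.get(ref).x; },
      $point_get_y(ref) { return fields.get(ref).y; }, // hypothetical
    },
  };
})();

const my_point = moduleA.exports.make_point(1.5, 2.5);

// Verbose form, then the shorthand mentioned in the text.
console.log(moduleA.exports.$point_get_x(my_point)); // 1.5
const get_y = moduleA.exports.$point_get_y;
console.log(get_y(my_point));                        // 2.5

// The object itself exposes nothing to JS.
console.log(my_point.x);                             // undefined
```

All field access goes through exported functions, so two modules sharing a canonicalized type can each export their own accessors without interfering with one another.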
In summary, this "no-frills" approach would have these benefits:
At the cost of a drawback:
Performance should be similar in the limit: engines have to work a little harder to inline such accessors, but there are no fundamental obstacles to doing that.
Thoughts?