Suggestion: Combine opaque external references with explicit wasm-extern casts #307
What I like about this idea is that it's a general solution that addresses several issues that came up in previous discussions (such as |
+1 to reintroducing From what I glean above, the One concern that @rossberg has brought up in that context is that a Wasm module should be virtualizable as a host module and vice versa. We'd need to be able to specify a type above these two, or have type imports, for that to work. |
Perhaps a different approach is to have type imports. An externref could then be a type import provided by the host. |
Correct. These would be the same "casts" that are happening in the boundary.
Would that not be achieved by creating references in the host, then casting them to |
I agree that these casts are essentially hidden in the JS embedding, so making them explicit is attractive. That does have a downside in that they impose a cost on cross-module wasm -> wasm calls that use |
I also like the idea of Incidentally, I don't think we can emulate |
Right, but the downside compared to the current state would exist only in the case where they communicate directly through |
IIUC (please correct me if I'm wrong), in the presence of
where If it does not, a likely (here: consolidated) alternative hierarchy with
yet this one seems odd against the background that a) from a producer's point of view (regardless of representation issues), the most viable/convenient/expected hierarchy would be the initially envisioned
and b) given the former dataref/funcref/externref separation, if there is a JS import that accepts any JS/Wasm value (and would hence prefer to use the original What am I missing? :) |
Casts from dataref or funcref to externref should always succeed, thus the import could accept externref. |
A few observations:
Putting this together, there is the following conceptual hierarchy:
where the parenthesised nodes are types that we may not necessarily name in Wasm. Here, If
and for completeness, maybe also
If But if That is, every module will have to choose whether it is optimising for linking against the host or linking against another Wasm module. Note that this does not just affect exports, but internal choices like type definitions for data structures, so having two versions for every export function does not solve the problem (nor would it scale). Furthermore, the problem extends to exports like mutable tables and globals, which you cannot duplicate anyway. So, while I think separating the types is possible, there is no free lunch, and this certainly creates its own set of complexities and problems (some of which may be difficult to predict). |
At the last subgroup meeting we all agreed that figuring out the type hierarchy was a high priority, so it would be good to move this discussion forward as much as we can. I was thinking about the top In the OP, @manoskouk wrote:
But I don't see why we would need to apply reverse transformations on upcasts to Also, are representation changes on downcasts acceptable? I could imagine this could get hairy if the engine moved code around so that the value to be cast was materialized in multiple locations (e.g. in two different registers). Then the cast would have to update the representation in multiple locations. |
The only case I can think of where a representation change on downcast would be a problem is if the from and to types are both under As for multiple places to update representations, a cast produces a new value and leaves the old value unchanged. |
Other types may have different representations in JS and Wasm. In V8, these are currently functions, but it may expand in the future. |
I very much like the framing of this as making boundary conversions optional and explicit. With this in mind, it makes sense to extend the conversions to any Wasm type, i.e. not just reference types. We can convert to/from any Wasm type at the boundary, and the explicit conversions would mirror that capability. We would have two instructions (names not meant as suggestions):
This positions the feature as independent of the In this model, we might want to rename the |
@tlively : I think having that top type or not doesn't make much of a practical difference (so I don't have a strong opinion either way). A minor argument against having |
Right, with lazily-converted These converting downcasts would only be acceptable if they could be avoided where So it makes sense to have a With a top type (A):
Or if we wanted
Without a top type we have either (C):
Or, if we want to continue allowing
@askeksa-google, I don't think trying to generalize |
@tlively : The key consequence of merging With today's state of the world, we specifically have a cost when converting between (It's hard to say whether structs/arrays might one day also have explicit conversions at the intern<->extern boundary; that depends in part on how the spec evolves.) |
In my mental model |
FWIW, even though it differs from the initial idea proposed in the OP, option (A)
(names according to my mental model) seems attractive both ergonomically and given the aspects that a) as long as |
When an engine has different representations for the same thing depending on whether it's inside Wasm or outside (as V8 currently does for functions; I think a NaN-boxing engine would have a similar situation for small integers, at least if we end up specifying that JS numbers must be converted to Wasm i31refs when possible; on non-pointer-compressed 64-bit configurations, even in V8 there's a conversion-requiring difference between Smi and i31ref), then conversions must be performed when a value leaves Wasm (e.g. when it is passed as a parameter to an imported function, returned as a result from an exported function, or read by the outside world from an exported global or table). For example, in V8 a wasm-function must be converted back to a js-function, and in a NaN-boxing engine an i31ref has to be turned back into a float64. Another way of phrasing it: if conversions to |
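To make the small-integer case above concrete, here is a minimal Python sketch of the two conversions a hypothetical NaN-boxing engine might perform. It assumes (as the comment says, only if this ends up specified) that a JS number entering Wasm becomes an i31ref whenever it fits in 31 signed bits. The function names and the choice to keep -0 boxed are illustrative assumptions, not taken from any engine or spec text.

```python
import math
from typing import Optional

# i31ref payload range: signed 31-bit integers.
I31_MIN = -(1 << 30)
I31_MAX = (1 << 30) - 1


def js_number_to_i31(x: float) -> Optional[int]:
    """Hypothetical inbound conversion: return an i31 payload if the JS number
    is an integer in the signed 31-bit range, else None (value stays boxed)."""
    if not math.isfinite(x) or x != math.floor(x):
        return None  # NaN, +/-Infinity and fractional values cannot be i31
    if x == 0.0 and math.copysign(1.0, x) < 0:
        return None  # -0 would come back as +0, so keep it boxed
    n = int(x)
    return n if I31_MIN <= n <= I31_MAX else None


def i31_to_js_number(n: int) -> float:
    """Hypothetical outbound conversion: an i31ref leaving Wasm becomes a float64."""
    return float(n)
```

For every value that does convert, `js_number_to_i31` followed by `i31_to_js_number` gives back an equal float64, which is the round-trip property such a boundary conversion needs.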
Aha, so having if we had a top type with Edit: The benefit of a top type would be for polymorphic modules that never downcast or convert and want to be generic over all reference types. |
Oh, also having |
The way I see it, having a |
I didn't mean to suggest that we extend the My proposal was perhaps a bit confusingly phrased, since I was conflating two issues which should be considered separately:
|
What a top type adds to the design is avoiding the need for a module to choose between intern or extern representations for handling unknown outside values. If a module does not care itself about that representation (because it's "polymorphic"), then neither choice is ideal, since either would impose extra cost on some of the clients. With a "lazy" top type, the module would not impose a cost on either, because no conversion is ever actually performed in the common case (unless different clients are mixed somehow).
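A rough sketch of the "lazy" top type idea: a hypothetical `LazyTop` value records which representation (intern or extern) it currently holds, and a conversion happens only when a downcast actually needs the other one. A polymorphic module that just stores and returns such values never triggers a conversion. The class names, the conversion counter, and the `to_intern`/`to_extern` callbacks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LazyTop:
    """Top-type value tagged with the representation it currently holds."""
    rep: str  # "intern" or "extern"
    payload: Any


class Converter:
    """Performs (hypothetical) representation changes and counts them."""

    def __init__(self, to_intern: Callable[[Any], Any], to_extern: Callable[[Any], Any]):
        self.to_intern = to_intern
        self.to_extern = to_extern
        self.conversions = 0

    def as_intern(self, v: LazyTop) -> Any:
        """Downcast that needs the internal representation; converts only if needed."""
        if v.rep == "intern":
            return v.payload
        self.conversions += 1
        return self.to_intern(v.payload)

    def as_extern(self, v: LazyTop) -> Any:
        """Downcast that needs the external representation; converts only if needed."""
        if v.rep == "extern":
            return v.payload
        self.conversions += 1
        return self.to_extern(v.payload)


def polymorphic_identity(v: LazyTop) -> LazyTop:
    """A generic module function that passes values through without downcasting."""
    return v
```

Passing a host value through the polymorphic function and back out to the host costs nothing; only a consumer that wants the other representation pays for a conversion, and the converting downcast returns a new value rather than mutating the stored one.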
No, that does not work. Their representations could be incompatible, which is the point of separating the types. For example, a data ref might be one word, a func ref two words. Then, certainly, a nullref array cannot be a subtype of both a dataref array and a funcref array. You have to keep the hierarchies completely disjoint. |
This is not what I meant: |
Oh, okay. But what would be the type of It's a moot point: since we'll already need distinct bottom types for the three hierarchies, the three nullref types exist automatically, by way of being |
Right, |
Right, another way to say this is that |
@jakobkummerow thanks! This does cause me to have a stronger opinion that we should defer In practical terms, the proposed |
I might just be misunderstanding the diagram but my assumption is that (1) if there is an edge between two types, the "higher" type is the supertype of the "lower" type and (2) because subtyping is non-coercive, an engine cannot rely on being able to perform a representation change on upcast and thus this diagram implies that all types shown must share the same universal representation. The prose below the diagram seems to suggest that the intention is to let the engine have different representations for different types in the tree. If so, could we have a diagram that only shows subtyping (as opposed to coercibility)? |
@lukewagner, your assumptions are right, but the interesting part is that the same value (e.g. an i31 or a func) can have two different representations (an |
@conrad-watt, regardless of @tlively, the diagram would be fine by me. Plus I think:
I'm indifferent about a |
I don't think this gets to the issue discussed above. If I have an |
Sure, the only representation changes must happen on downcasts. I'm not sure the described scenario can even come up. If a host engine were using 30-bit unboxed integers, then the only rational reason would be that its GC uses two tagging bits. In that case, it cannot possibly have 31-bit unboxed integers in the same GC to boot. Regardless of the hierarchy, such an engine would either have to change its implementation, or box i31refs that overflow 30 bits and suck up the bad performance, which also resolves the above problem. In reality, our design is essentially assuming a single tag bit on 32-bit architectures. |
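To make the tag-bit argument concrete, the following Python sketch packs a small integer into a 32-bit word, shifting it left by the number of tag bits and setting the low tag value to 1 (an assumed tagging scheme). With one tag bit the payload is exactly 31 bits, which is what an unboxed i31ref needs; with two tag bits only 30 bits remain, so such an engine would have to box the i31 values that do not fit. All helper names are hypothetical.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def fits_signed(n: int, bits: int) -> bool:
    """True if n is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= n < (1 << (bits - 1))


def tag(n: int, tag_bits: int) -> int:
    """Pack n into a 32-bit word: shift left by tag_bits and set tag value 1."""
    if not fits_signed(n, WORD_BITS - tag_bits):
        raise OverflowError(f"{n} needs boxing with {tag_bits} tag bit(s)")
    return ((n << tag_bits) | 1) & WORD_MASK


def untag(word: int, tag_bits: int) -> int:
    """Recover n: sign-extend the 32-bit word, then arithmetic-shift right."""
    if word & (1 << (WORD_BITS - 1)):
        word -= 1 << WORD_BITS
    return word >> tag_bits


def represent_i31(n: int, tag_bits: int):
    """Unboxed tagged word if the payload fits, otherwise a (hypothetical) heap box."""
    if fits_signed(n, WORD_BITS - tag_bits):
        return ("unboxed", tag(n, tag_bits))
    return ("boxed", n)
```

For example, 2**29 is a valid i31 value that stays unboxed with one tag bit but has to be boxed with two.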
I'm worried that we're not going to be able to anticipate all of the consequences of this form of I appreciate that this concern isn't necessarily actionable beyond "defer |
To refine what I stated above, there indeed is an additional assumption in the presence of Of course, the more fundamental constraint is that, given (*) It may still be different from the representation outside Wasm, e.g., there could be a uniform conversion for |
@rossberg I agree with the need for Overall, "upcasts"[1] along the subtype hierarchy cannot imply a representation change. Otherwise, at every point where a value of a subtype can be passed to a supertype anywhere in the Wasm program or even at the boundary ( [1] Normally this is referred to as "subsumption." I think we should prefer that term because "cast" has the connotation of a runtime operation of some kind, and there is never a runtime operation for subsumption. |
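Why subsumption has to be free is easiest to see with a mutable aggregate, such as the arrays mentioned earlier in the thread. The Python sketch below pretends that viewing an array of function references as an array of a supertype required converting each element (a hypothetical `coerce_elements` helper). Because such a conversion must build a new array, the "upcast" view and the original stop aliasing, and a later write through one is not visible through the other. A non-coercive `subsume` simply returns the same object.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def coerce_elements(arr: List[T], convert: Callable[[T], U]) -> List[U]:
    """Hypothetical coercive 'upcast': has to allocate a converted copy."""
    return [convert(x) for x in arr]


def subsume(arr: List[T]) -> List[T]:
    """Non-coercive subsumption: the very same object, no runtime work."""
    return arr
```

With `subsume`, both views keep seeing the same mutations; with `coerce_elements`, the copy goes stale as soon as the original is written to.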
As for Yes, we can add I don't agree that we should conservatively leave Thus I think the performance argument for [1] On the other hand, two nulls might allow V8 to use a null object (valid reference) for JavaScript and the null pointer ( |
@titzer, yes, I believe we all agree that subtyping must be non-coercive. I also agree that "upcast" is a misleading term. Yes, we called it "pointer" in some of the discussions. Though as a final name that may be odd: pointer and reference are often synonymous in other contexts, so what's a "pointer reference"? FWIW, I don't think an implementation would need to represent the different null values differently. If the type hierarchies are disjoint, they can never be mixed anyway. But it would have to deal with different null types. |
I know, but disjoint implies the engine could have different null values if it wanted to. In V8, I was thinking of the case where they are not disjoint and a JS null object subsumed to Thinking it through more, it's still possible to have different null values for |
At the meeting today we had consensus to move forward with the three-pronged approach, to be revisited only in the presence of significant new user/implementer feedback. I know that this is not everyone's preferred solution, but I'm glad that we have made a decision and can now move forward. We should now focus on whether we should have @titzer's proposed |
Pondering more over the additional instructions that we'll need to support the 3 separate hierarchies, I realised the following:
Maybe it's too late over here, but at this point, the design we chose looks increasingly silly, and I wonder if we should reconsider the decision to have a separate representation for functions in the MVP. Note that we can also add it later, as the separate type of "raw" code pointers that it is. Thoughts? |
Good points, and I don't think they've been raised before. I don't think it would entirely violate the spirit of our consensus to consider changing the three-pronged approach to a two-pronged approach, since fundamentally the question we answered was whether we should move forward with a single top type. That being said, I don't think these points invalidate the reasons we had for keeping Unless anyone feels strongly that these new considerations change the picture, we should stay the course with the three-pronged design. |
Could an even more minimal option be to not attempt to provide any coercions between |
Apologies for missing the meeting; I have a recurring conflict at exactly that time and have to juggle. I think we should go with the three-pronged consensus for now, in the interest of having a narrow enough cone of uncertainty to iron out the binary format and have tests for it. I love the part where I have some code to write and some tests that tell me if that code is working. Let's get to that part. There still seems to be disagreement on where the burden of proof lies w.r.t. the |
@rossberg: I think I see your point, but I don't think this creates a silly situation: conceptually, I think of Wasm funcrefs exported to JS as "JS wrapper objects that (internally) reference a Wasm function" (whether they're actually implemented as wrapper objects is irrelevant). So if such a JS object is passed to Wasm as an |
@lukewagner, since func refs can already escape into the JS universe, and the conversion happens implicitly at the boundary, we cannot truly remove that conversion. But you are right that we can probably make it unobservable from within Wasm, analogously to number types, as you suggested. That makes sense to me. Okay, another question. In the past, we repeatedly talked about the possibility of Wasm reference types that are not allowed to be passed to JS (similar to v128, e.g., consider continuation objects from the stack switching proposal). What would this mean for the intern->extern conversion lowered into Wasm?
|
Could such values be lossily converted at the |
@conrad-watt, that would fall under the "otherwise fail" clause I mentioned. It still means that the semantics is host-dependent. In particular, you can't know whether intern->extern followed by extern->intern succeeds and returns the original value. So, no. |
@rossberg I was under the impression that |
I think the "never fail" option makes a lot of sense. Just like |
@lukewagner, I think I agree. Though one reason to prevent values from crossing is not to expose identity/equality to JS, when they are not eqrefs in Wasm, e.g., to avoid exposing implementation details. OTOH, we already have that issue with funcrefs, and they are exposed. @titzer, I agree that host-dependent behaviour would be fine if it was an import. But I'm not sure that we have a good model yet for such a "system import", or that it would meet the expectations of the group. |
@rossberg Yeah, that's a good concern. I wonder if maybe JS exotic object semantics give us hooks for making object identity unobservable? Otherwise, it seems like we could specify a "new object every time" semantics which would probably be fine. |
As per today's meeting, I think we don't need a |
I'll go ahead and close this issue. Any follow-up discussions can be new issues. |
In issue #293, we discussed the possibility that funcref not be a subtype of anyref, to avoid representation-change costs at the JS/Wasm boundary. It is reasonable to assume that other types might also require similar transformations in the future: i31ref will most probably require some form of range check, and it is not unlikely that structs and arrays will require some wrapping to function optimally in both worlds.

This creates a situation where passing a value to Wasm as anyref carries significant cost at the boundary, since every object has to be checked against every Wasm type in case a representation change has to be applied. The idea of applying these checks during downcasts from anyref will not work, as we would need to apply the reverse transformation when upcasting to anyref, breaking the fundamental assumption that upcasts are no-ops.

Therefore I suggest the following (assuming funcref <: anyref, but this does not matter for this discussion):

1. An externref type which represents host references in the host's representation. This will be separate from the type hierarchy, i.e. not a subtype of anyref.
2. anyref is not just a supertype of funcref, i31ref, and dataref, but in fact contains no values that do not classify as one of these three types. In other words, anyref is the type of all references that can be generated by a Wasm module.
3. Explicit casts from anyref to externref and vice versa. These will not be free, and will perform the same representation change as at the Wasm/JS boundary. For performance, we will also introduce additional casts from externref to more specific types. The specifics of the representation change will of course depend on the module's host.

With this design:

- … externref and have the guarantee that there is no cost at the boundary. Such references might even be Wasm references (albeit in the host's representation); this way, another Wasm module can create objects internally and pass them to an externref interface.
- … anyref.

I understand that the premise of this post is not a given. However, I also do not see the downside: even if we decide that funcref </: anyref, we allow for more flexibility in the implementation of other (current and future) reference types, and allow Wasm modules to specify more cleanly whether they expose an opaque interface.
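The proposal can be modelled, very roughly, in a few lines of Python. Assumptions: anyref values are internal Wasm kinds only (funcref and dataref here; i31ref is left out because it would add a range-checked number conversion), externref values are whatever the host uses, and two explicit functions stand in for the proposed casts. The names `FuncRef`, `DataRef`, `JSFunctionWrapper`, `any_to_extern` and `extern_to_any` are hypothetical, and returning None for host values that never originated in Wasm is an assumed semantics; the post leaves the details to the host.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FuncRef:
    """Internal (Wasm-side) function reference."""
    code_index: int


@dataclass
class DataRef:
    """Internal struct/array reference; assumed to need no wrapping in this model."""
    fields: list


AnyRef = Union[FuncRef, DataRef]  # i31ref omitted to keep the sketch short


@dataclass(frozen=True)
class JSFunctionWrapper:
    """Host-side representation of a Wasm function that has left Wasm."""
    target: FuncRef


def any_to_extern(v: AnyRef) -> object:
    """Explicit intern->extern cast: the same change as at the JS/Wasm boundary."""
    if isinstance(v, FuncRef):
        return JSFunctionWrapper(v)
    return v  # dataref: identity in this model


def extern_to_any(v: object) -> Optional[AnyRef]:
    """Explicit extern->intern cast; None if the host value never came from Wasm."""
    if isinstance(v, JSFunctionWrapper):
        return v.target
    if isinstance(v, DataRef):
        return v
    return None
```

A module that only passes externref values through never calls either function, so it pays no conversion cost; the cost appears exactly where the module asks for it. A real engine would presumably cache wrappers so that converting the same function twice yields the same host object; this sketch allocates a fresh one each time.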