
Interop with non-C ABIs in WebAssembly #16639

Open
jamii opened this issue Aug 1, 2023 · 6 comments
Labels
arch-wasm 32-bit and 64-bit WebAssembly proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@jamii

jamii commented Aug 1, 2023

Currently, functions exported from zig use the clang ABI when compiling to wasm. This is perfect for interop with other languages that use the same ABI.

For interop with weirder targets, it would be useful to be able to import/export functions for any type signature that wasm supports, rather than just the subset used by the clang ABI.

Currently in zig it's not possible to import or export a function if the wasm type has:

The clang ABI also passes structs/arrays with more than one field by reference, so it's not possible in zig to pass structs by value in imported/exported functions. This is only an annoyance for params, because we can manually unpack the fields in most cases, but there is no workaround for returning structs by value.

This will affect interop with any non-C ABI (e.g. AssemblyScript can export functions that zig cannot import). I ran into this when writing a toy language runtime that passes around type-tagged pointers by value as (i32, i32):

const FatPointer = extern struct {
    tag: u32,
    ptr: *anyopaque,
};

export fn identity(p: FatPointer) FatPointer {
    return p;
}
> zig build-lib lib/runtime.zig -target wasm32-freestanding -mcpu generic+multivalue -dynamic -rdynamic -O ReleaseSafe && wasm2wat -f runtime.wasm
(module
  (type (;0;) (func (param i32 i32)))
  (func $identity (type 0) (param i32 i32)
    (i64.store align=4
      (local.get 0)
      (i64.load align=4
        (local.get 1))))
  (memory (;0;) 16)
  (global $__stack_pointer (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "identity" (func $identity)))

The clang ABI here requires passing both input and output by reference instead of by value. As mentioned above, we can work around this for the input by manually unpacking it, but there is currently no way to opt in to using multivalue returns from zig.
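To make the mismatch concrete, here is a simplified Python model of how the wasm32 C ABI (as described, to my understanding, in the WebAssembly tool-conventions C ABI notes) lowers a signature: scalars and single-field structs pass directly, other aggregates go through a pointer, and an aggregate result becomes a leading result-pointer parameter. The string/list type encoding and the helper names are hypothetical, and unions, empty structs and 128-bit integers are ignored.

```python
# Simplified model of the clang wasm32 C ABI. A scalar is "i32" | "i64" | "f32" | "f64";
# a struct is a list of field types. Unions, empty structs and 128-bit ints are ignored.

def singleton_scalar(t):
    """The lone scalar inside t if t is a scalar or a (nested) one-field struct, else None."""
    if isinstance(t, str):
        return t
    if len(t) == 1:
        return singleton_scalar(t[0])
    return None

def lower_c_abi(params, result=None):
    """Return (wasm_params, wasm_results) for a C-ABI signature."""
    wasm_params, wasm_results = [], []
    if result is not None:
        scalar = singleton_scalar(result)
        if scalar is not None:
            wasm_results.append(scalar)
        else:
            # Aggregate result: the caller passes a result pointer as the first parameter.
            wasm_params.append("i32")
    for p in params:
        scalar = singleton_scalar(p)
        # Other aggregates are passed as a pointer to a copy in linear memory.
        wasm_params.append(scalar if scalar is not None else "i32")
    return wasm_params, wasm_results

FAT_POINTER = ["i32", "i32"]  # tag: u32, ptr: *anyopaque on wasm32
print(lower_c_abi([FAT_POINTER], FAT_POINTER))  # (['i32', 'i32'], [])
```

The result matches the `(func (param i32 i32))` type in the output above: a result pointer plus a pointer to `p`, and no wasm results.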

The workaround available at the moment is to generate a wasm wrapper which reverses the loads/stores generated by the clang ABI.

;; Untested :)
;; Assumes $identity takes (result pointer, pointer to `p`), as in the output above.
(func $identity_wrapper (param i32 i32) (result i32 i32)
  (local $sp i32)
  ;; Make space on the stack: 8 bytes for `p`, 8 bytes for the result.
  (local.set $sp (i32.sub (global.get $__stack_pointer) (i32.const 16)))
  (global.set $__stack_pointer (local.get $sp))
  ;; Store `p` to stack
  (i32.store offset=0 (local.get $sp) (local.get 0))
  (i32.store offset=4 (local.get $sp) (local.get 1))
  (call $identity
    ;; Pointer to result
    (i32.add (local.get $sp) (i32.const 8))
    ;; Pointer to `p`
    (local.get $sp))
  ;; Return result.
  (i32.load offset=8 (local.get $sp))
  (i32.load offset=12 (local.get $sp))
  ;; Reset stack.
  (global.set $__stack_pointer (i32.add (local.get $sp) (i32.const 16))))
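The shuffling such a wrapper has to do can be checked with a toy Python simulation: linear memory is a `bytearray`, `__stack_pointer` is an attribute, and `identity` copies 8 bytes from the `p` pointer to the result pointer, taking the result pointer first as in the compiled `$identity` output earlier. The `Instance` class and its method names are hypothetical, and the stack layout used here (16 bytes, `p` at `sp+0`, result at `sp+8`) is just one possible choice.

```python
import struct

class Instance:
    """Toy model of a wasm instance: linear memory plus the __stack_pointer global."""

    def __init__(self, pages=1, stack_top=65536):
        self.memory = bytearray(pages * 65536)
        self.stack_pointer = stack_top  # the shadow stack grows downwards

    def identity(self, ret_ptr, p_ptr):
        # Mirrors the compiled $identity: copy 8 bytes from *p_ptr to *ret_ptr.
        self.memory[ret_ptr:ret_ptr + 8] = self.memory[p_ptr:p_ptr + 8]

    def identity_wrapper(self, tag, ptr):
        # Reserve 16 bytes of shadow stack: `p` at sp+0, result at sp+8.
        sp = self.stack_pointer - 16
        self.stack_pointer = sp
        struct.pack_into("<II", self.memory, sp, tag, ptr)
        self.identity(sp + 8, sp)
        result = struct.unpack_from("<II", self.memory, sp + 8)
        # Restore the stack pointer, then return both fields by value.
        self.stack_pointer = sp + 16
        return result

inst = Instance()
print(inst.identity_wrapper(7, 0x1234))  # (7, 4660)
print(inst.stack_pointer)  # 65536
```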

An ideal solution might look something like this:

const FatPointer = struct {
    tag: u32,
    ptr: *anyopaque,
};

export fn identity(p: FatPointer) callconv(.Wasm) FatPointer {
    return p;
}
> zig build-lib lib/runtime.zig -target wasm32-freestanding -mcpu generic+multivalue -dynamic -rdynamic -O ReleaseSafe && wasm2wat -f runtime.wasm
(module
  (type (;0;) (func (param i32 i32) (result i32 i32)))
  (func $identity (type 0) (param i32 i32) (result i32 i32)
    (local.get 0)
    (local.get 1))
  (memory (;0;) 16)
  (global $__stack_pointer (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "identity" (func $identity)))

The rules for callconv(.Wasm) would be:

  • i32,i64,f32,f64 parameters are mapped to the same types in wasm.
  • Same for whatever zig types end up corresponding to vector and ref types, if any.
  • (Non-packed) structs and arrays are passed by value as multiple parameters.
  • If multivalue is not in the target feature set, using structs and arrays in the return type is a compile-time error.
  • Using any other type is a compile-time error.

Example:

// zig type
fn example(a: [2]FatPointer, b: v128) callconv(.Wasm) struct { f32, FatPointer } { ... }
;; wasm type
(func
  (param
    ;; a[0]
    i32 i32
    ;; a[1]
    i32 i32
    ;; b
    v128)
  (result
    ;; result[0]
    f32
    ;; result[1]
    i32 i32))
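A small Python model of the proposed rules, applied to the example signature above, might look like this. The tuple-based type encoding and the names `flatten` and `wasm_signature` are hypothetical; Zig scalar types are assumed to be already mapped to wasm value types, and the SIMD feature check for `v128` is left out.

```python
# Hypothetical type encoding for the proposed callconv(.Wasm) rules:
#   "i32" | "i64" | "f32" | "f64" | "v128"   wasm value types (Zig types already mapped)
#   ("struct", [field, ...])                 non-packed struct, flattened in field order
#   ("array", elem, n)                       array of n elements, flattened in order
WASM_TYPES = {"i32", "i64", "f32", "f64", "v128"}

def flatten(t):
    """Flatten one parameter or result type into a list of wasm value types."""
    if isinstance(t, str):
        if t in WASM_TYPES:
            return [t]
    elif t[0] == "struct":
        return [w for field in t[1] for w in flatten(field)]
    elif t[0] == "array":
        return flatten(t[1]) * t[2]
    raise TypeError(f"{t!r} is not allowed in callconv(.Wasm)")

def wasm_signature(params, result=None, multivalue=True):
    """Return (wasm_params, wasm_results) for the proposed callconv(.Wasm)."""
    flat_params = [w for p in params for w in flatten(p)]
    if result is None:
        return flat_params, []
    if not isinstance(result, str) and not multivalue:
        raise TypeError("struct/array return types require the multivalue feature")
    return flat_params, flatten(result)

FAT_POINTER = ("struct", ["i32", "i32"])
print(wasm_signature([("array", FAT_POINTER, 2), "v128"], ("struct", ["f32", FAT_POINTER])))
# (['i32', 'i32', 'i32', 'i32', 'v128'], ['f32', 'i32', 'i32'])
```

Keeping the compiler's job to pure flattening like this leaves any richer encoding (tags, pointers, lengths) to comptime wrappers.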

The goal would be to provide the minimum features necessary to allow other ABIs to be expressed in zig via comptime wrappers or codegen (e.g. emulating wasm-bindgen, implementing the wasm component ABI, or writing the runtime for a non-C-like language).

I'm not sure how much of this would have to be supported by upstream LLVM. E.g. is there bytecode you can emit to get multivalue results, or is it only available via the experimental-mv flag? In the worst case, it might be possible to polyfill by compiling with callconv(.C) and then generating wrappers like the one I wrote above.

@andrewrk andrewrk added this to the 0.12.0 milestone Aug 1, 2023
@andrewrk andrewrk added the arch-wasm 32-bit and 64-bit WebAssembly label Aug 1, 2023
@jamii
Author

jamii commented Aug 1, 2023

Rust has a similar option hidden behind an unstable flag: rust-lang/rust#83788. In their case, though, the mapping from Rust types to wasm types seems to be partly defined by LLVM implementation details, which is something to avoid. Better to be explicit and make the user map types like slices and options into wasm-friendly types.
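A sketch of what explicit mapping could mean in practice, written in Python for brevity: the library, not the compiler, picks a flat wasm representation for each higher-level type. The function names and the chosen layouts (pointer plus length for a slice, a presence flag plus payload for an optional `u32`) are hypothetical.

```python
# Hypothetical, library-chosen lowerings: the user (not the compiler) decides
# how a slice or an optional becomes plain wasm scalars.

def lower_slice(ptr, length):
    # slice -> (ptr: i32, len: i32)
    return (ptr, length)

def lower_optional_u32(value):
    # ?u32 -> (is_some: i32, payload: i32); payload is 0 when absent
    return (0, 0) if value is None else (1, value)

def lift_optional_u32(is_some, payload):
    # Inverse of lower_optional_u32 on the receiving side.
    return payload if is_some else None
```

Because both sides agree on the layout by convention, nothing about it depends on how a compiler backend happens to lay out the source types.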

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 1, 2023
@andrewrk
Member

andrewrk commented Aug 1, 2023

I like this proposal. What do you think, @Luukdegram?

@jamii
Author

jamii commented Aug 1, 2023

https://github.com/jamii/dida/blob/main/bindings/js_common.zig#L142 is an example of the kind of comptime wrapper I'm thinking of. That one just passes *void everywhere, but I think it should be possible to pass things by value using callconv(.Wasm).

@Luukdegram
Member

I like this proposal also. We briefly touched on this topic during the compiler meeting on 2021-11-25, though I never got around to writing a proper proposal for it. I'm excited @jamii did! Also, we should benefit from Zig being pre-1.0 and experiment with our own Wasm ABI definition, rather than attempting to interop with existing ones such as Rust's. So I'm in favor of @jamii's comment.

I do have some questions regarding the proposed rules:

i32,i64,f32,f64 parameters are mapped to the same types in wasm.

What about other scalar types, such as u8? Are they disallowed, or would they represent the smallest possible Wasm type?
In Wasm an i32 represents a 32-bit integer that is neither unsigned nor signed. The interpretation is defined by the operation, such as i32.lt_s vs i32.lt_u. It is true that in JavaScript an integer is signed, but ideally this proposal is runtime-agnostic.
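To illustrate the point about signedness: the same 32-bit pattern compares differently under `i32.lt_s` and `i32.lt_u`. The Python functions below model those two instructions on plain integers (returning 0 or 1, as wasm comparisons do); the function names are only for illustration.

```python
MASK32 = 0xFFFF_FFFF

def as_signed(bits):
    """Interpret a 32-bit pattern as two's complement (what i32.lt_s sees)."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x8000_0000 else bits

def lt_s(a, b):
    return int(as_signed(a) < as_signed(b))

def lt_u(a, b):
    return int((a & MASK32) < (b & MASK32))

x = 0xFFFF_FFFF  # one i32 bit pattern: -1 when signed, 4294967295 when unsigned
print(lt_s(x, 1), lt_u(x, 1))  # 1 0
```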

Same for whatever zig types end up corresponding to vector and ref types, if any.

Zig allows for arbitrary vector sizes using @Vector, Wasm only supports v128 when the feature is enabled. Is this a compile-error when the cpu feature is missing? Does the ABI allow arbitrary vector sizes?

Also, I'd like to suggest looking into Wasm's component model, and especially its ABI. It currently doesn't support multi-value returns, but it's worthwhile to verify whether this can/will support the use case as described by the OP.

@jamii
Author

jamii commented Aug 3, 2023

I'd be inclined to push as much into libraries as possible, and focus the ABI on allowing libraries to express every wasm type. Even the struct unpacking I proposed isn't entirely necessary.

So a really minimal proposal:

  • i32,i64,f32,f64 are mapped directly.
  • Maybe u32/u64 are mapped to i32/i64, because I can't think of a reason for a library to map them to anything else.
  • Using u8 is an error, because it could be mapped to i32 by itself or packed together with some other small parameters - that's a decision for libraries.
  • If vector types are enabled, u128/i128 map to v128. Libraries can decide what to do with @Vector types.
  • If multivalue is enabled, the return type may be a tuple of scalar values.
  • Ref types seem more complicated and it might be reasonable to put them off until llvm decides how to represent them.
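The minimal mapping above fits in a lookup table plus two feature checks. A Python sketch, treating Zig types as strings and representing the return type as a list of scalars (more than one element stands for a tuple); the names `map_scalar` and `map_signature` are hypothetical, and ref types are left out as in the list.

```python
# Direct mappings from the minimal proposal (Zig type name -> wasm value type).
DIRECT = {"i32": "i32", "i64": "i64", "f32": "f32", "f64": "f64",
          "u32": "i32", "u64": "i64"}

def map_scalar(zig_type, simd=False):
    if zig_type in DIRECT:
        return DIRECT[zig_type]
    if zig_type in ("u128", "i128") and simd:
        return "v128"
    # u8, u16, bool, @Vector, ...: left to libraries to lower explicitly.
    raise TypeError(f"{zig_type} has no direct wasm mapping")

def map_signature(params, results, simd=False, multivalue=False):
    """`results` is a list: [] for void, [t] for a scalar, [t, ...] for a tuple."""
    if len(results) > 1 and not multivalue:
        raise TypeError("tuple return types require the multivalue feature")
    return ([map_scalar(p, simd) for p in params],
            [map_scalar(r, simd) for r in results])

print(map_signature(["u32", "f64", "i128"], ["i32", "i32"], simd=True, multivalue=True))
# (['i32', 'f64', 'v128'], ['i32', 'i32'])
```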

Also, I'd like to suggest looking into Wasm's component model,

The component model ABI is mostly about serde of values in linear memory. I think the only relevant part here is how some values are flattened into parameters. This code should be easy to write in comptime zig:

MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1

def flatten_functype(ft, context):
  flat_params = flatten_types(ft.param_types())
  if len(flat_params) > MAX_FLAT_PARAMS:
    flat_params = ['i32']

  flat_results = flatten_types(ft.result_types())
  if len(flat_results) > MAX_FLAT_RESULTS:
    match context:
      case 'lift':
        flat_results = ['i32']
      case 'lower':
        flat_params += ['i32']
        flat_results = []

  return CoreFuncType(flat_params, flat_results)

def flatten_types(ts):
  return [ft for t in ts for ft in flatten_type(t)]

def flatten_type(t):
  match despecialize(t):
    case Bool()               : return ['i32']
    case U8() | U16() | U32() : return ['i32']
    case S8() | S16() | S32() : return ['i32']
    case S64() | U64()        : return ['i64']
    case Float32()            : return ['f32']
    case Float64()            : return ['f64']
    case Char()               : return ['i32']
    case String() | List(_)   : return ['i32', 'i32']
    case Record(fields)       : return flatten_record(fields)
    case Variant(cases)       : return flatten_variant(cases)
    case Flags(labels)        : return ['i32'] * num_i32_flags(labels)
    case Own(_) | Borrow(_)   : return ['i32']

def flatten_variant(cases):
  flat = []
  for c in cases:
    if c.t is not None:
      for i,ft in enumerate(flatten_type(c.t)):
        if i < len(flat):
          flat[i] = join(flat[i], ft)
        else:
          flat.append(ft)
  return flatten_type(discriminant_type(cases)) + flat

def flatten_record(fields):
  flat = []
  for f in fields:
    flat += flatten_type(f.t)
  return flat

def join(a, b):
  if a == b: return a
  if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32'
  return 'i64'
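The code above relies on helpers defined elsewhere in the spec (`despecialize`, `discriminant_type`, `num_i32_flags`, `CoreFuncType`). For experimenting, a self-contained Python subset can be run directly. It covers scalars, strings, records and variants only; the tuple-based type encoding is hypothetical, and the variant discriminant always flattens to `i32`, since U8/U16/U32 all flatten to `i32` above.

```python
MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1

# Hypothetical encoding: scalar names as strings, "string",
# ("record", [t, ...]) and ("variant", [t_or_None, ...]).
SCALAR_FLAT = {
    "bool": "i32", "u8": "i32", "u16": "i32", "u32": "i32",
    "s8": "i32", "s16": "i32", "s32": "i32", "char": "i32",
    "u64": "i64", "s64": "i64", "f32": "f32", "f64": "f64",
}

def join(a, b):
    if a == b:
        return a
    if {a, b} == {"i32", "f32"}:
        return "i32"
    return "i64"

def flatten_type(t):
    if t == "string":
        return ["i32", "i32"]  # pointer + length
    if isinstance(t, str):
        return [SCALAR_FLAT[t]]
    kind, items = t
    if kind == "record":
        return [ft for field in items for ft in flatten_type(field)]
    if kind == "variant":
        flat = []
        for case in items:
            if case is None:
                continue
            # Cases share slots; overlapping slots are widened with join().
            for i, ft in enumerate(flatten_type(case)):
                if i < len(flat):
                    flat[i] = join(flat[i], ft)
                else:
                    flat.append(ft)
        return ["i32"] + flat  # discriminant first
    raise TypeError(t)

def flatten_functype(params, results, context):
    flat_params = [ft for t in params for ft in flatten_type(t)]
    if len(flat_params) > MAX_FLAT_PARAMS:
        flat_params = ["i32"]  # single pointer to the params in linear memory
    flat_results = [ft for t in results for ft in flatten_type(t)]
    if len(flat_results) > MAX_FLAT_RESULTS:
        if context == "lift":
            flat_results = ["i32"]  # callee returns a pointer to the results
        else:  # "lower"
            flat_params += ["i32"]  # caller passes an out-pointer
            flat_results = []
    return flat_params, flat_results

print(flatten_functype([("record", ["u32", "u32"])], ["string"], "lift"))
# (['i32', 'i32'], ['i32'])
```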

Once the spec supports multivalue we'll have MAX_FLAT_RESULTS > 1, in which case the return type can be a tuple rather than a scalar value.

When MAX_FLAT_PARAMS/RESULTS is exceeded, it falls back to passing a single pointer to linear memory where the values are serialized. The serialization is orthogonal to the wasm ABI.

@JustAnotherCodemonkey

Is there any update on this? I have a project for which this would be useful, and I'd be very interested in changes such as this, especially multi-value returns.

Once the spec supports multivalue

I believe it does now. The GitHub repository for the proposal is archived, and its README says it was accepted 4 years ago.
