
ENV module aliasing #943

Open
guybedford opened this issue Mar 14, 2017 · 20 comments

Comments

@guybedford

I'm interested in compiling C to WASM in a way that allows me to alias the 'ENV' import for external functions ((import "env" "external_func"...) output in wast).

The natural thing I'd like to be able to do is to supply a function attribute in C that associates the external function with an import module name in WASM - something like:

void custom_logger(int val) __attribute__ ((import, name ("./custom-logger.wasm")));

This repo seems to define "env" as the default import module, along with an importMap system, although I couldn't tell how that could be configured.

Does something similar to this already exist, or does work along these lines sound like a worthwhile effort to pursue through the toolchain?

@kripken
Member

kripken commented Mar 14, 2017

Is the idea that you'd import one wasm module from another? You'd need to specify a function name then, not an entire wasm module, unless you're suggesting an approach where there is a whole module for each function?

In the current toolchain, every unsupplied symbol generally becomes an import from env. People then typically implement those in JS (using --js-library etc.). For implementation in wasm, that could be done using dynamic linking, e.g. dlopen, which works (but is not optimized).
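
For illustration, a minimal sketch of the --js-library route mentioned above, supplying the custom_logger symbol from the earlier example as an "env" import (the library filename here is hypothetical; it would be passed as --js-library custom_logger_lib.js):

// custom_logger_lib.js - rough sketch, assuming Emscripten's JS library mechanism.
// The unresolved C symbol custom_logger becomes an import from "env" and is
// satisfied by this JS implementation when linking with --js-library.
mergeInto(LibraryManager.library, {
  custom_logger: function (val) {
    console.log('custom_logger called from wasm with', val);
  }
});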

@guybedford
Author

Potentially for importing a pre-built wasm library from wasm, but a better example might be loading a JS function (say a logger) that we know already exists in compiled form in JS. The function name can default to the same name as the C function; a custom symbol could be provided for that as well, but I'm not sure that flexibility is strictly necessary.

It's just about controlling the 'env' name when the import is known to resolve elsewhere during the linking process, to get some analog of C-style static library linking.

Let me know if that makes sense?

@kripken
Member

kripken commented Mar 15, 2017

I don't think I understand yet. Where do you replace "env" in your example void custom_logger(int val) __attribute__ ((import, name ("./custom-logger.wasm")));?

@guybedford
Author

The example workflow would replace 'env' in the output with './custom-logger.wasm' (should really use '.js' in the example though), where 'void custom_logger(int val)' is a function declaration only.

@guybedford
Author

To clarify the exact workflow, the idea is that the following:

void custom_logger(int val) __attribute__ ((import, name ("./custom-logger.js")));

where custom_logger is treated as an external symbol, provides the WAST output:

(import "./custom-logger.js" "custom_logger" (func $custom_logger (param i32)))

The process of setting this env path could be an option elsewhere in the workflow as well; an attribute just seemed like the thing most easily carried through the chain of tooling, from what I could tell.
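
For reference, with plain WebAssembly.instantiate (outside of any loader integration), that module name would simply become a key in the import object. A rough sketch, assuming the WAST output above, with a placeholder .wasm path:

// Rough sketch: the "./custom-logger.js" module name from the import above
// is satisfied by a matching key in the import object at instantiation time.
const wasmBytes = await fetch('./module.wasm').then(res => res.arrayBuffer());
const { instance } = await WebAssembly.instantiate(wasmBytes, {
  './custom-logger.js': {
    custom_logger: (val) => console.log('custom_logger:', val)
  }
});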

@kripken
Member

kripken commented Mar 15, 2017

I see. Yeah, seems like an option to allow customizing the module name for an import makes sense. Curious what other people think.

@guybedford
Author

@sunfishcode I'd be interested to hear your feedback here if you can. It's just about ensuring we have the tools for managing this compilation boundary down to the runtime semantics, in whatever form.

@dschuff
Member

dschuff commented Mar 17, 2017

This capability is pretty much the sort of thing we've been thinking about defining along with the linking work.

At a high level, we want a way to define:

  • What functions are exported when a module is linked (this would probably include some C++ syntax and a linker capability; see also Valid imports and exports on the web #585)
  • What the "module name" of a linked module is. This would at least include a JS capability because you explicitly provide this mapping from import module name to module when you instantiate a wasm module. But it might also include some sort of metadata on the module, so the JS loader code knows what the module thinks its name is
  • How undefined functions are handled when linking, and what module name imported functions have after linking. This would also include both C++ syntax and linker capability, and is what you are talking about here.

Obviously how this is designed depends on how the overall static and dynamic linking scheme is expected to work. We've been working on the linking bits in https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md and in the LLVM/binaryen/wabt code and haven't gotten as far as the C++ syntax yet, but

  1. we are getting close enough now that it's probably worth bringing it up again and
  2. it makes sense to have the syntax also work as an "escape hatch" to operate outside that system. Because it totally makes sense to say "I just want to provide some JS module and function and make it work myself", which is kind of what I think you are talking about here. Also emscripten's various wasm<->JS binding mechanisms would presumably build on this primitive too.

@dschuff
Member

dschuff commented Mar 17, 2017

oh and along with 2 above, I think the syntax you propose here is pretty much along the lines of what we've been thinking about too.

@sunfishcode
Member

As far as I'm aware, the "env" module is just something that works for simple cases right now, rather than being part of a more elaborate plan.

The way two-level namespacing works on Darwin is that, at static link time, the linker resolves symbols that will be provided by dynamic libraries, and records which library each symbol is resolved by. I expect we'll eventually have something similar to this, where the source code doesn't specify module names, and then linkers can rewrite the module name afterwards when it's determined which modules will be providing which symbols.

Do we also need some way for source code to specify module names up front? It would certainly be useful right now, while most of the tools aren't mature and there aren't many other options. Longer term, it might be a little redundant with the linker module detection, but it might continue to be more convenient for some use cases.

@dschuff
Member

dschuff commented Mar 17, 2017

Yeah I agree that linker module detection would be the expected way for things to work, but I also think it would be useful to have the "escape hatch" to allow the user (or some source-level framework) to specify how particular symbols are imported.

@guybedford
Author

guybedford commented Mar 17, 2017

To give some background, I'm the author of SystemJS and node-es-module-loader, both of which provide the ability to load WASM modules alongside ES modules in the browser and NodeJS respectively.

As I've been experimenting with build tooling for WASM, I've been defining this boundary by using visibility attributes on functions exactly as in the #585 approach, which works perfectly for my needs, and the attribute proposed here would then handle exactly the module name component.

Certainly this is a process that should be handled by the linker and overall build tooling, with the resolution determined by the tool - so the attribute shouldn't necessarily be used directly by end users. But if the attribute were supported through the toolchain, that could enable a path for linker tooling to use it to communicate with LLVM, and it also happens to provide a path forward for tooling today.

So my question is - does a function attribute make sense as the unit of management for link tooling in future, and if not, what other mechanism might there be?

@sunfishcode
Member

The use case of typical C/C++ code that doesn't have module names, and that relies on the linker to assign module names, wouldn't use attributes. In this scenario, the linker doesn't need to feed any information back into LLVM etc.; it just needs to add module names to the imports of the output wasm module.

So, I think the answer to your specific question is that no, the attribute can't be the exclusive unit of management for link tooling. However, it may yet be worth having, alongside other tools.

One question I have is about the meaning of module names. A filesystem path like "./custom-logger.js" is probably very handy for simple cases, because you can just provide that file yourself at the appropriate path in the filesystem. But it's another matter in the case of using libraries written by different people. If you're using a library, do you want it using attributes to specify paths to where it expects various modules will be at runtime? Will module names differ if you're using node versus running in a browser? I haven't thought about this deeply and don't know what the answers are.

@guybedford
Author

The linker in C/C++ is well-established enough that resolving a symbol can be assumed to follow a standard protocol. We don't have this same thing in the world of the JS/WASM runtime - rather we have NodeJS module resolution and browser module resolution, and then compilers and build tools all have their own configurations and options for customizing this module resolution.

I think it will be important for LLVM to be able to serve this need, so that WASM compilations can fit into the JS ecosystem in this way. This would mean accepting that LLVM shouldn't be a source of truth for determining the output import names, but rather should allow these to be tuned from tools that drive the process around LLVM. It is possible in JS (and likely also in WASM) to build modules without even having the external modules present at build time, since we don't need binary information when the export interface is already clear.

A function attribute would lend itself well to being driven through tooling; alternatively, there could be some ability to provide a manifest describing the import boundary and what import names should be emitted.

I really think some path here is needed, and it would be great to see how progress can be made. I'd be interested to hear what the best way to go about this would be - whether it is already being taken care of, whether it's not yet necessary or ready, or whether I can assist in any way.

@sunfishcode
Member

I'm not opposed to the attribute. I'd like to understand it a little better.

Would you mind spelling out a use case or two in a little more detail? Who would typically pick the module names, and at which phase in the build pipeline relative to C/C++ compilation? What information do they include (filesystem paths, versioning, prefixing/mangling, or anything beyond a simple identifier)?

Do module names ever need to be different depending on the environment the code will be run in? For example node versus browser? Or, runtime deployments with differing filesystem layouts?

@guybedford
Author

So the assumption I'm running on here is that we would be able to load WASM alongside ES modules in both NodeJS and the browser via:

import {fn} from './file.wasm';
export function thing () {
  return fn();
}

with WASM in turn loading JS and other WASM imports using the same loader resolution process, which seems to be a direction discussed in the WASM docs.

The general workflow for JS apps is to start with something that loads based on the NodeJS module resolution system, then to build that into an optimized browser build. It's pretty clear that this heavily ingrained approach is, at the very least, not going to change easily.

Say I'm a library author writing a piece of JS code I want to publish to npm, which has a part of itself written in WASM, and the WASM also imports from JS. In writing the WASM file that imports a local JS file, I would want to be sure that the import specifier in file.wasm is loading ./wasm-dep.js from the same folder.

So assuming this kind of JS/WASM module loading integration in the browser and in Node (which, while I understand the dynamic WebAssembly.instantiate will always be supported, enables easier portable protocols for integration into Node and the browser), the only information needed is the ability to control this output module specifier, which defines the JS/WASM static import boundary. Even if my library is built using some magical new language build tool compiling to JS and WASM, if I want to publish portable code to npm, that tool would likely also want the ability to set this module name in order to provide portable library interfaces from WASM.

For browser optimization a build tool will then take that combined tree and perform optimizations, including perhaps mangling and altering module names and resolutions for the JS/WASM interface boundaries, but this process would at this point likely be independent of LLVM, at least initially in any tooling that would be developed for this. So the main use case that applies initially I think is the one of creating portable libraries for publishing to npm or browser CDNs.
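
To make the loader side of this concrete - purely a hypothetical sketch, not an existing spec or implementation, with made-up function names - a loader could read the import module names out of the binary and resolve them against the wasm file's own URL, which is how the './wasm-dep.js' case above would stay local to the publishing library:

// Hypothetical loader sketch: resolve each import's module name (e.g. './wasm-dep.js')
// against the URL of the wasm file itself, then instantiate with the resolved modules.
async function loadWasmModule(wasmUrl) {
  const bytes = await fetch(wasmUrl).then(res => res.arrayBuffer());
  const module = await WebAssembly.compile(bytes);
  const importObject = {};
  for (const { module: moduleName, name } of WebAssembly.Module.imports(module)) {
    const resolvedUrl = new URL(moduleName, wasmUrl).href; // assumes wasmUrl is absolute
    const dep = await import(resolvedUrl);
    importObject[moduleName] = importObject[moduleName] || {};
    importObject[moduleName][name] = dep[name];
  }
  const instance = await WebAssembly.instantiate(module, importObject);
  return instance.exports;
}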

@guybedford
Author

To answer the other question, for the Node vs browser code distinction, the standard process is to provide separate entry points into the app (main and browser configured in package.json), then to rely on static dead code elimination from inlining conditions like if (typeof window !== 'undefined') require('browser-code');. Variations on these themes will likely scale naturally to the WASM/ES module contracts for builds between Node and browser.

@sunfishcode
Member

Ok. For this use case of portable libraries for publishing to npm or browser CDNs specifically:

Who would typically pick the module names, and at which phase in the build pipeline relative to C/C++ compilation? I'm not very familiar with modules, but from what I'm reading now, it looks like browser-compatible module names are URLs (possibly relative), which seems like they'd be fairly site-specific, and not the kind of thing that a C/C++ library shared by multiple sites would want to have hard-coded. Consequently, the main use case for this attribute would be in tools that generate C/C++ header files, for creating builds tailored to specific sites. Does that sound right?

@guybedford
Author

Yes exactly, this is very much for tooling wrapping headers or operating on the bc / s files, assuming we have this ES module / WASM module loader integration.

It's a little wider than that though, as even at the development phase there is a need to control the module specifier - so it would be part of my process even as an app developer, building C into wasm with the right names.

There may also be cases where shared C/C++ libraries want to hard-code these specifier names. Say the C/C++ library wanted to specify a dependency on a WASM library from npm itself, for the "wasm architecture" case - it could hard-code that dependency from C as a plain specifier like c-lib/x. But mostly I'd expect tooling injection to be the workflow.

@sunfishcode
Member

Thanks, I think I have a handle on it now. The import directive sounds good to me.
