
WASM Plugin / extension system #104

Closed

thedodd opened this issue Dec 31, 2020 · 17 comments

Labels: discussion (This item needs some discussion), needs design (This item needs design work)

Comments

@thedodd (Member) commented Dec 31, 2020

Abstract

Given that there will always be new features that folks want to add to Trunk, but adding too much to Trunk core would cause a fair amount of bloat, we will eventually want to expose a plugin system. The following is a non-exhaustive list of what we would like to see from the plugin system (please comment below if you would like to add or remove items):

  • Ability for users to create their own plugins which will be loaded by the Trunk CLI and will be called as part of the standard Trunk build pipeline.
  • Ability for plugins to declare the asset types which they will operate on.
    • In the source HTML, this should be declared as something like <link data-trunk data-plugin rel="my-plugin-type" any-other-attrs="will be passed to plugin"/> (this needs further discussion).
    • Trunk should see that the link is a plugin, and then will call any registered plugin which matches the rel="my-plugin-type".
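
For illustration only, here is a rough sketch (not a committed design) of how Trunk might dispatch such a directive to a registered plugin; the Plugin trait, registry, and attribute map are all hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical plugin interface; Trunk keys its registry by the rel value
// from the <link data-trunk data-plugin rel="..."/> directive.
trait Plugin {
    fn run(&self, attrs: &HashMap<String, String>) -> anyhow::Result<()>;
}

fn dispatch(
    registry: &HashMap<String, Box<dyn Plugin>>,
    rel: &str,
    attrs: &HashMap<String, String>,
) -> anyhow::Result<()> {
    match registry.get(rel) {
        // All remaining attributes from the directive are passed to the plugin.
        Some(plugin) => plugin.run(attrs),
        None => anyhow::bail!("no plugin registered for rel=\"{}\"", rel),
    }
}
```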

The Trunk team should build and maintain a trunk-plugin library which exposes common types used by Trunk itself, and which plugins should use to facilitate communication between Trunk and the plugin. Big shoutout to @lukechu10 for pointing out that WASM is perfect for this.

  • This library will expose types needed to declare the Trunk ABI version which the plugin is using. This will allow us to safely evolve the plugin ABI without breaking old plugins.

Open questions:

  • Which runtime should we use? Personally I like wasmer a lot. I've used it a bit and it is pretty solid. A lot of other folks in the WASM community outside of the Rust context are using it as well.
  • What should the ABI look like which the WASM plugin modules need to expose?
  • What are the ABI capabilities which Trunk should expose to plugins? This is gonna be a big item for discussion.
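
As a very rough sketch of the versioning piece mentioned above (names are illustrative, not a committed API), the trunk-plugin crate could expose something like:

```rust
// Hypothetical types the trunk-plugin crate could expose for ABI versioning.
pub const TRUNK_PLUGIN_ABI_VERSION: AbiVersion = AbiVersion { major: 1, minor: 0 };

/// Declared by every plugin so Trunk can refuse (or adapt to) incompatible
/// modules before calling any other part of the ABI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AbiVersion {
    pub major: u16,
    pub minor: u16,
}

impl AbiVersion {
    /// A plugin is usable if it was built against the same major ABI version.
    pub fn is_compatible_with(&self, host: AbiVersion) -> bool {
        self.major == host.major
    }
}
```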

We need to gather some feedback.

  • What do folks want to do with these plugins? This is nothing new. Many of the build/bundle tools in the JS ecosystem have plugin systems.
  • What sort of data will the plugins need?
  • What should our algorithm be for plugin discovery?

We will also want to discuss what Trunk itself should do in order to aid plugin authors in compiling, optimizing and publishing their WASM-plugins.

There is a lot to discuss here.

@thedodd added the "discussion" and "needs design" labels on Dec 31, 2020
@rakshith-ravi (Contributor) commented Jan 4, 2021

Here are some of the possible ways we can approach this:

Option 1: FFI


The plugins can be exported as libraries by the plugin author and they would be imported as ".so" files through dynamic linking by trunk itself.

Pros

  • Well, it's fast. It's all native binaries at the end of the day, so it's super fast, and the interface is rigid. Not much can go wrong, and it would simplify error handling.
  • That's pretty much it, in my opinion. Maybe I'm missing something? Idk

Cons

I see a few:

  • API (and consequently, ABI) versioning is horribly difficult. We'll need a function that checks the versions that the plugins are compatible with (if they're built with an older version), and if they're not compatible, we'll either need to provide a polyfill layer (which is a pain when it comes to ABIs) or ignore that plugin and throw a warning to the user. Moreover, the ABI for the function to get the plugin version (fn get_plugin_version() -> Version) itself can't change. If that changes, then you can't even call the function that returns the version of the plugin to know whether it's compatible. (See the sketch after this list.)
  • FFI is known to be unsafe and entering unsafe territory is, well, unsafe. We'll have to heavily test for memory issues and segfaults. More importantly, we have to make sure a rogue plugin can't modify values we don't want to give it access to by manipulating the memory.
  • If a new plugin is installed while trunk is running, reloading a library dynamically while the application is running is very difficult. It's not impossible, but we'd rather focus on features in trunk than issues like those, wouldn't we?
  • All the plugin authors would have to build multiple library files (.so, .dll, etc.), one for each platform, in order to distribute the plugin, which again adds to the complexity of plugin distribution. Moreover, the plugin author would have to test the plugin on each platform, and if they don't, it gives a bad experience to the end user when they hit a platform-specific bug. Additionally, if the end user is running an obscure system (Raspberry Pi, arm64, armv8, etc.), then the plugin may not work on their system even though it has been tested on Linux, albeit only x86.
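
To make the first bullet concrete, a minimal sketch of that version handshake, assuming the libloading crate (0.7) and a hypothetical trunk_plugin_abi_version export:

```rust
use libloading::{Library, Symbol};

// Sketch of the version handshake from the first Cons bullet above. The symbol
// name and return type are hypothetical, and this is exactly the part of the
// ABI that could never change once plugins exist in the wild.
fn plugin_abi_version(path: &str) -> Result<u32, libloading::Error> {
    unsafe {
        let lib = Library::new(path)?;
        let get_version: Symbol<unsafe extern "C" fn() -> u32> =
            lib.get(b"trunk_plugin_abi_version")?;
        Ok(get_version())
    }
}
```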

Option 2: Microservice approach


We can follow a microservice-like architecture where each plugin is a microservice that communicates with the main trunk CLI over a socket (either a TCP socket or a Unix socket, although Windows is finicky about Unix sockets) and transfers data that way. I took a similar approach when I was working on project Juno. In fact, we could use project Juno itself and build on it, since we'd save time by not writing most of the base for the communication layer, and project Juno provides Rust libraries to help with the communication.
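
As a rough illustration of the socket exchange (the socket path and line-delimited JSON framing here are assumptions, not how Juno actually works):

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

// Minimal sketch: trunk writes one JSON request line to the plugin's socket
// and reads one JSON response line back. Unix-only as written.
fn call_plugin(socket_path: &str, request_json: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;

    let mut response = String::new();
    BufReader::new(stream).read_line(&mut response)?;
    Ok(response)
}
```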

Pros

  • The APIs can be easily upgraded whenever needed.
  • The plugins communicate through a socket, which means the plugins can be written in any language that the plugin authors feel comfortable with.
  • Since the plugins can be written in any language, we could write a polyfill library that takes any webpack / gulp plugin and interfaces it with the socket, automatically increasing the number of plugins compatible with trunk. This might not necessarily work, and I haven't looked into how it could be done, but it's an option. If it does work, it would drastically increase the capability of trunk.
  • Using something like Juno would give the plugins "hooks" that they can attach to and respond to those hooks if required.
  • The entire socket communication and handling communication wouldn't have to be done by us anymore.
  • Plugin discovery is automatically handled by an accompanying project to Juno called Guillotine (which I haven't documented yet, but it basically handles running binaries and modules, and automatically restarts them if they crash).

Cons

  • It will still involve multiple binaries running independently, which is very difficult to keep track of. Works for a microservice architecture, but for something that's gonna run on a user's system, it's very difficult to debug if something goes wrong. Would not recommend going down that path.
  • The problem of binaries being built for each platform still exists. The plugin will have to be built for each platform independently. And if the user is using some obscure platform that rust and trunk supports but not the plugin, it'll become difficult for them.
  • Rogue plugins can do some serious damage to the end user's system if they're not kept in check.

Option 3: Embedded scripting


My personal favorite: some kind of embedded scripting to control what the plugin does. I'd personally recommend Rhai, but I'm open to suggestions. The plugin content can be downloaded as a script file, loaded into Rhai, and executed from there.
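
A minimal sketch of what that could look like, assuming a hypothetical hook that transforms one file's contents (the hook shape and scope contents are made up):

```rust
use rhai::{Engine, Scope};

// Run a plugin script against one file's contents. Only data explicitly
// pushed into the scope is visible to the script, which is where the
// control over what the plugin sees (mentioned below) comes from.
fn run_rhai_plugin(
    script: &str,
    file_contents: String,
) -> Result<String, Box<rhai::EvalAltResult>> {
    let engine = Engine::new();

    let mut scope = Scope::new();
    scope.push("contents", file_contents);

    // The script is expected to evaluate to the transformed file contents.
    engine.eval_with_scope::<String>(&mut scope, script)
}
```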

Pros

  • Simple and dynamic.
  • Plugins can be distributed using a git repo (kinda like how cargo distributes its index, or how TypeScript distributes its "DefinitelyTyped" package). Each plugin can be a folder in a git repo with a bunch of rhai files, and that repo can be used to distribute content to the end user.
  • We could possibly have a trunk-plugin.toml file in the root of the plugin folder which declares what hooks it listens to and possibly which file to execute for which hook (if the plugin has multiple files).
  • Rhai makes it safe, since we can control what data the plugin gets and what it doesn't.
  • A crash in a plugin wouldn't crash the entire application.
  • Plugins are isolated from each other, and they can be exposed to each other if (and only if) required.
  • The plugins are only written once and not compiled, so the end user does not have to worry about support and distribution of plugins for obscure platforms.
  • Dynamic loading of plugins (and even hot-reload of plugins) becomes exponentially easier to deal with.

Cons

  • It's slow. It's naturally gonna be slower than compiled Rust, but I don't think it's too slow (since Rhai is used for game dev, it's gotta be fast enough), and I believe it's worth the tradeoff.

My personal suggestion is to go with option 3. But I'd love to hear more opinions and suggestions so that we can iterate on this and come up with a good approach to this 😬

Food for thought:
Do we want to have a separate .trunk folder in the root of the repo (and consequently move Trunk.toml there?) that holds all the trunk config as well as the plugins folder? It'd give us a good working space to put a plugin cache (or maybe plugin data, if a plugin wants to store its own data) and such without worrying about bloating up the user's project.

@ranile (Contributor) commented Jan 4, 2021

If we can do safe FFI (like with rodrimati1992/abi_stable_crates) then that sounds like the best option to me. It's an option worth exploring. If that doesn't work out for whatever reason, scripting language support definitely sounds like a good option (maybe we can even use Python, if that's possible).

@lukechu10 (Contributor) commented Jan 20, 2021

How about using wasmer or wasmtime and running the plugins in a sandboxed wasm container? It's fast (near-native speed), secure (because of the sandbox), hot-reloadable (I think), and can eventually support different languages, even though Rust is probably the best fit. It is also 100% cross-platform. The plugin author can ship the same .wasm file to any platform that supports wasm.

@thedodd (Member, Author) commented Jan 20, 2021

@lukechu10 ahhh yea. I love that idea! I'm really glad that you mentioned this @lukechu10! I feel like we would be fools not to pursue this as a foremost option. We are literally a bundler for Rust WASM hahaha!

  • We embed the wasmer (or wasmtime) runtime.
  • Have a config option for the default path to look for plugins.
  • Load the plugins when we see a directive which uses a plugin by name.
  • We can define the WASM ABI which plugins need to expose in a Trunk library. The interface will allow us to identify the plugin and call its various ABI methods.

This is good. Love it.

Part of what we will need to do is provide specific levels of access to filesystem or other such capabilities, but that can be an integral part of the plugin ABI.
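
For a sense of scale, a minimal sketch of loading and calling a plugin module, assuming the wasmer 1.x API and a hypothetical abi_version export:

```rust
use wasmer::{imports, Instance, Module, Store};

fn load_plugin(wasm_bytes: &[u8]) -> anyhow::Result<()> {
    // Compile the module with the default engine.
    let store = Store::default();
    let module = Module::new(&store, wasm_bytes)?;

    // No host imports in this sketch; a real plugin ABI would expose
    // capability-gated functions here (FS access, logging, etc.).
    let instance = Instance::new(&module, &imports! {})?;

    // Hypothetical exported entry point: `abi_version() -> i32`.
    let abi_version = instance.exports.get_function("abi_version")?;
    let result = abi_version.call(&[])?;
    println!("plugin reports ABI version: {:?}", result);
    Ok(())
}
```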

@thedodd (Member, Author) commented Jan 20, 2021

I am going to update the header of this issue to reflect this option as the primary option moving forward. My big questions right now if we go with this option are:

  • Which runtime should we use? Personally I like wasmer a lot. I've used it a bit and it is pretty solid. A lot of other folks in the WASM community outside of the Rust context are using it as well.
  • What should the ABI look like which the WASM modules need to expose?
  • What are the ABI capabilities which Trunk should expose to plugins? This is gonna be a big item for discussion.

@thedodd changed the title from "Plugin / extension system" to "WASM Plugin / extension system" on Jan 20, 2021
@DzenanJupic (Contributor) commented:

I love the idea of having a plugin system. That would be quite useful, and I'd like to contribute to that.
In my opinion, it would be nice to allow for specifying when the plugin runs (before, while, or after the usual asset pipelines).

@thedodd (Member, Author) commented Mar 15, 2021

@DzenanJupic agreed. Currently the pipeline system has no concept of stages; everything is just spawned and executed eagerly. I do think that introducing a stage concept would be quite valuable, along with the artifacts manifest we've been discussing in other contexts. Having a set of stages plus an artifact manifest would allow for a fairly robust post-processing system — among other things. Off the cuff, we could use the following set of stages:

  • pre-build: anything which can be executed before the main cargo build.
  • build: this basically corresponds 1:1 with the rust-app cargo build, but should probably also include any rust-workers when we finish up that work in the future.
  • post-build: anything which needs to be executed after the main cargo build.

Alternative

That said, instead of using stages we could use a graph structure. Here's the idea:

  • Every pipeline — defined in the index.html or implicitly via defaults (like the default rust-app) — will be a root node of the build graph.
  • We could update trunk to look for the attributes data-id & data-depends-on.
    • data-id is a way to attach a trunk specific ID to a pipeline (trunk will ensure they are unique).
    • data-depends-on would just be a reference to some pipeline by ID, and trunk will ensure that the dependent pipeline is not invoked until the dependee pipeline has completed successfully, and trunk will pass the output of the dependee to the dependent.
    • We could even make the data-depends-on attribute take a vector of IDs, in which case the pipeline would take multiple inputs and would not be invoked until all dependencies have completed successfully.
    • For plugins (which will quite likely be WASM), trunk could just serialize the output as JSON or the like, and then pass that along to dependents. In such a case the WASM pipeline would just have to parse the JSON as an initial step.

This has the advantage of being able to explicitly declare dependencies and order of execution. With stages, dependency order is not really explicit, and if one pipeline depends upon the output of another pipeline from the same stage, then dependency resolution is a bit of a mess.

Turns out, there are lots of nice tools for working with graphs in Rust. I'm actually leaning towards the graph approach. @DzenanJupic thoughts?
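
For illustration, a sketch with petgraph (one of those tools), assuming hypothetical data-id values and edges running from a dependency to its dependents:

```rust
use petgraph::algo::toposort;
use petgraph::graph::DiGraph;

// Nodes are pipelines (by their data-id), edges run from a dependency to the
// pipeline that depends on it; a topological sort yields an execution order
// or reports a cycle.
fn execution_order() -> Result<Vec<String>, String> {
    let mut graph = DiGraph::<&str, ()>::new();
    let rust_app = graph.add_node("rust-app");
    let sass = graph.add_node("sass");
    let manifest = graph.add_node("manifest-plugin");

    // e.g. the manifest-plugin declared data-depends-on="rust-app sass".
    graph.add_edge(rust_app, manifest, ());
    graph.add_edge(sass, manifest, ());

    toposort(&graph, None)
        .map(|order| order.into_iter().map(|ix| graph[ix].to_string()).collect())
        .map_err(|cycle| format!("dependency cycle involving {:?}", graph[cycle.node_id()]))
}
```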

@DzenanJupic (Contributor) commented:

@thedodd That sounds like a great idea and definitely has advantages over just using individual build stages.

Especially because this also allows for parallel execution of independent paths in the graph. If, for example, there's a JS plugin that minifies JavaScript files and removes dead code, another plugin/pipeline for compiling SASS could just run side by side with the JS plugin. So implementing a graph-like build system would come with great performance advantages.

@DzenanJupic (Contributor) commented Mar 17, 2021

Regarding the question of what the API of the WASM module should look like:

I think that there are three main ways to do this:

  1. Define a specific set of extension types that have predefined APIs. So for example there could be a type that takes the content of a file, processes it, and spits out a new version of that file. The WASM module would then contain some kind of function fn run(String) -> Result<String, Error>. The user could, e.g., specify the type with a data-plugin-type attribute, like: <link data-trunk data-plugin href="main.js" rel="js-minifier" data-plugin-type="process_file"/>.

    • Pros:
      • Straightforward to implement and use
      • Secure by default
      • Easy to handle
    • Cons:
      • Inflexible, since there needs to be a type for every use case
      • Backwards compatibility could become a problem later on
  2. Supply a bunch of host functions to the WASM module. These host functions could be used via #[wasm_bindgen] extern "C" { /* ... */ }. Possible functions would e.g. be read_file, read_dir, or write_file. So basically a restricted clone of WASI with additional security features like "only allow the plugin to read from this dir", or "only expose this file to the plugin". The user could e.g. define a data-plugin-permissions attribute to give certain permissions to the plugin. The WASM module would then contain a fn main() -> Result<(), Error> { /* ... */ } that would do the rest.

    • Pros:
      • Flexible
      • A lot of freedom for the plugin author
      • The extension can retrieve inner state from trunk at runtime
    • Cons:
      • More complex implementation
      • A wrong implementation can lead to security vulnerabilities
      • Backwards compatibility could become a problem later on
  3. Allow the user to specify the arguments themselves. There are two ways here. Either have one attribute, e.g. data-plugin-arguments, that allows the user to pass a list of arguments like file(path/to/file), 'some string', 42 to the plugin, or, like @thedodd proposed here, pass all data- attributes to the plugin. The second option is actually better since the naming of the attributes eliminates the need to remember the correct argument order.
    As @thedodd mentioned in this comment, even though the user can specify arguments freely, they should probably not be passed to the extension directly. Instead, trunk will store them in a data structure defined in the trunk-plugin crate and pass it to the WASM module.

    • Pros:
      • Flexible
      • Complete user control
      • Secure by default

I personally think that a mix of two and three could be a good fit. Option two would give plugin authors the ability to communicate with trunk, which can be quite useful. And option three gives the user a lot of control over the plugin.
Option one on the other hand would definitely be the easiest to use for both the user and the plugin author.
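
To make option two a bit more concrete on the host side, here is a sketch of exposing a restricted host function through wasmer's import object (the namespace, function, and permission model are all hypothetical):

```rust
use wasmer::{imports, Function, Instance, Module, Store};

fn instantiate_with_hooks(wasm_bytes: &[u8]) -> anyhow::Result<Instance> {
    let store = Store::default();
    let module = Module::new(&store, wasm_bytes)?;

    // Hypothetical host import. The plugin would declare it as
    // `extern "C" { fn file_size(handle: i32) -> i64; }` under the "trunk" namespace.
    fn file_size(handle: i32) -> i64 {
        // A real implementation would consult the permissions granted via
        // data-plugin-permissions before touching the filesystem.
        let _ = handle;
        0
    }

    let import_object = imports! {
        "trunk" => {
            "file_size" => Function::new_native(&store, file_size),
        }
    };

    Ok(Instance::new(&module, &import_object)?)
}
```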

What do you think? Is there something I forgot or got wrong?

@thedodd (Member, Author) commented Mar 17, 2021

> Regarding the question of what the API of the WASM module should look like:

@DzenanJupic solid overview. I would actually propose a modified version of option 3., which I've somewhat described above in the paragraph where I talk about the trunk-plugin crate.

  • Instead of users having to declare the arguments to their WASM plugins, Trunk will pass a serialized blob of data to the plugin. Probably just JSON.
  • Our trunk-plugin library (which plugin authors will use to aid in creating their plugins) will declare an ABI versioned model which folks can use along with serde (or some other tool if being written in a different language) to deserialize the blob.
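
A rough sketch of what that versioned model might look like on the plugin side (field names are illustrative only):

```rust
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

// Hypothetical v1 input payload Trunk would serialize to JSON and hand to the plugin.
#[derive(Debug, Serialize, Deserialize)]
pub struct PluginInputV1 {
    /// ABI version of the payload, so a plugin can refuse layouts it doesn't know.
    pub abi_version: u32,
    /// Trunk-generated ID of the <link data-trunk data-plugin .../> element.
    pub trunk_id: String,
    /// Remaining data-* attributes from the link element, passed through verbatim.
    pub attrs: HashMap<String, String>,
}

/// Plugin-side deserialization of the blob, via serde.
pub fn parse_input(blob: &str) -> serde_json::Result<PluginInputV1> {
    serde_json::from_str(blob)
}
```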

This will provide all of the Pros you've outlined for 3., but with none of the Cons, as we will be providing a well-defined and versioned model for the data which is passed to the plugin.

We can pass along whatever data we see fit to the plugins, but at a minimum it should include the trunk-id (generated by Trunk) of the <link data-trunk data-plugin .../> element which declared the plugin.

  • As mentioned above, Trunk will also call a specific expected function on the plugin to do post-processing of the HTML. This is exactly how Trunk currently works. As pipelines finish, Trunk executes their completion callbacks passing in a mutable handle to the HTML (one at a time, the callbacks are not concurrent).
    • The trunk-plugin module could also provide a mechanism to handle parsing of the HTML using the same library which we use internally in Trunk.
    • There is probably a lot more to cover on this front, but generally this should solve the problem.

Outputs

That said, if we go with the modified 3., we should also require that plugins provide a well-formed JSON (or whatever) blob of output, which Trunk will be able to use for various things (e.g., updating manifest files and the like, see #9 for more details).

Downloading Plugins

Fortunately for us, @dnaka91 has done some really great work over in #146 related to downloading & caching of various artifacts (wasm-bindgen, wasm-opt &c). We SHOULD be able to use these same abstractions to find and download WASM modules to be loaded into our WASM runtime for execution.

There are still outstanding questions on exactly what our protocol should be for defining the location of modules. An easy approach may be to just require plugin authors to use https://wapm.io/ (WASM package manager), but maybe we need more flexibility / less lock-in.

Thoughts?

@dnaka91 (Contributor) commented Mar 19, 2021

Just adding my two cents.

I would like to propose a 4th option: Protobuf/gRPC. I was looking a bit at the wasmtime and wasmer crates, and interacting with a wasm module seems to require quite a lot of unsafe code and in general a lot of effort to pass data around.

Therefore, it could be combined with solution 3 and just use Protobuf for all data. This would give us a properly defined interface and allow for easy future upgrades. JSON might work as well but doesn't have a schema that can be easily shared.

Another approach would be to require the plugin to be a binary that's then simply called with the Protobuf message passed in over stdin (that's how the protoc compiler does it).
That would not require any setup to load a binary, just spawning a process. In addition, it would allow writing the plugin in almost any language, as we're not limited to only wasm.
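
A minimal sketch of that protoc-style invocation from the Trunk side (the request here is assumed to already be an encoded Protobuf message, e.g. produced by prost):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Spawn the plugin binary, hand it the encoded request over stdin, and read
// the encoded response from stdout once it exits.
fn call_plugin_binary(plugin_path: &str, request: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new(plugin_path)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Dropping the handle after the write closes the pipe so the plugin sees EOF.
    child.stdin.take().expect("stdin was piped").write_all(request)?;

    let output = child.wait_with_output()?;
    Ok(output.stdout)
}
```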

And if the binary needs to keep running (for example, invoking it again and again could be slow due to startup cost every time), then eventually gRPC could be used, run over stdin/stdout or local TCP.

These approaches have similar issues to option 2, the microservice approach, but would help simplify the interface between trunk and plugins.

This is just an idea from my side, option 3 sounds great as well.

@thedodd (Member, Author) commented Mar 19, 2021

@dnaka91 thanks for the input. Very interesting set of options. I do think the binary approach (a la protoc) is worth consideration.

One thing I would like to share: https://github.com/wasmerio/wasmer/blob/master/examples/table.rs

  • This example demonstrates a few patterns for loading a WASM module and then invoking an exported function with some arguments.
  • No unsafe required, everything is nicely wrapped in result types and standard Rust error handling.
  • We will need to do a bit of experimentation, but we could totally pass along a bytes array (maybe using just pointers if wasmer-types crate doesn't help) to the plugins.
  • The bytes array we pass to the WASM functions could be protobuf. We could just publish a protobuf file in the root of our repo which folks could reference for codegen in their own plugin repo, which could be any other language.

I will say that I have a lot of hesitation about getting into the land of starting networked components for the plugin system. As soon as we get into that realm, the number of potential problems which users may come across will rise sharply. I would like to keep that option as a last resort.

@thedodd (Member, Author) commented Mar 19, 2021

Another thing which we will probably want to do, if we go down the WASM path (which almost without a doubt we will/should), is provide a Trunk CLI subcommand to aid in compiling Rust WASM plugins. We could leverage the work @dnaka91 has already started on downloading and using wasm-opt to optimize the final WASM for the user.

I'll add a few notes on this topic to this issue's description.

@DzenanJupic (Contributor) commented:

> There are still outstanding questions on exactly what our protocol should be for defining the location of modules. An easy approach may be to just require plugin authors to use https://wapm.io/ (WASM package manager), but maybe we need more flexibility / less lock-in.

> Another thing which we will probably want to do, if we go down the WASM path (which almost without a doubt we will/should), is provide a Trunk CLI subcommand to aid in compiling Rust WASM plugins.

How about a similar approach to cargo with the cargo install command?
Trunk could have its own ~/.trunk directory. When calling trunk install <PKG>, trunk would look for the plugin on https://wapm.io/ and install it in this directory. Alternatively, plugins could also be installed from GitHub via a --git flag or from a folder via a --path flag.

For plugins that are only used within a single project, there could be a similar directory in the project folder.
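
A sketch of the lookup order that could fall out of that (the paths and file naming are assumptions):

```rust
use std::path::{Path, PathBuf};

// Prefer a project-local .trunk/plugins directory, then fall back to the
// user-level ~/.trunk/plugins directory populated by `trunk install <PKG>`.
fn find_plugin(name: &str, project_root: &Path) -> Option<PathBuf> {
    let candidates = [
        project_root.join(".trunk").join("plugins"),
        dirs::home_dir()?.join(".trunk").join("plugins"),
    ];
    candidates
        .iter()
        .map(|dir| dir.join(format!("{}.wasm", name)))
        .find(|path| path.exists())
}
```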

@reenigneEsrever92 commented Jul 24, 2021

Maybe nushell is worth a look, too. It basically builds on the idea of piping structured data from one program to another. Maybe that would even be enough. So building with trunk would actually just be a set of programs and their accumulated pipes.

Pros (I think):

  • Great testability
  • Very Simple
  • Great composability

One could easily download these programs according to a config file (maybe even describe a graphed pipeline in there).

In any case, I think the greatest advantage of the systems solving similar problems in other ecosystems is the complete customizability of a build (see webpack, or even Gradle for Java). I can only see that being accomplished here by embracing the WASM approach, maybe even further than previously discussed.

Would really like to hear your thoughts on that.

@thedodd (Member, Author) commented Jul 28, 2021

The simplicity of being able to pass a protobuf payload (or the like) between CLI binaries does seem quite nice. Can't beat the simplicity of such a model.

Now that @dnaka91's auto-download system has landed, we could leverage that to download and install basically any binary (given enough info), and make them available for potentially multiple projects. This would also work for downloading wasm modules. The plugins could live on a GitHub release page or any other location really.


All of these patterns have similar pros:

  • We have a nice way to download, cache & reference them for invocation/use.
  • WASM, Rust, Go, C, C++, Swift ... and a bunch of other languages all have support to either produce a nice binary to be used as an extension (CLI approach) or to be compiled as a wasm module for Trunk to load and execute.
  • They will all need to be driven via some data payload communicating available assets, resources, and config. And they will all need to produce some output which Trunk can use to synthesize the final build.

Non-wasm cons:

  • They could do ANYTHING, including making insecure network calls, accessing data which they should not be able to access, installing programs which you may not want installed, et cetera.

WASM:

  • We can lock down the WASM module's capabilities, e.g. only provide FS or network access as needed. The user is able to clearly encode this in their source index.html and/or Trunk.toml.
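
As one possible shape for that lock-down, a sketch using wasmer's WASI support to pre-open only a single directory for the plugin (assuming the wasmer-wasi 1.x builder API; the directory choice is illustrative):

```rust
use wasmer::{Instance, Module, Store};
use wasmer_wasi::WasiState;

// Instantiate a plugin that can only see the `dist` directory; no network,
// no other filesystem access.
fn instantiate_sandboxed(wasm_bytes: &[u8]) -> anyhow::Result<Instance> {
    let store = Store::default();
    let module = Module::new(&store, wasm_bytes)?;

    let mut wasi_env = WasiState::new("trunk-plugin")
        .preopen_dir("dist")?
        .finalize()?;

    let import_object = wasi_env.import_object(&module)?;
    Ok(Instance::new(&module, &import_object)?)
}
```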

@techninja1008 mentioned this issue Aug 6, 2021
@thedodd mentioned this issue Aug 8, 2021
@thedodd (Member, Author) commented Aug 8, 2021

Moving the discussion over to #223. We can re-open this if needed.

@thedodd closed this as completed Aug 8, 2021