-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace load
with load-by-name
#74
base: main
Are you sure you want to change the base?
Conversation
This change removes the `load` function, which generated a `graph` using some opaque bytes, a `graph-encoding` enum, and an `execution-target`. This mechanism allowed WebAssembly guest code (i.e., running within the WebAssembly sandbox) to control _when_ a model is loaded, but by doing so, exposed details that users will likely not need. In FaaS use cases, e.g., user code simply does not have the time to retrieve and load a model for every HTTP request. This PR proposes instead that users _always_ load models outside the sandbox and then load them by a host-specified name. This is a proposal intended for discussion, not a foregone conclusion, so please provide feedback! If you have a use case that relies directly on users being able to load models via buffers, that would undermine the assumptions of this PR (that no one will use wasi-nn in this way). But consider the downsides of the current approach: wasi-nn must keep track of an ever growing list of graph encodings and users must somehow "see through" wasi-nn to set up the model buffers. Switching to `load-by-name`--now called `load`--would resolve these issues, moving any model and framework details into the host configuration, where they already exist anyways.
Greetings from the WasmEdge team. We prefer to replace the current |
I agree with the replacement of
By using a URI string, I believe the backend could be smart enough to fetch / cache models rather than relying on a shared moniker between the guest and the host. |
cc: @shschaefer, @radu-matei, @geekbeast |
This change alters the `wasi-nn` world to split out two different modes of operation: - `inference`: this continues the traditional mechanism for computing with wasi-nn, by passing named `tensor`s to a `context`. Now that `tensor`s are resources, we pass all inputs and return all outputs together, eliminating `get-input` and `set-output` - `prompt`: this new mode expects a `string` prompt which is passed along to a backend LLM. The returned string is not streamed, but could be in the future This change also adds metadata modification of the `graph` via `list-properties`, `get-property` and `set-property`. It is unclear whether these methods should hang off the `context` objects instead (TODO). It is also unclear whether the model of `load`-ing a `graph` and then initializing it into one of the two modes via `inference::init` or `prompt::init` is the best approach; most graphs are one or the other so it does not make sense to open the door to `init` failures. [bytecodealliance#74] (replace `load` with `load-by-name`) is replicated in this commit. [bytecodealliance#75] (return errors as records) and [bytecodealliance#76] (remove the error constructor) is superseded by this commit, since every error is simply returned as a `string` and the `error` resource is removed. [bytecodealliance#74]: WebAssembly/wasi-nn#74 [bytecodealliance#75]: WebAssembly/wasi-nn#75 [bytecodealliance#76]: WebAssembly/wasi-nn#76
I don't have any objections, but I agree with Stuart that we should resolve conformance rules before making a decision here. I think URIs could be a potential future enhancement, but I think in the short term it is up to the host if it does something smart with the name. WASI-NN shouldn't care about the URIs, but any host is free to implement some URI-handling mechanism that correctly routes loading the model underneath the covers. |
This change removes the
load
function, which generated agraph
using some opaque bytes, agraph-encoding
enum, and anexecution-target
. This mechanism allowed WebAssembly guest code (i.e., running within the WebAssembly sandbox) to control when a model is loaded, but by doing so, exposed details that users will likely not need. In FaaS use cases, e.g., user code simply does not have the time to retrieve and load a model for every HTTP request.This PR proposes instead that users always load models outside the sandbox and then load them by a host-specified name. This is a proposal intended for discussion, not a foregone conclusion, so please provide feedback! If you have a use case that relies directly on users being able to load models via buffers, that would undermine the assumptions of this PR (that no one will use wasi-nn in this way).
But consider the downsides of the current approach: wasi-nn must keep track of an ever growing list of graph encodings and users must somehow "see through" wasi-nn to set up the model buffers. Switching to
load-by-name
--now calledload
--would resolve these issues, moving any model and framework details into the host configuration, where they already exist anyways.