Currently the wasi-nn API only allows loading ML models from their byte-serialized format (i.e., using `load`). This can be problematic for several reasons:
- models may be large — "gigabytes" large
- not all backends provide a way to load models from bytes (e.g., TF expects a certain filesystem layout)
- retrieving and loading a model can be the most expensive part of an inference request — hosts may want to load a model once and reuse it across multiple Wasm module instantiations
I would like to propose "named models" — a way of solving these issues. Other WASI proposals, such as wasi-filesystem and wasi-sockets, provide a way of creating pre-instantiation resources that are then available to the Wasm module once instantiated (see, e.g., the `--dir` and `--listenfd` flags on the Wasmtime CLI). If a similar idea were available to wasi-nn, users could specify models before instantiation and these could be shared across instances. This sharing could only happen, however, if the models are "named."
### Spec changes
To support this in the specification, one would need the ability to load a model using only a name and (possibly) the ability to load a model from bytes and name it. This way there could be some symmetry between the host and guest functionality. I think this could be supported by adding the following functions:
```
// Like the `load` function, but the host would retain a mapping from `name` to the `graph`.
load_named: func(builder: graph-builder-array, encoding: graph-encoding, target: execution-target, name: string) -> expected<graph, error>

// Retrieve the loaded `graph` for the given `name`; this could be pre-loaded prior to instantiation or loaded by `load_named`.
get_named: func(name: string) -> expected<graph, error>
```

Obviously, the ability to retrieve a "named model" from all instances running in a host is up for debate: perhaps the name should only be visible to the Wasm instance itself or to some host-specified neighborhood of instances. I included the most controversial version, global scope, to see what people think. I also think the host may want some way to limit the resources consumed by wasi-nn; this is a host implementation concern, discussed below.
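To make the intended guest-side flow concrete, here is a minimal Rust sketch. The `nn` module below is a hypothetical stand-in for bindings generated from the proposed functions; `get_named`, the `Graph` handle type, and the model name `"mobilenet"` are all assumptions for illustration, not part of any published wasi-nn binding.

```rust
// Hypothetical bindings for the proposed functions; in practice these
// would be generated from the wasi-nn interface definitions.
mod nn {
    pub type Graph = u32; // opaque handle to a loaded graph
    pub type NnError = u32;

    // Retrieve a graph that the host pre-loaded (or that an earlier
    // instance registered via `load_named`) under the given name.
    pub fn get_named(name: &str) -> Result<Graph, NnError> {
        let _ = name;
        unimplemented!("supplied by the wasi-nn host implementation")
    }
}

fn main() -> Result<(), nn::NnError> {
    // "mobilenet" is a placeholder: the host might have registered it with
    // something like `--nn-preload mobilenet:openvino:/path/to/model`.
    let graph = nn::get_named("mobilenet")?;
    // ...create an execution context from `graph`, set inputs, compute,
    // and read outputs, just as with a graph returned by `load`...
    let _ = graph;
    Ok(())
}
```

The point of the sketch is that the guest never touches the model bytes: retrieval is a cheap handle lookup rather than a multi-gigabyte transfer into the instance.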
### Host engine changes
Though this repository is the spec repository and is primarily concerned with the Wasm-visible API, I think it would be valuable to discuss what changes this might imply for an engine implementing wasi-nn. Here are some suggestions:
- The engine might want to limit the resources available to a wasi-nn-using module: this could take the form of limiting the number of models loaded via `load` or `load_named`, limiting the size of the loaded models (somehow), etc. One could imagine a flag like `--nn-max-models` to do something like this. (I also think it would be great to have a generic way to limit any WASI API, if anyone has thoughts on that.) A registry enforcing such a limit is sketched after this list.
- The engine would likely want a way to preload some models to avoid `load`-ing them repeatedly in new Wasm instances. One could imagine a flag like `--nn-preload <name>:<encoding>:<path>` to tell the engine both the name of the model and how to load it. All modules instantiated by that engine would have the models available for retrieval with `get_named`.
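One way an engine might implement both points is a name-to-graph registry owned by the engine and handed to every instance. The following Rust sketch is illustrative only: `Graph`, `ModelRegistry`, `SharedRegistry`, and the flag parsing are assumptions about how an engine such as Wasmtime could wire this up, not an actual implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for whatever handle the backend returns after loading a model.
struct Graph {}

// Host-side mapping from model names to loaded graphs, shared across all
// Wasm instances the engine creates.
struct ModelRegistry {
    graphs: HashMap<String, Arc<Graph>>,
    max_models: usize, // e.g. set from a hypothetical `--nn-max-models` flag
}

impl ModelRegistry {
    fn new(max_models: usize) -> Self {
        ModelRegistry { graphs: HashMap::new(), max_models }
    }

    // Would back the proposed `load_named`: load once, reuse by name.
    fn load_named(&mut self, name: &str, graph: Graph) -> Result<Arc<Graph>, String> {
        if !self.graphs.contains_key(name) && self.graphs.len() >= self.max_models {
            return Err(format!("model limit ({}) exceeded", self.max_models));
        }
        let graph = Arc::new(graph);
        self.graphs.insert(name.to_string(), graph.clone());
        Ok(graph)
    }

    // Would back the proposed `get_named`.
    fn get_named(&self, name: &str) -> Option<Arc<Graph>> {
        self.graphs.get(name).cloned()
    }
}

// Parse a hypothetical `--nn-preload <name>:<encoding>:<path>` flag value.
fn parse_preload(flag: &str) -> Option<(&str, &str, &str)> {
    let mut parts = flag.splitn(3, ':');
    Some((parts.next()?, parts.next()?, parts.next()?))
}

// The engine would hold one registry behind a lock and share it, so models
// preloaded at startup are visible to `get_named` in every instance.
type SharedRegistry = Arc<Mutex<ModelRegistry>>;
```

Note that the narrower scopes discussed above fall out naturally from this shape: scoping names per instance or per host-specified neighborhood just means handing different instances different registries instead of one global one.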