Currently the wasi-nn API only allows loading ML models from their byte-serialized format (i.e., using `load`). This can be problematic for several reasons:
- models may be large — "gigabytes" large
- not all backends provide a way to load models from bytes (e.g., TF expects a certain filesystem layout)
- retrieving and loading a model can be the most expensive part of an inference request — hosts may want to load a model once and reuse it across multiple Wasm module instantiations
I would like to propose "named models" — a way of solving these issues. Other WASI proposals, such as wasi-filesystem and wasi-sockets, provide a way of creating pre-instantiation resources that are then available to the Wasm module once instantiated (see, e.g., the `--dir` and `--listenfd` flags on the Wasmtime CLI). If a similar idea were available to wasi-nn, users could specify models before instantiation and these could be shared across instances. This sharing could only happen, however, if the models are "named."
### Spec changes
To support this in the specification, one would need the ability to load a model using only a name and (possibly) the ability to load a model from bytes and name it. This way there could be some symmetry between the host and guest functionality. I think this could be supported by adding the following functions:
```
// Like the `load` function, but the host would retain a mapping from `name` to the `graph`.
load_named: func(builder: graph-builder-array, encoding: graph-encoding, target: execution-target, name: string) -> expected<graph, error>

// Retrieve the loaded `graph` for the given `name`; this could be pre-loaded prior to instantiation or loaded by `load_named`.
get_named: func(name: string) -> expected<graph, error>
```

Obviously, the ability to retrieve a "named model" from all instances running in a host is up for debate: perhaps the name should only be visible to the Wasm instance itself or to some host-specified neighborhood of instances. I included the most controversial version, global scope, to see what people think. I also think the host may want some way to limit the resources consumed by wasi-nn; this is a host implementation concern, discussed below.
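To make the intended guest-side flow concrete, here is a minimal Rust sketch. The `nn` module below is a hypothetical stand-in for bindings generated from the proposed functions; `get_named`, the `Graph` handle type, and the model name `"mobilenet"` are all assumptions for illustration, not part of any published wasi-nn binding.

```rust
// Hypothetical bindings for the proposed functions; in practice these
// would be generated from the wasi-nn interface definitions.
mod nn {
    pub type Graph = u32; // opaque handle to a loaded graph
    pub type NnError = u32;

    // Retrieve a graph that the host pre-loaded (or that an earlier
    // instance registered via `load_named`) under the given name.
    pub fn get_named(name: &str) -> Result<Graph, NnError> {
        let _ = name;
        unimplemented!("supplied by the wasi-nn host implementation")
    }
}

fn main() -> Result<(), nn::NnError> {
    // "mobilenet" is a placeholder: the host might have registered it with
    // something like `--nn-preload mobilenet:openvino:/path/to/model`.
    let graph = nn::get_named("mobilenet")?;
    // ...create an execution context from `graph`, set inputs, compute,
    // and read outputs, just as with a graph returned by `load`...
    let _ = graph;
    Ok(())
}
```

The point of the sketch is that the guest never touches the model bytes: retrieval is a cheap handle lookup rather than a multi-gigabyte transfer into the instance.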
### Host engine changes
Though this repository is the spec repository and is primarily concerned with the Wasm-visible API, I think it would be valuable to discuss what changes this might imply for an engine implementing wasi-nn. Here are some suggestions:
- The engine might want to limit the resources available to a wasi-nn-using module: this could take the form of limiting the number of models loaded via `load` or `load_named`, limiting the size of the loaded models (somehow), etc. One could imagine a flag like `--nn-max-models` to do something like this. (I also think it would be great to have a generic way to limit any WASI API, if anyone has thoughts on that.) A registry enforcing such a limit is sketched after this list.
- The engine would likely want a way to preload some models to avoid `load`-ing them repeatedly in new Wasm instances. One could imagine a flag like `--nn-preload <name>:<encoding>:<path>` to tell the engine both the name of the model and how to load it. All modules instantiated by that engine would have the models available for retrieval with `get_named`.
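One way an engine might implement both points is a name-to-graph registry owned by the engine and handed to every instance. The following Rust sketch is illustrative only: `Graph`, `ModelRegistry`, `SharedRegistry`, and the flag parsing are assumptions about how an engine such as Wasmtime could wire this up, not an actual implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for whatever handle the backend returns after loading a model.
struct Graph {}

// Host-side mapping from model names to loaded graphs, shared across all
// Wasm instances the engine creates.
struct ModelRegistry {
    graphs: HashMap<String, Arc<Graph>>,
    max_models: usize, // e.g. set from a hypothetical `--nn-max-models` flag
}

impl ModelRegistry {
    fn new(max_models: usize) -> Self {
        ModelRegistry { graphs: HashMap::new(), max_models }
    }

    // Would back the proposed `load_named`: load once, reuse by name.
    fn load_named(&mut self, name: &str, graph: Graph) -> Result<Arc<Graph>, String> {
        if !self.graphs.contains_key(name) && self.graphs.len() >= self.max_models {
            return Err(format!("model limit ({}) exceeded", self.max_models));
        }
        let graph = Arc::new(graph);
        self.graphs.insert(name.to_string(), graph.clone());
        Ok(graph)
    }

    // Would back the proposed `get_named`.
    fn get_named(&self, name: &str) -> Option<Arc<Graph>> {
        self.graphs.get(name).cloned()
    }
}

// Parse a hypothetical `--nn-preload <name>:<encoding>:<path>` flag value.
fn parse_preload(flag: &str) -> Option<(&str, &str, &str)> {
    let mut parts = flag.splitn(3, ':');
    Some((parts.next()?, parts.next()?, parts.next()?))
}

// The engine would hold one registry behind a lock and share it, so models
// preloaded at startup are visible to `get_named` in every instance.
type SharedRegistry = Arc<Mutex<ModelRegistry>>;
```

Note that the narrower scopes discussed above fall out naturally from this shape: scoping names per instance or per host-specified neighborhood just means handing different instances different registries instead of one global one.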