Conversation

@CharlieFRuan
Member

Prior to this PR, when users call `createEngine()` or `reload()` with a model that is too large for the device, the device would likely keep generating, ignoring the OOM issue and producing incorrect output. See #356 and #209.

This PR catches such errors with `device.lost.then()`, relying on tvmjs to call `device.destroy()` upon detecting an error in `createBuffer()` via apache/tvm#17005.

We have only observed `createBuffer()` errors and hence only handle that kind of error for now. In addition, since most OOM errors occur in `reload()`, we make the error handling effectively synchronous despite using `.then()`: if an error has been recorded, we throw it at the end of `reload()`.
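The pattern described above can be sketched as follows. This is a simplified illustration, not the actual web-llm implementation: names such as `EngineSketch` and `watchDeviceLost` are hypothetical, and a plain `Promise<string>` stands in for the real `GPUDevice.lost` promise.

```typescript
// Sketch of the error-handling pattern: an asynchronous "device lost"
// signal is recorded in a field, and reload() re-throws it synchronously
// at the end. All names here are illustrative, not web-llm's actual API.
class EngineSketch {
  private deviceLostError: Error | undefined;

  // Stand-in for attaching device.lost.then(...) on a real GPUDevice.
  watchDeviceLost(lost: Promise<string>): void {
    lost.then((message) => {
      this.deviceLostError = new Error(`WebGPU device lost: ${message}`);
    });
  }

  async reload(): Promise<void> {
    // ... allocate buffers and load weights here; an OOM in
    // createBuffer() would make tvmjs call device.destroy(),
    // which in turn resolves the `lost` promise ...
    await Promise.resolve(); // yield so pending .then() callbacks run
    if (this.deviceLostError !== undefined) {
      // Surface the asynchronous error to the caller of reload().
      throw this.deviceLostError;
    }
  }
}
```

The key design point is that `device.lost` resolves (it never rejects), so the error cannot propagate through a normal `try`/`catch` around the allocation; stashing it and throwing at the end of `reload()` lets callers handle it with ordinary synchronous-looking `await engine.reload()` error handling.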

@CharlieFRuan
Member Author

Example of trying to allocate a KV cache with a 900k context length (the behavior should be similar when loading a model that is too large):

[Screenshot: error raised when allocating the oversized KV cache]

@CharlieFRuan CharlieFRuan marked this pull request as draft May 17, 2024 09:42
@CharlieFRuan
Member Author

Marked as a draft for now as it depends on apache/tvm#17005

@CharlieFRuan CharlieFRuan marked this pull request as ready for review May 21, 2024 20:25
@CharlieFRuan CharlieFRuan merged commit b762bf4 into mlc-ai:main May 21, 2024
CharlieFRuan added a commit that referenced this pull request May 21, 2024
### Changes
Main changes include:
- New model `Hermes-2-Pro-Mistral-7B` in `prebuiltAppConfig` via:
  - #390
- Various `index.js` and `index.js.map` post-processing steps to resolve
frontend compatibility issues with `require()` and `perf_hooks`
  - #397
  - #406
- Catch WebGPU OOM error upon `reload()` and `CreateEngine()`:
  - #402
- Service Worker support (in addition to Extension Service Worker):
  - #395
  - #400
  - #401

### WASM Version
v0_2_34, as no change is required.

### TVMjs
TVMjs compiled at
apache/tvm@a5862a5,
with only one change in `tvm/web`:
apache/tvm#17005