Conversation

@CharlieFRuan (Member)

Prior to this PR, WebGPU errors such as OOM were only logged as warnings without affecting the program. This PR handles WebGPU errors using pushErrorScope() and popErrorScope(), following https://github.com/gpuweb/gpuweb/blob/main/design/ErrorHandling.md.
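
For illustration, a minimal sketch of that error-scope pattern, following the linked design doc (`createBufferChecked` is a hypothetical name, not the actual tvmjs code):

```ts
// Sketch only: wrap an allocation in one error scope per GPUErrorFilter
// ("out-of-memory", "validation", "internal") so failures surface as
// GPUError objects instead of warnings the program never sees.
async function createBufferChecked(
  device: GPUDevice,
  descriptor: GPUBufferDescriptor
): Promise<GPUBuffer> {
  device.pushErrorScope("out-of-memory");
  device.pushErrorScope("validation");
  device.pushErrorScope("internal");
  const buffer = device.createBuffer(descriptor);
  // popErrorScope() resolves scopes in reverse order of pushing.
  const internal = await device.popErrorScope();
  const validation = await device.popErrorScope();
  const outOfMemory = await device.popErrorScope();
  const error = outOfMemory ?? validation ?? internal;
  if (error) {
    throw new Error(`createBuffer() failed: ${error.message}`);
  }
  return buffer;
}
```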

We replace createBuffer() with tryCreateBuffer(), in which we catch all three types of errors. For now, we treat any error that occurs in createBuffer() as fatal and hence call device.destroy(). When a device is initialized, we use device.lost.then() to listen for the device-loss event triggered by device.destroy(), upon which we log the error and call Instance.dispose(), prompting the user to re-initialize.
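
A sketch of that device-loss flow (the `instance` parameter and log messages here are illustrative; the actual wiring lives in tvmjs):

```ts
// Sketch only: a fatal createBuffer() error tears down the device; the
// device.lost promise then resolves, and we dispose the instance and
// prompt the user to re-initialize.
function handleFatalError(device: GPUDevice, error: GPUError): never {
  device.destroy();
  throw new Error(`Fatal WebGPU error in createBuffer(): ${error.message}`);
}

function listenToDeviceLost(
  device: GPUDevice,
  instance: { dispose(): void }
): void {
  device.lost.then((info: GPUDeviceLostInfo) => {
    console.error(`WebGPU device lost (${info.reason}): ${info.message}`);
    instance.dispose();
    console.error("Please re-initialize the instance.");
  });
}
```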

See mlc-ai/web-llm#356 for motivation.

Tested end-to-end with WebLLM.

@tqchen merged commit afb6416 into apache:main on May 17, 2024
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request May 21, 2024
Prior to this PR, when users called `createEngine()` or `reload()` with a
model too large for the device, the device would likely keep generating,
ignoring the OOM issue and sacrificing correctness. See
#356 and
#209.

This PR catches such errors with `device.lost.then()`, relying on tvmjs
to call `device.destroy()` upon detecting an error in `createBuffer()`, via
apache/tvm#17005.

We have only observed `createBuffer()` errors, and hence handle only that
kind of error for now. Besides, since most OOM errors occur in `reload()`,
we make the error handling effectively synchronous despite using `.then()`:
we throw the error at the end of `reload()` if there is one.
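
A sketch of that deferred-throw pattern (class and field names are illustrative, not web-llm's actual code):

```ts
// Sketch only: device.lost.then() records the error asynchronously, and
// reload() re-throws it at the end so callers see a normal exception.
class EngineSketch {
  private deviceLostError?: Error;

  constructor(device: GPUDevice) {
    device.lost.then((info) => {
      this.deviceLostError = new Error(
        `WebGPU device lost (${info.reason}): ${info.message}`
      );
    });
  }

  async reload(modelId: string): Promise<void> {
    // ... load `modelId`; an OOM inside createBuffer() destroys the
    // device, which resolves device.lost and records the error above ...
    if (this.deviceLostError) {
      throw this.deviceLostError;
    }
  }
}
```
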
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request May 21, 2024
### Changes
Main changes include:
- New model `Hermes-2-Pro-Mistral-7B` in `prebuiltAppConfig` via:
  - #390
- Various `index.js` and `index.js.map` post-processings to resolve
frontend compatibility issues with `require()` and `perf_hooks`
  - #397
  - #406
- Catch WebGPU OOM errors upon `reload()` and `CreateEngine()`:
  - #402
- Service Worker support (in addition to Extension Service Worker):
  - #395
  - #400
  - #401

### WASM Version
v0_2_34, as no change is required.

### TVMjs
TVMjs compiled at
apache/tvm@a5862a5,
with only one change in `tvm/web`:
apache/tvm#17005