Conversation

@CharlieFRuan
Member

Prior to this PR, when users call `createEngine()` or `reload()` with a model that is too large for the device, the device would likely keep generating, ignoring the OOM issue and producing incorrect output. See #356 and #209.

This PR catches such errors with `device.lost.then()`, relying on tvmjs to call `device.destroy()` upon detecting an error in `createBuffer()` via apache/tvm#17005.

We have only observed `createBuffer()` errors and hence only handle that kind of error for now. In addition, since most OOM errors occur in `reload()`, we make the error handling effectively synchronous despite using `.then()`: if an error has been recorded, we throw it at the end of `reload()`.
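The pattern described above can be sketched as follows. This is a simplified illustration, not the actual web-llm implementation: names such as `EngineSketch` and `watchDeviceLost` are hypothetical, and a plain `Promise<string>` stands in for the real `GPUDevice.lost` promise.

```typescript
// Sketch of the error-handling pattern: an asynchronous "device lost"
// signal is recorded in a field, and reload() re-throws it synchronously
// at the end. All names here are illustrative, not web-llm's actual API.
class EngineSketch {
  private deviceLostError: Error | undefined;

  // Stand-in for attaching device.lost.then(...) on a real GPUDevice.
  watchDeviceLost(lost: Promise<string>): void {
    lost.then((message) => {
      this.deviceLostError = new Error(`WebGPU device lost: ${message}`);
    });
  }

  async reload(): Promise<void> {
    // ... allocate buffers and load weights here; an OOM in
    // createBuffer() would make tvmjs call device.destroy(),
    // which in turn resolves the `lost` promise ...
    await Promise.resolve(); // yield so pending .then() callbacks run
    if (this.deviceLostError !== undefined) {
      // Surface the asynchronous error to the caller of reload().
      throw this.deviceLostError;
    }
  }
}
```

The key design point is that `device.lost` resolves (it never rejects), so the error cannot propagate through a normal `try`/`catch` around the allocation; stashing it and throwing at the end of `reload()` lets callers handle it with ordinary synchronous-looking `await engine.reload()` error handling.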

@CharlieFRuan
Member Author

Example of trying to allocate a KV cache with a 900k context length (the behavior should be similar when loading a model that is too large):

[Screenshot: error raised when allocating the oversized KV cache]

@CharlieFRuan CharlieFRuan marked this pull request as draft May 17, 2024 09:42
@CharlieFRuan
Member Author

Marked as a draft for now as it depends on apache/tvm#17005

@CharlieFRuan CharlieFRuan marked this pull request as ready for review May 21, 2024 20:25
@CharlieFRuan CharlieFRuan merged commit b762bf4 into mlc-ai:main May 21, 2024
CharlieFRuan added a commit that referenced this pull request May 21, 2024
### Changes
Main changes include:
- New model `Hermes-2-Pro-Mistral-7B` in `prebuiltAppConfig` via:
  - #390
- Various `index.js` and `index.js.map` post-processing steps to resolve
frontend compatibility issues with `require()` and `perf_hooks`
  - #397
  - #406
- Catch WebGPU OOM error upon `reload()` and `CreateEngine()`:
  - #402
- Service Worker support (in addition to Extension Service Worker):
  - #395
  - #400
  - #401

### WASM Version
v0_2_34, as no change is required.

### TVMjs
TVMjs compiled at
apache/tvm@a5862a5,
with only one change in `tvm/web`:
apache/tvm#17005