[Engine] Allow manually aborting reload, fix unexpected deviceLostError #525

Neet-Nestor · 2024-08-04T22:09:40Z

Manually aborting reload

This PR updates the engine's reload() and unload() methods so that users can abort an in-progress reload() in either of two ways:

calling unload() at any time before reload() completes
calling reload() again before the previous reload() completes

An example is added in examples/abort-reload.
Console output:

Start to fetch params
get_started.js:16 Fetching param cache[0/108]: 28MB fetched. 0% completed, 1 secs elapsed. It can take a while when we first visit this page to populate the cache. Later refreshes will become faster.
get_started.js:28 calling unload
engine.ts:154 Reload() is aborted. Failed to execute 'add' on 'Cache': Cache.add() was aborted
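
The abort behavior described above can be illustrated with a small standalone sketch. It does not use WebLLM; `SketchEngine` and `fakeFetchParams` are hypothetical names invented for illustration, and the sketch only assumes that the in-flight reload is tied to an `AbortController` that both unload() and a newer reload() can trigger. The actual engine code may be organized differently.

```typescript
// Hypothetical stand-in for a slow parameter download that honors an AbortSignal.
function fakeFetchParams(signal: AbortSignal, ms = 50): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const makeAbortError = (): Error => {
      const e = new Error("fetch aborted");
      e.name = "AbortError";
      return e;
    };
    if (signal.aborted) {
      reject(makeAbortError());
      return;
    }
    const timer = setTimeout(() => resolve(), ms);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(makeAbortError());
      },
      { once: true },
    );
  });
}

// Hypothetical engine; NOT the WebLLM engine class.
class SketchEngine {
  private reloadController: AbortController | undefined;
  loadedModel: string | undefined;
  log: string[] = [];

  async reload(modelId: string): Promise<void> {
    // A newer reload() cancels any reload() that is still in flight.
    this.reloadController?.abort();
    const controller = new AbortController();
    this.reloadController = controller;
    try {
      await fakeFetchParams(controller.signal);
      this.loadedModel = modelId;
      this.log.push(`loaded ${modelId}`);
    } catch (err) {
      // An abort is an expected outcome here, so it is logged rather than rethrown.
      if (err instanceof Error && err.name === "AbortError") {
        this.log.push(`reload(${modelId}) aborted`);
        return;
      }
      throw err;
    } finally {
      // Only clear the controller if no newer reload() has replaced it.
      if (this.reloadController === controller) {
        this.reloadController = undefined;
      }
    }
  }

  async unload(): Promise<void> {
    // Calling unload() before reload() completes aborts that reload().
    this.reloadController?.abort();
    this.reloadController = undefined;
    this.loadedModel = undefined;
  }
}
```

In this sketch, calling `unload()` right after `reload("model-a")` leaves no model loaded, while calling `reload("b")` right after `reload("a")` ends with only `b` loaded.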

Related issues:
#484
#499

Note on unload() and unexpected device lost error

Previously, a device lost error was reported when a user intentionally switched models (i.e. called reload()). The cause was that unload() set deviceLostIsError back to true immediately after calling this.pipeline.dispose(), which destroys the WebGPU device internally. However, WebGPU is asynchronous, so the device may not be fully destroyed by the time dispose() returns, and the old device's device-lost callback could still be triggered after the flag had been reset. This PR fixes the issue by introducing LLMChatPipeline.sync() and making unload() wait until the device is actually destroyed. See apache/tvm#17250
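
The ordering problem can be reproduced in a small standalone sketch. Nothing here is WebLLM or WebGPU code: `FakeDevice`, `SketchEngine`, `unloadWithoutWaiting`, and `unloadAndWait` are hypothetical names, and awaiting the fake device's `lost` promise is only an assumed stand-in for what LLMChatPipeline.sync() achieves in the real fix.

```typescript
// Hypothetical device whose teardown completes after destroy() has returned.
class FakeDevice {
  readonly lost: Promise<string>;
  private resolveLost: (reason: string) => void = () => {};

  constructor() {
    this.lost = new Promise<string>((resolve) => {
      this.resolveLost = resolve;
    });
  }

  destroy(): void {
    // The "lost" notification arrives on a later tick, not synchronously.
    setTimeout(() => this.resolveLost("destroyed"), 10);
  }
}

// Hypothetical engine; NOT the WebLLM engine class.
class SketchEngine {
  deviceLostIsError = true;
  errors: string[] = [];
  private device: FakeDevice | undefined;

  load(): void {
    const device = new FakeDevice();
    this.device = device;
    // Device-lost callback: only report an error if the loss was not intentional.
    device.lost.then((reason) => {
      if (this.deviceLostIsError) {
        this.errors.push(`unexpected device lost: ${reason}`);
      }
    });
  }

  // Old ordering: the flag is reset before the device is actually gone.
  unloadWithoutWaiting(): void {
    this.deviceLostIsError = false;
    this.device?.destroy();
    this.device = undefined;
    this.deviceLostIsError = true; // too early: the callback has not fired yet
  }

  // Fixed ordering: wait for the teardown to finish, then reset the flag.
  async unloadAndWait(): Promise<void> {
    this.deviceLostIsError = false;
    const device = this.device;
    device?.destroy();
    await device?.lost; // assumed stand-in for waiting on pipeline.sync()
    this.device = undefined;
    this.deviceLostIsError = true;
  }
}
```

With `unloadWithoutWaiting()`, the callback runs after the flag is back to true and records a spurious error; with `unloadAndWait()`, the callback runs while the flag is still false, so nothing is recorded.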

CharlieFRuan · 2024-08-05T19:01:30Z

Currently blocked by unexpected deviceLostError. Will fix that before merging this PR.

CharlieFRuan · 2024-08-06T20:43:09Z

Depends on apache/tvm#17250

CharlieFRuan

LGTM, thank you!

No breaking changes. The only diff is the following PR:

- #525
  - This PR updates the engine reload() and unload() methods to allow users to abort an uncompleted reload() by either:
    - calling unload() at any time before reload() completes
    - calling reload() again before the previous reload() completes
  - It also fixes the previous issue where a `device lost error` was raised unexpectedly when a user simply switched models

### TVMjs

- To support the above PR, TVMjs is updated and compiled at apache/tvm@1fcb620
- Difference:
  - Device lost error fix: apache/tvm#17250
  - Add AbortSignal to fetching APIs:
    - apache/tvm#17208
    - apache/tvm#17227
    - apache/tvm#17233

…or (mlc-ai#525)

### Manually aborting reload

This PR updates the engine `reload()` and `unload()` methods to allow users to abort an uncompleted `reload()` by either:

- call `unload()` any time before `reload()` completed
- call `reload()` again before the previous `reload()` completed

### Note on unload() and unexpected device lost error

Previously, we had an issue where a device lost error is reported when we simply switch a model intentionally (i.e. calling `reload()`). This is because `unload()` sets `deviceLostIsError` back to true immediately after calling `this.pipeline.dispose()`, which destroys the WebGPU device internally. However, WebGPU is asynchronous and may not finish after `dispose()` returns. This PR also fixes this issue by making `unload()` wait until the device is actually destroyed by introducing `LLMChatPipeline.sync()`.

---------

Co-authored-by: Charlie Ruan <[email protected]>


Neet-Nestor requested a review from CharlieFRuan August 4, 2024 22:09

CharlieFRuan mentioned this pull request Aug 4, 2024

[Tracking][WebLLM] Function calling (beta) and Embeddings #526

Open

7 tasks

Neet-Nestor added 2 commits August 4, 2024 20:56

[Engine] Allow manually aborting reloading

c083598

try catch AbortError

fac8634

Neet-Nestor force-pushed the abort branch from 6b36bca to fac8634 Compare August 5, 2024 00:56

flatsiedatsie mentioned this pull request Aug 6, 2024

Feature request: engine.preload() #529

Open

Make unload wait until device actually destroyed

77e465b

CharlieFRuan changed the title ~~[Engine] Allow manually aborting reloading~~ [Engine] Allow manually aborting reload, fix unexpected deviceLostError Aug 6, 2024

CharlieFRuan merged commit ddac6d1 into main Aug 8, 2024

CharlieFRuan reviewed Aug 8, 2024


CharlieFRuan mentioned this pull request Aug 8, 2024

[Version] Bump to version 0.2.54 #530

Merged

TomYeoman mentioned this pull request Mar 14, 2025

How to let the user cancel loading the model and stop it from fetching params #499

Open


Neet-Nestor commented Aug 4, 2024 • edited by CharlieFRuan
