Conversation

@yongwww (Contributor) commented Mar 21, 2024

The changes in apache/tvm#16750 modified the constructor signature of `Storage`. This pull request updates the caller code in mlc-llm to accommodate the new signature. Without the change, the build fails with:

mlc-llm/cpp/serve/model.cc:67:96: error: no matching function for call to ‘tvm::runtime::memory::Storage::Storage(tvm::runtime::memory::Buffer)’
   67 |         memory::Storage(allocator->Alloc(device_host, {prefill_chunk_size_}, DataType::Int(32)));
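For reference, here is a minimal sketch of the kind of call-site update involved, assuming the new `Storage` constructor also takes the owning `Allocator*` alongside the allocated buffer (an assumption inferred from the error above; the exact signature is defined in apache/tvm#16750). The helper function name is hypothetical and only illustrates the call site in `cpp/serve/model.cc`:

```cpp
// Sketch (not the actual patch) of the call-site update in cpp/serve/model.cc.
// Assumption: after apache/tvm#16750, memory::Storage is constructed from the
// Buffer plus the Allocator* that produced it, so the storage can release the
// buffer on destruction.
#include <tvm/runtime/memory/memory_manager.h>

using tvm::runtime::DataType;
using tvm::runtime::memory::Allocator;
using tvm::runtime::memory::Storage;

// Hypothetical helper wrapping the allocation shown in the error message.
Storage AllocTokenStorage(Allocator* allocator, DLDevice device_host,
                          int64_t prefill_chunk_size) {
  // Before (no longer matches any Storage constructor):
  //   Storage(allocator->Alloc(device_host, {prefill_chunk_size}, DataType::Int(32)));
  // After: pass the allocator alongside the allocated buffer.
  return Storage(allocator->Alloc(device_host, {prefill_chunk_size}, DataType::Int(32)),
                 allocator);
}
```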

cc: @MasterJH5574 @vinx13 @tqchen

MasterJH5574 and others added 2 commits March 20, 2024 19:57
This PR introduces the IPC memory and customized all-reduce kernel
dispatches for tensor parallelism. We add a new compiler flag
`--allreduce-strategy`, which supports `"ring"`, `"one-shot"` and
`"two-shot"`. The flag defaults to `"ring"`, which means this PR
makes no difference if people do not manually change the all-reduce
strategy.

As of now, the IPC-memory-backed customized all-reduce kernels are
only available on CUDA.

To enable all-reduce strategies other than "ring", here are some
example compile commands:
```bash
python -m mlc_llm compile model/mlc-chat-config.json --device cuda --opt "allreduce-strategy=one-shot" -o model/lib.so
python -m mlc_llm compile model/mlc-chat-config.json --device cuda --opt "allreduce-strategy=two-shot" -o model/lib.so
```

Please be aware that you may also need to specify other
compiler flags, for example `--opt "cublas_gemm=1;allreduce-strategy=one-shot"`.
@yongwww closed this Mar 21, 2024

@yongwww (Author) commented Mar 21, 2024

Failed to reopen this PR; will create a new one.
