
Conversation

@youkaichao (Member)

No description provided.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they trigger only the fastcheck CI, which runs a small, essential subset of tests to catch errors quickly, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals between PR commits, you can trigger a full CI run as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao youkaichao requested a review from zhuohan123 July 13, 2024 18:51
@simon-mo (Collaborator)

If users want to increase the throughput and lower the latency of a small model, they should also use TP

@youkaichao (Member, Author)

> If users want to increase the throughput and lower the latency of a small model, they should also use TP

True (mainly for increasing throughput, I think). Added in 0d99585.
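For context on the suggestion discussed above: vLLM exposes tensor parallelism via a `--tensor-parallel-size` flag on its OpenAI-compatible server (and a matching `tensor_parallel_size` argument on the Python `LLM` class). A minimal launch sketch, assuming 2 GPUs are available; the model id is a placeholder, not something from this PR:

```shell
# Sketch: serve a small model with tensor parallelism (TP) across 2 GPUs
# to increase throughput (and potentially lower latency).
# facebook/opt-125m is a placeholder model id; substitute your own.
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --tensor-parallel-size 2
```

This is a launch-command fragment for illustration only; whether TP helps latency for very small models depends on the interconnect and batch size, as the comments above note.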

@youkaichao youkaichao merged commit 94b82e8 into vllm-project:main Jul 15, 2024
@youkaichao youkaichao deleted the dist_suggest branch July 15, 2024 16:46
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025
