Open
Labels: good first issue, help wanted
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
We noticed that in vLLM's `_get_logits` function, `gather` is used instead of `all_gather` under certain conditions (the main one being that the device is not a TPU):
Code link:
The switch from `all_gather` to `gather` was originally introduced in this PR, for reference: vllm-project/vllm#2221.
In SGLang, however, `all_gather` is currently always used:
`logits = tensor_model_parallel_all_gather(logits)`
Does SGLang plan to support `gather` in addition to `all_gather` when gathering the logits? Based on vLLM's practice, `gather` appears to perform better than `all_gather` on devices that support it.
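To make the difference concrete, here is a minimal pure-Python sketch of the two collectives. It is a simulation only, not SGLang or vLLM code: the helper names (`simulate_all_gather`, `simulate_gather`, `elements_received`) and the toy shards are hypothetical, and it assumes each rank holds an equal-sized slice of the vocabulary dimension. It counts how many logit values each approach delivers across ranks.

```python
from typing import List, Optional

Shard = List[float]


def simulate_all_gather(shards: List[Shard]) -> List[Shard]:
    """Every rank ends up with the full, concatenated logits."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def simulate_gather(shards: List[Shard], dst: int = 0) -> List[Optional[Shard]]:
    """Only rank `dst` receives the full logits; other ranks get None."""
    full = [x for shard in shards for x in shard]
    return [list(full) if rank == dst else None for rank in range(len(shards))]


def elements_received(results: List[Optional[Shard]], shards: List[Shard]) -> int:
    """Count values each rank had to receive from other ranks."""
    total = 0
    for rank, result in enumerate(results):
        if result is not None:
            # A rank already owns its local shard, so only the rest is traffic.
            total += len(result) - len(shards[rank])
    return total


# Toy example: 4 tensor-parallel ranks, each holding 2 vocabulary logits.
shards = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

all_gathered = simulate_all_gather(shards)
gathered = simulate_gather(shards, dst=0)

all_gather_traffic = elements_received(all_gathered, shards)
gather_traffic = elements_received(gathered, shards)

print("all_gather elements received:", all_gather_traffic)
print("gather elements received:", gather_traffic)
```

One consequence of `gather` is that non-destination ranks receive nothing, so any code that consumes the logits afterwards (for example, sampling) would need to run only on the destination rank or handle a missing result on the others.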
Related resources
No response