Commit d571ca0

[ci][distributed] add tests for custom allreduce (#5689)

1 parent afed90a

2 files changed: +10 additions, -5 deletions

.buildkite/test-pipeline.yaml (6 additions, 2 deletions)

@@ -182,7 +182,11 @@ steps:
   - pip install -r requirements-docs.txt
   - SPHINXOPTS="-W" make html

-- label: A100 status
+- label: Distributed Tests (A100)
   gpu: a100
   commands:
-  - nvidia-smi
+  # NOTE: don't test llama model here, it seems hf implementation is buggy
+  # see https://github.com/vllm-project/vllm/pull/5689 for details
+  - pytest -v -s distributed/test_custom_all_reduce.py
+  - TEST_DIST_MODEL=facebook/opt-125m DISTRIBUTED_EXECUTOR_BACKEND=ray pytest -v -s distributed/test_basic_distributed_correctness.py
+  - TEST_DIST_MODEL=facebook/opt-125m DISTRIBUTED_EXECUTOR_BACKEND=mp pytest -v -s distributed/test_basic_distributed_correctness.py
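The pipeline parameterizes the correctness test through environment variables: `TEST_DIST_MODEL` selects the model and `DISTRIBUTED_EXECUTOR_BACKEND` selects `ray` or `mp` (multiprocessing). A minimal, hypothetical sketch of how a test module might consume these variables (`read_test_config` is not vLLM's actual helper, just an illustration):

```python
import os


def read_test_config(env=None):
    """Read the CI-provided test parameters, with the defaults the
    pipeline above uses. `env` defaults to the process environment."""
    if env is None:
        env = os.environ
    model = env.get("TEST_DIST_MODEL", "facebook/opt-125m")
    backend = env.get("DISTRIBUTED_EXECUTOR_BACKEND", "mp")
    if backend not in ("ray", "mp"):
        raise ValueError(f"unsupported executor backend: {backend}")
    return model, backend


model, backend = read_test_config({
    "TEST_DIST_MODEL": "facebook/opt-125m",
    "DISTRIBUTED_EXECUTOR_BACKEND": "ray",
})
print(model, backend)  # facebook/opt-125m ray
```

Running the same test file twice with different `DISTRIBUTED_EXECUTOR_BACKEND` values, as the pipeline does, exercises both executor backends without duplicating test code.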

tests/distributed/test_custom_all_reduce.py (4 additions, 3 deletions)

@@ -11,7 +11,8 @@
 from vllm.distributed.parallel_state import (get_tensor_model_parallel_group,
                                              get_tp_group, graph_capture)

-from ..utils import (init_test_distributed_environment,
+from ..utils import (ensure_model_parallel_initialized,
+                     init_test_distributed_environment,
                      multi_process_tensor_parallel)

 random.seed(42)
@@ -27,8 +28,8 @@ def graph_allreduce(tp_size, pp_size, rank, distributed_init_port):
     torch.cuda.set_device(device)
     init_test_distributed_environment(tp_size, pp_size, rank,
                                       distributed_init_port)
-
-    group = get_tensor_model_parallel_group()
+    ensure_model_parallel_initialized(tp_size, pp_size)
+    group = get_tensor_model_parallel_group().device_group

     # A small all_reduce for warmup.
     # this is needed because device communicators might be created lazily
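The invariant such a custom-allreduce test ultimately checks is the semantics of `all_reduce` with a SUM op: after the collective, every rank holds the elementwise sum of all ranks' inputs. A CPU-only sketch of that expectation (a simulation, not the vLLM test itself, which runs the real collective across GPU processes):

```python
def allreduce_sum(rank_tensors):
    """Simulate all_reduce(SUM) over a list of per-rank vectors.

    Each inner list is one rank's input tensor; the return value is
    the per-rank outputs, which are all identical after the reduce.
    """
    total = [sum(vals) for vals in zip(*rank_tensors)]
    # every rank receives the same reduced result
    return [list(total) for _ in rank_tensors]


inputs = [[1, 2, 3], [10, 20, 30]]  # two simulated ranks
outputs = allreduce_sum(inputs)
assert all(out == [11, 22, 33] for out in outputs)
```

In the real test, each rank compares the custom allreduce's output against the result of the stock `torch.distributed.all_reduce` on the `device_group` obtained above; the warmup call matters because, as the comment notes, device communicators may be created lazily on first use.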

0 commit comments
