fix: `vllm serve` on Apple silicon #17473
Conversation
Right now commands like `vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0` on Apple silicon fail with triton errors like these.

```
$ vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0
INFO 04-30 09:33:49 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-30 09:33:49 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-30 09:33:49 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 04-30 09:33:50 [__init__.py:239] Automatically detected platform cpu.
Traceback (most recent call last):
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/bin/vllm", line 5, in <module>
    from vllm.entrypoints.cli.main import main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/main.py", line 7, in <module>
    import vllm.entrypoints.cli.benchmark.main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/benchmark/main.py", line 6, in <module>
    import vllm.entrypoints.cli.benchmark.throughput
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/benchmark/throughput.py", line 4, in <module>
    from vllm.benchmarks.throughput import add_cli_args, main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/benchmarks/throughput.py", line 18, in <module>
    from vllm.benchmarks.datasets import (AIMODataset, BurstGPTDataset,
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/benchmarks/datasets.py", line 34, in <module>
    from vllm.lora.utils import get_adapter_absolute_path
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/utils.py", line 15, in <module>
    from vllm.lora.fully_sharded_layers import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/fully_sharded_layers.py", line 14, in <module>
    from vllm.lora.layers import (ColumnParallelLinearWithLoRA,
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/layers.py", line 29, in <module>
    from vllm.model_executor.layers.logits_processor import LogitsProcessor
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/model_executor/layers/logits_processor.py", line 13, in <module>
    from vllm.model_executor.layers.vocab_parallel_embedding import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py", line 139, in <module>
    @torch.compile(dynamic=True, backend=current_platform.simple_compile_backend)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/__init__.py", line 2543, in fn
    return compile(
           ^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/__init__.py", line 2572, in compile
    return torch._dynamo.optimize(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 944, in optimize
    return _optimize(rebuild_ctx, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 998, in _optimize
    backend = get_compiler_fn(backend)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 878, in get_compiler_fn
    from .repro.after_dynamo import wrap_backend_debug
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 35, in <module>
    from torch._dynamo.debug_utils import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/debug_utils.py", line 44, in <module>
    from torch._dynamo.testing import rand_strided
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 33, in <module>
    from torch._dynamo.backends.debugging import aot_eager
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/backends/debugging.py", line 35, in <module>
    from functorch.compile import min_cut_rematerialization_partition
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/functorch/compile/__init__.py", line 2, in <module>
    from torch._functorch.aot_autograd import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 26, in <module>
    from torch._inductor.output_code import OutputCode
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 52, in <module>
    from .runtime.autotune_cache import AutotuneCacheBundler
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/autotune_cache.py", line 23, in <module>
    from .triton_compat import Config
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_compat.py", line 16, in <module>
    from triton import Config
ImportError: cannot import name 'Config' from 'triton' (unknown location)
```

We cannot install `triton` on Apple silicon because there are no [available distributions][1].

This change adds more placeholders for triton modules and classes that are imported when calling `vllm serve`.

[1]: https://pypi.org/project/triton/#files

Signed-off-by: David Xia <[email protected]>
I feel this is more of an inductor issue. Shall we just turn off inductor by default if we detect a non-GPU platform?
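Roughly the idea, as a sketch that assumes GPU detection via `torch.cuda.is_available()` is sufficient; `pick_simple_compile_backend` is a hypothetical helper, not vLLM's platform API.

```python
# Sketch of the suggestion: fall back to an eager backend when no GPU is
# present, so torch.compile never walks the inductor -> triton import chain
# shown in the traceback above.
import torch


def pick_simple_compile_backend() -> str:
    if torch.cuda.is_available():
        return "inductor"
    return "eager"


# e.g. @torch.compile(dynamic=True, backend=pick_simple_compile_backend())
```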
cc: @zou3519, thoughts?
#17317 also works for me on Apple silicon. That PR looks more mature and mentions inductor, so maybe this one isn't necessary?
This PR overlaps with #17317. Also, I mentioned this in the other PR, but I don't like inserting a dummy module into `sys.modules["triton"]`; this is asking for trouble. What if triton changes, or what if a third-party library (torch) imports triton?
If we need a short-term fix we can figure something out, but IMO the right fix is to stop monkey-patching `sys.modules["triton"]`.
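To illustrate the concern with a hypothetical example (not code from this PR): once a dummy module is registered, every later importer silently receives it.

```python
# Hypothetical illustration: a dummy triton module makes unrelated imports
# "succeed" and then fail far away from where the dummy was installed.
import sys
import types

sys.modules["triton"] = types.ModuleType("triton")  # dummy with no attributes

import triton  # e.g. torch's inductor runtime doing `import triton`

try:
    triton.Config  # blows up at a distant call site, not at install time
except AttributeError as exc:
    print(f"confusing downstream failure: {exc}")
```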
Closing in favor of #17317.