
Conversation

@davidxia (Contributor) commented Apr 30, 2025

Right now commands like `vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0` on Apple silicon fail with triton errors like these.

`vllm serve` errors on the main branch:
$ vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0
INFO 04-30 09:33:49 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-30 09:33:49 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernelcompilation.
INFO 04-30 09:33:49 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 04-30 09:33:50 [__init__.py:239] Automatically detected platform cpu.
Traceback (most recent call last):
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/bin/vllm", line 5, in <module>
    from vllm.entrypoints.cli.main import main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/main.py", line 7, in <module>
    import vllm.entrypoints.cli.benchmark.main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/benchmark/main.py", line 6, in <module>
    import vllm.entrypoints.cli.benchmark.throughput
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/entrypoints/cli/benchmark/throughput.py", line 4, in <module>
    from vllm.benchmarks.throughput import add_cli_args, main
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/benchmarks/throughput.py", line 18, in <module>
    from vllm.benchmarks.datasets import (AIMODataset, BurstGPTDataset,
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/benchmarks/datasets.py", line 34, in <module>
    from vllm.lora.utils import get_adapter_absolute_path
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/utils.py", line 15, in <module>
    from vllm.lora.fully_sharded_layers import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/fully_sharded_layers.py", line 14, in <module>
    from vllm.lora.layers import (ColumnParallelLinearWithLoRA,
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/lora/layers.py", line 29, in <module>
    from vllm.model_executor.layers.logits_processor import LogitsProcessor
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/model_executor/layers/logits_processor.py", line 13, in <module>
    from vllm.model_executor.layers.vocab_parallel_embedding import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py", line 139, in <module>
    @torch.compile(dynamic=True, backend=current_platform.simple_compile_backend)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/__init__.py", line 2543, in fn
    return compile(
           ^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/__init__.py", line 2572, in compile
    return torch._dynamo.optimize(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 944, in optimize
    return _optimize(rebuild_ctx, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 998, in _optimize
    backend = get_compiler_fn(backend)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 878, in get_compiler_fn
    from .repro.after_dynamo import wrap_backend_debug
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 35, in <module>
    from torch._dynamo.debug_utils import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/debug_utils.py", line 44, in <module>
    from torch._dynamo.testing import rand_strided
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 33, in <module>
    from torch._dynamo.backends.debugging import aot_eager
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/backends/debugging.py", line 35, in <module>
    from functorch.compile import min_cut_rematerialization_partition
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/functorch/compile/__init__.py", line 2, in <module>
    from torch._functorch.aot_autograd import (
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 26, in <module>
    from torch._inductor.output_code import OutputCode
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 52, in <module>
    from .runtime.autotune_cache import AutotuneCacheBundler
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/autotune_cache.py", line 23, in <module>
    from .triton_compat import Config
  File "/Users/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_compat.py", line 16, in <module>
    from triton import Config
ImportError: cannot import name 'Config' from 'triton' (unknown location)

We cannot install `triton` on Apple silicon because there are no [available distributions](https://pypi.org/project/triton/#files).

This change adds more placeholders for triton modules and classes that are imported when calling `vllm serve`.
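
For context, the placeholder approach works roughly as sketched below. This is a simplified illustration, not vLLM's actual `importing.py` code; the `_PlaceholderConfig` class and `_dummy_decorator` helper are stand-ins for whatever names callers try to import from `triton`.

```python
# Hedged sketch (not vLLM's actual code): register a dummy "triton" module in
# sys.modules that exposes the names callers import (e.g. Config, a jit-like
# decorator), so `from triton import Config` stops raising ImportError.
import sys
import types


class _PlaceholderConfig:
    """Stand-in for triton.Config; stores its arguments and does nothing."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


def _dummy_decorator(*args, **kwargs):
    """No-op replacement for triton.jit and similar decorators."""
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]          # used as @jit
    return lambda fn: fn        # used as @jit(...)


if "triton" not in sys.modules:
    _triton = types.ModuleType("triton")
    _triton.Config = _PlaceholderConfig
    _triton.jit = _dummy_decorator
    sys.modules["triton"] = _triton
```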

related Slack thread

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@davidxia marked this pull request as ready for review April 30, 2025 14:04
@houseroad (Collaborator)

I feel this is more of an Inductor issue. Shall we just turn off Inductor by default if we detect a non-GPU platform?
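
As a rough illustration of this suggestion (not vLLM's actual platform logic; the `pick_compile_backend` helper and the CUDA check are assumptions for the sketch), one could select a Dynamo backend that avoids Inductor, and therefore Triton, on non-GPU machines:

```python
# Hedged sketch: choose a torch.compile backend that does not pull in
# Inductor/Triton when no GPU is available. "eager" is a built-in Dynamo
# backend that skips Inductor entirely.
import torch


def pick_compile_backend() -> str:
    # Assumption for illustration: treat "no CUDA device" as the non-GPU case.
    if torch.cuda.is_available():
        return "inductor"
    return "eager"


@torch.compile(dynamic=True, backend=pick_compile_backend())
def scale(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0
```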

@houseroad (Collaborator)

cc: @zou3519 thoughts?

@davidxia (Contributor, Author) commented May 1, 2025

#17317 also works for me on Apple silicon. That PR looks more mature and mentions Inductor, so maybe this one isn't necessary?

@zou3519 (Collaborator) left a comment


This PR overlaps with #17317. Also, as I mentioned in the other PR, I don't like inserting a dummy module into sys.modules["triton"] -- this is asking for trouble. What if triton changes, or what if a third-party library (torch) imports triton?

If we need a short-term fix we can figure something out, but IMO the right fix is to stop monkey-patching sys.modules["triton"].
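
One possible shape of that alternative, sketched here with hypothetical names (`HAS_TRITON` and `autotune_configs` are not vLLM's actual API), is to detect Triton once and guard call sites instead of patching `sys.modules`:

```python
# Hedged sketch of the guarded-import alternative: detect Triton availability
# once and branch at call sites, leaving sys.modules untouched.
import importlib.util

HAS_TRITON = importlib.util.find_spec("triton") is not None

if HAS_TRITON:
    from triton import Config
else:
    Config = None  # callers must check HAS_TRITON before using Triton paths


def autotune_configs():
    if not HAS_TRITON:
        return []  # CPU / Apple silicon: no Triton kernels to tune
    return [Config({"BLOCK_SIZE": 128}, num_warps=4)]
```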

@davidxia (Contributor, Author) commented May 1, 2025

closing in favor of #17317

@davidxia closed this May 1, 2025
@davidxia deleted the patch12 branch May 1, 2025 13:52
