Conversation


@liangfu liangfu commented Dec 12, 2023

This PR adds an option that sets up vLLM to build with the Neuron toolchain (including neuronx-cc and transformers-neuronx).

This would let us build vllm-0.2.7+neuron212, where the neuron version suffix is derived from the compiler version (neuronx-cc 2.12).

This is part of the effort to add support for accelerating LLM inference with Trainium/Inferentia (see #1866).
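For illustration, here is a minimal sketch of how such a local version suffix could be computed at build time. The helper names and the `neuronx-cc` distribution lookup are assumptions for the sake of the example, not the PR's actual implementation:

```python
# Hypothetical sketch (not the PR's actual code): build a local version label
# like "0.2.7+neuron212" from the installed Neuron compiler version.
from importlib.metadata import version, PackageNotFoundError

BASE_VERSION = "0.2.7"

def neuron_version_suffix() -> str:
    """Map e.g. neuronx-cc 2.12.x -> "neuron212".

    Assumes the compiler is installed as the "neuronx-cc" distribution;
    the PR may detect it differently.
    """
    try:
        compiler_version = version("neuronx-cc")
    except PackageNotFoundError:
        return ""  # no Neuron toolchain installed, keep the plain version
    major, minor = compiler_version.split(".")[:2]
    return f"neuron{major}{minor}"

def package_version() -> str:
    suffix = neuron_version_suffix()
    return f"{BASE_VERSION}+{suffix}" if suffix else BASE_VERSION
```

With neuronx-cc 2.12 installed, `package_version()` would return `"0.2.7+neuron212"`, matching the wheel name mentioned above.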

@WoosukKwon WoosukKwon added the aws-neuron Related to AWS Inferentia & Trainium label Dec 13, 2023
@WoosukKwon WoosukKwon self-requested a review December 21, 2023 06:27

@WoosukKwon WoosukKwon left a comment


Hi @liangfu, apologies for the late review and thanks for the PR! I like this PR in that you didn't submit a big PR at once but instead split it into small parts. :)

Overall, I think moving the import statements is not a good idea. Considering the architecture you showed last time, I think we can just skip loading the modules that try to import custom ops. WDYT?
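For clarity, a minimal sketch of what "skipping" those modules could look like follows; the module name `_custom_ops` and the wrapper function are placeholders for this example, not vLLM's actual layout:

```python
# Hypothetical sketch: tolerate a missing compiled custom-op extension on
# Neuron builds instead of relocating the import statements.
try:
    from vllm import _custom_ops  # compiled CUDA kernels; absent on Neuron
    HAS_CUSTOM_OPS = True
except ImportError:
    _custom_ops = None
    HAS_CUSTOM_OPS = False

def paged_attention(*args, **kwargs):
    """Dispatch to the compiled op when available, otherwise fail loudly."""
    if not HAS_CUSTOM_OPS:
        raise RuntimeError(
            "Custom ops are not available in this build; "
            "use the Neuron backend instead.")
    return _custom_ops.paged_attention(*args, **kwargs)
```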


@WoosukKwon WoosukKwon left a comment


@liangfu Apologies for the late review and thanks for addressing my comments! I left some very minor comments on style. Looking forward to the next PRs!

liangfu and others added 2 commits January 17, 2024 09:59
@WoosukKwon WoosukKwon merged commit 18473cf into vllm-project:main Jan 18, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024