
Conversation

kylesayrs (Contributor) commented Aug 7, 2024

The reused classes are:

  1. CompressionFormat
  2. QuantizationArgs
  3. QuantizationStrategy
  4. QuantizationType
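
For context, a minimal sketch (not taken from this PR's diff) of what reusing these definitions looks like, assuming the import paths exposed by the compressed-tensors package; the actual vLLM call sites may differ:

```python
# Illustrative sketch only: import the shared definitions from the
# compressed-tensors package instead of keeping duplicate copies inside vLLM.
# Module paths are assumptions about the library layout, not quoted from the PR.
from compressed_tensors.config import CompressionFormat
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationStrategy,
    QuantizationType,
)

# Example: build the arguments for a symmetric 8-bit integer, per-tensor scheme.
weight_args = QuantizationArgs(
    num_bits=8,
    type=QuantizationType.INT,
    strategy=QuantizationStrategy.TENSOR,
    symmetric=True,
)
print(weight_args)
print(list(CompressionFormat))  # available formats, e.g. dense, int-quantized, ...
```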

github-actions bot commented Aug 7, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

kylesayrs marked this pull request as draft August 7, 2024 17:59
kylesayrs force-pushed the compressed-tensors-reuse branch from aaa041e to 8960860 on August 8, 2024 17:58
kylesayrs changed the title from "[Misc] DO NOT MERGE compressed-tensors code reuse" to "[Misc] compressed-tensors code reuse" Aug 8, 2024
kylesayrs marked this pull request as ready for review August 8, 2024 18:32
dsikka (Contributor) commented Aug 8, 2024

/ready

github-actions bot added the ready label Aug 8, 2024
dsikka (Contributor) commented Aug 8, 2024

@kylesayrs you're missing

compressed-tensors==0.4.0 # required for compressed-tensors

kylesayrs requested a review from dsikka August 9, 2024 13:13
robertgshaw2-redhat (Collaborator) commented

Current state looks good so far.

The biggest piece of feedback is that we are still rewriting the logic associated with parsing the config. Specifically, the get_scheme function in compressed_tensors.py contains this duplicated code.

It will be tricky to fix this (because the vLLM state_dict is not a 1:1 map with the transformers state_dict), so feel free to reach out if you need any pointers.
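
To make the duplication concrete, here is a hypothetical, heavily simplified sketch of the kind of config parsing get_scheme performs: matching a layer against the targets of a config group and turning that group's weight settings into QuantizationArgs. The helper name and the suffix-only matching are illustrative, not vLLM's actual implementation:

```python
# Hypothetical, simplified illustration of the config-parsing step under
# discussion; NOT vLLM's actual get_scheme. Real target matching also handles
# module class names and "re:" regex targets, and must account for vLLM's
# fused layers not mapping 1:1 onto the transformers state_dict.
from typing import Optional

from compressed_tensors.quantization import QuantizationArgs


def match_layer_to_weight_args(
    layer_name: str, config_groups: dict
) -> Optional[QuantizationArgs]:
    """Return the weight QuantizationArgs of the first group targeting this layer."""
    for group in config_groups.values():
        # Only plain suffix matches are handled here, for brevity.
        if any(layer_name.endswith(target) for target in group.get("targets", [])):
            return QuantizationArgs(**group["weights"])
    return None


# Toy config_groups section resembling a compressed-tensors quantization config.
example_groups = {
    "group_0": {
        "targets": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "weights": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
    }
}

print(match_layer_to_weight_args("model.layers.0.self_attn.q_proj", example_groups))
```

The feedback above is that this kind of target-matching logic already lives in the compressed-tensors library, so ideally vLLM would call into it rather than re-implement it.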

dsikka (Contributor) commented Aug 9, 2024

@robertgshaw2-neuralmagic I think updating the get_scheme function is beyond the scope of this PR. I'd like to first land the use of compressed-tensors without any dependency conflicts. Refactoring get_scheme should be a follow-up.

kylesayrs (Contributor, Author) commented

These test failures seem unrelated to this PR? A few seem to be CUDA errors and one is complaining about bad LLM metrics measurements.

robertgshaw2-redhat (Collaborator) commented

> @robertgshaw2-neuralmagic I think updating the get_scheme function is beyond the scope of this PR. I'd like to first land the use of compressed-tensors without any dependency conflicts. Refactoring get_scheme should be a follow-up.

Sounds good.

@kylesayrs I'm just running this by Simon, but we should be good to go.

kylesayrs force-pushed the compressed-tensors-reuse branch from 049dc9c to ce29b08 on August 13, 2024 18:36
mgoin merged commit 373538f into vllm-project:main Aug 13, 2024
kylesayrs deleted the compressed-tensors-reuse branch August 13, 2024 23:05
kylesayrs added a commit to neuralmagic/vllm that referenced this pull request Aug 14, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025