
Conversation

@LucasWilkinson (Collaborator) commented Mar 13, 2025

Enable VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0 for fp8 by simply up-converting the weights to fp16. Also switch from einsum to bmm, which makes the required kernels more obvious and will make it easier to integrate an fp8 bmm later (we would need block-scale support for 64x128 or 128x64; I still need to work through the details).
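A rough sketch of the einsum→bmm switch described above. Shapes are illustrative only (not the real DeepSeek-R1 dimensions), and float32 is used in place of the fp8→fp16 up-convert so the example runs on any backend:

```python
import torch

# Illustrative shapes only (not the real DeepSeek-R1 dims):
# B tokens, H heads, D = qk_nope_head_dim, L = kv_lora_rank
B, H, D, L = 4, 8, 16, 32

q_nope = torch.randn(B, H, D)
# In this PR the fp8 weight would be up-converted to fp16 before the
# matmul; float32 is used here so the example runs anywhere.
W_UK = torch.randn(H, D, L)

# einsum formulation (the previous style)
out_einsum = torch.einsum("bhd,hdl->bhl", q_nope, W_UK)

# equivalent bmm formulation: fold the head dim into the batch dim,
# making the underlying batched GEMM (and a future fp8 bmm) explicit
out_bmm = torch.bmm(q_nope.transpose(0, 1), W_UK).transpose(0, 1)

assert torch.allclose(out_einsum, out_bmm, atol=1e-5)
```

The bmm form spells out exactly which batched-GEMM kernel is needed, which is the stated motivation for the change.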

Based on these calculations (may be bugged): https://docs.google.com/spreadsheets/d/17eoqEbhblvtNsRRlFSjCQnEXZiBxtLgZGKD4IgZUz38/edit?usp=sharing

VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0 should introduce 143% memory overhead, while VLLM_MLA_PERFORM_MATRIX_ABSORPTION=1 (the current default) should introduce 318% memory overhead.

We will likely want to make VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0 the default.

With VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0:

```
lm_eval --model local-completions --tasks gsm8k --model_args model=/home/vllm-dev/DeepSeek-R1,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=5,max_retries=3,tokenized_requests=False --limit 100
```

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.97|±  |0.0171|
|     |       |strict-match    |     5|exact_match|↑  | 0.97|±  |0.0171|
VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0 VLLM_USE_V1=1

```
Data Preview:
  backend  input_tokens  output_tokens  output_toks/s     req/s  median_itl_ms  median_ttft_ms
2    vllm          1000           1000    1269.083834  1.269084      31.285999     2318.670340
1    vllm          5000           1000    1046.350954  1.046351      33.370881     5510.511930
3    vllm         10000           1000     865.023539  0.865024      37.076649     8501.588057
0    vllm         32000           1000     190.611408  0.190611      35.992813   107927.603456
```

VLLM_MLA_PERFORM_MATRIX_ABSORPTION=1 VLLM_USE_V1=1

```
Data Preview:
  backend  input_tokens  output_tokens  output_toks/s     req/s  median_itl_ms  median_ttft_ms
2    vllm          1000           1000    1379.393797  1.379394      30.282677     2025.573374
1    vllm          5000           1000    1038.084492  1.038084      33.778368     5517.865633
3    vllm         10000           1000     571.708739  0.571709      36.978330     8523.583977
0    vllm         32000           1000     161.668698  0.161669      43.343701   115767.377675
```
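Reading the two benchmark tables together, the throughput ratio of the non-materialized path (`=0`) over the absorbed path (`=1`) can be computed directly from the `output_toks/s` columns above:

```python
# output_toks/s taken from the two Data Preview tables above
no_absorb = {1000: 1269.083834, 5000: 1046.350954, 10000: 865.023539, 32000: 190.611408}
absorb    = {1000: 1379.393797, 5000: 1038.084492, 10000: 571.708739, 32000: 161.668698}

# throughput of VLLM_MLA_PERFORM_MATRIX_ABSORPTION=0 relative to =1
speedup = {n: round(no_absorb[n] / absorb[n], 2) for n in no_absorb}
print(speedup)  # {1000: 0.92, 5000: 1.01, 10000: 1.51, 32000: 1.18}
```

So the non-materialized path trades a small slowdown at short prompts for sizable gains at 10k+ input tokens, which is consistent with wanting `=0` as the default.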

Signed-off-by: Lucas Wilkinson <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 13, 2025
@robertgshaw2-redhat robertgshaw2-redhat deleted the lwilkinson/fp8-no-materialize branch March 24, 2025 18:04
@robertgshaw2-redhat robertgshaw2-redhat restored the lwilkinson/fp8-no-materialize branch March 24, 2025 18:06
@LucasWilkinson (Collaborator, Author)

superseded by: #14770

@LucasWilkinson LucasWilkinson deleted the lwilkinson/fp8-no-materialize branch March 24, 2025 18:09