
Conversation

DarkLight1337 (Member) commented May 27, 2025

Key changes:

  • float32 models will be downcast to whatever dtype is prioritized by the platform (defined by the first item in current_platform.supported_dtypes). This means that models may be downcast to bfloat16 instead of float16.
  • Attempt to detect the safetensors dtype from the HF Hub repo if the dtype is not specified in the configuration file. This avoids accidentally casting float16 weights to bfloat16 when the dtype is missing from the config on HF Hub. (See the sketch after this description.)
  • Exclude some models (gemma2, gemma3, plamo2, glm4) from being downcast to float16 because of reported numerical stability issues.
  • Clean up some related log messages.

Related to #17123
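
Below is a minimal sketch of the resolution order described above, in case it helps reviewers. resolve_dtype and read_safetensors_dtypes are hypothetical helpers (only current_platform.supported_dtypes comes from the actual change), so this is illustrative rather than the PR's implementation:

import json
import struct

import torch


def read_safetensors_dtypes(path: str) -> set[str]:
    # A .safetensors file begins with an 8-byte little-endian header length,
    # followed by a JSON header mapping tensor names to metadata (including
    # "dtype"), so the checkpoint dtype can be read without loading weights.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {m["dtype"] for name, m in header.items() if name != "__metadata__"}


def resolve_dtype(config_dtype, supported_dtypes, safetensors_dtype=None):
    # If the config does not specify a dtype, fall back to the dtype detected
    # from the safetensors checkpoint (if any); otherwise assume float32.
    if config_dtype is None:
        config_dtype = safetensors_dtype or torch.float32

    # float32 checkpoints are downcast to the platform's preferred dtype,
    # i.e. the first entry of current_platform.supported_dtypes.
    if config_dtype == torch.float32:
        return supported_dtypes[0]

    # Otherwise keep the checkpoint dtype when the platform supports it.
    if config_dtype in supported_dtypes:
        return config_dtype
    return supported_dtypes[0]

For example, resolve_dtype(None, [torch.bfloat16, torch.float16], safetensors_dtype=torch.float16) keeps float16, which is exactly the accidental-bfloat16 case the second bullet guards against.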

@github-actions (bot)

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 requested a review from ywang96 as a code owner, May 27, 2025 13:43
mergify bot added the multi-modality (Related to multi-modality (#4194)) label, May 27, 2025
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 changed the title from [Core] Refactor dtype resolution to [Core] Rework dtype resolution, May 27, 2025
mergify bot commented May 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label, May 28, 2025
Isotr0py (Member) left a comment

Overall LGTM, thanks for the rework!

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed), May 30, 2025
mergify bot removed the needs-rebase label, May 30, 2025
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 (Member Author)

@Isotr0py (Member)
Hmm, I tried the test pipeline with this PR and it passes locally:

pytest -s -v tests/models/language/pooling/test_snowflake_arctic_embed.py -k test_embed_models_mteb -m 'not core_model'
VLLM: 0.6811808101978006
SentenceTransformers: 0.6811455579280281
Difference: -3.525226977252327e-05
PASSED

CI outputs:

[2025-05-30T12:18:42Z] VLLM: torch.float32 0.6811515659720713
[2025-05-30T12:18:42Z] SentenceTransformer: torch.float32 0.6806553028675326
[2025-05-30T12:18:42Z] Difference: -0.0004962631045386523
[2025-05-30T12:18:43Z] FAILED

It seems the degradation is on the sentence-transformers side again... ☹️

noooop (Collaborator) commented May 31, 2025

torch.float16 0.6811108652277692
torch.bfloat16 0.6803641333652566

This result is closer to bfloat16, so I think the model was converted to bfloat16 somewhere.

I don't know why SentenceTransformer's model_kwargs={"torch_dtype": dtype} is not working (the model still uses torch.float32), but torch.set_default_dtype(dtype) does.

import mteb
import torch
from sentence_transformers import SentenceTransformer

MTEB_EMBED_TASKS = ["STS12"]

model_name = "Snowflake/snowflake-arctic-embed-m-long"


def run_mteb_embed_task(encoder, tasks):
    tasks = mteb.get_tasks(tasks=tasks)
    evaluation = mteb.MTEB(tasks=tasks)
    results = evaluation.run(encoder, verbosity=0, output_folder=None)

    main_score = results[0].scores["test"][0]["main_score"]
    return main_score


def run_mteb_embed_task_st(model_name, tasks, dtype):
    # Uncommenting this line is what actually changes the model dtype;
    # model_kwargs={"torch_dtype": dtype} alone does not.
    # torch.set_default_dtype(dtype)
    model = SentenceTransformer(model_name,
                                trust_remote_code=True,
                                model_kwargs={"torch_dtype": dtype})

    model_dtype = next(model.parameters()).dtype
    st_main_score = run_mteb_embed_task(model, tasks)

    print(model_dtype, st_main_score)


run_mteb_embed_task_st(model_name, MTEB_EMBED_TASKS, torch.float16)
run_mteb_embed_task_st(model_name, MTEB_EMBED_TASKS, torch.bfloat16)

  • This means that models may be downcast to bfloat16 instead of float16.

I don't think using bfloat16 is a good idea for embedding models.
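
For reference, bfloat16 keeps float32's exponent range but has fewer mantissa bits than float16, so its element-wise precision is coarser; a quick check with torch.finfo shows the gap (illustrative only, not from the test output above):

import torch

# Machine epsilon: the smallest relative spacing between representable values.
print(torch.finfo(torch.float16).eps)   # 0.0009765625 (10 mantissa bits)
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125    (7 mantissa bits)

That coarser spacing is consistent with the larger score gap seen for bfloat16 above.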

noooop (Collaborator) commented May 31, 2025

I can reproduce this error if torch.set_default_dtype(torch.bfloat16) is called somewhere:

import pytest
import torch
from tests.models.language.pooling.mteb_utils import mteb_test_embed_models
from tests.models.utils import EmbedModelInfo

MODELS = [
    EmbedModelInfo("Snowflake/snowflake-arctic-embed-m-long",
                   is_matryoshka=False,
                   architecture="NomicBertModel",
                   dtype="half",
                   enable_test=True),
]

# Simulate some other code having changed the global default dtype.
torch.set_default_dtype(torch.bfloat16)

@pytest.mark.parametrize("model_info", MODELS)
def test_embed_models_mteb(hf_runner, vllm_runner,
                           model_info: EmbedModelInfo) -> None:
    mteb_test_embed_models(hf_runner, vllm_runner, model_info)

Wrapping the hf_runner block below in set_default_torch_dtype(vllm_dtype) fixes it:

    # set_default_torch_dtype is assumed to be a context manager that
    # temporarily overrides torch's default dtype (vLLM has a helper with
    # this name), so the SentenceTransformer model is built in vllm_dtype.
    with set_default_torch_dtype(vllm_dtype):
        with hf_runner(
            model_info.name, is_sentence_transformer=True,
            dtype=vllm_dtype) as hf_model:

            if hf_model_callback is not None:
                hf_model_callback(hf_model)

            st_main_score = run_mteb_embed_task(hf_model, MTEB_EMBED_TASKS)

            st_dtype = next(hf_model.model.parameters()).dtype

    default_dtype = torch.get_default_dtype()
    print("default_dtype: ", default_dtype)
    print("VLLM:", vllm_dtype, vllm_main_score)
    print("SentenceTransformers:", st_dtype, st_main_score)
    print("Difference:", st_main_score - vllm_main_score)

    assert st_main_score == pytest.approx(vllm_main_score, abs=MTEB_EMBED_TOL)

You may also need

# ruff: noqa: SIM117

to silence ruff's SIM117 warning about the nested with statements.

That said, running SentenceTransformers and vLLM with the same dtype might prevent us from discovering potential issues.

@DarkLight1337 (Member Author)

I see. Let's default to float16 for embedding models for now, then change it to float32 in your PR.

noooop (Collaborator) commented May 31, 2025

Now there is a serious problem: I don't know why SentenceTransformer's model_kwargs={"torch_dtype": dtype} is not working, but torch.set_default_dtype(dtype) does.
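
One possible workaround (untested here, just a sketch): SentenceTransformer is a torch.nn.Module, so the loaded model can be cast explicitly after construction instead of relying on model_kwargs:

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-long",
                            trust_remote_code=True)
# Cast the whole module explicitly, rather than relying on
# model_kwargs={"torch_dtype": ...}, which did not take effect above.
model = model.to(torch.float16)
print(next(model.parameters()).dtype)  # torch.float16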

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 merged commit 6aa8f9a into vllm-project:main, Jun 1, 2025
63 checks passed
DarkLight1337 deleted the refactor-dtype-resolution branch, June 1, 2025 03:04
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025

Labels

multi-modality (Related to multi-modality (#4194)), ready (ONLY add when PR is ready to merge/full CI is needed)
