[Core] Cast multimodal input in hf processor #18862
Conversation
Overall this approach does make more sense than what I originally did in #18756, thanks!
Force-pushed from b6d42a2 to a25576c
Can you fix the failing tests? Looks like you need to import
Head branch was pushed to by a user without write access
Force-pushed from e61670b to 0292449
Ah sorry, should be fixed in 02924492824112f2f6d43f247ba70c91059d9989. Looks like the type annotation needs to be a string to support older versions of Python.
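For reference, a minimal sketch of the string-annotation workaround (hypothetical function, not the exact change): quoting the annotation keeps the `X | Y` union syntax importable on Python versions older than 3.10, since the string is never evaluated eagerly.

```python
import torch

def _to_dtype(value: "torch.Tensor | None", dtype: torch.dtype) -> "torch.Tensor | None":
    # The quoted annotation is a forward reference, so it parses fine on older Python.
    return value.to(dtype) if value is not None else value

# Alternatively, `from __future__ import annotations` at the top of the module
# defers evaluation of every annotation in the file.
```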
Head branch was pushed to by a user without write access
Looks like not all items in the dict are tensors, which breaks CI. 9d0d47c58009ef4dd646daa8a2955280406f65f7 should fix that.
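A minimal sketch of that guard (hypothetical helper, assuming the processor output is a dict-like mapping): only floating-point tensors get cast, everything else is passed through unchanged.

```python
import torch

def cast_floating_tensors(data: dict, dtype: torch.dtype) -> dict:
    # Only cast floating-point tensors; leave ints, lists, strings, etc. as-is.
    return {
        key: value.to(dtype)
        if isinstance(value, torch.Tensor) and value.is_floating_point()
        else value
        for key, value in data.items()
    }
```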
Looks like the V1 test is hanging in this PR, can you investigate it?
Force-pushed from 9d0d47c to b48017f
cc @njhill any idea?
The tokenizers warning is a red herring and shouldn't be an issue. I don't think we should change the mp method in the test as a workaround. If you can repro locally, you could check where things may be stuck by running with the env var
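For anyone reproducing this locally, a generic way to get stack dumps from a hung process (plain stdlib `faulthandler`, not necessarily the vLLM env var referenced above):

```python
import faulthandler
import signal

# Register a handler so that sending SIGUSR1 to the process dumps the
# tracebacks of all threads to stderr.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# Then, from another shell:  kill -USR1 <pid>
```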
Here are the dumps from both processes: |
Force-pushed from b1344b4 to 5d5c4b4
Looks like it's stuck in transformers here: huggingface/transformers@a31fa21#diff-d3478155ac25ae1107d16a4464001bfd54770bf12cd9bab881233a1b4d216e3fR240 🤔 |
Yes, I also checked with transformers v4.51.3, which is before this change, and it still gets stuck.
Force-pushed from 5d5c4b4 to 04779ea
It seems like not calling
Force-pushed from 04779ea to bf31e39
Looks like I'm now running into #16054 on CI. Rebased onto main to re-trigger CI.
Nice, thanks for looking into the deadlock problem!
@lgeiger
@vadiklyutiy I'm not sure I fully understand, could you elaborate? For context, this PR is a follow-up to #18756. In #18756 the dtype conversion has already been removed from DeepseekVL2 and Gemma3. These are now unnecessary, as the multimodal input will already have the correct dtype when passed to
For Qwen2.5-VL in
I guess the assumption is that the image should be converted in
This is a follow-up to #18756 that instead performs the multimodal input casting directly as part of the Hugging Face preprocessing.
I think this is a bit cleaner and has two advantages: it moves the blocking cast/copy off the main thread, and it performs the conversion before serialisation, which also reduces the amount of data that needs to be serialised and deserialised. @DarkLight1337 Let me know if you see any disadvantages of doing this.
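As a rough illustration of the serialisation argument (hypothetical shape and dtypes, chosen only to show the effect of casting before the data crosses the process boundary):

```python
import torch

# A single fp32 image tensor as produced by a typical HF processor.
pixel_values = torch.randn(1, 3, 336, 336, dtype=torch.float32)
print(pixel_values.numel() * pixel_values.element_size())  # ~1.35 MB per image

# Casting to the model dtype before serialisation roughly halves the payload.
casted = pixel_values.to(torch.bfloat16)
print(casted.numel() * casted.element_size())  # ~0.68 MB per image
```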
Before: (screenshot)
After: (screenshot)
I've done some quick benchmarks, and for some models I'm seeing a very small improvement in throughput (0.5%-0.7%).