
@lgeiger
Contributor

@lgeiger lgeiger commented May 28, 2025

This is a follow-up to #18756. Instead, it does the multimodal input casting directly as part of the Hugging Face preprocessing.
I think this is a bit cleaner and has two advantages: it moves the blocking cast/copy off the main thread, and it does the conversion before serialisation, which also reduces the amount of data that needs to be serialised and de-serialised. @DarkLight1337 Let me know if you see any disadvantages of doing this.

Before: [Screenshot 2025-05-28 at 23 48 02]
After: [Screenshot 2025-05-28 at 23 48 46]

I ran some quick benchmarks, and for some models I'm seeing a very small improvement in throughput (0.5%-0.7%).
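To illustrate the serialisation point above, here is a minimal sketch using only the standard library. It is not vLLM's serialisation path: `struct` has no bf16 format, so IEEE 754 fp16 (`"e"`) is used here as a stand-in with the same 2-byte width, and the `values` list is made-up data.

```python
import struct

# 1024 made-up pixel values in [0, 1), well inside the fp16 range.
values = [i / 1024 for i in range(1024)]

# Serialise as 32-bit floats (4 bytes per value).
payload_fp32 = struct.pack(f"<{len(values)}f", *values)

# Serialise as 16-bit floats (2 bytes per value); "e" is IEEE 754 binary16,
# used here only as a stand-in for bf16, which struct does not support.
payload_fp16 = struct.pack(f"<{len(values)}e", *values)

print(len(payload_fp32), len(payload_fp16))  # 4096 2048
```

Casting before packing halves the payload in this toy case, which is the effect described above for data crossing a process boundary.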


@mergify mergify bot added multi-modality Related to multi-modality (#4194) speculative-decoding v1 tpu Related to Google TPUs labels May 28, 2025
@DarkLight1337
Member

Overall this approach does make more sense than what I originally did in #18756, thanks!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 29, 2025 14:42
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 29, 2025
@DarkLight1337
Member

DarkLight1337 commented May 29, 2025

Can you fix the failing tests? Looks like you need to import torch for real inside the function

auto-merge was automatically disabled May 29, 2025 16:46

Head branch was pushed to by a user without write access

@lgeiger lgeiger force-pushed the non-blocking-casting branch from e61670b to 0292449 Compare May 29, 2025 16:47
@lgeiger
Contributor Author

lgeiger commented May 29, 2025

> Can you fix the failing tests? Looks like you need to import torch for real inside the function

Ah sorry, should be fixed in 02924492824112f2f6d43f247ba70c91059d9989. Looks like the type annotation needs to be a string to support older versions of python.
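For readers unfamiliar with the pattern discussed here, a minimal sketch follows. It is not the code from the commit: `fractions.Fraction` stands in for `torch`, and `halve` is a hypothetical function. It assumes a Python version that evaluates annotations eagerly at definition time, where an unquoted name imported only under `TYPE_CHECKING` would raise `NameError`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # The annotations are strings, so defining this function does not
    # require the name Fraction to exist at runtime.
    # The body does use it, so import it "for real" inside the function.
    from fractions import Fraction

    return Fraction(value) / 2


print(halve(3))  # 3/2
```

The quoted annotation keeps the module importable without the heavy dependency, while the local import makes the name available where it is actually used.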

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 29, 2025 16:50
auto-merge was automatically disabled May 29, 2025 18:50

Head branch was pushed to by a user without write access

@lgeiger
Contributor Author

lgeiger commented May 29, 2025

Looks like not all items in the dict are tensors, which breaks CI. 9d0d47c58009ef4dd646daa8a2955280406f65f7 should fix that.
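A rough sketch of the kind of guard this implies, not the code from the commit. `FakeTensor` is a stand-in for `torch.Tensor` so the example runs without PyTorch (the real class does have `is_floating_point()` and `.to()`); `cast_float_tensors` and the dictionary contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor; only tracks a dtype name."""

    dtype: str

    def is_floating_point(self) -> bool:
        return self.dtype.startswith(("float", "bfloat"))

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def cast_float_tensors(items: dict, dtype: str) -> dict:
    """Cast floating-point tensors to `dtype`; pass everything else through."""
    out = {}
    for key, value in items.items():
        if isinstance(value, FakeTensor) and value.is_floating_point():
            out[key] = value.to(dtype)
        else:
            # Integer tensors, lists, strings, etc. are left untouched.
            out[key] = value
    return out


# Hypothetical processor output mixing tensors and plain Python values.
hf_outputs = {
    "pixel_values": FakeTensor("float32"),
    "image_grid_thw": FakeTensor("int64"),
    "num_crops": [1, 2],
}
casted = cast_float_tensors(hf_outputs, "bfloat16")
print(casted)
```

Only the floating-point entry changes dtype; calling `.to()` unconditionally on every value would fail on the list.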

@DarkLight1337
Member

Looks like the V1 test is hanging in this PR; can you investigate it?

@lgeiger lgeiger force-pushed the non-blocking-casting branch from 9d0d47c to b48017f Compare May 30, 2025 09:35
@DarkLight1337
Member

cc @njhill any idea?

@njhill
Member

njhill commented May 30, 2025

The tokenizers warning is a red herring and shouldn't be an issue. I don't think we should change the mp method in the test to work around it.

If you can repro locally, you could check where things may be stuck by running with env var PYTHONFAULTHANDLER=1 and then sending a SIGABRT to the front-end and back-end procs, which will dump all of the thread stacks.
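The suggestion above can be tried in isolation with a small standard-library sketch. This assumes a POSIX system; the child program is a made-up stand-in for a hung process, not the vLLM test. With `PYTHONFAULTHANDLER=1` set, the interpreter enables `faulthandler` at startup, and it dumps the Python thread stacks to stderr when the process receives `SIGABRT`.

```python
import os
import signal
import subprocess
import sys

# Child program that blocks, standing in for a hung front-end or back-end process.
CHILD_CODE = "import time\nprint('ready', flush=True)\ntime.sleep(60)\n"


def dump_stacks_of_hung_child() -> str:
    # Enable faulthandler in the child via the environment variable.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child is actually running
    proc.send_signal(signal.SIGABRT)  # triggers the stack dump, then aborts
    _, stderr = proc.communicate(timeout=30)
    return stderr


trace = dump_stacks_of_hung_child()
print(trace)
```

For an already running process, the equivalent is to start it with the environment variable set and send the signal to its PID, once for each process of interest.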

@lgeiger
Contributor Author

lgeiger commented May 31, 2025

> If you can repro locally, you could check where things may be stuck by running with env var PYTHONFAULTHANDLER=1 and then sending a SIGABRT to the front-end and back-end procs, which will dump all of the thread stacks.

Here are the dumps from both processes:

Current thread 0x00007b3bc3ce3080 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 240 in _prepare_input_videos
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 263 in preprocess
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 197 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/models/qwen2_vl/processing_qwen2_vl.py", line 146 in __call__
  File "/home/ubuntu/vllm/vllm/inputs/registry.py", line 162 in call_hf_processor
  File "/home/ubuntu/vllm/vllm/model_executor/models/qwen2_vl.py", line 1018 in _call_hf_processor
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1290 in _apply_hf_processor_text_mm
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1360 in _apply_hf_processor_mm_only
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1399 in _apply_hf_processor_main
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1552 in _cached_apply_hf_processor
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1786 in apply
  File "/home/ubuntu/vllm/vllm/multimodal/profiling.py", line 168 in _get_dummy_mm_inputs
  File "/home/ubuntu/vllm/vllm/multimodal/profiling.py", line 255 in get_mm_max_tokens
  File "/home/ubuntu/vllm/vllm/multimodal/registry.py", line 131 in get_max_tokens_per_item_by_modality
  File "/home/ubuntu/vllm/vllm/multimodal/registry.py", line 157 in get_max_tokens_per_item_by_nonzero_modality
  File "/home/ubuntu/vllm/vllm/v1/core/encoder_cache_manager.py", line 124 in _compute_encoder_budget_multimodal
  File "/home/ubuntu/vllm/vllm/v1/core/encoder_cache_manager.py", line 94 in compute_encoder_budget
  File "/home/ubuntu/vllm/vllm/v1/worker/gpu_model_runner.py", line 127 in __init__
  File "/home/ubuntu/vllm/vllm/v1/worker/gpu_worker.py", line 144 in init_device
  File "/home/ubuntu/vllm/vllm/worker/worker_base.py", line 604 in init_device
  File "/home/ubuntu/vllm/vllm/utils.py", line 2601 in run_method
  File "/home/ubuntu/vllm/vllm/executor/uniproc_executor.py", line 56 in collective_rpc
  File "/home/ubuntu/vllm/vllm/executor/uniproc_executor.py", line 46 in _init_executor
  File "/home/ubuntu/vllm/vllm/executor/executor_base.py", line 52 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 74 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 398 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 499 in run_engine_core
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 282 in _Popen
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121 in start
  File "/home/ubuntu/vllm/vllm/v1/utils.py", line 223 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 461 in _init_engines_direct
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 404 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 693 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 126 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 191 in from_engine_args
  File "/home/ubuntu/vllm/tests/v1/engine/test_async_llm.py", line 95 in test_load
  File "/usr/lib/python3.12/asyncio/events.py", line 88 in _run
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1987 in _run_once
  File "/usr/lib/python3.12/asyncio/base_events.py", line 641 in run_forever
  File "/usr/lib/python3.12/asyncio/base_events.py", line 674 in run_until_complete
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 773 in inner
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 508 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 337 in _main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/ubuntu/vllm/.venv/bin/pytest", line 10 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, regex._regex, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, 
scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, 
pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, PIL._imagingmath, vllm.cumem_allocator, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box (total: 205)
Thread 0x00007b3aa61fe6c0 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/zmq/utils/garbage.py", line 46 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007b3bc3ce3080 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/zmq/sugar/poll.py", line 106 in poll
  File "/home/ubuntu/vllm/vllm/v1/utils.py", line 311 in wait_for_engine_startup
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 488 in _wait_for_engine_startup
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 473 in _init_engines_direct
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 404 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 693 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 126 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 191 in from_engine_args
  File "/home/ubuntu/vllm/tests/v1/engine/test_async_llm.py", line 95 in test_load
  File "/usr/lib/python3.12/asyncio/events.py", line 88 in _run
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1987 in _run_once
  File "/usr/lib/python3.12/asyncio/base_events.py", line 641 in run_forever
  File "/usr/lib/python3.12/asyncio/base_events.py", line 674 in run_until_complete
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 773 in inner
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 508 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 337 in _main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/ubuntu/vllm/.venv/bin/pytest", line 10 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, regex._regex, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, 
scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, 
pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, PIL._imagingmath (total: 195)
Aborted (core dumped)

@lgeiger lgeiger force-pushed the non-blocking-casting branch from b1344b4 to 5d5c4b4 Compare May 31, 2025 01:42
@njhill
Member

njhill commented May 31, 2025

@lgeiger
Contributor Author

lgeiger commented Jun 1, 2025

> Looks like it's stuck in transformers here: huggingface/transformers@a31fa21#diff-d3478155ac25ae1107d16a4464001bfd54770bf12cd9bab881233a1b4d216e3fR240 🤔

Yes, I also checked with transformers v4.51.3, which predates this change, and it still gets stuck.

@lgeiger lgeiger force-pushed the non-blocking-casting branch from 5d5c4b4 to 04779ea Compare June 3, 2025 17:40
@lgeiger
Contributor Author

lgeiger commented Jun 3, 2025

It seems like not calling BatchFeature.to() resolves the deadlock for me locally (04779eaae27666598452e0e18cb0189c3353d93b). I don't really know why this is the case, though; maybe something odd is going on in BatchFeature.to(). Let's see what CI thinks.

@lgeiger lgeiger force-pushed the non-blocking-casting branch from 04779ea to bf31e39 Compare June 3, 2025 23:55
@lgeiger
Contributor Author

lgeiger commented Jun 3, 2025

Looks like I'm now running into #16054 on CI. Rebased onto main to re-trigger CI.

@lgeiger lgeiger requested a review from DarkLight1337 June 3, 2025 23:56
@DarkLight1337
Member

DarkLight1337 commented Jun 4, 2025

Nice, thanks for looking into the deadlock problem!

@vllm-bot vllm-bot merged commit 1409ef9 into vllm-project:main Jun 4, 2025
58 of 60 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-modality Core Jun 4, 2025
@lgeiger lgeiger deleted the non-blocking-casting branch June 4, 2025 06:15
@vadiklyutiy
Contributor

vadiklyutiy commented Jun 18, 2025

@lgeiger
I noticed that for Qwen2.5-VL, maybe_cast_dtype currently always receives <class 'transformers.feature_extraction_utils.BatchFeature'> objects, and the actual fp32->bf16 conversion happens in _process_image_input.
Is this expected?

@lgeiger
Contributor Author

lgeiger commented Jun 18, 2025

@vadiklyutiy I'm not sure I fully understand, could you elaborate? For context, this PR is a follow-up to #18756. In #18756, dtype conversion was already removed from DeepseekVL2 and Gemma3. Those conversions are now unnecessary, as the multimodal input will already have the correct dtype when passed to _process_image_input. I guess the same is true for Qwen2.5-VL, so any dtype conversion in its _process_image_input can probably be removed as well. Feel free to verify that this is the case and send a PR cleaning this up.

@vadiklyutiy
Contributor

For Qwen2.5-VL, the image arrives in _process_image_input as fp32 and is converted to bf16 right before calling self.vision

@vadiklyutiy
Contributor

I guess the image is supposed to be converted in maybe_cast_dtype. But currently it never receives tensors, so nothing is converted.
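One way the symptom described above can arise, sketched with stand-ins; this is not vLLM's `maybe_cast_dtype`, and it does not claim to be the actual cause. `FakeTensor` replaces `torch.Tensor`, and a plain `collections.UserDict` replaces `BatchFeature` (which subclasses `UserDict` in transformers). `cast_leaf_only` and `cast_recursive` are hypothetical names.

```python
from collections import UserDict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor; only tracks a dtype name."""

    dtype: str

    def is_floating_point(self) -> bool:
        return self.dtype.startswith(("float", "bfloat"))

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def cast_leaf_only(value, dtype: str):
    """Cast only if `value` itself is a floating-point tensor."""
    if isinstance(value, FakeTensor) and value.is_floating_point():
        return value.to(dtype)
    return value  # containers fall through here unchanged


def cast_recursive(value, dtype: str):
    """Descend into dict-like containers before casting the leaves."""
    if isinstance(value, (dict, UserDict)):
        return {k: cast_recursive(v, dtype) for k, v in value.items()}
    return cast_leaf_only(value, dtype)


batch = UserDict({"pixel_values": FakeTensor("float32")})
untouched = cast_leaf_only(batch, "bfloat16")
converted = cast_recursive(batch, "bfloat16")
print(untouched["pixel_values"].dtype, converted["pixel_values"].dtype)
```

If the cast hook is handed the whole container rather than its tensors, a leaf-only check silently returns it unchanged, and the conversion has to happen later, as reported here.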

