Conversation

@lgeiger
Contributor

@lgeiger lgeiger commented Sep 16, 2025

Purpose

This PR hashes the mode, palette, and raw data of images separately, which avoids the need to convert every image to RGBA before hashing. See #24925 (comment)
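A minimal sketch of the idea, separate from vLLM's actual implementation: hash an image's mode, palette, and pixel bytes as distinct keyed fields rather than normalizing to RGBA first. The function name `hash_image` and the field keys are my own for illustration.

```python
# Hypothetical sketch (not vLLM's implementation): hash mode, palette,
# and raw bytes of a PIL image as separately keyed fields.
import hashlib

import numpy as np
from PIL import Image


def hash_image(image: Image.Image) -> str:
    h = hashlib.blake2b()
    # Feed each component under its own key so field boundaries
    # stay unambiguous in the hashed byte stream.
    for key, value in (
        (b"mode", image.mode.encode()),
        (b"palette", image.palette.tobytes() if image.palette else b""),
        (b"data", image.tobytes()),
    ):
        h.update(key)
        h.update(value)
    return h.hexdigest()


img = Image.fromarray(np.zeros((4, 4, 3), dtype=np.uint8))
print(hash_image(img))
```

This skips the expensive `convert("RGBA")` call while still distinguishing images that differ only in mode or palette.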

Test Plan

Correctness should be covered by the existing hasher tests on CI.

The performance can be measured using:

# Benchmark (run in IPython; %timeit is an IPython magic):
import numpy as np
from PIL import Image
from vllm.multimodal.hasher import MultiModalHasher

# Build a deterministic 4K RGB test image.
np.random.seed(42)
data = np.random.randint(0, 255, size=(3840, 2160, 3), dtype=np.uint8)
data = Image.fromarray(data)

%timeit MultiModalHasher.hash_kwargs(data=data)

Test Result

For a 4K PIL image this speeds up hashing by ~35%. The gain is modest on its own, but may add up for requests with many multimodal inputs.

# main
25.1 ms ± 124 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# This PR
16.3 ms ± 74.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
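The ~35% figure follows directly from the two mean timings above:

```python
# Relative speedup from the reported per-loop means.
main_ms = 25.1  # main branch
pr_ms = 16.3    # this PR

speedup = (main_ms - pr_ms) / main_ms
print(f"{speedup:.0%}")  # → 35%
```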

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 16, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves the performance of hashing PIL.Image objects by avoiding conversion to RGBA. Instead, it hashes the image's mode, data, and palette separately. This is a good optimization that, according to your tests, speeds up hashing by ~35% for 4k images. I've found one potential issue with the implementation regarding hash uniqueness for palettized images and provided a suggestion to address it.
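A hypothetical, simplified illustration of the kind of uniqueness issue the review flags: if the separately hashed fields were naively concatenated, two different (palette, data) splits could produce the same byte stream. The names `naive_hash` and `keyed_hash` are my own, not vLLM's API; length-prefixing is one standard way to keep field boundaries unambiguous.

```python
# Simplified illustration (not vLLM code) of why field boundaries matter
# when hashing an image's components separately.
import hashlib


def naive_hash(mode: bytes, palette: bytes, data: bytes) -> str:
    # Plain concatenation loses the boundary between palette and data.
    return hashlib.blake2b(mode + palette + data).hexdigest()


def keyed_hash(mode: bytes, palette: bytes, data: bytes) -> str:
    h = hashlib.blake2b()
    for field in (mode, palette, data):
        # Length-prefix each field so splits cannot be confused.
        h.update(len(field).to_bytes(8, "little"))
        h.update(field)
    return h.hexdigest()


# Different (palette, data) splits with identical concatenations:
a = naive_hash(b"P", b"\x01\x02", b"\x03")
b = naive_hash(b"P", b"\x01", b"\x02\x03")
assert a == b  # the naive scheme collides
assert keyed_hash(b"P", b"\x01\x02", b"\x03") != keyed_hash(b"P", b"\x01", b"\x02\x03")
```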

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 16, 2025
Signed-off-by: Lukas Geiger <[email protected]>
@vllm-bot vllm-bot merged commit 03191cd into vllm-project:main Sep 17, 2025
37 of 40 checks passed
@lgeiger lgeiger deleted the mm-hash-image branch September 17, 2025 08:31
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025