-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
[Core][MultiModalHasher] Hash images without converting image mode #24969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Lukas Geiger <[email protected]>
9c2612a to
089f1e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the performance of hashing PIL.Image objects by avoiding conversion to RGBA. Instead, it hashes the image's mode, data, and palette separately. This is a good optimization that, according to your tests, speeds up hashing by ~35% for 4k images. I've found one potential issue with the implementation regarding hash uniqueness for palettized images and provided a suggestion to address it.
Signed-off-by: Lukas Geiger <[email protected]>
5ea7b6e to
b8fe820
Compare
…llm-project#24969) Signed-off-by: Lukas Geiger <[email protected]>
…llm-project#24969) Signed-off-by: Lukas Geiger <[email protected]> Signed-off-by: charlifu <[email protected]>
…llm-project#24969) Signed-off-by: Lukas Geiger <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
…llm-project#24969) Signed-off-by: Lukas Geiger <[email protected]>
…llm-project#24969) Signed-off-by: Lukas Geiger <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Purpose
This PR hashes mode, palette and data of images separately which prevents the need for converting all images to RGBA. See #24925 (comment)
Test Plan
Correctness should be covered by the existing hasher tests on CI.
The performance can be measured using:
Test Result
For a 4k PIL image this speeds up hashing by ~35%. This is not massive, but might add up in cases with lots of multimodal input.