LLaVA OV: fix unpadding precision #34779

zucchini-nlp · 2024-11-18T11:20:33Z

What does this PR do?

Fixes #34625. There was a small precision error in unpadding because the modeling code casts the size to a list, while the processing code works with tensors. This PR casts everything to a list so that both calculations match.
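To illustrate the idea, here is a minimal sketch, not the code from this PR. The helper names `to_int_pair` and `unpadded_grid` are hypothetical, and the arithmetic is a simplified, assumed version of the unpadding step. It shows the size being normalized to plain Python ints before any computation, so that the processor and the model count the same number of image tokens.

```python
def to_int_pair(size):
    """Normalize an image size to a plain (height, width) tuple of Python ints.

    Accepts a list/tuple, or any array-like object exposing .tolist()
    (for example a tensor). Hypothetical helper, for illustration only.
    """
    if not isinstance(size, (list, tuple)):
        if not hasattr(size, "tolist"):
            raise TypeError(f"unsupported size type: {type(size).__name__}")
        size = size.tolist()
    height, width = size
    return int(height), int(width)


def unpadded_grid(orig_size, grid_height, grid_width):
    """Shrink a padded patch grid back to the aspect ratio of the original image.

    Assumed, simplified arithmetic: every intermediate value is a Python int,
    so the result does not depend on tensor dtype or floating-point rounding.
    """
    height, width = to_int_pair(orig_size)
    if width / height > grid_width / grid_height:
        # Image is wider than the grid: padding was added at top and bottom.
        new_height = (height * grid_width) // width
        padding = (grid_height - new_height) // 2
        grid_height -= 2 * padding
    else:
        # Image is taller than (or matches) the grid: padding is left and right.
        new_width = (width * grid_height) // height
        padding = (grid_width - new_width) // 2
        grid_width -= 2 * padding
    return grid_height, grid_width


if __name__ == "__main__":
    print(unpadded_grid([480, 640], 54, 54))   # landscape image
    print(unpadded_grid((1000, 500), 54, 54))  # portrait image
```

Calling the same function from both the processing and the modeling side, with the size already converted to ints, is one way to keep the two token counts in agreement.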

HuggingFaceDocBuilderDev · 2024-11-18T11:47:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qubvel

Thanks for the fix! Just a question re types

src/transformers/models/llava_onevision/processing_llava_onevision.py

zucchini-nlp · 2024-11-18T14:06:40Z

Sorry @qubvel, wrong tag for review.

ArthurZucker

Thanks, does it fix #34625 entirely? (let's close it if so!)

Scyther-07 · 2024-12-02T10:09:40Z

Hey @zucchini-nlp,
I am running the following code but it is throwing me a TypeError:

from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig, LlavaNextProcessor
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)

Error:
TypeError: LlavaNextProcessor.__init__() got an unexpected keyword argument 'image_token'

Now, I looked at the commits of this PR and it looks good to me. The issue might be with your last PR #33424. Kindly look into this.

zucchini-nlp · 2024-12-02T10:23:03Z

@Scyther-07 hey, which transformers version are you using?

Scyther-07 · 2024-12-02T10:27:05Z

> @Scyther-07 hey, which transformers version are you using?

It's 4.39.3.
I just noticed that the above code works fine on Google Colab but throws an error in the Kaggle Notebook. I don't know what to make of it. I think I should shift to Colab.

zucchini-nlp · 2024-12-02T10:32:30Z

@Scyther-07 hmm, 4.39.3 should indeed throw that error; you need at least v4.43 to avoid it. In fact, we are currently changing the way inputs for VLMs are processed, so I'd recommend using the latest transformers once it is released. That will be v4.47, in about 1-2 weeks; it is not out yet :)
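A quick way to check this in a notebook is sketched below. The helpers `version_tuple` and `meets_minimum` are hypothetical names introduced here, and the 4.43 minimum is simply the figure quoted in the comment above. Only the standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def version_tuple(text):
    """Turn '4.39.3' into (4, 39, 3), keeping only the leading digits of
    the first three dot-separated parts (so '4.47.0.dev0' -> (4, 47, 0))."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.43"):
    """Return True if the installed version is at least the given minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "ok" if meets_minimum(installed) else "too old, please upgrade"
        print(f"transformers {installed}: {status}")
```

Running this in both environments (for example Colab and Kaggle) would show whether they ship different transformers versions.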

Scyther-07 · 2024-12-02T11:51:12Z

Yeah, it worked. Too foolish of me. Thanks for the help!

* fix * propagate * type check

ambar3497 · 2025-02-18T18:57:16Z

I am getting the mismatch error when I pass two images. My Transformers version is 4.48. Can someone help me understand how to fix this? The mismatch is by a factor of 2, something like this:

ValueError Traceback (most recent call last)
Cell In[3], line 10
7 input_type, processed_paths = process_document(file_path, output_dir)
9 if input_type == "image":
---> 10 response = process_llm(input_type, processed_paths, model, processor)
11 print(f"{response}")

Cell In[1], line 461, in process_llm(input_type, input_paths, model, processor)
458 raise ValueError("Invalid input type. Must be 'text' or 'image'.")
460 # Generate response
--> 461 generated_ids = model.generate(**inputs, max_new_tokens=2000, pad_token_id=processor.tokenizer.eos_token_id)
462 generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
463 response_json = extract_json_from_string(generated_texts)

ValueError: Image features and image tokens do not match: tokens: 841, features 1682

zucchini-nlp · 2025-02-18T19:06:09Z

@ambar3497 can you give a clear small reproducer, without external libraries and functions being called?

zucchini-nlp added 2 commits November 18, 2024 12:16

fix

3f4dbcd

propagate

f5313ef

zucchini-nlp requested a review from qubvel November 18, 2024 11:20

qubvel reviewed Nov 18, 2024

src/transformers/models/llava_onevision/processing_llava_onevision.py

type check

496e56c

qubvel approved these changes Nov 18, 2024

zucchini-nlp requested review from ArthurZucker and qubvel November 18, 2024 14:06

qubvel removed their request for review November 18, 2024 14:40

ArthurZucker approved these changes Nov 19, 2024

zucchini-nlp merged commit 145fbd4 into huggingface:main Nov 20, 2024
10 checks passed

chenweize1998 mentioned this pull request Nov 27, 2024

Bug: orig_height and orig_width variable undeifined in llava processing #34952

Closed

4 tasks

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024

LLaVA OV: fix unpadding precision (huggingface#34779)

87015b1

* fix * propagate * type check

sheryc mentioned this pull request Jan 20, 2025

Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch #35779

Merged

5 tasks
