Skip to content

Conversation

@zucchini-nlp
Copy link
Member

@zucchini-nlp zucchini-nlp commented Nov 18, 2024

What does this PR do?

Fixes #34625. There was a small precision error in unpadding because the modeling code casts the size to list, while processing code works with tensors. This PR casts everything to list to match the calculations

@zucchini-nlp zucchini-nlp requested a review from qubvel November 18, 2024 11:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Just a question re types

@zucchini-nlp
Copy link
Member Author

sorry @qubvel , wrong tag for review

@qubvel qubvel removed their request for review November 18, 2024 14:40
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, does it fix #34625 entirely? (let's close it if so!)

@zucchini-nlp zucchini-nlp merged commit 145fbd4 into huggingface:main Nov 20, 2024
10 checks passed
@Scyther-07
Copy link

Hey @zucchini-nlp,
I am running the following code but it is throwing me a TypeError:

from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig, LlavaNextProcessor
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)

Error:
TypeError: LlavaNextProcessor.__init__() got an unexpected keyword argument 'image_token'

Now, I looked at the commits of this PR and it looks good to me. The issue might be with your last PR #33424. Kindly look into this.

@zucchini-nlp
Copy link
Member Author

@Scyther-07 hey, which transformers version you are using?

@Scyther-07
Copy link

@Scyther-07 hey, which transformers version you are using?

It's 4.39.3.
I just noticed that the above code works fine on Google Colab but throws an error in the Kaggle Notebook. I don't know what to make of it. I think I should shift to Colab.

@zucchini-nlp
Copy link
Member Author

@Scyther-07 hmm, the 4.39.3 should throw error indeed and you need at least v4.43 to bypass the error. In fact we are currently changing the way inputs for VLMs are processed, thus I'd recommend to use the latest transformers after release. It will be v4.47 around next 1-2 weeks, not released yet now :)

@Scyther-07
Copy link

Yeah, it worked. Too foolish of me. Thanks for the help!

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
@ambar3497
Copy link

I am getting the mismatch error when I'm passing two images my Transformers version is 4.48 can someone help me understand how can I work on this issue to fix it. my mismatch is by a factor of 2 so something like this -


ValueError Traceback (most recent call last)
Cell In[3], line 10
7 input_type, processed_paths = process_document(file_path, output_dir)
9 if input_type == "image":
---> 10 response = process_llm(input_type, processed_paths, model, processor)
11 print(f"{response}")

Cell In[1], line 461, in process_llm(input_type, input_paths, model, processor)
458 raise ValueError("Invalid input type. Must be 'text' or 'image'.")
460 # Generate response
--> 461 generated_ids = model.generate(**inputs, max_new_tokens=2000, pad_token_id=processor.tokenizer.eos_token_id)
462 generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
463 response_json = extract_json_from_string(generated_texts)

ValueError: Image features and image tokens do not match: tokens: 841, features 1682

@zucchini-nlp
Copy link
Member Author

@ambar3497 can you give a clear small reproducer, without external libraries and functions being called?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLaVA-OneVision mismatch between image features and image tokens

6 participants