Bug: `orig_height` and `orig_width` variables undefined in llava processing #34952

### System Info

- `transformers` version: 4.47.0.dev0
- Platform: Linux-5.10.112-005.ali5000.alios7.x86_64-x86_64-with-glibc2.32
- Python version: 3.10.13
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: 1.1.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: True
- Using GPU in script?: True
- GPU type: NVIDIA H800

### Who can help?

@zucchini-nlp 

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction

Here's a minimal script to reproduce the bug. When `return_tensors` is not specified, the processor call raises `UnboundLocalError: local variable 'orig_height' referenced before assignment`.

```python
from transformers import AutoProcessor
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("models/llava-hf/llama3-llava-next-8b-hf/")
dataset = load_dataset("HuggingFaceM4/LLaVA_Wild_Modif")

text = processor.apply_chat_template(
    [
        {
            "role": "user",
            "content": [{"type": "text", "text": dataset["test"][0]["question"]}, {"type": "image", "text": None}],
        }
    ],
    add_generation_prompt=True,
)

# Raises "UnboundLocalError: local variable 'orig_height' referenced before assignment"
inputs = processor(images=[dataset["test"][0]["image"]], text=text)

# No error when specifying return_tensors:
# inputs = processor(images=[dataset["test"][0]["image"]], text=text, return_tensors='pt')
```

### Expected behavior

The code should not raise an error when `return_tensors` is not specified. The bug was possibly introduced by https://github.com/huggingface/transformers/pull/34779
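
For illustration only, here is a minimal sketch of the kind of pattern that can produce this error. I have not confirmed that this is what the processor does: the function names `unpack_size_buggy` and `unpack_size_fixed` are hypothetical, and `array.array` merely stands in for a tensor-like object with a `.tolist()` method. The assumption is that the image size arrives as a plain list when `return_tensors` is omitted and as a tensor otherwise, and that the variables are only assigned in the tensor branch.

```python
from array import array


def unpack_size_buggy(image_size):
    # Hypothetical pattern: the names are bound only for tensor-like input.
    if not isinstance(image_size, (list, tuple)):
        orig_height, orig_width = image_size.tolist()
    # For a plain list or tuple the branch above is skipped, so the names
    # are never bound and this line raises UnboundLocalError.
    return orig_height, orig_width


def unpack_size_fixed(image_size):
    # Normalize tensor-like input to a list first, then unpack in all cases.
    if not isinstance(image_size, (list, tuple)):
        image_size = image_size.tolist()
    orig_height, orig_width = image_size
    return orig_height, orig_width


if __name__ == "__main__":
    tensor_like = array("i", [480, 640])  # stand-in for a tensor
    print(unpack_size_buggy(tensor_like))  # works: the branch is taken
    print(unpack_size_fixed([480, 640]))   # works for a plain list too
    try:
        unpack_size_buggy([480, 640])
    except UnboundLocalError as exc:
        print("buggy version failed:", exc)
```

If the real cause is similar, a fix along the lines of `unpack_size_fixed` would make the call behave the same with and without `return_tensors`.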
