Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()`

### System Info

- `transformers` version: 4.49.0.dev0
- Platform: macOS-15.1.1-arm64-arm-64bit
- Python version: 3.11.10
- Huggingface_hub version: 0.27.1
- Safetensors version: 0.5.2
- Accelerate version: 1.2.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.5.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no

### Who can help?

@gante
@ArthurZucker

Related PR that discusses recent default `max_length`-related changes: #34814.

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

When calling `generate()` on a model whose `generation_config.max_length` is 20, the output length differs depending on whether `max_length` is passed explicitly or picked up implicitly from the `generation_config`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup from tests/generation/test_utils.py::GenerationIntegrationTests
article = "Today a dragon flew over Paris."
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
input_ids = tokenizer(article, return_tensors="pt").input_ids

# Case 1: Implicit max_length from generation_config
out_gen_implicit = model.generate(input_ids=input_ids)
print(out_gen_implicit.shape[-1])  # 36

# Case 2: Explicit max_length
out_gen_explicit = model.generate(
    input_ids=input_ids,
    max_length=model.generation_config.max_length
)
print(out_gen_explicit.shape[-1])  # 20
```

The implicit call returns 36 tokens, while the explicit call returns 20, even though both calls nominally use the same `max_length`.


#### Reason and scope

In the first case, `max_length` is overwritten by the following check in `_prepare_generated_length` (`src/transformers/generation/utils.py`):

```python
if generation_config.max_length == GenerationConfig().max_length:
    generation_config.max_length = generation_config.max_length + input_ids_length
```

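Because `GenerationConfig().max_length` defaults to 20, this check cannot distinguish a model whose `generation_config.max_length` was deliberately set to 20 from one that left it at the default. The bug therefore only affects models with `generation_config.max_length` set to 20.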
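
As a rough illustration of why only the value 20 is affected, the check can be modeled in plain Python. This is a simplified sketch, not the library's code: `resolve_max_length` and `DEFAULT_MAX_LENGTH` are hypothetical names, and the prompt length of 16 is an assumption inferred from the 36 vs. 20 outputs in the reproduction.

```python
from typing import Optional

# Assumption: mirrors GenerationConfig().max_length, which defaults to 20.
DEFAULT_MAX_LENGTH = 20


def resolve_max_length(
    config_max_length: int,
    input_ids_length: int,
    explicit_max_length: Optional[int] = None,
) -> int:
    """Hypothetical, simplified model of the length resolution described above."""
    if explicit_max_length is not None:
        # An explicitly passed max_length is used as-is.
        return explicit_max_length
    if config_max_length == DEFAULT_MAX_LENGTH:
        # The config value equals the default, so it is treated as "unset"
        # and extended by the prompt length.
        return config_max_length + input_ids_length
    return config_max_length


prompt_len = 16  # assumed prompt length, consistent with 36 - 20

print(resolve_max_length(20, prompt_len))                          # implicit: 36
print(resolve_max_length(20, prompt_len, explicit_max_length=20))  # explicit: 20
print(resolve_max_length(21, prompt_len))                          # any other value: 21
```

In this model, a config value of 21 is respected as-is, while a config value of 20 is silently extended, which matches the behavior reported above.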

### Expected behavior

The calls `model.generate(input_ids=input_ids)` and `model.generate(input_ids=input_ids, max_length=model.generation_config.max_length)` should generate texts of the same length when `generation_config.max_length` is set to 20.

