Description
System Info
- transformers version: 4.49.0.dev0
- Platform: macOS-15.1.1-arm64-arm-64bit
- Python version: 3.11.10
- Huggingface_hub version: 0.27.1
- Safetensors version: 0.5.2
- Accelerate version: 1.2.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
Who can help?
Related PR that discusses recent default max_length-related changes: #34814.
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
When using generate() with a model that has generation_config.max_length=20, the output length differs depending on whether max_length is passed explicitly or used implicitly from the generation_config.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Setup from tests/generation/test_utils.py::GenerationIntegrationTests
article = "Today a dragon flew over Paris."
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
input_ids = tokenizer(article, return_tensors="pt").input_ids
# Case 1: Implicit max_length from generation_config
out_gen_implicit = model.generate(input_ids=input_ids)
print(out_gen_implicit.shape[-1]) # 36
# Case 2: Explicit max_length
out_gen_explicit = model.generate(
    input_ids=input_ids,
    max_length=model.generation_config.max_length,
)
print(out_gen_explicit.shape[-1]) # 20

In the first case, the generated output is longer than in the second case (36 vs. 20 tokens).
Reason and scope
In the first case, max_length is overwritten as follows in file src/transformers/generation/utils.py, function _prepare_generated_length:
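if generation_config.max_length == GenerationConfig().max_length:
    generation_config.max_length = generation_config.max_length + input_ids_length

The comparison is by value, so it cannot tell a max_length of 20 that the model's generation_config sets on purpose from the library default. Since GenerationConfig().max_length defaults to 20, the bug only affects models whose generation_config.max_length is set to 20.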
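To make the value comparison concrete, here is a minimal, library-free sketch of the logic described above. It is a simplified model, not the actual transformers code: ToyGenerationConfig and toy_prepare_generated_length are hypothetical names, and the prompt length of 16 is only inferred from the reported 36 vs. 20 tokens.

```python
from dataclasses import dataclass

DEFAULT_MAX_LENGTH = 20  # the library default stated in this issue


@dataclass
class ToyGenerationConfig:
    """Hypothetical stand-in for GenerationConfig; only max_length is modeled."""

    max_length: int = DEFAULT_MAX_LENGTH


def toy_prepare_generated_length(config_max_length, input_ids_length, explicit_max_length=None):
    """Simplified model of the quoted check (assumption: not the real implementation)."""
    if explicit_max_length is not None:
        # An explicitly passed max_length is used unchanged.
        return explicit_max_length
    if config_max_length == ToyGenerationConfig().max_length:
        # A config value equal to the default is treated as "unset"
        # and extended by the prompt length.
        return config_max_length + input_ids_length
    return config_max_length


# Prompt length of 16 is inferred from the reported numbers (36 - 20).
print(toy_prepare_generated_length(20, 16))                          # 36
print(toy_prepare_generated_length(20, 16, explicit_max_length=20))  # 20
print(toy_prepare_generated_length(30, 16))                          # 30
```

In this sketch, a config value of 30 gives the same result on both paths, while a config value of 20 does not, which matches the scope described above.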
Expected behavior
The calls model.generate(input_ids=input_ids) and model.generate(input_ids=input_ids, max_length=model.generation_config.max_length) should generate texts of the same length when generation_config.max_length is set to 20.
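The expectation can also be written as a small check. This is only a sketch: lengths_match is a hypothetical helper, and it assumes the callable accepts the same input_ids and max_length keywords as model.generate and returns an object with a .shape attribute.

```python
def lengths_match(generate, input_ids, max_length):
    """Return True if implicit and explicit max_length give equal output lengths.

    `generate` is any callable with the keyword interface of model.generate
    (assumption: its return value exposes `.shape`, like a tensor).
    """
    implicit = generate(input_ids=input_ids)
    explicit = generate(input_ids=input_ids, max_length=max_length)
    return implicit.shape[-1] == explicit.shape[-1]
```

With the reproduction script above, this would be called as lengths_match(model.generate, input_ids, model.generation_config.max_length). The expected result is True. On the version reported here it would be False, given the 36 vs. 20 output lengths.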