[Whisper] Check length of prompt + max new tokens #26164

sanchit-gandhi · 2023-09-14T14:02:14Z

What does this PR do?

Fixes #25422: adds a check on the combined length of the prompt and max_new_tokens. If this total exceeds the model's maximum decoder length (max_target_positions), we raise an error.
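A minimal sketch of the kind of check described above, assuming it compares the number of prompt token ids passed in plus max_new_tokens against max_target_positions and raises a ValueError when the sum is larger. `check_prompt_length` is a hypothetical helper, not the merged transformers code; whether the real check counts the sliced prompt or any special tokens is not shown here.

```python
def check_prompt_length(prompt_ids, max_new_tokens, max_target_positions=448):
    """Hypothetical helper: raise if prompt + new tokens cannot fit in the decoder."""
    # Nothing to check if the caller did not request a specific number of new tokens.
    if max_new_tokens is None:
        return
    total = len(prompt_ids) + max_new_tokens
    # Mirrors the PR description: error only when the total *exceeds* the limit.
    if total > max_target_positions:
        raise ValueError(
            f"The prompt length ({len(prompt_ids)}) plus max_new_tokens ({max_new_tokens}) "
            f"is {total}, which exceeds max_target_positions ({max_target_positions})."
        )


check_prompt_length(list(range(200)), max_new_tokens=248)  # total 448: fits exactly
try:
    check_prompt_length(list(range(200)), max_new_tokens=249)  # total 449: one too many
except ValueError as err:
    print(err)
```

Raising eagerly, before generation starts, surfaces the problem as a clear message instead of a failure partway through decoding once the position embeddings run out.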

cc @connor-henderson @Helene-Maxcici

sanchit-gandhi · 2023-09-14T14:05:37Z

src/transformers/models/whisper/modeling_whisper.py

            decoder_start_token_id, *text_prompt_ids = prompt_ids
            # Slicing the text prompt ids in a manner consistent with the OpenAI implementation
            # to accommodate context space for the prefix (see https://github.com/openai/whisper/blob/c09a7ae299c4c34c5839a76380ae407e7d785914/whisper/decoding.py#L599)
-            text_prompt_ids = text_prompt_ids[-self.config.max_length // 2 - 1 :]


The equivalent variable for the slicing in the OAI implementation is n_ctx: https://github.com/openai/whisper/blob/c09a7ae299c4c34c5839a76380ae407e7d785914/whisper/decoding.py#L599

This corresponds to our max_target_positions:

transformers/src/transformers/models/whisper/configuration_whisper.py

Line 123 in 7c63e6f

max_target_positions (`int`, *optional*, defaults to 448):

=> I've corrected this in our implementation. Note that for the pre-trained checkpoints, we set max_length = max_target_positions, so this won't change the behaviour here.

However, it will fix the behaviour for newly initialised checkpoints, where max_length does not match max_target_positions (the former defaults to 20, the latter to 448).
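To make the effect concrete, here is a minimal sketch of the slicing expression from the diff, evaluated with the two defaults quoted above. `slice_text_prompt` is a hypothetical helper rather than a transformers function, and the 300-token prompt is dummy data.

```python
def slice_text_prompt(text_prompt_ids, n_ctx):
    # Same expression as in the diff. Python binds unary minus tighter than //,
    # so this is ((-n_ctx) // 2) - 1, i.e. keep the last n_ctx // 2 + 1 ids for even n_ctx.
    return text_prompt_ids[-n_ctx // 2 - 1 :]


prompt = list(range(300))  # 300 dummy prompt token ids

# Sliced with a freshly initialised config's max_length (defaults to 20):
print(len(slice_text_prompt(prompt, 20)))   # 11
# Sliced with max_target_positions (defaults to 448):
print(len(slice_text_prompt(prompt, 448)))  # 225
```

With max_length left at 20, a long prompt would be silently cut down to its last 11 tokens; keying the slice on max_target_positions keeps up to 225, and prompts shorter than the slice are returned unchanged.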

HuggingFaceDocBuilderDev · 2023-09-14T14:22:19Z

The documentation is not available anymore as the PR was closed or merged.

ArthurZucker

Nice! It's not backward compatible, but it's a fix as you mentioned, so let's go with this! 💉

[Whisper] Check length of prompt + max new tokens

6a4d936

sanchit-gandhi commented Sep 14, 2023

sanchit-gandhi mentioned this pull request Sep 14, 2023

Whisper Prompting max_new_tokens #25422

Closed

4 tasks

sanchit-gandhi requested a review from gante September 14, 2023 14:06

sanchit-gandhi assigned ArthurZucker and unassigned ArthurZucker Sep 14, 2023

sanchit-gandhi requested a review from ArthurZucker September 14, 2023 14:06

ArthurZucker approved these changes Sep 14, 2023

sanchit-gandhi merged commit c7b4d0b into huggingface:main Sep 15, 2023

sanchit-gandhi deleted the whisper-max-len branch September 15, 2023 14:46

parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023

[Whisper] Check length of prompt + max new tokens (huggingface#26164)

2ac3926

trungkienbkhn mentioned this pull request Jan 5, 2024

[feat] update distil-whisper SYSTRAN/faster-whisper#557

Merged

3 participants