Fix failing `MusicgenTest .test_pipeline_text_to_audio` #26586

ydshieh · 2023-10-04T09:32:29Z

What does this PR do?

Fix failing MusicgenTest .test_pipeline_text_to_audio

HuggingFaceDocBuilderDev · 2023-10-04T09:47:51Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2023-10-04T10:12:53Z

tests/test_pipeline_mixin.py

+        elif config_vocab_size is None and hasattr(model.config, "text_encoder"):
+            config_vocab_size = getattr(model.config.text_encoder, "vocab_size", None)


Try to get the vocab size for musicgen (tiny) model

Note that for MusicGen, the input/output vocab sizes are different:

Input vocab size is the vocab size of the text encoder model (text_encoder.vocab_size)

Output vocab size is the number of vq-codes the audio decoder can generate (decoder.vocab_size). These vq-codes then go through the codec model to recover the final waveform

Thus, the actual output is continuous (a 1-d audio waveform)

Either way, I don't think the config vocab size is required for the TTA test:

transformers/tests/pipelines/test_pipelines_text_to_audio.py

Line 180 in fc296f4

def run_pipeline_test(self, speech_generator, _):

ydshieh · 2023-10-04T10:13:35Z

tests/models/musicgen/test_modeling_musicgen.py

+    # Get `IndexError: index out of range in self` in `torch.embedding`.
+    # TODO: check tiny model creation for this model.
+    def is_pipeline_test_to_skip(
+        self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
+    ):
+        if pipeline_test_casse_name == "TextToAudioPipelineTests":
+            return True
+
+        return False


Still failing due to some index issue in embedding. We probably need to check how the tiny model for this architecture is created.

This is because some of the input text contains token id's that are outside of the model's vocab size:

transformers/tests/pipelines/test_pipelines_text_to_audio.py

Line 181 in fc296f4

outputs = speech_generator("This is a test")

=> the word test is probably out-of-vocabulary, so is giving this index error in the embeddings

We can either update the tokenizer to include all the ids in the model test and expand the model's vocab size accordingly, or update the tests to only use tokens that are in the tokenizer/model vocab

Yes, the model config might have small vocab_size than the tokenizer one. This is more from the tiny model creation script that can't handle something special in musicgen's config.

Manually modifying the tiny model's config is not recommended, as it fixes the test quickly but the underlying issue (tiny model creation) is still there.

I will check what I can do with the script, but in the meantime let's not fail main anymore.

The config file shows

https://huggingface.co/hf-internal-testing/tiny-random-MusicgenForConditionalGeneration/blob/main/config.json

both have vocab_size 99: which indeed means the config file is not updated with the (possibly converted) tokenizer.

Cool! Agree that we should't change the model's config on-the-fly, but instead we could change text inputs in the test to be valid tokens in the model's vocabulary?

This means we have two options: we could either update the model's vocab size in the tiny model on the Hub, or update the text inputs in the test to be part of the current model's vocabulary. Either one is good for me in a follow-up PR!

ydshieh · 2023-10-06T08:24:35Z

Hi @sanchit-gandhi

I have updated the tiny models on the Hub (config/model file) manually. I have to update the commit sha info. so the pipeline tests will take this new versions.

Failing test passes now.

I will rework the tiny model creation script in a separate PR.

sanchit-gandhi

Thanks for fixing this @ydshieh! 🎵

ydshieh · 2023-10-06T12:44:50Z

@LysandreJik Ready for a review and we might have a green CI during the weekend :-) 🙏

LysandreJik

Thanks both! Onto green CI 💚

ydshieh commented Oct 4, 2023

View reviewed changes

ydshieh requested a review from LysandreJik October 4, 2023 10:14

ydshieh added 4 commits October 6, 2023 10:31

fix

4a15d69

fix

85366c0

Fix

17b97b6

Fix

9350590

ydshieh force-pushed the skip_musicgen_pip_test branch from e4d314e to 9350590 Compare October 6, 2023 08:31

sanchit-gandhi approved these changes Oct 6, 2023

View reviewed changes

LysandreJik approved these changes Oct 6, 2023

View reviewed changes

LysandreJik merged commit e840aa6 into main Oct 6, 2023

LysandreJik deleted the skip_musicgen_pip_test branch October 6, 2023 13:54

		elif config_vocab_size is None and hasattr(model.config, "text_encoder"):
		config_vocab_size = getattr(model.config.text_encoder, "vocab_size", None)

Fix failing MusicgenTest .test_pipeline_text_to_audio #26586

Fix failing MusicgenTest .test_pipeline_text_to_audio #26586

Uh oh!

Conversation

ydshieh commented Oct 4, 2023

What does this PR do?

Uh oh!

HuggingFaceDocBuilderDev commented Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ydshieh Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

sanchit-gandhi Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ydshieh Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

sanchit-gandhi Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ydshieh Oct 5, 2023

Choose a reason for hiding this comment

Uh oh!

ydshieh Oct 5, 2023

Choose a reason for hiding this comment

Uh oh!

sanchit-gandhi Oct 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ydshieh commented Oct 6, 2023

Uh oh!

sanchit-gandhi left a comment

Choose a reason for hiding this comment

Uh oh!

ydshieh commented Oct 6, 2023

Uh oh!

LysandreJik left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fix failing `MusicgenTest .test_pipeline_text_to_audio` #26586

Fix failing `MusicgenTest .test_pipeline_text_to_audio` #26586

HuggingFaceDocBuilderDev commented Oct 4, 2023 •

edited

Loading

sanchit-gandhi Oct 4, 2023 •

edited

Loading

sanchit-gandhi Oct 4, 2023 •

edited

Loading

sanchit-gandhi Oct 5, 2023 •

edited

Loading