Description
Feature request
According to this notebook from the original bark repo ("Advanced Long-Form Generation"):

> Sometimes Bark will hallucinate a little extra audio at the end of the prompt. We can solve this issue by lowering the threshold for bark to stop generating text. We use the `min_eos_p` kwarg in `generate_text_semantic`.
This relies on an early-stopping strategy that has yet to be implemented in the `transformers` implementation of Bark. `min_eos_p` is used during generation with the first sub-model, i.e. `BarkSemanticModel.generate`. It would be great to add this feature to improve Bark generation.
Motivation
Sometimes the generated speech has weird artefacts at the end, due to this missing feature.
Your contribution
Some pointers:
- Where `min_eos_p` is used in the original code: here.
- In the HF implementation, the semantic model is called during generation here, which then calls `BarkSemanticModel.generate` here.
- Ideally, we'd add a custom stopping criteria, but as indicated in custom stopping_critriea function doesn't receive logits scores (receives None instead) #23674, it's not yet possible to use `scores` without setting `return_dict_in_generate=True, output_scores=True`.
- Instead, let's add here a logits processor which sets the probability of every token other than the EOS token to `-inf` whenever the probability of the EOS token exceeds `min_eos_p`.
`min_eos_p` should default to `None` (to keep backward compatibility) in `BarkSemanticGenerationConfig` here, and could then be passed through `BarkModel.generate` kwargs without causing issues!
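To make the proposed logits processor concrete, here is a minimal sketch of the idea. The class name and exact masking behavior are assumptions, not the final API; in `transformers` it would subclass `LogitsProcessor` and be wired into `BarkSemanticModel.generate` when `min_eos_p` is set:

```python
import torch


class EosPrioritizerLogitsProcessor:
    """Hypothetical sketch: once the EOS token's probability reaches
    min_eos_p, mask every other token to -inf so generation stops.
    In transformers this would subclass transformers.LogitsProcessor."""

    def __init__(self, eos_token_id: int, min_eos_p: float):
        self.eos_token_id = eos_token_id
        self.min_eos_p = min_eos_p

    def __call__(self, input_ids, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Probability of the EOS token for each batch item
        probs = torch.softmax(scores, dim=-1)
        force_eos = probs[:, self.eos_token_id] >= self.min_eos_p
        if force_eos.any():
            # Keep the EOS logit, set all other logits to -inf for the
            # batch items whose EOS probability crossed the threshold
            masked = torch.full_like(scores, float("-inf"))
            masked[:, self.eos_token_id] = scores[:, self.eos_token_id]
            scores = torch.where(force_eos.unsqueeze(-1), masked, scores)
        return scores
```

With `min_eos_p=None` the processor would simply not be instantiated, preserving the current behavior.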