Improve Bark Generation #26672

@ylacombe

Feature request

According to this notebook from the original bark repo:

Advanced Long-Form Generation
Sometimes Bark will hallucinate a little extra audio at the end of the prompt. We can solve this issue by lowering the threshold for bark to stop generating text. We use the min_eos_p kwarg in generate_text_semantic

This relies on an early-stopping strategy that is not yet implemented in the transformers implementation of Bark. min_eos_p is used during generation with the first sub-model, i.e. BarkSemanticModel.generate. It would be great to add this feature to improve Bark generation.
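To make the request concrete, here is a minimal sketch of the stopping rule as I understand it: at each semantic generation step, stop as soon as the probability assigned to the EOS token reaches min_eos_p. The function names (softmax, should_stop_early) are hypothetical and are not part of transformers or the original Bark code. The sketch also assumes the EOS probability is read from a plain softmax over the current step's logits.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_stop_early(
    logits: Sequence[float],
    eos_token_id: int,
    min_eos_p: Optional[float] = None,
) -> bool:
    """Hypothetical helper: True when P(EOS) has reached the threshold.

    With min_eos_p=None the check is disabled, so generation behaves
    exactly as it does without the feature.
    """
    if min_eos_p is None:
        return False
    probs = softmax(logits)
    return probs[eos_token_id] >= min_eos_p


# Toy vocabulary of 4 tokens where index 3 plays the role of EOS.
step_logits = [0.0, 0.0, 0.0, 5.0]
print(should_stop_early(step_logits, eos_token_id=3, min_eos_p=0.2))
print(should_stop_early(step_logits, eos_token_id=3, min_eos_p=None))
```

A lower threshold stops generation sooner, which is what the notebook relies on to trim the extra audio.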

Motivation

Because this feature is missing, generated speech sometimes has odd artifacts at the end.

Your contribution

Some pointers:

min_eos_p should default to None in BarkSemanticGenerationConfig (to keep backward compatibility), and it could possibly be passed through BarkModel.generate kwargs without causing issues.
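A rough sketch of the default-plus-override idea, for discussion only. SemanticConfigSketch and with_generate_kwargs are hypothetical stand-ins, not the real BarkSemanticGenerationConfig or any transformers API. The sketch assumes per-call kwargs should override config fields and that unrelated kwargs are ignored.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class SemanticConfigSketch:
    """Hypothetical stand-in for the semantic generation config."""

    # None disables early stopping, so existing callers see no change.
    min_eos_p: Optional[float] = None


def with_generate_kwargs(
    config: SemanticConfigSketch, **kwargs: Any
) -> SemanticConfigSketch:
    """Return a copy of config with known fields overridden by kwargs."""
    known = {
        name: value
        for name, value in kwargs.items()
        if name in config.__dataclass_fields__
    }
    return replace(config, **known)


default_config = SemanticConfigSketch()
call_config = with_generate_kwargs(default_config, min_eos_p=0.05)
print(default_config.min_eos_p, call_config.min_eos_p)
```

Keeping the default at None means the override is opt-in per call, and the stored config is left untouched.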
