[Frontend] add chunking audio for > 30s audio #19597
Conversation
cc @NickLucche
Hey, thanks a lot for contributing to vLLM!
Good job here; although the solution isn't super efficient (e.g. no batching in pre-processing), I think it will work very well.
There's also room for optimization in a separate PR for the splitting code: we could vectorize the RMS computation with numpy, or at least give numba a try to reduce that overhead.
What we do need to add in this PR, though, is at least one test case with a longer audio (testing the splitting code itself would be nice but is optional).
To do that we need to add a new asset to the s3 bucket. @DarkLight1337 could you help here?
I've been using this one to test, but any audio would do. LibriVox ones are public https://librivox.org/.
Also, this test will now fail and needs to be removed: https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_transcription_validation.py#L77
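The vectorized-RMS follow-up suggested above could look roughly like this. This is an illustrative sketch, not the PR's actual code; the function names and window sizes are assumptions.

```python
import numpy as np

def windowed_rms(audio: np.ndarray, win: int) -> np.ndarray:
    """RMS energy of each non-overlapping window of `win` samples.

    Reshaping into a (n_windows, win) matrix lets numpy compute all
    window energies in one vectorized pass instead of a Python loop.
    """
    n = len(audio) // win
    frames = audio[: n * win].reshape(n, win).astype(np.float64)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def lowest_energy_index(audio: np.ndarray, win: int) -> int:
    """Sample index where the quietest window starts."""
    return int(np.argmin(windowed_rms(audio, win))) * win
```

The same reshape-then-reduce pattern would also be a natural starting point for a numba-jitted version if the numpy overhead is still too high.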
Signed-off-by: nguyenhoangthuan99 <[email protected]>
Hi @NickLucche, I fixed all your comments and added a test for long audio requests; for this test, I just duplicate the audio in the existing test.
LGTM! Thanks for re-using the same test; I had forgotten I had one with a repeated timeseries.
Hi @DarkLight1337, can you help me trigger the CI?
LGTM as well
Hi @NickLucche, @DarkLight1337, I see all checks have passed; can we merge this PR? 🙏
Purpose
fix #15012
My approach is to cut at the lowest-energy point within the last second of each 30s audio chunk, then send all chunks to the engine client. wdyt @NickLucche
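The chunking approach described above can be sketched as follows. This is a simplified illustration under assumed names and parameters, not the PR's actual implementation: it walks the signal in ~30s steps and places each cut at the quietest small window inside the final second of the would-be chunk.

```python
import numpy as np

def split_audio(audio: np.ndarray, sr: int, chunk_s: float = 30.0,
                search_s: float = 1.0, win_ms: float = 20.0) -> list[np.ndarray]:
    """Split `audio` into <=30s chunks, cutting at low-energy points.

    Each cut is chosen as the start of the lowest-RMS window found
    within the last `search_s` seconds of the 30s chunk boundary, so
    splits tend to land in pauses rather than mid-word.
    """
    chunk = int(chunk_s * sr)
    search = int(search_s * sr)
    win = max(1, int(win_ms * sr / 1000))
    chunks, start = [], 0
    while len(audio) - start > chunk:
        # Scan the last second of the candidate chunk for the quietest window.
        lo = start + chunk - search
        seg = audio[lo : start + chunk].astype(np.float64)
        n = len(seg) // win
        rms = np.sqrt(np.mean(seg[: n * win].reshape(n, win) ** 2, axis=1))
        cut = lo + int(np.argmin(rms)) * win
        chunks.append(audio[start:cut])
        start = cut
    chunks.append(audio[start:])  # remainder is at most one chunk long
    return chunks
```

Concatenating the returned chunks reproduces the original signal exactly, which makes the split lossless from the model's point of view aside from the loss of cross-chunk context.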
Test Plan
Used the OpenAI Python SDK to test both cases.
Test Result
The transcription text is printed successfully
(Optional) Documentation Update