[Frontend] add chunking audio for > 30s audio #19597

nguyenhoangthuan99 · 2025-06-13T04:32:58Z

Purpose

My approach is to cut each 30s audio chunk at the lowest-energy point within its last 1 second, then add all chunks to the engine client. wdyt @NickLucche
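
The splitting idea described above can be sketched roughly as follows. This is a minimal, pure-Python illustration and not the code in this PR: the function names `find_split_point` and `split_audio`, the 16 kHz sample rate, and the 100 ms energy window are all assumptions made for the example.

```python
import math


def find_split_point(samples, start, end, window):
    """Return the start index of the lowest-RMS window inside samples[start:end]."""
    best_index = start
    best_energy = math.inf
    # Step through the search region in non-overlapping windows.
    for i in range(start, max(start + 1, end - window + 1), window):
        chunk = samples[i:i + window]
        if not chunk:
            break
        energy = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if energy < best_energy:
            best_energy = energy
            best_index = i
    return best_index


def split_audio(samples, sample_rate=16000, chunk_seconds=30, search_seconds=1):
    """Split samples into chunks of at most chunk_seconds, cutting at quiet points.

    Assumes chunk_seconds > search_seconds so every chunk makes progress.
    """
    chunk_size = sample_rate * chunk_seconds
    search_size = sample_rate * search_seconds
    window = max(1, sample_rate // 10)  # assumed 100 ms energy window
    chunks = []
    pos = 0
    while pos < len(samples):
        if len(samples) - pos <= chunk_size:
            # The remainder fits in one chunk, so no cut is needed.
            chunks.append(samples[pos:])
            break
        hard_end = pos + chunk_size
        # Search only the last `search_seconds` of the would-be chunk.
        cut = find_split_point(samples, hard_end - search_size, hard_end, window)
        chunks.append(samples[pos:cut])
        pos = cut
    return chunks
```

Each chunk ends at the quietest window found in its final second, so a cut is less likely to land in the middle of a word than a hard cut at exactly 30s would be.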

Test Plan

Use the OpenAI Python SDK to test both the non-streaming and the streaming request:

from openai import OpenAI

# Point the client at the local OpenAI-compatible server.
client = OpenAI(api_key="key", base_url="http://localhost:8000/v1")


def transcribe_audio(file_path):
    """Send the whole file and return the transcription text."""
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-small",
            file=audio_file,
        )
        return transcription.text


def transcribe_audio_stream(file_path):
    """Request a streamed transcription and print each event as it arrives."""
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-small",
            file=audio_file,
            stream=True,
        )
        for event in transcription:
            print(event)


# Usage
result = transcribe_audio("path/to/>30s/audio")
print("Transcription:", result)

transcribe_audio_stream("path/to/>30s/audio")

Test Result

The transcription text is printed successfully

(Optional) Documentation Update

gemini-code-assist · 2025-06-13T04:33:02Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

github-actions · 2025-06-13T04:33:08Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

DarkLight1337 · 2025-06-13T05:25:41Z

cc @NickLucche

NickLucche

Hey, thanks a lot for contributing to vLLM!

Good job here. Although the solution isn't super efficient (e.g., no batching on pre-processing), I think it will work very well.
There's also room for optimization in a separate PR for the splitting code: we could vectorize the RMS computation better with numpy, or at least give numba a try, to reduce that overhead.
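
For illustration only, one way the RMS search might be vectorized with numpy is a cumulative sum of squares, which yields the energy of every sliding window in a single pass. The function name `quietest_window_start` and the default window size are hypothetical and are not taken from this PR.

```python
import numpy as np


def quietest_window_start(audio, start, end, window=1600):
    """Return the index in `audio` where the lowest-RMS window of audio[start:end] begins."""
    region = np.asarray(audio[start:end], dtype=np.float64)
    if region.size < window:
        # Region is too short to hold a full window; fall back to its start.
        return start
    # Prefix sums of squared samples: csum[k] is the sum of the first k squares.
    csum = np.concatenate(([0.0], np.cumsum(region ** 2)))
    # Sum of squares for every overlapping window, without a Python loop.
    window_energy = csum[window:] - csum[:-window]
    # Clamp tiny negative values caused by floating-point rounding before the sqrt.
    rms = np.sqrt(np.maximum(window_energy, 0.0) / window)
    return start + int(np.argmin(rms))
```

Because the square root is monotonic, taking `argmin` over `window_energy` directly would select the same index; the RMS is computed here only to mirror the wording of the comment.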

What we need to add in this PR, though, is at least a test case with a longer audio clip (testing the splitting code would be nice, but is more optional).
To do that we need to add a new asset to the S3 bucket. @DarkLight1337 could you help here?
I've been using this one to test, but any audio would do. LibriVox recordings are public: https://librivox.org/.

vllm/entrypoints/openai/serving_transcription.py

NickLucche · 2025-06-13T09:01:13Z

Also, this test will fail now, so we need to remove it: https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_transcription_validation.py#L77

Signed-off-by: nguyenhoangthuan99 <[email protected]>

nguyenhoangthuan99 · 2025-06-16T08:14:38Z

Hi @NickLucche, I addressed all your comments and added one test for a long audio request. For this test, I just duplicate the mary_had_lamb audio 10 times and expect the string "Mary had a little lamb" to appear 10 times in the transcription result. Can you take a look?
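
The shape of that check can be sketched as below. This is only an outline under stated assumptions, not the test added in this PR: `repeat_samples`, `phrase_count_matches`, and `fake_transcribe` are hypothetical names, and the real test would send the repeated audio to the transcription endpoint rather than to a stand-in function.

```python
def repeat_samples(samples, times):
    """Concatenate `times` copies of a waveform given as a plain list of samples."""
    return list(samples) * times


def phrase_count_matches(transcribe, samples, phrase, times=10):
    """Transcribe the repeated waveform and check the phrase appears `times` times."""
    long_audio = repeat_samples(samples, times)
    text = transcribe(long_audio)
    return text.count(phrase) == times


# Stand-in for the real server call (an assumption for this sketch):
# it emits the phrase once per copy of the clip it receives.
clip = [0.0] * 4


def fake_transcribe(audio):
    return " ".join(["Mary had a little lamb"] * (len(audio) // len(clip)))


print(phrase_count_matches(fake_transcribe, clip, "Mary had a little lamb"))
```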

NickLucche

LGTM! Thanks for re-using the same test, I had forgotten I had one with a repeated timeseries.

nguyenhoangthuan99 · 2025-06-16T09:41:49Z

Hi @DarkLight1337, could you help me trigger the buildkite/ci/pr pipeline? The github-actions bot mentioned that only PR reviewers can trigger it.

DarkLight1337

LGTM as well

nguyenhoangthuan99 · 2025-06-16T21:52:30Z

Hi @NickLucche, @DarkLight1337, I see all checks have passed. Can we merge this PR? 🙏

nguyenhoangthuan99 requested a review from aarnphm as a code owner June 13, 2025 04:32

mergify bot added the frontend label Jun 13, 2025

NickLucche requested changes Jun 13, 2025

nguyenhoangthuan99 requested review from DarkLight1337, robertgshaw2-redhat and simon-mo as code owners June 15, 2025 07:11

nguyenhoangthuan99 requested a review from NickLucche June 15, 2025 16:15

nguyenhoangthuan99 added 7 commits June 16, 2025 12:16

[Frontend] add chunking audio for > 30s audio

7e2bf4f

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] fix linting and pre commit check

1fd27fb

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] fix linting and pre commit check

abff94c

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] fix comment

2068b4c

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] remove bad request test and add long audio test

f8b9175

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] fix precommit hook fail

f51d7f1

Signed-off-by: nguyenhoangthuan99 <[email protected]>

[Frontend] fix precommit hook fail

edda1e9

Signed-off-by: nguyenhoangthuan99 <[email protected]>

nguyenhoangthuan99 force-pushed the main branch from b2b36f2 to edda1e9 Compare June 16, 2025 05:16

NickLucche approved these changes Jun 16, 2025

DarkLight1337 added the ready label Jun 16, 2025

DarkLight1337 approved these changes Jun 16, 2025

DarkLight1337 merged commit ede5c4e into vllm-project:main Jun 17, 2025
74 checks passed

This was referenced Jun 17, 2025

[Usage]: Transcription "Maximum clip duration (30s) exceeded #15012

Closed

[Feature]: Evaluate prompt presence on subsequent audio chunks #19772

Closed
