@nguyenhoangthuan99 nguyenhoangthuan99 commented Jun 13, 2025

Purpose

fix #15012

My approach is to split each 30s audio chunk at the lowest-energy point within its last 1 second, then submit all the resulting chunks to the engine client. wdyt @NickLucche?
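To make the idea concrete, here is a minimal sketch of energy-based splitting, not the code in this PR. The function names (`find_split_point`, `split_audio`), the 100 ms RMS window, and the parameter defaults are all assumptions chosen for illustration.

```python
import numpy as np


def find_split_point(wav, start, end, window=1600):
    """Return the start index of the lowest-RMS window in wav[start:end].

    `window` is a hypothetical default (100 ms at 16 kHz).
    """
    best_idx, best_energy = start, float("inf")
    # Step through non-overlapping windows; always check at least one.
    for i in range(start, max(start + 1, end - window + 1), window):
        seg = np.asarray(wav[i:i + window], dtype=np.float64)
        energy = float(np.sqrt(np.mean(seg ** 2)))
        if energy < best_energy:
            best_idx, best_energy = i, energy
    return best_idx


def split_audio(wav, sr, chunk_s=30.0, search_s=1.0):
    """Split `wav` into chunks of at most `chunk_s` seconds.

    Each cut is placed at the quietest point in the last `search_s`
    seconds of the current chunk, so words are less likely to be cut.
    """
    chunk = int(chunk_s * sr)
    search = int(search_s * sr)
    chunks, pos, n = [], 0, len(wav)
    while pos < n:
        if pos + chunk >= n:
            # The remainder fits in one chunk.
            chunks.append(wav[pos:])
            break
        cut = find_split_point(wav, pos + chunk - search, pos + chunk)
        chunks.append(wav[pos:cut])
        pos = cut
    return chunks
```

Each chunk can then be sent as its own request and the resulting texts concatenated in order.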

Test Plan

Use the OpenAI Python SDK to test both the non-streaming and the streaming request:

from openai import OpenAI

# Point the client at the locally running OpenAI-compatible server.
client = OpenAI(api_key="key", base_url="http://localhost:8000/v1")


def transcribe_audio(file_path):
    # Non-streaming request: returns the full transcription text.
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-small",
            file=audio_file,
        )
        return transcription.text


def transcribe_audio_stream(file_path):
    # Streaming request: print each event as it arrives.
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-small",
            file=audio_file,
            stream=True,
        )
        for event in transcription:
            print(event)


# Usage
result = transcribe_audio("path/to/>30s/audio")
print("Transcription:", result)

transcribe_audio_stream("path/to/>30s/audio")

Test Result

The transcription text is printed successfully in both cases.

(Optional) Documentation Update

@mergify mergify bot added the frontend label Jun 13, 2025
@DarkLight1337
Member

cc @NickLucche

Collaborator

@NickLucche NickLucche left a comment

Hey, thanks a lot for contributing to vLLM!

Good job here. Although the solution isn't super efficient (e.g., no batching in pre-processing), I think it will work very well.
There's also room for optimizing the splitting code in a separate PR: we could vectorize the RMS computation better with numpy, or at least give numba a try, to reduce that overhead.

What we do need to add in this PR, though, is at least one test case with a longer audio clip (testing the splitting code would be nice, but is more optional).
To do that we need to add a new asset to the s3 bucket. @DarkLight1337 could you help here?
I've been using this one to test, but any audio would do. LibriVox recordings are public: https://librivox.org/.
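As a rough illustration of the numpy vectorization suggested above, the per-window RMS can be computed in one pass by reshaping the search region into non-overlapping frames. This is only a sketch under assumed names and defaults (`lowest_energy_index`, a 1600-sample window); it is not the code from this PR or a proposed follow-up.

```python
import numpy as np


def lowest_energy_index(segment, window=1600):
    """Return the offset of the quietest non-overlapping window.

    Hypothetical helper: `window` defaults to 100 ms at 16 kHz.
    Trailing samples that do not fill a whole window are ignored.
    """
    n_windows = len(segment) // window
    if n_windows == 0:
        # Segment is shorter than one window; cut at its start.
        return 0
    frames = np.asarray(segment[: n_windows * window], dtype=np.float64)
    frames = frames.reshape(n_windows, window)
    # One RMS value per row, computed without a Python loop.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return int(np.argmin(rms)) * window
```

The returned offset is relative to the start of `segment`, so the caller adds the segment's position in the full waveform to get the cut index.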

@NickLucche
Collaborator

Also, this test will fail now, so we need to remove it: https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_transcription_validation.py#L77

@nguyenhoangthuan99
Contributor Author

Hi @NickLucche, I addressed all your comments and added one test for long audio requests. For this test, I just duplicate the mary_had_lamb audio 10 times and expect the string "Mary had a little lamb" to appear 10 times in the transcription result. Could you take a look?
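The check described here boils down to two steps, sketched below with hypothetical helper names (`repeat_audio`, `count_phrase`). The actual test in the PR may be structured differently, and the transcription call itself is omitted because it needs a running server.

```python
import numpy as np


def repeat_audio(wav, times=10):
    """Concatenate `times` copies of a 1-D waveform (hypothetical helper)."""
    return np.tile(wav, times)


def count_phrase(text, phrase):
    """Count case-insensitive, non-overlapping occurrences of `phrase`."""
    return text.lower().count(phrase.lower())
```

A test would transcribe the output of `repeat_audio(...)` and assert that `count_phrase(transcription, "Mary had a little lamb")` equals 10.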

Collaborator

@NickLucche NickLucche left a comment

LGTM! Thanks for re-using the same test, I had forgotten I had one with a repeated timeseries.

@nguyenhoangthuan99
Contributor Author

Hi @DarkLight1337, could you help me trigger the buildkite/ci/pr pipeline? The github-actions bot mentioned that only PR reviewers can trigger it.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 16, 2025
Member

@DarkLight1337 DarkLight1337 left a comment

LGTM as well

@nguyenhoangthuan99
Contributor Author

nguyenhoangthuan99 commented Jun 16, 2025

Hi @NickLucche, @DarkLight1337, I see all checks have passed. Can we merge this PR? 🙏

@DarkLight1337 DarkLight1337 merged commit ede5c4e into vllm-project:main Jun 17, 2025
74 checks passed
Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed

Development

Successfully merging this pull request may close these issues.

[Usage]: Transcription "Maximum clip duration (30s) exceeded
