Skip to content

Conversation

mdestagnol
Copy link
Contributor

@mdestagnol mdestagnol commented Sep 17, 2025

Currently whisper_full_with_state doesn't work with VAD.

The main inference method is whisper_full_with_state. This method takes a whisper_context and a whisper_state. On top of this method are built 2 convenience methods, whisper_full and whisper_full_parallel.

The VAD logic is currently only implemented in those convenience methods, but not in whisper_full_with_state itself. This is an issue whenever we need to deal with more than one state (when processing multiple real time streams for example).

This pull request moves the logic into whisper_full_with_state and fix a few incorrect calls to ctx->state instead of state directly.

@danbev
Copy link
Member

danbev commented Sep 21, 2025

@mdestagnol Thanks for the pull request.

I've been thinking about this and if I recall correctly, the original pr for VAD actually did something like this. I recall getting feedback about the state should be passed in and not be updated inside of this function (I've not been able to find the comment though I'm afraid).

So I think exposing whisper_vad might be a better option here.

@mdestagnol
Copy link
Contributor Author

@mdestagnol Thanks for the pull request.

I've been thinking about this and if I recall correctly, the original pr for VAD actually did something like this. I recall getting feedback about the state should be passed in and not be updated inside of this function (I've not been able to find the comment though I'm afraid).

So I think exposing whisper_vad might be a better option here.

Isn't state's purpose to persist the context of whisper_full? State is how the whisper_full_with_state method maintains its own context between different runs. It is actively modified by whisper_full_with_state at each run (even outside of VAD). What do you think?

@tazz4843
Copy link
Contributor

Whatever route is taken, this should be documented anyways since it was extremely confusing wondering why VAD wasn't working despite having the switch on.

@danbev
Copy link
Member

danbev commented Sep 27, 2025

What do you think?

Sorry, I make a mistake (recalled incorrectly) about the reason for moving the VAD processing out of whisper_full_with_state. Looking at it closer I found this was moved in 7497754 and related to when parallel processing in use (-p/--processors). We would have to take the parallel processing into account before moving this back into whisper_full_with_state or that parallel processing will not work correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants