Skip to content

Conversation

cjpais
Copy link
Contributor

@cjpais cjpais commented Mar 6, 2024

Should address #5852, #5863

The issue is all normal token processing is skipped for multimodal so it outputs no tokens processed in the response. This is wrong.

To fix, we set the number of tokens processed to it's correct value in ingest_images where the prompt is tokenized for multimodal.

Additionally a fix for the prompt being set to the empty string for multimodal responses. Basically we iteratively rebuild the initial prompt since it was cleared.

I think it would be best to unify the token processing between multimodal and regular, but this PR does not aim to address that.

@chigkim
Copy link

chigkim commented Mar 6, 2024

Awesome! confirming it works! Thank you!

@ggerganov
Copy link
Member

ggerganov commented Mar 6, 2024

I'm attempting to refactor most of the server code in #5882. The tentative plan is to remove multimodal functionality and potentially reintroduce it at a later stage. In the meantime llava-cli can server as a LLaVA example

Since it touches pretty much the entire code, I don't plan to merge server related PRs in the meantime to avoid resolving conflicts

@cjpais
Copy link
Contributor Author

cjpais commented Mar 6, 2024

Wasn't aware of the rewrite, but excited for it. Makes sense to me not to merge.

Happy to close this PR if it makes sense to you. I'll keep my branch up for those who want to patch or cherrypick.

@mathpopo
Copy link

Wasn't aware of the rewrite, but excited for it. Makes sense to me not to merge.

Happy to close this PR if it makes sense to you. I'll keep my branch up for those who want to patch or cherrypick.

can give me zhe branch name? i use latest version , server upload image , just error?

@crashr
Copy link
Contributor

crashr commented Mar 29, 2024

@mathpopo You can fetch from this PR and create a local branch. You could name it "multimodal".

git fetch origin pull/5896/head:multimodal
git checkout multimodal

@ggerganov
Copy link
Member

This is outdated since we removed the multimodal functionality from server

@ggerganov ggerganov closed this May 10, 2024
@mofosyne mofosyne added the obsolete? Marker for potentially obsolete PR label May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

obsolete? Marker for potentially obsolete PR Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants