Bug: moondream2 inference not correct (severe quality degradation compared to reference) #8037

### What happened?

Moondream2 is a superb vision model, but on llama.cpp its output quality falls below that of vanilla llava-1.
@vikhyat, maybe you'd like to take a look?

I compared the same images in Python and in llama.cpp, both in fp16 format.
Through llama.cpp, moondream2 does recognize images roughly and the language part seems to work, but the quality is totally off.
When asked about spatial information (like the lower left corner), it tends to name anything from the left side or even a random object.
In Python, the response is precise and surprisingly accurate.

I looked a bit deeper (https://github.com/vikhyat/moondream/blob/main/moondream/vision_encoder.py): the reference vision encoder appears to support multiple resolutions, while llama.cpp runs the model in llava-1.5 mode.

However, for my test image llama.cpp creates 729 input embeddings, and Python did the same.
So it's not just the input embedding count; something deeper is going wrong. My guess is that the sampling/patches are mixed up somehow.
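
To illustrate how the count can match while spatial answers still go wrong, here is a minimal sketch. It assumes the 729 embeddings form a square 27x27 patch grid (for example a 378x378 input with 14x14 patches; these sizes are an assumption, not taken from either code base). The helper `region_of` is hypothetical, and a row/column transposition is only one possible kind of mix-up, not a confirmed cause.

```python
# Assumed geometry: 378x378 input, 14x14 patches -> 27x27 grid (not verified).
IMAGE_SIZE = 378
PATCH_SIZE = 14
GRID = IMAGE_SIZE // PATCH_SIZE  # 27


def region_of(index: int, grid: int = GRID, row_major: bool = True) -> str:
    """Name the image quadrant covered by the embedding at `index`."""
    if row_major:
        row, col = divmod(index, grid)
    else:
        # Same flat index, but interpreted as column-major order.
        col, row = divmod(index, grid)
    vertical = "upper" if row < grid / 2 else "lower"
    horizontal = "left" if col < grid / 2 else "right"
    return f"{vertical} {horizontal}"


num_embeddings = GRID * GRID
# First patch of the last row when patches are laid out row by row.
lower_left = (GRID - 1) * GRID

print(num_embeddings)                          # same count in both layouts
print(region_of(lower_left, row_major=True))   # layout the model expects (assumed)
print(region_of(lower_left, row_major=False))  # layout after a transposition
```

Both interpretations yield the same number of embeddings, so a count check alone cannot detect this kind of reordering.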

For reference: moondream2 support was merged here: https://github.com/ggerganov/llama.cpp/pull/6899

### Name and Version

abd894a

### What operating system are you seeing the problem on?

_No response_

### Relevant log output

**Below is an example image:**
<img width="592" alt="image" src="https://github.com/ggerganov/llama.cpp/assets/78893154/1d62ff2f-f7fa-47ed-8dac-ca5473908278">

Prompt: `<image>\n\nQuestion: What is in the lower left corner?\n\nAnswer:`
Answer on python: "In the lower left corner, there is a green sticky note pad."
Answer on llava-cli: "A cup of coffee is in the lower left corner."
(I used the officially supplied gguf files)
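
One way to narrow this down would be to dump the image embeddings from both implementations and compare them row by row. The sketch below is only an outline under assumptions: it presumes both dumps are already loaded as equally sized lists of float vectors, and `cosine` and `compare_embeddings` are hypothetical helpers, not functions from llama.cpp or moondream. Toy data stands in for real dumps.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare_embeddings(reference, candidate):
    """For each reference row, return a pair:
    (similarity to the candidate row at the same position,
     index of the best-matching candidate row)."""
    report = []
    for i, ref_row in enumerate(reference):
        sims = [cosine(ref_row, cand_row) for cand_row in candidate]
        best = max(range(len(sims)), key=sims.__getitem__)
        report.append((sims[i], best))
    return report


# Toy stand-in for real dumps: the candidate is the reference
# with rows 0 and 2 swapped.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
candidate = [reference[2], reference[1], reference[0]]

report = compare_embeddings(reference, candidate)
for i, (same_pos, best) in enumerate(report):
    print(i, round(same_pos, 3), best)
```

If the same-position similarity is low but every reference row has a near-perfect match at some other index, the values are probably fine and the patch order is off. If no row matches well anywhere, the problem is more likely in preprocessing or in the encoder itself. For 729 full-size rows, a vectorized library would be more practical than this pure-Python loop.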
