Gibberish output with `Llama-2-7b-chat-hf-q4f32_1`

Chrome Version: 125.0.6283.3
OS: ChromeOS
GPU: Intel(R) Graphics (ADL GT2) - Intel open-source Mesa driver: Mesa 23.3.0 (git-5cb3f1e4fa)
Dawn Backend: Vulkan

**What steps will reproduce the problem?**
1. Go to https://webllm.mlc.ai/#chat-demo
2. Select `Llama-2-7b-chat-hf-q4f32_1`
3. Enter `What color is the dress?`

**What is the expected result?**
Some text that at least makes sense.

**What happens instead?**
Some gibberish text appears.
The DevTools JavaScript console contains the following logs:
```
llm_chat.ts:150 Using prefillChunkSize:  1024
llm_chat.ts:180 Using maxWindowLength:  4096
llm_chat.ts:202 Using Paged KVCache
15vkAllocateMemory failed with VK_ERROR_OUT_OF_DEVICE_MEMORY
    at CheckVkOOMThenSuccessImpl (..<URL>)

15vkAllocateMemory failed with VK_ERROR_OUT_OF_DEVICE_MEMORY
    at CheckVkOOMThenSuccessImpl (..<URL>)
```
Then, after I enter `What color is the dress?`, the console logs the following:
```
97[Invalid Buffer (unlabeled)] is invalid.
 - While validating entries[0] as a Buffer.
Expected entry layout: { type: BufferBindingType::Storage, hasDynamicOffset: 0, minBindingSize: 0 }
 - While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

162[Invalid BindGroup (unlabeled)] is invalid.
 - While encoding [ComputePassEncoder (unlabeled)].SetBindGroup(0, [Invalid BindGroup (unlabeled)], 0, ...).

161[Invalid CommandBuffer] is invalid.
 - While calling [Queue].Submit([[Invalid CommandBuffer]])

97[Invalid Buffer (unlabeled)] is invalid.
 - While validating entries[0] as a Buffer.
Expected entry layout: { type: BufferBindingType::Storage, hasDynamicOffset: 0, minBindingSize: 0 }
 - While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

65[Invalid Buffer (unlabeled)] is invalid.
 - While validating entries[0] as a Buffer.
Expected entry layout: { type: BufferBindingType::ReadOnlyStorage, hasDynamicOffset: 0, minBindingSize: 0 }
 - While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

162[Invalid BindGroup (unlabeled)] is invalid.
 - While encoding [ComputePassEncoder (unlabeled)].SetBindGroup(0, [Invalid BindGroup (unlabeled)], 0, ...).

161[Invalid CommandBuffer] is invalid.
 - While calling [Queue].Submit([[Invalid CommandBuffer]])

65[Invalid Buffer (unlabeled)] is invalid.
 - While validating entries[0] as a Buffer.
Expected entry layout: { type: BufferBindingType::ReadOnlyStorage, hasDynamicOffset: 0, minBindingSize: 0 }
 - While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

WebGPU: too many warnings, no more warnings will be reported to the console for this GPUDevice.
/#chat-demo:1 WebGPU: too many warnings, no more warnings will be reported to the console for this GPUDevice.
```
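The `[Invalid Buffer]` validation errors are consistent with the earlier `VK_ERROR_OUT_OF_DEVICE_MEMORY` failures: in WebGPU, a `createBuffer` call that runs out of memory still returns a buffer object, but an invalid one, and every later bind group or submit that touches it fails validation. A minimal sketch of how a page could surface the allocation failure directly, using the standard `pushErrorScope` / `popErrorScope` calls, is shown here. `createBufferChecked` and the two interfaces are hypothetical names for illustration and are not part of WebLLM.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
// They mirror the subset of the standard WebGPU API used here.
interface BufferDescriptorLike {
  size: number;
  usage: number;
}

interface ErrorScopeDevice<B> {
  pushErrorScope(filter: "out-of-memory" | "validation" | "internal"): void;
  popErrorScope(): Promise<{ message: string } | null>;
  createBuffer(descriptor: BufferDescriptorLike): B;
}

// Hypothetical helper (not WebLLM code): create a buffer and reject the
// returned promise if the device reports an out-of-memory error for it.
async function createBufferChecked<B>(
  device: ErrorScopeDevice<B>,
  descriptor: BufferDescriptorLike,
): Promise<B> {
  device.pushErrorScope("out-of-memory");
  const buffer = device.createBuffer(descriptor);
  const error = await device.popErrorScope();
  if (error !== null) {
    throw new Error(
      `Allocating ${descriptor.size} bytes failed: ${error.message}`,
    );
  }
  return buffer;
}
```

With a check like this, the failure would show up as one rejected promise at allocation time instead of a stream of validation warnings followed by gibberish output.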

**Note**
It works properly with the following f16 variants: `Llama-2-7b-chat-hf-q4f16_1` and `Llama-2-7b-chat-hf-q4f16_1-1k`.
I can also reproduce the problem with `Llama-2-13b-chat-hf-q4f16_1`.
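One possible explanation for this pattern is memory footprint. The back-of-the-envelope estimate below is only a sketch under stated assumptions: that the KV cache is sized for the full `maxWindowLength` of 4096 reported in the log, that cache elements use the precision named in the model suffix (f16 or f32), and that both models use standard multi-head attention with the published Llama 2 layer counts and hidden sizes. `kvCacheBytes` is a hypothetical helper, and the estimate ignores weights and activations.

```typescript
// Rough size of a full-window key/value cache, in bytes.
// Keys and values each hold layers * tokens * hiddenSize elements.
function kvCacheBytes(
  layers: number,
  hiddenSize: number,
  tokens: number,
  bytesPerElement: number,
): number {
  return 2 * layers * tokens * hiddenSize * bytesPerElement;
}

const GiB = 1024 * 1024 * 1024;
const windowLength = 4096; // "Using maxWindowLength: 4096" from the log

// Llama-2-7b: 32 layers, hidden size 4096.
console.log(kvCacheBytes(32, 4096, windowLength, 2) / GiB); // f16: 2
console.log(kvCacheBytes(32, 4096, windowLength, 4) / GiB); // f32: 4

// Llama-2-13b: 40 layers, hidden size 5120.
console.log(kvCacheBytes(40, 5120, windowLength, 2) / GiB); // f16: 3.125
```

Under these assumptions, the two configurations that fail would need a noticeably larger cache than the 7b f16 variant that works, which fits the `VK_ERROR_OUT_OF_DEVICE_MEMORY` errors logged right after `Using Paged KVCache`.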



![image](https://github.com/mlc-ai/web-llm/assets/634478/2ea50e0b-302f-454e-8acd-7b0dcea34474)

