Status: Closed
Labels: bug (Confirmed bugs)
Description
🐛 Bug
I fine-tuned Qwen1.5-1.8B and wanted to deploy it for inference with MLC-LLM. During testing I found that even with the q0f32 setting (no quantization), the model's accuracy still dropped by 5 absolute percentage points compared to the original checkpoint.
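For clarity on the metric being reported: an "absolute percentage point" drop is a plain difference of two accuracies, not a relative change. A toy sketch with entirely hypothetical numbers (the report does not include the evaluation set or the actual scores):

```python
# Toy illustration of an absolute percentage-point accuracy drop.
# All labels and predictions below are made up for the example.

def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    assert len(predictions) == len(labels)
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

labels    = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 10          # 100 toy examples
hf_preds  = labels[:]                                     # baseline gets all 100 right
mlc_preds = labels[:95] + [1 - l for l in labels[95:]]    # deployed model misses 5

drop_pp = (accuracy(hf_preds, labels) - accuracy(mlc_preds, labels)) * 100
print(f"absolute drop: {drop_pp:.1f} percentage points")  # → absolute drop: 5.0 percentage points
```

A drop of this size with q0f32 (full fp32 weights) points at something other than weight quantization, e.g. differences in the conversion pipeline, KV-cache/attention numerics, or the chat template applied at inference time.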
To Reproduce
Steps to reproduce the behavior:
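The report leaves the steps blank. A plausible sketch of the standard MLC-LLM flow for deploying a fine-tuned model without quantization follows; the paths and the `qwen2` conversation template are assumptions, and exact CLI flags may differ between MLC-LLM versions:

```shell
# Hypothetical paths; the reporter's actual checkpoints are not given.
# 1. Convert the fine-tuned HF checkpoint without quantization (q0f32).
mlc_llm convert_weight ./qwen1.5-1.8b-finetuned \
    --quantization q0f32 -o ./dist/qwen1.5-1.8b-q0f32

# 2. Generate the chat config for the converted weights
#    (conv-template assumed; check the one matching Qwen1.5).
mlc_llm gen_config ./qwen1.5-1.8b-finetuned \
    --quantization q0f32 --conv-template qwen2 \
    -o ./dist/qwen1.5-1.8b-q0f32

# 3. Compile a CUDA model library, then run the evaluation against it.
mlc_llm compile ./dist/qwen1.5-1.8b-q0f32/mlc-chat-config.json \
    --device cuda -o ./dist/qwen1.5-1.8b-q0f32/model.so
```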
Expected behavior
Environment
- Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): CUDA
- Operating system (e.g. Ubuntu/Windows/macOS/...): Ubuntu
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): A100
- How you installed MLC-LLM (conda, source): conda
- How you installed TVM-Unity (pip, source): pip
- Python version (e.g. 3.10): 3.10
- GPU driver version (if applicable):
- CUDA/cuDNN version (if applicable):
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
- Any other relevant information:
Additional context