Qwen1.8b accuracy drop without quantization #2522

@chenzhenbupt

Description

🐛 Bug

I fine-tuned Qwen1.5 1.8B and wanted to deploy it for inference with MLC. During testing I found that even with the q0f32 setting, which applies no quantization, the model's accuracy still dropped by 5 absolute percentage points.
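To make the reported regression concrete, here is a minimal, self-contained sketch of how the 5-percentage-point drop could be measured by scoring the original fine-tuned model and the MLC-deployed q0f32 model on the same test set. The predictions below are placeholder values, not real model outputs.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Placeholder test set of 100 examples (hypothetical labels).
labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 10

# Pretend the original fine-tuned model gets everything right,
# and the q0f32 MLC deployment flips 5 of 100 predictions.
hf_preds = labels[:]
mlc_preds = labels[:]
for i in range(5):
    mlc_preds[i] = 1 - mlc_preds[i]

# Absolute drop in percentage points, as described in the report.
drop_pp = (accuracy(hf_preds, labels) - accuracy(mlc_preds, labels)) * 100
print(f"absolute drop: {drop_pp:.1f} percentage points")  # 5.0
```

Since q0f32 performs no quantization, a drop of this size suggests the discrepancy lies elsewhere in the pipeline (e.g. tokenization, chat template, or sampling settings) rather than in weight precision.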

To Reproduce

Steps to reproduce the behavior:

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): CUDA
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): A100
  • How you installed MLC-LLM (conda, source): conda source
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.10
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:

Additional context

Metadata

Assignees

No one assigned

Labels

bug (Confirmed bugs)
