Status: Closed
Labels: bug (Confirmed bugs)
Description
🐛 Bug
I fine-tuned Qwen1.5-1.8B and wanted to deploy it for inference with MLC-LLM. During testing I found that even with the q0f32 setting (no quantization), the model's accuracy still dropped by 5 absolute percentage points compared to the original checkpoint.
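For clarity on the metric being reported: an "absolute percentage point" drop is a plain difference of two accuracies, not a relative change. A toy sketch with entirely hypothetical numbers (the report does not include the evaluation set or the actual scores):

```python
# Toy illustration of an absolute percentage-point accuracy drop.
# All labels and predictions below are made up for the example.

def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    assert len(predictions) == len(labels)
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

labels    = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 10          # 100 toy examples
hf_preds  = labels[:]                                     # baseline gets all 100 right
mlc_preds = labels[:95] + [1 - l for l in labels[95:]]    # deployed model misses 5

drop_pp = (accuracy(hf_preds, labels) - accuracy(mlc_preds, labels)) * 100
print(f"absolute drop: {drop_pp:.1f} percentage points")  # → absolute drop: 5.0 percentage points
```

A drop of this size with q0f32 (full fp32 weights) points at something other than weight quantization, e.g. differences in the conversion pipeline, KV-cache/attention numerics, or the chat template applied at inference time.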
To Reproduce
Steps to reproduce the behavior:
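The report leaves the steps blank. A plausible sketch of the standard MLC-LLM flow for deploying a fine-tuned model without quantization follows; the paths and the `qwen2` conversation template are assumptions, and exact CLI flags may differ between MLC-LLM versions:

```shell
# Hypothetical paths; the reporter's actual checkpoints are not given.
# 1. Convert the fine-tuned HF checkpoint without quantization (q0f32).
mlc_llm convert_weight ./qwen1.5-1.8b-finetuned \
    --quantization q0f32 -o ./dist/qwen1.5-1.8b-q0f32

# 2. Generate the chat config for the converted weights
#    (conv-template assumed; check the one matching Qwen1.5).
mlc_llm gen_config ./qwen1.5-1.8b-finetuned \
    --quantization q0f32 --conv-template qwen2 \
    -o ./dist/qwen1.5-1.8b-q0f32

# 3. Compile a CUDA model library, then run the evaluation against it.
mlc_llm compile ./dist/qwen1.5-1.8b-q0f32/mlc-chat-config.json \
    --device cuda -o ./dist/qwen1.5-1.8b-q0f32/model.so
```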
Expected behavior
Environment
- Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): CUDA
- Operating system (e.g. Ubuntu/Windows/macOS/...): Ubuntu
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): A100
- How you installed MLC-LLM (conda, source): conda
- How you installed TVM-Unity (pip, source): pip
- Python version (e.g. 3.10): 3.10
- GPU driver version (if applicable):
- CUDA/cuDNN version (if applicable):
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
- Any other relevant information:
Additional context