
[Bug] Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC can't run #1845

@DanielProkhorov


🐛 Bug

FileNotFoundError: Cannot find the model library that corresponds to `None` when running Mixtral

To Reproduce

I followed the usage example from #1529 (comment):

from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging
logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1

def main():
    # Downloads the weights and JIT-compiles the model library on first use.
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
        sliding_window_size=1024,
        tensor_parallel_shards=NUM_GPU,
    ))
    # Stream the generated tokens to stdout.
    cm.generate("Who is Garry Kasparov?", progress_callback=callback.StreamToStdout(callback_interval=2))

if __name__ == "__main__":
    main()
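
As a diagnostic variant (my assumption, not something stated in the trace below), the same script without the sliding_window_size override may help narrow things down: the TVMError comes out of the compile step, so if this configuration compiles, the failure is specific to the sliding-window path.

from mlc_chat import ChatConfig, ChatModule, callback

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"

def main():
    # Identical to the repro above, minus sliding_window_size; if this
    # compiles, the -1 upper bound in the trace likely stems from that override.
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
        tensor_parallel_shards=1,
    ))
    cm.generate("Who is Garry Kasparov?",
                progress_callback=callback.StreamToStdout(callback_interval=2))

if __name__ == "__main__":
    main()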

Expected behavior

Mixtral loads and runs inference.

Environment

  • Platform: CUDA
  • Operating system: Ubuntu
  • Device: H100
  • Python version: 3.12

Error trace:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 47, in <module>
    main()
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main
    cli.main(sys.argv[2:])
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main
    compile(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile
    _compile(args, model_config)
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
    args.build_func(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 235, in build
    relax.build(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 157, in _pipeline
    mod = seq(mod)
          ^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  11: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM10::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
  10: tvm::transform::Pass::operator()(tvm::IRModule) const
  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  6: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1}>(tvm::relax::transform::StaticPlanBlockMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::relax::StaticPlanBlockMemory(tvm::IRModule)
  3: tvm::relax::StorageAllocatorInit::Initialize(tvm::IRModule const&, tvm::arith::Analyzer*)
  2: tvm::relax::StorageAllocatorInit::VisitExpr_(tvm::relax::FunctionNode const*)
  1: tvm::relax::SetTIRVarUpperBound(tvm::relax::Function, tvm::arith::Analyzer*)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/relax/transform/static_plan_block_memory.cc", line 360
TVMError: Check failed: value->value > 0 (-1 vs. 0) : The entry value of attr `tir_var_upper_bound` should be a positive integer, while -1 is got.
Traceback (most recent call last):
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 756, in __init__
    self.model_lib_path = _get_lib_module_path(
                          ^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 578, in _get_lib_module_path
    raise FileNotFoundError(err_msg)
FileNotFoundError: Cannot find the model library that corresponds to `None`.
`None` is either provided in the `chat_config` you passed in, or specified in .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/mlc-chat-config.json.
We searched over the following possible paths: 
- None-cuda.so
- dist/prebuilt/lib/None-cuda.so
- dist/HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/None-cuda.so
- .cache/mlc_chat/model_weights/junrushao/None-cuda.so
If you would like to directly specify the model library path, you may consider passing in the `ChatModule.model_lib_path` parameter.
Please checkout https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb for an example on how to load a model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_mlc.py", line 16, in <module>
    main()
  File "test_mlc.py", line 9, in main
    cm = ChatModule(MODEL, device="cuda:1", chat_config=ChatConfig(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/chat_module.py", line 771, in __init__
    jit.jit(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 122, in jit
    _run_jit(
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/site-packages/mlc_chat/interface/jit.py", line 95, in _run_jit
    subprocess.run(cmd, check=True)
  File "miniconda3/envs/mlc-chat-venv/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['miniconda3/envs/mlc-chat-venv/bin/python', '-m', 'mlc_chat', 'compile', '.cache/mlc_chat/model_weights/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC', '--opt', 'flashinfer=1;cublas_gemm=1;faster_transformer=1;cudagraph=0', '--overrides', 'sliding_window_size=1024;prefill_chunk_size=4096;attention_sink_size=4;max_batch_size=80;tensor_parallel_shards=1', '--device', 'cuda:1', '--output', '/tmp/tmpu_l85k1j/lib.so']' returned non-zero exit status 1.
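
Note that the FileNotFoundError is secondary: the JIT compile subprocess fails first with the TVMError above, so no library is ever produced for the lookup to find. As the error message itself suggests, a library built out of band can be passed explicitly via `model_lib_path`; a minimal sketch, assuming a hypothetical path to a successfully compiled library:

from mlc_chat import ChatConfig, ChatModule

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
# Hypothetical path; replace with the .so produced by a successful
# `mlc_chat compile` run. This bypasses the library search, not the
# underlying compile failure.
LIB = "dist/libs/Mixtral-8x7B-Instruct-v0.1-q4f16_1-cuda.so"

cm = ChatModule(MODEL, device="cuda:1",
                model_lib_path=LIB,
                chat_config=ChatConfig(tensor_parallel_shards=1))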
