Description
🐛 Bug
I have a feeling this error comes from glossing over the differences between this model and CodeLlama (I'm too much of a novice to know which nuts and bolts would need changing to match CodeLlama's conventions so the codellama_instruct conv-template maps cleanly onto this tokenizer config).
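For what it's worth, here is the kind of quick, ad-hoc check I mean (this is just my own grep over the config files, not an MLC tool; the paths assume the same CFDS directory used in the logs below):

```shell
# Hypothetical sanity check: list the special tokens CodeFuse-DeepSeek declares,
# so they can be compared against what the codellama_instruct conv-template assumes.
grep -E '"(bos|eos|pad)_token"' CFDS/tokenizer_config.json
grep -E '"(bos|eos)_token_id"' CFDS/generation_config.json
```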
Immediate Error:
tvm.error.InternalError: Traceback (most recent call last): File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88 InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
The full traceback is included further down.
To Reproduce
https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B/tree/main (this is built on DeepSeek Coder's base model, but IIRC it's been beefed up like an Instruct model, and that's how I intended to use it)
Steps to reproduce the behavior:
- Convert the model weights.
- Generate the pre-compilation config.
- Compile (see the command sketch below).
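The exact commands were roughly as follows. The convert_weight line is from memory and approximate; the gen_config and compile lines are copied from the logs below:

```shell
# 1. Convert the model weights (flags from memory, treat as approximate)
mlc_chat convert_weight ./CFDS --quantization q4f16_1 -o ./Codefuse-Deepseek

# 2. Generate the pre-compilation config (as shown in the log below)
mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek

# 3. Compile -- this is the step that fails
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so
```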
For context, here is the full output of the last step before the error (gen_config):
Details
```
~/local/gitrepos/mlc-llm/dist/models main +9 !4 ?31 > mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek
[2024-03-08 20:17:46] INFO auto_config.py:115: Found model configuration: CFDS/config.json
[2024-03-08 20:17:46] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
[2024-03-08 20:17:46] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
[2024-03-08 20:17:46] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (16384)
[2024-03-08 20:17:46] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting bos_token_id: 32013
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting eos_token_id: 32014
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/tokenizer.model
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer.json. Copying to Codefuse-Deepseek/tokenizer.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/vocab.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/merges.txt
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/added_tokens.json
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer_config.json. Copying to Codefuse-Deepseek/tokenizer_config.json
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting pad_token_id: 0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting temperature: 0.7
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting presence_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting frequency_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting repetition_penalty: 1.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting top_p: 0.95
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting mean_gen_len: 128
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting max_gen_len: 512
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting shift_fill_factor: 0.3
[2024-03-08 20:17:46] INFO gen_config.py:186: Dumping configuration file to: Codefuse-Deepseek/mlc-chat-config.json
```
Expected behavior
Environment
- Platform: macOS (MacBook Pro, M1 Max; Apple Silicon / Metal / MPS)
- How you installed MLC-LLM (conda, source): Conda
- How you installed TVM-Unity (pip, source): pip
- Useragent: conda/23.11.0 requests/2.31.0 CPython/3.10.13 Darwin/23.4.0 OSX/14.4 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Details
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: f06d486b4a1a27f0bbb072688a5fc41e7b15323c
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-08 02:04:22 -0500
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
- Any other relevant information:
Full error log:
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so [2024-03-08 20:23:36] INFO auto_config.py:69: Found model configuration: CodeFuse-Deepseek/mlc-chat-config.json [2024-03-08 20:23:36] INFO auto_device.py:76: Found device: metal:0 [2024-03-08 20:23:36] INFO auto_target.py:70: Found configuration of target device "metal:0": {"thread_warp_size": 32, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]} [2024-03-08 20:23:36] INFO auto_target.py:102: Found host LLVM triple: arm64-apple-darwin23.4.0 [2024-03-08 20:23:36] INFO auto_target.py:103: Found host LLVM CPU: apple-m1 [2024-03-08 20:23:36] INFO auto_config.py:153: Found model type: llama. Use --model-type to override. Compiling with arguments: --config LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={}) --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7) --model-type llama --target {"thread_warp_size": 32, "host": {"mtriple": "arm64-apple-darwin23.4.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]} --opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0 --system-lib-prefix "" --output ../lib/CodeFuse-Deepseek.so --overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None [2024-03-08 20:23:36] INFO compile.py:136: Creating model from: LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={}) [2024-03-08 20:23:37] INFO compile.py:155: Exporting the model to TVM Unity compiler [2024-03-08 20:23:38] INFO compile.py:161: Running optimizations using TVM Unity [2024-03-08 20:23:38] INFO compile.py:174: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 16384, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 16384, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0} [2024-03-08 20:23:38] INFO pipeline.py:46: Running TVM Relax graph-level optimizations [2024-03-08 20:27:11] INFO pipeline.py:46: Lowering to TVM TIR kernels [2024-03-08 20:27:16] INFO pipeline.py:46: Running TVM TIR-level optimizations [2024-03-08 20:27:38] INFO pipeline.py:46: Running TVM Dlight low-level optimizations Traceback (most recent call last): File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module> sys.exit(main()) ^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main 
cli.main(sys.argv[2:]) File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main compile( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile _compile(args, model_config) File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile args.build_func( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 242, in build relax.build( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build mod = pipeline(mod) ^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__ return _ffi_transform_api.RunPass(self, mod) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3 File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error raise py_err File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 159, in _pipeline mod = seq(mod) ^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__ return _ffi_transform_api.RunPass(self, mod) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3 File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 307, in _pass_func return inst.transform_module(mod, ctx) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 64, in transform_module sch = _apply_rules(func, target, self.rules, tunable=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 80, in _apply_rules space = rule.apply(func, target, tunable) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/gpu/fallback.py", line 77, in apply bx, tx = sch.split( # pylint: disable=invalid-name ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/_type_checker.py", line 340, in wrap return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File 
"/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/schedule.py", line 811, in split _ffi_api.ScheduleSplit( # type: ignore # pylint: disable=no-member File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error raise py_err tvm.error.InternalError: Traceback (most recent call last): File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88 InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
Additional context
Based on what I've read elsewhere, I imagine I'm running into issues because I did not build from source, particularly TVM. My other thought is that the int32 in the error looks odd: I wonder whether that value is supposed to be int64 and there is just some environment flag I'm missing. I also saw someone mention --sep-embed (since this is a llama-type model); I don't know whether that was phased out, but it didn't work for me.
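For what it's worth, the numbers in the error line up with a 32-bit indexing limit. A quick back-of-the-envelope check (the 16384 * 16384 * 32 decomposition is just my guess at where the loop extent might come from, given the 16384 context/prefill size in the config):

```shell
echo $(( (1 << 31) - 1 ))       # 2147483647 -> the int32 maximum quoted in the error
echo $(( (1 << 33) - 1 ))       # 8589934591 -> the literal that overflows (2^33 - 1)
echo $(( 16384 * 16384 * 32 ))  # 8589934592 -> 2^33; my guess at the originating extent
```

So whatever loop dlight's fallback rule tries to split appears to have an extent around 2^33, which only fits in int64; that would match the int64 hunch above.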
Thanks for any help!