Description
🐛 Bug
I have a feeling this error comes from glossing over the differences between this model and CodeLlama (I'm too much of a novice to know which nuts and bolts would need changing to match CodeLlama's conventions so the codellama_instruct conv-template maps cleanly onto this tokenizer config).
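For what it's worth, here is the kind of quick, ad-hoc check I mean (this is just my own grep over the config files, not an MLC tool; the paths assume the same CFDS directory used in the logs below):

```shell
# Hypothetical sanity check: list the special tokens CodeFuse-DeepSeek declares,
# so they can be compared against what the codellama_instruct conv-template assumes.
grep -E '"(bos|eos|pad)_token"' CFDS/tokenizer_config.json
grep -E '"(bos|eos)_token_id"' CFDS/generation_config.json
```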
Immediate Error:
tvm.error.InternalError: Traceback (most recent call last): File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88 InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
The full traceback is included further down.
To Reproduce
https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B/tree/main (this is built on DeepSeek Coder's base model, but IIRC it's been beefed up like an Instruct model, and that's how I intended to use it)
Steps to reproduce the behavior:
- Convert the model weights.
- Generate the pre-compilation config.
- Compile (see the command sketch below).
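The exact commands were roughly as follows. The convert_weight line is from memory and approximate; the gen_config and compile lines are copied from the logs below:

```shell
# 1. Convert the model weights (flags from memory, treat as approximate)
mlc_chat convert_weight ./CFDS --quantization q4f16_1 -o ./Codefuse-Deepseek

# 2. Generate the pre-compilation config (as shown in the log below)
mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek

# 3. Compile -- this is the step that fails
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so
```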
For context, here is the full output of the last step before the error (gen_config):
Details
```
~/local/gitrepos/mlc-llm/dist/models main +9 !4 ?31 > mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek
[2024-03-08 20:17:46] INFO auto_config.py:115: Found model configuration: CFDS/config.json
[2024-03-08 20:17:46] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
[2024-03-08 20:17:46] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
[2024-03-08 20:17:46] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (16384)
[2024-03-08 20:17:46] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting bos_token_id: 32013
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting eos_token_id: 32014
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/tokenizer.model
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer.json. Copying to Codefuse-Deepseek/tokenizer.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/vocab.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/merges.txt
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/added_tokens.json
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer_config.json. Copying to Codefuse-Deepseek/tokenizer_config.json
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting pad_token_id: 0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting temperature: 0.7
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting presence_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting frequency_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting repetition_penalty: 1.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting top_p: 0.95
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting mean_gen_len: 128
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting max_gen_len: 512
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting shift_fill_factor: 0.3
[2024-03-08 20:17:46] INFO gen_config.py:186: Dumping configuration file to: Codefuse-Deepseek/mlc-chat-config.json
```
Expected behavior
Environment
- Platform: macOS (MacBook Pro, M1 Max; Apple Silicon / Metal / MPS)
- How you installed MLC-LLM (conda, source): Conda
- How you installed TVM-Unity (pip, source): pip
- Useragent: conda/23.11.0 requests/2.31.0 CPython/3.10.13 Darwin/23.4.0 OSX/14.4 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Details
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: f06d486b4a1a27f0bbb072688a5fc41e7b15323c
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-08 02:04:22 -0500
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
- Any other relevant information:
Full error log:
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so [2024-03-08 20:23:36] INFO auto_config.py:69: Found model configuration: CodeFuse-Deepseek/mlc-chat-config.json [2024-03-08 20:23:36] INFO auto_device.py:76: Found device: metal:0 [2024-03-08 20:23:36] INFO auto_target.py:70: Found configuration of target device "metal:0": {"thread_warp_size": 32, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]} [2024-03-08 20:23:36] INFO auto_target.py:102: Found host LLVM triple: arm64-apple-darwin23.4.0 [2024-03-08 20:23:36] INFO auto_target.py:103: Found host LLVM CPU: apple-m1 [2024-03-08 20:23:36] INFO auto_config.py:153: Found model type: llama. Use --model-type to override. Compiling with arguments: --config LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={}) --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7) --model-type llama --target {"thread_warp_size": 32, "host": {"mtriple": "arm64-apple-darwin23.4.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]} --opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0 --system-lib-prefix "" --output ../lib/CodeFuse-Deepseek.so --overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None [2024-03-08 20:23:36] INFO compile.py:136: Creating model from: LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={}) [2024-03-08 20:23:37] INFO compile.py:155: Exporting the model to TVM Unity compiler [2024-03-08 20:23:38] INFO compile.py:161: Running optimizations using TVM Unity [2024-03-08 20:23:38] INFO compile.py:174: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 16384, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 16384, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0} [2024-03-08 20:23:38] INFO pipeline.py:46: Running TVM Relax graph-level optimizations [2024-03-08 20:27:11] INFO pipeline.py:46: Lowering to TVM TIR kernels [2024-03-08 20:27:16] INFO pipeline.py:46: Running TVM TIR-level optimizations [2024-03-08 20:27:38] INFO pipeline.py:46: Running TVM Dlight low-level optimizations Traceback (most recent call last): File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module> sys.exit(main()) ^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main 
cli.main(sys.argv[2:]) File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main compile( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile _compile(args, model_config) File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile args.build_func( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 242, in build relax.build( File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build mod = pipeline(mod) ^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__ return _ffi_transform_api.RunPass(self, mod) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3 File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error raise py_err File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 159, in _pipeline mod = seq(mod) ^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__ return _ffi_transform_api.RunPass(self, mod) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3 File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 307, in _pass_func return inst.transform_module(mod, ctx) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 64, in transform_module sch = _apply_rules(func, target, self.rules, tunable=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 80, in _apply_rules space = rule.apply(func, target, tunable) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/gpu/fallback.py", line 77, in apply bx, tx = sch.split( # pylint: disable=invalid-name ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/_type_checker.py", line 340, in wrap return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File 
"/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/schedule.py", line 811, in split _ffi_api.ScheduleSplit( # type: ignore # pylint: disable=no-member File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__ File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error raise py_err tvm.error.InternalError: Traceback (most recent call last): File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88 InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
Additional context
Based on what I've read elsewhere, I imagine I'm running into issues because I did not build from source, particularly TVM. My other thought is that the int32 in the error looks odd: I wonder whether that value is supposed to be int64 and there is just some environment flag I'm missing. I also saw someone mention --sep-embed (since this is a llama-type model); I don't know whether that was phased out, but it didn't work for me.
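For what it's worth, the numbers in the error line up with a 32-bit indexing limit. A quick back-of-the-envelope check (the 16384 * 16384 * 32 decomposition is just my guess at where the loop extent might come from, given the 16384 context/prefill size in the config):

```shell
echo $(( (1 << 31) - 1 ))       # 2147483647 -> the int32 maximum quoted in the error
echo $(( (1 << 33) - 1 ))       # 8589934591 -> the literal that overflows (2^33 - 1)
echo $(( 16384 * 16384 * 32 ))  # 8589934592 -> 2^33; my guess at the originating extent
```

So whatever loop dlight's fallback rule tries to split appears to have an extent around 2^33, which only fits in int64; that would match the int64 hunch above.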
Thanks for any help!