System Info
OS: Ubuntu
Virtual Env:
accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
lit==16.0.1
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.1
triton==2.0.0
typing_extensions==4.5.0
urllib3==1.26.15
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Used the following command to convert the LLaMA-13B weights to the Hugging Face format:
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /home/unconveretd-weights --model_size 13B --output_dir /home/test-converted
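For context, once conversion succeeds the output directory should load like any other Hugging Face checkpoint. A minimal sketch of that follow-up step (the path matches the --output_dir above; the classes are the standard LLaMA API in transformers 4.28):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the converted tokenizer and weights from the conversion output directory.
tokenizer = LlamaTokenizer.from_pretrained("/home/test-converted")
model = LlamaForCausalLM.from_pretrained("/home/test-converted", torch_dtype=torch.float16)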
Expected behavior
It should generate the converted weights, but instead it raises this error:
Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:17<00:00, 2.35it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to /home/test-converted.
Traceback (most recent call last):
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 278, in <module>
    main()
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 274, in main
    write_tokenizer(args.output_dir, spm_path)
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 248, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/myenv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: unk is not defined.
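The failure happens when sentencepiece tries to load the tokenizer model file, before any weight handling. As a quick sanity check (a minimal sketch; it assumes tokenizer.model sits directly in the --input_dir, which is where the conversion script looks for it), the file can be loaded with sentencepiece alone. If this raises the same "unk is not defined" error, the tokenizer.model file itself is incomplete or corrupt, independent of the conversion script:

import sentencepiece as spm

# Try loading the original tokenizer model directly; the path is an
# assumption based on the --input_dir used in the command above.
sp = spm.SentencePieceProcessor()
sp.Load("/home/unconveretd-weights/tokenizer.model")
print(sp.vocab_size())  # prints the vocabulary size if the file is valid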