Commit e3f00d1 (1 parent: 09e9245)

Modify README to include info on loading LLaMA (#18)

1 file changed: +16 additions, −0 deletions

README.md

@@ -53,3 +53,19 @@
python -m cacheflow.http_frontend.fastapi_frontend

# At another terminal
python -m cacheflow.http_frontend.gradio_webserver
```

## Load LLaMA weights

Since the LLaMA weights are not fully public, they cannot be downloaded directly from Hugging Face. Instead, follow these steps to load them:

1. Convert the LLaMA weights to Hugging Face format with [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py):
   ```bash
   python src/transformers/models/llama/convert_llama_weights_to_hf.py \
       --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/llama-7b
   ```
   Make sure that `llama` is included in the output directory name.
2. For all the commands above, specify the model with `--model /output/path/llama-7b`. For example:
   ```bash
   python simple_server.py --model /output/path/llama-7b
   python -m cacheflow.http_frontend.fastapi_frontend --model /output/path/llama-7b
   ```
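
The requirement that `llama` appear in the output directory name suggests the server picks the model architecture by inspecting the path string. A minimal sketch of such name-based detection (the function name and logic here are illustrative assumptions, not CacheFlow's actual code):

```python
def path_selects_llama(model_path: str) -> bool:
    """Hypothetical sketch: choose LLaMA-specific loading logic when
    the model path contains "llama", which is why the converted
    checkpoint directory must include that substring."""
    return "llama" in model_path.lower()

print(path_selects_llama("/output/path/llama-7b"))  # True
print(path_selects_llama("/output/path/my-model"))  # False
```

If the directory were named without `llama` (e.g. `/output/path/my-model`), a check like this would fail to select the LLaMA code path even though the weights inside are valid.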
