Commit 30e55c9
Merge pull request #140 from joerunde/vendor-scripts
🍱 include scripts with AFTU installs

# Scripts for using Foundation Model Stack (FMS) on AIU hardware

The scripts provided here allow you to run FMS on an AIU device for a variety of models.

Let's look at some example usage below.

## How to run an encoder model

The script [run_encoders.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/run_encoders.py) allows you to run an encoder model. Its usage is demonstrated below.

```bash
# run RoBERTa on AIU
python3 run_encoders.py --architecture=hf_pretrained --model_path=/home/senuser/roberta --tokenizer=/home/senuser/roberta --unfuse_weights --device_type=aiu --compile --compile_dynamic

# run RoBERTa on CPU
python3 run_encoders.py --architecture=hf_pretrained --model_path=/home/senuser/roberta --tokenizer=/home/senuser/roberta --unfuse_weights --device_type=cpu
```

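To compare the two device types side by side, a small wrapper like the sketch below can run both commands and capture each log. This is only a convenience sketch: the loop, the `extra` variable, and the log file names are illustrative, and it reuses only the flags shown above.

```bash
# Hypothetical wrapper around the two commands above.
# Log file names are illustrative; run_encoders.py itself does not write them.
for dev in aiu cpu; do
  extra=""
  if [ "$dev" = "aiu" ]; then
    extra="--compile --compile_dynamic"  # the AIU example above compiles the model
  fi
  python3 run_encoders.py --architecture=hf_pretrained \
    --model_path=/home/senuser/roberta --tokenizer=/home/senuser/roberta \
    --unfuse_weights --device_type="$dev" $extra | tee "roberta_${dev}.log"
done
```
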
## How to run a decoder model

The script [inference.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/inference.py) allows you to run a decoder model. Its usage is demonstrated below for various Llama and Granite models.

```bash
# run 194m on AIU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --compile --default_dtype=fp16 --compile_dynamic

# run 194m on CPU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --default_dtype=fp32

# run 7b on AIU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama2.7b --tokenizer=/home/senuser/llama2.7b --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --compile --default_dtype=fp16 --compile_dynamic

# run 7b on CPU
python3 inference.py --architecture=hf_pretrained --model_path=/home/senuser/llama2.7b --tokenizer=/home/senuser/llama2.7b --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --default_dtype=fp32

# run gpt_bigcode (granite) 3b on AIU
python3 inference.py --architecture=gpt_bigcode --variant=ibm.3b --model_path=/home/senuser/gpt_bigcode.granite.3b/*00002.bin --model_source=hf --tokenizer=/home/senuser/gpt_bigcode.granite.3b --unfuse_weights --min_pad_length 64 --device_type=aiu --max_new_tokens=5 --prompt_type=code --compile --default_dtype=fp16 --compile_dynamic

# run gpt_bigcode (granite) 3b on CPU
python3 inference.py --architecture=gpt_bigcode --variant=ibm.3b --model_path=/home/senuser/gpt_bigcode.granite.3b/*00002.bin --model_source=hf --tokenizer=/home/senuser/gpt_bigcode.granite.3b --unfuse_weights --min_pad_length 64 --device_type=cpu --max_new_tokens=5 --prompt_type=code --default_dtype=fp32
```

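Note that the AIU runs above use `--compile --compile_dynamic` with `fp16`, while the CPU runs use eager `fp32`. If you want to exercise several generation lengths in one go, a sketch like the one below (hypothetical, reusing only the flags and paths from the examples above) sweeps `--max_new_tokens` for the 194m model on AIU:

```bash
# Hypothetical sweep over generation lengths for the 194m model on AIU,
# reusing the flags from the example above.
for n in 5 10 20; do
  echo "=== max_new_tokens=$n ==="
  python3 inference.py --architecture=hf_pretrained \
    --model_path=/home/senuser/llama3.194m --tokenizer=/home/senuser/llama3.194m \
    --unfuse_weights --min_pad_length 64 --device_type=aiu \
    --max_new_tokens="$n" --compile --default_dtype=fp16 --compile_dynamic
done
```
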
## How to run tensor parallel

The script [small-toy.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/small-toy.py) is a slimmed-down version of the Big Toy model. Its purpose is to demonstrate how to run a tensor parallel model with FMS on AIU hardware.

The `--nproc-per-node` command line option controls the number of AIUs to use (i.e., the number of parallel processes).

```bash
# 1 AIU (sequential)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 1 ./small-toy.py
# AIU backend
torchrun --nproc-per-node 1 ./small-toy.py --backend aiu

# 2 AIUs (tensor parallel)
# Inductor (CPU) backend (default)
torchrun --nproc-per-node 2 ./small-toy.py
# AIU backend
torchrun --nproc-per-node 2 ./small-toy.py --backend aiu
```

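The same pattern scales to more devices: raising the process count is all that changes. The line below is a hypothetical four-AIU run, assuming that much hardware is available.

```bash
# Hypothetical 4-AIU tensor parallel run on the AIU backend;
# any process count your hardware supports follows the same pattern.
torchrun --nproc-per-node 4 ./small-toy.py --backend aiu
```
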
## How to validate models

The script [validation.py](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/validation.py) provides an example of validating models on AIU by comparing against other devices. Its usage is demonstrated below for various cases.

```bash
# Run a llama 194m model, grab the example inputs in the script, generate validation tokens on cpu, validate token equivalency:
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --compile_dynamic

# Run a llama 194m model, grab the example inputs from a folder, generate validation tokens on cpu, validate token equivalency:
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --prompt_path=/home/devel/aiu-fms-testing-utils/prompts/test/*.txt --compile_dynamic

# Run a llama 194m model, grab the example inputs from a folder, grab validation text from a folder, validate token equivalency
# (will only validate up to max(max_new_tokens, tokens_in_validation_file)):
python3 scripts/validation.py --architecture=hf_pretrained --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --prompt_path=/home/devel/aiu-fms-testing-utils/prompts/test/*.txt --validation_files_path=/home/devel/aiu-fms-testing-utils/prompts/validation/*.txt --compile_dynamic

# Validate a reduced-size version of llama 8b
python3 scripts/validation.py --architecture=hf_configured --model_path=/home/devel/models/llama-8b --tokenizer=/home/devel/models/llama-8b --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 --extra_get_model_kwargs nlayers=3 --compile_dynamic
```

To run a logits-based validation, pass `--validation_level=1` to the validation script. This checks that the logits output matches at every step of the model, using cross-entropy loss. You can control the acceptable threshold with `--logits_loss_threshold`.
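For example, a logits-based run for the 194m model might look like the sketch below. The flags come from the examples above; the threshold value is purely illustrative, so pick one appropriate for your model.

```bash
# Hypothetical logits-based validation run; --validation_level and
# --logits_loss_threshold are described above, the value 2.5 is illustrative.
python3 scripts/validation.py --architecture=hf_pretrained \
  --model_path=/home/devel/models/llama-194m --tokenizer=/home/devel/models/llama-194m \
  --unfuse_weights --batch_size=1 --min_pad_length=64 --max_new_tokens=10 \
  --compile_dynamic --validation_level=1 --logits_loss_threshold=2.5
```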
