
Commit 0e47515

ZhaoqiongZ and tye1 authored

[Doc] Update LLM tutorials and docker build version for IPEX 2.7 (#5506)

* update release link version in doc
* update llm readme with prebuilt wheels
* update llm env_activate with training
* remove training in env_activate.bat

Co-authored-by: Ye Ting <[email protected]>
1 parent c2cd2d1 commit 0e47515

File tree: 6 files changed, +79 −18 lines changed

docs/tutorials/llm/int4_weight_only_quantization.md
Lines changed: 3 additions & 3 deletions

@@ -129,9 +129,9 @@ Intel® Extension for PyTorch\* implements Weight-Only Quantization for Intel®
 
 ### Environment Setup
 
-Please refer to the [env setup](https://github.com/intel/intel-extension-for-pytorch/blob/v2.6.10%2Bxpu/examples/gpu/llm/inference/README.md).
+Please refer to the [env setup](https://github.com/intel/intel-extension-for-pytorch/blob/v2.7.10%2Bxpu/examples/gpu/llm/inference/README.md).
 
-Example can be found at [Learn WOQ](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference#learn-to-quantize-llm-and-run-inference).
+Example can be found at [Learn WOQ](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference#learn-to-quantize-llm-and-run-inference).
 
 ### Run Weight-Only Quantization LLM on Intel® GPU
 
@@ -182,7 +182,7 @@ output = loaded_model.generate(inputs)
 ```
 
 
-#### Execute [WOQ benchmark script](https://github.com/intel/intel-extension-for-pytorch/blob/v2.6.10%2Bxpu/examples/gpu/llm/inference/run_benchmark_woq.sh)
+#### Execute [WOQ benchmark script](https://github.com/intel/intel-extension-for-pytorch/blob/v2.7.10%2Bxpu/examples/gpu/llm/inference/run_benchmark_woq.sh)
 
 ```python
 bash run_benchmark_woq.sh
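For orientation, here is a rough sketch of how that benchmark script is typically reached from a v2.7.10+xpu checkout; the flow is assembled from the env-activation steps elsewhere in this commit, not quoted from the tutorial itself:

```bash
# Sketch: activate the inference environment, then launch the WOQ benchmark
# script referenced in the hunk above (paths follow the repo layout).
cd examples/gpu/llm
source ./tools/env_activate.sh inference
cd inference
bash run_benchmark_woq.sh
```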

docs/tutorials/llm/llm_optimize_transformers.md
Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ API documentation is available at [API Docs page](../api_doc.html#ipex.llm.optim
 
 ## Pseudocode of Common Usage Scenarios
 
-The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLMs. Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference).
+The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLMs. Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference).
 
 ### FP16
 
@@ -117,7 +117,7 @@ print(modelJit.graph_for(inference_dta))
 
 Distributed inference can be performed with `DeepSpeed`. Based on original Intel® Extension for PyTorch\* scripts, the following code changes are required.
 
-Check Distributed Examples in [LLM example](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference) for complete codes.
+Check Distributed Examples in [LLM example](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference) for complete codes.
 
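The distributed pointer above is only a link; as a hedged illustration of what a DeepSpeed launch in that examples directory usually looks like, the script name and flags below are assumptions patterned on the linked examples, not taken from this diff:

```bash
# Hypothetical two-GPU DeepSpeed launch; check the linked LLM example
# directory for the actual script name and its supported flags.
source ./tools/env_activate.sh inference
deepspeed --num_gpus 2 run_generation_with_deepspeed.py \
  -m meta-llama/Llama-2-7b-hf --dtype float16 --benchmark
```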
examples/gpu/llm/README.md
Lines changed: 71 additions & 9 deletions

@@ -1,34 +1,92 @@
 # LLM Optimization Overview
 
-Here you can find benchmarking scripts for large language models (LLM) text generation. These scripts:
+Here you can find examples for large language models (LLM) text generation. These scripts:
 
-- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
+> [!NOTE]
+> New Llama models like Llama3.2-1B, Llama3.2-3B and Llama3.3-7B are also supported from release v2.7.10+xpu.
+
+- Include inference, fine-tuning (LoRA), bitsandbytes (QLoRA fine-tuning), and training examples.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
+- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
 - Cover model generation inference with low precision cases for different models with best performance and accuracy (fp16 AMP and weight only quantization)
 
 ## Environment Setup
 
-### [Recommended] Docker-based environment setup with compilation from source
+### [Recommended] Docker-based environment setup with prebuilt wheel files
+
+```bash
+# Get the Intel® Extension for PyTorch* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout release/xpu/2.7.10
+git submodule sync
+git submodule update --init --recursive
+
+# Build an image with the provided Dockerfile by installing Intel® Extension for PyTorch* with prebuilt wheels
+docker build -f examples/gpu/llm/Dockerfile -t ipex-llm:xpu .
+
+# Run the container with command below
+docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:xpu bash
+
+# When the command prompt shows inside the docker container, enter llm examples directory
+cd llm
+
+# Activate environment variables
+source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes|training]
+# on Windows, use env_activate.bat instead
+call .\tools\env_activate.bat [inference|fine-tuning|bitsandbytes]
+```
+### Conda-based environment setup with prebuilt wheel files
+
+Make sure the driver packages are installed. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.7.10%2Bxpu&os=linux%2Fwsl2&package=pip).
+
+```bash
+
+# Get the Intel® Extension for PyTorch* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout release/xpu/2.7.10
+git submodule sync
+git submodule update --init --recursive
+
+# Make sure GCC >= 11 is installed on your system.
+# Create a conda environment
+conda create -n llm python=3.10 -y
+conda activate llm
+# Setup the environment with the provided script
+cd examples/gpu/llm
+# If you want to install Intel® Extension for PyTorch\* with prebuilt wheels, use the commands below:
+python ./tools/env_setup.py --setup --deploy
+conda deactivate
+conda activate llm
+source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes|training]
+# on Windows, use env_activate.bat instead
+call .\tools\env_activate.bat [inference|fine-tuning|bitsandbytes]
+```
+
+### Docker-based environment setup with compilation from source
 
 ```bash
 # Get the Intel® Extension for PyTorch* source code
 git clone https://github.com/intel/intel-extension-for-pytorch.git
 cd intel-extension-for-pytorch
-git checkout xpu-main
+git checkout release/xpu/2.7.10
 git submodule sync
 git submodule update --init --recursive
 
 # Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch* from source
-docker build -f examples/gpu/llm/Dockerfile --build-arg COMPILE=ON -t ipex-llm:xpu-main .
+docker build -f examples/gpu/llm/Dockerfile --build-arg COMPILE=ON -t ipex-llm:xpu .
 
 # Run the container with command below
-docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:xpu-main bash
+docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:xpu bash
 
 # When the command prompt shows inside the docker container, enter llm examples directory
 cd llm
 
 # Activate environment variables
-source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes]
+source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes|training]
+# on Windows, use env_activate.bat instead
+call .\tools\env_activate.bat [inference|fine-tuning|bitsandbytes]
 ```
 
 ### Conda-based environment setup with compilation from source

@@ -40,7 +98,7 @@ Make sure the driver and Base Toolkit are installed. Refer to [Installation Guid
 # Get the Intel® Extension for PyTorch* source code
 git clone https://github.com/intel/intel-extension-for-pytorch.git
 cd intel-extension-for-pytorch
-git checkout xpu-main
+git checkout release/xpu/2.7.10
 git submodule sync
 git submodule update --init --recursive
 

@@ -56,7 +114,9 @@ python ./tools/env_setup.py --setup --install-pytorch compile --aot <AOT> --onea
 
 conda deactivate
 conda activate llm
-source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes]
+source ./tools/env_activate.sh [inference|fine-tuning|bitsandbytes|training]
+# on Windows, use env_activate.bat instead
+call .\tools\env_activate.bat [inference|fine-tuning|bitsandbytes]
 ```
 
 where <br />

@@ -74,3 +134,5 @@ For inference example scripts, visit the [inference](./inference/) directory.
 For fine-tuning example scripts, visit the [fine-tuning](./fine-tuning/) directory.
 
 For fine-tuning with quantized model, visit the [bitsandbytes](./bitsandbytes/) directory.
+
+For training example scripts, visit the [training](./training/) directory.
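Whichever setup path is used, a quick import check confirms that the wheels landed; this is the usual IPEX sanity check, shown here for convenience rather than taken from the diff:

```bash
# Verify that torch and IPEX import cleanly and an XPU device is visible
python -c "import torch; import intel_extension_for_pytorch as ipex; \
print(torch.__version__, ipex.__version__, torch.xpu.device_count())"
```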

examples/gpu/llm/inference/README.md
Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 # LLM Inference Overview
 
-Here you can find the inference benchmarking scripts for large language models (LLM) text generation. These scripts:
+Here you can find the inference examples for large language models (LLM) text generation. These scripts:
 
 - Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other Chinese models such as GLM4-9B, Baichuan2-13B and Phi3-mini.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.

@@ -9,7 +9,7 @@ Here you can find the inference benchmarking scripts for large language models (
 
 ## Validated Models
 
-Currently, only support Transformers 4.44.2. Support for newer versions of Transformers and more models will be available in the future.
+Currently, only Transformers 4.48.3 is supported. Support for newer versions of Transformers and more models will be available in the future.
 
 | MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
 |---|:---:|:---:|:---:|:---:|:---:|:---:|
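Because the README now pins Transformers 4.48.3, an environment assembled outside the provided setup scripts needs the matching version; a minimal pin, assuming pip manages the environment:

```bash
# Install the validated Transformers version named in the hunk above
pip install transformers==4.48.3
```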

examples/gpu/llm/tools/env_activate.bat
Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 setlocal enabledelayedexpansion
 
 :: Usage message
-set "MSG_USAGE=Usage: call %~nx0 [inference|fine-tuning|bitsandbytes|training]"
+set "MSG_USAGE=Usage: call %~nx0 [inference|fine-tuning|bitsandbytes]"
 
 :: Check if an argument is provided
 if "%~1"=="" (
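Note that this hunk only narrows the Windows usage string: per the README changes above, `training` remains a valid mode for the Linux shell script even though the `.bat` drops it:

```bash
# training stays selectable through the shell script on Linux (see README hunks)
source ./tools/env_activate.sh training
```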

examples/gpu/llm/training/Mixtral/README.md
Lines changed: 0 additions & 1 deletion

@@ -23,7 +23,6 @@ We provide examples for `wikitext-2-raw-v1` dataset from Hugging Face.
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export TORCH_LLM_ALLREDUCE=1
 
 export model='mistralai/Mistral-7B-v0.1'
