[Doc] Update LLM tutorials and docker build version for IPEX 2.7 (#5506)
* update release link version in doc
* update llm readme with prebuilt wheels
* update llm env_activate with training
* remove training in env_activate.bat
---------
Co-authored-by: Ye Ting <[email protected]>
docs/tutorials/llm/int4_weight_only_quantization.md (+3 −3)
@@ -129,9 +129,9 @@ Intel® Extension for PyTorch\* implements Weight-Only Quantization for Intel®
 ### Environment Setup
 
-Please refer to the [env setup](https://github.com/intel/intel-extension-for-pytorch/blob/v2.6.10%2Bxpu/examples/gpu/llm/inference/README.md).
+Please refer to the [env setup](https://github.com/intel/intel-extension-for-pytorch/blob/v2.7.10%2Bxpu/examples/gpu/llm/inference/README.md).
 
-Example can be found at [Learn WOQ](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference#learn-to-quantize-llm-and-run-inference).
+Example can be found at [Learn WOQ](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference#learn-to-quantize-llm-and-run-inference).
 
 ### Run Weight-Only Quantization LLM on Intel® GPU
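For orientation, the flow behind the updated "Learn WOQ" link is sketched below. This is a minimal, non-authoritative sketch: the model id is a placeholder, and the qconfig helper (`get_weight_only_quant_qconfig_mapping` with `WoqWeightDtype.INT4`) is the one documented for IPEX's CPU WOQ path, assumed here to carry over; follow the linked v2.7.10+xpu example for the exact XPU flow.

```python
# Minimal INT4 weight-only quantization sketch around ipex.llm.optimize.
# Placeholder model id; the qconfig helper is an assumption borrowed from
# the CPU WOQ documentation. See the linked example for the validated flow.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).eval()

# INT4 weight-only quantization recipe (assumed helper and enum).
qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=ipex.quantization.WoqWeightDtype.INT4
)

# Quantize the weights and apply LLM-specific optimizations for the GPU.
model = ipex.llm.optimize(model, quantization_config=qconfig, device="xpu")

inputs = tokenizer("What is weight-only quantization?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```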
docs/tutorials/llm/llm_optimize_transformers.md (+2 −2)
@@ -9,7 +9,7 @@ API documentation is available at [API Docs page](../api_doc.html#ipex.llm.optim
 ## Pseudocode of Common Usage Scenarios
 
-The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLMs. Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference).
+The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLMs. Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference).
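As a concrete anchor for that pseudocode, the single-instance FP16 flow looks roughly like the sketch below; the model id and generation arguments are placeholders, and the linked example directory remains the authoritative reference.

```python
# Minimal single-instance FP16 sketch of the ipex.llm.optimize flow.
# Placeholder model id; check the linked examples for exact usage.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply LLM-specific optimizations (operator fusion, optimized attention).
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("Once upon a time,", return_tensors="pt").to("xpu")
with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```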
@@ … @@
 Distributed inference can be performed with `DeepSpeed`. Based on original Intel® Extension for PyTorch\* scripts, the following code changes are required.
 
-Check Distributed Examples in [LLM example](https://github.com/intel/intel-extension-for-pytorch/tree/v2.6.10%2Bxpu/examples/gpu/llm/inference) for complete codes.
+Check Distributed Examples in [LLM example](https://github.com/intel/intel-extension-for-pytorch/tree/v2.7.10%2Bxpu/examples/gpu/llm/inference) for complete codes.
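The distributed path layers DeepSpeed tensor parallelism under the same optimize call. A hedged sketch follows; the launcher, environment handling, and `init_inference` keyword arguments are assumptions that depend on the installed DeepSpeed version, so treat the linked Distributed Examples as the source of truth.

```python
# Hedged sketch: DeepSpeed AutoTP sharding combined with ipex.llm.optimize.
# Kwargs and env handling are assumptions; see the linked example for the
# complete, validated script and the matching launch command.
import os
import torch
import deepspeed
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

model_id = "facebook/opt-1.3b"  # placeholder
world_size = int(os.environ.get("WORLD_SIZE", "1"))

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).eval()

# Shard the model across ranks; keep kernel injection off so IPEX kernels apply.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)

# Apply IPEX LLM optimizations to the sharded module on each rank.
model = ipex.llm.optimize(model.module, dtype=torch.float16, device="xpu")
```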
examples/gpu/llm/README.md (+71 −9)
@@ -1,34 +1,92 @@
 # LLM Optimization Overview
 
-Here you can find benchmarking scripts for large language models (LLM) text generation. These scripts:
+Here you can find examples for large language models (LLM) text generation. These scripts:
 
-- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
+> [!NOTE]
+> New Llama models like Llama3.2-1B, Llama3.2-3B and Llama3.3-7B are also supported from release v2.7.10+xpu.
+
+- Include both inference/finetuning(lora)/bitsandbytes(qlora-finetuning)/training.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
+- Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other models such as Baichuan2-13B and Phi3-mini.
 - Cover model generation inference with low precision cases for different models with best performance and accuracy (fp16 AMP and weight only quantization)
 
 ## Environment Setup
 
-### [Recommended] Docker-based environment setup with compilation from source
+### [Recommended] Docker-based environment setup with prebuilt wheel files
+
+```bash
+# Get the Intel® Extension for PyTorch* source code
@@ … @@
 ### Conda-based environment setup with prebuilt wheel files
+
+Make sure the driver packages are installed. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.7.10%2Bxpu&os=linux%2Fwsl2&package=pip).
+
+```bash
+# Get the Intel® Extension for PyTorch* source code
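Whichever setup path is used, a quick sanity check that the wheels and driver stack are visible can be run before the LLM scripts; a minimal sketch:

```python
# Post-setup sanity check (illustrative): confirm the XPU build is usable.
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)         # torch build matching the IPEX release
print(ipex.__version__)          # expect 2.7.10+xpu after this update
print(torch.xpu.is_available())  # True once the GPU driver stack is set up
```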
examples/gpu/llm/inference/README.md (+2 −2)
@@ -1,6 +1,6 @@
 # LLM Inference Overview
 
-Here you can find the inference benchmarking scripts for large language models (LLM) text generation. These scripts:
+Here you can find the inference examples for large language models (LLM) text generation. These scripts:
 
 - Support Llama, GPT-J, Qwen, OPT, Bloom model families and some other Chinese models such as GLM4-9B, Baichuan2-13B and Phi3-mini.
 - Include both single instance and distributed (DeepSpeed) use cases for FP16 optimization.
@@ -9,7 +9,7 @@ Here you can find the inference benchmarking scripts for large language models (
 ## Validated Models
 
-Currently, only support Transformers 4.44.2. Support for newer versions of Transformers and more models will be available in the future.
+Currently, only support Transformers 4.48.3. Support for newer versions of Transformers and more models will be available in the future.
 
 | MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
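Because the scripts are validated against a single pinned Transformers release, an explicit guard can fail fast on mismatched environments; a minimal sketch:

```python
# Illustrative guard for the pinned Transformers version noted above.
import transformers

EXPECTED = "4.48.3"
if transformers.__version__ != EXPECTED:
    raise RuntimeError(
        f"Validated against transformers=={EXPECTED}, found {transformers.__version__}"
    )
```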