diff --git a/docs/build/eps.md b/docs/build/eps.md index 3edacac1b37dc..a5b4453524d09 100644 --- a/docs/build/eps.md +++ b/docs/build/eps.md @@ -1,836 +1,837 @@ ---- -title: Build with different EPs -parent: Build ONNX Runtime -description: Learm how to build ONNX Runtime from source for different execution providers -nav_order: 3 -redirect_from: /docs/how-to/build/eps ---- - -# Build ONNX Runtime with Execution Providers -{: .no_toc } - -## Contents -{: .no_toc } - -* TOC placeholder -{:toc} - -## Execution Provider Shared Libraries - -The oneDNN, TensorRT, OpenVINO™, CANN, and QNN providers are built as shared libraries vs being statically linked into the main onnxruntime. This enables them to be loaded only when needed, and if the dependent libraries of the provider are not installed onnxruntime will still run fine, it just will not be able to use that provider. For non shared library providers, all dependencies of the provider must exist to load onnxruntime. - -### Built files -{: .no_toc } - -On Windows, shared provider libraries will be named 'onnxruntime_providers_\*.dll' (for example onnxruntime_providers_openvino.dll). -On Unix, they will be named 'libonnxruntime_providers_\*.so' -On Mac, they will be named 'libonnxruntime_providers_\*.dylib'. - -There is also a shared library that shared providers depend on called onnxruntime_providers_shared (with the same naming convension applied as above). - -Note: It is not recommended to put these libraries in a system location or added to a library search path (like LD_LIBRARY_PATH on Unix). If multiple versions of onnxruntime are installed on the system this can make them find the wrong libraries and lead to undefined behavior. - -### Loading the shared providers -{: .no_toc } - -Shared provider libraries are loaded by the onnxruntime code (do not load or depend on them in your client code). The API for registering shared or non shared providers is identical, the difference is that shared ones will be loaded at runtime when the provider is added to the session options (through a call like OrtSessionOptionsAppendExecutionProvider_OpenVINO or SessionOptionsAppendExecutionProvider_OpenVINO in the C API). -If a shared provider library cannot be loaded (if the file doesn't exist, or its dependencies don't exist or not in the path) then an error will be returned. - -The onnxruntime code will look for the provider shared libraries in the same location as the onnxruntime shared library is (or the executable statically linked to the static library version). - ---- - -## CUDA - -### Prerequisites -{: .no_toc } - -* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) - * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 11.8, 12.2 and cuDNN 8.9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information. - * The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter. The installation directory should contain `bin`, `include` and `lib` sub-directories. - * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. - * The path to the cuDNN installation must be provided via the CUDNN_HOME environment variable, or `--cudnn_home` parameter. In Windows, the installation directory should contain `bin`, `include` and `lib` sub-directories. - * cuDNN 8.* requires ZLib. 
Follow the [cuDNN 8.9 installation guide](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html) to install zlib in Linux or Windows. - * In Windows, the path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_8.dll is found. - -### Build Instructions -{: .no_toc } - -#### Windows -``` -.\build.bat --use_cuda --cudnn_home --cuda_home -``` - -#### Linux -``` -./build.sh --use_cuda --cudnn_home --cuda_home -``` - -A Dockerfile is available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#cuda). - -### Build Options - -To specify GPU architectures (see [Compute Capability](https://developer.nvidia.com/cuda-gpus)), you can append parameters like `--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80;86;89`. - -With `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON`, the CUDA EP can be compiled with additional NHWC ops. This option is not enabled by default due to the small amount of supported NHWC operators. - -Another very helpful CMake build option is to build with NVTX support (`--cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON`) that will enable much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operator. - -`--enable_cuda_line_info` or `--cmake_extra_defines onnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=ON` will enable [NVCC generation of line-number information for device code](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generate-line-info-lineinfo). It might be helpful when you run [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) tools on CUDA kernels. - -If your Windows machine has multiple versions of CUDA installed and you want to use an older version of CUDA, you need append parameters like `--cuda_version `. - -When your build machine has many CPU cores and less than 64 GB memory, there is chance of out of memory error like `nvcc error : 'cicc' died due to signal 9`. The solution is to limit number of parallel NVCC threads with parameters like `--parallel 4 --nvcc_threads 1`. - -### Notes on older versions of ONNX Runtime, CUDA and Visual Studio -{: .no_toc } - -* Depending on compatibility between the CUDA, cuDNN, and Visual Studio versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. -* For older version of ONNX Runtime and CUDA, and Visual Studio: - * CUDA 10.0 is [known to work](https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/) with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions - * CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) - * To install the 14.11 MSVC toolset, see [this page](https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017). - * To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: - 1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script. e.g. 
if you have VS2017 Enterprise, an x64 build would use the following command `"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat. e.g. ` .\build.amd64.1411.bat --use_cuda` - - 2. Alternatively, if you have CMake 3.13 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. e.g. `.\build.bat --msvc_toolset 14.11` - -* If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. -e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations\. -If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory. - ---- - -## TensorRT - -See more information on the TensorRT Execution Provider [here](../execution-providers/TensorRT-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and cuDNN, and setup environment variables. - * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) - * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.8. - * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter. - * ONNX Runtime uses [TensorRT built-in parser](https://developer.nvidia.com/tensorrt/download) from `tensorrt_home` by default. - * To use open-sourced [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/tree/main) parser instead, add `--use_tensorrt_oss_parser` parameter in build commands below. - * The default version of open-sourced onnx-tensorrt parser is specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). - * To specify a different version of onnx-tensorrt parser: - * Select the commit of [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/commits) that you preferred; - * Run `sha1sum` command with downloaded onnx-tensorrt zip file to acquire the SHA1 hash - * Update [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) with updated onnx-tensorrt commit and hash info. - * Please make sure TensorRT built-in parser/open-sourced onnx-tensorrt specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) are **version-matched**, if enabling `--use_tensorrt_oss_parser`. - * i.e It's version-matched if assigning `tensorrt_home` with path to TensorRT-10.9 built-in binaries and onnx-tensorrt [10.9-GA branch](https://github.com/onnx/onnx-tensorrt/tree/release/10.9-GA) specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). - - -### **[Note to ORT 1.21.0 open-sourced parser users]** - -* ORT 1.21.0 links against onnx-tensorrt 10.8-GA, which requires upcoming onnx 1.18. 
- * Here's a temporarily fix to preview on onnx-tensorrt 10.8-GA (or newer) when building ORT 1.21.0: - * Replace the [onnx line in cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/deps.txt#L38) - with `onnx;https://github.com/onnx/onnx/archive/f22a2ad78c9b8f3bd2bb402bfce2b0079570ecb6.zip;324a781c31e30306e30baff0ed7fe347b10f8e3c` - * Download [this](https://github.com/microsoft/onnxruntime/blob/7b2733a526c12b5ef4475edd47fd9997ebc2b2c6/cmake/patches/onnx/onnx.patch) as raw file and save file to [cmake/patches/onnx/onnx.patch](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/patches/onnx/onnx.patch) (do not copy/paste from browser, as it might alter line break type) - * Build ORT 1.21.0 with trt-related flags above (including `--use_tensorrt_oss_parser`) - -### Build Instructions -{: .no_toc } - -#### Windows -```bash -# to build with tensorrt built-in parser -.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --cmake_generator "Visual Studio 17 2022" - -# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt -.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --use_tensorrt_oss_parser --cmake_generator "Visual Studio 17 2022" -``` - -#### Linux - -```bash -# to build with tensorrt built-in parser -./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home - -# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt -./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --use_tensorrt_oss_parser --tensorrt_home --skip_submodule_sync -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#tensorrt) - -**Note** Building with `--use_tensorrt_oss_parser` with TensorRT 8.X requires additional flag --cmake_extra_defines onnxruntime_USE_FULL_PROTOBUF=ON - ---- - -## NVIDIA Jetson TX1/TX2/Nano/Xavier/Orin - -### Build Instructions -{: .no_toc } - -These instructions are for the latest [JetPack SDK](https://developer.nvidia.com/embedded/jetpack). - -1. Clone the ONNX Runtime repo on the Jetson host - - ```bash - git clone --recursive https://github.com/microsoft/onnxruntime - ``` - -2. Specify the CUDA compiler, or add its location to the PATH. - - 1. JetPack 5.x users can upgrade to the latest CUDA release without updating the JetPack version or Jetson Linux BSP (Board Support Package). - - 1. For JetPack 5.x users, CUDA>=11.8 and GCC>9.4 are required to be installed on and after ONNX Runtime 1.17. - - 2. Check [this official blog](https://developer.nvidia.com/blog/simplifying-cuda-upgrades-for-nvidia-jetson-users/) for CUDA upgrade instruction (CUDA 12.2 has been verified on JetPack 5.1.2 on Jetson Xavier NX). - - 1. If there's no `libnvcudla.so` under `/usr/local/cuda-12.2/compat`: `sudo apt-get install -y cuda-compat-12-2` and add `export LD_LIBRARY_PATH="/usr/local/cuda-12.2/lib64:/usr/local/cuda-12.2/compat:$LD_LIBRARY_PATH"` to `~/.bashrc`. - - 3. Check [here](https://developer.nvidia.com/cuda-gpus#collapse5) for compute capability datasheet. - - 2. 
CMake can't automatically find the correct `nvcc` if it's not in the `PATH`. `nvcc` can be added to `PATH` via: - - ```bash - export PATH="/usr/local/cuda/bin:${PATH}" - ``` - - or: - - ```bash - export CUDACXX="/usr/local/cuda/bin/nvcc" - ``` - - 3. Update TensorRT libraries - - 1. Jetpack 5.x supports up to TensorRT 8.5. Jetpack 6.x are equipped with TensorRT 8.6-10.3. - - 2. Jetpack 6.x users can download latest TensorRT 10 TAR package for **jetpack** on [TensorRT SDK website](https://developer.nvidia.com/tensorrt/download/10x). - - 3. Check [here](../execution-providers/TensorRT-ExecutionProvider.md#requirements) for TensorRT/CUDA support matrix among all ONNX Runtime versions. - -3. Install the ONNX Runtime build dependencies on the Jetpack host: - - ```bash - sudo apt install -y --no-install-recommends \ - build-essential software-properties-common libopenblas-dev \ - libpython3.10-dev python3-pip python3-dev python3-setuptools python3-wheel - ``` - -4. Cmake is needed to build ONNX Runtime. Please check the minimum required CMake version [here](https://github.com/microsoft/onnxruntime/blob/main/cmake/CMakeLists.txt#L6). Download from https://cmake.org/download/ and add cmake executable to `PATH` to use it. - -5. Build the ONNX Runtime Python wheel: - - 1. Build `onnxruntime-gpu` wheel with CUDA and TensorRT support (update paths to CUDA/CUDNN/TensorRT libraries if necessary): - - ```bash - ./build.sh --config Release --update --build --parallel --build_wheel \ - --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \ - --tensorrt_home /usr/lib/aarch64-linux-gnu - ``` - -​ Notes: - -* By default, `onnxruntime-gpu` wheel file will be captured under `path_to/onnxruntime/build/Linux/Release/dist/` (build path can be customized by adding `--build_dir` followed by a customized path to the build command above). - -* Append `--skip_tests --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' 'onnxruntime_BUILD_UNIT_TESTS=OFF' 'onnxruntime_USE_FLASH_ATTENTION=OFF' -'onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION=OFF'` to the build command to opt out optional features and reduce build time. - -* For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resource when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and system is hanging. - -## oneDNN - -See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md). - -### Build Instructions -{: .no_toc } - - -The DNNL execution provider can be built for Intel CPU or GPU. To build for Intel GPU, install [Intel SDK for OpenCL Applications](https://software.intel.com/content/www/us/en/develop/tools/opencl-sdk.html) or build OpenCL from [Khronos OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK). Pass in the OpenCL SDK path as dnnl_opencl_root to the build command. Install the latest GPU driver - [Windows graphics driver](https://downloadcenter.intel.com/product/80939/Graphics), [Linux graphics compute runtime and OpenCL driver](https://github.com/intel/compute-runtime/releases). 
- -For CPU -#### Windows -`.\build.bat --use_dnnl` - -#### Linux -`./build.sh --use_dnnl` - -For GPU -#### Windows - -`.\build.bat --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "c:\program files (x86)\intelswtools\sw_dev_tools\opencl\sdk"` -#### Linux - -`./build.sh --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "/opt/intel/sw_dev_tools/opencl-sdk"` - -#### Build Phython Wheel - - -OneDNN EP build supports building Python wheel for both Windows and linux using flag --build_wheel - -`.\build.bat --config RelWithDebInfo --parallel --build_shared_lib --cmake_generator "Visual Studio 16 2019" --build_wheel --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk"` - ---- - -## OpenVINO - -See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.3** for the appropriate OS and target hardware: - * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). - * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) - - Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions. - - *2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is minimal OpenVINO™ version requirement.* - -2. Configure the target hardware with specific follow on instructions: - * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux) - - -3. Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - * For Windows: - ``` - C:\\setupvars.bat - ``` - * For Linux: - ``` - $ source /setupvars.sh - ``` - **Note:** If you are using a dockerfile to use OpenVINO™ Execution Provider, sourcing OpenVINO™ won't be possible within the dockerfile. You would have to explicitly set the LD_LIBRARY_PATH to point to OpenVINO™ libraries location. Refer our [dockerfile](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.openvino). - -### Build Instructions -{: .no_toc } - -#### Windows - -``` -.\build.bat --config RelWithDebInfo --use_openvino --build_shared_lib --build_wheel -``` - -*Note: The default Windows CMake Generator is Visual Studio 2019, but you can also use the newer Visual Studio 2022 by passing `--cmake_generator "Visual Studio 17 2022"` to `.\build.bat`* - -#### Linux - -```bash -./build.sh --config RelWithDebInfo --use_openvino --build_shared_lib --build_wheel -``` - -* `--build_wheel` Creates python wheel file in dist/ folder. Enable it when building from source. -* `--use_openvino` builds the OpenVINO™ Execution Provider in ONNX Runtime. -* ``: Specifies the default hardware target for building OpenVINO™ Execution Provider. 
This can be overriden dynamically at runtime with another option (refer to [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#summary-of-options) for more details on dynamic device selection). Below are the options for different Intel target devices. - -Refer to [Intel GPU device naming convention](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPU's co-exist. - -| Hardware Option | Target Device | -| --------------- | ------------------------| -| CPU | Intel® CPUs | -| GPU | Intel® Integrated Graphics | -| GPU.0 | Intel® Integrated Graphics | -| GPU.1 | Intel® Discrete Graphics | -| NPU | Intel® Neural Processor Unit | -| HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | -| MULTI:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | -| AUTO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | - -Specifying Hardware Target for HETERO or Multi or AUTO device Build: - -HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... -The DEVICE_TYPE can be any of these devices from this list ['CPU','GPU', 'NPU'] - -A minimum of two device's should be specified for a valid HETERO or MULTI or AUTO device build. - -``` -Example's: HETERO:GPU,CPU or AUTO:GPU,CPU or MULTI:GPU,CPU -``` - -#### Disable subgraph partition Feature -* Builds the OpenVINO™ Execution Provider in ONNX Runtime with sub graph partitioning disabled. - -* With this option enabled. Fully supported models run on OpenVINO Execution Provider else they completely fall back to default CPU EP. - -* To enable this feature during build time. Use `--use_openvino ` `_NO_PARTITION` - -``` -Usage: --use_openvino CPU_FP32_NO_PARTITION or --use_openvino GPU_FP32_NO_PARTITION or - --use_openvino GPU_FP16_NO_PARTITION -``` - -For more information on OpenVINO™ Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md) - ---- - -## QNN -See more information on the QNN execution provider [here](../execution-providers/QNN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } -* Install the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network SDK) [Linux/Android/Windows](https://qpm.qualcomm.com/main/tools/details/qualcomm_ai_engine_direct) - -* Install [cmake-3.28](https://cmake.org/download/) or higher. - -* Install Python 3.10 or higher. - * [Python 3.12 for Windows Arm64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-arm64.exe) - * [Python 3.12 for Windows x86-64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-amd64.exe) - * Note: Windows on Arm supports a x86-64 Python environment via emulation. Ensure that the Arm64 Python environment is actived for a native Arm64 ONNX Runtime build. - -* Checkout the source tree: - - ```bash - git clone --recursive https://github.com/Microsoft/onnxruntime.git - cd onnxruntime - ``` - -* Install ONNX Runtime Python dependencies. - ```bash - pip install -r requirements.txt - ``` - -### Build Options -{: .no_toc } - -* `--use_qnn [QNN_LIBRARY_KIND]`: Builds the QNN Execution provider. `QNN_LIBRARY_KIND` is optional and specifies whether to build the QNN Execution Provider as a shared library (default) or static library. 
- * `--use_qnn` or `--use_qnn shared_lib`: Builds the QNN Execution Provider as a shared library. - * `--use_qnn static_lib`: Builds QNN Execution Provider as a static library linked into ONNX Runtime. This is required for Android builds. -* `--qnn_home QNN_SDK_PATH`: The path to the Qualcomm AI Engine Direct SDK. - * Example on Windows: `--qnn_home 'C:\Qualcomm\AIStack\QAIRT\2.31.0.250130'` - * Example on Linux: `--qnn_home /opt/qcom/aistack/qairt/2.31.0.250130` -* `--build_wheel`: Enables Python bindings and builds Python wheel. -* `--arm64`: Cross-compile for Arm64. -* `--arm64ec`: Cross-compile for Arm64EC. Arm64EC code runs with native performance and is interoperable with x64 code running under emulation within the same process on a Windows on Arm device. Refer to the [Arm64EC Overview](https://learn.microsoft.com/en-us/windows/arm/arm64ec). - -Run `python tools/ci_build/build.py --help` for a description of all available build options. - -### Build Instructions -{: .no_toc } - -#### Windows (native x86-64 or native Arm64) -``` -.\build.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --skip_tests --build_dir build\Windows -``` - -Notes: -* Not all Qualcomm backends (e.g., HTP) are supported for model execution on a native x86-64 build. Refer to the [Qualcomm SDK backend documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/backend.html) for more information. -* Even if a Qualcomm backend does not support execution on x86-64, the QNN Execution provider may be able to [generate compiled models](../execution-providers/QNN-ExecutionProvider.md#qnn-context-binary-cache-feature) for the Qualcomm backend. - -#### Windows (Arm64 cross-compile target) -``` -.\build.bat --arm64 --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows -``` - -#### Windows (Arm64EC cross-compile target) -``` -.\build.bat --arm64ec --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows -``` - -#### Windows (Arm64X cross-compile target) -Use the `build_arm64x.bat` script to build Arm64X binaries. Arm64X binaries bundle both Arm64 and Arm64EC code, making Arm64X compatible with both Arm64 and Arm64EC processes on a Windows on Arm device. Refer to the [Arm64X PE files overview](https://learn.microsoft.com/en-us/windows/arm/arm64x-pe). - -``` -.\build_arm64x.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --cmake_generator "Visual Studio 17 2022" --config Release --parallel -``` -Notes: -* Do not specify a `--build_dir` option because `build_arm64x.bat` sets specific build directories. -* The above command places Arm64X binaries in the `.\build\arm64ec-x\Release\Release\` directory. 
- -#### Linux (x86_64) -``` -./build.sh --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --config Release --parallel --skip_tests --build_dir build/Linux -``` - -#### Android (cross-compile): - -Please reference [Build OnnxRuntime For Android](android.md) -``` -# on Windows -.\build.bat --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build\Android - -# on Linux -./build.sh --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build/Android -``` - ---- - -## DirectML -See more information on the DirectML execution provider [here](../execution-providers/DirectML-ExecutionProvider.md). -### Windows -{: .no_toc } - -``` -.\build.bat --use_dml -``` -### Notes -{: .no_toc } - -The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows. - ---- - -## Arm Compute Library -See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md). - -### Build Instructions -{: .no_toc } - -You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary). -See [here](inferencing.md#arm) for information on building for Arm®-based devices. - -Add the following options to `build.sh` to enable the ACL Execution Provider: - -``` ---use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build -``` - -## Arm NN - -See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - -* Supported backend: i.MX8QM Armv8 CPUs -* Supported BSP: i.MX8QM BSP - * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh` -* Set up the build environment - -```bash -source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux -alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" -``` - -* See [here](inferencing.md#arm) for information on building for Arm-based devices - -### Build Instructions -{: .no_toc } - - -```bash -./build.sh --use_armnn -``` - -The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation build with --armnn_relu flag - -```bash -./build.sh --use_armnn --armnn_relu -``` - -The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation build with --armnn_bn flag - -```bash -./build.sh --use_armnn --armnn_bn -``` - -To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. -The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. 
- -```bash -./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build -``` - ---- - -## RKNPU -See more information on the RKNPU Execution Provider [here](../execution-providers/community-maintained/RKNPU-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - -* Supported platform: RK1808 Linux -* See [here](inferencing.md#arm) for information on building for Arm-based devices -* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake: - -``` -set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++) -set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc) -``` - -### Build Instructions -{: .no_toc } - -#### Linux - -1. Download [rknpu_ddk](https://github.com/airockchip/rknpu_ddk.git) to any directory. - -2. Build ONNX Runtime library and test: - - ```bash - ./build.sh --arm --use_rknpu --parallel --build_shared_lib --build_dir build_arm --config MinSizeRel --cmake_extra_defines RKNPU_DDK_PATH= CMAKE_TOOLCHAIN_FILE= ONNX_CUSTOM_PROTOC_EXECUTABLE= - ``` - -3. Deploy ONNX runtime and librknpu_ddk.so on the RK1808 board: - - ```bash - libonnxruntime.so.1.2.0 - onnxruntime_test_all - rknpu_ddk/lib64/librknpu_ddk.so - ``` - ---- - -## AMD Vitis AI -See more information on the Vitis AI Execution Provider [here](../execution-providers/Vitis-AI-ExecutionProvider.md). - -### Windows -{: .no_toc } - -From the Visual Studio Developer Command Prompt or Developer PowerShell, execute the following command: - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release -``` - -If you wish to leverage the Python APIs, please include the `--build_wheel` flag: - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --build_wheel -``` - -You can override also override the installation location by specifying CMAKE_INSTALL_PREFIX via the cmake_extra_defines parameter. -e.g. - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --cmake_extra_defines CMAKE_INSTALL_PREFIX=D:\onnxruntime -``` -### Linux -{: .no_toc } - -Currently Linux support is only enabled for AMD Adapable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#amd-adaptable-soc-installation) for SoC targets. - ---- - -## AMD MIGraphX - -See more information on the MIGraphX Execution Provider [here](../execution-providers/MIGraphX-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/) - * The MIGraphX execution provider for ONNX Runtime is built and tested with ROCm6.3.1 -* Install [MIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX) - * The path to MIGraphX installation must be provided via the `--migraphx_home parameter`. - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --parallel --use_migraphx --migraphx_home -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#migraphx). - -#### Build Phython Wheel - -`./build.sh --config Release --build --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm` - -Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. 
- ---- - -## AMD ROCm - -See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/) - * The ROCm execution provider for ONNX Runtime is built and tested with ROCm6.3.1 - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --parallel --use_rocm --rocm_home -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm). - -#### Build Phython Wheel - -`./build.sh --config Release --build --build_wheel --parallel --use_rocm --rocm_home /opt/rocm` - -Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. - ---- - -## NNAPI - -Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP). - -See the [NNAPI Execution Provider](../execution-providers/NNAPI-ExecutionProvider.md) documentation for more details. - -The pre-built ONNX Runtime Mobile package for Android includes the NNAPI EP. - -If performing a custom build of ONNX Runtime, support for the NNAPI EP or CoreML EP must be enabled when building. - -### Create a minimal build with NNAPI EP support - -Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the NNAPI EP. -* Add `--use_nnapi` to include the NNAPI EP in the build - -#### Example build commands with the NNAPI EP enabled - -Windows example: - -```dos -.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -Linux example: - -```bash -./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` -``` - -## CoreML - -Usage of CoreML on iOS and macOS platforms is via the CoreML EP. - -See the [CoreML Execution Provider](../execution-providers/CoreML-ExecutionProvider.md) documentation for more details. - -The pre-built ONNX Runtime Mobile package for iOS includes the CoreML EP. - -### Create a minimal build with CoreML EP support - -Please see [the instructions](./ios.md) for setting up the iOS environment required to build. The iOS/macOS build must be performed on a mac machine. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the CoreML EP. -* Add `--use_coreml` to include the CoreML EP in the build - -## XNNPACK - -Usage of XNNPACK on Android/iOS/Windows/Linux platforms is via the XNNPACK EP. 
- -See the [XNNPACK Execution Provider](../execution-providers/Xnnpack-ExecutionProvider.md) documentation for more details. - -The pre-built ONNX Runtime package([`onnxruntime-android`](https://mvnrepository.com/artifact/com.microsoft.onnxruntime/onnxruntime-android)) for Android includes the XNNPACK EP. - -The pre-built ONNX Runtime Mobile package for iOS, `onnxruntime-c` and `onnxruntime-objc` in [CocoaPods](https://cocoapods.org/), includes the XNNPACK EP. (Package `onnxruntime-objc` with XNNPACK will be available since 1.14.) - - -If performing a custom build of ONNX Runtime, support for the XNNPACK EP must be enabled when building. - -### Build for Android -#### Create a minimal build with XNNPACK EP support - -Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. -* Add `--use_xnnpack` to include the XNNPACK EP in the build - -##### Example build commands with the XNNPACK EP enabled - -Windows example: - -```bash -.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -Linux example: - -```bash -./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` -``` -If you don't mind MINIMAL build, you can use the following command to build XNNPACK EP for Android: -Linux example: -```bash -./build.sh --cmake_generator "Ninja" --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --use_xnnpack -``` -### Build for iOS (available since 1.14) -A Mac machine is required to build package for iOS. Please follow this [guide](./ios.md) to set up environment firstly. -#### Create a minimal build with XNNPACK EP support - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. -* Add `--use_xnnpack` to include the XNNPACK EP in the build - -```dos -./build.sh --config --use_xcode \ - --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deploy_target --use_xnnpack --minimal_build extended --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -### Build for Windows -```dos -.\build.bat --config --use_xnnpack -``` -### Build for Linux -```bash -./build.sh --config --use_xnnpack -``` - ---- - -## CANN - -See more information on the CANN Execution Provider [here](../execution-providers/community-maintained/CANN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -1. 
Install the CANN Toolkit for the appropriate OS and target hardware by following [documentation](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/51RC1alphaX/softwareinstall/instg/atlasdeploy_03_0017.html) for detailed instructions, please. - -2. Initialize the CANN environment by running the script as shown below. - - ```bash - # Default path, change it if needed. - source /usr/local/Ascend/ascend-toolkit/set_env.sh - ``` - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --build_shared_lib --parallel --use_cann -``` - -### Notes -{: .no_toc } - -* The CANN execution provider supports building for both x64 and aarch64 architectures. -* CANN excution provider now is only supported on Linux. - -## Azure - -See the [Azure Execution Provider](../execution-providers/Azure-ExecutionProvider.md) documentation for more details. - -### Prerequisites - -For Linux, before building, please: - -* install openssl dev package into the system, which is openssl-dev for redhat and libssl-dev for ubuntu. -* if have multiple openssl dev versions installed in the system, please set environment variable "OPENSSL_ROOT_DIR" to the desired version, for example: - -```base -set OPENSSL_ROOT_DIR=/usr/local/ssl3.x/ -``` - -### Build Instructions - -#### Windows - -```dos -build.bat --config --build_shared_lib --build_wheel --use_azure -``` - -#### Linux - -```bash -./build.sh --config --build_shared_lib --build_wheel --use_azure -``` +--- +title: Build with different EPs +parent: Build ONNX Runtime +description: Learm how to build ONNX Runtime from source for different execution providers +nav_order: 3 +redirect_from: /docs/how-to/build/eps +--- + +# Build ONNX Runtime with Execution Providers +{: .no_toc } + +## Contents +{: .no_toc } + +* TOC placeholder +{:toc} + +## Execution Provider Shared Libraries + +The oneDNN, TensorRT, OpenVINO™, CANN, and QNN providers are built as shared libraries vs being statically linked into the main onnxruntime. This enables them to be loaded only when needed, and if the dependent libraries of the provider are not installed onnxruntime will still run fine, it just will not be able to use that provider. For non shared library providers, all dependencies of the provider must exist to load onnxruntime. + +### Built files +{: .no_toc } + +On Windows, shared provider libraries will be named 'onnxruntime_providers_\*.dll' (for example onnxruntime_providers_openvino.dll). +On Unix, they will be named 'libonnxruntime_providers_\*.so' +On Mac, they will be named 'libonnxruntime_providers_\*.dylib'. + +There is also a shared library that shared providers depend on called onnxruntime_providers_shared (with the same naming convension applied as above). + +Note: It is not recommended to put these libraries in a system location or added to a library search path (like LD_LIBRARY_PATH on Unix). If multiple versions of onnxruntime are installed on the system this can make them find the wrong libraries and lead to undefined behavior. + +### Loading the shared providers +{: .no_toc } + +Shared provider libraries are loaded by the onnxruntime code (do not load or depend on them in your client code). The API for registering shared or non shared providers is identical, the difference is that shared ones will be loaded at runtime when the provider is added to the session options (through a call like OrtSessionOptionsAppendExecutionProvider_OpenVINO or SessionOptionsAppendExecutionProvider_OpenVINO in the C API). 
+If a shared provider library cannot be loaded (because the file doesn't exist, or its dependencies don't exist or are not in the path) then an error will be returned.
+
+The onnxruntime code will look for the provider shared libraries in the same location as the onnxruntime shared library (or the executable that is statically linked to the static library version).
+
+---
+
+## CUDA
+
+### Prerequisites
+{: .no_toc }
+
+* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn)
+  * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 12.x and cuDNN 9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information.
+  * The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter. The installation directory should contain `bin`, `include` and `lib` sub-directories.
+  * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found.
+  * The path to the cuDNN installation must be provided via the CUDNN_HOME environment variable, or the `--cudnn_home` parameter. On Windows, the installation directory should contain `bin`, `include` and `lib` sub-directories.
+  * cuDNN 8.* requires ZLib. Follow the [cuDNN 8.9 installation guide](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html) to install zlib on Linux or Windows.
+  * On Windows, the path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_8.dll is found.
+
+### Build Instructions
+{: .no_toc }
+
+#### Windows
+```
+.\build.bat --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
+```
+
+#### Linux
+```
+./build.sh --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
+```
+
+A Dockerfile is available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#cuda).
+
+### Build Options
+
+To specify GPU architectures (see [Compute Capability](https://developer.nvidia.com/cuda-gpus)), you can append parameters like `--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80;86;89`.
+
+With `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON`, the CUDA EP can be compiled with additional NHWC ops. This option is not enabled by default due to the small number of supported NHWC operators.
+
+Another very helpful CMake build option is NVTX support (`--cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON`), which enables much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operators.
+
+`--enable_cuda_line_info` or `--cmake_extra_defines onnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=ON` will enable [NVCC generation of line-number information for device code](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generate-line-info-lineinfo). It might be helpful when you run [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) tools on CUDA kernels.
+
+If your Windows machine has multiple versions of CUDA installed and you want to use an older version of CUDA, you need to append parameters like `--cuda_version <cuda version>`.
+
+When your build machine has many CPU cores and less than 64 GB of memory, there is a chance of an out-of-memory error like `nvcc error : 'cicc' died due to signal 9`. The solution is to limit the number of parallel NVCC threads with parameters like `--parallel 4 --nvcc_threads 1`.
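+
+For illustration, the options above can be combined into a single Linux build command along the lines of the following sketch (the CUDA/cuDNN paths and the architecture list are placeholders; adjust them for your machine):
+
+```bash
+# Example only: CUDA/cuDNN locations and GPU architectures depend on your system.
+./build.sh --config Release --build_shared_lib --build_wheel \
+  --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu \
+  --parallel 4 --nvcc_threads 1 \
+  --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=80;86;89" \
+    onnxruntime_USE_CUDA_NHWC_OPS=ON onnxruntime_ENABLE_NVTX_PROFILE=ON
+```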
+ +### Notes on older versions of ONNX Runtime, CUDA and Visual Studio +{: .no_toc } + +* Depending on compatibility between the CUDA, cuDNN, and Visual Studio versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. +* For older version of ONNX Runtime and CUDA, and Visual Studio: + * CUDA 10.0 is [known to work](https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/) with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions + * CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) + * To install the 14.11 MSVC toolset, see [this page](https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017). + * To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: + 1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script. e.g. if you have VS2017 Enterprise, an x64 build would use the following command `"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat. e.g. ` .\build.amd64.1411.bat --use_cuda` + + 2. Alternatively, if you have CMake 3.13 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. e.g. `.\build.bat --msvc_toolset 14.11` + +* If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. +e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations\. +If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory. + +--- + +## TensorRT + +See more information on the TensorRT Execution Provider [here](../execution-providers/TensorRT-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and cuDNN, and setup environment variables. + * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) + * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.9. + * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter. + * ONNX Runtime uses [TensorRT built-in parser](https://developer.nvidia.com/tensorrt/download) from `tensorrt_home` by default. + * To use open-sourced [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/tree/main) parser instead, add `--use_tensorrt_oss_parser` parameter in build commands below. + * The default version of open-sourced onnx-tensorrt parser is specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). 
+  * To specify a different version of onnx-tensorrt parser:
+    * Select the commit of [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/commits) that you prefer;
+    * Run the `sha1sum` command on the downloaded onnx-tensorrt zip file to acquire the SHA1 hash (a sketch of this step appears just before the build instructions below);
+    * Update [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) with the new onnx-tensorrt commit and hash info.
+  * If enabling `--use_tensorrt_oss_parser`, please make sure the TensorRT built-in parser and the open-sourced onnx-tensorrt specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) are **version-matched**.
+    * For example, the versions match if `tensorrt_home` points to the TensorRT-10.9 built-in binaries and the onnx-tensorrt [10.9-GA branch](https://github.com/onnx/onnx-tensorrt/tree/release/10.9-GA) is specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt).
+
+
+### **[Note to ORT 1.21/1.22 open-sourced parser users]**
+
+* ORT 1.21/1.22 links against onnx-tensorrt 10.8-GA/10.9-GA, which requires the newly released onnx 1.18.
+  * Here's a temporary fix to preview onnx-tensorrt 10.8-GA/10.9-GA when building ORT 1.21/1.22:
+    * Replace the [onnx line in cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/deps.txt#L38)
+      with `onnx;https://github.com/onnx/onnx/archive/e709452ef2bbc1d113faf678c24e6d3467696e83.zip;c0b9f6c29029e13dea46b7419f3813f4c2ca7db8`
+    * Download [this](https://github.com/microsoft/onnxruntime/blob/7b2733a526c12b5ef4475edd47fd9997ebc2b2c6/cmake/patches/onnx/onnx.patch) as a raw file and save it to [cmake/patches/onnx/onnx.patch](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/patches/onnx/onnx.patch) (do not copy/paste from the browser, as that might alter the line break type)
+    * Build ORT with the TensorRT-related flags above (including `--use_tensorrt_oss_parser`)
+  * [onnx 1.18](https://github.com/onnx/onnx/releases/tag/v1.18.0) is supported by the latest ORT main branch. Please check out the main branch and build ORT-TRT with `--use_tensorrt_oss_parser` to enable the OSS parser with full onnx 1.18 support.
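+
+For the SHA1 step above, here is a minimal sketch of how the hash for a pinned onnx-tensorrt commit can be produced before editing [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt); the commit value is a placeholder, not a recommended pin:
+
+```bash
+# Substitute the onnx-tensorrt commit you selected; <commit-sha> is a placeholder.
+ONNX_TRT_COMMIT=<commit-sha>
+wget "https://github.com/onnx/onnx-tensorrt/archive/${ONNX_TRT_COMMIT}.zip" -O onnx-tensorrt.zip
+# The printed hash is the SHA1 that goes into the onnx-tensorrt entry of cmake/deps.txt,
+# which uses the format <name>;<download URL>;<SHA1 hash>.
+sha1sum onnx-tensorrt.zip
+```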
+ +### Build Instructions +{: .no_toc } + +#### Windows +```bash +# to build with tensorrt built-in parser +.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --cmake_generator "Visual Studio 17 2022" + +# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt +.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --use_tensorrt_oss_parser --cmake_generator "Visual Studio 17 2022" +``` + +#### Linux + +```bash +# to build with tensorrt built-in parser +./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home + +# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt +./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --use_tensorrt_oss_parser --tensorrt_home --skip_submodule_sync +``` + +Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#tensorrt) + +**Note** Building with `--use_tensorrt_oss_parser` with TensorRT 8.X requires additional flag --cmake_extra_defines onnxruntime_USE_FULL_PROTOBUF=ON + +--- + +## NVIDIA Jetson TX1/TX2/Nano/Xavier/Orin + +### Build Instructions +{: .no_toc } + +These instructions are for the latest [JetPack SDK](https://developer.nvidia.com/embedded/jetpack). + +1. Clone the ONNX Runtime repo on the Jetson host + + ```bash + git clone --recursive https://github.com/microsoft/onnxruntime + ``` + +2. Specify the CUDA compiler, or add its location to the PATH. + + 1. JetPack 5.x users can upgrade to the latest CUDA release without updating the JetPack version or Jetson Linux BSP (Board Support Package). + + 1. For JetPack 5.x users, CUDA>=11.8 and GCC>9.4 are required to be installed on and after ONNX Runtime 1.17. + + 2. Check [this official blog](https://developer.nvidia.com/blog/simplifying-cuda-upgrades-for-nvidia-jetson-users/) for CUDA upgrade instruction (CUDA 12.2 has been verified on JetPack 5.1.2 on Jetson Xavier NX). + + 1. If there's no `libnvcudla.so` under `/usr/local/cuda-12.2/compat`: `sudo apt-get install -y cuda-compat-12-2` and add `export LD_LIBRARY_PATH="/usr/local/cuda-12.2/lib64:/usr/local/cuda-12.2/compat:$LD_LIBRARY_PATH"` to `~/.bashrc`. + + 3. Check [here](https://developer.nvidia.com/cuda-gpus#collapse5) for compute capability datasheet. + + 2. CMake can't automatically find the correct `nvcc` if it's not in the `PATH`. `nvcc` can be added to `PATH` via: + + ```bash + export PATH="/usr/local/cuda/bin:${PATH}" + ``` + + or: + + ```bash + export CUDACXX="/usr/local/cuda/bin/nvcc" + ``` + + 3. Update TensorRT libraries + + 1. Jetpack 5.x supports up to TensorRT 8.5. Jetpack 6.x are equipped with TensorRT 8.6-10.3. + + 2. Jetpack 6.x users can download latest TensorRT 10 TAR package for **jetpack** on [TensorRT SDK website](https://developer.nvidia.com/tensorrt/download/10x). + + 3. Check [here](../execution-providers/TensorRT-ExecutionProvider.md#requirements) for TensorRT/CUDA support matrix among all ONNX Runtime versions. + +3. 
Install the ONNX Runtime build dependencies on the Jetpack host: + + ```bash + sudo apt install -y --no-install-recommends \ + build-essential software-properties-common libopenblas-dev \ + libpython3.10-dev python3-pip python3-dev python3-setuptools python3-wheel + ``` + +4. Cmake is needed to build ONNX Runtime. Please check the minimum required CMake version [here](https://github.com/microsoft/onnxruntime/blob/main/cmake/CMakeLists.txt#L6). Download from https://cmake.org/download/ and add cmake executable to `PATH` to use it. + +5. Build the ONNX Runtime Python wheel: + + 1. Build `onnxruntime-gpu` wheel with CUDA and TensorRT support (update paths to CUDA/CUDNN/TensorRT libraries if necessary): + + ```bash + ./build.sh --config Release --update --build --parallel --build_wheel \ + --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \ + --tensorrt_home /usr/lib/aarch64-linux-gnu + ``` + +​ Notes: + +* By default, `onnxruntime-gpu` wheel file will be captured under `path_to/onnxruntime/build/Linux/Release/dist/` (build path can be customized by adding `--build_dir` followed by a customized path to the build command above). + +* Append `--skip_tests --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' 'onnxruntime_BUILD_UNIT_TESTS=OFF' 'onnxruntime_USE_FLASH_ATTENTION=OFF' +'onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION=OFF'` to the build command to opt out optional features and reduce build time. + +* For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resource when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and system is hanging. + +## TensorRT-RTX + +See more information on the NV TensorRT RTX Execution Provider [here](../execution-providers/TensorRTRTX-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and setup environment variables. + * Intall TensorRT for RTX from nvidia.com (TODO: add link when available) + +### Build Instructions +{: .no_toc } +`build.bat --config Release --parallel 32 --build_dir _build --build_shared_lib --use_nv_tensorrt_rtx --tensorrt_home "C:\dev\TensorRT-RTX-1.1.0.3" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" --cmake_generator "Visual Studio 17 2022" --use_vcpkg` +Replace the --tensorrt_home and --cuda_home with correct paths to CUDA and TensorRT-RTX installations. + +## oneDNN + +See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md). + +### Build Instructions +{: .no_toc } + + +The DNNL execution provider can be built for Intel CPU or GPU. To build for Intel GPU, install [Intel SDK for OpenCL Applications](https://software.intel.com/content/www/us/en/develop/tools/opencl-sdk.html) or build OpenCL from [Khronos OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK). Pass in the OpenCL SDK path as dnnl_opencl_root to the build command. Install the latest GPU driver - [Windows graphics driver](https://downloadcenter.intel.com/product/80939/Graphics), [Linux graphics compute runtime and OpenCL driver](https://github.com/intel/compute-runtime/releases). 
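+
+Once a wheel produced with the `--build_wheel` flag (see the commands below) is installed, a quick sanity check is to list the registered providers and confirm that `DnnlExecutionProvider` appears. This is a minimal sketch, assuming the freshly built wheel is the one installed in the active Python environment:
+
+```bash
+python3 -c "import onnxruntime; print(onnxruntime.get_available_providers())"
+```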
+
+For CPU
+#### Windows
+`.\build.bat --use_dnnl`
+
+#### Linux
+`./build.sh --use_dnnl`
+
+For GPU
+#### Windows
+
+`.\build.bat --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "c:\program files (x86)\intelswtools\sw_dev_tools\opencl\sdk"`
+
+#### Linux
+
+`./build.sh --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "/opt/intel/sw_dev_tools/opencl-sdk"`
+
+#### Build Python Wheel
+
+The oneDNN EP build supports building a Python wheel for both Windows and Linux using the `--build_wheel` flag:
+
+`.\build.bat --config RelWithDebInfo --parallel --build_shared_lib --cmake_generator "Visual Studio 16 2019" --build_wheel --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk"`
+
+---
+
+## OpenVINO
+
+See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+1. Install the OpenVINO™ offline/online installer from the Intel® Distribution of OpenVINO™ Toolkit **Release 2025.3** for the appropriate OS and target hardware:
+   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
+   * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+
+   Follow the [documentation](https://docs.openvino.ai/2025/index.html) for detailed instructions.
+
+   *2025.3 is the currently recommended OpenVINO™ version. [OpenVINO™ 2025.0](https://docs.openvino.ai/2025/index.html) is the minimum supported OpenVINO™ version.*
+
+2. Install CMake 3.28 or higher. Download it from the [official CMake website](https://cmake.org/download/).
+
+3. Configure the target hardware with the device-specific instructions:
+   * To configure Intel® Processor Graphics (GPU), please follow these instructions: [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#linux)
+
+4. Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step:
+   * For Windows:
+   ```
+   C:\<openvino_install_directory>\setupvars.bat
+   ```
+   * For Linux:
+   ```
+   $ source <openvino_install_directory>/setupvars.sh
+   ```
+
+### Build Instructions
+{: .no_toc }
+
+#### Windows
+
+```
+.\build.bat --config Release --use_openvino --build_shared_lib --build_wheel
+```
+
+*Note: The default Windows CMake Generator is Visual Studio 2019, but you can also use the newer Visual Studio 2022 by passing `--cmake_generator "Visual Studio 17 2022"` to `.\build.bat`.*
+
+#### Linux
+
+```bash
+./build.sh --config Release --use_openvino --build_shared_lib --build_wheel
+```
+
+* `--build_wheel`: Creates a Python wheel file in the dist/ folder. Enable it when building from source.
+* `--use_openvino`: Builds the OpenVINO™ Execution Provider in ONNX Runtime.
+* `<hardware_option>`: Specifies the default hardware target for building the OpenVINO™ Execution Provider. This can be overridden dynamically at runtime with another option (refer to [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#summary-of-options) for more details on dynamic device selection). Below are the options for different Intel target devices.
+
+Refer to the [Intel GPU device naming convention](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPUs co-exist.
+
+| Hardware Option | Target Device |
+| --------------- | ------------------------|
+| CPU | Intel® CPUs |
+| GPU | Intel® Integrated Graphics |
+| GPU.0 | Intel® Integrated Graphics |
+| GPU.1 | Intel® Discrete Graphics |
+| NPU | Intel® Neural Processing Unit |
+
+#### Disable Subgraph Partitioning Feature
+* Builds the OpenVINO™ Execution Provider in ONNX Runtime with graph partitioning disabled. In this mode, fully supported models run on the OpenVINO™ Execution Provider; otherwise they fall back completely to the default CPU EP.
+
+* To enable this feature at build time, append `_NO_PARTITION` to the hardware option passed to `--use_openvino`:
+
+```
+Usage: --use_openvino CPU_NO_PARTITION or --use_openvino GPU_NO_PARTITION or --use_openvino NPU_NO_PARTITION
+```
+
+For more information on the OpenVINO™ Execution Provider's ONNX layer support, topology support, and enabled Intel hardware, please refer to the [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#support-coverage) document.
+
+---
+
+## QNN
+See more information on the QNN execution provider [here](../execution-providers/QNN-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+* Install the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network SDK) [Linux/Android/Windows](https://qpm.qualcomm.com/main/tools/details/qualcomm_ai_engine_direct)
+
+* Install [cmake-3.28](https://cmake.org/download/) or higher.
+
+* Install Python 3.10 or higher.
+  * [Python 3.12 for Windows Arm64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-arm64.exe)
+  * [Python 3.12 for Windows x86-64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-amd64.exe)
+  * Note: Windows on Arm supports an x86-64 Python environment via emulation. Ensure that the Arm64 Python environment is activated for a native Arm64 ONNX Runtime build.
+
+* Check out the source tree:
+
+   ```bash
+   git clone --recursive https://github.com/Microsoft/onnxruntime.git
+   cd onnxruntime
+   ```
+
+* Install the ONNX Runtime Python dependencies:
+  ```bash
+  pip install -r requirements.txt
+  ```
+
+### Build Options
+{: .no_toc }
+
+* `--use_qnn [QNN_LIBRARY_KIND]`: Builds the QNN Execution Provider. `QNN_LIBRARY_KIND` is optional and specifies whether to build the QNN Execution Provider as a shared library (default) or a static library.
+  * `--use_qnn` or `--use_qnn shared_lib`: Builds the QNN Execution Provider as a shared library.
+  * `--use_qnn static_lib`: Builds the QNN Execution Provider as a static library linked into ONNX Runtime. This is required for Android builds.
+* `--qnn_home QNN_SDK_PATH`: The path to the Qualcomm AI Engine Direct SDK.
+  * Example on Windows: `--qnn_home 'C:\Qualcomm\AIStack\QAIRT\2.31.0.250130'`
+  * Example on Linux: `--qnn_home /opt/qcom/aistack/qairt/2.31.0.250130`
+* `--build_wheel`: Enables Python bindings and builds a Python wheel.
+* `--arm64`: Cross-compile for Arm64.
+* `--arm64ec`: Cross-compile for Arm64EC. Arm64EC code runs with native performance and is interoperable with x64 code running under emulation within the same process on a Windows on Arm device. Refer to the [Arm64EC Overview](https://learn.microsoft.com/en-us/windows/arm/arm64ec).
+
+Run `python tools/ci_build/build.py --help` for a description of all available build options.
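+
+For builds that include `--build_wheel`, a quick sanity check is to confirm that the installed wheel registers the QNN EP and can create a session with it. The snippet below is a minimal sketch; `model.onnx` is a placeholder, and the `backend_path` value should point to the Qualcomm backend library you intend to use (refer to the QNN EP documentation for the full list of provider options):
+
+```python
+import onnxruntime as ort
+
+# A QNN-enabled build registers "QNNExecutionProvider".
+print(ort.get_available_providers())
+
+# backend_path selects the Qualcomm backend library, e.g. the HTP backend on Windows on Arm.
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
+)
+```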
+ +### Build Instructions +{: .no_toc } + +#### Windows (native x86-64 or native Arm64) +``` +.\build.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --skip_tests --build_dir build\Windows +``` + +Notes: +* Not all Qualcomm backends (e.g., HTP) are supported for model execution on a native x86-64 build. Refer to the [Qualcomm SDK backend documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/backend.html) for more information. +* Even if a Qualcomm backend does not support execution on x86-64, the QNN Execution provider may be able to [generate compiled models](../execution-providers/QNN-ExecutionProvider.md#qnn-context-binary-cache-feature) for the Qualcomm backend. + +#### Windows (Arm64 cross-compile target) +``` +.\build.bat --arm64 --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows +``` + +#### Windows (Arm64EC cross-compile target) +``` +.\build.bat --arm64ec --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows +``` + +#### Windows (Arm64X cross-compile target) +Use the `build_arm64x.bat` script to build Arm64X binaries. Arm64X binaries bundle both Arm64 and Arm64EC code, making Arm64X compatible with both Arm64 and Arm64EC processes on a Windows on Arm device. Refer to the [Arm64X PE files overview](https://learn.microsoft.com/en-us/windows/arm/arm64x-pe). + +``` +.\build_arm64x.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --cmake_generator "Visual Studio 17 2022" --config Release --parallel +``` +Notes: +* Do not specify a `--build_dir` option because `build_arm64x.bat` sets specific build directories. +* The above command places Arm64X binaries in the `.\build\arm64ec-x\Release\Release\` directory. + +#### Linux (x86_64) +``` +./build.sh --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --config Release --parallel --skip_tests --build_dir build/Linux +``` + +#### Android (cross-compile): + +Please reference [Build OnnxRuntime For Android](android.md) +``` +# on Windows +.\build.bat --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build\Android + +# on Linux +./build.sh --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build/Android +``` + +--- + +## DirectML +See more information on the DirectML execution provider [here](../execution-providers/DirectML-ExecutionProvider.md). +### Windows +{: .no_toc } + +``` +.\build.bat --use_dml +``` +### Notes +{: .no_toc } + +The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows. + +--- + +## Arm Compute Library +See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md). 
+ +### Build Instructions +{: .no_toc } + +You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary). +See [here](inferencing.md#arm) for information on building for Arm®-based devices. + +Add the following options to `build.sh` to enable the ACL Execution Provider: + +``` +--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build +``` + +## Arm NN + +See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + +* Supported backend: i.MX8QM Armv8 CPUs +* Supported BSP: i.MX8QM BSP + * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh` +* Set up the build environment + +```bash +source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux +alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" +``` + +* See [here](inferencing.md#arm) for information on building for Arm-based devices + +### Build Instructions +{: .no_toc } + + +```bash +./build.sh --use_armnn +``` + +The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation build with --armnn_relu flag + +```bash +./build.sh --use_armnn --armnn_relu +``` + +The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation build with --armnn_bn flag + +```bash +./build.sh --use_armnn --armnn_bn +``` + +To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. +The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. + +```bash +./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build +``` + +--- + +## RKNPU +See more information on the RKNPU Execution Provider [here](../execution-providers/community-maintained/RKNPU-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + +* Supported platform: RK1808 Linux +* See [here](inferencing.md#arm) for information on building for Arm-based devices +* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake: + +``` +set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++) +set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc) +``` + +### Build Instructions +{: .no_toc } + +#### Linux + +1. Download [rknpu_ddk](https://github.com/airockchip/rknpu_ddk.git) to any directory. + +2. Build ONNX Runtime library and test: + + ```bash + ./build.sh --arm --use_rknpu --parallel --build_shared_lib --build_dir build_arm --config MinSizeRel --cmake_extra_defines RKNPU_DDK_PATH= CMAKE_TOOLCHAIN_FILE= ONNX_CUSTOM_PROTOC_EXECUTABLE= + ``` + +3. Deploy ONNX runtime and librknpu_ddk.so on the RK1808 board: + + ```bash + libonnxruntime.so.1.2.0 + onnxruntime_test_all + rknpu_ddk/lib64/librknpu_ddk.so + ``` + +--- + +## AMD Vitis AI +See more information on the Vitis AI Execution Provider [here](../execution-providers/Vitis-AI-ExecutionProvider.md). 
+
+### Windows
+{: .no_toc }
+
+From the Visual Studio Developer Command Prompt or Developer PowerShell, execute the following command:
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release
+```
+
+If you wish to leverage the Python APIs, please include the `--build_wheel` flag:
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --build_wheel
+```
+
+You can also override the installation location by specifying `CMAKE_INSTALL_PREFIX` via the `cmake_extra_defines` parameter, e.g.
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --cmake_extra_defines CMAKE_INSTALL_PREFIX=D:\onnxruntime
+```
+### Linux
+{: .no_toc }
+
+Currently, Linux support is only enabled for AMD Adaptable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#installation-for-amd-adaptable-socs) for SoC targets.
+
+---
+
+## AMD MIGraphX
+
+See more information on the MIGraphX Execution Provider [here](../execution-providers/MIGraphX-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/)
+  * The MIGraphX execution provider for ONNX Runtime is built and tested with ROCm 6.3.1
+* Install [MIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
+  * The path to the MIGraphX installation must be provided via the `--migraphx_home` parameter.
+
+### Build Instructions
+{: .no_toc }
+
+#### Linux
+
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo> --parallel --use_migraphx --migraphx_home <path to MIGraphX home>
+```
+
+Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#migraphx).
+
+#### Build Python Wheel
+
+`./build.sh --config Release --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm`
+
+The Python wheels (*.whl) can then be found at `./build/Linux/Release/dist`.
+
+---
+
+## AMD ROCm
+
+See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/)
+  * The ROCm execution provider for ONNX Runtime is built and tested with ROCm 6.3.1
+
+### Build Instructions
+{: .no_toc }
+
+#### Linux
+
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo> --parallel --use_rocm --rocm_home <path to ROCm home>
+```
+
+Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm).
+
+#### Build Python Wheel
+
+`./build.sh --config Release --build_wheel --parallel --use_rocm --rocm_home /opt/rocm`
+
+The Python wheels (*.whl) can then be found at `./build/Linux/Release/dist`.
+
+---
+
+## NNAPI
+
+Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP).
+
+See the [NNAPI Execution Provider](../execution-providers/NNAPI-ExecutionProvider.md) documentation for more details.
+
+The pre-built ONNX Runtime Mobile package for Android includes the NNAPI EP.
+
+If performing a custom build of ONNX Runtime, support for the NNAPI EP or CoreML EP must be enabled when building.
+
+### Create a minimal build with NNAPI EP support
+
+Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux.
+ +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the NNAPI EP. +* Add `--use_nnapi` to include the NNAPI EP in the build + +#### Example build commands with the NNAPI EP enabled + +Windows example: + +```dos +.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config +``` + +Linux example: + +```bash +./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` +``` + +## CoreML + +Usage of CoreML on iOS and macOS platforms is via the CoreML EP. + +See the [CoreML Execution Provider](../execution-providers/CoreML-ExecutionProvider.md) documentation for more details. + +The pre-built ONNX Runtime Mobile package for iOS includes the CoreML EP. + +### Create a minimal build with CoreML EP support + +Please see [the instructions](./ios.md) for setting up the iOS environment required to build. The iOS/macOS build must be performed on a mac machine. + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the CoreML EP. +* Add `--use_coreml` to include the CoreML EP in the build + +## XNNPACK + +Usage of XNNPACK on Android/iOS/Windows/Linux platforms is via the XNNPACK EP. + +See the [XNNPACK Execution Provider](../execution-providers/Xnnpack-ExecutionProvider.md) documentation for more details. + +The pre-built ONNX Runtime package([`onnxruntime-android`](https://mvnrepository.com/artifact/com.microsoft.onnxruntime/onnxruntime-android)) for Android includes the XNNPACK EP. + +The pre-built ONNX Runtime Mobile package for iOS, `onnxruntime-c` and `onnxruntime-objc` in [CocoaPods](https://cocoapods.org/), includes the XNNPACK EP. (Package `onnxruntime-objc` with XNNPACK will be available since 1.14.) + + +If performing a custom build of ONNX Runtime, support for the XNNPACK EP must be enabled when building. + +### Build for Android +#### Create a minimal build with XNNPACK EP support + +Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. 
+* Add `--use_xnnpack` to include the XNNPACK EP in the build
+
+##### Example build commands with the XNNPACK EP enabled
+
+Windows example:
+
+```bash
+.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config
+```
+
+Linux example:
+
+```bash
+./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config
+```
+If you don't need a minimal build, you can use the following command to build the XNNPACK EP for Android:
+Linux example:
+```bash
+./build.sh --cmake_generator "Ninja" --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --use_xnnpack
+```
+### Build for iOS (available since 1.14)
+A Mac machine is required to build the package for iOS. Please follow this [guide](./ios.md) to set up the environment first.
+#### Create a minimal build with XNNPACK EP support
+
+Once you have all the necessary components set up, follow the instructions to [create the custom build](./custom.md), with the following changes:
+
+* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP.
+* Add `--use_xnnpack` to include the XNNPACK EP in the build
+
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo|MinSizeRel> --use_xcode \
+   --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deploy_target <minimal iOS version> --use_xnnpack --minimal_build extended --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config
+```
+
+### Build for Windows
+```dos
+.\build.bat --config <Release|Debug|RelWithDebInfo> --use_xnnpack
+```
+### Build for Linux
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo> --use_xnnpack
+```
+
+---
+
+## CANN
+
+See more information on the CANN Execution Provider [here](../execution-providers/community-maintained/CANN-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+1. Install the CANN Toolkit for the appropriate OS and target hardware by following the [documentation](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/51RC1alphaX/softwareinstall/instg/atlasdeploy_03_0017.html) for detailed instructions.
+
+2. Initialize the CANN environment by running the script as shown below.
+
+   ```bash
+   # Default path, change it if needed.
+   source /usr/local/Ascend/ascend-toolkit/set_env.sh
+   ```
+
+### Build Instructions
+{: .no_toc }
+
+#### Linux
+
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo> --build_shared_lib --parallel --use_cann
+```
+
+### Notes
+{: .no_toc }
+
+* The CANN execution provider supports building for both x64 and aarch64 architectures.
+* The CANN execution provider is currently only supported on Linux.
+
+## Azure
+
+See the [Azure Execution Provider](../execution-providers/Azure-ExecutionProvider.md) documentation for more details.
+
+### Prerequisites
+
+For Linux, before building, please:
+
+* install the OpenSSL development package, which is `openssl-devel` on Red Hat and `libssl-dev` on Ubuntu.
+* if have multiple openssl dev versions installed in the system, please set environment variable "OPENSSL_ROOT_DIR" to the desired version, for example: + +```base +set OPENSSL_ROOT_DIR=/usr/local/ssl3.x/ +``` + +### Build Instructions + +#### Windows + +```dos +build.bat --config --build_shared_lib --build_wheel --use_azure +``` + +#### Linux + +```bash +./build.sh --config --build_shared_lib --build_wheel --use_azure +``` \ No newline at end of file diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 34dca5aba4858..0707034e9d2d3 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -1,631 +1,697 @@ ---- -title: Intel - OpenVINO™ -description: Instructions to execute OpenVINO™ Execution Provider for ONNX Runtime. -parent: Execution Providers -nav_order: 3 -redirect_from: /docs/reference/execution-providers/OpenVINO-ExecutionProvider ---- - -# OpenVINO™ Execution Provider -{: .no_toc } - -Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution Provider. Please refer to [this](https://software.intel.com/en-us/openvino-toolkit/hardware) page for details on the Intel hardware supported. - -## Contents -{: .no_toc } - -* TOC placeholder -{:toc} - -## Install - -Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. -* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.6 Release](https://github.com/intel/onnxruntime/releases) -* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) -* Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20) - -## Requirements - -ONNX Runtime OpenVINO™ Execution Provider is compatible with three lastest releases of OpenVINO™. - -|ONNX Runtime|OpenVINO™|Notes| -|---|---|---| -|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)| -|1.20.0|2024.4|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.5)| -|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)| -|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)| -|1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)| - -## Build - -For build instructions, please see the [BUILD page](../build/eps.md#openvino). - -## Usage - -**Set OpenVINO™ Environment for Python** - -Please download onnxruntime-openvino python packages from PyPi.org: -``` -pip install onnxruntime-openvino -``` - -* **Windows** - - To enable OpenVINO™ Execution Provider with ONNX Runtime on Windows it is must to set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. - Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - - ``` - C:\ \setupvars.bat - ``` - -* **Linux** - - OpenVINO™ Execution Provider with Onnx Runtime on Linux, installed from PyPi.org comes with prebuilt OpenVINO™ libs and supports flag CXX11_ABI=0. So there is no need to install OpenVINO™ separately. - - But if there is need to enable CX11_ABI=1 flag of OpenVINO, build Onnx Runtime python wheel packages from source. For build instructions, please see the [BUILD page](../build/eps.md#openvino). 
- OpenVINO™ Execution Provider wheels on Linux built from source will not have prebuilt OpenVINO™ libs so we must set the OpenVINO™ Environment Variable using the full installer package of OpenVINO™: - - ``` - $ source /setupvars.sh - ``` - -**Set OpenVINO™ Environment for C++** - -For Running C++/C# ORT Samples with the OpenVINO™ Execution Provider it is must to set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. -Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - * For Windows run: - ``` - C:\ \setupvars.bat - ``` - * For Linux run: - ``` - $ source /setupvars.sh - ``` - **Note:** If you are using a dockerfile to use OpenVINO™ Execution Provider, sourcing OpenVINO™ won't be possible within the dockerfile. You would have to explicitly set the LD_LIBRARY_PATH to point to OpenVINO™ libraries location. Refer our [dockerfile](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.openvino). - - -**Set OpenVINO™ Environment for C#** - -To use csharp api for openvino execution provider create a custom nuget package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install prerequisites for nuget creation. Once prerequisites are installed follow the instructions to [build openvino execution provider](../build/eps.md#openvino) and add an extra flag `--build_nuget` to create nuget packages. Two nuget packages will be created Microsoft.ML.OnnxRuntime.Managed and Microsoft.ML.OnnxRuntime.Openvino. - -## Features - -### OpenCL queue throttling for GPU devices - -Enables [OpenCL queue throttling](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__ocl__gpu__prop__cpp__api.html) for GPU devices. Reduces CPU utilization when using GPUs with OpenVINO EP. - -### Model caching - -OpenVINO™ supports [model caching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html). - -Model caching feature is supported on CPU, NPU, GPU along with kernel caching on iGPU, dGPU. - -This feature enables users to save and load the blob file directly on to the hardware device target and perform inference with improved Inference Latency. - -Kernel Caching on iGPU and dGPU: - -This feature also allows user to save kernel caching as cl_cache files for models with dynamic input shapes. These cl_cache files can be loaded directly onto the iGPU/dGPU hardware device target and inferencing can be performed. - -#### Enabling Model Caching via Runtime options using c++/python API's. - -This flow can be enabled by setting the runtime config option 'cache_dir' specifying the path to dump and load the blobs (CPU, NPU, iGPU, dGPU) or cl_cache(iGPU, dGPU) while using the c++/python API'S. - -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. - -### Support for INT8 Quantized models - -Int8 models are supported on CPU, GPU and NPU. - -### Support for Weights saved in external files - -OpenVINO™ Execution Provider now supports ONNX models that store weights in external files. It is especially useful for models larger than 2GB because of protobuf limitations. - -See the [OpenVINO™ ONNX Support documentation](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-onnx.html). 
- -Converting and Saving an ONNX Model to External Data: -Use the ONNX API's.[documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md#converting-and-saving-an-onnx-model-to-external-data). - -Example: - -```python -import onnx -onnx_model = onnx.load("model.onnx") # Your model in memory as ModelProto -onnx.save_model(onnx_model, 'saved_model.onnx', save_as_external_data=True, all_tensors_to_one_file=True, location='data/weights_data', size_threshold=1024, convert_attribute=False) -``` - -Note: -1. In the above script, model.onnx is loaded and then gets saved into a file called 'saved_model.onnx' which won't have the weights but this new onnx model now will have the relative path to where the weights file is located. The weights file 'weights_data' will now contain the weights of the model and the weights from the original model gets saved at /data/weights_data. - -2. Now, you can use this 'saved_model.onnx' file to infer using your sample. But remember, the weights file location can't be changed. The weights have to be present at /data/weights_data - -3. Install the latest ONNX Python package using pip to run these ONNX Python API's successfully. - -### Support for IO Buffer Optimization - -To enable IO Buffer Optimization we have to set OPENCL_LIBS, OPENCL_INCS environment variables before build. For IO Buffer Optimization, the model must be fully supported on OpenVINO™ and we must provide in the remote context cl_context void pointer as C++ Configuration Option. We can provide cl::Buffer address as Input using GPU Memory Allocator for input and output. - -Example: -```bash -//Set up a remote context -cl::Context _context; -..... -// Set the context through openvino options -std::unordered_map ov_options; -ov_options[context] = std::to_string((unsigned long long)(void *) _context.get()); -..... -//Define the Memory area -Ort::MemoryInfo info_gpu("OpenVINO_GPU", OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemTypeDefault); -//Create a shared buffer , fill in with data -cl::Buffer shared_buffer(_context, CL_MEM_READ_WRITE, imgSize, NULL, &err); -.... -//Cast it to void*, and wrap it as device pointer for Ort::Value -void *shared_buffer_void = static_cast(&shared_buffer); -Ort::Value inputTensors = Ort::Value::CreateTensor( - info_gpu, shared_buffer_void, imgSize, inputDims.data(), - inputDims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); -``` - -### Multi-threading for OpenVINO™ Execution Provider - -OpenVINO™ Execution Provider for ONNX Runtime enables thread-safe deep learning inference - -### Multi streams for OpenVINO™ Execution Provider -OpenVINO™ Execution Provider for ONNX Runtime allows multiple stream execution for difference performance requirements part of API 2.0 - -### Auto-Device Execution for OpenVINO EP - -Use `AUTO:,..` as the device name to delegate selection of an actual accelerator to OpenVINO™. Auto-device internally recognizes and selects devices from CPU, integrated GPU, discrete Intel GPUs (when available) and NPU (when available) depending on the device capabilities and the characteristic of CNN models, for example, precisions. Then Auto-device assigns inference requests to the selected device. - -From the application point of view, this is just another device that handles all accelerators in full system. 
- -For more information on Auto-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Auto Device Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#automatic-device-selection). - -### Heterogeneous Execution for OpenVINO™ Execution Provider - -The heterogeneous execution enables computing for inference on one network on several devices. Purposes to execute networks in heterogeneous mode: - -* To utilize accelerator's power and calculate the heaviest parts of the network on the accelerator and execute unsupported layers on fallback devices like the CPU to utilize all available hardware more efficiently during one inference. - -For more information on Heterogeneous plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Heterogeneous Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html). - -### Multi-Device Execution for OpenVINO EP - -Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. Potential gains are as follows: - -* Improved throughput that multiple devices can deliver (compared to single-device execution) -* More consistent performance, since the devices can now share the inference burden (so that if one device is becoming too busy, another device can take more of the load) - -For more information on Multi-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#multi-stream-execution). - -### Export OpenVINO Compiled Blob -Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It complies with the ORT session config keys -``` - Ort::SessionOptions session_options; - - // Enable EP context feature to dump the partitioned graph which includes the EP context into Onnx file. - // "0": disable. (default) - // "1": enable. - - session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1"); - - // Flag to specify whether to dump the EP context into single Onnx model or pass bin path. - // "0": dump the EP context into separate file, keep the file name in the Onnx model. - // "1": dump the EP context into the Onnx model. (default). - - session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1"); - - // Specify the file path for the Onnx model which has EP context. - // Defaults to /original_file_name_ctx.onnx if not specified - - session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, ".\ov_compiled_epctx.onnx"); - - sess = onnxruntime.InferenceSession(, session_options) -``` -Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about session options. - -### Enable QDQ Optimizations Passes -Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy.Generally this feature will give better performance/accuracy with ORT Optimizations disabled. -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. 
- -### Loading Custom JSON OV Config During Runtime -This feature is developed to facilitate loading of OVEP parameters from a single JSON configuration file. -The JSON input schema must be of format - -``` -{ - "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} -} -``` -where "DEVICE_KEY" can be CPU, NPU or GPU , "PROPERTY" must be a valid entity defined in OV from its properties.hpp sections and "PROPERTY_VALUE" must be passed in as a string. If we pass any other type like int/bool we encounter errors from ORT like below - - -Exception during initialization: [json.exception.type_error.302] type must be string, but is a number. - -While one can set the int/bool values like this "NPU_TILES": "2" which is valid. -If someone passes incorrect keys, it will be skipped with a warning while incorrect values assigned to a valid key will result in an exception arising from OV framework. - -The valid properties are of 2 types viz. MUTABLE (R/W) & IMMUTABLE (R ONLY) these are also governed while setting the same. If an IMMUTABLE property is being set, we skip setting the same with a similar warning. - -### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions -The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences. - -With EP-Weight Sharing, prefill and KV cache models can now reuse the same set of weights, minimizing redundancy and optimizing inference. Additionally, this ensures that EP Context nodes are still created even when the model undergoes subgraph partitioning. - -These changes enable weight sharing between two models using the session context option: ep.share_ep_contexts. -Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option. - -### OVEP supports CreateSessionFromArray API -The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths. The CreateSessionFromArray loads the model in memory then creates a session from the in-memory byte array. - -Note: -Use the -l argument when running the inference with perf_test using CreateSessionFromArray API. - -## Configuration Options - -OpenVINO™ Execution Provider can be configured with certain options at runtime that control the behavior of the EP. 
These options can be set as key-value pairs as below:- - -### Python API -Key-Value pairs for config options can be set using InferenceSession API as follow:- - -``` -session = onnxruntime.InferenceSession(, providers=['OpenVINOExecutionProvider'], provider_options=[{Key1 : Value1, Key2 : Value2, ...}]) -``` -*Note that the releases from (ORT 1.10) will require explicitly setting the providers parameter if you want to use execution providers other than the default CPU provider (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.* - -### C/C++ API 2.0 -The session configuration options are passed to SessionOptionsAppendExecutionProvider API as shown in an example below for GPU device type: - -``` -std::unordered_map options; -options[device_type] = "GPU"; -options[precision] = "FP32"; -options[num_of_threads] = "8"; -options[num_streams] = "8"; -options[cache_dir] = ""; -options[context] = "0x123456ff"; -options[enable_qdq_optimizer] = "True"; -options[load_config] = "config_path.json"; -session_options.AppendExecutionProvider_OpenVINO_V2(options); -``` - -### C/C++ Legacy API -Note: This API is no longer officially supported. Users are requested to move to V2 API. - -The session configuration options are passed to SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in an example below for GPU device type: - -``` -OrtOpenVINOProviderOptions options; -options.device_type = "GPU_FP32"; -options.num_of_threads = 8; -options.cache_dir = ""; -options.context = 0x123456ff; -options.enable_opencl_throttling = false; -SessionOptions.AppendExecutionProvider_OpenVINO(session_options, &options); -``` - -### Onnxruntime Graph level Optimization -OpenVINO™ backend performs hardware, dependent as well as independent optimizations on the graph to infer it on the target hardware with best possible performance. In most cases it has been observed that passing the ONNX input graph as it is without explicit optimizations would lead to best possible optimizations at kernel level by OpenVINO™. For this reason, it is advised to turn off high level optimizations performed by ONNX Runtime for OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:- - -* #### Python API - ``` - options = onnxruntime.SessionOptions() - options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL - sess = onnxruntime.InferenceSession(, options) - ``` - -* #### C/C++ API - ``` - SessionOptions::SetGraphOptimizationLevel(ORT_DISABLE_ALL); - ``` - -## Summary of options - -The following table lists all the available configuration options for API 2.0 and the Key-Value pairs to set them: - -| **Key** | **Key type** | **Allowable Values** | **Value type** | **Description** | -| --- | --- | --- | --- | --- | -| device_type | string | CPU, NPU, GPU, GPU.0, GPU.1 based on the available GPUs, NPU, Any valid Hetero combination, Any valid Multi or Auto devices combination | string | Overrides the accelerator hardware type with these values at runtime. If this option is not explicitly set, default hardware specified during build is used. | -| precision | string | FP32, FP16, ACCURACY based on the device_type chosen | string | Supported precisions for HW {CPU:FP32, GPU:[FP32, FP16, ACCURACY], NPU:FP16}. Default precision for HW for optimized performance {CPU:FP32, GPU:FP16, NPU:FP16}. To execute model with the default input precision, select ACCURACY precision type. 
| -| num_of_threads | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default value of number of threads with this value at runtime. If this option is not explicitly set, default value of 8 during build time will be used for inference. | -| num_streams | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default streams with this value at runtime. If this option is not explicitly set, default value of 1, performance for latency is used during build time will be used for inference. | -| cache_dir | string | Any valid string path on the hardware target | string | Explicitly specify the path to save and load the blobs enabling model caching feature.| -| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context i.e the cl_context address as a void pointer.| -| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). | -| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. | -| load_config | string | Any custom JSON path | string | This option enables a feature for loading custom JSON OV config during runtime which sets OV parameters. | - - -Valid Hetero or Multi or Auto Device combinations: -`HETERO:,...` -The `device` can be any of these devices from this list ['CPU','GPU', 'NPU'] - -A minimum of two DEVICE_TYPE'S should be specified for a valid HETERO, MULTI, or AUTO Device Build. - -Example: -HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU - -Deprecated device_type option : -CPU_FP32, GPU_FP32, GPU_FP16, NPU_FP16 are no more supported. They will be deprecated in the future release. Kindly upgrade to latest device_type and precision option. - -## Support Coverage - -**ONNX Layers supported using OpenVINO** - -The table below shows the ONNX layers supported and validated using OpenVINO™ Execution Provider.The below table also lists the Intel hardware support for each of the layers. CPU refers to Intel® -Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. Intel Discrete Graphics. For NPU if an op is not supported we fallback to CPU. 
- -| **ONNX Layers** | **CPU** | **GPU** | -| --- | --- | --- | -| Abs | Yes | Yes | -| Acos | Yes | Yes | -| Acosh | Yes | Yes | -| Add | Yes | Yes | -| And | Yes | Yes | -| ArgMax | Yes | Yes | -| ArgMin | Yes | Yes | -| Asin | Yes | Yes | -| Asinh | Yes | Yes | -| Atan | Yes | Yes | -| Atanh | Yes | Yes | -| AveragePool | Yes | Yes | -| BatchNormalization | Yes | Yes | -| BitShift | Yes | No | -| Ceil | Yes | Yes | -| Celu | Yes | Yes | -| Cast | Yes | Yes | -| Clip | Yes | Yes | -| Concat | Yes | Yes | -| Constant | Yes | Yes | -| ConstantOfShape | Yes | Yes | -| Conv | Yes | Yes | -| ConvInteger | Yes | Yes | -| ConvTranspose | Yes | Yes | -| Cos | Yes | Yes | -| Cosh | Yes | Yes | -| CumSum | Yes | Yes | -| DepthToSpace | Yes | Yes | -| DequantizeLinear | Yes | Yes | -| Div | Yes | Yes | -| Dropout | Yes | Yes | -| Einsum | Yes | Yes | -| Elu | Yes | Yes | -| Equal | Yes | Yes | -| Erf | Yes | Yes | -| Exp | Yes | Yes | -| Expand | Yes | Yes | -| EyeLike | Yes | No | -| Flatten | Yes | Yes | -| Floor | Yes | Yes | -| Gather | Yes | Yes | -| GatherElements | No | No | -| GatherND | Yes | Yes | -| Gemm | Yes | Yes | -| GlobalAveragePool | Yes | Yes | -| GlobalLpPool | Yes | Yes | -| GlobalMaxPool | Yes | Yes | -| Greater | Yes | Yes | -| GreaterOrEqual | Yes | Yes | -| GridSample | Yes | No | -| HardMax | Yes | Yes | -| HardSigmoid | Yes | Yes | -| Identity | Yes | Yes | -| If | Yes | Yes | -| ImageScaler | Yes | Yes | -| InstanceNormalization | Yes | Yes | -| LeakyRelu | Yes | Yes | -| Less | Yes | Yes | -| LessOrEqual | Yes | Yes | -| Log | Yes | Yes | -| LogSoftMax | Yes | Yes | -| Loop | Yes | Yes | -| LRN | Yes | Yes | -| LSTM | Yes | Yes | -| MatMul | Yes | Yes | -| MatMulInteger | Yes | No | -| Max | Yes | Yes | -| MaxPool | Yes | Yes | -| Mean | Yes | Yes | -| MeanVarianceNormalization | Yes | Yes | -| Min | Yes | Yes | -| Mod | Yes | Yes | -| Mul | Yes | Yes | -| Neg | Yes | Yes | -| NonMaxSuppression | Yes | Yes | -| NonZero | Yes | No | -| Not | Yes | Yes | -| OneHot | Yes | Yes | -| Or | Yes | Yes | -| Pad | Yes | Yes | -| Pow | Yes | Yes | -| PRelu | Yes | Yes | -| QuantizeLinear | Yes | Yes | -| QLinearMatMul | Yes | No | -| Range | Yes | Yes | -| Reciprocal | Yes | Yes | -| ReduceL1 | Yes | Yes | -| ReduceL2 | Yes | Yes | -| ReduceLogSum | Yes | Yes | -| ReduceLogSumExp | Yes | Yes | -| ReduceMax | Yes | Yes | -| ReduceMean | Yes | Yes | -| ReduceMin | Yes | Yes | -| ReduceProd | Yes | Yes | -| ReduceSum | Yes | Yes | -| ReduceSumSquare | Yes | Yes | -| Relu | Yes | Yes | -| Reshape | Yes | Yes | -| Resize | Yes | Yes | -| ReverseSequence | Yes | Yes | -| RoiAlign | Yes | Yes | -| Round | Yes | Yes | -| Scatter | Yes | Yes | -| ScatterElements | Yes | Yes | -| ScatterND | Yes | Yes | -| Selu | Yes | Yes | -| Shape | Yes | Yes | -| Shrink | Yes | Yes | -| Sigmoid | Yes | Yes | -| Sign | Yes | Yes | -| Sin | Yes | Yes | -| Sinh | Yes | No | -| SinFloat | No | No | -| Size | Yes | Yes | -| Slice | Yes | Yes | -| Softmax | Yes | Yes | -| Softplus | Yes | Yes | -| Softsign | Yes | Yes | -| SpaceToDepth | Yes | Yes | -| Split | Yes | Yes | -| Sqrt | Yes | Yes | -| Squeeze | Yes | Yes | -| Sub | Yes | Yes | -| Sum | Yes | Yes | -| Softsign | Yes | No | -| Tan | Yes | Yes | -| Tanh | Yes | Yes | -| ThresholdedRelu | Yes | Yes | -| Tile | Yes | Yes | -| TopK | Yes | Yes | -| Transpose | Yes | Yes | -| Unsqueeze | Yes | Yes | -| Upsample | Yes | Yes | -| Where | Yes | Yes | -| Xor | Yes | Yes | - - -### Topology Support - -Below topologies from ONNX open model zoo are fully 
supported on OpenVINO™ Execution Provider and many more are supported through sub-graph partitioning. -For NPU if model is not supported we fallback to CPU. - -### Image Classification Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| bvlc_alexnet | Yes | Yes | -| bvlc_googlenet | Yes | Yes | -| bvlc_reference_caffenet | Yes | Yes | -| bvlc_reference_rcnn_ilsvrc13 | Yes | Yes | -| emotion ferplus | Yes | Yes | -| densenet121 | Yes | Yes | -| inception_v1 | Yes | Yes | -| inception_v2 | Yes | Yes | -| mobilenetv2 | Yes | Yes | -| resnet18v2 | Yes | Yes | -| resnet34v2 | Yes | Yes | -| resnet101v2 | Yes | Yes | -| resnet152v2 | Yes | Yes | -| resnet50 | Yes | Yes | -| resnet50v2 | Yes | Yes | -| shufflenet | Yes | Yes | -| squeezenet1.1 | Yes | Yes | -| vgg19 | Yes | Yes | -| zfnet512 | Yes | Yes | -| mxnet_arcface | Yes | Yes | - - -### Image Recognition Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| mnist | Yes | Yes | - -### Object Detection Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| tiny_yolov2 | Yes | Yes | -| yolov3 | Yes | Yes | -| tiny_yolov3 | Yes | Yes | -| mask_rcnn | Yes | No | -| faster_rcnn | Yes | No | -| yolov4 | Yes | Yes | -| yolov5 | Yes | Yes | -| yolov7 | Yes | Yes | -| tiny_yolov7 | Yes | Yes | - -### Image Manipulation Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| mosaic | Yes | Yes | -| candy | Yes | Yes | -| cgan | Yes | Yes | -| rain_princess | Yes | Yes | -| pointilism | Yes | Yes | -| udnie | Yes | Yes | - -### Natural Language Processing Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| bert-squad | Yes | Yes | -| bert-base-cased | Yes | Yes | -| bert-base-chinese | Yes | Yes | -| bert-base-japanese-char | Yes | Yes | -| bert-base-multilingual-cased | Yes | Yes | -| bert-base-uncased | Yes | Yes | -| distilbert-base-cased | Yes | Yes | -| distilbert-base-multilingual-cased | Yes | Yes | -| distilbert-base-uncased | Yes | Yes | -| distilbert-base-uncased-finetuned-sst-2-english | Yes | Yes | -| gpt2 | Yes | Yes | -| roberta-base | Yes | Yes | -| roberta-base-squad2 | Yes | Yes | -| t5-base | Yes | Yes | -| twitter-roberta-base-sentiment | Yes | Yes | -| xlm-roberta-base | Yes | Yes | - -### Models Supported on NPU - -| **MODEL NAME** | **NPU** | -| --- | --- | -| yolov3 | Yes | -| microsoft_resnet-50 | Yes | -| realesrgan-x4 | Yes | -| timm_inception_v4.tf_in1k | Yes | -| squeezenet1.0-qdq | Yes | -| vgg16 | Yes | -| caffenet-qdq | Yes | -| zfnet512 | Yes | -| shufflenet-v2 | Yes | -| zfnet512-qdq | Yes | -| googlenet | Yes | -| googlenet-qdq | Yes | -| caffenet | Yes | -| bvlcalexnet-qdq | Yes | -| vgg16-qdq | Yes | -| mnist | Yes | -| ResNet101-DUC | Yes | -| shufflenet-v2-qdq | Yes | -| bvlcalexnet | Yes | -| squeezenet1.0 | Yes | - -**Note:** We have added support for INT8 models, quantized with Neural Network Compression Framework (NNCF). To know more about NNCF refer [here](https://github.com/openvinotoolkit/nncf). - -## OpenVINO™ Execution Provider Samples Tutorials - -In order to showcase what you can do with the OpenVINO™ Execution Provider for ONNX Runtime, we have created a few samples that shows how you can get that performance boost you’re looking for with just one additional line of code. 
- -### Python API -[Object detection with tinyYOLOv2 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/tiny_yolo_v2_object_detection) - -[Object detection with YOLOv4 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/yolov4_object_detection) - -### C/C++ API -[Image classification with Squeezenet in CPP](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/OpenVINO_EP) - -### Csharp API -[Object detection with YOLOv3 in C#](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_sharp/OpenVINO_EP/yolov3_object_detection) - -## Blogs/Tutorials - -### Overview of OpenVINO Execution Provider for ONNX Runtime -[OpenVINO Execution Provider](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/faster-inferencing-with-one-line-of-code.html) - -### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime Docker Containers -[Docker Containers](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-docker-container.html) - -### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime python wheel packages -[Python Pip Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) +--- +title: Intel - OpenVINO™ +description: Instructions to execute OpenVINO™ Execution Provider for ONNX Runtime. +parent: Execution Providers +nav_order: 3 +redirect_from: /docs/reference/execution-providers/OpenVINO-ExecutionProvider +--- + +# OpenVINO™ Execution Provider +{: .no_toc } + +Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution Provider. Please refer to [this](https://software.intel.com/en-us/openvino-toolkit/hardware) page for details on the Intel hardware supported. + +## Contents +{: .no_toc } + +* TOC placeholder +{:toc} + +## Install + +Intel publishes pre-built OpenVINO™ Execution Provider packages for ONNX Runtime with each release. +* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.8 Release](https://github.com/intel/onnxruntime/releases) +* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) + +## Requirements + + +ONNX Runtime OpenVINO™ Execution Provider is compatible with three latest releases of OpenVINO™. + +|ONNX Runtime|OpenVINO™|Notes| +|---|---|---| +|1.23.0|2025.3|[Details - Placeholder]()| +|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)| +|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)| + +## Build + +For build instructions, refer [BUILD page](../build/eps.md#openvino). + +## Usage + +**Python Package Installation** + +For Python users, install the onnxruntime-openvino package: +``` +pip install onnxruntime-openvino +``` + +**Set OpenVINO™ Environment Variables** + +To use OpenVINO™ Execution Provider with any programming language (Python, C++, C#), you must set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. + +* **Windows** +``` +C:\ \setupvars.bat +``` +* **Linux** +``` +$ source /setupvars.sh +``` + + + +**Set OpenVINO™ Environment for C#** + +To use csharp api for openvino execution provider create a custom nuget package. 
Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install prerequisites for nuget creation. Once prerequisites are installed follow the instructions to [build openvino execution provider](../build/eps.md#openvino) and add an extra flag `--build_nuget` to create nuget packages. Two nuget packages will be created Microsoft.ML.OnnxRuntime.Managed and Intel.ML.OnnxRuntime.Openvino. + +# OpenVINO Execution Provider Configuration + +## Table of Contents +- [Provider Options](#configuration-options) +- [Provider Descriptions](#configuration-descriptions) +- [Examples](#examples) + +## Configuration Options + +Runtime parameters set during OpenVINO Execution Provider initialization to control the inference flow. + + +| **Key** | **Type** | **Allowable Values** | **Value Type** | **Description** | +|---------|----------|---------------------|----------------|-----------------| +| [**device_type**](#device_type) | string | CPU, NPU, GPU, GPU.0, GPU.1, HETERO, MULTI, AUTO | string | Specify intel target H/W device | +| [**precision**](#precision) | string | FP32, FP16, ACCURACY | string | Set inference precision level | +| [**num_of_threads**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Control number of inference threads | +| [**num_streams**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Set parallel execution streams for throughput | +| [**cache_dir**](#cache_dir) | string | Valid filesystem path | string | Enable openvino model caching for improved latency | +| [**load_config**](#load_config) | string | JSON file path | string | Load and set custom/HW specific OpenVINO properties from JSON | +| [**enable_qdq_optimizer**](#enable_qdq_optimizer) | string | True/False | boolean | Enable QDQ optimization for NPU | +| [**disable_dynamic_shapes**](#disable_dynamic_shapes--reshape_input) | string | True/False | boolean | Convert dynamic models to static shapes | +| [**model_priority**](#model_priority) | string | LOW, MEDIUM, HIGH, DEFAULT | string | Configure model resource allocation priority | +| [**reshape_input**](#disable_dynamic_shapes--reshape_input) | string | input_name[shape_bounds] | string | Specify upper and lower bound for dynamic shaped inputs for improved performance with NPU | +| [**layout**](#layout) | string | input_name[layout_format] | string | Specify input/output tensor layout format | + +Refer to [Examples](#examples) for usage. + +## Configuration Descriptions + +### `device_type` + +Specify the target hardware device for compilation and inference execution. The OpenVINO Execution Provider supports the following devices for deep learning model execution: **CPU**, **GPU**, and **NPU**. Configuration supports both single device and multi-device setups, enabling: +- Automatic device selection +- Heterogeneous inference across devices +- Multi-device parallel execution + +**Supported Devices:** + +- `CPU` — Intel CPU +- `GPU` — Intel integrated GPU or discrete GPU +- `GPU.0`, `GPU.1` — Specific GPU when multiple GPUs are available +- `NPU` — Intel Neural Processing Unit + +**Multi-Device Configurations:** + +OpenVINO offers the option of running inference with the following inference modes: + +- `AUTO:,...` — Automatic Device Selection +- `HETERO:,...` — Heterogeneous Inference +- `MULTI:,...` — Multi-Device Execution + +Minimum **two devices** required for multi-device configurations. 
+ +**Examples:** +- `AUTO:GPU,NPU,CPU` +- `HETERO:GPU,CPU` +- `MULTI:GPU,CPU` + +**Automatic Device Selection** + +Automatically selects the best device available for the given task. It offers many additional options and optimizations, including inference on multiple devices at the same time. AUTO internally recognizes CPU, integrated GPU, discrete Intel GPUs, and NPU, then assigns inference requests to the best-suited device. + +**Heterogeneous Inference** + +Enables splitting inference among several devices automatically. If one device doesn't support certain operations, HETERO distributes the workload across multiple devices, utilizing accelerator power for heavy operations while falling back to CPU for unsupported layers. + +**Multi-Device Execution** + +Runs the same model on multiple devices in parallel to improve device utilization. MULTI automatically groups inference requests to improve throughput and performance consistency via load distribution. + +> **Note:** Deprecated options `CPU_FP32`, `GPU_FP32`, `GPU_FP16`, `NPU_FP16` are no longer supported. Use `device_type` and `precision` separately. + +--- + +### `precision` + +- Controls numerical precision during inference, balancing **performance** and **accuracy**. + +**Precision Support on Devices:** + +- **CPU:** `FP32` +- **GPU:** `FP32`, `FP16`, `ACCURACY` +- **NPU:** `FP16` + +**ACCURACY Mode** + +- Maintains original model precision without conversion, ensuring maximum accuracy. + +> **Note 1:** `FP16` generally provides ~2x better performance on GPU/NPU with minimal accuracy loss. + +> **Note 2:** Can be configured via `load_config` using the `INFERENCE_PRECISION_HINT` property. + + +--- +### `num_of_threads` & `num_streams` + +**Multi-Threading** + +- Controls the number of inference threads for CPU execution (default: `8`). OpenVINO EP provides thread-safe inference across all devices. + +> **Note:** Can be configured via `load_config` using the `INFERENCE_NUM_THREADS` property. + +**Multi-Stream Execution** + +Manages parallel inference streams for throughput optimization (default: `1` for latency-focused execution). + +- **Multiple streams:** Higher throughput for batch workloads +- **Single stream:** Lower latency for real-time applications + +> **Note:** Can be configured via `load_config` using the `NUM_STREAMS` property. + +--- + +### `cache_dir` + +Enables model caching to significantly reduce subsequent load times. Supports CPU, NPU, and GPU devices with kernel caching on iGPU/dGPU. + +**Benefits** +- Saves compiled models and `cl_cache` files for dynamic shapes +- Eliminates recompilation overhead on subsequent runs +- Particularly useful for complex models and frequent application restarts + +> **Note:** Can be configured via `load_config` using the `CACHE_DIR` property. + +--- + +### `load_config` + +- Loads custom OpenVINO properties from JSON configuration file during runtime. + +**JSON Format:** + +```json +{ + "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} +} +``` + +**Validation** + +- Invalid property keys are ignored with warnings. Invalid values cause execution exceptions. Immutable properties are skipped. + +**Common Properties:** + +`PERFORMANCE_HINT`, `EXECUTION_MODE_HINT`, `LOG_LEVEL`, `CACHE_DIR`, `INFERENCE_PRECISION_HINT` + +--- + + +### `enable_qdq_optimizer` + +NPU-specific optimization for Quantize-Dequantize (QDQ) operations in the inference graph. 
This optimizer enhances ORT quantized models by:
+
+- Retaining QDQ operations only for supported operators
+- Improving inference performance on NPU devices
+- Maintaining model accuracy while optimizing execution
+
+---
+
+### `disable_dynamic_shapes` & `reshape_input`
+
+**Dynamic Shape Management**
+
+- Handles models with variable input dimensions.
+- Provides the option to convert dynamic shapes to static shapes when beneficial for performance optimization.
+
+**NPU Shape Bounds Configuration**
+
+- Use `reshape_input` to explicitly set dynamic shape bounds for NPU devices.
+
+**Format:**
+- Range bounds: `input_name[lower..upper]`
+- Fixed shape: `input_name[fixed_shape]`
+
+This configuration is required for optimal NPU memory allocation and management.
+
+---
+
+### `model_priority`
+
+Configures resource allocation priority for multi-model deployment scenarios.
+
+**Priority Levels:**
+
+| Level | Description |
+|-------|-------------|
+| **HIGH** | Maximum resource allocation for critical models |
+| **MEDIUM** | Balanced resource sharing across models |
+| **LOW** | Minimal allocation, yields resources to higher priority models |
+| **DEFAULT** | System-determined priority based on workload |
+
+> **Note:** Can be configured via `load_config` using the `MODEL_PRIORITY` property.
+
+---
+
+### `layout`
+
+- Provides explicit control over tensor memory layout for performance optimization.
+- Helps OpenVINO optimize memory access patterns and tensor operations.
+
+**Layout Characters:**
+
+- **N:** Batch dimension
+- **C:** Channel dimension
+- **H:** Height dimension
+- **W:** Width dimension
+- **D:** Depth dimension
+- **T:** Time dimension
+- **?:** Unknown/dynamic dimension
+
+**Format:**
+
+`input_name[LAYOUT],output_name[LAYOUT]`
+
+**Example:**
+
+`input_image[NCHW],output_tensor[NC]`
+
+---
+
+## Examples
+
+### Example 1
+
+```python
+import onnxruntime as ort
+
+# Multi-device with caching and threading optimization
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=['OpenVINOExecutionProvider'],
+    provider_options=[{
+        'device_type': 'AUTO:GPU,NPU,CPU',
+        'precision': 'FP16',
+        'num_of_threads': '8',
+        'num_streams': '4',
+        'cache_dir': './ov_cache'
+    }]
+)
+
+# Command line equivalent
+# onnxruntime_perf_test.exe -e openvino -i "device_type|AUTO:GPU,NPU,CPU precision|FP16 num_of_threads|8 num_streams|4 cache_dir|./ov_cache" model.onnx
+```
+
+### Example 2
+
+```python
+import onnxruntime as ort
+
+# NPU-optimized with custom config and shape management
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=['OpenVINOExecutionProvider'],
+    provider_options=[{
+        'device_type': 'HETERO:NPU,CPU',
+        'load_config': 'custom_config.json',
+        'enable_qdq_optimizer': 'True',
+        'disable_dynamic_shapes': 'True',
+        'model_priority': 'HIGH',
+        'reshape_input': 'data[1,3,224,224..448]',
+        'layout': 'data[NCHW],output[NC]'
+    }]
+)
+
+# Example custom_config.json
+{
+    "NPU": {
+        "LOG_LEVEL": "LOG_DEBUG",
+        "PERFORMANCE_HINT": "THROUGHPUT"
+    },
+    "CPU": {
+        "EXECUTION_MODE_HINT": "ACCURACY"
+    }
+}
+
+# Command line equivalent
+# onnxruntime_perf_test.exe -e openvino -i "device_type|HETERO:NPU,CPU load_config|custom_config.json enable_qdq_optimizer|True disable_dynamic_shapes|True model_priority|HIGH reshape_input|data[1,3,224,224..448] layout|data[NCHW],output[NC]" model.onnx
+```
+
+---
+### Python API
+Key-value pairs for the config options can be set using the InferenceSession API as follows:
+
+```
+session = onnxruntime.InferenceSession(<path_to_model_file>,
providers=['OpenVINOExecutionProvider'], provider_options=[{Key1 : Value1, Key2 : Value2, ...}])
+```
+*Note that from ONNX Runtime 1.10 onwards, the `providers` parameter must be set explicitly when instantiating `InferenceSession` if you want to use execution providers other than the default CPU provider (previously, providers were registered by default based on the build flags).*
+
+---
+### C/C++ API 2.0
+The session configuration options are passed to the `AppendExecutionProvider_OpenVINO_V2()` API as shown in the example below for the GPU device type:
+
+```
+std::unordered_map<std::string, std::string> options;
+options["device_type"] = "GPU";
+options["precision"] = "FP32";
+options["num_of_threads"] = "8";
+options["num_streams"] = "8";
+options["cache_dir"] = "";
+options["context"] = "0x123456ff";
+options["enable_qdq_optimizer"] = "True";
+options["load_config"] = "config_path.json";
+session_options.AppendExecutionProvider_OpenVINO_V2(options);
+```
+---
+### C/C++ Legacy API
+Note: This API is no longer officially supported. Users are requested to move to the V2 API.
+
+The session configuration options are passed to the SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in the example below for the GPU device type:
+
+```
+OrtOpenVINOProviderOptions options;
+options.device_type = "GPU_FP32";
+options.num_of_threads = 8;
+options.cache_dir = "";
+options.context = 0x123456ff;
+options.enable_opencl_throttling = false;
+SessionOptions.AppendExecutionProvider_OpenVINO(session_options, &options);
+```
+---
+
+### ONNX Runtime Graph-Level Optimization
+The OpenVINO™ backend performs both hardware-dependent and hardware-independent optimizations on the graph to infer it on the target hardware with the best possible performance. In most cases it has been observed that passing the ONNX input graph as-is, without explicit optimizations, leads to the best possible kernel-level optimizations by OpenVINO™. For this reason, it is advised to turn off the high-level optimizations performed by ONNX Runtime for the OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:
+
+* #### Python API
+  ```
+  options = onnxruntime.SessionOptions()
+  options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
+  sess = onnxruntime.InferenceSession(<path_to_model_file>, options)
+  ```
+
+* #### C/C++ API
+  ```
+  SessionOptions::SetGraphOptimizationLevel(ORT_DISABLE_ALL);
+  ```
+---
+## Support Coverage
+
+**ONNX Layers supported using OpenVINO**
+
+The table below lists the ONNX layers supported and validated using the OpenVINO™ Execution Provider, along with the Intel hardware support for each layer. CPU refers to Intel® Atom, Core, and Xeon processors. GPU refers to Intel integrated and discrete graphics. For NPU, if an op is not supported, we fall back to CPU.
+ +| **ONNX Layers** | **CPU** | **GPU** | +| --- | --- | --- | +| Abs | Yes | Yes | +| Acos | Yes | Yes | +| Acosh | Yes | Yes | +| Add | Yes | Yes | +| And | Yes | Yes | +| ArgMax | Yes | Yes | +| ArgMin | Yes | Yes | +| Asin | Yes | Yes | +| Asinh | Yes | Yes | +| Atan | Yes | Yes | +| Atanh | Yes | Yes | +| AveragePool | Yes | Yes | +| BatchNormalization | Yes | Yes | +| BitShift | Yes | No | +| Ceil | Yes | Yes | +| Celu | Yes | Yes | +| Cast | Yes | Yes | +| Clip | Yes | Yes | +| Concat | Yes | Yes | +| Constant | Yes | Yes | +| ConstantOfShape | Yes | Yes | +| Conv | Yes | Yes | +| ConvInteger | Yes | Yes | +| ConvTranspose | Yes | Yes | +| Cos | Yes | Yes | +| Cosh | Yes | Yes | +| CumSum | Yes | Yes | +| DepthToSpace | Yes | Yes | +| DequantizeLinear | Yes | Yes | +| Div | Yes | Yes | +| Dropout | Yes | Yes | +| Einsum | Yes | Yes | +| Elu | Yes | Yes | +| Equal | Yes | Yes | +| Erf | Yes | Yes | +| Exp | Yes | Yes | +| Expand | Yes | Yes | +| EyeLike | Yes | No | +| Flatten | Yes | Yes | +| Floor | Yes | Yes | +| Gather | Yes | Yes | +| GatherElements | No | No | +| GatherND | Yes | Yes | +| Gemm | Yes | Yes | +| GlobalAveragePool | Yes | Yes | +| GlobalLpPool | Yes | Yes | +| GlobalMaxPool | Yes | Yes | +| Greater | Yes | Yes | +| GreaterOrEqual | Yes | Yes | +| GridSample | Yes | No | +| HardMax | Yes | Yes | +| HardSigmoid | Yes | Yes | +| Identity | Yes | Yes | +| If | Yes | Yes | +| ImageScaler | Yes | Yes | +| InstanceNormalization | Yes | Yes | +| LeakyRelu | Yes | Yes | +| Less | Yes | Yes | +| LessOrEqual | Yes | Yes | +| Log | Yes | Yes | +| LogSoftMax | Yes | Yes | +| Loop | Yes | Yes | +| LRN | Yes | Yes | +| LSTM | Yes | Yes | +| MatMul | Yes | Yes | +| MatMulInteger | Yes | No | +| Max | Yes | Yes | +| MaxPool | Yes | Yes | +| Mean | Yes | Yes | +| MeanVarianceNormalization | Yes | Yes | +| Min | Yes | Yes | +| Mod | Yes | Yes | +| Mul | Yes | Yes | +| Neg | Yes | Yes | +| NonMaxSuppression | Yes | Yes | +| NonZero | Yes | No | +| Not | Yes | Yes | +| OneHot | Yes | Yes | +| Or | Yes | Yes | +| Pad | Yes | Yes | +| Pow | Yes | Yes | +| PRelu | Yes | Yes | +| QuantizeLinear | Yes | Yes | +| QLinearMatMul | Yes | No | +| Range | Yes | Yes | +| Reciprocal | Yes | Yes | +| ReduceL1 | Yes | Yes | +| ReduceL2 | Yes | Yes | +| ReduceLogSum | Yes | Yes | +| ReduceLogSumExp | Yes | Yes | +| ReduceMax | Yes | Yes | +| ReduceMean | Yes | Yes | +| ReduceMin | Yes | Yes | +| ReduceProd | Yes | Yes | +| ReduceSum | Yes | Yes | +| ReduceSumSquare | Yes | Yes | +| Relu | Yes | Yes | +| Reshape | Yes | Yes | +| Resize | Yes | Yes | +| ReverseSequence | Yes | Yes | +| RoiAlign | Yes | Yes | +| Round | Yes | Yes | +| Scatter | Yes | Yes | +| ScatterElements | Yes | Yes | +| ScatterND | Yes | Yes | +| Selu | Yes | Yes | +| Shape | Yes | Yes | +| Shrink | Yes | Yes | +| Sigmoid | Yes | Yes | +| Sign | Yes | Yes | +| Sin | Yes | Yes | +| Sinh | Yes | No | +| SinFloat | No | No | +| Size | Yes | Yes | +| Slice | Yes | Yes | +| Softmax | Yes | Yes | +| Softplus | Yes | Yes | +| Softsign | Yes | Yes | +| SpaceToDepth | Yes | Yes | +| Split | Yes | Yes | +| Sqrt | Yes | Yes | +| Squeeze | Yes | Yes | +| Sub | Yes | Yes | +| Sum | Yes | Yes | +| Softsign | Yes | No | +| Tan | Yes | Yes | +| Tanh | Yes | Yes | +| ThresholdedRelu | Yes | Yes | +| Tile | Yes | Yes | +| TopK | Yes | Yes | +| Transpose | Yes | Yes | +| Unsqueeze | Yes | Yes | +| Upsample | Yes | Yes | +| Where | Yes | Yes | +| Xor | Yes | Yes | + + +### Topology Support + +Below topologies from ONNX open model zoo are fully 
supported on the OpenVINO™ Execution Provider, and many more are supported through sub-graph partitioning.
+For NPU, if a model is not supported, we fall back to CPU.
+
+### Image Classification Networks
+
+| **MODEL NAME** | **CPU** | **GPU** |
+| --- | --- | --- |
+| bvlc_alexnet | Yes | Yes |
+| bvlc_googlenet | Yes | Yes |
+| bvlc_reference_caffenet | Yes | Yes |
+| bvlc_reference_rcnn_ilsvrc13 | Yes | Yes |
+| emotion ferplus | Yes | Yes |
+| densenet121 | Yes | Yes |
+| inception_v1 | Yes | Yes |
+| inception_v2 | Yes | Yes |
+| mobilenetv2 | Yes | Yes |
+| resnet18v2 | Yes | Yes |
+| resnet34v2 | Yes | Yes |
+| resnet101v2 | Yes | Yes |
+| resnet152v2 | Yes | Yes |
+| resnet50 | Yes | Yes |
+| resnet50v2 | Yes | Yes |
+| shufflenet | Yes | Yes |
+| squeezenet1.1 | Yes | Yes |
+| vgg19 | Yes | Yes |
+| zfnet512 | Yes | Yes |
+| mxnet_arcface | Yes | Yes |
+
+### Image Recognition Networks
+
+| **MODEL NAME** | **CPU** | **GPU** |
+| --- | --- | --- |
+| mnist | Yes | Yes |
+
+### Object Detection Networks
+
+| **MODEL NAME** | **CPU** | **GPU** |
+| --- | --- | --- |
+| tiny_yolov2 | Yes | Yes |
+| yolov3 | Yes | Yes |
+| tiny_yolov3 | Yes | Yes |
+| mask_rcnn | Yes | No |
+| faster_rcnn | Yes | No |
+| yolov4 | Yes | Yes |
+| yolov5 | Yes | Yes |
+| yolov7 | Yes | Yes |
+| tiny_yolov7 | Yes | Yes |
+
+### Image Manipulation Networks
+
+| **MODEL NAME** | **CPU** | **GPU** |
+| --- | --- | --- |
+| mosaic | Yes | Yes |
+| candy | Yes | Yes |
+| cgan | Yes | Yes |
+| rain_princess | Yes | Yes |
+| pointilism | Yes | Yes |
+| udnie | Yes | Yes |
+
+### Natural Language Processing Networks
+
+| **MODEL NAME** | **CPU** | **GPU** |
+| --- | --- | --- |
+| bert-squad | Yes | Yes |
+| bert-base-cased | Yes | Yes |
+| bert-base-chinese | Yes | Yes |
+| bert-base-japanese-char | Yes | Yes |
+| bert-base-multilingual-cased | Yes | Yes |
+| bert-base-uncased | Yes | Yes |
+| distilbert-base-cased | Yes | Yes |
+| distilbert-base-multilingual-cased | Yes | Yes |
+| distilbert-base-uncased | Yes | Yes |
+| distilbert-base-uncased-finetuned-sst-2-english | Yes | Yes |
+| gpt2 | Yes | Yes |
+| roberta-base | Yes | Yes |
+| roberta-base-squad2 | Yes | Yes |
+| t5-base | Yes | Yes |
+| twitter-roberta-base-sentiment | Yes | Yes |
+| xlm-roberta-base | Yes | Yes |
+
+### Models Supported on NPU
+
+| **MODEL NAME** | **NPU** |
+| --- | --- |
+| yolov3 | Yes |
+| microsoft_resnet-50 | Yes |
+| realesrgan-x4 | Yes |
+| timm_inception_v4.tf_in1k | Yes |
+| squeezenet1.0-qdq | Yes |
+| vgg16 | Yes |
+| caffenet-qdq | Yes |
+| zfnet512 | Yes |
+| shufflenet-v2 | Yes |
+| zfnet512-qdq | Yes |
+| googlenet | Yes |
+| googlenet-qdq | Yes |
+| caffenet | Yes |
+| bvlcalexnet-qdq | Yes |
+| vgg16-qdq | Yes |
+| mnist | Yes |
+| ResNet101-DUC | Yes |
+| shufflenet-v2-qdq | Yes |
+| bvlcalexnet | Yes |
+| squeezenet1.0 | Yes |
+
+**Note:** We have added support for INT8 models quantized with the Neural Network Compression Framework (NNCF). To learn more about NNCF, refer [here](https://github.com/openvinotoolkit/nncf).
+
+---
+
+# OpenVINO™ Execution Provider Samples & Tutorials
+
+To showcase what you can do with the OpenVINO™ Execution Provider for ONNX Runtime, we have created a few samples that show how to get the performance boost you are looking for with just one additional line of code, as sketched below.
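+
+As a rough illustration of that single line, the only change from a default CPU session is the `providers` argument passed to `InferenceSession` (a minimal sketch; the `model.onnx` path is hypothetical):
+
+```python
+import onnxruntime as ort
+
+# Default CPU execution:
+# session = ort.InferenceSession("model.onnx")
+
+# With the OpenVINO™ Execution Provider, the one additional line is the providers argument:
+session = ort.InferenceSession("model.onnx", providers=["OpenVINOExecutionProvider"])
+```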
+ +## Samples + +### Python API + +- [Object detection with tinyYOLOv2 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/tiny_yolo_v2_object_detection) +- [Object detection with YOLOv4 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/yolov4_object_detection) + +### C/C++ API + +- [Image classification with Squeezenet in C++](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/OpenVINO_EP) + +### C# API + +- [Object detection with YOLOv3 in C#](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_sharp/OpenVINO_EP/yolov3_object_detection) + +## Blogs & Tutorials + +### Overview of OpenVINO Execution Provider for ONNX Runtime + +[OpenVINO Execution Provider](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/faster-inferencing-with-one-line-of-code.html) - Learn about faster inferencing with one line of code + +### Docker Containers + +[Tutorial: Using OpenVINO™ Execution Provider for ONNX Runtime Docker Containers](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-docker-container.html) + +### Python Pip Wheel Packages + +[Tutorial: Using OpenVINO™ Execution Provider for ONNX Runtime Python Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) + +---