# Vulkan Backend

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch that is
built on top of the cross-platform Vulkan GPU API standard. It is primarily
designed to leverage the GPU to accelerate model inference on Android devices,
but can be used on any platform that supports an implementation of Vulkan:
laptops, servers, and edge devices.

::::{note}
The Vulkan delegate is currently under active development, and its components
are subject to change.
::::

## What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL.
It is designed to offer developers more explicit control over GPUs compared to
previous specifications in order to reduce overhead and maximize the
capabilities of modern graphics hardware.

Vulkan has been widely adopted among GPU vendors, and most modern GPUs (both
desktop and mobile) on the market support Vulkan. Vulkan has also been included
in Android since Android 7.0.

**Note that Vulkan is a GPU API, not a GPU math library.** That is to say, it
provides a way to execute compute and graphics operations on a GPU, but does
not come with a built-in library of performant compute kernels.

## The Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as
the **Vulkan Compute Library**. The aim of the Vulkan Compute Library is to
provide GPU implementations for PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the [PyTorch Vulkan Backend](https://pytorch.org/tutorials/prototype/vulkan_workflow.html).
The core components of the PyTorch Vulkan backend were forked into ExecuTorch
and adapted for an AOT graph-mode style of model inference (as opposed to
PyTorch, which adopted an eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the
`executorch/backends/vulkan/runtime/` directory. The core components are listed
and described below:

```
runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp
```

## Features

The Vulkan delegate currently supports the following features:

* **Memory Planning**
  * Intermediate tensors whose lifetimes do not overlap will share memory allocations. This reduces the peak memory usage of model inference.
* **Capability Based Partitioning**
  * A graph can be partially lowered to the Vulkan delegate via a partitioner, which will identify nodes (i.e. operators) that are supported by the Vulkan delegate and lower only supported subgraphs.
* **Support for upper-bound dynamic shapes**
  * Tensors can change shape between inferences as long as their current shapes are smaller than the bounds specified during lowering (see the sketch after this list).
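
As a rough illustration of how such upper bounds are declared, the following is
a minimal sketch of marking a dimension as dynamic at export time using
`torch.export.Dim`; the model and sizes here are placeholders rather than
anything from the Vulkan delegate itself.

```python
import torch
from torch.export import Dim, export

class MatMul(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x @ y

# Declare a dynamic dimension with an upper bound of 1024; at runtime the
# first dimension of x may take any size up to that bound.
batch = Dim("batch", max=1024)

exported = export(
    MatMul(),
    (torch.randn(8, 32), torch.randn(32, 16)),
    dynamic_shapes={"x": {0: batch}, "y": None},
)
```

The exported program can then be passed through `to_edge()` and
`to_backend(VulkanPartitioner())` as shown in the end-to-end example below.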

In addition to increasing operator coverage, the following features are
currently in development:

* **Quantization Support**
  * We are currently working on support for 8-bit dynamic quantization (see the sketch after this list), with plans to extend to other quantization schemes in the future.
* **Memory Layout Management**
  * Memory layout is an important factor in optimizing performance. We plan to introduce graph passes that insert memory layout transitions throughout a graph to optimize memory-layout sensitive operators such as Convolution and Matrix Multiplication.
* **Selective Build**
  * We plan to make it possible to control build size by selecting which operators/shaders you want to build with.
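
Since the Vulkan quantization APIs are still in development, the following is
only a rough sketch of the standard PT2E quantization flow that such support
would plug into; the `VulkanQuantizer` name and import path are assumptions,
not a documented API.

```python
import torch
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

# Assumed import: the Vulkan quantizer is under active development, so this
# name and path are placeholders for whatever the delegate eventually exposes.
from executorch.backends.vulkan.quantizer.vulkan_quantizer import VulkanQuantizer

model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU())
example_inputs = (torch.randn(1, 32),)

# Standard PT2E flow: export, annotate the graph with the quantizer,
# run calibration data through it, then convert to a quantized graph.
graph_module = export(model, example_inputs).module()
prepared = prepare_pt2e(graph_module, VulkanQuantizer())
prepared(*example_inputs)  # calibration
quantized = convert_pt2e(prepared)

# The quantized graph can then be lowered via to_edge() / to_backend()
# as in the end-to-end example below.
```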

## End to End Example

To further understand the features of the Vulkan Delegate and how to use it,
consider the following end-to-end example with a simple single operator model.

### Compile and lower a model to the Vulkan Delegate

Once ExecuTorch has been set up and installed, the following script can be used
to generate a simple model and lower it to the Vulkan delegate.

```python
# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors
class Add(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

Like other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate
using the `to_backend()` API. The Vulkan Delegate implements the
`VulkanPartitioner` class, which identifies nodes (i.e. operators) in the graph
that are supported by the Vulkan delegate and separates compatible sections of
the model to be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it
contains some unsupported operators; in that case, only parts of the graph will
be executed on the GPU.
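
To see what the partitioner actually lowered, one simple approach (a debugging
sketch, not an official inspection API) is to print the edge program's graph
after calling `to_backend()`; subgraphs claimed by the Vulkan delegate appear
as `executorch_call_delegate` nodes wrapping lowered modules.

```python
# Subgraphs lowered to Vulkan show up as higher_order.executorch_call_delegate
# calls; any remaining edge dialect operators will fall back to other kernels
# (e.g. Portable) at runtime.
print(edge_program.exported_program().graph_module)
```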

::::{note}
The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/op_registry.py#L194)
in the Vulkan partitioner code can be inspected to examine which ops are
currently implemented in the Vulkan delegate.
::::

### Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android
and test on a local Android device. Android devices have built-in support for
Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to
compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting `-DEXECUTORCH_BUILD_VULKAN=ON`
when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version past
NDK r19c should work. Note that the examples in this doc have been validated with
NDK r28c. The Android SDK should also be installed so that you have access to `adb`.

The instructions in this page assume that the following environment variables
are set.

```shell
export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
```

To build and install ExecuTorch libraries (for Android) with the Vulkan
Delegate:

```shell
# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)
```

### Run the Vulkan model on device

::::{note}
Since operator support is currently limited, only binary arithmetic operators
will run on the GPU. Expect inference to be slow as the majority of operators
are being executed via Portable operators.
::::

Now, the partially delegated model can be executed on your device's GPU!

```shell
# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
```

Please see the [Vulkan Backend Overview](../../docs/source/backends/vulkan/vulkan-overview.md)
to learn more about the ExecuTorch Vulkan Backend.