## Delegation

While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
Performance Shaders (MPS) backend, and GPU acceleration via the Vulkan backend.

Because optimizations are specific to a given backend, each PTE file is specific
to the backend(s) targeted at export. To support multiple devices, such as
XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
for each backend.
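
In practice this means running the export flow once per backend, swapping only
the partitioner (and, where applicable, the edge compile config). The sketch
below illustrates the idea using the `to_backend()` API described in the rest of
this section; `export_for_backend` is a hypothetical helper, and the CoreML
partitioner import path is an assumption that may differ between ExecuTorch
versions, so only the XNNPACK partitioner shown in this guide is imported.

```python
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
# Hypothetical import path for the CoreML partitioner; check your ExecuTorch release.
# from executorch.backends.apple.coreml.partitioner import CoreMLPartitioner


def export_for_backend(edge_manager, partitioner, filename):
    """Delegate an edge program to one backend and save a backend-specific PTE file."""
    delegated = edge_manager.to_backend(partitioner)
    et_program = delegated.to_executorch()
    with open(filename, "wb") as f:
        f.write(et_program.buffer)


# Each call should start from a freshly created EdgeProgramManager (see `to_edge`
# below), since each backend may also require its own edge compile config.
# export_for_backend(edge_manager, XnnpackPartitioner(), "nanogpt_xnnpack.pte")
# export_for_backend(edge_manager, CoreMLPartitioner(), "nanogpt_coreml.pte")
```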

To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
method on the `EdgeProgramManager` object, which takes a backend-specific
partitioner object. The partitioner is responsible for finding the parts of the
computation graph that can be accelerated by the target backend, and
`to_backend()` delegates those parts to that backend for acceleration and
optimization. Any portions of the computation graph not delegated will be
executed by ExecuTorch's portable or optimized operator implementations.

To delegate the exported model to a specific backend, first import its
partitioner and the backend-specific edge compile config from the ExecuTorch
codebase, then call `to_backend` with an instance of the partitioner on the
`EdgeProgramManager` object created by the `to_edge` function.

Here's an example of how to delegate NanoGPT to XNNPACK (if you're deploying to an Android phone, for instance):

```python
# export_nanogpt.py

# Load the partitioner for the XNNPACK backend.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A model delegated to a specific backend should use that backend's edge compile config.
from executorch.backends.xnnpack.utils.configs import get_xnnpack_edge_compile_config
from executorch.exir import EdgeCompileConfig, to_edge

import torch
from torch.export import export
from torch.nn.attention import sdpa_kernel, SDPBackend
from torch._export import capture_pre_autograd_graph

from model import GPT

# Load the NanoGPT model.
model = GPT.from_pretrained('gpt2')

# Create example inputs. This is used in the export process to provide
# hints on the expected shape of the model input.
example_inputs = (
    torch.randint(0, 100, (1, 8), dtype=torch.long),
)

# Trace the model, converting it to a portable intermediate representation.
# The torch.no_grad() call tells PyTorch to exclude training-specific logic.
with torch.nn.attention.sdpa_kernel([SDPBackend.MATH]), torch.no_grad():
    m = capture_pre_autograd_graph(model, example_inputs)
    traced_model = export(m, example_inputs)

# Convert the model into a runnable ExecuTorch program.
# To be further lowered to the XNNPACK backend, `traced_model` needs the
# XNNPACK-specific edge compile config.
edge_config = get_xnnpack_edge_compile_config()
edge_manager = to_edge(traced_model, compile_config=edge_config)

# Delegate the exported model to the XNNPACK backend by calling `to_backend`
# with the XNNPACK partitioner.
edge_manager = edge_manager.to_backend(XnnpackPartitioner())
et_program = edge_manager.to_executorch()

# Save the XNNPACK-delegated ExecuTorch program to a file.
with open("nanogpt.pte", "wb") as file:
    file.write(et_program.buffer)
```
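
Optionally, you can check how much of the model was actually delegated before
saving the program. The snippet below is a minimal sketch that could be added to
`export_nanogpt.py` right after the `to_backend()` call; `print_readable()` is
the standard torch.fx helper for dumping a graph module's generated code, where
delegated partitions appear as calls to lowered backend modules.

```python
# Optional: inspect the graph after delegation. Delegated partitions appear as
# calls to lowered backend modules (executorch_call_delegate); any remaining
# operators will run on ExecuTorch's portable or optimized kernels.
edge_manager.exported_program().graph_module.print_readable()
```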

Additionally, update CMakeLists.txt to build and link the XNNPACK backend to
the ExecuTorch runner.

```
cmake_minimum_required(VERSION 3.19)
project(nanogpt_runner)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

# Set options for the executorch build.
option(EXECUTORCH_BUILD_EXTENSION_DATA_LOADER "" ON)
option(EXECUTORCH_BUILD_EXTENSION_MODULE "" ON)
option(EXECUTORCH_BUILD_OPTIMIZED "" ON)
option(EXECUTORCH_BUILD_XNNPACK "" ON) # Build with the XNNPACK backend

# Include the executorch subdirectory.
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/third-party/executorch
    ${CMAKE_BINARY_DIR}/executorch)

add_executable(nanogpt_runner main.cpp)
target_link_libraries(
    nanogpt_runner
    PRIVATE
    executorch
    extension_module_static # Provides the Module class
    optimized_native_cpu_ops_lib # Provides baseline cross-platform kernels
    xnnpack_backend) # Provides the XNNPACK CPU acceleration backend
```

Keep the rest of the code the same. For more details, refer to
[Exporting to ExecuTorch](https://pytorch.org/executorch/main/llm/getting-started.html#step-1-exporting-to-executorch)
and
[Invoking the Runtime](https://pytorch.org/executorch/main/llm/getting-started.html#step-2-invoking-the-runtime).

At this point, the working directory should contain the following files:

- CMakeLists.txt
- main.cpp
- basic_tokenizer.h
- basic_sampler.h
- managed_tensor.h
- export_nanogpt.py
- model.py
- vocab.json

If all of these are present, you can now export the XNNPACK-delegated PTE model:
```bash
python export_nanogpt.py
```

It will generate `nanogpt.pte` in the same working directory.

Then we can build and run the model with:

```bash
(rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake ..)
cmake --build cmake-out -j10
./cmake-out/nanogpt_runner
```

You should see something like the following:

```
Once upon a time, there was a man who was a member of the military...
```

For more information regarding backend delegation, see the ExecuTorch guides
for the
[XNNPACK Backend](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)
and
[CoreML Backend](https://pytorch.org/executorch/stable/build-run-coreml.html).

## Quantization