@@ -95,7 +95,7 @@ For more information, see [Setting Up ExecuTorch](../getting-started-setup.md).
 
 ## Running a Large Language Model Locally
 
-This example uses Karpathy’s [NanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
+This example uses Karpathy’s [nanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
 GPT-2 124M. This guide is applicable to other language models, as ExecuTorch is model-invariant.
 
 There are two steps to running a model with ExecuTorch:
@@ -113,7 +113,7 @@ ExecuTorch runtime.
 
 Exporting takes a PyTorch model and converts it into a format that can run efficiently on consumer devices.
 
-For this example, you will need the NanoGPT model and the corresponding tokenizer vocabulary.
+For this example, you will need the nanoGPT model and the corresponding tokenizer vocabulary.
 
 ::::{tab-set}
 :::{tab-item} curl
@@ -426,12 +426,12 @@ specific hardware (delegation), and because it is doing all of the calculations
 While ExecuTorch provides a portable, cross-platform implementation for all
 operators, it also provides specialized backends for a number of different
 targets. These include, but are not limited to, x86 and ARM CPU acceleration via
-the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
+the XNNPACK backend, Apple acceleration via the Core ML backend and Metal
 Performance Shader (MPS) backend, and GPU acceleration via the Vulkan backend.
 
 Because optimizations are specific to a given backend, each pte file is specific
 to the backend(s) targeted at export. To support multiple devices, such as
-XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
+XNNPACK acceleration for Android and Core ML for iOS, export a separate PTE file
 for each backend.
 
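As a rough sketch of that point (not taken from the guide itself; the partitioner class names, import paths, and constructor arguments, especially for Core ML, are assumptions), supporting both platforms amounts to running the same lowering flow once per backend:

```python
# Sketch: produce one .pte file per target backend. Import paths and
# constructor arguments are assumptions; consult each backend's documentation.
import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
# The Core ML partitioner import path below is an assumption.
from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner

from model import GPT  # nanoGPT model definition downloaded earlier in the guide

model = GPT.from_pretrained('gpt2')
example_inputs = (torch.randint(0, 100, (1, 64), dtype=torch.long),)

for partitioner, out_path in [
    (XnnpackPartitioner(), "nanogpt_xnnpack.pte"),  # e.g. Android
    (CoreMLPartitioner(), "nanogpt_coreml.pte"),    # e.g. iOS
]:
    # Export once per backend, delegate the subgraphs that backend supports,
    # then serialize the resulting program to its own .pte file.
    exported_program = torch.export.export(model, example_inputs)
    et_program = to_edge(exported_program).to_backend(partitioner).to_executorch()
    with open(out_path, "wb") as f:
        f.write(et_program.buffer)
```
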
 To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
@@ -442,12 +442,12 @@ computation graph that can be accelerated by the target backend, and
 acceleration and optimization. Any portions of the computation graph not
 delegated will be executed by the ExecuTorch operator implementations.
 
-To delegate the exported model to the specific backend, we need to import its
-partitioner as well as edge compile config from ExecuTorch Codebase first, then
+To delegate the exported model to a specific backend, we need to import its
+partitioner as well as edge compile config from the ExecuTorch codebase first, then
 call `to_backend` with an instance of the partitioner on the `EdgeProgramManager`
 object that the `to_edge` function created.
 
-Here's an example of how to delegate NanoGPT to XNNPACK (if you're deploying to an Android Phone for instance):
+Here's an example of how to delegate nanoGPT to XNNPACK (if you're deploying to an Android phone for instance):
 
 ```python
 # export_nanogpt.py
@@ -466,7 +466,7 @@ from torch._export import capture_pre_autograd_graph
 
 from model import GPT
 
-# Load the NanoGPT model.
+# Load the nanoGPT model.
 model = GPT.from_pretrained('gpt2')
 
 # Create example inputs. This is used in the export process to provide
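
The diff cuts the export script off at this point. A minimal sketch of the remaining steps, meaning the example inputs, tracing, lowering to the Edge dialect, delegation to XNNPACK, and serialization of the `.pte` file, might look like the following; the exact calls and arguments are assumptions rather than the guide's verbatim code (the guide, for instance, also marks the sequence dimension as dynamic):

```python
# Sketch of the remainder of export_nanogpt.py (assumed, simplified).
# `model` is the GPT instance loaded above; `capture_pre_autograd_graph` is the
# import shown earlier, and the other imports here are assumed from the top of
# the script.
import torch
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A fixed-length token sequence used only to trace the model; the actual guide
# additionally declares the sequence dimension as dynamic so prompts of any
# length can be fed at runtime.
example_inputs = (torch.randint(0, 100, (1, 64), dtype=torch.long),)

with torch.no_grad():
    pre_autograd_graph = capture_pre_autograd_graph(model, example_inputs)
    traced_model = export(pre_autograd_graph, example_inputs)

# Lower to the Edge dialect and delegate supported subgraphs to XNNPACK.
edge_manager = to_edge(traced_model)
edge_manager = edge_manager.to_backend(XnnpackPartitioner())
et_program = edge_manager.to_executorch()

# Serialize the delegated program; exporting again with a different
# partitioner (e.g. Core ML) would produce the .pte for that backend instead.
with open("nanogpt.pte", "wb") as file:
    file.write(et_program.buffer)
```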
@@ -590,7 +590,7 @@ I'm not sure if you've heard of the "Curse of the Dragon" or not, but it's a ver
 The delegated model should be noticeably faster compared to the non-delegated model.
 
 For more information regarding backend delegation, see the ExecuTorch guides
-for the [XNNPACK Backend](../tutorial-xnnpack-delegate-lowering.md) and [CoreML
+for the [XNNPACK Backend](../tutorial-xnnpack-delegate-lowering.md) and [Core ML
 Backend](../build-run-coreml.md).
 
 ## Quantization
@@ -701,15 +701,15 @@ df = delegation_info.get_operator_delegation_dataframe()
 print(tabulate(df, headers="keys", tablefmt="fancy_grid"))
 ```
 
-For NanoGPT targeting the XNNPACK backend, you might see the following:
+For nanoGPT targeting the XNNPACK backend, you might see the following:
 ```
 Total delegated subgraphs: 86
 Number of delegated nodes: 473
 Number of non-delegated nodes: 430
 ```
 
 
-|    | op_type                         | occurrences_in_delegated_graphs | occurrences_in_non_delegated_graphs |
+|    | op_type                         | # in_delegated_graphs | # in_non_delegated_graphs |
 |----|---------------------------------|-------|-----|
 | 0  | aten__softmax_default           | 12    | 0   |
 | 1  | aten_add_tensor                 | 37    | 0   |
@@ -731,7 +731,7 @@ print(print_delegated_graph(graph_module))
 This may generate a large amount of output for large models. Consider using "Control+F" or "Command+F" to locate the operator you’re interested in
 (e.g. “aten_view_copy_default”). Observe which instances are not under lowered graphs.
 
-In the fragment of the output for NanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.
+In the fragment of the output for nanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.
 
 ```
 %aten_unsqueeze_copy_default_22 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.unsqueeze_copy.default](args = (%aten_arange_start_step_23, -2), kwargs = {})