From 7de36e2e09ef06b2afb8d3ab50eb8d4f3687896d Mon Sep 17 00:00:00 2001
From: twskj
Date: Tue, 25 Jul 2023 09:47:37 -0400
Subject: [PATCH] fixed typo

---
 .../Part_2-improving_resource_utilization/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Conceptual_Guide/Part_2-improving_resource_utilization/README.md b/Conceptual_Guide/Part_2-improving_resource_utilization/README.md
index ae0cb2b8..25db6617 100644
--- a/Conceptual_Guide/Part_2-improving_resource_utilization/README.md
+++ b/Conceptual_Guide/Part_2-improving_resource_utilization/README.md
@@ -231,7 +231,7 @@ Request concurrency: 16
 ```
 
 As each of the requests had a batch size (of 2), while the maximum batch size of the model was 8, dynamically batching these requests resulted in considerably improved throughput. Another consequence is a reduction in the latency. This reduction can be primarily attributed to reduced wait time in queue wait time. As the requests are batched together, multiple requests can be processed in parallel.
-* **Dynamic Batching with multiple model instances**: To set up the Triton Server in this configuration, add `instance_group` in `config.pbtxt` and make sure to include `--gpus=1` and make sure to include `--gpus=1` in the `docker run` command to set up the server. Include `dynamic_batching` per instructions of the previous section in the model configuration. A point to note is that peak GPU utilization on the GPU shot up to 74% (A100 in this case) while just using a single model instance with dynamic batching. Adding one more instance will definitely improve performance but linear perf scaling will not be achieved in this case.
+* **Dynamic Batching with multiple model instances**: To set up the Triton Server in this configuration, add `instance_group` in `config.pbtxt` and make sure to include `--gpus=1` in the `docker run` command to set up the server. Include `dynamic_batching` per instructions of the previous section in the model configuration. A point to note is that peak GPU utilization on the GPU shot up to 74% (A100 in this case) while just using a single model instance with dynamic batching. Adding one more instance will definitely improve performance but linear perf scaling will not be achieved in this case.
 
 ```
 # Query