    tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
    l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
+   dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
    **kwargs: Any,
Returns:
    torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
"""Compile an ExportedProgram module for NVIDIA GPUs using TensorRT
@@ -517,6 +521,7 @@ def compile(
517
521
tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
518
522
l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
519
523
offload_module_to_cpu (bool): Offload the module to CPU. This is useful when we need to minimize GPU memory usage.
524
+
dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
520
525
**kwargs: Any,
521
526
Returns:
522
527
torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
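Taken together, these docstring entries describe keyword options accepted by the Dynamo compile entry point. The following is a minimal usage sketch, not part of the diff: the toy model, input shapes, and the exact keyword spelling (e.g. `inputs`) are illustrative assumptions that may vary by torch_tensorrt release, and `dynamically_allocate_resources` is only available in builds that include this change.

```python
import torch
import torch_tensorrt

# Toy model and inputs, used only for illustration.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval().cuda()
example_inputs = (torch.randn(8, 64, device="cuda"),)

exported_program = torch.export.export(model, example_inputs)

# Settings documented in the diff above, passed as keyword arguments.
trt_gm = torch_tensorrt.dynamo.compile(
    exported_program,
    inputs=example_inputs,
    tiling_optimization_level="moderate",  # one of "none", "fast", "moderate", "full"
    l2_limit_for_tiling=-1,                # -1 means no L2 cache usage limit
    offload_module_to_cpu=True,            # minimize GPU memory usage during compilation
    dynamically_allocate_resources=True,   # new flag: allocate resources during engine execution
)

# The result is a torch.fx.GraphModule that executes via TensorRT when run.
out = trt_gm(*example_inputs)
```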
py/torch_tensorrt/dynamo/_settings.py (4 additions, 0 deletions)
@@ -11,6 +11,7 @@
    DLA_GLOBAL_DRAM_SIZE,
    DLA_LOCAL_DRAM_SIZE,
    DLA_SRAM_SIZE,
+   DYNAMICALLY_ALLOCATE_RESOURCES,
    DRYRUN,
    ENABLE_CROSS_COMPILE_FOR_WINDOWS,
    ENABLE_EXPERIMENTAL_DECOMPOSITIONS,
@@ -97,6 +98,8 @@ class CompilationSettings:
    tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
    l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
    use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
+   offload_module_to_cpu (bool): Offload the model to CPU to reduce memory footprint during compilation
+   dynamically_allocate_resources (bool): Dynamically allocate resources for TensorRT engines
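For completeness, here is a sketch of building the corresponding settings object directly. It assumes `CompilationSettings` remains a keyword-constructible dataclass whose field names match the docstring entries above; the `dynamically_allocate_resources` field exists only in builds that include this change.

```python
from torch_tensorrt.dynamo._settings import CompilationSettings

# Construct the settings dataclass with the fields documented above.
settings = CompilationSettings(
    tiling_optimization_level="fast",      # tiling search effort
    l2_limit_for_tiling=-1,                # no L2 cache usage limit
    offload_module_to_cpu=True,            # reduce memory footprint during compilation
    dynamically_allocate_resources=True,   # dynamic resource allocation for TensorRT engines
)
print(settings)
```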