Context
For models that require fallback to Torch because of limited converter coverage, custom operators, or other needs, each TRTEngine object is compiled, initialized, inserted into the Torch nn.Module, and made runtime-ready at compile time. This consumes an unnecessary amount of GPU memory during compilation.
Proposal
Use the GPU as a build space for TRTEngine objects, but do not deserialize or initialize the engines until the first forward pass, similar to what is done here:
TensorRT/py/torch_tensorrt/fx/trt_module.py, lines 25 to 27 (at ad74a73):

```python
def _initialize(self):
    self.initialized = True
    self.context = self.engine.create_execution_context()
```
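As a rough illustration of the compile-time half, the engine can be built on the GPU but retained only as serialized bytes on the host. This sketch uses the standard TensorRT Python builder APIs (build_serialized_network is available in TensorRT 8+); the function name is hypothetical:

```python
import tensorrt as trt

def build_engine_to_host(builder: trt.Builder,
                         network: trt.INetworkDefinition,
                         config: trt.IBuilderConfig) -> bytes:
    # The GPU serves as build space here, but no ICudaEngine or execution
    # context is created: the result is host-resident serialized bytes that
    # can be deserialized lazily at the first forward pass.
    plan = builder.build_serialized_network(network, config)
    return bytes(plan)
```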
API Details
The TRTModule objects will take a parameter, construct_live=True, which can be set to False to defer engine initialization until the first forward pass, thereby avoiding unnecessary GPU memory usage during compilation. With construct_live=False, the engine is still built at compile time, but the serialized object is moved to host memory until runtime, at which point it is deserialized and initialized. check_initialized() is called on every forward pass, but has a measurable cost only on the first pass of inference, at which point the engines are moved from host to device memory for use.
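To make the proposed behavior concrete, here is a minimal sketch of how construct_live and check_initialized might fit together. The class shape and everything beyond the names given in this proposal are assumptions for illustration, not the actual torch_tensorrt implementation:

```python
import tensorrt as trt
import torch

class TRTModule(torch.nn.Module):
    def __init__(self, serialized_engine: bytes, construct_live: bool = True):
        super().__init__()
        self.serialized_engine = serialized_engine  # held in host (CPU) memory
        self.initialized = False
        self.engine = None
        self.context = None
        if construct_live:
            # Current behavior: the engine occupies GPU memory from compile time.
            self._initialize()

    def _initialize(self):
        runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
        # Deserialization moves the engine from host to device memory.
        self.engine = runtime.deserialize_cuda_engine(self.serialized_engine)
        self.context = self.engine.create_execution_context()
        self.initialized = True

    def check_initialized(self):
        # Runs on every forward pass; only the first call (when
        # construct_live=False) does measurable work.
        if not self.initialized:
            self._initialize()

    def forward(self, *inputs):
        self.check_initialized()
        # ... enqueue inference through self.context ...
```

Under this scheme, a module created with construct_live=False holds only the serialized bytes until its first call, so GPU memory during compilation is limited to the build space needed for one engine at a time.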