✨[Feature] Delayed Initialization for TRTModule Classes #2673

@gs-olive

Description

Context

For models requiring fallback to Torch due to converter capability limitations, custom operators, or other needs, each TRTEngine object is compiled, initialized, inserted into the Torch nn.Module, and made runtime-ready at compile time. As a result, the engines occupy an unnecessary amount of GPU memory during compilation.

Proposal

Use the GPU as a build space for TRTEngine objects, but do not deserialize or initialize the engines until the first forward pass, similar to what is done here:

def _initialize(self):
    # First use: mark the module as initialized and create the
    # TensorRT execution context.
    self.initialized = True
    self.context = self.engine.create_execution_context()
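For illustration, a minimal sketch of the deferred pattern under this proposal, assuming the built engine is held as serialized bytes in host memory until first use (LazyTRTModule and the serialized_engine field are hypothetical names, not the existing API):

import tensorrt as trt

class LazyTRTModule:
    def __init__(self, serialized_engine: bytes):
        # Keep only the serialized engine bytes in host memory; no GPU
        # allocation happens at construction time.
        self.serialized_engine = serialized_engine
        self.initialized = False
        self.engine = None
        self.context = None

    def _initialize(self):
        # First use: deserialize the engine onto the GPU and create the
        # execution context.
        runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
        self.engine = runtime.deserialize_cuda_engine(self.serialized_engine)
        self.context = self.engine.create_execution_context()
        self.initialized = True

    def check_initialized(self):
        # Cheap guard invoked on every forward pass; only the first call
        # pays the deserialization and host-to-device transfer cost.
        if not self.initialized:
            self._initialize()

    def forward(self, *inputs):
        self.check_initialized()
        # ... enqueue inference through self.context as usual ...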

API Details

The TRTModule classes will take a parameter, construct_live=True, which can be set to False to defer engine initialization until the first forward pass, thereby avoiding unnecessary GPU memory usage during compilation. After the engine is built at compile time, the serialized object is moved to host memory until runtime, at which point it is initialized. check_initialized() is called on every forward pass, but it has a measurable effect only on the first inference pass, when the engines are moved from host to device memory for use.
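As a hypothetical usage example (the constructor signature shown here is illustrative, not the final API):

# Default behavior: engine is deserialized and GPU-resident at compile time.
live_module = TRTModule(serialized_engine, construct_live=True)

# Deferred behavior: the serialized engine remains in host memory; no GPU
# memory is used until the first forward pass.
lazy_module = TRTModule(serialized_engine, construct_live=False)

# check_initialized() runs on every forward pass, but only the first call
# deserializes the engine and moves it from host to device memory.
output = lazy_module(example_input)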
