Fix Segmentation Fault For Tensorrt BYOC when TVM_TENSORRT_CACHE_DIR is Set #7162
Conversation
Also cc @trevor-m who implemented TRT integration.
trevor-m left a comment:
Hi @lsy643 thank you for the PR! I hadn't considered that situation of caching + device buffers before.
I think in general we need some slight refactoring here. We should be able to allocate the buffers in GetCachedEnginesFromDisk() and we shouldn't need to modify the BuildEngine() code.
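trevor-m's suggestion, allocating the buffers directly when a cached engine is loaded, could look roughly like the sketch below. Note that `EngineAndContext`, `AllocateDeviceBuffers`, the cache map, and the fixed 64-byte allocations are hypothetical simplified stand-ins; the real `GetCachedEnginesFromDisk()` deserializes an `ICudaEngine` and would size each buffer from the engine's binding dimensions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TVM's TensorRTEngineAndContext, which really
// holds an ICudaEngine*, IExecutionContext*, and per-binding device buffers.
struct EngineAndContext {
  std::vector<std::string> input_names;
  std::vector<std::string> output_names;
  std::vector<void*> device_buffers;  // one slot per engine binding
};

// Allocate one buffer per binding. In the real runtime this would be a
// cudaMalloc/NDArray allocation sized from the binding dimensions; here a
// fixed-size heap allocation stands in (and is intentionally never freed,
// as this is only a sketch).
void AllocateDeviceBuffers(EngineAndContext* ec, size_t binding_num) {
  ec->device_buffers.resize(binding_num, nullptr);
  for (size_t i = 0; i < binding_num; ++i) {
    ec->device_buffers[i] = new uint8_t[64];
  }
}

// Sketch of the suggested refactor: after deserializing a cached engine,
// allocate its device buffers immediately, so BuildEngine() needs no changes.
bool GetCachedEnginesFromDisk(std::map<std::string, EngineAndContext>* cache) {
  // Pretend we deserialized one engine with 1 input and 2 output bindings.
  EngineAndContext ec;
  ec.input_names = {"data"};
  ec.output_names = {"out0", "out1"};
  AllocateDeviceBuffers(&ec, ec.input_names.size() + ec.output_names.size());
  (*cache)["ssd_backbone"] = ec;
  return true;
}
```

With this shape, a cache hit returns engines whose `device_buffers` are already populated, so the later execution path never dereferences an empty buffer vector.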
```cpp
void Convert(TensorRTOpConverterParams* params) const {
  auto input = params->inputs.at(0).tensor;
  ICHECK_EQ(std::stoi(params->node.GetAttr<std::vector<std::string>>("reverse")[0]), false);
  //ICHECK_EQ(std::stoi(params->node.GetAttr<std::vector<std::string>>("reverse")[0]), false);
```
trevor-m: Please rebase so you don't need to comment out this line.
lsy643: OK, I have rebased the branch.
```cpp
LoadGlobalAttributes();
if (GetCachedEnginesFromDisk()) return;
SetupConstants(consts);
if (GetCachedEnginesFromDisk()) return;
```
trevor-m: Since GetCachedEnginesFromDisk is now at the end of the function, we don't need the if and the return.
lsy643: The if and return have been removed.
```cpp
  return {engine, context, network_input_names_, network_output_names_, device_buffers};
}

void TensorRTBuilder::CreateDeviceBuffers(TensorRTEngineAndContext* engine_and_context) {
```
trevor-m: The code in this function is a duplicate of the code in BuildEngine(). Can you call this new function from BuildEngine to avoid the duplication?
lsy643: CreateDeviceBuffers is now called from BuildEngine.
```cpp
TensorRTEngineAndContext& engine_and_context =
    trt_engine_cache_.at(std::make_pair(symbol_name_, batch_size_));
size_t binding_num = engine_and_context.engine->getNbBindings();
if (engine_and_context.device_buffers.size() == binding_num) {
```
trevor-m: This could be !engine_and_context.device_buffers.empty() instead; it communicates the purpose of this check better.
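The point of the suggestion can be shown with a small self-contained sketch (here `EngineAndContext` is a hypothetical stand-in, not the real TVM struct): because buffer allocation is all-or-nothing, the vector is either empty or holds exactly one entry per binding, so `!empty()` and `size() == binding_num` agree, but `!empty()` states the question being asked, "are the buffers already allocated?", directly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: device_buffers is either fully allocated
// (one entry per engine binding) or untouched.
struct EngineAndContext {
  std::vector<void*> device_buffers;
};

// Returns true when a cached engine still needs its buffers allocated.
// Checking empty() expresses the intent without needing the binding count.
bool NeedsDeviceBuffers(const EngineAndContext& ec) {
  return ec.device_buffers.empty();
}
```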
lsy643: Changed to use empty() instead.
```cpp
    trt_engine_cache_.at(std::make_pair(symbol_name_, batch_size_));
if (engine_and_context.device_buffers.size() == 0) {
  builder.CreateDeviceBuffers(&engine_and_context);
  return;
```
trevor-m: We are building the TRT network in the TensorRTBuilder, but exiting before BuildEngine is called. This means the resources used by the builder will never be freed; TensorRTBuilder::CleanUp() needs to be called.
trevor-m: We also shouldn't have to rebuild the whole network just to allocate the buffers.
lsy643: If CleanUp is added, there is a segmentation fault, so I don't call CleanUp.
In the BuildEngine function in tensorrt_runtime.cc, the whole network is not actually rebuilt, since the function returns before builder.BuildEngine is called.
trevor-m: I think the best solution is to move CreateDeviceBuffers out of TensorRTBuilder and into the runtime module. That way we can call it without the unnecessary allocations and creations done by TensorRTBuilder.
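A rough sketch of that proposal, with `FakeEngine` and the fixed 64-byte allocations as hypothetical stand-ins for the real TensorRT engine and CUDA device buffers: a runtime-side helper only needs the deserialized engine's binding count, so no TensorRTBuilder is constructed, no network is rebuilt, and there is no CleanUp() bookkeeping to get wrong.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for nvinfer1::ICudaEngine; the real code calls
// engine->getNbBindings() and queries each binding's dimensions.
struct FakeEngine {
  int num_bindings = 0;
  int getNbBindings() const { return num_bindings; }
};

// Hypothetical stand-in for TVM's TensorRTEngineAndContext.
struct EngineAndContext {
  FakeEngine engine;
  std::vector<void*> device_buffers;
};

// Runtime-side buffer allocation: idempotent, driven only by the engine's
// binding count, independent of TensorRTBuilder.
void CreateDeviceBuffers(EngineAndContext* ec) {
  if (!ec->device_buffers.empty()) return;  // already allocated
  int n = ec->engine.getNbBindings();
  ec->device_buffers.resize(n, nullptr);
  for (int i = 0; i < n; ++i) {
    // Real code: cudaMalloc/NDArray sized from the binding dimensions.
    ec->device_buffers[i] = new uint8_t[64];
  }
}
```

Because the helper returns early when buffers already exist, it is safe to call both from the build path and from the cache-load path.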
@lsy643 please address the comments and rebase to resolve the conflicts.
@trevor-m please review and approve explicitly (https://tvm.apache.org/docs/contribute/code_review.html#approve-and-request-changes-explicitly).
Gentle ping @lsy643
This PR appears to be out of date; please feel free to reopen it if this is not the case. As part of the new year we are attempting to triage the project's open pull requests. Thanks again for your contribution, and feel free to reach out to discuss these changes.
When deploying an SSD model in a CPU context and offloading the backbone to TensorRT with BYOC, there is a segmentation fault if TVM_TENSORRT_CACHE_DIR is set and the TensorRT engine is loaded from the cache. The crash is caused by the device_buffers of TensorRTEngineAndContext not being initialized correctly. This issue is fixed by adding device_buffers initialization in the BuildEngine of TensorRTRuntime.

Please review @masahi @comaniac