-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[RUNTIME][CLML] OpenCLML tuning and profiling enhanced #13843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot |
199755d to
4f672d5
Compare
Tuning cache bin is serialized through DMLC::Stream to support multiple CLML sub graphs with in a tvm module. Individual tuning cache blobs are saved to same output file. New API on OpenCLWorkspace to enable or disable profiling on command queue rather doing this only when Timer is invoked. This is required to perform CLML operator tuning. CLML layer profiling now uses OpenCL Timer interface. This PR also fix avoiding pad operator offloading at the very first layer (to be specific before at least one convolution layer) due to the limitation of CLML pad operator is concerned about layout. Please refer to CLML SDK documentation for more details.
4f672d5 to
9960020
Compare
echuraev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several comments
Co-authored-by: Egor Churaev <[email protected]>
a8b79f5 to
535d1e0
Compare
echuraev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
* [RUNTIME][CLML] OpenCLML tuning and profiling enhanced Tuning cache bin is serialized through DMLC::Stream to support multiple CLML sub graphs with in a tvm module. Individual tuning cache blobs are saved to same output file. New API on OpenCLWorkspace to enable or disable profiling on command queue rather doing this only when Timer is invoked. This is required to perform CLML operator tuning. CLML layer profiling now uses OpenCL Timer interface. This PR also fix avoiding pad operator offloading at the very first layer (to be specific before at least one convolution layer) due to the limitation of CLML pad operator is concerned about layout. Please refer to CLML SDK documentation for more details. * Update src/runtime/opencl/opencl_common.h Co-authored-by: Egor Churaev <[email protected]> * * review comments --------- Co-authored-by: Egor Churaev <[email protected]>
Tuning cache bin is serialized through DMLC::Stream to support multiple CLML sub graphs with in a tvm module. Individual tuning cache blobs are saved to same output file.
New API on OpenCLWorkspace to enable or disable profiling on command queue rather doing this only when Timer is invoked. This is required to perform CLML operator tuning.
CLML layer profiling now uses OpenCL Timer interface.
This PR also fix avoiding pad operator offloading at the very first layer (to be specific before at least one convolution layer) due to the limitation of CLML pad operator is concerned about layout. Please refer to CLML SDK documentation for more details.
Co-Authored-By: Krishna Raju Vegiraju [email protected]