-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[TIR] SplitHostDevice, handle subroutines #14918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TIR] SplitHostDevice, handle subroutines #14918
Conversation
Prior to this commit, the `TargetNode::host` could be specified in
TVMScript as part of the config dictionary, under the key `"host"`.
However, this required all other device parameters to be explicitly
specified, rather than using any of the short-hand string
representations. This commit forwards the `host` argument from TVMScript's
`T.target` method to `tvm.target.Target`, allowing both the device and
host to be specified using the shorthand string representation.
```python
@T.prim_func
def before_this_commit():
T.func_attr(
{
"target": T.target(
{
"arch": "sm_86",
"host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
"keys": ["cuda", "gpu"],
"kind": "cuda",
"max_num_threads": 1024,
"tag": "",
"thread_warp_size": 32,
}
)
}
)
T.evaluate(0)
@T.prim_func
def after_this_commit():
T.func_attr({"target": T.target("cuda", host="llvm")})
T.evaluate(0)
```
Previously, the symbol name of the extracted compute kernel was defined based on the `kGlobalSymbol` attribute, which was required to be present. This commit updates `SplitHostDevice` to generate the symbol name using `kGlobalSymbol` if present, and to fall back to the name of the `tvm::GlobalVar` for internal functions.
First pass, `AnnotateDeviceRegions`. This pass decides which portions of a PrimFunc should be run on the device, and annotates them with `kTarget` attribute, indicating which target should be used for later lowering steps. Second pass, `SplitHostDevice`. This pass extracts the annotated region into an independent PrimFunc. The `kTarget` attribute of the extracted kernel is defined by the `kTarget` annotation inserted by `AnnotateDeviceRegions`. The host function is marked by the `tvm::tir::attr::kIsHostFunc` attribute, allowing it to be recognized by later host-only lowering passes. Third pass, `LowerDeviceKernelLaunch`. This pass identifies subroutine calls that call into device kernels, and rewrites them into `T.tvm_call_packed`.
|
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot |
|
This PR is a subset of the functionality in #14862. As that PR has grown to a difficult-to-review size, I'm carving off independent chunks of it into separate PRs, after which #14862 can be rebased to a more manageable size. In addition, this PR depends on #14915 for the new unit tests, and should be rebased onto main after #14915 lands. |
Previously, the SplitHostDevice pass added the `tir::attr::kKernelLaunchParams` attribute, and the LowerDeviceKernelLaunch pass filled in the values for it. This cleanup makes the kernel launch params be the sole responsibility of LowerDeviceKernelLaunch.
7dabdd0 to
a856076
Compare
PRs apache#14913 and apache#14914 made analogous changes to `MakePackedAPI` and `MakeUnpackedAPI` to handle subroutine calls. Both PRs introduced the same symbol, `tvm::tir::SubroutineCallRewriter`, a local utility to update internal calls to a modified function. While each PR passed CI individually, and was therefore able to merge, having both changes caused a duplicate symbol. This commit updates `MakePackedAPI` and `MakeUnpackedAPI` to place their local utilities into anonymous namespaces, avoiding the conflict.
| bool is_host_func = | ||
| func->GetAttr<Bool>(tvm::tir::attr::kIsHostFunc).value_or(Bool(false))->value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. discussion offline, do we want to update to not rely on kIsHostFunc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, and updated. The SplitHostDevice and LowerDeviceKernelLaunch no longer use the kIsHostFunc attribute at all.
Update to use the `tvm::tir::IsHostFunc` utility function, rather than the `kIsHostFunc` attribute. Per discussion on apache#14020, the `kIsHostFunct` attribute should only be used in `BindTarget`, and should not be re-introduced in `SplitHostDevice`.
…_handle_subroutines_pr_14918
|
(Merged #14950 into this PR to allow CI to run. No conflicts expected after CI, so github's squash/merge will remove it from the final commit.) |
csullivan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for the refactoring! Looking forward to the additional functionality having SplitHostDevice split out into separate steps will enable during lowering.
|
there seems to be a regression caused, or related to this PR, please look into it https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/main/715/pipeline |
|
@tqchen Thank you for the heads up, and looking into it. |
This resolves an issue introduced by the combination of apache#14918 and apache#14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. apache#14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. apache#14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function.
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
…14982) This resolves an issue introduced by the combination of #14918 and #14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. #14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. #14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function.
This PR refactors SplitHostDevice into three separate transformations. Previously, SplitHostDevice would replace device regions with a builtin::tvm_call_packed() node to replace the extracted region. After this PR, this process is performed in three separate steps.
AnnotateDeviceRegion: Annotate the regions that should be executed on another target.
SplitHostDevice: Extract the annotated region into an independent PrimFunc, with a GlobalVar to represent the call from into the new subroutine.
LowerDeviceKernelLaunch: For any subroutine call where the caller and callee are on different devices, replace with a device kernel launch.
* PR#14915 [TVMScript] Allow T.target("device", host="host") in TVMScript
Prior to this commit, the `TargetNode::host` could be specified in
TVMScript as part of the config dictionary, under the key `"host"`.
However, this required all other device parameters to be explicitly
specified, rather than using any of the short-hand string
representations. This commit forwards the `host` argument from TVMScript's
`T.target` method to `tvm.target.Target`, allowing both the device and
host to be specified using the shorthand string representation.
```python
@T.prim_func
def before_this_commit():
T.func_attr(
{
"target": T.target(
{
"arch": "sm_86",
"host": {"keys": ["cpu"], "kind": "llvm", "tag": ""},
"keys": ["cuda", "gpu"],
"kind": "cuda",
"max_num_threads": 1024,
"tag": "",
"thread_warp_size": 32,
}
)
}
)
T.evaluate(0)
@T.prim_func
def after_this_commit():
T.func_attr({"target": T.target("cuda", host="llvm")})
T.evaluate(0)
```
* [Target] Added WithoutHost method
* [TIR] SplitHostDevice, handle missing kGlobalSymbol
Previously, the symbol name of the extracted compute kernel was
defined based on the `kGlobalSymbol` attribute, which was required to
be present. This commit updates `SplitHostDevice` to generate the
symbol name using `kGlobalSymbol` if present, and to fall back to the
name of the `tvm::GlobalVar` for internal functions.
* [TIR] Refactor SplitHostDevice into three separate passes
First pass, `AnnotateDeviceRegions`. This pass decides which portions
of a PrimFunc should be run on the device, and annotates them with
`kTarget` attribute, indicating which target should be used for later
lowering steps.
Second pass, `SplitHostDevice`. This pass extracts the annotated
region into an independent PrimFunc. The `kTarget` attribute of the
extracted kernel is defined by the `kTarget` annotation inserted by
`AnnotateDeviceRegions`. The host function is marked by the
`tvm::tir::attr::kIsHostFunc` attribute, allowing it to be recognized
by later host-only lowering passes.
Third pass, `LowerDeviceKernelLaunch`. This pass identifies
subroutine calls that call into device kernels, and rewrites them into
`T.tvm_call_packed`.
* Add unit tests specifically for SplitHostDevice behavior
* Added unit test specifically for AnnotateDeviceRegions
* Added unit tests for LowerDeviceKernelLaunch
* Minor cleanup, moved all kernel launch collection into one spot
Previously, the SplitHostDevice pass added the
`tir::attr::kKernelLaunchParams` attribute, and the
LowerDeviceKernelLaunch pass filled in the values for it. This
cleanup makes the kernel launch params be the sole responsibility of
LowerDeviceKernelLaunch.
* Updated unit tests for LowerWarpMemory
* Updated unit tests for ThreadSync
* Updated unit test for inject ptx async copy
* [Bugfix] Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
PRs apache#14913 and
apache#14914 made analogous changes to
`MakePackedAPI` and `MakeUnpackedAPI` to handle subroutine calls.
Both PRs introduced the same symbol,
`tvm::tir::SubroutineCallRewriter`, a local utility to update internal
calls to a modified function. While each PR passed CI individually,
and was therefore able to merge, having both changes caused a
duplicate symbol.
This commit updates `MakePackedAPI` and `MakeUnpackedAPI` to place
their local utilities into anonymous namespaces, avoiding the
conflict.
* Maintain "tir.is_global_func" attr in device-side entry point
* SplitHostDevice, update the host-side target to be the host
* [TIR] Update LowerDeviceKernelLaunch to avoid kIsHostFunc
Update to use the `tvm::tir::IsHostFunc` utility function, rather than
the `kIsHostFunc` attribute. Per discussion on
apache#14020, the `kIsHostFunct` attribute
should only be used in `BindTarget`, and should not be re-introduced
in `SplitHostDevice`.
* Remove is_host_func from SplitHostDevice tests
…pache#14982) This resolves an issue introduced by the combination of apache#14918 and apache#14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. apache#14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. apache#14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function.
* [Bugfix][TIR][VTA] Update host-side target, even without device func This resolves an issue introduced by the combination of #14918 and #14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. #14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. #14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function. * [CodegenC] Updated unit test for sorted CodegenC output Previously, this unit test generated a `Map<tvm::Target, IRModule>` whose default iteration order was not sorted by function name, built the `Map` of modules, then validated whether the resulting C code was a sorted list of 4 elements. However, this condition was stricter than necessary, as it depended on the number of items added to the `Map` until it was unsorted. This commit updates the test to instead validate that `std::is_sorted` returns true. * Ignore __tvm_main__ in unit test
* [Bugfix][TIR][VTA] Update host-side target, even without device func This resolves an issue introduced by the combination of #14918 and #14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. #14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. #14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function. * [TIR] Restrict tir.transform.LowerTVMBuiltin to host functions Previously, the `tir.transform.LowerTVMBuiltin` pass applied to all functions in an `IRModule`, but was only applied to modules that contain only host functions. This commit updates `tir.transform.LowerTVMBuiltin` to apply only to host functions. * Updated "stackvm" target to have "cpu" key. With the presence/absence of the "cpu" key in a target used to determine whether host-only calls should be run, should make sure to add it to "stackvm". * Update IsHostFunc() to use "host" tag instead of "cpu" Current CI failures due to LowerTVMBuiltin not running on "hexagon" target, and would like to avoid conflating cpu/host. * Avoid "host" tag for now * Update HEXAGON_AOT_LLVM_TARGET to be recognized as host
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
* [Bugfix][TIR][VTA] Update host-side target, even without device func This resolves an issue introduced by the combination of apache#14918 and apache#14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. apache#14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. apache#14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function. * [CodegenC] Updated unit test for sorted CodegenC output Previously, this unit test generated a `Map<tvm::Target, IRModule>` whose default iteration order was not sorted by function name, built the `Map` of modules, then validated whether the resulting C code was a sorted list of 4 elements. However, this condition was stricter than necessary, as it depended on the number of items added to the `Map` until it was unsorted. This commit updates the test to instead validate that `std::is_sorted` returns true. * Ignore __tvm_main__ in unit test
…e#14944) * [Bugfix][TIR][VTA] Update host-side target, even without device func This resolves an issue introduced by the combination of apache#14918 and apache#14945. The bug occurred for targets that do not require device-side codegen, but do require a `device_type` other than `kDLCPU`. It wasn't caught by CI, as the issue only occurred with the combination of both PRs. 1. apache#14918 updated `SplitHostDevice` to only modify the `"target"` attribute when a device-side function has been extracted. 2. For VTA, there is no device-side function, as everything is done through host-side API calls. 3. From (1) and (2), the VTA examples kept the target `T.target("ext_dev", host="llvm")` after the `SplitHostDevice` pass, instead of being updated to `T.target("llvm")`. 4. apache#14945 restricted CombineContextCall to only apply to host-side passes. 5. From (4) and (5), the `CombineContextCall` pass was no longer applied to the VTA context calls. This PR fixes `SplitHostDevice`, updating the target from `T.target("ext_dev", host="llvm")` to `T.target("llvm")`, even if no device sections have been extracted from the function. * [TIR] Restrict tir.transform.LowerTVMBuiltin to host functions Previously, the `tir.transform.LowerTVMBuiltin` pass applied to all functions in an `IRModule`, but was only applied to modules that contain only host functions. This commit updates `tir.transform.LowerTVMBuiltin` to apply only to host functions. * Updated "stackvm" target to have "cpu" key. With the presence/absence of the "cpu" key in a target used to determine whether host-only calls should be run, should make sure to add it to "stackvm". * Update IsHostFunc() to use "host" tag instead of "cpu" Current CI failures due to LowerTVMBuiltin not running on "hexagon" target, and would like to avoid conflating cpu/host. * Avoid "host" tag for now * Update HEXAGON_AOT_LLVM_TARGET to be recognized as host
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
The functionality tested in this commit was added across several recent PRs, each of which tested their features in isolation. This PR adds unit tests to validate the end-to-end behavior of TIR subroutine calls. PRs building up to this point: - TVMScript - apache#14889 - apache#14915 - apache#14919 - apache#14941 - Functionality improvements of existing TIR passes - apache#14913 - apache#14914 - apache#14918 - apache#14951 - Changes to the TIR lowering flow - apache#14942 - apache#14985 - Codegen updates - apache#14958 - apache#14901 - Compatibility updates/fixes - apache#14892 - apache#14950 - apache#14943 - apache#14944 - apache#14945 - apache#14952 - apache#14982 - apache#14949
|
A note that we should revisit some of the assumptions #16237 |
This PR refactors
SplitHostDeviceinto three separate transformations. Previously,SplitHostDevicewould replace device regions with abuiltin::tvm_call_packed()node to replace the extracted region. After this PR, this process is performed in three separate steps.AnnotateDeviceRegion: Annotate the regions that should be executed on another target.SplitHostDevice: Extract the annotated region into an independent PrimFunc, with aGlobalVarto represent the call from into the new subroutine.LowerDeviceKernelLaunch: For any subroutine call where the caller and callee are on different devices, replace with a device kernel launch.