[Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. #9613
Merged
715692f to
c1842cc
Compare
c1842cc to
d65804c
Compare
mbs-octoml
added a commit
to mbs-octoml/mbs-tvm
that referenced
this pull request
Dec 9, 2021
…imFuncs

This is in support of apache#9613, which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required.

Somewhat more speculatively, we also allow memory scopes to flow into PrimFuncs. This is in preparation for when we can split lowering into two phases:
i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and
ii) actual scheduling down to traditional TIR.

Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another, avoiding device_copies that would otherwise be needed only because the memory scopes were chosen independently.

I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery.
mbs-octoml
added a commit
to mbs-octoml/mbs-tvm
that referenced
this pull request
Dec 9, 2021
This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clash, but otherwise avoid device_copies when scopes can be chosen so as to avoid them.

We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'.

    # Standard user annotation: body is constrained to S
    on_device(body, S)
    # Used by PlanDevices to 'fix' expression to S
    # (was is_fixed=True)
    on_device(body, S, constrain_result=True)
    # Used by PlanDevices to indicate a device_copy can be
    # inserted if necessary.
    on_device(body, S, constrain_body=False)
    # Supported, but currently has no use.
    on_device(body, S, constrain_result=True, constrain_body=False)

A few extra odds 'n ends collected along the way:
- Some CallLowered cleanup which I found useful.
- The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies.
- Cleanup warnings from clang-12 as I touch files.
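One way to picture the four flag combinations is the toy model below. It is only a sketch under stated assumptions: `OnDevice` and `resolve` are hypothetical stand-ins written for this illustration (they are not TVM classes or functions), the scope names "global" and "texture" are just example strings, and the fallback used when neither side is constrained is an assumption of the sketch rather than something the commit message specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OnDevice:
    """Toy stand-in for an on_device annotation (hypothetical, not the TVM class)."""
    scope: str
    constrain_body: bool = True      # default: body is constrained to `scope`
    constrain_result: bool = False   # default: result is left unconstrained


def resolve(ann: OnDevice,
            body_scope: Optional[str] = None,
            result_scope: Optional[str] = None) -> Tuple[str, str, bool]:
    """Return (body scope, result scope, whether a device_copy is needed).

    body_scope / result_scope are what the rest of the program already
    demands for each side; None means there is no outside demand.
    """
    if ann.constrain_body:
        if body_scope not in (None, ann.scope):
            raise ValueError("body clashes with annotation")
        body_scope = ann.scope
    if ann.constrain_result:
        if result_scope not in (None, ann.scope):
            raise ValueError("result clashes with annotation")
        result_scope = ann.scope
    # An unconstrained side adopts the other side's scope, so no copy is
    # needed. If neither side is known we fall back to the annotation's
    # scope (an assumption of this sketch).
    if body_scope is None:
        body_scope = result_scope if result_scope is not None else ann.scope
    if result_scope is None:
        result_scope = body_scope
    return body_scope, result_scope, body_scope != result_scope


# Standard annotation: body pinned, surrounding context wants another scope.
standard = resolve(OnDevice("global"), result_scope="texture")
# 'Fixed' annotation: body and result both pinned to the same scope.
fixed = resolve(OnDevice("global", constrain_result=True))
# 'Free' annotation: a copy appears only if the two sides disagree.
free_clash = resolve(OnDevice("global", constrain_body=False),
                     body_scope="texture", result_scope="global")
free_agree = resolve(OnDevice("global", constrain_body=False),
                     body_scope="texture")
```

In this model the 'free' form never forces a scope on either side; it only marks a point where a copy is allowed if the two sides end up disagreeing.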
mbrookhart
pushed a commit
that referenced
this pull request
Dec 10, 2021
…imFuncs (#9689)

* [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs

This is in support of #9613, which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required.

Somewhat more speculatively, we also allow memory scopes to flow into PrimFuncs. This is in preparation for when we can split lowering into two phases:
i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and
ii) actual scheduling down to traditional TIR.

Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another, avoiding device_copies that would otherwise be needed only because the memory scopes were chosen independently.

I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery.

* [checkpoint] Junru's comments.
jroesch
pushed a commit
that referenced
this pull request
Dec 10, 2021
* [Relay] PlanDevices supports 'free' on_device annotations

This is in support of #9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clash, but otherwise avoid device_copies when scopes can be chosen so as to avoid them.

We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'.

    # Standard user annotation: body is constrained to S
    on_device(body, S)
    # Used by PlanDevices to 'fix' expression to S
    # (was is_fixed=True)
    on_device(body, S, constrain_result=True)
    # Used by PlanDevices to indicate a device_copy can be
    # inserted if necessary.
    on_device(body, S, constrain_body=False)
    # Supported, but currently has no use.
    on_device(body, S, constrain_result=True, constrain_body=False)

A few extra odds 'n ends collected along the way:
- Some CallLowered cleanup which I found useful.
- The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies.
- Cleanup warnings from clang-12 as I touch files.

* [checkpoint] unused var
…straints.
This PR:
1) Makes PlanDevices consider lowered calls when solving device domain constraints.
2) Connects the storage scopes on PrimFunc parameters (encoded in the storage_scope
field of the PointerType type annotation on each Buffer's data Var) to the memory_scope
fields of the SEScopes which PlanDevices unifies over.
3) Allows new device_copies to be inserted on the arguments and results of lowered
calls so as to account for any memory scope mismatches which are now apparent.
[device_planner.cc has main changes, rest is secondary.]
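As a rough illustration of points 2 and 3, the sketch below unifies the memory scope each PrimFunc parameter expects with the scope of the argument actually passed, and reports which arguments would need a device_copy. It is a toy model, not the device_planner.cc implementation: `unify_scopes` and `copies_for_call` are hypothetical helpers, the scope strings are examples, and treating the empty string as "unconstrained" is an assumption of this sketch.

```python
from typing import List, Optional

UNCONSTRAINED = ""  # assumption of this sketch: empty scope means "no constraint"


def unify_scopes(lhs: str, rhs: str) -> Optional[str]:
    """Join two memory scopes; return None when they clash."""
    if lhs == UNCONSTRAINED:
        return rhs
    if rhs == UNCONSTRAINED:
        return lhs
    return lhs if lhs == rhs else None


def copies_for_call(param_scopes: List[str], arg_scopes: List[str]) -> List[int]:
    """Indices of arguments whose scope cannot be unified with the parameter's.

    Each such argument is a place where a device_copy would have to be
    inserted to reconcile the two scopes.
    """
    if len(param_scopes) != len(arg_scopes):
        raise ValueError("parameter/argument count mismatch")
    return [
        i
        for i, (param, arg) in enumerate(zip(param_scopes, arg_scopes))
        if unify_scopes(param, arg) is None
    ]


# The lowered function wants its second parameter in "texture" memory, but
# the caller produced that argument in "global" memory.
needs_copy = copies_for_call(["global", "texture", ""],
                             ["global", "global", "shared"])
```

Only the second argument clashes here; the third parameter is unconstrained, so it simply adopts the argument's scope.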
In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to be scheduled,
a distinction now possible with TensorIR).
- Memory scope constraints can flow both out of and in to PrimFuncs
introduced by LowerTE. In TIR memory scopes are represented by
'storage scopes' on the PointerType type annotations on TIR Buffer data
variables.
- It is straightforward to extract memory scopes from PrimFuncs by
looking at the PrimFunc's buffer_map. We do this in 'phase 1' of
PlanDevices, which collects all the device constraints implied by
- However, pushing memory constraints in to PrimFuncs is more challenging
due to buffer aliasing. This aspect is still experimental.
- Allow device_copies to be inserted for both arguments and
results of PrimFunc calls, on the assumption PlanDevices has
already established a consistent device assignment prior to
lowering and any new mismatch is required to match up memory scopes.
We use the new 'free' on_device annotations to implement this.
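The extraction step described in the bullets above can be sketched with a small stand-in data model. This is a toy under stated assumptions: the `PointerType`, `Var`, `Buffer` and `PrimFunc` classes below are simplified dataclasses written for this example (not the TVM classes), the buffer_map is keyed by parameter name for brevity, and `param_memory_scopes` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PointerType:
    storage_scope: str  # e.g. "global"; "" means unconstrained in this sketch


@dataclass(frozen=True)
class Var:
    name: str
    type_annotation: PointerType


@dataclass(frozen=True)
class Buffer:
    data: Var  # the buffer's underlying data variable


@dataclass
class PrimFunc:
    params: List[str]              # parameter names, in call order
    buffer_map: Dict[str, Buffer]  # parameter name -> buffer bound to it


def param_memory_scopes(func: PrimFunc) -> List[str]:
    """Read one memory scope per parameter from the function's buffer_map.

    Parameters without a buffer (for example scalars) get the empty,
    unconstrained scope.
    """
    scopes = []
    for param in func.params:
        buffer = func.buffer_map.get(param)
        if buffer is None:
            scopes.append("")
        else:
            scopes.append(buffer.data.type_annotation.storage_scope)
    return scopes


func = PrimFunc(
    params=["a", "b", "n"],
    buffer_map={
        "a": Buffer(Var("a_data", PointerType("global"))),
        "b": Buffer(Var("b_data", PointerType("texture"))),
    },
)
scopes = param_memory_scopes(func)
```

Reading scopes out is a plain lookup like this. Pushing scopes back in is harder, as noted above, because several buffers may alias the same data variable.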
Coming along for the ride:
- To make unit tests of mixed Relay/TIR functions possible, I needed
to be able to supply a checked_type to GlobalVar, since that's currently
the only way to give a Relay type to PrimFuncs.
- Use GenSym to get unique var names in ANF & partial eval so it is easier
to diff debug output between passes and connect program fragments
back into the overall program. Relying on pretty-printing to
automagically unique-ify var names is certainly cute, but until we
have better span support it is very hard to work with.
- Realized both dead_code.cc and fold_constant.cc would
happily move values into a different lexical virtual
device context since device_planner.cc was being
'clever' and eliding on_devices for let-bound values
when there's no change. Fixed so that every let-bound
value has an on_device. Will be much better after
apache/tvm-rfcs#45 is implemented.
- Make build -Werror clean for clang-12 (mostly move fixups).
- Address post-submit comments from apache#9693.
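For readers unfamiliar with the GenSym idea mentioned in the list above, here is a minimal sketch of a unique-name generator. The `GenSym` class, its `fresh` method and the `prefix_N` naming scheme are all hypothetical choices made for this illustration; the PR text does not show how TVM's own helper names variables.

```python
import itertools
import threading


class GenSym:
    """Hands out names that are unique within this generator (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count()

    def fresh(self, prefix: str = "x") -> str:
        # The lock makes "take the next number and format the name" atomic,
        # so two threads can never be handed the same suffix.
        with self._lock:
            return f"{prefix}_{next(self._counter)}"


gensym = GenSym()
names = [gensym.fresh("x") for _ in range(3)]
```

Because every binding gets a distinct name at creation time, the same variable keeps the same name in the debug output of successive passes, which is what makes that output easy to diff.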
d65804c to
219762a
Compare
jroesch
reviewed
Dec 13, 2021
jroesch
approved these changes
Dec 14, 2021
Thanks @mbs-octoml @jroesch
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 7, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
1) Makes PlanDevices consider lowered calls when solving device domain constraints.
2) Connects the storage scopes on PrimFunc parameters (encoded in the storage_scope field of the PointerType type annotation on each Buffer's data Var) to the memory_scope fields of the SEScopes which PlanDevices unifies over.
3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to account for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to be scheduled, a distinction now possible with TensorIR).

- Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables.
- It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this in 'phase 1' of PlanDevices, which collects all the device constraints implied by
- However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental.
- Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this.

Coming along for the ride:
- To make unit tests of mixed Relay/TIR functions possible, I needed to be able to supply a checked_type to GlobalVar, since that's currently the only way to give a Relay type to PrimFuncs.
- Use GenSym to get unique var names in ANF & partial eval so it is easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute, but until we have better span support it is very hard to work with.
- Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented.
- Make build -Werror clean for clang-12 (mostly move fixups).
- Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 11, 2022
…imFuncs (apache#9689) * [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs This is in support of apache#9613 which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow in to PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies being necessary due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery. * [checkpoint] Junru's comments.
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 11, 2022
* [Relay] PlanDevices supports 'free' on_device annotations This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clashes, but otherwise avoid device_copies when scopes can be chosen so as to avoid them. We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'. # Standard user annotation: body is constrained to S on_device(body, S) # Used by PlanDevices to 'fix' expression to S # (was is_fixed=True) on_device(body, S, constrain_result=True) # Used by PlanDevices to indicate a device_copy can be # inserted if necessary. on_device(body, S, constrain_body=False) # Supported, but currently has no use. on_device(body, S, constrain_result=True, constrain_body=False) A few extra odd's 'n ends collected along the way: - Some CallLowered cleanup which I found useful. - The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies. - Cleanup warnings from clang-12 as I touch files. * [checkpoint] unused var
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 11, 2022
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. 
Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 12, 2022
…imFuncs (apache#9689) * [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs This is in support of apache#9613 which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow in to PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies being necessary due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery. * [checkpoint] Junru's comments.
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 12, 2022
* [Relay] PlanDevices supports 'free' on_device annotations This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clashes, but otherwise avoid device_copies when scopes can be chosen so as to avoid them. We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'. # Standard user annotation: body is constrained to S on_device(body, S) # Used by PlanDevices to 'fix' expression to S # (was is_fixed=True) on_device(body, S, constrain_result=True) # Used by PlanDevices to indicate a device_copy can be # inserted if necessary. on_device(body, S, constrain_body=False) # Supported, but currently has no use. on_device(body, S, constrain_result=True, constrain_body=False) A few extra odd's 'n ends collected along the way: - Some CallLowered cleanup which I found useful. - The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies. - Cleanup warnings from clang-12 as I touch files. * [checkpoint] unused var
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 12, 2022
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. 
Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 13, 2022
…imFuncs (apache#9689) * [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs This is in support of apache#9613 which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow in to PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies being necessary due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery. * [checkpoint] Junru's comments.
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 13, 2022
* [Relay] PlanDevices supports 'free' on_device annotations This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clashes, but otherwise avoid device_copies when scopes can be chosen so as to avoid them. We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'. # Standard user annotation: body is constrained to S on_device(body, S) # Used by PlanDevices to 'fix' expression to S # (was is_fixed=True) on_device(body, S, constrain_result=True) # Used by PlanDevices to indicate a device_copy can be # inserted if necessary. on_device(body, S, constrain_body=False) # Supported, but currently has no use. on_device(body, S, constrain_result=True, constrain_body=False) A few extra odd's 'n ends collected along the way: - Some CallLowered cleanup which I found useful. - The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies. - Cleanup warnings from clang-12 as I touch files. * [checkpoint] unused var
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 13, 2022
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. 
Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
qsqqsqqsq-intellif
pushed a commit
to qsqqsqqsq-intellif/tvm
that referenced
this pull request
Apr 29, 2022
…imFuncs (apache#9689) * [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs This is in support of apache#9613 which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow in to PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies being necessary due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery. * [checkpoint] Junru's comments.
qsqqsqqsq-intellif
pushed a commit
to qsqqsqqsq-intellif/tvm
that referenced
this pull request
Apr 29, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

This is in support of apache#9613, which allows PlanDevices to be run after lowering so as to flow memory constraints in and out of PrimFuncs. That requires a way to insert device_copies when the memory scopes chosen during separate lowering of fused primitive functions clash, but otherwise avoid device_copies when scopes can be chosen so as to avoid them.

We support that by generalizing the "on_device" annotation to allow the device constraint to be independently controlled for its 'body' and 'result'.

    # Standard user annotation: body is constrained to S
    on_device(body, S)

    # Used by PlanDevices to 'fix' expression to S
    # (was is_fixed=True)
    on_device(body, S, constrain_result=True)

    # Used by PlanDevices to indicate a device_copy can be
    # inserted if necessary.
    on_device(body, S, constrain_body=False)

    # Supported, but currently has no use.
    on_device(body, S, constrain_result=True, constrain_body=False)

A few extra odds 'n ends collected along the way:
- Some CallLowered cleanup which I found useful.
- The usual extra debugging output needed as I debugged. In return I removed some particularly verbose logging I'd added while tracking down unexpected object copies.
- Cleanup warnings from clang-12 as I touch files.

* [checkpoint] unused var
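The four flag combinations listed in that commit message can be summarised as a small lookup. This is only an illustrative sketch: `OnDeviceFlags` and `describe` are hypothetical names, not part of TVM, and the defaults are inferred from the commit message (a plain `on_device(body, S)` constrains the body only).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDeviceFlags:
    """Hypothetical stand-in for the two flags on an on_device annotation."""

    constrain_body: bool = True     # on_device(body, S) constrains the body
    constrain_result: bool = False  # ...but not the result, by default

    def describe(self) -> str:
        # Each branch mirrors one comment from the commit message.
        if self.constrain_body and self.constrain_result:
            return "fixed: expression is fixed to S (was is_fixed=True)"
        if self.constrain_body:
            return "standard: body is constrained to S"
        if self.constrain_result:
            return "result-only: supported, but currently has no use"
        return "free: a device_copy can be inserted if necessary"
```

For example, `OnDeviceFlags(constrain_body=False).describe()` reports the 'free' case that PlanDevices uses to mark a point where a device_copy may appear.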
qsqqsqqsq-intellif
pushed a commit
to qsqqsqqsq-intellif/tvm
that referenced
this pull request
Apr 29, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
1) Makes PlanDevices consider lowered calls when solving device domain constraints.
2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over.
3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to account for any memory scope mismatches which are now apparent.
[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to be scheduled, a distinction now possible with TensorIR).

- Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables.
- It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this in 'phase 1' of PlanDevices, which collects all the device constraints implied by
- However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental.
- Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this.

Coming along for the ride:
- To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs.
- Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with.
- Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented.
- Make build -Werror clean for clang-12 (mostly move fixups).
- Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
This PR:
1) Makes PlanDevices consider lowered calls when solving device domain
   constraints.
2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
   Var type annotation PointerTypes storage_scope fields) to the memory_scope
   fields of the SEScopes which PlanDevices unifies over.
3) Allows new device_copies to be inserted on the arguments and results of lowered
   calls so as to account for any memory scope mismatches which are now apparent.
[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to be scheduled,
a distinction now possible with TensorIR).

- Memory scope constraints can flow both out of and in to PrimFuncs
  introduced by LowerTE. In TIR memory scopes are represented by
  'storage scopes' on the PointerType type annotations on TIR Buffer data
  variables.
- It is straightforward to extract memory scopes from PrimFuncs by
  looking at the PrimFunc's buffer_map. We do this in 'phase 1' of
  PlanDevices, which collects all the device constraints implied by
- However, pushing memory constraints in to PrimFuncs is more challenging
  due to buffer aliasing. This aspect is still experimental.
- Allow device_copies to be inserted for both arguments and
  results of PrimFunc calls, on the assumption PlanDevices has
  already established a consistent device assignment prior to
  lowering and any new mismatch is required to match up memory scopes.
  We use the new 'free' on_device annotations to implement this.
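
To make the flow of scopes out of a PrimFunc more concrete, here is a minimal, self-contained sketch. It does not use TVM: `PointerType`, `Var`, `Buffer` and `PrimFunc` below are simplified stand-ins (for instance, `buffer_map` is keyed by parameter name here), and `param_memory_scopes` and `needs_device_copy` are hypothetical helpers. The sketch assumes an empty scope string means "unconstrained".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PointerType:
    storage_scope: str  # "" is treated as unconstrained in this sketch


@dataclass(frozen=True)
class Var:
    name: str
    type_annotation: Optional[PointerType] = None


@dataclass(frozen=True)
class Buffer:
    data: Var  # the buffer's data variable carries the PointerType


@dataclass
class PrimFunc:
    params: List[Var]
    buffer_map: Dict[str, Buffer]  # simplification: keyed by parameter name


def param_memory_scopes(func: PrimFunc) -> List[str]:
    """Read one memory scope per parameter off the buffer_map."""
    scopes = []
    for param in func.params:
        buf = func.buffer_map.get(param.name)
        ann = buf.data.type_annotation if buf is not None else None
        scopes.append(ann.storage_scope if ann is not None else "")
    return scopes


def needs_device_copy(caller_scope: str, callee_scope: str) -> bool:
    """A copy is only needed when both sides are constrained and disagree."""
    return bool(caller_scope) and bool(callee_scope) and caller_scope != callee_scope
```

In this model, an argument already planned for one scope and passed to a parameter whose buffer demands a different scope would be a point where a device_copy is inserted, while an unconstrained side simply adopts the other side's scope.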

Coming along for the ride:
- To make unit tests of mixed Relay/TIR functions possible needed
  to be able to supply a checked_type to GlobalVar since that's currently
  the only way to give a Relay type to PrimFuncs.
- Use GenSym to get unique var names in ANF & partial eval so easier
  to diff debug output between passes and connect program fragments
  back into the overall program. Relying on pretty-printing to
  automagically unique-ify var names is certainly cute but until we
  have better span support is very hard to work with.
- Realized both dead_code.cc and fold_constant.cc would
  happily move values into a different lexical virtual
  device context since device_planner.cc was being
  'clever' and eliding on_devices for let-bound values
  when there's no change. Fixed so that every let-bound
  value has an on_device. Will be much better after
  [RELAY][AST] Add virtual device as a first class field to Relay expressions tvm-rfcs#45 is implemented.
- Make build -Werror clean for clang-12 (mostly move fixups).
- Address post-submit comments from [Relay] PlanDevices supports 'free' on_device annotations #9693.
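
The GenSym item above is about handing out variable names that are unique across passes, and a later checkpoint commit mentions making it thread safe. A minimal sketch of that idea follows. It is not TVM's implementation: the class name `GenSym`, the `fresh` method and the `prefix_N` naming scheme are all assumptions made for illustration.

```python
import itertools
import threading


class GenSym:
    """Hypothetical thread-safe generator of unique variable names."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count()

    def fresh(self, prefix: str = "x") -> str:
        # The lock makes the read-and-increment step atomic, so two threads
        # can never be handed the same suffix.
        with self._lock:
            return f"{prefix}_{next(self._counter)}"
```

Because every name carries a suffix drawn from one shared counter, the same variable keeps the same printed name from pass to pass, which is what makes diffing debug output practical.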