[TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs #9689
Conversation
Have you considered also passing how much memory each PrimFunc can use?
Hi @xqdan, thanks for the comment. That takes us further into memory planning and what @manupa-arm is doing with USMP (including a representation of available memory pools, analysis to determine the conflict set for every abstract buffer, and a realization pass to resolve abstract buffers to physical ones within a pool). Currently the notion of 'memory/storage scope' we are using here is not connected to the USMP memory pools, or indeed to anything at all! It's just a label we push around for use downstream. I see us eventually reconciling the 'flow memory scope constraints' aspect I'm working on here with the 'account for memory scope constraints during scheduling' work, but we'll need to get there gradually.
…imFuncs This is in support of apache#9613, which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow into PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery.
junrushao
left a comment
Overall looks good. Just a few nits. CC: @vinx13
3398250 to 4468dd8
PTAL
Thanks @mbs-octoml @xqdan @junrushao1994
…imFuncs (apache#9689)
* [TIR] Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs
  This is in support of apache#9613, which allows memory scopes to flow out of already-lowered PrimFuncs into the rest of the Relay program. This means scope choices made during lowering can be accounted for in the rest of the program, with device_copies inserted as required. Somewhat more speculatively we also allow memory scopes to flow into PrimFuncs. This is in preparation for when we can split lowering into two phases: i) lower "primitive" fused Relay functions to TensorIR in a schedulable form roughly isomorphic to TE, and ii) actual scheduling down to traditional TIR. Once that split is made it will be possible to flow memory scopes out of one PrimFunc and into another so as to avoid unnecessary device_copies due to independently chosen memory scopes. I also suspect we'll want to put our focus on layouts rather than memory scopes, but this at least sets up some of the machinery.
* [checkpoint] Junru's comments.
This is in support of #9613, which allows memory scopes to flow
out of already-lowered PrimFuncs into the rest of the Relay
program. This means scope choices made during lowering can
be accounted for in the rest of the program, with device_copies
inserted as required.
Somewhat more speculatively, we also allow memory scopes to flow
into PrimFuncs. This is in preparation for when we can split
lowering into two phases: i) lower "primitive" fused Relay
functions to TensorIR in a schedulable form roughly isomorphic
to TE, and ii) actual scheduling down to traditional TIR. Once
that split is made it will be possible to flow memory scopes
out of one PrimFunc and into another, so as to avoid unnecessary
device_copies caused by arbitrary, independently chosen memory scopes.
I also suspect we'll want to put our focus on layouts rather
than memory scopes, but this at least sets up some of the
machinery.
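
To make the two directions of flow concrete, here is a minimal, self-contained Python sketch. It is a toy model and not TVM's API: `ToyPrimFunc`, `find_device_copies`, and `flow_scopes_in` are hypothetical names invented for illustration, and the scope strings are just example labels. It assumes each function carries one scope label per parameter, and that a copy is needed exactly when a producer's output scope differs from the consumer's input scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (producer name, producer param, consumer name, consumer param)
Edge = Tuple[str, str, str, str]


@dataclass
class ToyPrimFunc:
    """Hypothetical stand-in for a lowered function (not a TVM class)."""
    name: str
    # Scope label per parameter; None means no scope has been chosen yet.
    param_scopes: Dict[str, Optional[str]]


def find_device_copies(funcs: Dict[str, ToyPrimFunc],
                       edges: List[Edge]) -> List[Tuple[str, str, str, str]]:
    """Scopes flowing OUT: report a copy wherever two fixed scopes disagree."""
    copies = []
    for prod, out_param, cons, in_param in edges:
        src = funcs[prod].param_scopes[out_param]
        dst = funcs[cons].param_scopes[in_param]
        if src is not None and dst is not None and src != dst:
            copies.append((prod, cons, src, dst))
    return copies


def flow_scopes_in(funcs: Dict[str, ToyPrimFunc], edges: List[Edge]) -> None:
    """Scopes flowing IN: give an unconstrained consumer the producer's scope."""
    for prod, out_param, cons, in_param in edges:
        src = funcs[prod].param_scopes[out_param]
        if src is not None and funcs[cons].param_scopes[in_param] is None:
            funcs[cons].param_scopes[in_param] = src


edges = [("f", "out", "g", "in")]

# Case 1: both functions already chose scopes independently, so a copy is needed.
fixed = {
    "f": ToyPrimFunc("f", {"out": "global.texture"}),
    "g": ToyPrimFunc("g", {"in": "global"}),
}
print(find_device_copies(fixed, edges))

# Case 2: g has not chosen yet, so f's scope flows into it and no copy is needed.
deferred = {
    "f": ToyPrimFunc("f", {"out": "global.texture"}),
    "g": ToyPrimFunc("g", {"in": None}),
}
flow_scopes_in(deferred, edges)
print(find_device_copies(deferred, edges))
```

The second case corresponds to the two-phase lowering described above: if the consumer's scope is still open when the producer's choice is known, the choice can be applied to the consumer rather than reconciled afterwards with a copy.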