[Adreno] Enable static texture planning for constant data #11357
Conversation
Force-pushed from 35aec2e to 467d321
@csullivan @mbs-octoml PR is ready for review
mbs-octoml left a comment
Thanks, this is quite the heroic piece of work.
Can I suggest splitting this into smaller chunks?
- Plumbing mem scope through TE (and perhaps handle the mem-scope-per-output issue)
- Mem scope in graph executor mem planner, codegen & runtime.
- 1D & 2D mem pools in graph executor runtime
- Schedules
- New annotator pass, which perhaps should also be made target-specific, since the rules for deriving scope from shape seem pretty target-specific. I.e. it should be moved to a target-specific dir, and the pass can be the identity if the current target does not match (a sketch of such gating follows this comment). Unfortunately there is no generic mechanism for registering a target-specific pass that will work for this.
I'm not sure how to reconcile the fairly generic multi-dimensional buffer support we now have with the hard 1d vs 2d distinction you're adding in graph executor. Is there a discussion about that someplace I've missed?
I can't quite tell if your StorageInfo pass supports tuples & tuple projection. That whole pass needs commenting. But in any case worth mentioning you've eschewed the collect-and-solve constraint approach used by plan devices in favor of an eager back propagation of scopes from argument to constants. Fair enough, but please document the subset of relay this supports.
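Regarding the target-gating suggested in the last bullet above, here is a minimal sketch of how a pass could act as the identity for non-matching targets. The pass name AnnotateMemoryScope and the "adreno" key check are assumptions for illustration, not taken from this PR:

```python
# Illustrative only: gate a Relay-level pass on the current target, acting as
# the identity when the target is not an Adreno-class OpenCL target.
import tvm

@tvm.transform.module_pass(opt_level=0, name="TargetGatedAnnotator")
def target_gated_annotator(mod, ctx):
    target = tvm.target.Target.current(allow_none=True)
    if target is None or "adreno" not in target.keys:
        return mod  # identity for non-Adreno targets
    return AnnotateMemoryScope()(mod)  # hypothetical Adreno-specific annotation pass
```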
primitive_supports_texture_ = false;
Visit(call->op);
if (primitive_supports_texture_) {
  if (call->checked_type().as<TensorTypeNode>()) {
See tvm::relay::FlattenTupleType
This is done by intent so far: if we used FlattenTupleType, we would analyze each tensor individually, which is wrong for this case. My current impression is that if a tuple is planned by the static memory planner, the memory scope should be the same for all tensors contained in the tuple.
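A minimal illustrative sketch of that intent (a hypothetical helper, not the PR's C++ visitor): when the result type is a tuple, one shared scope is applied to every field instead of analyzing each tensor separately.

```python
from tvm.ir import TensorType, TupleType

def scopes_for_result(result_type, scope):
    # Hypothetical helper: one scope entry per tensor in the (possibly tuple)
    # result type, with all tuple fields sharing the same memory scope.
    if isinstance(result_type, TupleType):
        return [scope] * len(result_type.fields)
    assert isinstance(result_type, TensorType)
    return [scope]
```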
Force-pushed from f13a402 to 23e5d7f
Co-authored-by: Chris Sullivan <[email protected]>
Force-pushed from 23e5d7f to b73066b
Force-pushed from b73066b to 02bc11c
I am closing this PR because I split it into several ones:
@mbs-octoml - several more answers to the initial comment:
The annotation pass consists of two parts: a generic one and a target-specific one. The generic one walks the graph, detects the targets, and then constructs the function name from all found targets. It is not very robust for multiple targets, but it works deterministically for any single target. We are introducing an Adreno-specific transformation, but others can easily be added. If the Adreno-specific part needs to move into a target-specific directory, I would appreciate a suggestion for the proper place for target-specific Relay transformations.
There is no fairly generic multi-dimensional buffer in the memory planner: the memory manager currently operates on contiguous 1D memory blocks, in other words flattened 1D buffers. Later on, in TIR, this flattened memory can be managed as a multi-dimensional array using TVM arithmetic, but at the memory-management stage it is always represented as 1D memory. Until this PR.
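As a minimal illustration of that point (the shapes and names below are made up, not from the planner), a flat 1D block can be addressed as a 2D array in TE/TIR purely through index arithmetic:

```python
from tvm import te

H, W = 64, 64
flat = te.placeholder((H * W,), name="flat_pool")  # 1D block, as the planner sees it
as_2d = te.compute(
    (H, W), lambda i, j: flat[i * W + j], name="as_2d_view"
)  # row-major 2D view realized with index arithmetic
```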
It is still an open question how to deal with tuples. I need some examples of ops that produce tuples and are not merged into a primitive function. Could you please share which ops generate a tuple of tensors as their output?
This PR is a split part of origin PR #11357 Co-authored-by: Chris Sullivan <[email protected]>
The previous PR #11161 added TOPI compute and schedules that manage textures in dynamic mode through explicit calls to cache_read in Adreno primitive schedules. This PR adds a top-down approach to memory annotation, plus support in the dependent parts.
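For context, a minimal sketch of the "dynamic mode" mechanism referred to above: an explicit cache_read into the texture memory scope inside a TE schedule. The op and shapes here are made up, and actually lowering this requires an Adreno OpenCL target with a texture-friendly layout:

```python
from tvm import te

A = te.placeholder((1, 32, 56, 56), name="A")
B = te.compute(A.shape, lambda *i: A(*i) * 2.0, name="B")

s = te.create_schedule(B.op)
# Stage the input through the 2D texture scope before B consumes it.
A_tex = s.cache_read(A, "global.texture", [B])
```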