[MetaSchedule] Enable anchor-block tuning #13206
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
junrushao left a comment:
This is just amazing!! Thanks for the PR!
Hzfengsy left a comment:
LGTM. Thanks @masahi for the great work!
zxybazh left a comment:
Apart from a few nits, this looks pretty good to me. Very delicate design and comprehensive tests! Thanks @masahi!
Commits:
* Introduce new module equality to extract only anchor block tasks
* enabling application of anchor trace to different subgraph
* fixed anchor block extraction
* fixed UB in task extraction
* Reworked anchor trace application and inlining logic
* fixed anchor block extraction for winograd
* fix inline logic for winograd
* refactor, clean up, renaming
* fix reverse compute inline unapplicable case
* fixed get_block applicablity condition
* adding test
* introduce HasBlock utility
* Decoupled trace creation and application in Trace::ApplyJSONToschedule
* add test
* adding more test
* black
* Revert "Decoupled trace creation and application in Trace::ApplyJSONToschedule" (this reverts commit 02df571)
* add tests
* add doc
* use anchor tuning in hexagon int8 tuning test
* cpplint
* suppress mypy on ffi
* add workaround for false positive maybe-uninitialized warning
* add a minimal anchor tuning test
* relax tol for i386, remove gpu test since it requires sm86
* add doc for "anchor-block" module equality
* address comments
* add test for cache_write + AllocateConst bug
A follow-up PR notes: "Following #13206, this PR brings the new parameter added to the AutoBind schedule rule to the Python side."
Note: Most diffs are from test cases, which are bloated due to many TVMScript modules.
Building on the notion of "module equality" introduced in #13050, I'm adding a new variant of module equality based on "anchor blocks", as defined in #13194.
Currently, MS does tuning at the level of subgraphs. For example, resnet has `conv2d -> add -> relu` and `conv2d -> add -> add -> relu` subgraphs, and the two are treated as distinct tuning tasks even if their anchor-block conv2d workloads are identical. The new module equality identifies them as equal, so it reduces the number of tuning tasks when shorter tuning time is preferred over subgraph-level performance. Currently there is no dedicated API for anchor-block tuning; passing `module_equality="anchor-block"` to task extraction and `tune_relay` enables it.

This is particularly effective for int8 models, since each conv2d / dense (anchor) op is quantized slightly differently, which leaves many similar but not identical elemwise ops fused after the anchor blocks. On the int8 resnet50 model I tested, it reduced the number of conv2d tuning tasks from 36 to 23.
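As a concrete illustration, here is a minimal sketch of enabling anchor-block tuning; the exact `extract_tasks` / `tune_relay` signatures are assumptions based on the MetaSchedule relay integration around the time of this PR, and the float32 resnet50 from `relay.testing` merely stands in for a real int8 model:

```python
# Sketch: enabling anchor-block tuning via module_equality.
# The extract_tasks / tune_relay signatures below are assumptions based on
# tvm.meta_schedule.relay_integration around the time of this PR.
import tvm
from tvm import meta_schedule as ms
from tvm.relay import testing

mod, params = testing.resnet.get_workload(batch_size=1, num_layers=50)
target = tvm.target.Target("llvm -num-cores 8")

# With "anchor-block", subgraphs whose anchor blocks (e.g. conv2d) are
# structurally equal collapse into a single tuning task.
tasks = ms.relay_integration.extract_tasks(
    mod, target, params, module_equality="anchor-block"
)
print(f"{len(tasks)} tasks extracted")

# The same flag on tune_relay makes one tuned anchor trace serve every
# subgraph that shares the anchor block.
database = ms.relay_integration.tune_relay(
    mod=mod,
    params=params,
    target=target,
    work_dir="./tune_logs",
    max_trials_global=2000,
    module_equality="anchor-block",
)
```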
The interesting question is the performance difference between full subgraph-level tuning and anchor-block tuning. To investigate this, I tested the int8 resnet50 mentioned above, where anchor-based tuning makes the biggest difference in the number of extracted tasks. The results are summarized below.
`num_iter_per_task = 32` and `max_iters_per_task = 128` in all cases. Across the model + target combinations I tested, I didn't see much perf difference beyond natural tuning flakiness. The tensorcore result is a bit weird and needs more investigation, but I found that tuning this model with int8 tensor core auto-tensorization is currently incredibly slow (for example, getting the 6.7 result took 12 hours). There is also the correctness issue discussed in #13204, so I haven't done more experiments on tensorcore.
## Applying the trace tuned on an anchor block to the target block
The tricky problem this work addresses is the application of a trace, tuned on a "representative" anchor subgraph, to a target mod that has different post blocks. Note that, in the resnet example with `conv2d -> add -> relu` and `conv2d -> add -> add -> relu` subgraphs, we tune the "smaller" `conv2d -> add -> relu` subgraph, not just the pure anchor block `conv2d`.

So while applying a trace tuned on just `conv2d` to `conv2d -> add` would be trivial (the existing `Trace::ApplyToSchedule` would just work), in practice we would be applying a trace tuned on `conv2d -> add` to `conv2d -> subtract`, for example. My proposed solution is implemented in `src/meta_schedule/trace_apply.cc` and is tested extensively in `test_meta_schedule_trace_apply.py`; a minimal usage sketch follows below.

@junrushao @zxybazh @vinx13 @tkonolige
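For reference, a minimal sketch of driving the trace-apply entry point from Python; the `schedule_using_anchor_trace` binding and its `(sch, anchor_trace, target)` signature are my reading of this PR's FFI surface, so treat them as assumptions:

```python
# Sketch: replay a trace tuned on an anchor subgraph onto a target module
# with different post blocks. The schedule_using_anchor_trace binding is
# assumed from this PR's FFI changes (see test_meta_schedule_trace_apply.py).
import tvm
from tvm import tir
from tvm.target import Target
from tvm.meta_schedule.trace_apply import schedule_using_anchor_trace


def apply_anchor_trace(
    anchor_sch: tir.Schedule, target_mod: tvm.IRModule, target: Target
) -> tir.Schedule:
    """Apply a tuned anchor trace (e.g. from conv2d -> add -> relu) to
    target_mod (e.g. conv2d -> add -> add -> relu), letting the trace-apply
    logic inline or adjust the post blocks that differ."""
    target_sch = tir.Schedule(target_mod)
    schedule_using_anchor_trace(target_sch, anchor_sch.trace, target)
    return target_sch
```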