
Commit 456c58d

jcf94, merrymercy, minminsun, and FrozenGene authored
[Ansor][AutoTVM v2.0] Part 0: Ansor minimum system for auto schedule generating (#5962)
* Code migration Start (#1)
* Init commit: Code migration Start
* Add loop_state.cc/h
* Add ComputeDAG basic test
* Split transform_step out & Update more UTs (#3)
* Split transform_step out
* Update GetProducers & GetConsumers
* Update UTs
* Add UT for CacheReadWrite & Some bug fix
* Add search_task, measure and serialization (#4)
* Add FollowSplit & FollowFusedSplit tests
* Update dag.InferBound & its UT
* Add search_task, measure and serialization
* Update Serialization UT
* Add MetaTileRewritePolicy (#5)
* Add feature
* Add cost_model, meta_tile_rewrite_policy
* Add MetaTileRewritePolicy basic UT
* Basic Python API for State (#6)
* Add Basic Python API for State
* Add UTs for State
* Add Python API: Measure & Task (#7)
* Update the return value of state operation
* Add task
* Copy measure.py & utils.py
* Fix LocalBuilder
* Fix LocalRunner
* Add ansor.auto_schedule() API; First AutoSchedule working version (#8)
* Add basic Python support for ansor.auto_schedule
* Update AutoSchedule API
* Bug fix for getting the attach point of a fused iter
* Update UT after infer bug fix
* Bug fix & Add python serialization API (#10)
* Delete C++ UT hack since Python is ready
* Add ndarray.non_empty
* Update Serialization python API
* Improve code style, python wrapper and test cases (#11)
* Update c++ code style and unit test
* Update python State wrapper and test cases
* fix unit tests
* Add RPCRunner & OpenCL/CUDA test (#12)
* Add RPCRunner & OpenCL search test
* Add CUDA search test
* Add RPCRunner test
* rebase to upstream/master
* Add Ansor basic tutorial (#13)
* Add basic tutorial
* migrate feature extraction (#14)
* Add XGBModel & RPCRunnerWrapper (#15)
* Add XGBModel & RPCRunnerWrapper
* Revert "Add Parallel Granularity Mutation"
* Migrate workload_registry.py (#16)
* add workload registry
* update
* update
* add task scheduler (#17)
* Add conv2d cuda tutorial with workload registry (#18)
* add tune_test.py (the old tune_wkl.py) (#19)
* add tune_test.py (the old tune_wkl.py)
* update
* fix measure
* fix for gpu
* Code refine for tune_test.py & Add a pre-load callback (#20)
* Bug fix for tutorials
* Add PreLoadMeasuredStates
* Add search_callback support for task tuner
* Code refine for tune_test.py
* Update
* Update
* Update
* Update
* Bug fix
* Add python custom sketch rule (#21)
* Add custom sketch rule
* Bug fix
* Ansor Relay Integration (without layout rewrite) (#22)
* relay integration
* Add tune_op_subgraph.py & Some code clean for tune_network.py (#23)
* Add single op tune scripts
* Add tune subgraph support
* Merge all op & all subgraph to one file
* Rename file
* add explicit_unroll_max_extent (#25)
* Add Index simplification & API update (#26)
* Add vectorized cooperative_fetching test
* Update math simplify for vectorized CF
* File rename
* Update tune_network
* API update
* Update PreLoadMeasuredStates & Some bug fix (#27)
* Add a threading wrapper to fix the test bug
* Set default TVM_USE_AUTO_SCHEDULER to false
* Update PreLoadMeasuredStates callback
* Add tensorize step for loop_state (#31)
* Add tensorize step
* State python api update (#33)
* Start to update api
* Add compute_dag to state
* API update
* kernel layout rewrite (#28)
* kernel layout rewrite
* remove some hacks
* add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass
* set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite
* [cache flush] port cache flush to ansor (#32)
* Improve relay integration (#34)
* tmp checkpoint
* Improve relay integration
* Improve relay integration
* Fix xgb error & Simplify dispatcher (#35)
* Rename "MetaTileRewritePolicy" to "SketchPolicy". (#36)
* Rename "MetaTileRewritePolicy" to "SketchPolicy".
* Add a new class for auto_unroll_max_step, storage_offset in StageNode
* fix tune_op_subgraph.py
* rebase
* Migrate all node::make to noderef's construct function (#37)
* Start to move xxxnode::make to noderef()
* Update
* Update
* Finish transform_step
* Finish compute dag & auto schedule
* Update
* Update
* Update
* Update
* Update
* Code refine
* Code refine
* Code refine
* Update
* Update
* Some lint fix & Recover the double constructor of tvm::PrimExpr (#39)
* lint fix
* clang-format-fix
* pylint fix
* Update
* Recover the double constructor of tvm::PrimExpr
* Fix pylint
* pylint fix
* pylint fix
* Add MutateComputeLocation and MutateParallel in evolutionary search (#40)
* Add MutateComputeLocation and MutateParallel in evolutionary search
* fix lint
* Improve loop state python API (stage_tensors -> stage_ops) (#41)
* improve loop state python API (stage_tensors -> stage_ops)
* fix
* ComputeDAG bug fix & Add Custom TensorCore Matmul Example (#42)
* Bug Fix
* Sample example of Custom TensorCore Matmul
* Revert Commits, Start to build minimum Ansor system
* Code clean for minimum Ansor system
* Bug fix & Delete AccessAnalyzer
* Delete attachmap & Code clean
* Doc update: Update statenode::stages from vector to Array
* Header file update & Python doc update
* clang-format fix
* pylint fix
* Update
* Doc update
* Update
* Bug fix after code merge to the new master
* clang-format fix
* Update
* Update
* Update std::vector to Array; Update verbosity setting; Some comments addressed
* std::vector->Array & std::string->String
* Add init_state to ComputeDAG
* Update
* Update some unordered_map to Map
* clang-format fix
* Comments addressed: Delete ReplayAndInferBound; Delete ReplaySteps & InferBoundCommon
* Lint fix
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Rename ansor namespace to auto_schedule
* Update
* Rename ThreadPool to ParallelFor
* Add parallel_for
* Remove ThreadPool
* Update python/tvm/auto_schedule/auto_schedule.py
* trigger CI

Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Minmin Sun (孙敏敏) <[email protected]>
Co-authored-by: Zhao Wu <[email protected]>
1 parent a23592c · commit 456c58d

35 files changed: +6266 −0 lines changed

CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -185,6 +185,7 @@ assign_source_group("Include" ${GROUP_INCLUDE})
 
 # Source file lists
 file(GLOB_RECURSE COMPILER_SRCS
+    src/auto_schedule/*.cc
     src/node/*.cc
     src/ir/*.cc
     src/arith/*.cc
python/tvm/auto_schedule/__init__.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=unused-import, redefined-builtin
""" Namespace for TVM Auto-scheduler. """

from . import compute_dag
from . import measure
from . import measure_record
from . import loop_state
from . import utils
from . import workload_registry

# Shortcut
from .compute_dag import ComputeDAG
from .auto_schedule import SearchTask, TuningOptions, HardwareParams, \
    auto_schedule, EmptyPolicy
from .measure import MeasureInput, LocalBuilder, LocalRunner
from .measure_record import RecordToFile, RecordReader, load_best, \
    load_records, save_records
from .workload_registry import register_workload, make_workload_key
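The file above only wires up the public namespace. For orientation, here is a minimal sketch of how the shortcut imports are meant to be used (illustrative, not part of this commit; the shapes and names are made-up examples):

import tvm
from tvm import te
from tvm import auto_schedule

# Declare a computation with TE, then capture it as a ComputeDAG,
# the data structure the auto-scheduler searches over.
A = te.placeholder((128, 128), name="A")
B = te.placeholder((128, 128), name="B")
C = te.compute((128, 128), lambda i, j: A[i, j] + B[i, j], name="C")
dag = auto_schedule.ComputeDAG([A, B, C])
print(dag)  # inspect the captured compute declaration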
python/tvm/auto_schedule/_ffi_api.py

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

""" Register FFI APIs from C++ for the namespace tvm.auto_schedule. """
import tvm._ffi


tvm._ffi._init_api("auto_schedule", __name__)
python/tvm/auto_schedule/auto_schedule.py

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

"""
User interface for TVM Auto-scheduler.

The basic schedule search process for TVM Auto-scheduler is designed to be:
`Program sampling` -> `Performance Tuning`.

In `Program sampling`, we use predefined precise or heuristic rules to generate several
initial schedules. Starting from these initial points, `Performance Tuning` uses
cost-model-based evolutionary search to select the schedules with the best performance.

Candidate schedules are measured on the specific hardware target.
"""

import tvm._ffi
from tvm.runtime import Object
from .measure import LocalBuilder, LocalRunner
from . import _ffi_api


@tvm._ffi.register_object("auto_schedule.HardwareParams")
class HardwareParams(Object):
    """ The parameters of the target hardware, used to guide the search policy.

    TODO(jcf94): This is expected to be merged with the new Target specification:
    https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844

    Parameters
    ----------
    num_cores : int
        The number of device cores.
    vector_unit_bytes : int
        The width of vector units in bytes.
    cache_line_bytes : int
        The size of a cache line in bytes.
    """
    def __init__(self, num_cores, vector_unit_bytes, cache_line_bytes):
        self.__init_handle_by_constructor__(_ffi_api.HardwareParams, num_cores,
                                            vector_unit_bytes, cache_line_bytes)


@tvm._ffi.register_object("auto_schedule.SearchTask")
class SearchTask(Object):
    """ The computation information and hardware parameters for a specific schedule search task.

    Parameters
    ----------
    dag : ComputeDAG
        The ComputeDAG for the corresponding compute declaration.
    workload_key : str
        The workload key for the corresponding compute declaration.
    target : tvm.target.Target
        The target device of this search task.
    target_host : Optional[tvm.target.Target]
        The target host device of this search task.
    hardware_params : Optional[HardwareParams]
        Hardware parameters used in this search task.
    """
    def __init__(self, dag, workload_key, target, target_host=None,
                 hardware_params=None):
        self.__init_handle_by_constructor__(_ffi_api.SearchTask, dag,
                                            workload_key, target, target_host,
                                            hardware_params)

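# Illustrative usage of the two classes above (not part of this commit;
# all parameter values below are example assumptions):
#
#   hw = HardwareParams(num_cores=4, vector_unit_bytes=16, cache_line_bytes=64)
#   task = SearchTask(dag, "example_workload_key", tvm.target.create("llvm"),
#                     hardware_params=hw)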
@tvm._ffi.register_object("auto_schedule.SearchPolicy")
83+
class SearchPolicy(Object):
84+
""" The base class of search policies. """
85+
86+
87+
@tvm._ffi.register_object("auto_schedule.EmptyPolicy")
88+
class EmptyPolicy(SearchPolicy):
89+
""" This is an example empty search policy which will always generate
90+
the init state of ComputeDAG.
91+
"""
92+
def __init__(self):
93+
self.__init_handle_by_constructor__(_ffi_api.EmptyPolicy)
94+
95+
96+
@tvm._ffi.register_object("auto_schedule.TuningOptions")
97+
class TuningOptions(Object):
98+
""" This controls the options of performance tuning.
99+
100+
Parameters
101+
----------
102+
num_measure_trials: int = 0
103+
The number of measurement trials.
104+
The search policy measures `num_measure_trials` schedules in total and returns the best one
105+
among them.
106+
With `num_measure_trials` == 0, the policy will do the schedule search but won't involve
107+
measurement. This can be used to get a runnable schedule quickly without auto-tuning.
108+
early_stopping: Optional[int]
109+
Stop the tuning early if getting no improvement after n measurements.
110+
num_measures_per_round: int = 64
111+
The number of schedules to be measured at each search round.
112+
The whole schedule search process will try a total number of `num_measure_trials` in several
113+
rounds.
114+
verbose: int = 1
115+
Verbosity level. 0 for silent, 1 to output information during schedule search.
116+
builder: Union[ProgramBuilder, str] = 'local'
117+
ProgramBuilder which builds the program.
118+
runner: Union[ProgramRunner, str] = 'local'
119+
ProgramRunner which runs the program and measures time costs.
120+
measure_callbacks: Optional[List[MeasureCallback]]
121+
Callback functions called after each measurement.
122+
Candidates:
123+
- auto_schedule.RecordToFile
124+
pre_search_callbacks: Optional[List[SearchCallback]]
125+
Callback functions called before the search process.
126+
Candidates:
127+
- auto_schedule.PreloadMeasuredStates
128+
- auto_schedule.PreloadCustomSketchRule
129+
TODO(jcf94): Add these implementation in later PRs.
130+
"""
131+
def __init__(self, num_measure_trials=0, early_stopping=None, num_measures_per_round=64,
132+
verbose=1, builder='local', runner='local', measure_callbacks=None,
133+
pre_search_callbacks=None):
134+
if isinstance(builder, str):
135+
if builder == 'local':
136+
builder = LocalBuilder()
137+
else:
138+
raise ValueError("Invalid builder: " + builder)
139+
elif not isinstance(builder, tvm.auto_schedule.measure.ProgramBuilder):
140+
raise ValueError("Invalid builder: " + builder +
141+
" . TuningOptions expects a ProgramBuilder or string.")
142+
143+
if isinstance(runner, str):
144+
if runner == 'local':
145+
runner = LocalRunner()
146+
else:
147+
raise ValueError("Invalid runner: " + runner)
148+
elif not isinstance(runner, tvm.auto_schedule.measure.ProgramRunner):
149+
raise ValueError("Invalid runner: " + runner +
150+
" . TuningOptions expects a ProgramRunner or string.")
151+
152+
self.__init_handle_by_constructor__(
153+
_ffi_api.TuningOptions, num_measure_trials, early_stopping if early_stopping else -1,
154+
num_measures_per_round, verbose, builder, runner, measure_callbacks,
155+
pre_search_callbacks)
156+
157+
158+
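# Illustrative TuningOptions construction (not part of this commit; the
# trial counts and file name below are example assumptions):
#
#   tune_option = TuningOptions(num_measure_trials=64,
#                               early_stopping=16,
#                               measure_callbacks=[RecordToFile("records.json")])
#
# With num_measure_trials=0 (the default), the search runs without
# building or measuring any candidate on hardware.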
def auto_schedule(task, search_policy='default', tuning_options=None):
    """ Do auto scheduling for a computation declaration.

    Parameters
    ----------
    task : SearchTask
        The SearchTask for the computation declaration.
    search_policy : Union[SearchPolicy, str] = 'default'
        The search policy to be used for schedule search.
    tuning_options : Optional[TuningOptions]
        Tuning and measurement options.

    Returns
    -------
    A `te.Schedule` and a list of `te.Tensor`, ready to be used in `tvm.lower` or `tvm.build`.
    """
    if not isinstance(task, SearchTask):
        raise ValueError("Invalid task: " + str(task) +
                         ". `auto_schedule.auto_schedule` expects a SearchTask.")

    if isinstance(search_policy, str):
        if search_policy == 'default':
            # TODO(jcf94): This is an example policy for the minimum system; it will be
            # upgraded to a formal search policy later.
            search_policy = EmptyPolicy()
        else:
            raise ValueError("Invalid search policy: " + search_policy)
    elif not isinstance(search_policy, SearchPolicy):
        raise ValueError("Invalid search policy: " + str(search_policy) +
                         ". `auto_schedule.auto_schedule` expects a SearchPolicy or a string.")

    sch, tensors = _ffi_api.AutoSchedule(task, search_policy,
                                         tuning_options if tuning_options else TuningOptions())
    return sch, tensors
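Putting the pieces together, here is a minimal end-to-end sketch of the API introduced by this commit (illustrative only; the workload key string, shapes, and target are assumptions, and with the default EmptyPolicy the result is simply the un-tuned initial schedule):

import tvm
from tvm import te
from tvm import auto_schedule

# A simple matmul declared with TE.
N = 128
A = te.placeholder((N, N), name="A")
B = te.placeholder((N, N), name="B")
k = te.reduce_axis((0, N), name="k")
C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# Wrap the computation and describe the search task for a local CPU target.
dag = auto_schedule.ComputeDAG([A, B, C])
task = auto_schedule.SearchTask(dag, "example_matmul", tvm.target.create("llvm"))

# num_measure_trials=0: run the schedule search without measuring on hardware.
sch, args = auto_schedule.auto_schedule(
    task, tuning_options=auto_schedule.TuningOptions(num_measure_trials=0))
print(tvm.lower(sch, args, simple_mode=True))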
