Please leave any comments or edit this issue directly to adjust the release notes! Also see the rc0 vote thread in #13026.
Introduction
The TVM community has worked since the v0.9 release to deliver the following exciting new improvements!
- Metaschedule
  - Software pipelining and padding for irregular shapes for auto tensorization
  - Stabilized and polished user-interfaces (e.g. database changes, tune_relay)
  - A new MLP-based cost model
- TIR
  - New schedule primitive for PadEinsum
  - A new TIR node: DeclBuffer
  - INT8 Intrinsics for TensorCores for CUDA!
- microTVM
  - Improved schedule primitives for ARM v8-m ISA
And many other general improvements to code quality, TVMScript, and more! Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
- Issue Triage Workflow RFC (#93) (3345cc1)
- [RFC] Add Commit Message Guideline (#88) (e8a2d8b)
- Add Target Features RFC (#78) (1ab898d)
- [RFC] TVMScript Metaprogramming (#79) (ffbf686)
- Add Target Pre-processing RFC (#71) (78423c5)
- [RFC] Name mangling in IRModules (#84) (831d702)
- Asynchronous stage in software pipeline (#80) (aecb219)
- [RFC] Buffer Layout Padding (#77) (ca695fe)
- [RFC] Create LLVM scope class for use with LLVM libraries (#83) (22d1d11)
What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.9. Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
RELAY
- [Relay] Extract intermediate node by its expression ID #12646
- [Relay][Op] Multinomial #12284
- [Relay][Op] Trilu operator implementation #12124
- [Relay] Allow Primitive functions to carry virtual device annotations in PlanDevices #12095
- [Relay] Move TOpPattern registration for nn.* to C++ #12072
- [Relay] CaptureIndexInSpans debugging pass #11926
FRONTEND
- [TVM PyTorch Integration] libstdc++ CXX11 ABI Compatibility & boolean tensor support #12232
- [Relay][Frontend][Onnx] Add RNN operation for ONNX frontend #12213
- TVM Vertical Integration with PyTorch #11911
- [QNN] Support different qnn params between in/out tensor in leaky_relu #12116
- [Pytorch] add aten::rnn_tanh, aten::rnn_relu #12017
TIR
- [TIR] Implement API for padded layout transformations #12720
- [TIR] Construct the inverse in SuggestIndexMap #12797
- [TIR] Support pattern matching argmax/argmin generated by TOPI #12827
- [TIR, Schedule] Add schedule primitive PadEinsum #12750
- [TIR][Meta-Schedule] Tuple-reduction scheduling support #11639
- [TIR][Arith] Add more strict checking in imm construction and folding. #12515
- [TIR, Schedule] Check consumer in-bound and covered in reverse_compute_inline #12717
- [TIR] Handle axis_separators during FlattenBuffer #12652
- [TIR] Expose MMA-related PTX builtins #12623
- [TIR] More hygienic TVM_SREF macros #12607
- [TIR][Schedule] enhance compute_at and reverse_compute_at primitive to choose possible position #12450
- [TIR] Expose Memory Copy-Related PTX Builtins #12611
- [TIR] Expose WMMA-related TensorCore builtins #12589
- [TIR][CompactBufferAllocation] Improve upperbound estimation of buffer compaction #12527
- [TIR] Add pass to check for out of bounds memory access #12352
- [TIR][Schedule] Support for specific consumer block targeting in cache_read #12505
- [TIR] Support AllocateConst nodes in TensorIR scheduling flow #12489
- [TIR][Schedule][UX] Beautify TIR Trace Printing #12507
- [TIR] Expose TVM Backend API-related Builtins and Misc #12468
- [TIR] Add pass ManifestSharedMemoryLocalStage #12355
- [TIR] Add tir::builtin::undef #12266
- [TIR] Add DeclBuffer IR node and functors #12300
- [TIR] Add tir::builtin::assume #12267
- [UnitTest][TIR] Testing utility for before/after transform tests #12264
- [ROOFLINE] Add CUDA support to roofline analysis #12205
- [TIR] Asynchronous stage in software pipeline #12171
- [TIR][Schedule] DecomposePadding #12174
- [TIR Pass] Decouple flatten buffer to lower opaque block and flatten buffer. #12172
- [TIR] Well-Formed Verifier #12166
- [TIR] Moved PrimExpr operator overload from op.h to expr.h #11973
- [TIR][Schedule] Refactor Tensorize #12070
- [TIR] Make conversion from Integer to int64_t explicit #12010
- [TIR] Add sugar method Schedule.work_on #11999
METASCHEDULE
- [Metaschedule] MultiLevelTiling for wide vector architectures #12845
- [MetaSchedule] PyDatabase Complete Function Reload Support #12838
- [MetaSchedule] Support padding for irregular shapes for CUDA tensor core #12759
- [MetaSchedule][Test] MLT uses SEqual tests #12805
- [MetaSchedule] Enable Clone Function for Task-Level Classes #12796
- [MetaSchedule][Test] Migrate check_trace to check_sketch #12764
- [MetaSchedule][Testing] Migrate Add-RFactor to use SEqual #12758
- [MetaSchedule][UX] Convenient Object Creation #12643
- [MetaSchedule] Introduce Union and OrderedUnion in Database #12628
- [MetaSchedule] Introduce ScheduleFnDatabase #12626
- [MetaSchedule][UX] Make Database with-able #12520
- [MetaSchedule] Add software pipeline in CUDA tensor core auto tensorization #12544
- [MetaSchedule] Migrate MemoryDatabase to C++ #12514
- [MetaSchedule] Implement ScheduleFn as a C++ class #12513
- [MetaSchedule] Extend tune_tir to support tuning of specific blocks. #12342
- [MetaSchedule] Enhance Conv2d NCHW Winograd Schedule Rules #12127
- [MetaSchedule][Test] Add unittests for TBG #12262
- [MetaSchedule][Test] Add unittests for CBR #12252
- [MetaSchedule][Test] Add unittests for SFM #12251
- [MetaSchedule][Test] Add unittests for NRM #12250
- [MetaSchedule][Test] Add unittests for T2D #12249
- [MetaSchedule][Test] Add unittests for GRP #12246
- [MetaSchedule][Test] Add unittests for GMM #12243
- [MetaSchedule, Testing] Generalize in/out dtype of testing te workloads #12122
- [MetaSchedule] Allow MultiLevelTilingTensorCore rule to specify multiple tensor intrin groups #12113
- [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059
- [MetaSchedule][Test] Add unittests for DIL #12077
- [MetaSchedule][Test] Add unittests for DEP #12071
- [MetaSchedule] Added a cost model #11961
- [MetaSchedule][Test] Add unittests for CAP #12047
- [MetaSchedule][Test] Add unittests for C3D #12046
- [MetaSchedule][Test] Add unittests for C2D #12043
- [MetaSchedule][Testing] Add unittests for C1D search space #12036
- [MetaSchedule][Testing] Test search space of conv1d #12032
- [MetaSchedule] Handle 'warp_execution' in RewriteCooperativeFetch #11955
- [MetaSchedule] Tuning Script Upgrade #11797
- [MetaSchedule] Handle 'warp_execution' implied extend of threadIdx.x in VerifyGpuCode #11949
TVMSCRIPT
- [TVMScript] IRBuilder methods for Stmt #12831
- [TVMScript] IRBuilder methods for Stmt #12830
- [TVMScript] Add more helper functions to the printer infra #12829
- [TVMScript] IRBuilder methods for Block #12815
- [TVMScript] IRBuilder methods for Axis #12808
- [TVMScript] IRBuilder methods for For #12786
- [TVMScript] IRBuilder methods for PrimFunc #12755
- [TVMScript] Base IRBuilder methods for Block #12748
- [TVMScript] Base IRBuilder methods for PrimFunc #12745
- [TVMScript] IRBuilder methods for IRModule #12694
- [TIR, TVMScript] Update printer / parser to make T.allocate return buffer var #12412
- [TVMScript] Printer: add boolean operators to OperationDoc #12518
- [TVMScript] Printer entry point #12462
- [TVMScript] IRBuilder, IRBuilderFrame base class #12482
- [TVMScript] Printer IRDocsifier #12396
- [TVMScript] Printer VarTable #12336
- [TVMScript] Printer Frame #12366
- [TVMScript] Text underlining in DocPrinter based on Doc's source_paths #12344
- [TVMScript] Printer Registry #12237
- [TVMScript] TracedObject class that simplifies tracing ObjectPaths #12299
- [TVMScript] Add object path tracing to StructuralEqual #12101
- [TVMScript] Python Expression Precedence #12148
- [TVMScript] Doc Definition #12244
- [UX] highlight tvm script #12197
- [TVMScript] StmtDoc Printing #12112
- [TVMScript] StmtDoc Definitions #12111
- [TVMScript] ExprDoc #12048
- [TVMScript] Add ObjectPath class #11977
- [TVMScript] Doc Base Class & DocPrinter Scaffolding #11971
MICROTVM
- Add Arm DSP implementation of Depthwise Conv2D #12448
- [skip ci][microTVM] Add pytest-xdist to pyproject.toml #12478
- [microTVM] Zephyr: Add support for FVP #12125
- Pass that removes reshapes post LowerTE #12215
- [microTVM] Refactor pytest fixtures #12207
- [microTVM][Zephyr][projectAPI] Minimize project build commands #12209
- [microtvm][RVM] Refactor Arduino/Zephyr into one RVM #12023
- [microTVM] Autotuning performance tests #11782
- aot - [AOT] Add AOTLowerMain pass to lower a Relay main into TIR #12550, [microTVM][tutorial] AOT host-driven tutorial with TFLite model #12182, [Texture] Add 2d memory support into static memory planner #11876
BYOC
- adreno - [Adreno] Change compute/schedule for ToMixedPrecision pass #12537, [Adreno][OpenCL] Get rid of extra memory copy #12286, [Adreno] Add markup pass of relay tensors for static texture planning #11878
- collage - [Collage] PruneCandidates and demo_collage_partition.py #12105, [Collage] CollagePartition pass #12086, [Collage] CombinerRule and CandidatePartition::EstimateCost #12078, [Collage] PartitionRule (though without CombinePartitionRule) #11993, [Collage] SubGraphs #11981
- cmsis-nn - [CMSIS-NN] Pad fusion with QNN Conv2D #12353, [CMSIS-NN] Re-use CPU Target Parser #12320, [CMSIS-NN][Perf] Converted Relay Conv2D into CMSIS-NN Depthwise #12006
- DNNL - [BYOC-DNNL] add post_sum pattern #12151, [BYOC-DNNL] support more post-ops #12002, [BYOC-DNNL] rewrite downsize blocks for resnetv1 to get better performance #11822
- micronpu - [microNPU] Reorder copies and computes based on the cycle count #11591, [microNPU] Add support for hard swish #12120, [microNPU] Add MergeConstants pass #12029, [microNPU] Calculate memory pressure for microNPU external functions #11209
- opencl - [OpenCLML] More ops and network coverage #12762, [OpenCL] Enable OpenCL for GPU tests #12490, [OpenCLML] CLML Profiling fixes corresponding to OpenCL Timer recent … #12711
- [BYOC] Switch TensorRT BYOC integration to IRModule-at-a-time using RelayToTIR hook #11979
- [BYOC] Handle constants in IRModule-at-a-time external codegen #11770
ETHOSN
- [ETHOSN] Add support for transpose convolution #12674
- [ETHOSN] Use pytest parameterization for integration tests #12688
- [ETHOSN] Fix tests pylint errors #12649
- [ETHOSN] Support conversion of add to depthwise #12531
- [ETHOSN] Support multiply conversion to depthwise #12403
- [ETHOSN] Add support for Resize #12535
- [ETHOSN] Remove support for older versions of the driver stack #12347
- [ETHOSN] Add support for Requantize #12384
- [ETHOSN] Supply output tensor to issupported checks #11944
- [ETHOSN] Upgrade NPU driver stack to v22.05 #11759
- [ETHOSN] Get buffer sizes from the compiled network #12160
HEXAGON
- [Hexagon] 2-Stage Pipeline; Lower Async TIR primitives to Hexagon User DMA #12785
- [Hexagon] Create test examples to show parallelization #12654
- [Hexagon] Create tests to showcase vtcm loading capabilities on Hexagon. #12667
- [Hexagon] Add Hand written HVX conv2d #12204
- [TOPI][Hexagon] Implement quantized elementwise for hexagon #12606
- [HEXAGON] [TOPI] Dequantize #12677
- [Hexagon] Implement fixed_point_multiply op through intrinsics. #12659
- [Hexagon] Asynchronous DMA support #12411
- [Hexagon] Initial support for meta schedule tuning #12587
- [TOPI][Hexagon] Implement quantized avgpool #12340
- [hexagon][topi] add sliced max_pool2 #12169
- [TOPI][HEXAGON] Implement depthwise conv2d slice op. #12218
- [TOPI] [HEXAGON] Tanh Float16 Slice Op #12165
- [HEXAGON] QCOM hexagon library (qhl) #12149
- [Hexagon] Slice op relu #11449
- [Topi][Hexagon] Implement Cast F32ToF16 and F16ToF32 Slice Op #11561
- [TOPI] [Hexagon] Reshape slice op #11983
- [Topi] [Hexagon] Conv2d slice op initial version #11489
- [Hexagon] Enable int8 vlut codegen for Relay take (LUT) operator #11693
- [TOPI] [Hexagon] Batch flatten slice op initial version #11522
- [TOPI][Hexagon] Implement Argmax Slice Op #11847
CI / TESTING
- [microTVM][CI] Rename ci_qemu to ci_cortexm #12281
- [Testing] Add decorator tvm.testing.requires_cuda_compute_version #12778
- [ci] Add bot to post welcome comment #12695
- [ci] Add retries to docker push #12773
- [Docker][CI][RISC-V] Build riscv-isa-sim (spike) in ci_riscv Docker image to enable RISC-V unit testing #12534
- Always install into a python venv in ci containers #12663
- [ci] Re-balance shards #12473
- [ci][tvmbot] Trigger GitHub Actions after merging #12361
- [ci] Add linter for PR title and body #12367
- [CI] Assert some unittests are not skipped in CI #12436
- Add RISC-V build/test pipeline to Jenkins. #12441
- [ci][docker] Tag tlcpackstaging images to tlcpack #11832
- Build and test TVM under minimal configuration #12178
- Unify name mangling in TVM #12066
- [testing] Remove wrapper from @slow #11566
- [ci] De-duplicate retry functions #12325
- [CI] Cleanup after renaming ci_qemu #12329
- [skip ci] Increase the number of shards for Cortex-M from 4 to 8. #12334
- [CI] Increase CPU Integration tests shards to speedup runtime #12316
- [ci][tvmbot] Enable re-run for GitHub Actions #12295
- [ci] Add retries to S3 uploads/downloads #12221
- [CI] Shard Qemu python tests #12258
- [ci] Redirect sphinx-gallery URLs to S3 #11839
- Move jenkins/ dir into ci/jenkins and spread docs around #11927
OTHER
- arith - [Arith] DetectIterMap support overlapped iteration sum #12039, [Arith] Updated BufferDomainTouched to use IRVisitorWithAnalyzer #11970, [Arith][Refactor] Return Optional<PrimExpr> from TryConstFold #12784
- autotvm - [AutoTVM][Testing] Add tune_relay scripts #12685, [AutoTVM] Add support for text buffers to ApplyHistoryBest #12521
- llvm - [LLVM] Add "cl-opt" attribute to target_kind "llvm" #12440, [LLVM] Create LLVM scope object for use with LLVM libraries #12140
- profiler - [Profiler] Fix graph_executor_debug hang #12382
- runtime - [Runtime] Change default alignment to 64 bytes. #12586, [Runtime][PipelineExecutor] Tutorial of using pipeline executor. #11557
- target - [Target] Remove deprecated parameters from target #12416, [Target] Add Target Parser for Arm(R) Cortex(R) M-Profile CPUs #12319, [Target] Improve string interpretation in Target creation #12152
- [TOPI] Allow conv definition to have custom kernel layout #11936
- [TVMC] Workspace Pools Parameters #11427
- [Containers] Add Array::Map #12692
- [Refactor] Replace std::tie with structured bindings #12610
- Replace '> >' in templates with >>, NFC #12615
- Use std::optional instead of dmlc::optional, NFC #12443
- [Pylint] Making frontend tests pylint compliant [part 1] #12028
- [UMA] UMA v1.0 #12087
- [release] Follow ups from v0.9.0 - scripts, docs #11987
- [Pylint] Making hexagon tests pylint compliant Part 2 of N #12176
- [Pylint] Making hexagon tests pylint compliant Part 1 of N #12082
- [Pylint] Pylint integration_tests folder #11672