Description
RyuJIT has several loop optimization phases that have various issues (both correctness and performance) and can be significantly improved. RyuJIT also lacks some loop optimizations that have been shown to benefit various use cases. This meta-issue collects various links to the most important identified issues in one place, so they can be easily seen without searching the entire GitHub issue database. This issue is long-term. Specific issues will be created to identify work that will be included in each release.
Release-specific issues:
- .NET 10 loop optimization work: Improve JIT loop optimizations (.NET 10) #108901
- .NET 9 loop optimization work: Improve JIT loop optimizations (.NET 9) #93144
- .NET 8 loop optimization work: Improve JIT loop optimizations (.NET 8) #77032
- .NET 7 loop optimization work: Improve JIT loop optimizations (.NET 7) #55235
- .NET 6 loop optimization work: Improve JIT loop optimizations (.NET 6) #43549
If an item is implemented, it will be removed from this list (so this issue should only contain continuing loop optimization improvement opportunities).
Existing Optimizations
Below is a list of the existing loop-related RyuJIT phases and a short description of the improvement opportunities.
Multi-dimensional arrays
Multi-dimensional (MD) arrays are listed in this loop optimization issue because optimizing MD access is most valuable in the context of loop optimization. The first steps to improvement were implemented with #70271. Follow-up work:
- RyuJIT: support cloning loop nests with multi-dimensional array accesses #71674
- Improve hoisting, CSE, LSRA for MD array accesses #71676
Loop Cloning
This optimization creates two copies of a loop, one with bounds checks and one without, and chooses which copy to execute at runtime based on a condition evaluated before the loop. Several issues have been identified with this optimization. One recurring theme is unnecessary loop cloning, where we first clone a loop and then eliminate range checks from both copies.
- Loop cloning driven by type tests #65206
- JIT: examples where loop cloning is not useful #8558
- Poor loop optimization in BilinearInterpol benchmark #31831
- loop cloning and pgo #48850. Remaining: use PGO data to influence cost/benefit analysis of deciding to clone a loop.
- If compReturnBB is unreachable we should remove it #48740 (comment): poor tracking of return blocks impacts loop cloning
- Consider hoisting of class init checks for loop cloning and inversion #49102
- Support loop cloning of class member arrays #77071: clone loops that access class member arrays.
- If there are several different kinds of cloning criteria (say, array bounds and type tests), we currently require that all of them can be satisfied in order to clone. In particular, array bounds cloning requires increasing loops with suitable exit relops, so cloning fails even when we could have left the array accesses alone and cloned for type tests only. It is not clear how often this happens, but it could become fairly common if more kinds of cloning conditions are added.
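As a rough source-level illustration of the transformation described in this section (not RyuJIT's actual IR or heuristics), the sketch below shows a loop before and after cloning for array bounds. The helper `checked_load` and both function names are hypothetical, and Python's own indexing still checks bounds internally, so the "fast path" only stands in for check-free code.

```python
def checked_load(arr, i):
    # Hypothetical stand-in for a bounds-checked array access.
    if i < 0 or i >= len(arr):
        raise IndexError(i)
    return arr[i]


def sum_prefix_original(arr, n):
    # Original loop: every access pays for a bounds check.
    total = 0
    for i in range(n):
        total += checked_load(arr, i)
    return total


def sum_prefix_cloned(arr, n):
    # Cloning condition, evaluated once before entering either copy.
    if n <= len(arr):
        # Fast copy: every i in [0, n) is known to be in range,
        # so the per-iteration check is omitted.
        total = 0
        for i in range(n):
            total += arr[i]
        return total
    # Slow copy: identical to the original loop, checks kept.
    total = 0
    for i in range(n):
        total += checked_load(arr, i)
    return total
```

If the checks in the slow copy can also be proven redundant, both copies end up identical and the clone only costs code size, which is the "unnecessary loop cloning" theme noted above.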
Loop Unrolling
The existing loop unrolling phase only does full unrolls, and only for SIMD loops: the current heuristic is that the loop bounds test must be a SIMD element count. The impact of the optimization is therefore very limited today, but in general it is a high-impact optimization when paired with the right heuristics.
- Implement loop peeling #93142
- Loop unrolling support in RyuJIT #4248
- JIT optimization: loop unrolling #8107
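To make "full unroll" concrete, here is a minimal source-level sketch, assuming a trip count that is a compile-time constant. The constant `LANES` and both function names are hypothetical; it merely stands in for a SIMD element count known to the JIT and says nothing about how RyuJIT decides when to unroll.

```python
LANES = 4  # hypothetical stand-in for a SIMD element count known at JIT time


def dot_rolled(a, b):
    # Original loop: induction variable, bounds test, and back edge
    # are all executed on every iteration.
    total = 0
    for i in range(LANES):
        total += a[i] * b[i]
    return total


def dot_fully_unrolled(a, b):
    # Fully unrolled: the loop is gone and each iteration is a
    # straight-line statement with a constant index.
    total = 0
    total += a[0] * b[0]
    total += a[1] * b[1]
    total += a[2] * b[2]
    total += a[3] * b[3]
    return total
```

A partial unroll would instead keep the loop but replicate its body several times per iteration; that form is not covered by the existing phase as described above.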
Loop Invariant Code Hoisting
This phase attempts to hoist code that will produce the same value on each iteration of the loop to the pre-header. There is at least one (and likely more) correctness issue:
- JIT: Loop hoisting re-ordering exceptions #6639
And multiple issues about limitations of the algorithm:
- JIT: limitations in hoisting (loop invariant code motion) #35735
- JIT: Loop hoisting inhibited by phase-ordering issue #6554
- RyuJIT: Loop hoist invariant struct field accesses #7265
- RyuJIT: missed opportunity for LICM #6666
- Indexer.Set of List is much slower than Array #29091 (comment)
- In addition, we should strongly consider hoisting conditionally executed trees.
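The basic idea of hoisting can be sketched at source level as follows. This is only an illustration under simplifying assumptions, with hypothetical function names. The guard around the hoisted expression reflects a general concern with moving code that can throw (the hoisted division must not run if the loop body would never have run); it is not a description of how RyuJIT handles this case or of the specific bug tracked above.

```python
def scale_all_original(values, num, den):
    out = []
    for v in values:
        factor = num // den  # loop-invariant: same value on every iteration
        out.append(v * factor)
    return out


def scale_all_hoisted(values, num, den):
    out = []
    if values:
        # "Pre-header": reached only when the loop runs at least once,
        # so a ZeroDivisionError is raised in exactly the same cases
        # as in the original loop.
        factor = num // den
        for v in values:
            out.append(v * factor)
    return out
```

With an empty `values` and `den == 0`, both versions return an empty list without raising; hoisting the division without the guard would change that behavior.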
Missing Optimizations
Several major optimizations are missing even though we have evidence of their effectiveness (at least on microbenchmarks).
Loop Unswitching
Loop unswitching moves a conditional from inside a loop to outside of it by duplicating the loop's body, and placing a version of the loop inside each of the if and else clauses of the conditional. It has elements of both Loop Cloning and Loop Invariant Code Motion.
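A minimal source-level sketch of unswitching, assuming the condition `negate` does not change inside the loop (the function names are hypothetical and the example is illustrative only):

```python
def apply_original(values, negate):
    out = []
    for v in values:
        if negate:  # loop-invariant condition, re-tested on every iteration
            out.append(-v)
        else:
            out.append(v)
    return out


def apply_unswitched(values, negate):
    out = []
    if negate:  # condition tested once, outside the loop
        for v in values:
            out.append(-v)
    else:
        for v in values:
            out.append(v)
    return out
```

The trade-off is the same as for cloning: the branch leaves the hot path, at the cost of duplicating the loop body.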
Benefits
It's easy to show the benefit of improved loop optimizations on microbenchmarks. For example, several years ago the team analyzed JIT microbenchmarks (benchstones, SciMark, etc.). The analysis contains estimates of the perf improvement from several of these optimizations (each is a low single-digit percentage). Real code is also likely to have hot loops that will benefit from improved loop optimizations.
The benchmarks and other metrics we will measure to show the benefits are TBD.
category:planning
theme:loop-opt
skill-level:expert
cost:large
impact:medium