JIT: Boost inversion for oversize loops with bounds checks #118078
base: main
Pull Request Overview
This PR improves JIT loop inversion logic by allowing slightly oversized loops to be inverted when they contain bounds checks that could benefit from downstream loop cloning optimizations. This addresses performance regressions introduced in #116486 where stricter size limits prevented beneficial optimizations.
Key changes:
- Modified loop inversion size checking to use a 1.25x multiplier for loops with bounds checks
- Added logic to detect loops that might benefit from cloning (those with bounds checks)
- Enhanced debug output for better traceability of inversion decisions
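The detection step described above can be sketched as a single pass that counts nodes and records whether any bounds check was seen. This is only an illustration: `NodeKind`, `LoopSummary`, and `summarizeLoop` are hypothetical stand-ins, and the real JIT walks `GenTree` nodes rather than a flat array.

```cpp
// Hypothetical node kinds standing in for the JIT's GenTree operators.
enum class NodeKind
{
    Arithmetic,
    Load,
    Store,
    BoundsCheck
};

// What a single pass over the loop body records (assumed shape, not the JIT's).
struct LoopSummary
{
    unsigned size;                    // number of nodes visited
    bool     mightBenefitFromCloning; // set once any bounds check is seen
};

// Walk the nodes once, counting them and noting whether any is a bounds check.
constexpr LoopSummary summarizeLoop(const NodeKind* nodes, unsigned count)
{
    LoopSummary summary{0, false};
    for (unsigned i = 0; i < count; i++)
    {
        summary.size++;
        if (nodes[i] == NodeKind::BoundsCheck)
        {
            summary.mightBenefitFromCloning = true;
        }
    }
    return summary;
}
```

Gathering both facts in one walk means the size check and the cloning heuristic do not need separate traversals of the loop body.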
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/optimizer.cpp | Enhanced loop inversion logic with bounds check detection and liberal size limits |
| src/coreclr/jit/loopcloning.cpp | Added debug output when loops exceed cloning size limits |
| src/coreclr/jit/compiler.hpp | Simplified return logic in optLoopComplexityExceeds and removed debug output |
Comments suppressed due to low confidence (1)
src/coreclr/jit/optimizer.cpp:1915
- [nitpick] The lambda parameter name 'tree' is generic. Consider renaming it to 'node' to be more consistent with the function name 'countNode'.
auto countNode = [&mightBenefitFromCloning, &loopSize](GenTree* tree) -> unsigned {
}

// If the loop shows cloning potential, tolerate some excess size.
const unsigned liberalInvertSizeLimit = (unsigned)(invertSizeLimit * 1.25);
The magic number 1.25 should be defined as a named constant to improve maintainability and make it easier to adjust this threshold in the future.
Suggested change:
- const unsigned liberalInvertSizeLimit = (unsigned)(invertSizeLimit * 1.25);
+ const unsigned liberalInvertSizeLimit = (unsigned)(invertSizeLimit * LIBERAL_INVERT_SIZE_LIMIT_FACTOR);
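A minimal sketch of what the suggestion might look like once the factor is named. The constant name comes from the review suggestion and the value from the PR; where the constant would live, and the helper names `liberalInvertSizeLimit` (as a function) and `withinInversionBudget`, are assumptions made for illustration, as is the choice of `<=` for the comparison.

```cpp
// Name from the review suggestion, value from the PR; placement is assumed.
constexpr double LIBERAL_INVERT_SIZE_LIMIT_FACTOR = 1.25;

// Scale the normal limit. The conversion to unsigned truncates toward zero,
// so a limit of 10 yields 12 rather than 12.5.
constexpr unsigned liberalInvertSizeLimit(unsigned invertSizeLimit)
{
    return static_cast<unsigned>(invertSizeLimit * LIBERAL_INVERT_SIZE_LIMIT_FACTOR);
}

// Hypothetical helper: loops that show cloning potential get the liberal
// limit, all other loops keep the normal one.
constexpr bool withinInversionBudget(unsigned loopSize, bool mightBenefitFromCloning, unsigned invertSizeLimit)
{
    return loopSize <= (mightBenefitFromCloning ? liberalInvertSizeLimit(invertSizeLimit) : invertSizeLimit);
}
```

With a limit of 100, a loop of size 120 would fit the budget only if it contains a bounds check, while a loop of size 126 would be rejected either way.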
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Diffs show six-digit size increases, though much of that is coming from
Do you have specific large regression cases you're targeting here? It would be good to call them out.
- JITDUMP("Loop " FMT_LP ": complexity %u does not exceed limit %u\n", loop->GetIndex(), loopComplexity, limit);
- return false;
+ return (result == BasicBlockVisit::Abort) ? true : false;
Suggested change:
- return (result == BasicBlockVisit::Abort) ? true : false;
+ return (result == BasicBlockVisit::Abort);
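The two forms are equivalent because `==` already yields a `bool`, so the ternary adds nothing. A small sketch, using a local stand-in for the enum (the enumerators here are assumed for illustration) and hypothetical function names:

```cpp
// Local stand-in for the JIT's visit-result enum; enumerators are assumed.
enum class BasicBlockVisit
{
    Continue,
    Abort
};

// Original form: the ternary maps true to true and false to false.
constexpr bool exceedsVerbose(BasicBlockVisit result)
{
    return (result == BasicBlockVisit::Abort) ? true : false;
}

// Suggested form: return the comparison result directly.
constexpr bool exceedsConcise(BasicBlockVisit result)
{
    return (result == BasicBlockVisit::Abort);
}
```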
Addresses some of the regressions in #116486. If a loop has bounds checks in it, it might benefit from cloning, so perhaps we ought to tolerate going a bit over the inversion size limit to enable downstream optimizations. I ought to do something for GDV checks; perhaps as a follow-up, I'll move the checks in loop cloning to something I can reuse here. The 1.25x figure isn't all that scientific -- I found it to be the smallest factor necessary to make a dent in the regressions from less cloning I looked at, and despite the size increases, it's a net PerfScore win across collections.