- 
                Notifications
    
You must be signed in to change notification settings  - Fork 5.2k
 
Cloning improvements #66257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloning improvements #66257
Conversation
| 
           Tagging subscribers to this area: @JulieLeeMSFT Issue Detailsnull 
  | 
    
| 
           /azp run runtime-coreclr jitstress  | 
    
| 
          
Azure Pipelines successfully started running 1 pipeline(s). | 
    
| 
           /azp run runtime-coreclr superpmi-replay  | 
    
| 
          
Azure Pipelines successfully started running 1 pipeline(s). | 
    
| 
           /azp run runtime-coreclr jitstress  | 
    
| 
          
Azure Pipelines successfully started running 1 pipeline(s). | 
    
Fix various comments
Assume that any pre-existing initialization is ok. Check it against zero if necessary. Const inits remain as before. Lots of diffs due to more cloning for cases of `for (i = expression...` where `expression` is not just a constant or local var.
8573095    to
    f2ed920      
    Compare
  
    | 
           /azp run runtime-coreclr jitstress  | 
    
| 
          
Azure Pipelines successfully started running 1 pipeline(s). | 
    
| 
           @AndyAyersMS PTAL Will summarize diffs later. Bottom-line: lots more cases where cloning kicks in. Some beneficial, some not. Non-beneficial cases are due to known existing issues with cloning profitability.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Couple of small suggestions.
        
          
                src/coreclr/jit/loopcloning.h
              
                Outdated
          
        
      | implementation does use `unsigned` to represent them.) | ||
| 8. Constant initializations and constant limits must be non-negative. This is because the | ||
| iterator variable will be used as an array index, and array indices must be non-negative. | ||
| For `i = v` variable initialization, we add a dynamic check that `v >= 0`. Constant initializations | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| For `i = v` variable initialization, we add a dynamic check that `v >= 0`. Constant initializations | |
| For `i = expr` non-constant initialization, we add a dynamic check that `expr >= 0`. Constant initializations | 
        
          
                src/coreclr/jit/optimizer.cpp
              
                Outdated
          
        
      | // RHS can be constant or local var. | ||
| // TODO-CQ: CLONE: Add arr length for descending loops. | ||
| if (rhs->gtOper == GT_CNS_INT && rhs->TypeGet() == TYP_INT) | ||
| if (rhs->gtOper != GT_CNS_INT || rhs->TypeGet() != TYP_INT) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (rhs->gtOper != GT_CNS_INT || rhs->TypeGet() != TYP_INT) | |
| if ((rhs->gtOper != GT_CNS_INT) || (rhs->TypeGet() != TYP_INT)) | 
or maybe
| if (rhs->gtOper != GT_CNS_INT || rhs->TypeGet() != TYP_INT) | |
| if (!rhs->OperIs(GT_CNS_INT) || !rhs->TypeIs(TYP_INT)) | 
| 
           As expected, there are a lot of cases where more cloning occurs. Unfortunately, not all these cases are beneficial, due to all the existing limitations of cloning or the JIT. For example, sometimes range check can get rid of bounds checks that cloning also gets rid of, meaning both "fast" and "slow" path loops end up identical. (Similarly, sometimes we won't clone a loop because we require a very specific loop condition, and but if CSE would have run first, we would then match that condition.) (Also related, cloning inserts various "cloning conditions" to branch to the "slow path", and sometimes subsequent optimization passes can get rid of some of them, like redundant branch elimination.) Note also that sometimes the JIT clones a lot of code to get rid of one bounds check. It's possible that it's beneficial, but the marginal benefit degrades as the amount of code increases. Some example: 
  | 
    
| 
           Asm diffs are a regression because when cloning kicks in, it duplicates code. E.g., win-x64: aspnet.run.windows.x64.checked.mch:Detail diffsbenchmarks.run.windows.x64.checked.mch:Detail diffscoreclr_tests.pmi.windows.x64.checked.mch:Detail diffslibraries.crossgen2.windows.x64.checked.mch:Detail diffslibraries.pmi.windows.x64.checked.mch:Detail diffslibraries_tests.pmi.windows.x64.checked.mch:Detail diffs | 
    
| 
           Ubuntu x64 improvements dotnet/perf-autofiling-issues#4220  | 
    
* Loop cloning improvements Fix various comments * Remove loop cloning var initialization condition Assume that any pre-existing initialization is ok. Check it against zero if necessary. Const inits remain as before. Lots of diffs due to more cloning for cases of `for (i = expression...` where `expression` is not just a constant or local var. * Feedback

Remove loop cloning variable initialization condition:
Assume that any pre-existing initialization is acceptable.
Check condition against zero if necessary. Const inits remain as before.
Lots of diffs due to more cloning for cases of
for (i = expression...where
expressionis not just a constant or local var.Also, fix various comments that were no longer correct (e.g., "first" block
concept is gone)