- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.7k
          inference: remove throw block deoptimization completely
          #49260
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
throw block deoptimization completelythrow block deoptimization completely
      | @nanosoldier  | 
ffba552    to
    e7dee0e      
    Compare
  
    | Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. | 
| I would like to remove it, but this seems surprisingly strongly negative on our benchmarks. Can we do something to reduce that, or is that just some sort of artifact of being a small test case? | 
| Yeah, there seems to be profitability for small cases. I will add more examples to BaseBenchmarks.jl (by using existing packages) and see what it will say. | 
bc32263    to
    ee45c04      
    Compare
  
    | I think we should merge this. Now that tab complete relies on proving termination and effectfree-ness, this PR also makes tab complete better. | 
e7dee0e    to
    a24372e      
    Compare
  
    ee45c04    to
    81287f2      
    Compare
  
    a24372e    to
    07165d7      
    Compare
  
    | @nanosoldier  | 
| Your job failed. | 
07165d7    to
    3ae3529      
    Compare
  
    | @nanosoldier  | 
| Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. | 
c8a5046    to
    d77836e      
    Compare
  
    3ae3529    to
    7328bb3      
    Compare
  
    7328bb3    to
    b08925c      
    Compare
  
    | @nanosoldier  | 
b08925c    to
    808bae1      
    Compare
  
    | Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. | 
808bae1    to
    906bf12      
    Compare
  
    | Although benchmarks suggest that this is indeed an effective optimization, this optimization has been quite incomplete from the beginning and more importantly started to cause numerous issues in other projects. So I will prioritize the development efficiency of those projects and remove this optimization nevertheless. We could try to come up with an alternative solution to improve latency for those code paths, but it doesn't seem to be a good idea to leave this situation as is. | 
After experimenting with #49235, I started to question if we are getting any actual benefit from the `throw` block deoptimization anymore. This commit removes the deoptimization from the system entirely. Based on the numbers below, it appears that the deoptimization is not very profitable in our current Julia-level compilation pipeline, with the effects analysis playing a significant role in reducing latency. Here are the updated benchmark: | Metric | master | #49235 | this commit | |-------------------------|-----------|-------------|--------------------------------------------| | Base (seconds) | 15.579300 | 15.206645 | 15.42059 | | Stdlibs (seconds) | 17.919013 | 17.667094 | 17.404586 | | Total (seconds) | 33.499279 | 32.874737 | 32.826162 | | Precompilation (seconds) | 53.488528 | 53.152028 | 53.152028 | | First time `plot(rand(10,3))` [^1] | `3.432983 seconds (16.55 M allocations)` | `3.477767 seconds (16.45 M allocations)` | `3.539117 seconds (16.43 M allocations)` | | First time `solve(prob, QNDF())(5.0)` [^2] | `4.628278 seconds (15.74 M allocations)` | `4.609222 seconds (15.32 M allocations)` | `4.547323 seconds (15.19 M allocations: 823.510 MiB)` | [^1]: With disabling precompilation of Plots.jl. [^2]: With disabling precompilation of OrdinaryDiffEq.
2853f37    to
    af93df0      
    Compare
  
    | CI is green after #55356 - Good to merge? | 
| @nanosoldier  | 
| Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. | 
| Looks like a few inference regressions, but nothing too surprising. IMO this is good to merge. | 
Due to JuliaLang/julia#49260, the constructor is now being inlined, so we need to check for `Expr(:new)` and not just `:call`.
…49260) Co-authored-by: Cody Tapscott <[email protected]> Co-authored-by: Oscar Smith <[email protected]>
| This causes a weird inference regression, see #55583. | 
After investigating the cause of the invalidation reported in #55583, it was found that the issue arises only when `r` is propagated as an extended lattice element, such as `PartialStruct(UnitRange{Int}, ...)`, for the method of `getindex(::String, r::UnitRange{Int})`. Specifically, the path at https://github.com/JuliaLang/julia/blob/cebfd7bc66153b82c56715cb1cb52dac7df8eac8/base/compiler/typeinfer.jl#L809-L815 is hit, so the direct cause was the recursion limit for constant inference. To explain in more detail, within the slow path of `nextind` which is called inside `getindex(::String, ::UnitRange{Int})`, the 1-argument `@assert` is used https://github.com/JuliaLang/julia/blob/cebfd7bc66153b82c56715cb1cb52dac7df8eac8/base/strings/string.jl#L211. The code related to `print` associated with this `@assert` further uses `getindex(::String, ::UnitRange{Int})`, causing the recursion limit. This recursion limit is only for constant inference, which is why we saw this regression only for the `PartialStruct` case. Moreover, since this recursion limit occurs within the `@assert`-related code, this issue did not arise until now (i.e. until #49260 was merged). As a solution, I considered improving the recursion limit itself, but decided that keeping the current code for the recursion limit of constant inference is safer. Ideally, this should be addressed on the compiler side, but there is certainly deep recursion in this case. As an easier solution, this commit resolves the issue by changing the 1-arg `@assert` to the 2-arg version. - replaces #55583
After investigating the cause of the invalidation reported in #55583, it was found that the issue arises only when `r` is propagated as an extended lattice element, such as `PartialStruct(UnitRange{Int}, ...)`, for the method of `getindex(::String, r::UnitRange{Int})`. Specifically, the path at https://github.com/JuliaLang/julia/blob/cebfd7bc66153b82c56715cb1cb52dac7df8eac8/base/compiler/typeinfer.jl#L809-L815 is hit, so the direct cause was the recursion limit for constant inference. To explain in more detail, within the slow path of `nextind` which is called inside `getindex(::String, ::UnitRange{Int})`, the 1-argument `@assert` is used https://github.com/JuliaLang/julia/blob/cebfd7bc66153b82c56715cb1cb52dac7df8eac8/base/strings/string.jl#L211. The code related to `print` associated with this `@assert` further uses `getindex(::String, ::UnitRange{Int})`, causing the recursion limit. This recursion limit is only for constant inference, which is why we saw this regression only for the `PartialStruct` case. Moreover, since this recursion limit occurs within the `@assert`-related code, this issue did not arise until now (i.e. until #49260 was merged). As a solution, I considered improving the recursion limit itself, but decided that keeping the current code for the recursion limit of constant inference is safer. Ideally, this should be addressed on the compiler side, but there is certainly deep recursion in this case. As an easier solution, this commit resolves the issue by changing the 1-arg `@assert` to the 2-arg version. - replaces #55583
* Generalize symbol type for debug scopes * More scope adjustment * Adjust to JuliaLang/julia#49260 * More Core.Compiler adjustments
After experimenting with #49235, I started to question if we are getting any actual benefit from the
throwblock deoptimization anymore.This commit removes the deoptimization from the system entirely.
Based on the numbers below, it appears that the deoptimization is not very profitable in our current Julia-level compilation pipeline, with the effects analysis playing a significant role in reducing latency.
Here are the updated benchmark:
plot(rand(10,3))13.432983 seconds (16.55 M allocations)3.477767 seconds (16.45 M allocations)3.539117 seconds (16.43 M allocations)solve(prob, QNDF())(5.0)24.628278 seconds (15.74 M allocations)4.609222 seconds (15.32 M allocations)4.547323 seconds (15.19 M allocations: 823.510 MiB)Footnotes
With disabling precompilation of Plots.jl. ↩
With disabling precompilation of OrdinaryDiffEq. ↩