Add @inline for Diagonal's 2-arg l/rdiv! to enable auto vectorization #43171
replace broadcast with for loop.
I'm not sure it makes a difference, but I haven't seen
    
See 1.8 release note:
    
          
Yes, that's why I'm used to seeing the pattern

```julia
@inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
```

In the one-line case, this may not make a difference. I think what you're referring to makes a difference in multi-line code:

```julia
function foo(...)
    # some code involving possibly non-inlined function calls
    @inline call_a_function(...) # inline that specific function
    # some other code, possibly non-inlined function calls
    return result
end
```
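To make the two placements concrete, here is a minimal sketch, assuming Julia ≥ 1.8 (where `@inline` can annotate a call site). The names `diag_ldiv3!`, `diag_ldiv_a!` and `diag_ldiv_b!` are hypothetical stand-ins, not LinearAlgebra methods. Style A annotates the 2-arg definition, which hints that the wrapper itself should be inlined into its callers; style B annotates only the forwarded call, which asks that the 3-arg kernel be inlined into the wrapper.

```julia
# Assumes Julia ≥ 1.8, where @inline may annotate a call site.
using LinearAlgebra

# Hypothetical 3-arg kernel (not a LinearAlgebra method): C = D \ B, entrywise.
function diag_ldiv3!(C::AbstractMatrix, D::Diagonal, B::AbstractMatrix)
    d = D.diag
    size(C) == size(B) || throw(DimensionMismatch("C and B must have the same size"))
    length(d) == size(B, 1) || throw(DimensionMismatch("D must match the rows of B"))
    for j in axes(B, 2), i in axes(B, 1)
        C[i, j] = d[i] \ B[i, j]   # row i is divided by d[i]
    end
    return C
end

# Style A: definition-site annotation. Hints that the 2-arg wrapper itself
# should be inlined into whatever calls it.
@inline diag_ldiv_a!(D::Diagonal, B::AbstractMatrix) = diag_ldiv3!(B, D, B)

# Style B: call-site annotation. Asks that the forwarded 3-arg call be
# inlined into the wrapper's body.
diag_ldiv_b!(D::Diagonal, B::AbstractMatrix) = @inline diag_ldiv3!(B, D, B)
```

Both wrappers compute the same result; they differ only in what the compiler is asked to inline, which can be inspected with `@code_typed` or `@code_llvm`.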
    
I'm a little confused by the example in #41312, but `@macroexpand` shows that:

```julia
julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = @inline ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[2]:1 =#
          begin
              $(Expr(:inline, true))
              local var"#37#val" = ldiv!(B, D, B)
              $(Expr(:inline, false))
              var"#37#val"
          end
      end)

julia> @macroexpand @inline ldiv!(D::Diagonal, B::AbstractVecOrMat) = ldiv!(B, D, B)
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          $(Expr(:meta, :inline))
          #= REPL[3]:1 =#
          ldiv!(B, D, B)
      end)

julia> @macroexpand ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
                 @inline
                 ldiv!(B, D, B)
           end
:(ldiv!(D::Diagonal, B::AbstractVecOrMat) = begin
          #= REPL[9]:1 =#
          #= REPL[9]:2 =#
          $(Expr(:meta, :inline))
          #= REPL[9]:3 =#
          ldiv!(B, D, B)
      end)
```

So
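The three expansions printed above can also be told apart programmatically. This is a small sketch, assuming Julia ≥ 1.8 and relying only on the expression shapes shown in that output: `Expr(:meta, :inline)` for the two definition-level forms and an `Expr(:inline, true)` … `Expr(:inline, false)` pair for the call-site form. The helpers `has_expr`, `is_meta_inline` and `is_callsite_inline` are hypothetical, and the exact expansion could change between Julia versions.

```julia
# Assumes Julia ≥ 1.8. Helper names are hypothetical; `f` and `g` are never
# called, since @macroexpand only rewrites the expressions.

# True if `ex` or any nested sub-expression satisfies `pred`.
has_expr(pred, ex) = pred(ex) || (ex isa Expr && any(a -> has_expr(pred, a), ex.args))

# Definition-level marker, i.e. Expr(:meta, :inline).
is_meta_inline(e) = e isa Expr && e.head === :meta && :inline in e.args

# Opening call-site marker, i.e. Expr(:inline, true).
is_callsite_inline(e) = e isa Expr && e.head === :inline && e.args == Any[true]

def_form  = @macroexpand @inline f(x) = g(x)   # annotation on the definition
call_form = @macroexpand f(x) = @inline g(x)   # annotation on the call
body_form = @macroexpand f(x) = begin          # annotation inside the body
    @inline
    g(x)
end
```

Given the printed expansions, one would expect `has_expr(is_meta_inline, def_form)` and `has_expr(is_meta_inline, body_form)` to hold, while only `call_form` contains the call-site marker.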
    
Interestingly, on current nightly I don't see the regression for the out-of-place operations:

```julia
julia> using LinearAlgebra, BenchmarkTools

julia> A = randn(128, 128); D = Diagonal(ones(128));

julia> @btime rdiv!($A, $D);
  18.315 μs (0 allocations: 0 bytes)

julia> @btime ldiv!($D, $A);
  18.437 μs (0 allocations: 0 bytes)

julia> @btime $D \ $A;
  9.939 μs (2 allocations: 128.05 KiB)

julia> @btime $A / $D;
  9.614 μs (2 allocations: 128.05 KiB)
```

I'm on macOS.
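In the benchmark above, the in-place 2-arg calls are the ones that are roughly twice as slow as the out-of-place ones. A minimal sketch of the two kernel styles the commit "replace broadcast with for loop" contrasts (a broadcast-based 2-arg division and a plain `for` loop) might look like the following. The function names are hypothetical and this is not the code merged in the PR; whether either version actually vectorizes depends on the Julia/LLVM version and the CPU, so treat it as something to check with `@code_llvm` rather than a guarantee.

```julia
using LinearAlgebra

# Hypothetical broadcast-based 2-arg left division: row i of B is divided by D.diag[i].
function bcast_ldiv!(D::Diagonal, B::AbstractMatrix)
    B .= D.diag .\ B
    return B
end

# Hypothetical loop-based 2-arg left division. j outer, i inner follows the
# column-major layout of `Matrix`, so the inner loop walks contiguous memory.
function loop_ldiv!(D::Diagonal, B::AbstractMatrix)
    d = D.diag
    Base.require_one_based_indexing(d, B)   # makes the @inbounds below safe
    size(B, 1) == length(d) || throw(DimensionMismatch("D must match the rows of B"))
    @inbounds for j in 1:size(B, 2), i in 1:size(B, 1)
        B[i, j] = d[i] \ B[i, j]
    end
    return B
end

# Hypothetical loop-based 2-arg right division: column j of A is divided by D.diag[j].
function loop_rdiv!(A::AbstractMatrix, D::Diagonal)
    d = D.diag
    Base.require_one_based_indexing(d, A)
    size(A, 2) == length(d) || throw(DimensionMismatch("D must match the columns of A"))
    @inbounds for j in 1:size(A, 2)
        dj = d[j]                  # hoisted: constant over the inner loop
        for i in 1:size(A, 1)
            A[i, j] = A[i, j] / dj
        end
    end
    return A
end
```

The loop versions make the access pattern explicit, which is usually what LLVM's loop vectorizer needs; the broadcast version relies on the broadcast machinery being fully inlined to get the same effect.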
    
Yes, the regression is on
    
On master, `Diagonal`'s 2-arg `l/rdiv!` calls its 3-arg version for code reuse. But as shown in #43153, non-inlined code blocks LLVM's auto-vectorization, so the 2-arg `l/rdiv!` is slower than before.

This PR first adds `@inline` at the call site to fix the regression. However, it turns out that the `broadcast`-based `ldiv!` still blocks vectorization, so I replaced it with a `for`-loop version. Some benchmarks: