With this benchmark:
```julia
function mmult(N, n)
    A = rand(n,n)
    B = similar(A)
    @time for i = 1:N
        A_mul_B!(B, A, A)
    end
end
```
I get the following on master:
```julia
julia> mmult(10^6, 6)
621.537 milliseconds (10000 k allocations: 153 MB, 0.90% gc time)
```
With JuliaLang/julia@62e5942 reverted:
```julia
julia> mmult(10^6, 6)
208.683 milliseconds
```
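Dividing the two timings by the 10^6 calls in the loop puts the regression in per-call terms (the byte figure assumes `MB` here means 10^6 bytes; with 2^20 it comes to about 160 bytes):

$$
\frac{621.537\ \text{ms}}{208.683\ \text{ms}} \approx 2.98, \qquad
\frac{621.537\ \text{ms}}{10^{6}} \approx 621.5\ \text{ns/call}, \qquad
\frac{208.683\ \text{ms}}{10^{6}} \approx 208.7\ \text{ns/call}
$$

$$
\frac{10^{7}\ \text{allocations}}{10^{6}\ \text{calls}} = 10\ \text{allocations/call}, \qquad
\frac{153 \times 10^{6}\ \text{B}}{10^{6}\ \text{calls}} \approx 153\ \text{B/call}
$$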
This is sort of a dup of JuliaLang/julia#11531, except that this regression is in Base. IMHO it is not acceptable to make matrix multiplication of smallish matrices roughly three times slower.
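To check whether a single in-place product allocates, rather than reading it off the loop total, the benchmark can be restated to report time and bytes per call. This is a minimal sketch, assuming Julia ≥ 1.0, where `A_mul_B!(B, A, A)` is spelled `LinearAlgebra.mul!(B, A, A)`. `mmult_per_call` is a hypothetical name, not part of Base:

```julia
using LinearAlgebra  # provides mul!, the in-place successor of A_mul_B! (Julia >= 0.7)

# Hypothetical helper (assumes Julia >= 1.0): run the same in-place product N
# times and report the average time per call plus the bytes allocated by one call.
function mmult_per_call(N, n)
    A = rand(n, n)
    B = similar(A)
    mul!(B, A, A)                     # warm-up call so compilation is not measured
    bytes = @allocated mul!(B, A, A)  # bytes allocated by a single in-place product
    t = @elapsed for _ in 1:N         # wall-clock seconds for N products
        mul!(B, A, A)
    end
    return (ns_per_call = 1e9 * t / N, bytes_per_call = bytes)
end
```

Running `mmult_per_call(10^6, 6)` on master and on a build with the suspect commit reverted would show directly whether the in-place product allocates on every call.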