Skip to content

Conversation

@cstjean
Copy link
Contributor

@cstjean cstjean commented Feb 25, 2019

The broadcast code that copies to the new array was type-unstable, causing allocation and slowdown in v2:

Before:

julia> using BenchmarkTools

julia> v = fill(2.0, 50000);

julia> v2 = vcat(v, missing);

julia> v3 = vcat(missing, v);

julia> @btime $v .* 3.0;
  49.728 μs (2 allocations: 390.70 KiB)

julia> @btime $v2 .* 3.0;
  1.490 ms (99498 allocations: 2.33 MiB)

julia> @btime $v3 .* 3.0;
  103.577 μs (7 allocations: 439.77 KiB)

After:

julia> @btime $v .* 3.0;
  50.147 μs (2 allocations: 390.70 KiB)

julia> @btime $v2 .* 3.0;
  147.158 μs (8 allocations: 830.45 KiB)

julia> @btime $v3 .* 3.0;
  92.611 μs (7 allocations: 439.77 KiB)

This fixes the second half of #30455. First half was fixed by #30480

@mbauman
Copy link
Member

mbauman commented Feb 25, 2019

Nice. Makes sense. I know it's a small cost, but since we're outlining here, perhaps we should just outline everything after that similar call. E.g.,

newdest = Base.similar(dest, promote_typejoin(T, typeof(val)))
return restart_copyto_nonleaf!(newdest, dest, bc, val, I, iter, state, count)
function restart_copyto_nonleaf!(newdest, dest, bc, val, I, iter, state, count)
    # Function barrier that makes the copying to newdest type stable
    for II in Iterators.take(iter, count)
        newdest[II] = dest[II]
    end
    newdest[I] = val
    return copyto_nonleaf!(newdest, bc, iter, state, count+1)
end

That should just have one dynamic dispatch instead of three. In my quick tests of the cases you listed, it looks like it saves another ~25%.

@mbauman mbauman added performance Must go faster broadcast Applying a function over a collection labels Feb 25, 2019
@cstjean cstjean force-pushed the faster-broadcast-widening branch from 412b356 to a86ca21 Compare February 27, 2019 15:30
@cstjean
Copy link
Contributor Author

cstjean commented Feb 27, 2019

Sure, I just took your code and updated the PR. I updated the benchmarks.

@mbauman
Copy link
Member

mbauman commented Feb 27, 2019

The Travis failures look unrelated — 64 bit linux is an assertion failure in the Profile tests and Mac is a Pkg timeout.

@mbauman mbauman merged commit 6b0c5c2 into JuliaLang:master Feb 27, 2019
@cstjean
Copy link
Contributor Author

cstjean commented Feb 27, 2019

Awesome, could it be backported to 1.1?

@KristofferC KristofferC added triage This should be discussed on a triage call backport 1.1 labels Feb 27, 2019
@mbauman
Copy link
Member

mbauman commented Feb 27, 2019

We're trying to be a bit more conservative in backporting performance fixes, but this is about as straightforward as a patch could be — it's literally just copy-pasting code into an outlined function. I'd support a backport.

@Keno Keno removed the triage This should be discussed on a triage call label Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

broadcast Applying a function over a collection performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants