If Base.Broadcast.flatten is called on a Broadcasted object whose function f is a DataType (for example a constructor such as Float32), the resulting Broadcasted object is no longer type-stable, which causes allocations.
This can happen when || or && appears in a broadcast expression. A simple reproducible example:
```julia
julia> x = rand(5); y = rand(5);

julia> f!(y, x) = @. y = ifelse(false || x < Float32(1), x, y)
f! (generic function with 1 method)

julia> f!(y, x);

julia> @allocated f!(y, x)
800
```
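To look at the types involved more directly, the lazy broadcast can be built by hand and flattened. This is only a sketch: it assumes that @. turns Float32(1) into a broadcasted call of the Float32 constructor, and it relies on Base.Broadcast internals (broadcasted, flatten, the f field) that may differ between Julia versions. The variable names inner, outer and flat are illustrative.

```julia
using Base.Broadcast: broadcasted, flatten, materialize

x = rand(5)

# Lazy equivalent of Float32.(1): a Broadcasted whose f is the constructor Float32.
inner = broadcasted(Float32, 1)

# Lazy equivalent of x .< Float32.(1), with the constructor call nested inside.
outer = broadcasted(<, x, inner)

# flatten merges the nested calls into a single Broadcasted with one function.
flat = flatten(outer)

# Print the concrete types. On affected versions the function type parameter
# of `inner` may show up as DataType rather than Type{Float32}.
@show typeof(inner)
@show typeof(flat)

# Flattening should not change the result, only how it is computed.
result = materialize(flat)
```

If the printed type of inner lists DataType as its function parameter, the compiler cannot tell from the type alone which constructor will be called, which would be consistent with the allocations reported above.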
I suspect that this will fail altogether on the GPU.
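One way to test whether the constructor call is really the cause is to remove it from the broadcast and compare allocations. The sketch below assumes that writing the literal 1f0 instead of Float32(1) is acceptable for the use case; f_literal! is a hypothetical name, not part of any library.

```julia
x = rand(5); y = rand(5)

# Hypothetical variant of f!: the literal 1f0 replaces Float32(1), so @. has
# no constructor call to broadcast.
f_literal!(y, x) = @. y = ifelse(false || x < 1f0, x, y)

f_literal!(y, x)  # first call compiles the method

# Compare this number with the one reported for f! above.
@show @allocated f_literal!(y, x)
```

If this variant allocates noticeably less than f!, that points at the broadcasted DataType rather than at || itself.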