Skip to content

Conversation

@rfourquet
Copy link
Member

I discovered that rand!(A::Array{Int}) was allocating memory in a debugging session, thinking that the allocations came from a type instability. I think this shouldn't, but this patch is quite not pretty. It would be less so if getindex was defined for pointers (I have no hope that this happens), and unneeded if pointer_to_array was modified to not allocate (no idea if this is possible).

Here are timings I get before/after:

julia> function arr{T}(::Type{T}, n, r=1)
    a = zeros(T, n)
    m = MersenneTwister()
    tic()
    a = @allocated for i in 1:r
        rand!(m, a)
    end
    toq(), a/r
end

julia> [arr(Int, 10^i, 10^(9-i)) for i in 1:8] # master:
8-element Array{(Float64,Any),1}:
 (24.119626243,144.0)
 (5.701144431,144.0)
 (1.941585068,160.0)
 (1.328115407,176.0)
 (1.345973947,176.0)
 (1.344547687,176.0)
 (2.675291505,176.0)
 (2.861680458,176.0)

julia> [arr(Int, 10^i, 10^(9-i)) for i in 1:8] # this PR
8-element Array{(Float64,Any),1}:
(6.939816844,0.0)
 (3.781428907,0.0)
 (1.62118118,0.0)
 (1.174099115,0.0)
 (1.241850295,0.0)
 (1.351900588,0.0)
 (2.648783596,0.0)
 (2.839158974,0.0)

This is to avoid the small allocations of pointer_to_array, which can
add up and slow down repeated calls to rand!(A) on a same array A,
in particular when A is small.
@rfourquet rfourquet changed the title rand!(A): use pointer directly instead of pointer_to_array Non allocating rand!(A) (get rid of pointer_to_array) Nov 27, 2014
@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Nov 27, 2014
@ViralBShah
Copy link
Member

What makes 10x10^8 so much slower than 10^8x10?

@rfourquet
Copy link
Member Author

I should have given more explanations. On master, 10x10^8 allocates 144*10^8 bytes (i.e. 13.4 GB, and 38.60 gc time), whereas 10^8x10 allocates 176*10 bytes, so the allocation on small arrays is more visible.
Once the allocation is fixed (this PR), besides the fact that "a small loop within a big one is slower than a big one within a small one" in this case, the dSFMT fill_array! function is used only for lengths greater that 382 (so on small arrays, data has to be copied).

@ViralBShah
Copy link
Member

But, why do we have 2 loops? Can't we just iterate over the length of the array?

Can you create a small self-contained example about the memory allocation, which may be easier to work with? Cc: @JeffBezanson

@rfourquet
Copy link
Member Author

By the "outer loop", I was referring to the loop i in 1:r in the function arr I gave as an example. The smallest example about memory allocation is A=[1]; @allocated pointer_to_array(pointer(A),1). This is a small allocation, which occurs within the rand! function.

@JeffBezanson
Copy link
Member

Yes this is really ugly. Surely there is another way to write this. I really do not want us to keep chasing moderate speedups at any cost.

@ViralBShah
Copy link
Member

I am guessing pointer_to_array allocates memory for the array metadata.

@rfourquet
Copy link
Member Author

Personally, I'm not sure I would want this change if it was only for getting the modest speedup. But the allocations, which grow linearly with the number of times rand! is called, can/will distract users trying to find efficiency bugs (which often come from unexpected allocations) in their programs.

@rfourquet
Copy link
Member Author

What happens here is really pointer arithmetic/dereferencing; the master version wraps this into arrays for a nicer syntax, but uses @inbounds, The PR version only looses the getindex/setindex! syntax. Maybe one better way to write this is to create a lightweight version of pointer_to_array returning an immutable holding only a pointer and a length and which has getindex (maybe such a type already exists?).

@ViralBShah
Copy link
Member

Yes, that is essentially the issue here. I don't know if getindex and setindex for pointer has been discussed before. I suspect it is intentionally not provided to discourage such code.

I think this issue will apply for all small array operations, and is not specific to rand. Perhaps the point should be noted in other issues on small array performance.

@ViralBShah
Copy link
Member

I wonder how subarray fares here, with the recent improvements.

@ViralBShah
Copy link
Member

I am going to close this. Need a better solution, and the issue here is not rand specific. Best to have a new PR if we have a different solution that solves this problem.

@ViralBShah ViralBShah closed this Nov 30, 2014
@rfourquet
Copy link
Member Author

With the following type, it is enough to change pointer_to_array to UnsafePtrView in random.jl to get the same benefits as this PR:

immutable UnsafePtrView{T} <: DenseArray{T}
    ptr::Ptr{T}
    len::Int
end

UnsafePtrView{T}(A::Array{T}) = UnsafePtrView(pointer(A), length(A))
length(a::UnsafePtrView) = a.len
getindex(a::UnsafePtrView, i::Int) = unsafe_load(a.ptr, i)
setindex!(a::UnsafePtrView, x, i::Int) = unsafe_store!(a.ptr, x, i)
pointer(a::UnsafePtrView) = a.ptr

Would it be an acceptable solution? (until a better general solution is found)

@rfourquet rfourquet deleted the rf/rand-fillpointer branch December 2, 2014 07:10
@ViralBShah
Copy link
Member

I would be ok to go with such a solution, as it can be easily jettisoned in the future without affecting the rest of the codebase. The gains for small random arrays seem worthwhile. Would love to hear what @JeffBezanson has to say.

@timholy
Copy link
Member

timholy commented Dec 2, 2014

See #3224.

@rfourquet
Copy link
Member Author

As noted by @JeffBezanson, rand(::BigInt) is a good example of client of "small" array filling. With this PR, filling arrays of BigInt (with one or two limbs) can be 3x to 5x times faster than on master.

@ViralBShah
Copy link
Member

So do we yet have a better way of doing this? I'd rather get the ugly fix in and have an open issue to fix it.

rfourquet added a commit that referenced this pull request Dec 12, 2017
rfourquet added a commit that referenced this pull request Dec 12, 2017
rfourquet added a commit that referenced this pull request Dec 13, 2017
rfourquet added a commit that referenced this pull request Dec 13, 2017
fix some FIXMEs in random, and take 2 to #9174
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster randomness Random number generation and the Random stdlib

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants