diff --git a/NEWS.md b/NEWS.md
index 7ccab87d92019..9e0f2dd1f18ec 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -135,6 +135,9 @@ Standard library changes
   Further, percent utilization is now reported as a total or per-thread, based on whether the thread is idle or not at
   each sample. `Profile.fetch()` by default strips out the new metadata to ensure backwards compatibility with external
   profiling data consumers, but can be included with the `include_meta` kwarg. ([#41742])
+* The new `Profile.Allocs` module allows memory allocations to be profiled. The stack trace, type, and size of each
+  allocation are recorded, and a `sample_rate` argument allows a tunable fraction of allocations to be skipped,
+  reducing performance overhead. ([#42768])
 
 #### Random
 
diff --git a/doc/src/manual/profile.md b/doc/src/manual/profile.md
index 5596ebae512aa..64b6b8b0e1209 100644
--- a/doc/src/manual/profile.md
+++ b/doc/src/manual/profile.md
@@ -336,6 +336,23 @@ and how much garbage it collects each time. This can be enabled with
 [`GC.enable_logging(true)`](@ref), which causes Julia to log to stderr every time
 a garbage collection happens.
 
+### Allocation Profiler
+
+The allocation profiler records the stack trace, type, and size of each
+allocation while it is running. It can be invoked with
+[`Profile.Allocs.@profile`](@ref).
+
+This information about the allocations is returned as an array of `Alloc`
+objects, wrapped in an `AllocResults` object. The best way to visualize
+these is currently with the [PProf.jl](https://github.com/JuliaPerf/PProf.jl)
+library, which can visualize the call stacks that are making the most
+allocations.
+
+The allocation profiler does have significant overhead, so a `sample_rate`
+argument can be passed to speed it up by making it skip some allocations.
+Passing `sample_rate=1.0` will make it record everything (which is slow);
+`sample_rate=0.1` will record only 10% of the allocations (faster), etc.
+
 ## External Profiling
 
 Currently Julia supports `Intel VTune`, `OProfile` and `perf` as external profiling tools.
diff --git a/stdlib/Profile/docs/src/index.md b/stdlib/Profile/docs/src/index.md
index ac60bb92cb5ed..89894723b1116 100644
--- a/stdlib/Profile/docs/src/index.md
+++ b/stdlib/Profile/docs/src/index.md
@@ -1,5 +1,7 @@
 # [Profiling](@id lib-profiling)
 
+## CPU Profiling
+
 ```@docs
 Profile.@profile
 ```
@@ -15,3 +17,18 @@ Profile.retrieve
 Profile.callers
 Profile.clear_malloc_data
 ```
+
+## Memory Profiling
+
+```@docs
+Profile.Allocs.@profile
+```
+
+The methods in `Profile.Allocs` are not exported and need to be called e.g. as `Profile.Allocs.fetch()`.
+
+```@docs
+Profile.Allocs.clear
+Profile.Allocs.fetch
+Profile.Allocs.start
+Profile.Allocs.stop
+```
diff --git a/stdlib/Profile/src/Allocs.jl b/stdlib/Profile/src/Allocs.jl
index 8336c27d0ac34..b8e3e7e7b09a0 100644
--- a/stdlib/Profile/src/Allocs.jl
+++ b/stdlib/Profile/src/Allocs.jl
@@ -33,6 +33,8 @@ end
 Profile allocations that happen during `expr`, returning both the result and
-and AllocResults struct.
+an AllocResults struct.
 
+A sample rate of 1.0 will record everything; 0.0 will record nothing.
+
 ```julia
 julia> Profile.Allocs.@profile sample_rate=0.01 peakflops()
 1.03733270279065e11
@@ -59,18 +61,40 @@ function _prof_expr(expr, opts)
     end
 end
 
-function start(; sample_rate::Number)
+"""
+    Profile.Allocs.start(; sample_rate::Real)
+
+Begin recording allocations with the given sample rate.
+A sample rate of 1.0 will record everything; 0.0 will record nothing.
+"""
+function start(; sample_rate::Real)
     ccall(:jl_start_alloc_profile, Cvoid, (Cdouble,), Float64(sample_rate))
 end
 
+"""
+    Profile.Allocs.stop()
+
+Stop recording allocations.
+"""
 function stop()
     ccall(:jl_stop_alloc_profile, Cvoid, ())
 end
 
+"""
+    Profile.Allocs.clear()
+
+Clear all previously profiled allocation information from memory.
+"""
 function clear()
     ccall(:jl_free_alloc_profile, Cvoid, ())
 end
 
+"""
+    Profile.Allocs.fetch()
+
+Retrieve the recorded allocations, and decode them into Julia
+objects which can be analyzed.
+"""
 function fetch()
     raw_results = ccall(:jl_fetch_alloc_profile, RawAllocResults, ())
     decoded_results = decode(raw_results)