Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,9 @@ Standard library changes
Further, percent utilization is now reported as a total or per-thread, based on whether the thread is idle or not at
each sample. `Profile.fetch()` by default strips out the new metadata to ensure backwards compatibility with external
profiling data consumers, but can be included with the `include_meta` kwarg. ([#41742])
* The new `Profile.Allocs` module allows memory allocations to be profiled. The stack trace, type, and size of each
allocation is recorded, and a `sample_rate` argument allows a tunable amount of allocations to be skipped,
reducing performance overhead. ([#42768])

#### Random

Expand Down
17 changes: 17 additions & 0 deletions doc/src/manual/profile.md
Original file line number Diff line number Diff line change
Expand Up @@ -336,6 +336,23 @@ and how much garbage it collects each time. This can be enabled with
[`GC.enable_logging(true)`](@ref), which causes Julia to log to stderr every time
a garbage collection happens.

### Allocation Profiler

The allocation profiler records the stack trace, type, and size of each
allocation while it is running. It can be invoked with
[`Profile.Allocs.@profile`](@ref).

This information about the allocations is returned as an array of `Alloc`
objects, wrapped in an `AllocResults` object. The best way to visualize
these is currently with the [PProf.jl](https://github.com/JuliaPerf/PProf.jl)
library, which can visualize the call stacks which are making the most
allocations.

The allocation profiler does have significant overhead, so a `sample_interval`
argument can be passed to speed it up by making it skip some allocations.
Passing `sample_interval=1.0` will make it record everything (which is slow);
`sample_interval=0.1` will record only 10% of the allocations (faster), etc.

## External Profiling

Currently Julia supports `Intel VTune`, `OProfile` and `perf` as external profiling tools.
Expand Down
17 changes: 17 additions & 0 deletions stdlib/Profile/docs/src/index.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# [Profiling](@id lib-profiling)

## CPU Profiling

```@docs
Profile.@profile
```
Expand All @@ -15,3 +17,18 @@ Profile.retrieve
Profile.callers
Profile.clear_malloc_data
```

## Memory profiling

```@docs
Profile.Allocs.@profile
```

The methods in `Profile.Allocs` are not exported and need to be called e.g. as `Profile.Allocs.fetch()`.

```@docs
Profile.Allocs.start
Profile.Allocs.stop
Profile.Allocs.clear
Profile.Allocs.fetch
```
24 changes: 24 additions & 0 deletions stdlib/Profile/src/Allocs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ end
Profile allocations that happen during `expr`, returning
both the result and and AllocResults struct.

A sample rate of 1 will record everything; 0 will record nothing.

```julia
julia> Profile.Allocs.@profile sample_rate=0.01 peakflops()
1.03733270279065e11
Expand All @@ -59,18 +61,40 @@ function _prof_expr(expr, opts)
end
end

"""
Profile.Allocs.start(sample_rate::Number)

Begin recording allocations with the given sample rate
A sample rate of 1 will record everything; 0 will record nothing.
"""
function start(; sample_rate::Number)
ccall(:jl_start_alloc_profile, Cvoid, (Cdouble,), Float64(sample_rate))
end

"""
Profile.Allocs.stop()

Stop recording allocations.
"""
function stop()
ccall(:jl_stop_alloc_profile, Cvoid, ())
end

"""
Profile.Allocs.stop()

Clear allocation information from memory.
"""
function clear()
ccall(:jl_free_alloc_profile, Cvoid, ())
end

"""
Profile.Allocs.fetch()

Retrieve the recorded allocations, and decode them into Julia
objects which can be analyzed.
"""
function fetch()
raw_results = ccall(:jl_fetch_alloc_profile, RawAllocResults, ())
decoded_results = decode(raw_results)
Expand Down