Skip to content

Conversation

@KristofferC
Copy link
Member

@KristofferC KristofferC commented May 7, 2023

This allows us to see the already instrumented compiler passes in tracy, cf:

image

@pchintalapudi
Copy link
Member

I would like to see this be extensible to ITTAPI/VTune integration as well as Tracy, do you have thoughts on how that would work here? ITTAPI requires the ability to store a field/pointer as timer state.

@KristofferC
Copy link
Member Author

KristofferC commented May 7, 2023

ITTAPI requires the ability to store a field/pointer as timer state.

Could you elaborate on what that means more concretely?

And yeah, I can rewrite this to make it more extensible. Just wanted to get something usable first.

@KristofferC KristofferC force-pushed the kc/core_compiler_tracy branch 2 times, most recently from bf1fa0b to 2e68f3c Compare May 7, 2023 18:26
@KristofferC
Copy link
Member Author

I changed it so it should be straight forward (I think) to hook up ITTAPI. Please have a look @pchintalapudi

@pchintalapudi
Copy link
Member

I feel like it would be better to have the zone macro ccall a pair of functions in timing.c with the relevant data that Tracy uses, and then hook in the relevant profiling calls in that pair of functions. I think it would be possible to completely reuse most of the already available jl_timing_block_t code for the start and end, which gets us equal integration with C events for free, and keeps the jl_timing_events/jl_timing_owners as the definitive list of events that we track. We'll probably need a clean way of translating the event from Julia to C, but that could be as simple as binary searching on the event name.

@KristofferC
Copy link
Member Author

I don't see many advantages to that. The majority of the new code is "scaffolding" (rooting, types etc) which is needed even if the call would be into the existing C-timing infrastructure. The actual ccall stuff is quite minimal. And having to go into C and add categories vs just slapping a @zone "name" begin in the Julia code makes it more awkward to use. But if you have some example implementation of it, I'd be happy to look at it.

@pchintalapudi
Copy link
Member

@KristofferC I have an example one here (c627096 , branched off #49679 ), with WITH_TIMING_COUNTS output below. Not too sure on the tracy details, but since it mostly reuses the jl_timing_block_t implementation I think it should just work.

image

@KristofferC
Copy link
Member Author

KristofferC commented May 9, 2023

Looking at that I can agree that there are some advantages to sharing the entry points etc with the C code.

The should_profile code could look similar to what is in this PR with the build_h.jl etc? Or we expose some way to ask the runtime if tracy / ittapi is enabled instead.

Also, we should wait for @topolarity to have a look at this code since he is the one with the deep knowledge about the Tracy stuff.

@KristofferC
Copy link
Member Author

@pchintalapudi, I think there is a small problem in your suggestion in that the lookup of zone happens at compile time (macro expansion time) of the function and will persist into new processes where this is no longer valid. In a7e1ec0 I moved it to happen at runtime, the first time that zone is run. It seems to work with that.

gnome-shell-screenshot-6y4s8g

@KristofferC
Copy link
Member Author

Based on the API in c627096 me and @topolarity looked into instrumenting the actual LLVM passes and came up with 794673b. With that you can get a list like:

gnome-shell-screenshot-97c87l

and the zones as

gnome-shell-screenshot-b7wxmz

We discussed that it is probably better to have the zones themselves be named after the pass which would require getting rid of the hardcoded list of zone names.

@pchintalapudi
Copy link
Member

That's great! I was also thinking about something like that. I would suggest stripping the PassManager passes from the events, since they're just organizational tools and it's denoted in the pass pipeline anyways. I think it would also be useful to use the analysis callbacks as well, to see when particular analysis computations are bad.

@KristofferC
Copy link
Member Author

I would suggest stripping the PassManager passes from the events

How would I identify one of those? Just from the name of them?

@KristofferC
Copy link
Member Author

KristofferC commented May 12, 2023

Okay, updated with showing the pass name (only in Tracy for now) and also remove the PassManager + PassAdaptor passes. 84f5389

gnome-shell-screenshot-clf178

@pchintalapudi
Copy link
Member

WRT compiler, the other thing I want to do is to add pre/post optimization counters for lines of IR (both Julia and LLVM). I think if combined with the timing counts printout we could track how changes to optimization affect amount of code generated. But that requires #49535 to be merged.

@KristofferC KristofferC force-pushed the kc/core_compiler_tracy branch from 2e68f3c to 6b180ed Compare April 1, 2025 18:47
@KristofferC
Copy link
Member Author

Updated


const srclocs = Vector{TracySrcLoc}()

function _tracy_zone_create(name::String, ex::Expr, linfo::LineNumberNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be thread-safe so that we can lower @zone across multiple files in parallel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you suggest we do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lock around it should be fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are locks available in Compiler during bootstrapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like it :(. Maybe put a isdefined base around it. Kind of ugly but building the compiler is ST only

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash any suggestions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the response to this review comment got more or less ignored, I think it is ok to merge it without the locking since:

  • The Tracy.jl package does the same
  • This is only used in one "package" (Compiler)
  • Locks are a bit messy to add at this point
  • I don't think lowering runs concurrently at this point?

I don't want this to get stalled again, preventing us from adding more zones and investigating the load time regressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be concurrent IIRC. But i think we can just add a profiling lock in C and take it.

@KristofferC
Copy link
Member Author

There are quite a large number of traces from the Julia compiler which are very very quick which makes the traces a bit big. Not sure if something should be done about that.

image

@gbaraldi
Copy link
Member

gbaraldi commented Apr 2, 2025

Maybe we make those optional? With an envar or a global?

@KristofferC
Copy link
Member Author

We already have the DISABLE_FREQUENT_EVENTS. We could tap into that.

@KristofferC KristofferC force-pushed the kc/core_compiler_tracy branch from 0a66e87 to 02a228e Compare April 7, 2025 15:02
@KristofferC
Copy link
Member Author

UndefVarError(var=:var"@atomic" @gbaraldi

Comment on lines 51 to 54
if loc.zone_name === Ptr{UInt8}(0)
reinit!(loc)
end
# `loc` is rooted in the global `srclocs`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a substantial data race

@topolarity topolarity force-pushed the kc/core_compiler_tracy branch from 32faf1b to 608cf44 Compare April 16, 2025 20:32
This was mostly correct before, but Julia uses `:monotonic` not
`:relaxed`.

This does not resolve the data race in the `push!(...)` to the
`srclocs` Vector, but it's a step in the right direction.
@topolarity topolarity force-pushed the kc/core_compiler_tracy branch from 608cf44 to 575b0b7 Compare April 16, 2025 20:34
@KristofferC KristofferC merged commit 75c44d0 into master Apr 17, 2025
7 checks passed
@KristofferC KristofferC deleted the kc/core_compiler_tracy branch April 17, 2025 06:31
KristofferC added a commit that referenced this pull request Apr 26, 2025
Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
(cherry picked from commit 75c44d0)
KristofferC added a commit that referenced this pull request Apr 26, 2025
Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
(cherry picked from commit 75c44d0)
KristofferC added a commit that referenced this pull request Apr 26, 2025
Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
(cherry picked from commit 75c44d0)
KristofferC added a commit that referenced this pull request Apr 26, 2025
Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
(cherry picked from commit 75c44d0)
KristofferC added a commit that referenced this pull request Apr 26, 2025
Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
(cherry picked from commit 75c44d0)
@giordano
Copy link
Member

Would it be possible to document this in the devdocs? Sounds like a useful feature for developers, but it's completely obscure without diving into the source code.

serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
…ng#49675)

Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
LebedevRI pushed a commit to LebedevRI/julia that referenced this pull request May 2, 2025
…ng#49675)

Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: gbaraldi <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants