-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
transition @timeit in Core.Compiler to use tracy instead
#49675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I would like to see this be extensible to ITTAPI/VTune integration as well as Tracy, do you have thoughts on how that would work here? ITTAPI requires the ability to store a field/pointer as timer state. |
Could you elaborate on what that means more concretely? And yeah, I can rewrite this to make it more extensible. Just wanted to get something usable first. |
bf1fa0b to
2e68f3c
Compare
|
I changed it so it should be straight forward (I think) to hook up ITTAPI. Please have a look @pchintalapudi |
|
I feel like it would be better to have the zone macro ccall a pair of functions in timing.c with the relevant data that Tracy uses, and then hook in the relevant profiling calls in that pair of functions. I think it would be possible to completely reuse most of the already available jl_timing_block_t code for the start and end, which gets us equal integration with C events for free, and keeps the jl_timing_events/jl_timing_owners as the definitive list of events that we track. We'll probably need a clean way of translating the event from Julia to C, but that could be as simple as binary searching on the event name. |
|
I don't see many advantages to that. The majority of the new code is "scaffolding" (rooting, types etc) which is needed even if the call would be into the existing C-timing infrastructure. The actual ccall stuff is quite minimal. And having to go into C and add categories vs just slapping a |
|
@KristofferC I have an example one here (c627096 , branched off #49679 ), with WITH_TIMING_COUNTS output below. Not too sure on the tracy details, but since it mostly reuses the jl_timing_block_t implementation I think it should just work. |
|
Looking at that I can agree that there are some advantages to sharing the entry points etc with the C code. The Also, we should wait for @topolarity to have a look at this code since he is the one with the deep knowledge about the Tracy stuff. |
|
@pchintalapudi, I think there is a small problem in your suggestion in that the lookup of zone happens at compile time (macro expansion time) of the function and will persist into new processes where this is no longer valid. In a7e1ec0 I moved it to happen at runtime, the first time that zone is run. It seems to work with that. |
|
Based on the API in c627096 me and @topolarity looked into instrumenting the actual LLVM passes and came up with 794673b. With that you can get a list like: and the zones as We discussed that it is probably better to have the zones themselves be named after the pass which would require getting rid of the hardcoded list of zone names. |
|
That's great! I was also thinking about something like that. I would suggest stripping the PassManager passes from the events, since they're just organizational tools and it's denoted in the pass pipeline anyways. I think it would also be useful to use the analysis callbacks as well, to see when particular analysis computations are bad. |
How would I identify one of those? Just from the name of them? |
|
Okay, updated with showing the pass name (only in Tracy for now) and also remove the PassManager + PassAdaptor passes. 84f5389 |
|
WRT compiler, the other thing I want to do is to add pre/post optimization counters for lines of IR (both Julia and LLVM). I think if combined with the timing counts printout we could track how changes to optimization affect amount of code generated. But that requires #49535 to be merged. |
2e68f3c to
6b180ed
Compare
|
Updated |
Compiler/src/profiling/tracy.jl
Outdated
|
|
||
| const srclocs = Vector{TracySrcLoc}() | ||
|
|
||
| function _tracy_zone_create(name::String, ex::Expr, linfo::LineNumberNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be thread-safe so that we can lower @zone across multiple files in parallel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you suggest we do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a lock around it should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are locks available in Compiler during bootstrapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like it :(. Maybe put a isdefined base around it. Kind of ugly but building the compiler is ST only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the response to this review comment got more or less ignored, I think it is ok to merge it without the locking since:
- The Tracy.jl package does the same
- This is only used in one "package" (Compiler)
- Locks are a bit messy to add at this point
- I don't think lowering runs concurrently at this point?
I don't want this to get stalled again, preventing us from adding more zones and investigating the load time regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be concurrent IIRC. But i think we can just add a profiling lock in C and take it.
|
Maybe we make those optional? With an envar or a global? |
|
We already have the |
0a66e87 to
02a228e
Compare
|
|
Compiler/src/profiling/tracy.jl
Outdated
| if loc.zone_name === Ptr{UInt8}(0) | ||
| reinit!(loc) | ||
| end | ||
| # `loc` is rooted in the global `srclocs` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a substantial data race
32faf1b to
608cf44
Compare
This was mostly correct before, but Julia uses `:monotonic` not `:relaxed`. This does not resolve the data race in the `push!(...)` to the `srclocs` Vector, but it's a step in the right direction.
608cf44 to
575b0b7
Compare
Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> (cherry picked from commit 75c44d0)
Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> (cherry picked from commit 75c44d0)
Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> (cherry picked from commit 75c44d0)
Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> (cherry picked from commit 75c44d0)
Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]> (cherry picked from commit 75c44d0)
|
Would it be possible to document this in the devdocs? Sounds like a useful feature for developers, but it's completely obscure without diving into the source code. |
…ng#49675) Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]>
…ng#49675) Co-authored-by: Kristoffer Carlsson <[email protected]> Co-authored-by: gbaraldi <[email protected]> Co-authored-by: Cody Tapscott <[email protected]>






This allows us to see the already instrumented compiler passes in tracy, cf: