-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph and so throw away perfectly good PGO data.
But that's not the case. The initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):
runtime/src/coreclr/jit/fgbasic.cpp
Lines 2975 to 2980 in e61e022
| jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr); | |
| if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S)) | |
| { | |
| continue; /* NOP */ | |
| } |
So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:
;; compiling as root
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight lp [IL range] [jump] [EH region] [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 [000..007)-> BB03 ( cond )
BB02 [0001] 1 BB01 1 [007..016)-> BB04 (always)
BB03 [0002] 1 BB01 1 [016..017)
BB04 [0003] 2 BB02,BB03 1 [017..039)-> BB05 (always)
BB05 [0004] 1 BB04 1 [039..06A) (return)
-----------------------------------------------------------------------------------------------------------------------------------------
Using edge profiling
EfficientEdgeCountInstrumentor: preparing for instrumentation
New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)
;; as inlinee
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight lp [IL range] [jump] [EH region] [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102] 1 100 [000..007)-> BB03 ( cond )
BB02 [0103] 1 BB01 100 [007..016)-> BB04 (always)
BB03 [0104] 1 BB01 100 [016..017)
BB04 [0105] 2 BB02,BB03 100 [017..06A) (return)
-----------------------------------------------------------------------------------------------------------------------------------------
*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records
Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:
We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.
Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.
Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).