Skip to content

JIT: more unexpected dynamic PGO schema mismatches #85856

@AndyAyersMS

Description

@AndyAyersMS

I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph and so throw away perfectly good PGO data.

But that's not the case. The initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):

jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr);
if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S))
{
continue; /* NOP */
}

So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:

;; compiling as root

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..007)-> BB03 ( cond )                     
BB02 [0001]  1       BB01                  1       [007..016)-> BB04 (always)                     
BB03 [0002]  1       BB01                  1       [016..017)                                     
BB04 [0003]  2       BB02,BB03             1       [017..039)-> BB05 (always)                     
BB05 [0004]  1       BB04                  1       [039..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation

New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)

;; as inlinee

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102]  1                           100       [000..007)-> BB03 ( cond )                     
BB02 [0103]  1       BB01                100       [007..016)-> BB04 (always)                     
BB03 [0104]  1       BB01                100       [016..017)                                     
BB04 [0105]  2       BB02,BB03           100       [017..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:

We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.

Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.

Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions