@jakobbotsch
Member

Add a phase that peels loops by duplicating the loop body once. No
heuristics are included yet, so for now the phase is enabled only under
stress.
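
As a rough source-level illustration of what peeling one iteration means, the sketch below runs the first iteration from a duplicated copy of the loop body and lets the original loop handle the rest. The JIT does this on its own flow graph rather than on source code; the function names here are hypothetical and Python is used only for brevity.

```python
def sum_squares(values):
    """Original loop: the body runs once per element."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i] * values[i]
        i += 1
    return total


def sum_squares_peeled(values):
    """The same loop with its first iteration peeled off."""
    total = 0
    i = 0
    # Peeled copy: the loop condition is still tested, then the
    # duplicated body executes exactly once.
    if i < len(values):
        total += values[i] * values[i]
        i += 1
        # The original loop handles any remaining iterations.
        while i < len(values):
            total += values[i] * values[i]
            i += 1
    return total


print(sum_squares([1, 2, 3]), sum_squares_peeled([1, 2, 3]))
```

Both functions return the same result for any input, including an empty list. Because the body now exists twice, the peeled form is larger, so code size is expected to grow when every loop is peeled without a heuristic.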

Based on #97506

Factor the loop duplication code out of loop cloning and loop unrolling
in anticipation of also using it in loop peeling.
@ghost assigned jakobbotsch on Jan 25, 2024
@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Jan 25, 2024
@ghost

ghost commented Jan 25, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: jakobbotsch
Assignees: jakobbotsch
Labels: area-CodeGen-coreclr
Milestone: -

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@ryujit-bot

Diff results for #97517

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,496,508 contexts (1,011,240 MinOpts, 1,485,268 FullOpts).

MISSED contexts: base: 6,580 (0.26%), diff: 8,842 (0.35%)

Overall (+83,870,420 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,871,176 +1,853,104
benchmarks.run_pgo.linux.arm64.checked.mch 78,212,264 +9,478,376
benchmarks.run_tiered.linux.arm64.checked.mch 29,580,048 +1,053,584
coreclr_tests.run.linux.arm64.checked.mch 508,462,256 +8,571,516
libraries.crossgen2.linux.arm64.checked.mch 55,593,716 +4,881,684
libraries.pmi.linux.arm64.checked.mch 74,364,484 +7,112,432
libraries_tests.run.linux.arm64.Release.mch 382,265,760 +38,543,372
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 162,857,060 +10,214,900
realworld.run.linux.arm64.checked.mch 15,282,328 +1,884,512
smoke_tests.nativeaot.linux.arm64.checked.mch 2,924,168 +276,940
FullOpts (+83,870,420 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,524,920 +1,853,104
benchmarks.run_pgo.linux.arm64.checked.mch 54,230,224 +9,478,376
benchmarks.run_tiered.linux.arm64.checked.mch 4,958,828 +1,053,584
coreclr_tests.run.linux.arm64.checked.mch 159,633,064 +8,571,516
libraries.crossgen2.linux.arm64.checked.mch 55,592,080 +4,881,684
libraries.pmi.linux.arm64.checked.mch 74,244,500 +7,112,432
libraries_tests.run.linux.arm64.Release.mch 166,844,084 +38,543,372
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,377,064 +10,214,900
realworld.run.linux.arm64.checked.mch 14,708,176 +1,884,512
smoke_tests.nativeaot.linux.arm64.checked.mch 2,923,180 +276,940
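
The per-collection rows can be cross-checked against the section header. The sketch below parses the linux/arm64 FullOpts rows as they appear above and sums them. It assumes the signed values in the last column are changes relative to the base size, which is how they read here. The `parse_row` helper is hypothetical and not part of any tool.

```python
ROWS = """\
benchmarks.run.linux.arm64.checked.mch 15,524,920 +1,853,104
benchmarks.run_pgo.linux.arm64.checked.mch 54,230,224 +9,478,376
benchmarks.run_tiered.linux.arm64.checked.mch 4,958,828 +1,053,584
coreclr_tests.run.linux.arm64.checked.mch 159,633,064 +8,571,516
libraries.crossgen2.linux.arm64.checked.mch 55,592,080 +4,881,684
libraries.pmi.linux.arm64.checked.mch 74,244,500 +7,112,432
libraries_tests.run.linux.arm64.Release.mch 166,844,084 +38,543,372
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,377,064 +10,214,900
realworld.run.linux.arm64.checked.mch 14,708,176 +1,884,512
smoke_tests.nativeaot.linux.arm64.checked.mch 2,923,180 +276,940
"""


def parse_row(line):
    """Split 'name base delta' and strip thousands separators."""
    name, base, delta = line.split()
    return name, int(base.replace(",", "")), int(delta.replace(",", ""))


parsed = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
total_base = sum(base for _, base, _ in parsed)
total_delta = sum(delta for _, _, delta in parsed)
relative_growth = 100.0 * total_delta / total_base

print(f"total delta: {total_delta:,} bytes")  # matches the +83,870,420 header
print(f"relative growth: {relative_growth:.2f}%")
```

The summed delta matches the `FullOpts (+83,870,420 bytes)` header, and dividing by the summed base size gives the relative growth for this target.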

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,504,130 contexts (977,766 MinOpts, 1,526,364 FullOpts).

MISSED contexts: base: 6,922 (0.28%), diff: 8,132 (0.32%)

Overall (+85,413,308 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 16,134,263 +2,068,742
benchmarks.run_pgo.linux.x64.checked.mch 69,995,881 +9,684,884
benchmarks.run_tiered.linux.x64.checked.mch 15,913,368 +900,860
coreclr_tests.run.linux.x64.checked.mch 402,389,388 +8,170,685
libraries.crossgen2.linux.x64.checked.mch 38,669,752 +4,092,082
libraries.pmi.linux.x64.checked.mch 58,849,984 +6,571,715
libraries_tests.run.linux.x64.Release.mch 339,241,076 +41,981,234
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 130,891,574 +9,478,422
realworld.run.linux.x64.checked.mch 12,750,565 +1,869,040
smoke_tests.nativeaot.linux.x64.checked.mch 4,175,394 +595,644
FullOpts (+85,413,308 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 15,870,090 +2,068,742
benchmarks.run_pgo.linux.x64.checked.mch 48,685,304 +9,684,884
benchmarks.run_tiered.linux.x64.checked.mch 3,632,243 +900,860
coreclr_tests.run.linux.x64.checked.mch 122,850,857 +8,170,685
libraries.crossgen2.linux.x64.checked.mch 38,668,550 +4,092,082
libraries.pmi.linux.x64.checked.mch 58,737,114 +6,571,715
libraries_tests.run.linux.x64.Release.mch 155,729,956 +41,981,234
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 120,233,126 +9,478,422
realworld.run.linux.x64.checked.mch 12,363,655 +1,869,040
smoke_tests.nativeaot.linux.x64.checked.mch 4,174,445 +595,644

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,228,167 contexts (927,360 MinOpts, 1,300,807 FullOpts).

MISSED contexts: base: 6,095 (0.27%), diff: 7,850 (0.35%)

Overall (+61,409,324 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,173,472 +4,525,964
benchmarks.run_tiered.osx.arm64.checked.mch 15,534,704 +904,924
coreclr_tests.run.osx.arm64.checked.mch 482,896,688 +7,075,892
libraries.crossgen2.osx.arm64.checked.mch 55,476,760 +4,869,464
libraries.pmi.osx.arm64.checked.mch 78,203,768 +7,494,284
libraries_tests.run.osx.arm64.Release.mch 310,283,060 +24,468,928
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,885,512 +10,211,364
realworld.run.osx.arm64.checked.mch 14,574,520 +1,858,504
FullOpts (+61,409,324 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 17,766,124 +4,525,964
benchmarks.run_tiered.osx.arm64.checked.mch 4,035,456 +904,924
coreclr_tests.run.osx.arm64.checked.mch 151,603,736 +7,075,892
libraries.crossgen2.osx.arm64.checked.mch 55,475,132 +4,869,464
libraries.pmi.osx.arm64.checked.mch 78,082,640 +7,494,284
libraries_tests.run.osx.arm64.Release.mch 108,963,876 +24,468,928
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,748,988 +10,211,364
realworld.run.osx.arm64.checked.mch 14,010,564 +1,858,504

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,306,322 contexts (929,692 MinOpts, 1,376,630 FullOpts).

MISSED contexts: base: 6,353 (0.27%), diff: 8,476 (0.37%)

Overall (+66,017,632 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,657,768 +1,647,456
benchmarks.run_pgo.windows.arm64.checked.mch 46,143,416 +6,908,548
benchmarks.run_tiered.windows.arm64.checked.mch 15,239,072 +895,968
coreclr_tests.run.windows.arm64.checked.mch 493,954,596 +7,397,932
libraries.crossgen2.windows.arm64.checked.mch 58,806,560 +5,102,604
libraries.pmi.windows.arm64.checked.mch 77,801,572 +7,378,152
libraries_tests.run.windows.arm64.Release.mch 307,088,556 +23,889,608
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 169,130,060 +10,455,324
realworld.run.windows.arm64.checked.mch 15,374,216 +1,926,800
smoke_tests.nativeaot.windows.arm64.checked.mch 3,929,512 +415,240
FullOpts (+66,017,632 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,657,232 +1,647,456
benchmarks.run_pgo.windows.arm64.checked.mch 30,085,000 +6,908,548
benchmarks.run_tiered.windows.arm64.checked.mch 4,066,344 +895,968
coreclr_tests.run.windows.arm64.checked.mch 155,367,424 +7,397,932
libraries.crossgen2.windows.arm64.checked.mch 58,804,924 +5,102,604
libraries.pmi.windows.arm64.checked.mch 77,681,588 +7,378,152
libraries_tests.run.windows.arm64.Release.mch 106,082,604 +23,889,608
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 155,993,520 +10,455,324
realworld.run.windows.arm64.checked.mch 14,810,236 +1,926,800
smoke_tests.nativeaot.windows.arm64.checked.mch 3,928,500 +415,240

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,365,064 contexts (928,740 MinOpts, 1,436,324 FullOpts).

MISSED contexts: base: 6,816 (0.29%), diff: 8,137 (0.34%)

Overall (+64,751,990 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,527,978 +1,457,716
benchmarks.run_pgo.windows.x64.checked.mch 35,281,691 +6,239,881
benchmarks.run_tiered.windows.x64.checked.mch 12,546,330 +791,808
coreclr_tests.run.windows.x64.checked.mch 392,032,764 +6,850,273
libraries.crossgen2.windows.x64.checked.mch 39,426,266 +4,170,815
libraries.pmi.windows.x64.checked.mch 60,047,589 +6,548,076
libraries_tests.run.windows.x64.Release.mch 276,546,440 +26,574,826
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 135,656,589 +9,498,864
realworld.run.windows.x64.checked.mch 13,710,148 +1,889,086
smoke_tests.nativeaot.windows.x64.checked.mch 5,066,508 +730,645
FullOpts (+64,751,990 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,527,617 +1,457,716
benchmarks.run_pgo.windows.x64.checked.mch 21,284,086 +6,239,881
benchmarks.run_tiered.windows.x64.checked.mch 3,440,481 +791,808
coreclr_tests.run.windows.x64.checked.mch 119,501,197 +6,850,273
libraries.crossgen2.windows.x64.checked.mch 39,425,077 +4,170,815
libraries.pmi.windows.x64.checked.mch 59,934,070 +6,548,076
libraries_tests.run.windows.x64.Release.mch 102,825,764 +26,574,826
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 124,850,033 +9,498,864
realworld.run.windows.x64.checked.mch 13,323,545 +1,889,086
smoke_tests.nativeaot.windows.x64.checked.mch 5,065,561 +730,645

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,228,746 contexts (825,130 MinOpts, 1,403,616 FullOpts).

MISSED contexts: base: 77,529 (3.36%), diff: 79,285 (3.44%)

Overall (+53,448,948 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,030,000 +1,464,368
benchmarks.run_pgo.linux.arm.checked.mch 63,279,568 +4,101,278
benchmarks.run_tiered.linux.arm.checked.mch 17,368,546 +1,394,258
coreclr_tests.run.linux.arm.checked.mch 320,937,174 +6,505,036
libraries.crossgen2.linux.arm.checked.mch 36,614,296 +3,331,352
libraries.pmi.linux.arm.checked.mch 48,572,466 +5,001,566
libraries_tests.run.linux.arm.Release.mch 243,987,636 +23,325,954
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,237,428 +6,452,200
realworld.run.linux.arm.checked.mch 13,249,158 +1,872,936
FullOpts (+53,448,948 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,646,534 +1,464,368
benchmarks.run_pgo.linux.arm.checked.mch 51,637,824 +4,101,278
benchmarks.run_tiered.linux.arm.checked.mch 10,176,318 +1,394,258
coreclr_tests.run.linux.arm.checked.mch 108,295,518 +6,505,036
libraries.crossgen2.linux.arm.checked.mch 36,613,066 +3,331,352
libraries.pmi.linux.arm.checked.mch 48,465,962 +5,001,566
libraries_tests.run.linux.arm.Release.mch 121,696,850 +23,325,954
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,153,626 +6,452,200
realworld.run.linux.arm.checked.mch 12,799,472 +1,872,936

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,277,191 contexts (840,452 MinOpts, 1,436,739 FullOpts).

MISSED contexts: base: 7,010 (0.30%), diff: 21,934 (0.95%)

Overall (+46,134,613 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 6,659,356 +1,146,429
benchmarks.run_pgo.windows.x86.checked.mch 43,839,637 +3,876,517
benchmarks.run_tiered.windows.x86.checked.mch 9,013,144 +1,137,867
coreclr_tests.run.windows.x86.checked.mch 307,202,031 +7,080,070
libraries.crossgen2.windows.x86.checked.mch 30,974,839 +3,115,370
libraries.pmi.windows.x86.checked.mch 46,306,972 +4,504,563
libraries_tests.run.windows.x86.Release.mch 179,696,217 +17,116,867
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 100,609,349 +6,823,639
realworld.run.windows.x86.checked.mch 10,518,585 +1,333,291
FullOpts (+46,134,613 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 6,659,077 +1,146,429
benchmarks.run_pgo.windows.x86.checked.mch 37,235,256 +3,876,517
benchmarks.run_tiered.windows.x86.checked.mch 4,745,806 +1,137,867
coreclr_tests.run.windows.x86.checked.mch 105,530,842 +7,080,070
libraries.crossgen2.windows.x86.checked.mch 30,973,782 +3,115,370
libraries.pmi.windows.x86.checked.mch 46,211,658 +4,504,563
libraries_tests.run.windows.x86.Release.mch 81,611,939 +17,116,867
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 91,939,641 +6,823,639
realworld.run.windows.x86.checked.mch 10,222,885 +1,333,291

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+4.02% to +18.80%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +14.08%
benchmarks.run_pgo.linux.arm64.checked.mch +17.38%
benchmarks.run_tiered.linux.arm64.checked.mch +9.81%
coreclr_tests.run.linux.arm64.checked.mch +4.02%
libraries.crossgen2.linux.arm64.checked.mch +11.45%
libraries.pmi.linux.arm64.checked.mch +10.91%
libraries_tests.run.linux.arm64.Release.mch +18.80%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.79%
realworld.run.linux.arm64.checked.mch +13.72%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.72%
FullOpts (+7.07% to +25.44%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +14.18%
benchmarks.run_pgo.linux.arm64.checked.mch +19.67%
benchmarks.run_tiered.linux.arm64.checked.mch +23.71%
coreclr_tests.run.linux.arm64.checked.mch +7.07%
libraries.crossgen2.linux.arm64.checked.mch +11.45%
libraries.pmi.linux.arm64.checked.mch +10.91%
libraries_tests.run.linux.arm64.Release.mch +25.44%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +8.02%
realworld.run.linux.arm64.checked.mch +13.84%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.72%
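
The range in each throughput header appears to be the smallest and largest per-collection PDIFF in the table beneath it. This is an inference from the numbers, not something the report states. A minimal check against the linux/arm64 Overall table above:

```python
# PDIFF values copied from the linux/arm64 "Overall" throughput table.
pdiffs = {
    "benchmarks.run": 14.08,
    "benchmarks.run_pgo": 17.38,
    "benchmarks.run_tiered": 9.81,
    "coreclr_tests.run": 4.02,
    "libraries.crossgen2": 11.45,
    "libraries.pmi": 10.91,
    "libraries_tests.run": 18.80,
    "libraries_tests_no_tiered_compilation.run": 7.79,
    "realworld.run": 13.72,
    "smoke_tests.nativeaot": 11.72,
}

lowest = min(pdiffs.values())
highest = max(pdiffs.values())
header = f"Overall (+{lowest:.2f}% to +{highest:.2f}%)"
print(header)  # Overall (+4.02% to +18.80%)
```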

Throughput diffs for linux/x64 ran on windows/x64

Overall (+4.46% to +20.19%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +14.06%
benchmarks.run_pgo.linux.x64.checked.mch +17.24%
benchmarks.run_tiered.linux.x64.checked.mch +13.24%
coreclr_tests.run.linux.x64.checked.mch +4.46%
libraries.crossgen2.linux.x64.checked.mch +11.29%
libraries.pmi.linux.x64.checked.mch +11.05%
libraries_tests.run.linux.x64.Release.mch +20.19%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +7.92%
realworld.run.linux.x64.checked.mch +13.97%
smoke_tests.nativeaot.linux.x64.checked.mch +14.22%
FullOpts (+7.69% to +25.92%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +14.13%
benchmarks.run_pgo.linux.x64.checked.mch +19.20%
benchmarks.run_tiered.linux.x64.checked.mch +23.75%
coreclr_tests.run.linux.x64.checked.mch +7.69%
libraries.crossgen2.linux.x64.checked.mch +11.29%
libraries.pmi.linux.x64.checked.mch +11.06%
libraries_tests.run.linux.x64.Release.mch +25.92%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +8.14%
realworld.run.linux.x64.checked.mch +14.06%
smoke_tests.nativeaot.linux.x64.checked.mch +14.22%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+3.55% to +21.98%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +21.98%
benchmarks.run_tiered.osx.arm64.checked.mch +13.80%
coreclr_tests.run.osx.arm64.checked.mch +3.55%
libraries.crossgen2.osx.arm64.checked.mch +11.44%
libraries.pmi.osx.arm64.checked.mch +10.96%
libraries_tests.run.osx.arm64.Release.mch +16.56%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +7.85%
realworld.run.osx.arm64.checked.mch +14.10%
FullOpts (+6.27% to +27.66%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +27.66%
benchmarks.run_tiered.osx.arm64.checked.mch +24.78%
coreclr_tests.run.osx.arm64.checked.mch +6.27%
libraries.crossgen2.osx.arm64.checked.mch +11.44%
libraries.pmi.osx.arm64.checked.mch +10.97%
libraries_tests.run.osx.arm64.Release.mch +24.91%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +8.07%
realworld.run.osx.arm64.checked.mch +14.23%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+3.62% to +21.25%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +17.71%
benchmarks.run_pgo.windows.arm64.checked.mch +21.25%
benchmarks.run_tiered.windows.arm64.checked.mch +13.75%
coreclr_tests.run.windows.arm64.checked.mch +3.62%
libraries.crossgen2.windows.arm64.checked.mch +11.35%
libraries.pmi.windows.arm64.checked.mch +10.88%
libraries_tests.run.windows.arm64.Release.mch +16.46%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +7.68%
realworld.run.windows.arm64.checked.mch +13.95%
smoke_tests.nativeaot.windows.arm64.checked.mch +12.79%
FullOpts (+6.38% to +25.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +17.71%
benchmarks.run_pgo.windows.arm64.checked.mch +24.50%
benchmarks.run_tiered.windows.arm64.checked.mch +24.41%
coreclr_tests.run.windows.arm64.checked.mch +6.38%
libraries.crossgen2.windows.arm64.checked.mch +11.35%
libraries.pmi.windows.arm64.checked.mch +10.89%
libraries_tests.run.windows.arm64.Release.mch +25.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +7.89%
realworld.run.windows.arm64.checked.mch +14.08%
smoke_tests.nativeaot.windows.arm64.checked.mch +12.79%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+3.87% to +23.68%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +17.65%
benchmarks.run_pgo.windows.x64.checked.mch +23.68%
benchmarks.run_tiered.windows.x64.checked.mch +14.47%
coreclr_tests.run.windows.x64.checked.mch +3.87%
libraries.crossgen2.windows.x64.checked.mch +11.18%
libraries.pmi.windows.x64.checked.mch +10.94%
libraries_tests.run.windows.x64.Release.mch +18.01%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +7.80%
realworld.run.windows.x64.checked.mch +13.45%
smoke_tests.nativeaot.windows.x64.checked.mch +14.30%
FullOpts (+6.64% to +27.38%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +17.65%
benchmarks.run_pgo.windows.x64.checked.mch +27.38%
benchmarks.run_tiered.windows.x64.checked.mch +23.46%
coreclr_tests.run.windows.x64.checked.mch +6.64%
libraries.crossgen2.windows.x64.checked.mch +11.18%
libraries.pmi.windows.x64.checked.mch +10.95%
libraries_tests.run.windows.x64.Release.mch +25.04%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +8.00%
realworld.run.windows.x64.checked.mch +13.53%
smoke_tests.nativeaot.windows.x64.checked.mch +14.30%

Details here


Throughput diffs for linux/arm ran on windows/x86

Overall (+4.64% to +15.62%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +11.80%
benchmarks.run_pgo.linux.arm.checked.mch +7.36%
benchmarks.run_tiered.linux.arm.checked.mch +12.58%
coreclr_tests.run.linux.arm.checked.mch +4.64%
libraries.crossgen2.linux.arm.checked.mch +10.95%
libraries.pmi.linux.arm.checked.mch +10.98%
libraries_tests.run.linux.arm.Release.mch +15.00%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +8.40%
realworld.run.linux.arm.checked.mch +15.62%
FullOpts (+7.85% to +19.46%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +11.92%
benchmarks.run_pgo.linux.arm.checked.mch +7.85%
benchmarks.run_tiered.linux.arm.checked.mch +15.46%
coreclr_tests.run.linux.arm.checked.mch +7.94%
libraries.crossgen2.linux.arm.checked.mch +10.95%
libraries.pmi.linux.arm.checked.mch +10.99%
libraries_tests.run.linux.arm.Release.mch +19.46%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +8.72%
realworld.run.linux.arm.checked.mch +15.74%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+4.73% to +17.44%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +16.01%
benchmarks.run_pgo.windows.x86.checked.mch +10.40%
benchmarks.run_tiered.windows.x86.checked.mch +17.44%
coreclr_tests.run.windows.x86.checked.mch +4.73%
libraries.crossgen2.windows.x86.checked.mch +10.14%
libraries.pmi.windows.x86.checked.mch +9.26%
libraries_tests.run.windows.x86.Release.mch +14.60%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +7.11%
realworld.run.windows.x86.checked.mch +11.56%
FullOpts (+7.21% to +21.25%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +16.01%
benchmarks.run_pgo.windows.x86.checked.mch +10.84%
benchmarks.run_tiered.windows.x86.checked.mch +21.25%
coreclr_tests.run.windows.x86.checked.mch +7.21%
libraries.crossgen2.windows.x86.checked.mch +10.14%
libraries.pmi.windows.x86.checked.mch +9.26%
libraries_tests.run.windows.x86.Release.mch +18.74%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +7.28%
realworld.run.windows.x86.checked.mch +11.62%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (+3.85% to +18.62%)
Collection PDIFF
libraries.crossgen2.linux.arm64.checked.mch +11.41%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.64%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.72%
coreclr_tests.run.linux.arm64.checked.mch +3.85%
benchmarks.run.linux.arm64.checked.mch +13.90%
realworld.run.linux.arm64.checked.mch +13.58%
libraries_tests.run.linux.arm64.Release.mch +18.62%
libraries.pmi.linux.arm64.checked.mch +10.85%
benchmarks.run_tiered.linux.arm64.checked.mch +9.78%
benchmarks.run_pgo.linux.arm64.checked.mch +17.27%
FullOpts (+6.99% to +25.14%)
Collection PDIFF
libraries.crossgen2.linux.arm64.checked.mch +11.41%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.64%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.94%
coreclr_tests.run.linux.arm64.checked.mch +6.99%
benchmarks.run.linux.arm64.checked.mch +14.01%
realworld.run.linux.arm64.checked.mch +13.70%
libraries_tests.run.linux.arm64.Release.mch +25.14%
libraries.pmi.linux.arm64.checked.mch +10.86%
benchmarks.run_tiered.linux.arm64.checked.mch +23.40%
benchmarks.run_pgo.linux.arm64.checked.mch +19.50%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+4.24% to +19.86%)
Collection PDIFF
realworld.run.linux.x64.checked.mch +13.82%
libraries.pmi.linux.x64.checked.mch +10.98%
smoke_tests.nativeaot.linux.x64.checked.mch +14.11%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +7.83%
libraries.crossgen2.linux.x64.checked.mch +11.23%
coreclr_tests.run.linux.x64.checked.mch +4.24%
benchmarks.run_tiered.linux.x64.checked.mch +13.01%
benchmarks.run_pgo.linux.x64.checked.mch +17.04%
benchmarks.run.linux.x64.checked.mch +13.87%
libraries_tests.run.linux.x64.Release.mch +19.86%
FullOpts (+7.59% to +25.54%)
Collection PDIFF
realworld.run.linux.x64.checked.mch +13.91%
libraries.pmi.linux.x64.checked.mch +10.98%
smoke_tests.nativeaot.linux.x64.checked.mch +14.11%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +8.05%
libraries.crossgen2.linux.x64.checked.mch +11.23%
coreclr_tests.run.linux.x64.checked.mch +7.59%
benchmarks.run_tiered.linux.x64.checked.mch +23.33%
benchmarks.run_pgo.linux.x64.checked.mch +18.96%
benchmarks.run.linux.x64.checked.mch +13.94%
libraries_tests.run.linux.x64.Release.mch +25.54%

Details here


@ryujit-bot

Diff results for #97517

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,496,508 contexts (1,011,240 MinOpts, 1,485,268 FullOpts).

MISSED contexts: base: 6,580 (0.26%), diff: 8,842 (0.35%)

Overall (+83,870,420 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,871,176 +1,853,104
benchmarks.run_pgo.linux.arm64.checked.mch 78,212,264 +9,478,376
benchmarks.run_tiered.linux.arm64.checked.mch 29,580,048 +1,053,584
coreclr_tests.run.linux.arm64.checked.mch 508,462,256 +8,571,516
libraries.crossgen2.linux.arm64.checked.mch 55,593,716 +4,881,684
libraries.pmi.linux.arm64.checked.mch 74,364,484 +7,112,432
libraries_tests.run.linux.arm64.Release.mch 382,265,760 +38,543,372
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 162,857,060 +10,214,900
realworld.run.linux.arm64.checked.mch 15,282,328 +1,884,512
smoke_tests.nativeaot.linux.arm64.checked.mch 2,924,168 +276,940
FullOpts (+83,870,420 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,524,920 +1,853,104
benchmarks.run_pgo.linux.arm64.checked.mch 54,230,224 +9,478,376
benchmarks.run_tiered.linux.arm64.checked.mch 4,958,828 +1,053,584
coreclr_tests.run.linux.arm64.checked.mch 159,633,064 +8,571,516
libraries.crossgen2.linux.arm64.checked.mch 55,592,080 +4,881,684
libraries.pmi.linux.arm64.checked.mch 74,244,500 +7,112,432
libraries_tests.run.linux.arm64.Release.mch 166,844,084 +38,543,372
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,377,064 +10,214,900
realworld.run.linux.arm64.checked.mch 14,708,176 +1,884,512
smoke_tests.nativeaot.linux.arm64.checked.mch 2,923,180 +276,940

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,504,130 contexts (977,766 MinOpts, 1,526,364 FullOpts).

MISSED contexts: base: 6,922 (0.28%), diff: 8,132 (0.32%)

Overall (+85,413,308 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 16,134,263 +2,068,742
benchmarks.run_pgo.linux.x64.checked.mch 69,995,881 +9,684,884
benchmarks.run_tiered.linux.x64.checked.mch 15,913,368 +900,860
coreclr_tests.run.linux.x64.checked.mch 402,389,388 +8,170,685
libraries.crossgen2.linux.x64.checked.mch 38,669,752 +4,092,082
libraries.pmi.linux.x64.checked.mch 58,849,984 +6,571,715
libraries_tests.run.linux.x64.Release.mch 339,241,076 +41,981,234
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 130,891,574 +9,478,422
realworld.run.linux.x64.checked.mch 12,750,565 +1,869,040
smoke_tests.nativeaot.linux.x64.checked.mch 4,175,394 +595,644
FullOpts (+85,413,308 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 15,870,090 +2,068,742
benchmarks.run_pgo.linux.x64.checked.mch 48,685,304 +9,684,884
benchmarks.run_tiered.linux.x64.checked.mch 3,632,243 +900,860
coreclr_tests.run.linux.x64.checked.mch 122,850,857 +8,170,685
libraries.crossgen2.linux.x64.checked.mch 38,668,550 +4,092,082
libraries.pmi.linux.x64.checked.mch 58,737,114 +6,571,715
libraries_tests.run.linux.x64.Release.mch 155,729,956 +41,981,234
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 120,233,126 +9,478,422
realworld.run.linux.x64.checked.mch 12,363,655 +1,869,040
smoke_tests.nativeaot.linux.x64.checked.mch 4,174,445 +595,644

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,228,167 contexts (927,360 MinOpts, 1,300,807 FullOpts).

MISSED contexts: base: 6,095 (0.27%), diff: 7,850 (0.35%)

Overall (+61,409,324 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,173,472 +4,525,964
benchmarks.run_tiered.osx.arm64.checked.mch 15,534,704 +904,924
coreclr_tests.run.osx.arm64.checked.mch 482,896,688 +7,075,892
libraries.crossgen2.osx.arm64.checked.mch 55,476,760 +4,869,464
libraries.pmi.osx.arm64.checked.mch 78,203,768 +7,494,284
libraries_tests.run.osx.arm64.Release.mch 310,283,060 +24,468,928
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,885,512 +10,211,364
realworld.run.osx.arm64.checked.mch 14,574,520 +1,858,504
FullOpts (+61,409,324 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 17,766,124 +4,525,964
benchmarks.run_tiered.osx.arm64.checked.mch 4,035,456 +904,924
coreclr_tests.run.osx.arm64.checked.mch 151,603,736 +7,075,892
libraries.crossgen2.osx.arm64.checked.mch 55,475,132 +4,869,464
libraries.pmi.osx.arm64.checked.mch 78,082,640 +7,494,284
libraries_tests.run.osx.arm64.Release.mch 108,963,876 +24,468,928
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,748,988 +10,211,364
realworld.run.osx.arm64.checked.mch 14,010,564 +1,858,504

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,306,322 contexts (929,692 MinOpts, 1,376,630 FullOpts).

MISSED contexts: base: 6,353 (0.27%), diff: 8,476 (0.37%)

Overall (+66,017,632 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,657,768 +1,647,456
benchmarks.run_pgo.windows.arm64.checked.mch 46,143,416 +6,908,548
benchmarks.run_tiered.windows.arm64.checked.mch 15,239,072 +895,968
coreclr_tests.run.windows.arm64.checked.mch 493,954,596 +7,397,932
libraries.crossgen2.windows.arm64.checked.mch 58,806,560 +5,102,604
libraries.pmi.windows.arm64.checked.mch 77,801,572 +7,378,152
libraries_tests.run.windows.arm64.Release.mch 307,088,556 +23,889,608
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 169,130,060 +10,455,324
realworld.run.windows.arm64.checked.mch 15,374,216 +1,926,800
smoke_tests.nativeaot.windows.arm64.checked.mch 3,929,512 +415,240
FullOpts (+66,017,632 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,657,232 +1,647,456
benchmarks.run_pgo.windows.arm64.checked.mch 30,085,000 +6,908,548
benchmarks.run_tiered.windows.arm64.checked.mch 4,066,344 +895,968
coreclr_tests.run.windows.arm64.checked.mch 155,367,424 +7,397,932
libraries.crossgen2.windows.arm64.checked.mch 58,804,924 +5,102,604
libraries.pmi.windows.arm64.checked.mch 77,681,588 +7,378,152
libraries_tests.run.windows.arm64.Release.mch 106,082,604 +23,889,608
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 155,993,520 +10,455,324
realworld.run.windows.arm64.checked.mch 14,810,236 +1,926,800
smoke_tests.nativeaot.windows.arm64.checked.mch 3,928,500 +415,240

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,478,673 contexts (976,915 MinOpts, 1,501,758 FullOpts).

MISSED contexts: base: 6,816 (0.27%), diff: 8,236 (0.33%)

Overall (+69,970,980 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 43,157,840 +5,218,990
benchmarks.run.windows.x64.checked.mch 8,527,978 +1,457,716
benchmarks.run_pgo.windows.x64.checked.mch 35,281,691 +6,239,881
benchmarks.run_tiered.windows.x64.checked.mch 12,546,330 +791,808
coreclr_tests.run.windows.x64.checked.mch 392,032,764 +6,850,273
libraries.crossgen2.windows.x64.checked.mch 39,426,266 +4,170,815
libraries.pmi.windows.x64.checked.mch 60,047,589 +6,548,076
libraries_tests.run.windows.x64.Release.mch 276,546,440 +26,574,826
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 135,656,589 +9,498,864
realworld.run.windows.x64.checked.mch 13,710,148 +1,889,086
smoke_tests.nativeaot.windows.x64.checked.mch 5,066,508 +730,645
FullOpts (+69,970,980 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 28,739,224 +5,218,990
benchmarks.run.windows.x64.checked.mch 8,527,617 +1,457,716
benchmarks.run_pgo.windows.x64.checked.mch 21,284,086 +6,239,881
benchmarks.run_tiered.windows.x64.checked.mch 3,440,481 +791,808
coreclr_tests.run.windows.x64.checked.mch 119,501,197 +6,850,273
libraries.crossgen2.windows.x64.checked.mch 39,425,077 +4,170,815
libraries.pmi.windows.x64.checked.mch 59,934,070 +6,548,076
libraries_tests.run.windows.x64.Release.mch 102,825,764 +26,574,826
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 124,850,033 +9,498,864
realworld.run.windows.x64.checked.mch 13,323,545 +1,889,086
smoke_tests.nativeaot.windows.x64.checked.mch 5,065,561 +730,645

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+4.02% to +18.80%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +14.08%
benchmarks.run_pgo.linux.arm64.checked.mch +17.39%
benchmarks.run_tiered.linux.arm64.checked.mch +9.81%
coreclr_tests.run.linux.arm64.checked.mch +4.02%
libraries.crossgen2.linux.arm64.checked.mch +11.45%
libraries.pmi.linux.arm64.checked.mch +10.91%
libraries_tests.run.linux.arm64.Release.mch +18.80%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.79%
realworld.run.linux.arm64.checked.mch +13.72%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.72%
FullOpts (+7.07% to +25.44%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +14.18%
benchmarks.run_pgo.linux.arm64.checked.mch +19.67%
benchmarks.run_tiered.linux.arm64.checked.mch +23.71%
coreclr_tests.run.linux.arm64.checked.mch +7.07%
libraries.crossgen2.linux.arm64.checked.mch +11.45%
libraries.pmi.linux.arm64.checked.mch +10.91%
libraries_tests.run.linux.arm64.Release.mch +25.44%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +8.02%
realworld.run.linux.arm64.checked.mch +13.84%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.72%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+4.46% to +20.19%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +14.05%
benchmarks.run_pgo.linux.x64.checked.mch +17.24%
benchmarks.run_tiered.linux.x64.checked.mch +13.24%
coreclr_tests.run.linux.x64.checked.mch +4.46%
libraries.crossgen2.linux.x64.checked.mch +11.29%
libraries.pmi.linux.x64.checked.mch +11.05%
libraries_tests.run.linux.x64.Release.mch +20.19%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +7.92%
realworld.run.linux.x64.checked.mch +13.97%
smoke_tests.nativeaot.linux.x64.checked.mch +14.22%
FullOpts (+7.69% to +25.92%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +14.13%
benchmarks.run_pgo.linux.x64.checked.mch +19.20%
benchmarks.run_tiered.linux.x64.checked.mch +23.74%
coreclr_tests.run.linux.x64.checked.mch +7.69%
libraries.crossgen2.linux.x64.checked.mch +11.29%
libraries.pmi.linux.x64.checked.mch +11.06%
libraries_tests.run.linux.x64.Release.mch +25.92%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +8.14%
realworld.run.linux.x64.checked.mch +14.06%
smoke_tests.nativeaot.linux.x64.checked.mch +14.22%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+3.54% to +21.98%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +21.98%
benchmarks.run_tiered.osx.arm64.checked.mch +13.80%
coreclr_tests.run.osx.arm64.checked.mch +3.54%
libraries.crossgen2.osx.arm64.checked.mch +11.44%
libraries.pmi.osx.arm64.checked.mch +10.96%
libraries_tests.run.osx.arm64.Release.mch +16.57%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +7.85%
realworld.run.osx.arm64.checked.mch +14.10%
FullOpts (+6.27% to +27.66%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch +27.66%
benchmarks.run_tiered.osx.arm64.checked.mch +24.78%
coreclr_tests.run.osx.arm64.checked.mch +6.27%
libraries.crossgen2.osx.arm64.checked.mch +11.44%
libraries.pmi.osx.arm64.checked.mch +10.97%
libraries_tests.run.osx.arm64.Release.mch +24.92%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +8.07%
realworld.run.osx.arm64.checked.mch +14.23%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+3.62% to +21.25%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +17.71%
benchmarks.run_pgo.windows.arm64.checked.mch +21.25%
benchmarks.run_tiered.windows.arm64.checked.mch +13.75%
coreclr_tests.run.windows.arm64.checked.mch +3.62%
libraries.crossgen2.windows.arm64.checked.mch +11.35%
libraries.pmi.windows.arm64.checked.mch +10.88%
libraries_tests.run.windows.arm64.Release.mch +16.46%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +7.68%
realworld.run.windows.arm64.checked.mch +13.95%
smoke_tests.nativeaot.windows.arm64.checked.mch +12.79%
FullOpts (+6.37% to +25.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +17.71%
benchmarks.run_pgo.windows.arm64.checked.mch +24.50%
benchmarks.run_tiered.windows.arm64.checked.mch +24.41%
coreclr_tests.run.windows.arm64.checked.mch +6.37%
libraries.crossgen2.windows.arm64.checked.mch +11.35%
libraries.pmi.windows.arm64.checked.mch +10.89%
libraries_tests.run.windows.arm64.Release.mch +25.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +7.89%
realworld.run.windows.arm64.checked.mch +14.08%
smoke_tests.nativeaot.windows.arm64.checked.mch +12.80%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+3.87% to +23.69%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +16.20%
benchmarks.run.windows.x64.checked.mch +17.65%
benchmarks.run_pgo.windows.x64.checked.mch +23.69%
benchmarks.run_tiered.windows.x64.checked.mch +14.47%
coreclr_tests.run.windows.x64.checked.mch +3.87%
libraries.crossgen2.windows.x64.checked.mch +11.18%
libraries.pmi.windows.x64.checked.mch +10.94%
libraries_tests.run.windows.x64.Release.mch +18.02%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +7.80%
realworld.run.windows.x64.checked.mch +13.45%
smoke_tests.nativeaot.windows.x64.checked.mch +14.30%
FullOpts (+6.63% to +27.39%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +18.08%
benchmarks.run.windows.x64.checked.mch +17.65%
benchmarks.run_pgo.windows.x64.checked.mch +27.39%
benchmarks.run_tiered.windows.x64.checked.mch +23.46%
coreclr_tests.run.windows.x64.checked.mch +6.63%
libraries.crossgen2.windows.x64.checked.mch +11.18%
libraries.pmi.windows.x64.checked.mch +10.95%
libraries_tests.run.windows.x64.Release.mch +25.04%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +8.00%
realworld.run.windows.x64.checked.mch +13.53%
smoke_tests.nativeaot.windows.x64.checked.mch +14.30%



Throughput diffs for windows/x86 ran on linux/x86

Overall (+4.73% to +17.44%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +16.01%
benchmarks.run_pgo.windows.x86.checked.mch +10.41%
benchmarks.run_tiered.windows.x86.checked.mch +17.44%
coreclr_tests.run.windows.x86.checked.mch +4.73%
libraries.crossgen2.windows.x86.checked.mch +10.14%
libraries.pmi.windows.x86.checked.mch +9.26%
libraries_tests.run.windows.x86.Release.mch +14.60%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +7.11%
realworld.run.windows.x86.checked.mch +11.56%
FullOpts (+7.21% to +21.25%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +16.01%
benchmarks.run_pgo.windows.x86.checked.mch +10.84%
benchmarks.run_tiered.windows.x86.checked.mch +21.25%
coreclr_tests.run.windows.x86.checked.mch +7.21%
libraries.crossgen2.windows.x86.checked.mch +10.14%
libraries.pmi.windows.x86.checked.mch +9.26%
libraries_tests.run.windows.x86.Release.mch +18.74%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +7.28%
realworld.run.windows.x86.checked.mch +11.62%



@ryujit-bot

Diff results for #97517

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,228,746 contexts (825,130 MinOpts, 1,403,616 FullOpts).

MISSED contexts: base: 77,529 (3.36%), diff: 79,285 (3.44%)

Overall (+53,448,948 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,030,000 +1,464,368
benchmarks.run_pgo.linux.arm.checked.mch 63,279,568 +4,101,278
benchmarks.run_tiered.linux.arm.checked.mch 17,368,546 +1,394,258
coreclr_tests.run.linux.arm.checked.mch 320,937,174 +6,505,036
libraries.crossgen2.linux.arm.checked.mch 36,614,296 +3,331,352
libraries.pmi.linux.arm.checked.mch 48,572,466 +5,001,566
libraries_tests.run.linux.arm.Release.mch 243,987,636 +23,325,954
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,237,428 +6,452,200
realworld.run.linux.arm.checked.mch 13,249,158 +1,872,936
FullOpts (+53,448,948 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,646,534 +1,464,368
benchmarks.run_pgo.linux.arm.checked.mch 51,637,824 +4,101,278
benchmarks.run_tiered.linux.arm.checked.mch 10,176,318 +1,394,258
coreclr_tests.run.linux.arm.checked.mch 108,295,518 +6,505,036
libraries.crossgen2.linux.arm.checked.mch 36,613,066 +3,331,352
libraries.pmi.linux.arm.checked.mch 48,465,962 +5,001,566
libraries_tests.run.linux.arm.Release.mch 121,696,850 +23,325,954
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,153,626 +6,452,200
realworld.run.linux.arm.checked.mch 12,799,472 +1,872,936

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,277,191 contexts (840,452 MinOpts, 1,436,739 FullOpts).

MISSED contexts: base: 7,010 (0.30%), diff: 21,934 (0.95%)

Overall (+46,134,613 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 6,659,356 +1,146,429
benchmarks.run_pgo.windows.x86.checked.mch 43,839,637 +3,876,517
benchmarks.run_tiered.windows.x86.checked.mch 9,013,144 +1,137,867
coreclr_tests.run.windows.x86.checked.mch 307,202,031 +7,080,070
libraries.crossgen2.windows.x86.checked.mch 30,974,839 +3,115,370
libraries.pmi.windows.x86.checked.mch 46,306,972 +4,504,563
libraries_tests.run.windows.x86.Release.mch 179,696,217 +17,116,867
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 100,609,349 +6,823,639
realworld.run.windows.x86.checked.mch 10,518,585 +1,333,291
FullOpts (+46,134,613 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 6,659,077 +1,146,429
benchmarks.run_pgo.windows.x86.checked.mch 37,235,256 +3,876,517
benchmarks.run_tiered.windows.x86.checked.mch 4,745,806 +1,137,867
coreclr_tests.run.windows.x86.checked.mch 105,530,842 +7,080,070
libraries.crossgen2.windows.x86.checked.mch 30,973,782 +3,115,370
libraries.pmi.windows.x86.checked.mch 46,211,658 +4,504,563
libraries_tests.run.windows.x86.Release.mch 81,611,939 +17,116,867
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 91,939,641 +6,823,639
realworld.run.windows.x86.checked.mch 10,222,885 +1,333,291



Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (+4.64% to +15.60%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +11.80%
benchmarks.run_pgo.linux.arm.checked.mch +7.36%
benchmarks.run_tiered.linux.arm.checked.mch +12.58%
coreclr_tests.run.linux.arm.checked.mch +4.64%
libraries.crossgen2.linux.arm.checked.mch +10.95%
libraries.pmi.linux.arm.checked.mch +10.98%
libraries_tests.run.linux.arm.Release.mch +15.01%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +8.40%
realworld.run.linux.arm.checked.mch +15.60%
FullOpts (+7.85% to +19.46%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +11.92%
benchmarks.run_pgo.linux.arm.checked.mch +7.85%
benchmarks.run_tiered.linux.arm.checked.mch +15.46%
coreclr_tests.run.linux.arm.checked.mch +7.93%
libraries.crossgen2.linux.arm.checked.mch +10.95%
libraries.pmi.linux.arm.checked.mch +10.99%
libraries_tests.run.linux.arm.Release.mch +19.46%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +8.72%
realworld.run.linux.arm.checked.mch +15.73%



Throughput diffs for linux/arm64 ran on linux/x64

Overall (+3.85% to +18.62%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch +17.27%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.64%
libraries.pmi.linux.arm64.checked.mch +10.85%
benchmarks.run.linux.arm64.checked.mch +13.90%
libraries.crossgen2.linux.arm64.checked.mch +11.41%
coreclr_tests.run.linux.arm64.checked.mch +3.85%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.72%
realworld.run.linux.arm64.checked.mch +13.58%
libraries_tests.run.linux.arm64.Release.mch +18.62%
benchmarks.run_tiered.linux.arm64.checked.mch +9.78%
FullOpts (+6.99% to +25.14%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch +19.50%
smoke_tests.nativeaot.linux.arm64.checked.mch +11.64%
libraries.pmi.linux.arm64.checked.mch +10.86%
benchmarks.run.linux.arm64.checked.mch +14.01%
libraries.crossgen2.linux.arm64.checked.mch +11.41%
coreclr_tests.run.linux.arm64.checked.mch +6.99%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +7.94%
realworld.run.linux.arm64.checked.mch +13.70%
libraries_tests.run.linux.arm64.Release.mch +25.14%
benchmarks.run_tiered.linux.arm64.checked.mch +23.40%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+4.24% to +19.87%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch +11.23%
libraries.pmi.linux.x64.checked.mch +10.98%
smoke_tests.nativeaot.linux.x64.checked.mch +14.11%
benchmarks.run_pgo.linux.x64.checked.mch +17.04%
coreclr_tests.run.linux.x64.checked.mch +4.24%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +7.83%
benchmarks.run_tiered.linux.x64.checked.mch +13.01%
benchmarks.run.linux.x64.checked.mch +13.87%
libraries_tests.run.linux.x64.Release.mch +19.87%
realworld.run.linux.x64.checked.mch +13.82%
FullOpts (+7.59% to +25.55%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch +11.23%
libraries.pmi.linux.x64.checked.mch +10.98%
smoke_tests.nativeaot.linux.x64.checked.mch +14.11%
benchmarks.run_pgo.linux.x64.checked.mch +18.96%
coreclr_tests.run.linux.x64.checked.mch +7.59%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +8.05%
benchmarks.run_tiered.linux.x64.checked.mch +23.32%
benchmarks.run.linux.x64.checked.mch +13.94%
libraries_tests.run.linux.x64.Release.mch +25.55%
realworld.run.linux.x64.checked.mch +13.91%



@jakobbotsch
Member Author

jakobbotsch commented Jan 29, 2024

Hitting an assert on x86 with JitStressRegs=0x80. We have the following IR after LSRA:

------------ BB96 [0297] [00C..03A) -> BB92,BB03,BB14,BB17,BB11,BB20,BB91,BB91,BB93,BB40 (switch), preds={BB02} succs={BB03,BB11,BB14,BB17,BB20,BB40,BB91,BB92,BB93}
N071 (???,???) [002516] ----------Z                 t2516 =    LCL_VAR   int    V04 loc0          edx REG edx
N073 (???,???) [002517] -----------                 t2517 =    JMPTABLE  int    REG eax
N001 (  1,  1) [002789] ----------z                 u2789 =    LCL_VAR   int    V04 loc0          edi REG edi
                                                            ┌──▌  t2516  int    
                                                            ├──▌  t2517  int    
N075 (???,???) [002518] -----------                           SWITCH_TABLE void   REG NA

[002789] seems to be inserted for resolution. When codegen consumes [002516] as part of generating the SWITCH_TABLE we hit the assert in the if here:

// If this is a write-thru or a single-def variable, we don't actually spill at a use,
// but we will kill the var in the reg (below).
if (!varDsc->IsAlwaysAliveInMemory())
{
    instruction storeIns = ins_Store(lclType, compiler->isSIMDTypeLocalAligned(varNum));
    assert(varDsc->GetRegNum() == tree->GetRegNum());
    inst_TT_RV(storeIns, size, tree, tree->GetRegNum());
}

@kunalspathak any idea? Attached the jitdump.
out.txt

@kunalspathak
Contributor

kunalspathak commented Jan 30, 2024

TLDR: This seems to be an existing bug in the placement of the resolution move for a local that is an operand of a switch.

Details:

At the end of BB96, V04 is marked as inactive and hence [002516] is marked with GTF_SPILL. During resolution, we see that most of the successors require it to be present in edi, so we add a resolution node [002789] for edi <- stk:

BB96 bottom: move V04 from STK to edi (SharedCritical)
N001 (  1,  1) [002789] ----------z                 u2789 =    LCL_VAR   int    V04 loc0          edi REG edi

This gets added before the GT_SWITCH node:

// Put the copy at the bottom
GenTree* lastNode = blockRange.LastNode();
if (block->KindIs(BBJ_COND, BBJ_SWITCH))
{
    noway_assert(!blockRange.IsEmpty());
    GenTree* branch = lastNode;
    assert(branch->OperIsConditionalJump() || branch->OperGet() == GT_SWITCH_TABLE ||
           branch->OperGet() == GT_SWITCH);
    blockRange.InsertBefore(branch, std::move(treeRange));
}

During codegen, V04 is present in edx throughout BB96, while [002789] needs it to be in edi. When we consume the registers for that node, we first emit a move from V04's current register to the node's register, i.e. edi <- edx. Then we unspill from the stack (which is the reason [002789] was added in the first place) into edi and record that V04 lives in edi going forward. This gives us:

Generating: N001 (  1,  1) [002789] ----------z                 u2789 =    LCL_VAR   int    V04 loc0          edi REG edi
IN000b:        mov      edi, edx <-- can be eliminated??
IN000c:        mov      edi, dword ptr [V04 ebp-0x10]

(I think the first mov instruction can be eliminated if we check here that it will get overwritten anyway by the unspill. But that's a separate topic.)

Next, when we generate code for [002518], we consume [002516] and again generate a move from V04's register to the register associated with [002516], i.e. edx <- edi. When we finally go to spill this tree, we see the mismatch between the register associated with V04 (edi) and the one associated with [002516] (edx).

                                                                        ┌──▌  t2516  int    
                                                                        ├──▌  t2517  int    
Generating: N075 (???,???) [002518] -----------                         ▌  SWITCH_TABLE void   REG NA
IN000d:        mov      edx, edi


IMO, the resolution should happen after the last use of V04 in block BB96. The last use is [002516] after which it gets spilled to the stack. With the resolution, it will then get unspilled into edi. Until then, both V04 and [002516] will be in edx. So the code sequence will be:

mov      dword ptr [ebp-0x10], edx   ; spill of [002516]
mov      edi, dword ptr [ebp-0x10]   ; resolution unspill of [002789]
// jump table code

I will come up with a fix.
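
The sequence described above can be replayed with a small model. This is only a sketch under simplifying assumptions: `Loc`, `Use`, `Result` and `simulate` are hypothetical names, not JIT types, and the model tracks nothing but where V04 currently lives. It mirrors the three steps from the explanation: copy from the local's current register into the node's register, optionally unspill into the node's register, and optionally spill after the use, where the spill requires the local's home register to match the node's register.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified locations for the single local V04.
enum class Loc { Edx, Edi, Stack };

static const char* name(Loc l)
{
    switch (l)
    {
        case Loc::Edx: return "edx";
        case Loc::Edi: return "edi";
        default:       return "[V04]";
    }
}

// One use of V04, in the order codegen consumes it.
struct Use
{
    Loc  reg;     // register assigned to this use node
    bool unspill; // reload from the stack into 'reg' (resolution move)
    bool spill;   // store to the stack after the use
};

struct Result
{
    bool                     ok;   // false if the spill-time register check fails
    std::vector<std::string> code; // instructions emitted so far
};

Result simulate(Loc home, const std::vector<Use>& uses)
{
    Result r{true, {}};
    for (const Use& u : uses)
    {
        // Copy into the node's register if the local lives in another register.
        // Note that this does not change where the local is considered to live.
        if (home != Loc::Stack && home != u.reg)
            r.code.push_back(std::string("mov ") + name(u.reg) + ", " + name(home));

        if (u.unspill)
        {
            r.code.push_back(std::string("mov ") + name(u.reg) + ", [V04]");
            home = u.reg; // the local now lives in the node's register
        }

        if (u.spill)
        {
            // Stand-in for assert(varDsc->GetRegNum() == tree->GetRegNum()).
            if (home != u.reg)
            {
                r.ok = false;
                return r;
            }
            r.code.push_back(std::string("mov [V04], ") + name(u.reg));
            home = Loc::Stack;
        }
    }
    return r;
}
```

Starting with V04 in edx, `simulate(Loc::Edx, {{Loc::Edi, true, false}, {Loc::Edx, false, true}})` models the current order (resolution node first, then the switch operand) and reports `ok == false` after emitting the same three moves as in the dump. Swapping the two uses models the proposed order and yields just the spill followed by the unspill.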

@jakobbotsch
Member Author

jakobbotsch commented Jan 30, 2024

the resolution should happen after the last use of V04 in block BB96. The last use is [002516] after which it gets spilled to the stack.

I guess the problem is we can't really make that happen, since the relative ordering of [002516] and [002789] does not matter -- [002516] is being handled as part of codegen of the SWITCH_TABLE, so only the ordering of [002789] and the SWITCH_TABLE matter (and they cannot be reordered).

One thing that might be insightful is to figure out how the same situation gets handled for BBJ_COND on arm64. On arm64 we have many BBJ_COND blocks with terminator statements that have locals in them due to the "branch if zero" instruction that exists there, so it seems likely the same situation occurs there without hitting the assert.

Initially I thought the handling was happening here:

// Next, if this blocks ends with a JCMP/JTEST/JTRUE, we have to make sure:
// 1. Not to copy into the register that JCMP/JTEST/JTRUE uses
//    e.g. JCMP w21, BRANCH
// 2. Not to copy into the source of JCMP's operand before it is consumed
//    e.g. Should not use w0 since it will contain wrong value after resolution
//          call METHOD
//          ; mov w0, w19  <-- should not resolve in w0 here.
//          mov w21, w0
//          JCMP w21, BRANCH
// 3. Not to modify the local variable it must consume
// Note: GT_COPY has special handling in codegen and its generation is merged with the
// node that consumes its result. So both, the input and output regs of GT_COPY must be
// excluded from the set available for resolution.
else if (block->KindIs(BBJ_COND))

The BBJ_SWITCH case above misses some of the handling around locals that the BBJ_COND case has. However, when I tried duplicating the logic for BBJ_SWITCH it didn't fix the problem.
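
To make the constraint in that comment concrete, here is a rough sketch of the register exclusion it describes. It is not the LSRA code: `RegMask`, `regBit`, `TerminatorOperand` and `resolutionCandidates` are hypothetical names chosen for illustration, and registers are modeled as plain bit indices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register set: bit i set means register i is in the set.
using RegMask = std::uint32_t;

constexpr RegMask regBit(unsigned regNum)
{
    return RegMask{1} << regNum;
}

// An operand consumed by the block's terminator (e.g. a JCMP or a switch).
struct TerminatorOperand
{
    unsigned reg;        // register the terminator reads the operand from
    bool     isCopy;     // operand is produced by a copy merged into the terminator
    unsigned copySrcReg; // source register of that copy (only meaningful if isCopy)
};

// Registers that a resolution move placed before the terminator may write:
// everything except registers the terminator still has to read, including
// both the input and the output register of a merged copy.
RegMask resolutionCandidates(RegMask allRegs, const std::vector<TerminatorOperand>& operands)
{
    RegMask excluded = 0;
    for (const TerminatorOperand& op : operands)
    {
        excluded |= regBit(op.reg);
        if (op.isCopy)
            excluded |= regBit(op.copySrcReg);
    }
    return allRegs & ~excluded;
}
```

With eight registers and a terminator that reads register 2 directly and register 0 via a copy from register 3, `resolutionCandidates(0xFF, {{2, false, 0}, {0, true, 3}})` leaves `0xF2`, so registers 0, 2 and 3 are off limits for the resolution move.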

@kunalspathak
Contributor

so only the ordering of [002789] and the SWITCH_TABLE matter (and they cannot be reordered).

Exactly.

The BBJ_SWITCH case above misses some of the handling around locals that the BBJ_COND case has.

Yes, I was wondering about that too last night. I tried a prototype and that seems to solve the problem. I have sent out #97713 to see the results of superpmi-replay.

@ghost ghost closed this Feb 29, 2024
@ghost

ghost commented Feb 29, 2024

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 31, 2024
This pull request was closed.