-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Version Used: NET5
Not a hard bug - the suboptimal code is still logical correct.
Steps to Reproduce:
Compiler settings
<OutputType>Exe</OutputType>
<TargetFramework>net5</TargetFramework>
<TieredCompilation>false</TieredCompilation>
<Optimize>true</Optimize>
Assume following example (stripped down from real source code) which should load a struct with a kind of literal value.
In the Test() method, three variables x, y, z are initialized with 1.0, 2.0 and 1.0 again.
public struct MyStruct
{
public ulong A0, A1, A2, A3;
public ulong B0, B1, B2, B3;
}
public sealed class TestCase
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void LoadInline(out MyStruct result, ulong value) {
result = default;
result.A0 = value;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void LoadInline(out MyStruct result, double value) {
// Branch is JIT inline removable, when value is a compile time constant
if (value == 1.0) {
LoadInline(out result, 1);
return;
}
// Branch is JIT inline removable, when value is a compile time constant
if (value == 2.0) {
LoadInline(out result, 2);
return;
}
// Use default Value
LoadInline(out result, 0);
// Strange behavior, when code is active
// ulong ul = Unsafe.As<double, ulong>(ref value); // <-- LINE A
// result.A1 = ul; // <-- LINE B
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void SomeUsage(in MyStruct result) {
;
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Test() {
LoadInline(out var x, 1.0);
LoadInline(out var y, 2.0);
LoadInline(out var z, 1.0);
SomeUsage(x);
SomeUsage(y);
SomeUsage(z);
}
}
Case 1: When you compile to a optimized release build following optimal code is generated for the three LoadInline() calls.
[SkipLocalsInit is NOT used, so at the begin of the Test() method local memory is already initialized]
LoadInline(out var x, 1.0);
00007FFB079BAA0B mov dword ptr [rsp+0A8h],1
LoadInline(out var y, 2.0);
00007FFB079BAA16 mov dword ptr [rsp+68h],2
LoadInline(out var z, 1.0);
00007FFB079BAA1E mov dword ptr [rsp+28h],1
Case 2: When LINE A and LINE B are both activated, following sub optimal (but working) code is produced.
The second and third call perform a redundant zero out of the struct. Interestingly the first call is still optimal. Observe the fact that the method arguments in call 1 and call 3 are the same, but produce different code.
LoadInline(out var x, 1.0);
00007FFB079CAA0E mov qword ptr [rsp+0A8h],1
LoadInline(out var y, 2.0);
00007FFB079CAA1A vxorps xmm0,xmm0,xmm0
00007FFB079CAA1E vmovdqu xmmword ptr [rsp+68h],xmm0
00007FFB079CAA24 vmovdqu xmmword ptr [rsp+78h],xmm0
00007FFB079CAA2A vmovdqu xmmword ptr [rsp+88h],xmm0
00007FFB079CAA33 vmovdqu xmmword ptr [rsp+98h],xmm0
00007FFB079CAA3C mov qword ptr [rsp+68h],2
LoadInline(out var z, 1.0);
00007FFB079CAA45 vxorps xmm0,xmm0,xmm0
00007FFB079CAA49 vmovdqu xmmword ptr [rsp+28h],xmm0
00007FFB079CAA4F vmovdqu xmmword ptr [rsp+38h],xmm0
00007FFB079CAA55 vmovdqu xmmword ptr [rsp+48h],xmm0
00007FFB079CAA5B vmovdqu xmmword ptr [rsp+58h],xmm0
00007FFB079CAA61 mov qword ptr [rsp+28h],1
Case 3: Even more strange - but not my real world use case - the optimizer gets totaly confused, when only LINE A is active and LINE B is commented out. The code is still working, but obviously the JIT constant branch removal has totaly failed.
LoadInline(out var x, 1.0);
00007FFB079EAA0E vmovsd xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0318h (07FFB079EACC8h)]
00007FFB079EAA16 vmovsd qword ptr [rsp+30h],xmm0
00007FFB079EAA1C vmovsd xmm0,qword ptr [rsp+30h]
00007FFB079EAA22 vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0328h (07FFB079EACD8h)]
00007FFB079EAA2A jp InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)
00007FFB079EAA2C jne InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)
00007FFB079EAA2E vxorps xmm0,xmm0,xmm0
00007FFB079EAA32 vmovdqu xmmword ptr [rsp+0B8h],xmm0
00007FFB079EAA3B vmovdqu xmmword ptr [rsp+0C8h],xmm0
00007FFB079EAA44 vmovdqu xmmword ptr [rsp+0D8h],xmm0
00007FFB079EAA4D vmovdqu xmmword ptr [rsp+0E8h],xmm0
00007FFB079EAA56 mov qword ptr [rsp+0B8h],1
00007FFB079EAA62 jmp InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)
00007FFB079EAA67 vmovsd xmm0,qword ptr [rsp+30h]
00007FFB079EAA6D vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0338h (07FFB079EACE8h)]
00007FFB079EAA75 jp InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)
00007FFB079EAA77 jne InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)
00007FFB079EAA79 vxorps xmm0,xmm0,xmm0
00007FFB079EAA7D vmovdqu xmmword ptr [rsp+0B8h],xmm0
00007FFB079EAA86 vmovdqu xmmword ptr [rsp+0C8h],xmm0
00007FFB079EAA8F vmovdqu xmmword ptr [rsp+0D8h],xmm0
00007FFB079EAA98 vmovdqu xmmword ptr [rsp+0E8h],xmm0
00007FFB079EAAA1 mov qword ptr [rsp+0B8h],2
00007FFB079EAAAD jmp InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)
00007FFB079EAAAF vxorps xmm0,xmm0,xmm0
00007FFB079EAAB3 vmovdqu xmmword ptr [rsp+0B8h],xmm0
00007FFB079EAABC vmovdqu xmmword ptr [rsp+0C8h],xmm0
00007FFB079EAAC5 vmovdqu xmmword ptr [rsp+0D8h],xmm0
00007FFB079EAACE vmovdqu xmmword ptr [rsp+0E8h],xmm0
00007FFB079EAAD7 xor ecx,ecx
00007FFB079EAAD9 mov qword ptr [rsp+0B8h],rcx
LoadInline(out var y, 2.0);
00007FFB079EAAE1 vmovsd xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0348h (07FFB079EACF8h)]
00007FFB079EAAE9 vmovsd qword ptr [rsp+28h],xmm0
00007FFB079EAAEF vmovsd xmm0,qword ptr [rsp+28h]
00007FFB079EAAF5 vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0358h (07FFB079EAD08h)]
00007FFB079EAAFD jp InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)
00007FFB079EAAFF jne InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)
00007FFB079EAB01 vxorps xmm0,xmm0,xmm0
00007FFB079EAB05 vmovdqu xmmword ptr [rsp+78h],xmm0
00007FFB079EAB0B vmovdqu xmmword ptr [rsp+88h],xmm0
00007FFB079EAB14 vmovdqu xmmword ptr [rsp+98h],xmm0
00007FFB079EAB1D vmovdqu xmmword ptr [rsp+0A8h],xmm0
00007FFB079EAB26 mov qword ptr [rsp+78h],1
00007FFB079EAB2F jmp InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)
00007FFB079EAB31 vmovsd xmm0,qword ptr [rsp+28h]
00007FFB079EAB37 vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0368h (07FFB079EAD18h)]
00007FFB079EAB3F jp InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)
00007FFB079EAB41 jne InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)
00007FFB079EAB43 vxorps xmm0,xmm0,xmm0
00007FFB079EAB47 vmovdqu xmmword ptr [rsp+78h],xmm0
00007FFB079EAB4D vmovdqu xmmword ptr [rsp+88h],xmm0
00007FFB079EAB56 vmovdqu xmmword ptr [rsp+98h],xmm0
00007FFB079EAB5F vmovdqu xmmword ptr [rsp+0A8h],xmm0
00007FFB079EAB68 mov qword ptr [rsp+78h],2
00007FFB079EAB71 jmp InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)
00007FFB079EAB73 vxorps xmm0,xmm0,xmm0
00007FFB079EAB77 vmovdqu xmmword ptr [rsp+78h],xmm0
00007FFB079EAB7D vmovdqu xmmword ptr [rsp+88h],xmm0
00007FFB079EAB86 vmovdqu xmmword ptr [rsp+98h],xmm0
00007FFB079EAB8F vmovdqu xmmword ptr [rsp+0A8h],xmm0
00007FFB079EAB98 xor ecx,ecx
00007FFB079EAB9A mov qword ptr [rsp+78h],rcx
LoadInline(out var z, 1.0);
00007FFB079EAB9F vmovsd xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0378h (07FFB079EAD28h)]
00007FFB079EABA7 vmovsd qword ptr [rsp+20h],xmm0
00007FFB079EABAD vmovsd xmm0,qword ptr [rsp+20h]
00007FFB079EABB3 vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0388h (07FFB079EAD38h)]
00007FFB079EABBB jp InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)
00007FFB079EABBD jne InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)
00007FFB079EABBF vxorps xmm0,xmm0,xmm0
00007FFB079EABC3 vmovdqu xmmword ptr [rsp+38h],xmm0
00007FFB079EABC9 vmovdqu xmmword ptr [rsp+48h],xmm0
00007FFB079EABCF vmovdqu xmmword ptr [rsp+58h],xmm0
00007FFB079EABD5 vmovdqu xmmword ptr [rsp+68h],xmm0
00007FFB079EABDB mov qword ptr [rsp+38h],1
00007FFB079EABE4 jmp InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)
00007FFB079EABE6 vmovsd xmm0,qword ptr [rsp+20h]
00007FFB079EABEC vucomisd xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0398h (07FFB079EAD48h)]
00007FFB079EABF4 jp InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)
00007FFB079EABF6 jne InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)
00007FFB079EABF8 vxorps xmm0,xmm0,xmm0
00007FFB079EABFC vmovdqu xmmword ptr [rsp+38h],xmm0
00007FFB079EAC02 vmovdqu xmmword ptr [rsp+48h],xmm0
00007FFB079EAC08 vmovdqu xmmword ptr [rsp+58h],xmm0
00007FFB079EAC0E vmovdqu xmmword ptr [rsp+68h],xmm0
00007FFB079EAC14 mov qword ptr [rsp+38h],2
00007FFB079EAC1D jmp InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)
00007FFB079EAC1F vxorps xmm0,xmm0,xmm0
00007FFB079EAC23 vmovdqu xmmword ptr [rsp+38h],xmm0
00007FFB079EAC29 vmovdqu xmmword ptr [rsp+48h],xmm0
00007FFB079EAC2F vmovdqu xmmword ptr [rsp+58h],xmm0
00007FFB079EAC35 vmovdqu xmmword ptr [rsp+68h],xmm0
00007FFB079EAC3B xor ecx,ecx
00007FFB079EAC3D mov qword ptr [rsp+38h],rcx
Expected Behavior:
All three variants should produce the same optimal code.
LoadInline(out var x, 1.0);
00007FFB079BAA0B mov dword ptr [rsp+0A8h],1
LoadInline(out var y, 2.0);
00007FFB079BAA16 mov dword ptr [rsp+68h],2
LoadInline(out var z, 1.0);
00007FFB079BAA1E mov dword ptr [rsp+28h],1
Actual Behavior:
Case 1: The optimal code :-)
Case 2: For large enough structs the code and runtime behavior is very bad and seems unsymetric between the same method calls.
Case 3: This is not a real use case because a senseless Unsafe cast is made without any usage. But no warning is raised, when you accidentally compile such a source code.
Off Topic:
It would be great - especially for my use case - when a method argument could be restricted to a compile time constant.
category:cq
theme:inlining
skill-level:intermediate
cost:medium
impact:small