Skip to content

Unsafe cast confuses JIT Inline Optimizer #47082

@gerhard17

Description

@gerhard17

Version Used: NET5

Not a hard bug - the suboptimal code is still logical correct.

Steps to Reproduce:

Compiler settings

  <OutputType>Exe</OutputType>
  <TargetFramework>net5</TargetFramework>
  <TieredCompilation>false</TieredCompilation>
  <Optimize>true</Optimize>

Assume following example (stripped down from real source code) which should load a struct with a kind of literal value.
In the Test() method, three variables x, y, z are initialized with 1.0, 2.0 and 1.0 again.

public struct MyStruct
{
	public ulong A0, A1, A2, A3;
	public ulong B0, B1, B2, B3;
}

public sealed class TestCase
{
	[MethodImpl(MethodImplOptions.AggressiveInlining)]
	public static void LoadInline(out MyStruct result, ulong value) {
		result = default;
		result.A0 = value;
	}

	[MethodImpl(MethodImplOptions.AggressiveInlining)]
	public static void LoadInline(out MyStruct result, double value) {
		// Branch is JIT inline removable, when value is a compile time constant
		if (value == 1.0) {
			LoadInline(out result, 1);
			return;
		}
		// Branch is JIT inline removable, when value is a compile time constant
		if (value == 2.0) {
			LoadInline(out result, 2);
			return;
		}
		// Use default Value
		LoadInline(out result, 0);
		// Strange behavior, when code is active
		// ulong ul = Unsafe.As<double, ulong>(ref value); // <-- LINE A
		// result.A1 = ul;                                 // <-- LINE B
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void SomeUsage(in MyStruct result) {
		;
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public static void Test() {
		LoadInline(out var x, 1.0);
		LoadInline(out var y, 2.0);
		LoadInline(out var z, 1.0);
		SomeUsage(x);
		SomeUsage(y);
		SomeUsage(z);
	}
}

Case 1: When you compile to a optimized release build following optimal code is generated for the three LoadInline() calls.
[SkipLocalsInit is NOT used, so at the begin of the Test() method local memory is already initialized]

			LoadInline(out var x, 1.0);
00007FFB079BAA0B  mov         dword ptr [rsp+0A8h],1  
			LoadInline(out var y, 2.0);
00007FFB079BAA16  mov         dword ptr [rsp+68h],2  
			LoadInline(out var z, 1.0);
00007FFB079BAA1E  mov         dword ptr [rsp+28h],1  

Case 2: When LINE A and LINE B are both activated, following sub optimal (but working) code is produced.
The second and third call perform a redundant zero out of the struct. Interestingly the first call is still optimal. Observe the fact that the method arguments in call 1 and call 3 are the same, but produce different code.

		LoadInline(out var x, 1.0);
00007FFB079CAA0E  mov         qword ptr [rsp+0A8h],1  
		LoadInline(out var y, 2.0);
00007FFB079CAA1A  vxorps      xmm0,xmm0,xmm0  
00007FFB079CAA1E  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079CAA24  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079CAA2A  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079CAA33  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079CAA3C  mov         qword ptr [rsp+68h],2  
		LoadInline(out var z, 1.0);
00007FFB079CAA45  vxorps      xmm0,xmm0,xmm0  
00007FFB079CAA49  vmovdqu     xmmword ptr [rsp+28h],xmm0  
00007FFB079CAA4F  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079CAA55  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079CAA5B  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079CAA61  mov         qword ptr [rsp+28h],1  

Case 3: Even more strange - but not my real world use case - the optimizer gets totaly confused, when only LINE A is active and LINE B is commented out. The code is still working, but obviously the JIT constant branch removal has totaly failed.

		LoadInline(out var x, 1.0);
00007FFB079EAA0E  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0318h (07FFB079EACC8h)]  
00007FFB079EAA16  vmovsd      qword ptr [rsp+30h],xmm0  
00007FFB079EAA1C  vmovsd      xmm0,qword ptr [rsp+30h]  
00007FFB079EAA22  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0328h (07FFB079EACD8h)]  
00007FFB079EAA2A  jp          InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)  
00007FFB079EAA2C  jne         InlineStructInitializeBug.TestCase.Test()+0B7h (07FFB079EAA67h)  
00007FFB079EAA2E  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAA32  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAA3B  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAA44  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAA4D  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAA56  mov         qword ptr [rsp+0B8h],1  
00007FFB079EAA62  jmp         InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)  
00007FFB079EAA67  vmovsd      xmm0,qword ptr [rsp+30h]  
00007FFB079EAA6D  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0338h (07FFB079EACE8h)]  
00007FFB079EAA75  jp          InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)  
00007FFB079EAA77  jne         InlineStructInitializeBug.TestCase.Test()+0FFh (07FFB079EAAAFh)  
00007FFB079EAA79  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAA7D  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAA86  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAA8F  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAA98  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAAA1  mov         qword ptr [rsp+0B8h],2  
00007FFB079EAAAD  jmp         InlineStructInitializeBug.TestCase.Test()+0131h (07FFB079EAAE1h)  
00007FFB079EAAAF  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAAB3  vmovdqu     xmmword ptr [rsp+0B8h],xmm0  
00007FFB079EAABC  vmovdqu     xmmword ptr [rsp+0C8h],xmm0  
00007FFB079EAAC5  vmovdqu     xmmword ptr [rsp+0D8h],xmm0  
00007FFB079EAACE  vmovdqu     xmmword ptr [rsp+0E8h],xmm0  
00007FFB079EAAD7  xor         ecx,ecx  
00007FFB079EAAD9  mov         qword ptr [rsp+0B8h],rcx  
		LoadInline(out var y, 2.0);
00007FFB079EAAE1  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0348h (07FFB079EACF8h)]  
00007FFB079EAAE9  vmovsd      qword ptr [rsp+28h],xmm0  
00007FFB079EAAEF  vmovsd      xmm0,qword ptr [rsp+28h]  
00007FFB079EAAF5  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0358h (07FFB079EAD08h)]  
00007FFB079EAAFD  jp          InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)  
00007FFB079EAAFF  jne         InlineStructInitializeBug.TestCase.Test()+0181h (07FFB079EAB31h)  
00007FFB079EAB01  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB05  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB0B  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB14  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB1D  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB26  mov         qword ptr [rsp+78h],1  
00007FFB079EAB2F  jmp         InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)  
00007FFB079EAB31  vmovsd      xmm0,qword ptr [rsp+28h]  
00007FFB079EAB37  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0368h (07FFB079EAD18h)]  
00007FFB079EAB3F  jp          InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)  
00007FFB079EAB41  jne         InlineStructInitializeBug.TestCase.Test()+01C3h (07FFB079EAB73h)  
00007FFB079EAB43  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB47  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB4D  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB56  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB5F  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB68  mov         qword ptr [rsp+78h],2  
00007FFB079EAB71  jmp         InlineStructInitializeBug.TestCase.Test()+01EFh (07FFB079EAB9Fh)  
00007FFB079EAB73  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAB77  vmovdqu     xmmword ptr [rsp+78h],xmm0  
00007FFB079EAB7D  vmovdqu     xmmword ptr [rsp+88h],xmm0  
00007FFB079EAB86  vmovdqu     xmmword ptr [rsp+98h],xmm0  
00007FFB079EAB8F  vmovdqu     xmmword ptr [rsp+0A8h],xmm0  
00007FFB079EAB98  xor         ecx,ecx  
00007FFB079EAB9A  mov         qword ptr [rsp+78h],rcx  
		LoadInline(out var z, 1.0);
00007FFB079EAB9F  vmovsd      xmm0,qword ptr [InlineStructInitializeBug.TestCase.Test()+0378h (07FFB079EAD28h)]  
00007FFB079EABA7  vmovsd      qword ptr [rsp+20h],xmm0  
00007FFB079EABAD  vmovsd      xmm0,qword ptr [rsp+20h]  
00007FFB079EABB3  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0388h (07FFB079EAD38h)]  
00007FFB079EABBB  jp          InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)  
00007FFB079EABBD  jne         InlineStructInitializeBug.TestCase.Test()+0236h (07FFB079EABE6h)  
00007FFB079EABBF  vxorps      xmm0,xmm0,xmm0  
00007FFB079EABC3  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EABC9  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EABCF  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EABD5  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EABDB  mov         qword ptr [rsp+38h],1  
00007FFB079EABE4  jmp         InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)  
00007FFB079EABE6  vmovsd      xmm0,qword ptr [rsp+20h]  
00007FFB079EABEC  vucomisd    xmm0,mmword ptr [InlineStructInitializeBug.TestCase.Test()+0398h (07FFB079EAD48h)]  
00007FFB079EABF4  jp          InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)  
00007FFB079EABF6  jne         InlineStructInitializeBug.TestCase.Test()+026Fh (07FFB079EAC1Fh)  
00007FFB079EABF8  vxorps      xmm0,xmm0,xmm0  
00007FFB079EABFC  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EAC02  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EAC08  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EAC0E  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EAC14  mov         qword ptr [rsp+38h],2  
00007FFB079EAC1D  jmp         InlineStructInitializeBug.TestCase.Test()+0292h (07FFB079EAC42h)  
00007FFB079EAC1F  vxorps      xmm0,xmm0,xmm0  
00007FFB079EAC23  vmovdqu     xmmword ptr [rsp+38h],xmm0  
00007FFB079EAC29  vmovdqu     xmmword ptr [rsp+48h],xmm0  
00007FFB079EAC2F  vmovdqu     xmmword ptr [rsp+58h],xmm0  
00007FFB079EAC35  vmovdqu     xmmword ptr [rsp+68h],xmm0  
00007FFB079EAC3B  xor         ecx,ecx  
00007FFB079EAC3D  mov         qword ptr [rsp+38h],rcx  

Expected Behavior:

All three variants should produce the same optimal code.

			LoadInline(out var x, 1.0);
00007FFB079BAA0B  mov         dword ptr [rsp+0A8h],1  
			LoadInline(out var y, 2.0);
00007FFB079BAA16  mov         dword ptr [rsp+68h],2  
			LoadInline(out var z, 1.0);
00007FFB079BAA1E  mov         dword ptr [rsp+28h],1  

Actual Behavior:

Case 1: The optimal code :-)
Case 2: For large enough structs the code and runtime behavior is very bad and seems unsymetric between the same method calls.
Case 3: This is not a real use case because a senseless Unsafe cast is made without any usage. But no warning is raised, when you accidentally compile such a source code.

Off Topic:
It would be great - especially for my use case - when a method argument could be restricted to a compile time constant.

category:cq
theme:inlining
skill-level:intermediate
cost:medium
impact:small

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMItenet-performancePerformance related issue

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions