Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 18, 2024

Closes #104954

Currently, unboxing for Nullable<> always go through a helper call, while we can inline it in JIT.

int? Test(object o) => (int?)o;

Current codegen:

; Method Proga:Test(System.Object):System.Nullable`1[int]:this (FullOpts)
G_M11066_IG01:
       sub      rsp, 40
G_M11066_IG02:
       lea      rcx, [rsp+0x20]
       mov      r8, rdx
       mov      rdx, 0x7FF7E261B750      ; System.Nullable`1[int]
       call     CORINFO_HELP_UNBOX_NULLABLE
       mov      rax, qword ptr [rsp+0x20]
G_M11066_IG03:
       add      rsp, 40
       ret      
; Total bytes of code: 37

New codegen:

; Method Proga:Test(System.Object):System.Nullable`1[int]:this (FullOpts)
G_M11066_IG01:
       sub      rsp, 40
G_M11066_IG02:
       test     rdx, rdx ;; is null?
       je       SHORT G_M11066_IG04
G_M11066_IG03:
       mov      rax, 0x7FF7E23674D0      ; System.Int32
       cmp      qword ptr [rdx], rax ;; is object's type int32?
       je       SHORT G_M11066_IG05
       jmp      SHORT G_M11066_IG08
G_M11066_IG04:
       xor      edx, edx
       mov      qword ptr [rsp+0x20], rdx
       jmp      SHORT G_M11066_IG06
G_M11066_IG05:
       mov      byte  ptr [rsp+0x20], 1
       mov      eax, dword ptr [rdx+0x08]
       mov      dword ptr [rsp+0x24], eax
G_M11066_IG06:
       mov      rax, qword ptr [rsp+0x20]
G_M11066_IG07:
       add      rsp, 40
       ret      

; cold path
G_M11066_IG08:
       lea      rcx, [rsp+0x20]
       mov      r8, rdx
       mov      rdx, 0x7FF7E263B750      ; System.Nullable`1[int]
       call     CORINFO_HELP_UNBOX_NULLABLE
       jmp      SHORT G_M11066_IG06
; Total bytes of code: 82

Benchmark:

using BenchmarkDotNet.Attributes;

public class MyBench
{
    [Benchmark]
    [Arguments(42)]
    [Arguments(null)]
    public int? Unbox(object o) => (int?)o;
}
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor (which is "Ampere Altra")
  Job-MMNLPR : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-QXOQTU : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Method Toolchain o Mean Error Ratio
Unbox Main ? 7.1195 ns 0.0098 ns 1.00
Unbox PR ? 0.6698 ns 0.0072 ns 0.09
Unbox Main 42 10.9406 ns 0.0175 ns 1.00
Unbox PR 42 2.7002 ns 0.0001 ns 0.25

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 18, 2024
@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

@EgorBot -intel -arm64

using BenchmarkDotNet.Attributes;

public class MyBench
{
    [Benchmark]
    [Arguments(42)]
    [Arguments(null)]
    public int? Unbox(object o) => (int?)o;
}

@huoyaoyuan
Copy link
Member

Does it worth to recognize RuntimeHelpers.Unbox_Nullable and do the same expansion?

Or, can we write the method in C# and make helpers inlinable? This will help more around nullables.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

Or, can we write the method in C# and make helpers inlinable? This will help more around nullables.

Yes, that is an option, requires some effort to make helpers inlineable, they're currently not. I decided to do it in JIT since the actual change is like 30 LOC, the rest are comments. If we move it to an inlineable helper we can delete this I guess..
I'll take a look at that next week probably.

@EgorBot
Copy link

EgorBot commented Jul 18, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-XHSURK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-ZDCIDK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Toolchain o Mean Error Ratio
Unbox Main ? 4.4659 ns 0.0093 ns 1.00
Unbox PR ? 0.5740 ns 0.0003 ns 0.13
Unbox Main 42 7.1867 ns 0.0395 ns 1.00
Unbox PR 42 3.9114 ns 0.0016 ns 0.54

BDN_Artifacts.zip

@EgorBot
Copy link

EgorBot commented Jul 18, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-MMNLPR : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-QXOQTU : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Method Toolchain o Mean Error Ratio
Unbox Main ? 7.1195 ns 0.0098 ns 1.00
Unbox PR ? 0.6698 ns 0.0072 ns 0.09
Unbox Main 42 10.9406 ns 0.0175 ns 1.00
Unbox PR 42 2.7002 ns 0.0001 ns 0.25

BDN_Artifacts.zip

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

@MihuBot

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

@jakobbotsch @AndyAyersMS cc @dotnet/jit-contrib PTAL, the actual change is 40 LOC, benchmarks attached. Jit-diff: MihuBot/runtime-utils#541 obviously, a size regression, but there were also a few cases where jit can now fold the if we emit here, e.g. previously we never tried to fold CORINFO_HELP_UNBOX_NULLABLE(null) etc.

// Unbox the object to the result local:
//
// result._hasValue = true;
// result._value = MethodTableLookup(obj);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, wrong comment, it's obj->RawData

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth limiting this to relatively small types, since will create both a zeroing and a copy?

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

Might be a good idea to limit it, yeah

@EgorBot -intel -arm64

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class MyBench2
{
    [InlineArray(100)]
    public struct LargeStruct
    {
        public long Fld;
    }

    [Benchmark]
    [Arguments(42)]
    [Arguments(null)]
    public LargeStruct? Unbox(object o) => (LargeStruct?)o;
}

@dotnet dotnet deleted a comment from EgorBot Jul 18, 2024
@dotnet dotnet deleted a comment from EgorBot Jul 18, 2024
@EgorBot
Copy link

EgorBot commented Jul 18, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-BUSFSY : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-NIMINM : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Toolchain o Mean Error Ratio
Unbox Main ? 27.91 ns 0.013 ns 1.00
Unbox PR ? 20.32 ns 0.010 ns 0.73
Unbox Main 42 NA NA ?
Unbox PR 42 NA NA ?
Benchmarks with issues:
MyBench2.Unbox: Job-BUSFSY(Toolchain=Main) [o=42]
MyBench2.Unbox: Job-NIMINM(Toolchain=PR) [o=42]

BDN_Artifacts.zip

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2024

@EgorBot -intel -arm64

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class MyBench2
{
    [InlineArray(100)]
    public struct LargeStruct
    {
        public long Fld;
    }

    public IEnumerable<object> TestData()
    {
        yield return null;
        yield return new LargeStruct();
    }

    [Benchmark]
    [ArgumentsSource(nameof(TestData))]
    public LargeStruct? Unbox(object o) => (LargeStruct?)o;
}

@EgorBot
Copy link

EgorBot commented Jul 18, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-GNOQYR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-LOMSQK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Toolchain o Mean Error Ratio
Unbox Main ? 31.62 ns 0.038 ns 1.00
Unbox PR ? 21.39 ns 0.013 ns 0.68
Unbox Main MyBench2+LargeStruct 29.85 ns 0.010 ns 1.00
Unbox PR MyBench2+LargeStruct 26.19 ns 0.005 ns 0.88

BDN_Artifacts.zip

@EgorBot
Copy link

EgorBot commented Jul 18, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-WWWDAN : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-YJHBTM : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Method Toolchain o Mean Error Ratio
Unbox Main ? 89.28 ns 0.041 ns 1.00
Unbox PR ? 39.46 ns 0.023 ns 0.44
Unbox Main MyBench2+LargeStruct 48.70 ns 0.052 ns 1.00
Unbox PR MyBench2+LargeStruct 41.85 ns 0.209 ns 0.86

BDN_Artifacts.zip

@dotnet dotnet deleted a comment from EgorBot Jul 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize unboxing for nullable

4 participants