
Conversation

xtqqczze (Contributor) commented Apr 7, 2024

Follow-up to #99835.

Diffs

The diffs look mostly positive, although there are some large code-size regressions because more inlining is enabled.

cc: @stephentoub

dotnet-policy-service commented:

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

EgorBo (Member) commented Apr 8, 2024

Didn't we discuss in #99835 (comment) that this fast path makes no sense?

jakobbotsch (Member) commented Apr 8, 2024

Just echoing what @EgorBo said: due to the stall mentioned there, it's not clearly a win even if the codegen "looks" better. For example:

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

BenchmarkSwitcher.FromAssembly(typeof(Benchmark).Assembly).Run(args);

public class Benchmark
{
    private int[] _values = Enumerable.Range(0, 10000).Select(_ => Random.Shared.Next(2)).ToArray();

    [Benchmark]
    public void BenchBase()
    {
        Span<int> s;
        for (int i = 0; i < 10000; i++)
        {
            TestBase(_values.AsSpan(0, i), out s);
        }
    }

    [Benchmark]
    public void BenchDiff()
    {
        Span<int> s;
        for (int i = 0; i < 10000; i++)
        {
            TestDiff(_values.AsSpan(0, i), out s);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void TestBase<T>(Span<T> s, out Span<int> s2) where T : struct
    {
        s2 = CastBase<T, int>(s);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void TestDiff<T>(Span<T> s, out Span<int> s2) where T : struct
    {
        s2 = CastDiff<T, int>(s);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe Span<TTo> CastDiff<TFrom, TTo>(Span<TFrom> span)
        where TFrom : struct
        where TTo : struct
    {
        if (typeof(TFrom) == typeof(TTo))
            return *(Span<TTo>*)&span;

        // Use unsigned integers - unsigned division by constant (especially by power of 2)
        // and checked casts are faster and smaller.
        uint fromSize = (uint)Unsafe.SizeOf<TFrom>();
        uint toSize = (uint)Unsafe.SizeOf<TTo>();
        uint fromLength = (uint)span.Length;
        int toLength;
        if (fromSize == toSize)
        {
            // Special case for same size types - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // should be optimized to just `length` but the JIT doesn't do that today.
            toLength = (int)fromLength;
        }
        else if (fromSize == 1)
        {
            // Special case for byte sized TFrom - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // becomes `(ulong)fromLength / (ulong)toSize` but the JIT can't narrow it down to `int`
            // and can't eliminate the checked cast. This also avoids a 32 bit specific issue,
            // the JIT can't eliminate long multiply by 1.
            toLength = (int)(fromLength / toSize);
        }
        else
        {
            // Ensure that casts are done in such a way that the JIT is able to "see"
            // the uint->ulong casts and the multiply together so that on 32 bit targets
            // 32x32to64 multiplication is used.
            ulong toLengthUInt64 = (ulong)fromLength * (ulong)fromSize / (ulong)toSize;
            toLength = checked((int)toLengthUInt64);
        }

        return MemoryMarshal.CreateSpan(ref Unsafe.As<TFrom, TTo>(ref MemoryMarshal.GetReference(span)), toLength);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe Span<TTo> CastBase<TFrom, TTo>(Span<TFrom> span)
        where TFrom : struct
        where TTo : struct
    {
        // Use unsigned integers - unsigned division by constant (especially by power of 2)
        // and checked casts are faster and smaller.
        uint fromSize = (uint)Unsafe.SizeOf<TFrom>();
        uint toSize = (uint)Unsafe.SizeOf<TTo>();
        uint fromLength = (uint)span.Length;
        int toLength;
        if (fromSize == toSize)
        {
            // Special case for same size types - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // should be optimized to just `length` but the JIT doesn't do that today.
            toLength = (int)fromLength;
        }
        else if (fromSize == 1)
        {
            // Special case for byte sized TFrom - `(ulong)fromLength * (ulong)fromSize / (ulong)toSize`
            // becomes `(ulong)fromLength / (ulong)toSize` but the JIT can't narrow it down to `int`
            // and can't eliminate the checked cast. This also avoids a 32 bit specific issue,
            // the JIT can't eliminate long multiply by 1.
            toLength = (int)(fromLength / toSize);
        }
        else
        {
            // Ensure that casts are done in such a way that the JIT is able to "see"
            // the uint->ulong casts and the multiply together so that on 32 bit targets
            // 32x32to64 multiplication is used.
            ulong toLengthUInt64 = (ulong)fromLength * (ulong)fromSize / (ulong)toSize;
            toLength = checked((int)toLengthUInt64);
        }

        return MemoryMarshal.CreateSpan(ref Unsafe.As<TFrom, TTo>(ref MemoryMarshal.GetReference(span)), toLength);
    }
}
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=8.0.203
  [Host]     : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
Method    |     Mean |    Error |   StdDev
--------- |---------:|---------:|---------:
BenchBase | 12.96 us | 0.109 us | 0.102 us
BenchDiff | 45.36 us | 0.146 us | 0.122 us

xtqqczze (Contributor, Author) commented Apr 8, 2024

@jakobbotsch Thanks for the BenchmarkDotNet example; is this issue specific to XArch?

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.203
  [Host]     : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  Job-UYGVGG : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Runtime=.NET 8.0  Toolchain=net80  
Method    |     Mean |     Error |    StdDev
--------- |---------:|----------:|----------:
BenchBase | 9.612 us | 0.0354 us | 0.0314 us
BenchDiff | 9.627 us | 0.0495 us | 0.0463 us

if (RuntimeHelpers.IsReferenceOrContainsReferences<TTo>())
    ThrowHelper.ThrowInvalidTypeWithPointersNotSupported(typeof(TTo));

if (typeof(TFrom) == typeof(TTo))
A Member left a review comment on the lines above:

This change would be a regression for shared generic code. TFrom and TTo may require generic dictionary lookups.

The identical-type case should be covered by the if (fromSize == toSize) path below. Is there anything preventing the JIT from generating optimal code for that path?
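To illustrate the point (a minimal sketch, not code from the PR): when TFrom and TTo are the same type, Unsafe.SizeOf<TFrom>() == Unsafe.SizeOf<TTo>() necessarily holds, so control already falls into the fromSize == toSize branch and the length carries over unchanged, with no typeof comparison needed. SameSizeCast below is a hypothetical helper name used only for this sketch.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

class SameSizeSketch
{
    // Hypothetical helper (not the PR's code): the branch taken when
    // Unsafe.SizeOf<TFrom>() == Unsafe.SizeOf<TTo>(), which is always true
    // when TFrom == TTo. The element count is unchanged, and no typeof
    // comparison (hence no generic dictionary lookup in shared generic
    // code) is required.
    static Span<TTo> SameSizeCast<TFrom, TTo>(Span<TFrom> span)
        where TFrom : struct
        where TTo : struct
    {
        int toLength = span.Length; // fromSize == toSize, so length is preserved
        return MemoryMarshal.CreateSpan(
            ref Unsafe.As<TFrom, TTo>(ref MemoryMarshal.GetReference(span)),
            toLength);
    }

    static void Main()
    {
        Span<int> source = stackalloc int[] { 1, 2, 3, 4 };
        Span<int> result = SameSizeCast<int, int>(source);
        Console.WriteLine(result.Length == source.Length && result[2] == 3); // True
    }
}
```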

@github-actions github-actions bot locked and limited conversation to collaborators May 10, 2024

Labels: area-System.Runtime.InteropServices, community-contribution (indicates that the PR has been added by a community member)


5 participants