Member

@JimBobSquarePants JimBobSquarePants commented Oct 21, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Replaces SimdUtils.Avx2Intrinsics with SimdUtils.HwIntrinsics, which contains Avx2 and Sse2 implementations of ByteToNormalizedFloatReduce and NormalizedFloatToByteSaturateReduce. This gives us fully accelerated conversion in both directions.
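
For readers unfamiliar with the "Reduce" naming, here is a minimal sketch of what an Sse2 byte-to-normalized-float path could look like. It is not the code from this PR: the class name `ByteToFloatSketch`, the helper `ToScaledSingle`, the 16-byte block size, and the scalar tail loop are all assumptions made for illustration. The idea shown is that the `Reduce` method converts the largest prefix it can handle with hardware intrinsics, then slices both spans down to the unprocessed remainder so a caller can finish the rest.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper class for illustration only; not the ImageSharp source.
public static class ByteToFloatSketch
{
    // Converts the largest multiple-of-16 prefix with SSE2, then slices both
    // spans down to the unprocessed remainder (the "Reduce" part).
    public static void ByteToNormalizedFloatReduce(ref ReadOnlySpan<byte> source, ref Span<float> dest)
    {
        int blocks = Math.Min(source.Length, dest.Length) / 16;
        if (!Sse2.IsSupported || blocks == 0)
        {
            return; // Nothing consumed; the caller's scalar loop handles everything.
        }

        int count = blocks * 16;
        ReadOnlySpan<Vector128<byte>> src = MemoryMarshal.Cast<byte, Vector128<byte>>(source.Slice(0, count));
        Span<Vector128<float>> dst = MemoryMarshal.Cast<float, Vector128<float>>(dest.Slice(0, count));
        Vector128<byte> zero8 = Vector128<byte>.Zero;
        Vector128<short> zero16 = Vector128<short>.Zero;
        Vector128<float> scale = Vector128.Create(1F / 255F);

        for (int i = 0; i < blocks; i++)
        {
            // Zero-extend 16 bytes to 2 x 8 shorts by interleaving with zero.
            Vector128<byte> b = src[i];
            Vector128<short> lo = Sse2.UnpackLow(b, zero8).AsInt16();
            Vector128<short> hi = Sse2.UnpackHigh(b, zero8).AsInt16();

            // Zero-extend again to 4 x 4 ints, convert to float, scale to [0, 1].
            dst[(i * 4) + 0] = ToScaledSingle(Sse2.UnpackLow(lo, zero16), scale);
            dst[(i * 4) + 1] = ToScaledSingle(Sse2.UnpackHigh(lo, zero16), scale);
            dst[(i * 4) + 2] = ToScaledSingle(Sse2.UnpackLow(hi, zero16), scale);
            dst[(i * 4) + 3] = ToScaledSingle(Sse2.UnpackHigh(hi, zero16), scale);
        }

        source = source.Slice(count);
        dest = dest.Slice(count);
    }

    private static Vector128<float> ToScaledSingle(Vector128<short> widened, Vector128<float> scale)
        => Sse.Multiply(Sse2.ConvertToVector128Single(widened.AsInt32()), scale);

    // Full conversion: vectorized bulk first, scalar loop for whatever is left.
    public static void ByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float> dest)
    {
        ByteToNormalizedFloatReduce(ref source, ref dest);
        for (int i = 0; i < source.Length; i++)
        {
            dest[i] = source[i] / 255F;
        }
    }
}
```

Calling `ByteToFloatSketch.ByteToNormalizedFloat(bytes, floats)` works on any platform, because the Sse2 branch is skipped when `Sse2.IsSupported` is false and the scalar loop then covers the whole input. An Avx2 variant would presumably follow the same shape with 256-bit vectors.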

Benchmarks

The warmup and iteration counts are probably a little low for these, as demonstrated by the differences between PixelOperations_Specialized and HwIntrinsics. One calls the other!

ToVector4_Rgba32

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  Job-EFOYPA : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Runtime=.NET Core 3.1  IterationCount=3  LaunchCount=1
WarmupCount=3
| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| FallbackIntrinsics128 | 64 | 153.00 ns | 9.491 ns | 0.520 ns | 1.34 | 0.01 | - | - | - | - |
| PixelOperations_Base | 64 | 206.14 ns | 4.369 ns | 0.240 ns | 1.81 | 0.01 | 0.0057 | - | - | 24 B |
| BasicIntrinsics256 | 64 | 113.87 ns | 6.391 ns | 0.350 ns | 1.00 | 0.00 | - | - | - | - |
| ExtendedIntrinsics | 64 | 53.61 ns | 8.274 ns | 0.454 ns | 0.47 | 0.00 | - | - | - | - |
| HwIntrinsics | 64 | 44.90 ns | 14.730 ns | 0.807 ns | 0.39 | 0.01 | - | - | - | - |
| PixelOperations_Specialized | 64 | 51.34 ns | 23.252 ns | 1.275 ns | 0.45 | 0.01 | - | - | - | - |
| FallbackIntrinsics128 | 256 | 547.30 ns | 21.743 ns | 1.192 ns | 1.34 | 0.00 | - | - | - | - |
| PixelOperations_Base | 256 | 757.21 ns | 157.017 ns | 8.607 ns | 1.86 | 0.03 | 0.0057 | - | - | 24 B |
| BasicIntrinsics256 | 256 | 407.99 ns | 37.214 ns | 2.040 ns | 1.00 | 0.00 | - | - | - | - |
| ExtendedIntrinsics | 256 | 134.77 ns | 52.013 ns | 2.851 ns | 0.33 | 0.01 | - | - | - | - |
| HwIntrinsics | 256 | 94.80 ns | 11.826 ns | 0.648 ns | 0.23 | 0.00 | - | - | - | - |
| PixelOperations_Specialized | 256 | 100.59 ns | 6.056 ns | 0.332 ns | 0.25 | 0.00 | - | - | - | - |
| FallbackIntrinsics128 | 2048 | 4,776.34 ns | 14,446.835 ns | 791.880 ns | 1.31 | 0.22 | - | - | - | - |
| PixelOperations_Base | 2048 | 5,735.58 ns | 445.746 ns | 24.433 ns | 1.58 | 0.01 | - | - | - | 24 B |
| BasicIntrinsics256 | 2048 | 3,635.09 ns | 125.081 ns | 6.856 ns | 1.00 | 0.00 | - | - | - | - |
| ExtendedIntrinsics | 2048 | 993.15 ns | 205.277 ns | 11.252 ns | 0.27 | 0.00 | - | - | - | - |
| HwIntrinsics | 2048 | 810.04 ns | 59.218 ns | 3.246 ns | 0.22 | 0.00 | - | - | - | - |
| PixelOperations_Specialized | 2048 | 729.54 ns | 208.887 ns | 11.450 ns | 0.20 | 0.00 | - | - | - | - |

FromVector4_Rgba32

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  Job-YIYDDW : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

Runtime=.NET Core 3.1  IterationCount=3  LaunchCount=1
WarmupCount=3
| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| FallbackIntrinsics128 | 1024 | 3,261.6 ns | 4,313.76 ns | 236.45 ns | 3.80 | 0.28 | - | - | - | - |
| BasicIntrinsics256 | 1024 | 1,544.9 ns | 85.07 ns | 4.66 ns | 1.80 | 0.00 | - | - | - | - |
| ExtendedIntrinsic | 1024 | 858.2 ns | 17.81 ns | 0.98 ns | 1.00 | 0.00 | - | - | - | - |
| UseHwIntrinsics | 1024 | 330.5 ns | 429.31 ns | 23.53 ns | 0.39 | 0.03 | - | - | - | - |
| UseAvx2_Grouped | 1024 | 290.4 ns | 70.29 ns | 3.85 ns | 0.34 | 0.00 | - | - | - | - |
| PixelOperations_Base | 1024 | 3,787.9 ns | 1,040.74 ns | 57.05 ns | 4.41 | 0.07 | - | - | - | 24 B |
| PixelOperations_Specialized | 1024 | 289.8 ns | 41.91 ns | 2.30 ns | 0.34 | 0.00 | - | - | - | - |

codecov bot commented Oct 21, 2020

Codecov Report

Merging #1398 into master will increase coverage by 0.00%.
The diff coverage is 95.00%.

@@           Coverage Diff           @@
##           master    #1398   +/-   ##
=======================================
  Coverage   82.88%   82.88%           
=======================================
  Files         690      690           
  Lines       30903    30985   +82     
  Branches     3544     3554   +10     
=======================================
+ Hits        25614    25683   +69     
- Misses       4570     4580   +10     
- Partials      719      722    +3     
| Flag | Coverage Δ | |
|---|---|---|
| #unittests | 82.88% <95.00%> (+<0.01%) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...mageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs | 94.91% <94.91%> (ø) | |
| src/ImageSharp/Common/Helpers/SimdUtils.cs | 65.90% <100.00%> (ø) | |
| ...arp/Common/Helpers/SimdUtils.ExtendedIntrinsics.cs | 72.60% <0.00%> (-9.59%) | ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77a59ee...7de0ee3.

@JimBobSquarePants
Member Author

@SixLabors/core I'd really like to get this merged now as I want to reuse PermuteMaskDeinterleave8x32.

@JimBobSquarePants JimBobSquarePants requested a review from a team October 23, 2020 10:02
Member

@antonfirsov antonfirsov left a comment

My only concern is the broken RemoteExecutor run, but the E2E results are fine, so it should be OK.

}

/// <summary>
/// Implementation <see cref="SimdUtils.ByteToNormalizedFloat"/>, which is faster on new RyuJIT runtime.
Member

@antonfirsov antonfirsov Oct 21, 2020

I guess this summary became outdated as of today.

@JimBobSquarePants JimBobSquarePants merged commit b577d8e into master Oct 23, 2020
@JimBobSquarePants JimBobSquarePants deleted the js/SimdUtils branch October 23, 2020 10:40
JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021