- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
Check Avx512BW.IsSupported in BitArray.CopyTo() #114818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Tagging subscribers to this area: @dotnet/area-system-collections | 
| Vector128<byte> upperShuffleMask_CopyToBoolArray = Vector128.Create(0x02020202_02020202, 0x03030303_03030303).AsByte(); | ||
|  | ||
| if (Avx512F.IsSupported && (uint)m_length >= Vector512<byte>.Count) | ||
| if (Avx512BW.IsSupported && (uint)m_length >= Vector512<byte>.Count) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're touching this anyways, would you want to update this to use the xplat APIs instead?
That is, swap these out:
- Avx512BW.IsSupported->- Vector512.IsHardwareAccelerated
- Avx512BW.Shuffle(x, y)->- Vector512.Shuffle(x, y)
- Avx512F.And(x, y)->- x & y
- Avx512BW.Min(x, y)->- Vector512.Min(x, y)
- Avx512F.Store(x, y)->- y.Store(x)
Similar changes could also be made to the Avx2 path (using Vector256) and so on.
This change to Avx512BW is notably "more correct" than Avx512F, but notably it's not technically broken today since we require AVX512F+BW+CD+DQ+VL to be provided for any AVX512 support to exist. It's still goodness to fix, but moving away from the platform specific APIs altogether would be even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About Vector512.Shuffle(x, y)
runtime/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Lines 928 to 930 in 1a51788
| // Same logic as SSSE3 path, except we do not have Shuffle instruction. | |
| // (TableVectorLookup could be an alternative - dotnet/runtime#1277) | |
| // Instead we use chained ZIP1/2 instructions: | 
It seems that Arm64 might not have the proper instruction to support
VectorXXX.Shuffle<byte>(x, y). Could it happen that VectorXXX.IsHardwareAccelerated is true but VectorXXX.Shuffle<byte>(x, y) has poor performance on some platforms?
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that Arm64 might not have the proper instruction to support VectorXXX.Shuffle(x, y). C
That comment is very outdated, VectorXXX.Shuffle covers the usage and does the correct thing.
Could it happen that VectorXXX.IsHardwareAccelerated is true but VectorXXX.Shuffle(x, y) has poor performance on some platforms?
Not for the way we're using it here. Theoretically it could happen if you were on a machine from 2005 and didn't have SSSE3 but hit the Vector128.IsHardwareAccelerated code path. This isn't really a concern for anyone using the current versions of .NET and could be adjusted later if it was a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, Vector256.IsHardwareAccelerated strictly implies AVX2 support, and Vector512.IsHardwareAccelerated strictly implies AVX-512F+CD+DQ+BW+VL.  Arm64 support for larger vectors comes in the form of Scalable Vector Extensions (SVE), which does not allow acceleration of the fixed-width vector types.
The Vector128 code in BitArray is outdated and could also use the xplat intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shuffle attempts to normalize behavior across platforms, including zeroing out of range indices and allowing full vector (cross-lane) shuffle/permute. If you don't need that behavior, you can use the new Vector512.ShuffleNative, which will emit the expected vpshufb.
vpermb is actually AVX512_VBMI, not just AVX-512F+CD+DQ+BW+VL
This is not a problem. The baseline ISA set is required for any acceleration of Vector512, but JIT can opportunistically use any ISA supported by the hardware. One of the benefits of the xplat vector APIs is that JIT can optimize or polyfill using whatever instructions are available.
      
        
              This comment was marked as resolved.
        
          
      
    
    This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're touching this anyways, would you want to update this to use the xplat APIs instead?
@tannergooding , should this PR be merged or closed in the interim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets take it, as it is technically correct and will ensure valid behavior in the case other runtimes use the same libraries and have their own distinct support. (There's no bug on RyuJIT today due to how we've defined things to work for ourselves)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logged #116079 to track the move to xplat APIs
| /ba-g unrelated build failures, simply updated an Isa.IsSupported check to be more correct | 
runtime/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Line 855 in 7e297e6
runtime/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Line 860 in 7e297e6
It's using
Avx512BW, so it should checkAvx512BW.IsSupportedfirst. CheckingAvx512F.IsSupportedis not enough, as there are (old) CPUs supporting Avx512F but not Avx512BW according to https://en.wikipedia.org/w/index.php?title=AVX-512&oldid=1281313848#CPUs_with_AVX-512.