- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
          Apply hardware intrinsics to BitArray.*Shift
          #113299
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Hi @kzrnm, apologies for the delay in reviewing this. Is there a chance you could rebase your changes on top of the latest main? I don't have write permissions in your PR branch to do it myself. | 
| I updated to match the latest implementation. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the BitArray.LeftShift and BitArray.RightShift methods by leveraging hardware intrinsics (Vector512, Vector256, Vector128) to achieve significant performance improvements. The benchmark results show 3-6x speedup across different array sizes.
Key Changes:
- Replaced int-based shifting withbyte-based operations using SIMD vectors
- Introduced vectorized processing for bulk shift operations with fallback to scalar code
- Added comprehensive test coverage for various bit array sizes and shift amounts
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description | 
|---|---|
| BitArray.cs | Refactored RightShiftandLeftShiftto use byte-level operations with SIMD intrinsics (Vector512/256/128), replacing the previous int-based implementation | 
| BitArray_OperatorsTests.cs | Enhanced test coverage with additional test cases covering various sizes (1023-1025) and shift amounts, plus validation of high-order bit clearing | 
|  | ||
| while (fromIndex >= 5) | ||
| { | ||
| uint lo = (Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref p, (uint)(fromIndex -= 4))) << shiftCount) & shiftMask; | 
    
      
    
      Copilot
AI
    
    
    
      Oct 24, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The decrement operation fromIndex -= 4 is embedded within the offset calculation, which reduces readability. Consider separating this into two statements: first decrement fromIndex, then perform the read operation.
| uint lo = (Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref p, (uint)(fromIndex -= 4))) << shiftCount) & shiftMask; | |
| fromIndex -= 4; | |
| uint lo = (Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref p, (uint)fromIndex)) << shiftCount) & shiftMask; | 
| uint lo = (Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref p, (uint)(fromIndex -= 4))) << shiftCount) & shiftMask; | ||
| uint hi = (Unsafe.ReadUnaligned<uint>(ref Unsafe.AddByteOffset(ref p, (uint)(fromIndex - 1))) >>> carryCount) & carryMask; | ||
| uint result = hi | lo; | ||
| Unsafe.WriteUnaligned(ref Unsafe.AddByteOffset(ref p, toIndex -= 4), result); | 
    
      
    
      Copilot
AI
    
    
    
      Oct 24, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The decrement operation toIndex -= 4 is embedded within the offset calculation, which reduces readability. Consider separating this into two statements: first decrement toIndex, then perform the write operation.
| Unsafe.WriteUnaligned(ref Unsafe.AddByteOffset(ref p, toIndex -= 4), result); | |
| toIndex -= 4; | |
| Unsafe.WriteUnaligned(ref Unsafe.AddByteOffset(ref p, toIndex), result); | 
|  | ||
| while (fromIndex >= TVector.ElementCount + 1) | ||
| { | ||
| TVector hi = TVector.LoadUnsafe(ref p, (nuint)(fromIndex -= TVector.ElementCount)) << shiftCount; | 
    
      
    
      Copilot
AI
    
    
    
      Oct 24, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The decrement operation fromIndex -= TVector.ElementCount is embedded within the offset calculation, which reduces readability. Consider separating this into two statements: first decrement fromIndex, then perform the load operation.
| TVector hi = TVector.LoadUnsafe(ref p, (nuint)(fromIndex -= TVector.ElementCount)) << shiftCount; | |
| fromIndex -= TVector.ElementCount; | |
| TVector hi = TVector.LoadUnsafe(ref p, (nuint)fromIndex) << shiftCount; | 
| TVector hi = TVector.LoadUnsafe(ref p, (nuint)(fromIndex -= TVector.ElementCount)) << shiftCount; | ||
| TVector lo = TVector.LoadUnsafe(ref p, (nuint)(fromIndex - 1)) >>> carryCount; | ||
| TVector result = hi | lo; | ||
| result.StoreUnsafe(ref p, (nuint)(toIndex -= TVector.ElementCount)); | 
    
      
    
      Copilot
AI
    
    
    
      Oct 24, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The decrement operation toIndex -= TVector.ElementCount is embedded within the offset calculation, which reduces readability. Consider separating this into two statements: first decrement toIndex, then perform the store operation.
| result.StoreUnsafe(ref p, (nuint)(toIndex -= TVector.ElementCount)); | |
| toIndex -= TVector.ElementCount; | |
| result.StoreUnsafe(ref p, (nuint)toIndex); | 
| (int byteIndex, int bitOffeset) = Math.DivRem(length, BitsPerByte); | ||
| if (bitOffeset != 0) | ||
| { | ||
| Span<byte> bs = CollectionsMarshal.AsBytes(ba); | ||
| Assert.Equal(byteIndex + 1, bs.Length); | ||
| Assert.Equal(0, bs[byteIndex] >> bitOffeset); | 
    
      
    
      Copilot
AI
    
    
    
      Oct 24, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'bitOffeset' to 'bitOffset'.
| (int byteIndex, int bitOffeset) = Math.DivRem(length, BitsPerByte); | |
| if (bitOffeset != 0) | |
| { | |
| Span<byte> bs = CollectionsMarshal.AsBytes(ba); | |
| Assert.Equal(byteIndex + 1, bs.Length); | |
| Assert.Equal(0, bs[byteIndex] >> bitOffeset); | |
| (int byteIndex, int bitOffset) = Math.DivRem(length, BitsPerByte); | |
| if (bitOffset != 0) | |
| { | |
| Span<byte> bs = CollectionsMarshal.AsBytes(ba); | |
| Assert.Equal(byteIndex + 1, bs.Length); | |
| Assert.Equal(0, bs[byteIndex] >> bitOffset); | 
Related #63722, #113005
benchmark
Benchmark code