Fix atomic_ref narrow-type memcheck false positive #6442
                
Closed
+509 −4
Summary
- `__atomic_ref_small_tag` and `reference_small.h` so `cuda::atomic_ref` on sub-4-byte types performs byte-granular locking instead of widened RMWs
- `atomic_ref<int8_t>` regression test covering host and device access patterns (sanitizer-friendly)

Motivation
#6430 reports that `cuda::atomic_ref` on narrow types trips compute-sanitizer memcheck because the implementation promotes byte writes to 32-bit RMWs. The sanitizer flags that widened access as an invalid global read, which blocks users (e.g., libcudf) from running their pipelines under memcheck.

Explanation
The patch introduces a new storage tag for narrow `atomic_ref` instances and backs it with a small device-side lock table. Each operation acquires a byte-granular lock, performs the necessary load/store/update, and releases the lock. Host execution still relies on the existing libatomic wrappers. This approach keeps behavior identical while preventing memcheck from seeing out-of-bounds accesses.

Rationale
- `atomic_ref` storage layer; existing owning atomics and the dispatch macros stay untouched.
- `atomic_ref<int8_t>` on both host and device, proving the narrow path works end-to-end (and keeps sanitizer latency low).

Testing
nvcc -std=c++20 -x cu libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/8b_integral_ref.pass.cpp -Ilibcudacxx/include -Ilibcudacxx/test/support -o /tmp/atomic_ref_small_test
compute-sanitizer --tool memcheck /tmp/atomic_ref_small_test

@miscco @griwes
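
For readers unfamiliar with the lock-per-address idea described under Explanation, here is a minimal host-only C++ sketch. It is not the patch's device code. Every name in it (`g_locks`, `kLockCount`, `lock_for`, `locked_fetch_add`, `demo_adjacent_bytes`), the table size, and the address hash are assumptions made for illustration. It only shows the general shape: acquire a lock chosen from the byte's address, touch exactly that byte, release the lock.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical lock table; the size and layout are assumptions, not the patch's.
constexpr std::size_t kLockCount = 64;  // power of two so we can mask
static std::atomic<unsigned> g_locks[kLockCount] = {};  // 0 = free, 1 = held

// Pick a lock slot from the byte's own address (assumed hash: low address bits).
std::atomic<unsigned>& lock_for(const void* p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return g_locks[addr & (kLockCount - 1)];
}

// fetch_add on one byte: only that byte is read and written,
// so no access is ever widened to the enclosing 32-bit word.
std::int8_t locked_fetch_add(std::int8_t* p, std::int8_t v) {
    std::atomic<unsigned>& lock = lock_for(p);
    while (lock.exchange(1u, std::memory_order_acquire) != 0u) {
        std::this_thread::yield();  // spin until the slot is free
    }
    std::int8_t old = *p;
    *p = static_cast<std::int8_t>(old + v);
    lock.store(0u, std::memory_order_release);
    return old;
}

// Four threads update two adjacent bytes; the neighbouring bytes must stay 0.
bool demo_adjacent_bytes() {
    alignas(4) std::int8_t bytes[4] = {0, 0, 0, 0};
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&bytes, t] {
            for (int i = 0; i < 25; ++i) {
                locked_fetch_add(&bytes[t % 2], 1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return bytes[0] == 50 && bytes[1] == 50 && bytes[2] == 0 && bytes[3] == 0;
}
```

The trade-off in a scheme like this is that unrelated bytes can contend when they hash to the same slot, in exchange for never reading or writing memory outside the referenced object.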