Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
 
Type of Bug
Something else
Component
libcu++
Describe the bug
When using atomic_ref::fetch_min (or other atomic operations) on small data types such as int8_t, compute-sanitizer (memcheck) reports an invalid global read of size 4 bytes.
This occurs when the underlying memory allocation is tightly packed (i.e., no 4-byte padding). The issue stems from CCCL’s use of a 4-byte CAS loop to emulate atomic operations on types smaller than 4 bytes.
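For readers unfamiliar with the technique, here is a minimal host-side sketch of how a 4-byte CAS loop can emulate fetch_min on an 8-bit value. It is not CCCL's actual implementation: the helper name `fetch_min_i8`, the use of `std::atomic<std::uint32_t>`, and the convention that lane k of a word occupies bits [8k, 8k+8) are all assumptions made for illustration. The point is that the loop reads and writes the whole 32-bit word but only ever changes the bits of the target lane.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not CCCL's implementation: fetch_min on one 8-bit lane
// of a buffer stored as 32-bit words, using only a 32-bit compare-and-swap.
// Assumed layout: lane k of a word occupies bits [8k, 8k+8).
inline std::int8_t fetch_min_i8(std::atomic<std::uint32_t>* words,
                                std::size_t byte_index, std::int8_t value) {
  std::atomic<std::uint32_t>& word = words[byte_index / 4];
  const unsigned shift = static_cast<unsigned>(byte_index % 4) * 8;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;

  // This is the 4-byte read: it covers the target lane and its 3 neighbours.
  std::uint32_t expected = word.load();
  for (;;) {
    // Extract the current 8-bit value from the loaded word.
    const std::int8_t old_val = static_cast<std::int8_t>(
        static_cast<std::uint8_t>((expected & mask) >> shift));
    const std::int8_t new_val = value < old_val ? value : old_val;
    // Splice the new lane value in; the other 24 bits are written back unchanged.
    const std::uint32_t desired =
        (expected & ~mask) |
        (std::uint32_t{static_cast<std::uint8_t>(new_val)} << shift);
    // On failure, `expected` is refreshed with the current word and we retry.
    if (word.compare_exchange_weak(expected, desired)) return old_val;
  }
}
```

If the target byte is the last one of a tightly packed allocation, the neighbouring lanes in that word lie outside the allocation, which is what memcheck flags.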
While the sanitizer reports an out-of-bounds load, the operation is known to be safe on all supported hardware, so this is a false positive.
libcudf currently works around this by rounding small-type column buffer sizes to a multiple of 4 bytes to silence the failures in our nightly CI.
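The padding workaround amounts to rounding the requested byte count up to the next multiple of 4 before allocating. A rough sketch, where `round_up_to_word` is a hypothetical name and not libcudf's actual code:

```cpp
#include <cstddef>

// Hypothetical helper illustrating the padding workaround; not libcudf's code.
// Rounds a byte count up to the next multiple of 4 so that the aligned 4-byte
// word containing the last element lies fully inside the allocation.
constexpr std::size_t round_up_to_word(std::size_t num_bytes) {
  return (num_bytes + 3) / 4 * 4;
}
```

With this, a 5-byte int8_t buffer would be allocated as 8 bytes, and the 4-byte access on the last element stays in bounds.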
How to Reproduce
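Compile and run the following kernel under compute-sanitizer (memcheck) with an input buffer of 5 bytes:
#include <cuda/atomic>
#include <cstdint>

__global__ void test(int8_t* data) {
  // Atomic view of a single 8-bit element; fetch_min is emulated with a 4-byte CAS loop.
  cuda::atomic_ref<int8_t, cuda::thread_scope_system> ref(*data);
  ref.fetch_min(1);
}
Observed output: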
========= Invalid __global__ read of size 4 bytes
=========     at void cuda::std::__cuda_atomic_load<unsigned int>(...)
This only happens when data points to a buffer without 4-byte padding.
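To make the padding condition concrete, the sketch below computes how far the aligned 4-byte word containing a given element reaches past the end of the buffer. It assumes the buffer base address is 4-byte aligned and that the emulation always accesses the aligned word containing the target byte; `bytes_past_end` is a hypothetical helper, not part of CCCL or compute-sanitizer.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes by which the aligned 4-byte word
// containing `byte_index` extends past a buffer of `buffer_size` bytes.
// Assumes the buffer base address is 4-byte aligned.
constexpr std::size_t bytes_past_end(std::size_t buffer_size,
                                     std::size_t byte_index) {
  const std::size_t word_end = byte_index / 4 * 4 + 4;  // one past the word
  return word_end > buffer_size ? word_end - buffer_size : 0;
}
```

Under these assumptions, for a 5-byte buffer only the element at index 4 sits in a word that overruns the allocation (by 3 bytes); elements 0 through 3 share a word that is fully in bounds.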
Expected behavior
- No invalid global memory access should be reported by compute-sanitizer.
- The sanitizer should recognize that the 4B CAS operation does not modify bytes outside the target value.
 
Additional notes
- The operation is safe and behaves correctly on all supported GPU architectures.
- CCCL maintainers confirmed this is not a bug in atomic_ref, but a sanitizer false positive.
- Potential solution: NVVM intrinsic or PTX pragma to annotate these safe atomic operations so memcheck can suppress the warning.
 