Skip to content

Memory safety issue in ByteGroupValueBuilder #15967

@thinkharderdev

Description

@thinkharderdev

Describe the bug

The implementation of ByteGroupValueBuilder is unsound as cast back and forth between signed and unsigned integer types can lead to a out-of-bounds memory access since ByteGroupValueBuilder::value uses core::slice::get_unchecked

To Reproduce

This test will segfault if run in release mode

    #[test]
    fn test_byte_group_value_builder_overflow() {
        let mut builder = ByteGroupValueBuilder::<i32>::new(OutputType::Utf8);
        
        let large_string = std::iter::repeat('a').take(1024 * 1024).collect::<String>();
        
        let array = Arc::new(StringArray::from(vec![Some(large_string.as_str())])) as ArrayRef;
        
        // Append items until our buffer length is 1 + i32::MAX as usize 
        for _ in 0..2048 {
            builder.append_val(&array, 0);
        }

        assert_eq!(builder.value(2047), large_string.as_bytes());
    }

Expected behavior

Either ByteGroupValueBuilder::do_append_val_inner needs to validate that self.buffer.len() <= i32::MAX as usize or ByteGroupValueBuilder::value needs to use safe slice access. The former seems like a better option since it would panic with a more useful message

Additional context

I tracked this down while debugging segfaults we've seen in our production environment so this is definitely something that can happen in the wild pretty easily when doing high-cardinality aggregations

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions