-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
The implementation of ByteGroupValueBuilder
is unsound as cast back and forth between signed and unsigned integer types can lead to a out-of-bounds memory access since ByteGroupValueBuilder::value
uses core::slice::get_unchecked
To Reproduce
This test will segfault if run in release mode
#[test]
fn test_byte_group_value_builder_overflow() {
let mut builder = ByteGroupValueBuilder::<i32>::new(OutputType::Utf8);
let large_string = std::iter::repeat('a').take(1024 * 1024).collect::<String>();
let array = Arc::new(StringArray::from(vec![Some(large_string.as_str())])) as ArrayRef;
// Append items until our buffer length is 1 + i32::MAX as usize
for _ in 0..2048 {
builder.append_val(&array, 0);
}
assert_eq!(builder.value(2047), large_string.as_bytes());
}
Expected behavior
Either ByteGroupValueBuilder::do_append_val_inner
needs to validate that self.buffer.len() <= i32::MAX as usize
or ByteGroupValueBuilder::value
needs to use safe slice access. The former seems like a better option since it would panic with a more useful message
Additional context
I tracked this down while debugging segfaults we've seen in our production environment so this is definitely something that can happen in the wild pretty easily when doing high-cardinality aggregations
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working