- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1k
 
Closed
Labels
arrowChanges to the arrow crateChanges to the arrow crateenhancementAny new improvement worthy of a entry in the changelogAny new improvement worthy of a entry in the changelog
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, dictionary values are merged only if they are Utf8, Binary, LargeUtf8, and LargeBinary because the pointer equality closure construction in should_merge_dictionaries returns false to the outer function in the default arm:
arrow-rs/arrow-select/src/dictionary.rs
Lines 112 to 123 in 1a5999a
| pub fn should_merge_dictionary_values<K: ArrowDictionaryKeyType>( | |
| dictionaries: &[&DictionaryArray<K>], | |
| len: usize, | |
| ) -> bool { | |
| use DataType::*; | |
| let first_values = dictionaries[0].values().as_ref(); | |
| let ptr_eq: Box<PtrEq> = match first_values.data_type() { | |
| Utf8 => Box::new(bytes_ptr_eq::<Utf8Type>), | |
| LargeUtf8 => Box::new(bytes_ptr_eq::<LargeUtf8Type>), | |
| Binary => Box::new(bytes_ptr_eq::<BinaryType>), | |
| LargeBinary => Box::new(bytes_ptr_eq::<LargeBinaryType>), | |
| _ => return false, | 
We've observed unnecessarily high memory usage when concatenating dictionaries with primitive values.
Describe the solution you'd like
should_merge_dictionaries should return true for other types as well.
Metadata
Metadata
Assignees
Labels
arrowChanges to the arrow crateChanges to the arrow crateenhancementAny new improvement worthy of a entry in the changelogAny new improvement worthy of a entry in the changelog