Description
Is your feature request related to a problem or challenge?
In https://github.com/apache/datafusion/pull/12019/files @dmitrybugakov added support for StringViewArray in the substr function ❤️
However, the initial implementation returns an output StringArray when the input is a StringViewArray, which means all the strings are copied.
In some functions, such as substr, this extra copy is unnecessary: the output can be produced by adjusting only the views (aka the u128s that hold each string's length and the pointer to its data). See GenericByteViewArray for more details
Describe the solution you'd like
I think we can avoid the copy when the input uses StringViewArray and thus make substr faster
Describe alternatives you've considered
The idea would be to
- Create a benchmark for the substring function for StringArray, LargeStringArray and StringViewArray
- Optimize the implementation of substr
The optimization would likely look like:
- Change the signature of substr so it produces a StringViewArray when its first argument is a StringViewArray (at the moment it produces StringArray when its argument is a StringViewArray)
- Make a function that took StringViewArray as input and produced another StringViewArray as output
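To make the "only adjust the views" idea concrete, here is a minimal std-only sketch. It is not the arrow-rs or DataFusion API: `make_view`, `substr_view`, `view_to_string` and `read_u32` are hypothetical helpers written for illustration. It assumes the Arrow view layout (a 4-byte length, followed by either up to 12 inline bytes or a 4-byte prefix, a 4-byte buffer index and a 4-byte offset), and it assumes byte-based, 0-based offsets on ASCII data, whereas the real substr counts characters and is 1-based.

```rust
// Simplified model of an Arrow string view packed into a u128.
// All function names are hypothetical; this is NOT the arrow-rs API.

/// Read a little-endian u32 from the 16 view bytes at position `at`.
fn read_u32(bytes: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Build a view for `len` bytes starting at `offset` in `buffer`.
fn make_view(buffer: &[u8], buffer_index: u32, offset: u32, len: u32) -> u128 {
    let data = &buffer[offset as usize..(offset + len) as usize];
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= 12 {
        // Short strings are stored inline in the view itself.
        bytes[4..4 + len as usize].copy_from_slice(data);
    } else {
        // Long strings store a 4-byte prefix plus (buffer index, offset).
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Byte-based substring (0-based `start`, ASCII assumed).
/// Produces a new view; the string data in `buffers` is never copied.
fn substr_view(view: u128, buffers: &[&[u8]], start: u32, count: u32) -> u128 {
    let bytes = view.to_le_bytes();
    let len = read_u32(&bytes, 0);
    let start = start.min(len);
    let new_len = count.min(len - start);
    if len <= 12 {
        // Inline input: shift the inline bytes inside a fresh view.
        let mut out = [0u8; 16];
        out[0..4].copy_from_slice(&new_len.to_le_bytes());
        let s = 4 + start as usize;
        out[4..4 + new_len as usize].copy_from_slice(&bytes[s..s + new_len as usize]);
        u128::from_le_bytes(out)
    } else {
        // Long input: point into the same buffer at a later offset.
        let buffer_index = read_u32(&bytes, 8);
        let offset = read_u32(&bytes, 12);
        make_view(buffers[buffer_index as usize], buffer_index, offset + start, new_len)
    }
}

/// Resolve a view back to an owned String (for inspection only).
fn view_to_string(view: u128, buffers: &[&[u8]]) -> String {
    let bytes = view.to_le_bytes();
    let len = read_u32(&bytes, 0) as usize;
    let data = if len <= 12 {
        &bytes[4..4 + len]
    } else {
        let idx = read_u32(&bytes, 8) as usize;
        let off = read_u32(&bytes, 12) as usize;
        &buffers[idx][off..off + len]
    };
    String::from_utf8_lossy(data).into_owned()
}
```

For example, building a view over `b"hello world, this is a long string"` with `make_view(buf, 0, 0, buf.len() as u32)` and calling `substr_view(view, &[buf], 6, 5)` yields a view that resolves to `world` while the data buffer is left untouched. A real implementation would additionally need to handle UTF-8 character boundaries, nulls, and SQL's 1-based start index.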
Additional context
Here is an example benchmark: #12015
StringViewArrays can be built with StringViewBuilder: https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html