StringViewArray (Utf8View) is slower than StringArray (Utf8) in some cases #7350

@zhuqi-lucas

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This ticket collects the cases where Utf8View is slower than Utf8 and tracks the work to improve them.

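These cases mostly occur when the two strings being compared share the same 4-byte prefix and at least one of them is longer than 12 bytes. A string longer than 12 bytes is not stored inline in the view, so once the prefixes match the comparison has to read the data buffer.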
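To illustrate why this case is slow, here is a minimal sketch of the view layout and a prefix-first comparison. It is a simplified model written against the standard library only, not the arrow-rs implementation: the `View` enum, `make_view`, `prefix`, `bytes` and `compare` are hypothetical names, and a single data buffer is assumed (the real layout also carries a buffer index).

```rust
use std::cmp::Ordering;

/// Simplified stand-in for one 16-byte Utf8View entry (not the arrow-rs type).
#[derive(Debug, Clone, Copy)]
enum View {
    /// Strings of at most 12 bytes are stored entirely inside the view.
    Inline { len: u32, data: [u8; 12] },
    /// Longer strings keep a 4-byte prefix plus an offset into a data buffer.
    Buffered { len: u32, prefix: [u8; 4], offset: u32 },
}

/// Hypothetical helper: build a view for `s`, appending long strings to `buffer`.
fn make_view(s: &str, buffer: &mut Vec<u8>) -> View {
    let b = s.as_bytes();
    if b.len() <= 12 {
        let mut data = [0u8; 12]; // zero-padded inline storage
        data[..b.len()].copy_from_slice(b);
        View::Inline { len: b.len() as u32, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&b[..4]);
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(b);
        View::Buffered { len: b.len() as u32, prefix, offset }
    }
}

/// First four bytes of the string, zero-padded for very short strings.
fn prefix(v: &View) -> [u8; 4] {
    match v {
        View::Inline { data, .. } => [data[0], data[1], data[2], data[3]],
        View::Buffered { prefix, .. } => *prefix,
    }
}

/// Full string bytes; for `Buffered` views this dereferences the data buffer.
fn bytes<'a>(v: &'a View, buffer: &'a [u8]) -> &'a [u8] {
    match v {
        View::Inline { len, data } => &data[..*len as usize],
        View::Buffered { len, offset, .. } => {
            let start = *offset as usize;
            &buffer[start..start + *len as usize]
        }
    }
}

/// Returns the ordering and whether the data buffer had to be read.
fn compare(a: &View, b: &View, buffer: &[u8]) -> (Ordering, bool) {
    let (pa, pb) = (prefix(a), prefix(b));
    if pa != pb {
        // Fast path: the views alone decide the result.
        return (pa.cmp(&pb), false);
    }
    // Slow path: equal prefixes, so any long string forces a buffer read.
    let touched = matches!(a, View::Buffered { .. }) || matches!(b, View::Buffered { .. });
    (bytes(a, buffer).cmp(bytes(b, buffer)), touched)
}
```

In this model, comparing two views built from `"prefix_aaaaaaaaaa"` and `"prefix_bbbbbbbbbb"` takes the slow path, while comparing either of them with `"other"` is decided by the prefix alone. A plain Utf8 array always reads the value bytes, so the extra indirection in the slow path is where Utf8View can fall behind.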

Describe the solution you'd like
Make Utf8View regression cases faster.

    • Add reproducer cases in which Utf8View is slower than Utf8
    • Implement improvements for the Utf8View regression cases
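As a starting point for the reproducer, the input data could be generated along these lines. This is only a sketch under assumptions: `shared_prefix_strings` is a hypothetical helper, and the prefix and length values are illustrative, not taken from the benchmark.

```rust
/// Hypothetical generator: `n` distinct strings that all start with `prefix`
/// and are zero-padded to `len` bytes, so they differ only after the prefix.
/// If `i` has more digits than the padding width, the string grows beyond `len`.
fn shared_prefix_strings(n: usize, prefix: &str, len: usize) -> Vec<String> {
    let width = len.saturating_sub(prefix.len());
    (0..n)
        .map(|i| format!("{prefix}{i:0>width$}", width = width))
        .collect()
}
```

Calling `shared_prefix_strings(1024, "same", 16)` yields 16-byte strings with an identical 4-byte prefix, which is the shape described above. The same values could then be loaded into both a Utf8 and a Utf8View array and run through an identical comparison or sort kernel to measure the gap.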

Additional context
The DataFusion sort TPC-H benchmark shows regressions in Utf8View comparisons.

It would be best to fix this in arrow-rs, so that DataFusion benefits from it as well.

Labels: enhancement
