Stateless Row Conversion

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**


Currently row conversion is stateful, relying on a separate `RowConverter` to maintain this global state.

This has a number of drawbacks:

* Prevents parallelizing row conversion across multiple threads
* Complicates APIs which have to manage mutable access to a RowConverter - https://github.com/apache/arrow-datafusion/blob/cf229b82cb53f1d9f430a2fa4ff5191ba52e5cef/datafusion/core/src/physical_plan/sorts/stream.rs#L38
* State accumulation can be problematic, necessitating workarounds like https://github.com/apache/arrow-datafusion/pull/7401
* The dictionary interning logic is very expensive - https://github.com/apache/arrow-rs/pull/3831

**Describe the solution you'd like**


I would like to propose removing the dictionary preservation logic, instead always hydrating the dictionaries values when encoding. This in turn would allow simplifying the API to no longer have a notion of a stateful RowConverter.

This may represent a performance regression for dictionaries with small number of values. We should definitely quantify this, but it is my expectation that this will only occur for dictionaries with a very low number of values. It is currently the case that even arrays with low numbers of distinct values may contain non-trivial number of values as a result of the way dictionaries are handled by the various kernels and readers, and so I'm inclined to not weigh this very highly.

**Describe alternatives you've considered**


**Additional context**

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Stateless Row Conversion #4811

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Stateless Row Conversion #4811

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions