Description
Is your feature request related to a problem or challenge?
As part of #8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate the output directly (without a copy).
However, I couldn't implement a zero copy algorithm because evaluate and state take a &self, not a &mut self (well, I worked around it with a Mutex 🤮).
The need to copy intermediate state likely doesn't matter for most Accumulators, as they only emit a single scalar value where the cost of copying is pretty low. However, for ones that emit significant internal state (like count DISTINCT), reusing the internal state as the output can save a lot of copying (again, see #8849 for an example).
Also, an Accumulator instance is never used again after a call to evaluate or state, so whatever internal state it holds after that call does not need to be preserved.
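For illustration only, here is a minimal sketch of what such a Mutex workaround can look like. It does not use DataFusion's real Accumulator trait: `MutexDistinct`, its `update` and `evaluate` methods, and the `Vec<String>` return type are all hypothetical stand-ins, and the actual code in #8849 may differ.

```rust
use std::collections::HashSet;
use std::mem;
use std::sync::Mutex;

/// Hypothetical accumulator that collects distinct strings.
/// The set is wrapped in a Mutex only so it can be emptied through `&self`.
#[derive(Default)]
struct MutexDistinct {
    values: Mutex<HashSet<String>>,
}

impl MutexDistinct {
    /// Updates already have `&mut self`, so `get_mut` reaches the set without locking.
    fn update(&mut self, value: &str) {
        self.values.get_mut().unwrap().insert(value.to_owned());
    }

    /// Only a shared borrow is available here, so the set has to be
    /// locked before it can be swapped out for an empty one.
    fn evaluate(&self) -> Vec<String> {
        let mut guard = self.values.lock().unwrap();
        mem::take(&mut *guard).into_iter().collect()
    }
}
```

The strings are moved rather than cloned, but every access now goes through a lock that exists purely to satisfy the borrow checker.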
Describe the solution you'd like
I would like to change Accumulator::evaluate and Accumulator::state to take &mut self.
This is also consistent with GroupsAccumulator, whose state and evaluate functions already take &mut self:
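As a rough sketch of what the change would enable, the example below contrasts a `&self` method that has to clone its values with a `&mut self` method that moves them out using `std::mem::take`. The `DistinctStrings` type and the `state_shared` / `state_exclusive` method names are hypothetical and are not part of DataFusion; the real trait returns `ScalarValue`s rather than `Vec<String>`.

```rust
use std::collections::HashSet;
use std::mem;

/// Hypothetical accumulator that collects distinct strings.
#[derive(Default)]
struct DistinctStrings {
    values: HashSet<String>,
}

impl DistinctStrings {
    fn update(&mut self, value: &str) {
        self.values.insert(value.to_owned());
    }

    /// With `&self`, the set cannot be moved out, so every string is cloned.
    fn state_shared(&self) -> Vec<String> {
        self.values.iter().cloned().collect()
    }

    /// With `&mut self`, the set is replaced by an empty one and its
    /// strings are moved into the output without being copied.
    fn state_exclusive(&mut self) -> Vec<String> {
        mem::take(&mut self.values).into_iter().collect()
    }
}
```

After `state_exclusive` returns, the accumulator is left empty, which is acceptable here because the instance is not used again after evaluate or state is called.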
    pub trait GroupsAccumulator: Send {
        ...
        fn evaluate(
            &mut self,
            emit_to: EmitTo
        ) -> Result<Arc<dyn Array>, DataFusionError>;
        fn state(
            &mut self,
            emit_to: EmitTo
        ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
        ...
Describe alternatives you've considered
No response
Additional context
No response