Closed
Labels
enhancement: New feature or request
Description
Is your feature request related to a problem or challenge?
We are trying to make hash-based aggregation significantly faster (see #4973).
This will require some nontrivial changes to how hash aggregation is organized. At the moment, BoundedAggregateStream and GroupedHashAggregateStream share a significant amount of code, so either we will have to duplicate the work to make hash aggregation faster, or BoundedAggregateStream will not get the benefits.
Here is a visual depiction of the common code:
meld datafusion/core/src/physical_plan/aggregates/bounded_aggregate_stream.rs datafusion/core/src/physical_plan/aggregates/row_hash.rs
Describe the solution you'd like
Reduce duplication between BoundedAggregateStream and GroupedHashAggregateStream
The major differences are:
- Choice of when output can be emitted
- Clearing previous group state when groups have been emitted
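To make the two differences concrete, here is a rough sketch of how a single aggregator could share the accumulation path and vary only the emit behavior. This is not the DataFusion code: `SumAggregator`, `EmitMode`, and the `(key, value)` row layout are hypothetical names invented for illustration, and the sketch assumes the ordered mode receives input sorted by group key.

```rust
use std::collections::HashMap;

/// Hypothetical emit strategies (not DataFusion types).
#[derive(Clone, Copy, PartialEq)]
enum EmitMode {
    /// Unordered input: nothing can be emitted until all input is consumed.
    AtEnd,
    /// Input sorted by group key: every group before the latest key is final.
    OnKeyChange,
}

/// Toy grouped SUM over `(key, value)` rows; stands in for the shared stream.
struct SumAggregator {
    mode: EmitMode,
    sums: HashMap<i64, i64>,
    /// Group keys in first-seen order, so output order is deterministic.
    order: Vec<i64>,
}

impl SumAggregator {
    fn new(mode: EmitMode) -> Self {
        SumAggregator { mode, sums: HashMap::new(), order: Vec::new() }
    }

    /// Shared accumulation path; only the emit decision differs by mode.
    /// Returns the groups that are ready to be output after this batch.
    fn update(&mut self, batch: &[(i64, i64)]) -> Vec<(i64, i64)> {
        for &(key, value) in batch {
            if !self.sums.contains_key(&key) {
                self.order.push(key);
            }
            *self.sums.entry(key).or_insert(0) += value;
        }
        match self.mode {
            EmitMode::AtEnd => Vec::new(),
            EmitMode::OnKeyChange => {
                // All groups except the most recent one are complete.
                let finished = self.order.len().saturating_sub(1);
                self.emit_first(finished)
            }
        }
    }

    /// End of input: every remaining group is final in both modes.
    fn finish(&mut self) -> Vec<(i64, i64)> {
        let remaining = self.order.len();
        self.emit_first(remaining)
    }

    /// Emits the first `n` groups and clears their state.
    fn emit_first(&mut self, n: usize) -> Vec<(i64, i64)> {
        let keys: Vec<i64> = self.order.drain(..n).collect();
        keys.into_iter()
            .map(|k| (k, self.sums.remove(&k).expect("state exists for key")))
            .collect()
    }
}
```

With `EmitMode::AtEnd`, `update` always returns an empty vector and all groups come out of `finish`. With `EmitMode::OnKeyChange`, each `update` returns the groups that precede the latest key and drops their state, so memory stays bounded by the number of in-progress groups.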
Describe alternatives you've considered
No response
Additional context
No response
