[Epic] A Collection of Sort Based Optimizations #10313

@alamb

Usecase

Many analytic systems store their data in a particular sort order, and the query engine can often take advantage of this sort order to both reduce memory usage and improve performance.

Specific examples in DataFusion include:

  1. Emitting from GroupBy early with partially sorted stream
  2. SortMergeJoin
  3. Sort removal via EnforceSorting and replace_with_order_preserving_variants
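To illustrate the first item, here is a minimal sketch of why sorted input lets a grouping operator emit early. It is not DataFusion's implementation: the function name `sorted_group_sum` and the `(key, value)` row shape are assumptions made for this example. When rows arrive sorted by the group key, equal keys are contiguous, so a group is complete as soon as the key changes and only one accumulator needs to be held at a time.

```rust
/// Sum `value` per `key`, assuming `rows` is already sorted by `key`.
/// (Hypothetical helper for illustration; not a DataFusion API.)
///
/// Because equal keys are contiguous in sorted input, a finished group can be
/// emitted the moment a different key appears. Pushing to `out` stands in for
/// emitting the group downstream.
fn sorted_group_sum<K: PartialEq>(
    rows: impl IntoIterator<Item = (K, i64)>,
) -> Vec<(K, i64)> {
    let mut out = Vec::new();
    // The single in-progress group: (key, running sum).
    let mut current: Option<(K, i64)> = None;

    for (key, value) in rows {
        if let Some((k, sum)) = current.as_mut() {
            if *k == key {
                // Same group: keep accumulating.
                *sum += value;
                continue;
            }
        }
        // Key changed (or first row): the previous group is final, emit it.
        if let Some(done) = current.take() {
            out.push(done);
        }
        current = Some((key, value));
    }

    // End of input: emit the last group.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

With unsorted input the same aggregation would need a hash table holding every group until the input is exhausted, which is where the memory saving comes from.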

This information is currently encoded in ExecutionPlan::maintains_input_order, ExecutionPlan::required_input_ordering, and PlanProperties.

The same underlying analysis is often required for streaming, where determining what to emit is modeled as a sorted stream (for example, on date_trunc(ts) of a stream sorted by timestamp).
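The date_trunc example works because truncation is a monotonic function: applying it to a stream sorted by timestamp yields a stream that is still sorted (non-decreasing). The sketch below demonstrates that property under simplifying assumptions: timestamps are plain `i64` seconds since the epoch, and `trunc_to_hour`, `truncated_hours`, and `is_non_decreasing` are hypothetical helpers written for this example, not DataFusion functions.

```rust
const SECONDS_PER_HOUR: i64 = 3_600;

/// Truncate a timestamp (seconds since the epoch) down to the start of its hour.
/// `rem_euclid` keeps the result correct for negative timestamps as well.
fn trunc_to_hour(ts: i64) -> i64 {
    ts - ts.rem_euclid(SECONDS_PER_HOUR)
}

/// Apply the truncation to every timestamp, preserving input order.
fn truncated_hours(sorted_ts: &[i64]) -> Vec<i64> {
    sorted_ts.iter().map(|&ts| trunc_to_hour(ts)).collect()
}

/// Returns true if each value is greater than or equal to the one before it.
fn is_non_decreasing(values: &[i64]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}
```

Since the truncated values never go backwards, an operator keyed on the truncated value can treat an hour as complete once a later hour appears, without waiting for the end of the stream.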

Describe the solution you'd like

This epic has a list of optimizations / improvements that further take sortedness into account. Here are some related issues:


Labels: PROPOSAL EPIC (A proposal being discussed that is not yet fully underway), enhancement (New feature or request)
