Skip to content

Morsel-Driven Parallelism Using Rayon #2199

@tustvold

Description

@tustvold

UPDATE June 2024: DataFusion does not use Morsel Driven Parallelism, instead it uses volcano pull + exchange style execution

You can read more about details and analysis in https://dl.acm.org/doi/10.1145/3626246.3653368

A proposal for reformulating the parallelism story within DataFusion to use a morsel-driven approach based on rayon. More details, background, and discussion can be found in the proposal document here, please feel free to comment there.

The keys highlights are:

  • Decouples parallelism from the partitioning expressed in the physical plan, allowing for:
    • Better handling of imbalanced partitions
    • Adaptive parallelism based on compute availability at execution time
    • Parallelism within a partition, such as decoding parquet columns in parallel, parallel sort, etc...
  • The first step to reducing the complexity associated with the current futures-based concurrency model
  • Improvements to thread-locality, observability and performance

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions