Description
Is your feature request related to a problem or challenge?
I implemented a custom extension operator for DataFusion that logically wraps a TableScan. The PushDownLimit optimizer rule doesn't support extension nodes (PushDownFilter does), so my TableScans always ended up with limit: None. As a result, every query pulled all of the data and then threw most of it away.
Describe the solution you'd like
I'd like PushDownLimit to be able to push down limits into Extension nodes as well.
I believe this will require changes to the UserDefinedLogicalNodeCore/UserDefinedLogicalNode traits so that a node can say whether it's safe to push the limit to its children. There may also need to be an API for the extension node to say whether it supports limits itself.
Maybe something like:
    /// Applies the limit to the extension node itself.
    /// This should be false if the limit can push down to its inputs instead.
    fn apply_limit(&self, limit: u64) -> bool {
        false
    }

    /// Indicates to the optimizer if it's safe to push a limit down past
    /// this extension node.
    fn apply_limit_to_inputs(&self) -> bool {
        false
    }

I'm not convinced we need apply_limit, so maybe an incremental step would be to just add apply_limit_to_inputs?
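To make the intended behaviour concrete, here is a rough sketch of how a limit pushdown pass could consult such a hook. It is a toy model, not DataFusion's real types or its actual PushDownLimit code: `Plan`, `push_down_limit`, `push_fetch`, and the `apply_limit_to_inputs` bool field (standing in for the return value of the proposed trait method) are all hypothetical names for illustration only.

```rust
// Toy plan tree. These are NOT DataFusion types; every name is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A scan that can stop early when `limit` is set.
    TableScan { table: String, limit: Option<u64> },
    /// A user-defined node wrapping one input. The bool stands in for
    /// what the proposed `apply_limit_to_inputs()` method would return.
    Extension {
        name: String,
        apply_limit_to_inputs: bool,
        input: Box<Plan>,
    },
    /// Return at most `fetch` rows from `input`.
    Limit { fetch: u64, input: Box<Plan> },
}

/// Walk the plan and try to push every `Limit` toward the scans below it.
/// The `Limit` node itself is kept, so results stay correct either way.
fn push_down_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { fetch, input } => Plan::Limit {
            fetch,
            input: Box::new(push_fetch(*input, fetch)),
        },
        Plan::Extension { name, apply_limit_to_inputs, input } => Plan::Extension {
            name,
            apply_limit_to_inputs,
            input: Box::new(push_down_limit(*input)),
        },
        scan => scan,
    }
}

/// Push a concrete `fetch` value into `plan` as far as it is safe to go.
fn push_fetch(plan: Plan, fetch: u64) -> Plan {
    match plan {
        // Reached a scan: record the tightest limit seen so far.
        Plan::TableScan { table, limit } => Plan::TableScan {
            table,
            limit: Some(limit.map_or(fetch, |l| l.min(fetch))),
        },
        // The extension node opted in, so the limit may cross it.
        Plan::Extension { name, apply_limit_to_inputs: true, input } => Plan::Extension {
            name,
            apply_limit_to_inputs: true,
            input: Box::new(push_fetch(*input, fetch)),
        },
        // Nested limits: only the smaller one matters below this point.
        Plan::Limit { fetch: inner, input } => {
            let tightest = inner.min(fetch);
            Plan::Limit {
                fetch: tightest,
                input: Box::new(push_fetch(*input, tightest)),
            }
        }
        // Extension node that did not opt in: stop pushing this limit,
        // but still optimize any limits that live further down.
        blocked => push_down_limit(blocked),
    }
}
```

In this model, a `Limit { fetch: 10 }` above an extension node that opts in ends up with `limit: Some(10)` on the wrapped scan, while a node that keeps the default of `false` leaves the scan at `limit: None`, which is the behaviour I see today.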
Describe alternatives you've considered
I ended up rewriting my AnalyzerRule as an OptimizerRule so that the limit pushdown happens before the new extension node is added, but that feels more like a workaround than a fix.
Additional context
No response