Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
IOx stores parquet files in a particular sort order, and then relies on the fact that the data is sorted for a variety of sort-related optimizations.
The new BasicEnforcement rule added in #4122 by @mingmwang (correctly) decides that, since the ParquetExec declares its output is not sorted, it needs to add a SortExec. In our case that SortExec is unnecessary and will slow performance dramatically.
I think the way to avoid this is to teach DataFusion that the ParquetExec is actually sorted (which it is), and then everything will work out.
Describe the solution you'd like
I would like a way for someone constructing a ParquetExec manually to be able to specify that the data is already sorted.
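To make the request concrete, here is a minimal, self-contained sketch of the idea. It does not use DataFusion's real API: `ParquetScan`, `with_output_ordering`, `SortKey`, `Plan` and `enforce_ordering` are all hypothetical stand-ins. The caller declares the ordering it already knows the files have, and a simplified enforcement pass only wraps the scan in a sort when that declared ordering does not already satisfy the required one (treated here as a prefix match).

```rust
/// Hypothetical sort key: a column name plus direction.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

impl SortKey {
    pub fn asc(column: &str) -> Self {
        SortKey { column: column.to_string(), descending: false }
    }
}

/// Hypothetical stand-in for a manually constructed ParquetExec.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetScan {
    pub files: Vec<String>,
    /// Ordering the caller promises the data already has (empty = unknown).
    pub output_ordering: Vec<SortKey>,
}

impl ParquetScan {
    pub fn new(files: Vec<String>) -> Self {
        ParquetScan { files, output_ordering: Vec::new() }
    }

    /// Hypothetical builder method: declare the existing sort order.
    pub fn with_output_ordering(mut self, ordering: Vec<SortKey>) -> Self {
        self.output_ordering = ordering;
        self
    }
}

/// Tiny physical plan: either a scan or an explicit sort on top of a plan.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan(ParquetScan),
    Sort { input: Box<Plan>, keys: Vec<SortKey> },
}

impl Plan {
    /// The ordering this node declares for its output.
    pub fn output_ordering(&self) -> &[SortKey] {
        match self {
            Plan::Scan(scan) => &scan.output_ordering,
            Plan::Sort { keys, .. } => keys,
        }
    }
}

/// Simplified enforcement pass: add a Sort only when the declared ordering
/// does not start with the required keys.
pub fn enforce_ordering(plan: Plan, required: &[SortKey]) -> Plan {
    let provided = plan.output_ordering();
    let satisfied = required.len() <= provided.len()
        && provided.iter().zip(required).all(|(p, r)| p == r);
    if satisfied {
        plan
    } else {
        Plan::Sort { input: Box::new(plan), keys: required.to_vec() }
    }
}
```

With no declared ordering the pass has to assume the worst and add a sort, which is the behavior described above; once the scan declares `(tag, time)`, a requirement on `(tag, time)` or just `(tag)` passes through untouched.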
Describe alternatives you've considered
It might be possible to infer the sort order of the data from the parquet metadata, but I haven't looked into that carefully.
Additional context
As a bonus, I think at least some of the plan construction logic in IOx that adds SortExecs to sort the data could potentially be removed, as that is now covered by the DataFusion optimizer.
See more detail at https://github.com/influxdata/influxdb_iox/pull/6108#discussion_r1019387151