Is your feature request related to a problem or challenge?
We are building / testing a specialized index for data stored in parquet that can tell us which row offsets are needed from the parquet file, based on additional information.
However, the DataFusion ParquetExec has no way to pass this information down. It does build its own
Describe the solution you'd like
What I would like is a way to provide something like a RowSelection for each row group.
Describe alternatives you've considered
Here is one possible API:
let parquet_selection = ParquetSelection::new()
    // * rows 100-250 from row group 1
    .select(1, RowSelection::from(vec![
        RowSelector::skip(100),
        RowSelector::select(150),
    ]))
    // * rows 50-100 and 200-300 in row group 2
    .select(2, RowSelection::from(vec![
        RowSelector::skip(50),
        RowSelector::select(50),
        RowSelector::skip(100),
        RowSelector::select(100),
    ]));

let parquet_exec = ParquetExec::new(...)
    .with_selection(parquet_selection);
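To make the intended semantics concrete, here is a minimal, dependency-free sketch of how the skip/select runs in the API above could map to row ranges within each row group. Everything in it is an assumption for illustration: `RowSelector` is a local stand-in that mirrors the shape of the parquet crate's type, while `ParquetSelection`, `selected_ranges`, and `example_selection` are hypothetical names that do not exist in DataFusion today. Ranges are half-open, so "rows 100-250" is read here as `100..250`.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Local stand-in mirroring the parquet crate's `RowSelector`:
/// a run of `row_count` rows that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

impl RowSelector {
    pub fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }

    pub fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }
}

/// Hypothetical container: one list of selectors per row group index.
#[derive(Debug, Default)]
pub struct ParquetSelection {
    by_row_group: BTreeMap<usize, Vec<RowSelector>>,
}

impl ParquetSelection {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register the selection for one row group (builder style).
    pub fn select(mut self, row_group: usize, selectors: Vec<RowSelector>) -> Self {
        self.by_row_group.insert(row_group, selectors);
        self
    }

    /// Half-open row ranges selected within `row_group`, or `None` if
    /// no selection was registered for that row group.
    pub fn selected_ranges(&self, row_group: usize) -> Option<Vec<Range<usize>>> {
        let selectors = self.by_row_group.get(&row_group)?;
        let mut offset = 0;
        let mut ranges = Vec::new();
        for s in selectors {
            // Only non-empty "select" runs produce a range.
            if !s.skip && s.row_count > 0 {
                ranges.push(offset..offset + s.row_count);
            }
            // Both skip and select runs advance the position in the row group.
            offset += s.row_count;
        }
        Some(ranges)
    }
}

/// The selection from the proposal, built with the stand-in types.
pub fn example_selection() -> ParquetSelection {
    ParquetSelection::new()
        .select(1, vec![RowSelector::skip(100), RowSelector::select(150)])
        .select(
            2,
            vec![
                RowSelector::skip(50),
                RowSelector::select(50),
                RowSelector::skip(100),
                RowSelector::select(100),
            ],
        )
}
```

One design question this sketch leaves open is what a missing entry should mean. Here `selected_ranges` returns `None` for a row group with no registered selection, and the caller would have to decide whether that means "scan the whole row group" or "skip it entirely".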