Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Part of #10922
We are adding APIs to efficiently convert the data stored in Parquet's "PageIndex" into ArrayRefs, which will make it significantly easier to use this information for pruning and other tasks.
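As a rough illustration of why per-page min/max arrays help with pruning, here is a minimal sketch. It uses plain `Vec<Option<i64>>` slices as a stand-in for Arrow arrays, and `pages_to_scan` is a hypothetical helper written for this example, not a DataFusion API.

```rust
/// Returns, for each page, whether it may contain rows where `col == target`.
///
/// `mins[i]` and `maxes[i]` are the min/max statistics of page `i`.
/// `None` means the statistic is missing, so the page cannot be ruled out.
/// (Hypothetical helper; plain slices stand in for Arrow arrays.)
fn pages_to_scan(
    mins: &[Option<i64>],
    maxes: &[Option<i64>],
    target: i64,
) -> Vec<bool> {
    mins.iter()
        .zip(maxes.iter())
        .map(|(min, max)| match (min, max) {
            // Keep the page only if `target` falls inside [min, max].
            (Some(lo), Some(hi)) => *lo <= target && target <= *hi,
            // Missing statistics: be conservative and keep the page.
            _ => true,
        })
        .collect()
}
```

A page whose range excludes the target can be skipped without decoding it; pages with missing statistics are always kept.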
Describe the solution you'd like
Add support to StatisticsConverter::min_page_statistics and StatisticsConverter::max_page_statistics for the types above
datafusion/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs
Lines 637 to 656 in 2f43476
    pub(crate) fn min_page_statistics<'a, I>(
        data_type: Option<&DataType>,
        iterator: I,
    ) -> Result<ArrayRef>
    where
        I: Iterator<Item = (usize, &'a Index)>,
    {
        get_data_page_statistics!(Min, data_type, iterator)
    }

    /// Extracts the max statistics from an iterator
    /// of parquet page [`Index`]'es to an [`ArrayRef`]
    pub(crate) fn max_page_statistics<'a, I>(
        data_type: Option<&DataType>,
        iterator: I,
    ) -> Result<ArrayRef>
    where
        I: Iterator<Item = (usize, &'a Index)>,
    {
        get_data_page_statistics!(Max, data_type, iterator)
    }
Describe alternatives you've considered
- Update the test for the listed data types following the model of `test_int64` (note this API will change slightly in "Minor: Improve `arrow_statistics` tests" #10927). The linked test begins with `async fn test_int_64() {`.
- Add any required implementation in the code quoted here (follow the model of the row counts; the linked snippet begins with `macro_rules! make_stats_iterator {`):

datafusion/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs
Lines 575 to 586 in 2f43476

    make_data_page_stats_iterator!(MinInt64DataPageStatsIterator, min, Index::INT64, i64);
    make_data_page_stats_iterator!(MaxInt64DataPageStatsIterator, max, Index::INT64, i64);

    macro_rules! get_data_page_statistics {
        ($stat_type_prefix: ident, $data_type: ident, $iterator: ident) => {
            paste! {
                match $data_type {
                    Some(DataType::Int64) => Ok(Arc::new(Int64Array::from_iter([<$stat_type_prefix Int64DataPageStatsIterator>]::new($iterator).flatten()))),
                    _ => unimplemented!()
                }
            }
        }
    }
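To make the "one macro invocation per type, one match arm per type" pattern concrete, here is a standalone sketch of how a second type could be added. Everything in it is a simplified assumption for illustration: `Index`, `PageStats`, `StatsArray` and `DataType` are local stand-ins rather than the real `parquet`/`arrow` types, the iterator yields `&Index` instead of `(usize, &Index)`, and the `paste!`-based name concatenation is replaced by passing full function names.

```rust
// Simplified stand-ins (assumptions, not the real parquet/arrow types).
#[derive(Debug)]
struct PageStats<T> {
    min: Option<T>,
    max: Option<T>,
}

/// One column index: the per-page statistics for one column chunk.
#[derive(Debug)]
enum Index {
    Int32(Vec<PageStats<i32>>),
    Int64(Vec<PageStats<i64>>),
}

/// Stand-in for an Arrow array of nullable values.
#[derive(Debug, PartialEq)]
enum StatsArray {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
}

#[derive(Debug, Clone, Copy)]
enum DataType {
    Int32,
    Int64,
}

// Generates a function that extracts one statistic (`min` or `max`)
// for one physical type from a sequence of column indexes.
macro_rules! make_page_stats_fn {
    ($fn_name:ident, $variant:ident, $ty:ty, $field:ident) => {
        fn $fn_name<'a, I>(iter: I) -> Vec<Option<$ty>>
        where
            I: Iterator<Item = &'a Index>,
        {
            iter.flat_map(|index| match index {
                Index::$variant(pages) => {
                    pages.iter().map(|p| p.$field).collect::<Vec<_>>()
                }
                // Index of a different type: emit a single null.
                _ => vec![None],
            })
            .collect()
        }
    };
}

// Supporting a new type means adding one pair of invocations...
make_page_stats_fn!(min_int32_page_stats, Int32, i32, min);
make_page_stats_fn!(max_int32_page_stats, Int32, i32, max);
make_page_stats_fn!(min_int64_page_stats, Int64, i64, min);
make_page_stats_fn!(max_int64_page_stats, Int64, i64, max);

// ...and one match arm in each dispatcher.
fn min_page_statistics<'a, I>(data_type: Option<DataType>, iter: I) -> Result<StatsArray, String>
where
    I: Iterator<Item = &'a Index>,
{
    match data_type {
        Some(DataType::Int32) => Ok(StatsArray::Int32(min_int32_page_stats(iter))),
        Some(DataType::Int64) => Ok(StatsArray::Int64(min_int64_page_stats(iter))),
        None => Err("data type is required".to_string()),
    }
}

fn max_page_statistics<'a, I>(data_type: Option<DataType>, iter: I) -> Result<StatsArray, String>
where
    I: Iterator<Item = &'a Index>,
{
    match data_type {
        Some(DataType::Int32) => Ok(StatsArray::Int32(max_int32_page_stats(iter))),
        Some(DataType::Int64) => Ok(StatsArray::Int64(max_int64_page_stats(iter))),
        None => Err("data type is required".to_string()),
    }
}
```

In this sketch a page with missing statistics becomes a null entry in the output, so the result always has one entry per page and lines up with page-level row counts.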
Additional context
No response