Description
Describe the bug
As @tustvold points out, Parquet defines a column_order API that DataFusion currently ignores entirely.
It is not entirely clear to me what the implications of ignoring this field are, or what other Parquet writers populate it with, but we should probably not ignore it.
To Reproduce
No response
Expected behavior
No response
Additional context
To emphasise the point I made when this API was originally proposed: you need more than just the ParquetStatistics to interpret the data correctly. You need at least the FileMetaData, so that you can read the column_order (https://docs.rs/parquet/latest/parquet/file/metadata/struct.FileMetaData.html#method.column_order) and know what the statistics mean for a given column.
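To illustrate why the ordering matters, here is a minimal, self-contained sketch that does not use the parquet crate. The `SortOrder` enum and `compare_stats` function are hypothetical names invented for this example, and the 4-byte little-endian encoding is an assumption chosen to keep it short. It shows that the same raw statistic bytes compare differently depending on which order is assumed, which is the kind of ambiguity the column order is meant to resolve.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a column's sort order.
/// This is NOT the parquet crate's type, only an illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SortOrder {
    Signed,
    Unsigned,
    Undefined,
}

/// Compare two raw 4-byte little-endian statistic values under `order`.
/// Returns `None` when the order is undefined, meaning the min/max
/// values cannot be safely used for pruning.
fn compare_stats(a: [u8; 4], b: [u8; 4], order: SortOrder) -> Option<Ordering> {
    match order {
        SortOrder::Signed => Some(i32::from_le_bytes(a).cmp(&i32::from_le_bytes(b))),
        SortOrder::Unsigned => Some(u32::from_le_bytes(a).cmp(&u32::from_le_bytes(b))),
        SortOrder::Undefined => None,
    }
}
```

With `a = [0xFF; 4]` and `b = [1, 0, 0, 0]`, `a` is -1 under the signed order and so sorts before `b`, but it is 4294967295 under the unsigned order and so sorts after `b`. A reader that guesses the wrong order could therefore treat a min as a max and prune row groups incorrectly.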