Skip to content

Conversation

@dharanad
Copy link
Contributor

Which issue does this PR close?

Closes #11114

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Covered by existing test cases

cargo test

Are there any user-facing changes?

No

@github-actions github-actions bot added the core Core DataFusion crate label Jun 30, 2024
[<$stat_type_prefix Int32DataPageStatsIterator>]::new($iterator).flatten(),
)),
_ => {
let len = $iterator.count();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have referenced get_statistics macro to decide on returing a null array, my other idea was to conclued with unimplemented!().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You beat me to it ;-).

I think this can just be:

                        _ => new_empty_array(&DataType::Time32(unit.clone())),

And then you need to add new_empty_array into the use statement. Time64 gets a similar treatment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner. Thanks for the idea @efredine

@dharanad dharanad marked this pull request as ready for review June 30, 2024 18:22
@dharanad dharanad requested a review from efredine June 30, 2024 19:24
Copy link
Contributor

@efredine efredine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad and @efredine for the review

)),
_ => {
// don't know how to extract statistics, so return an empty array
new_empty_array(&DataType::Time32(unit.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking I think Time32 must be either Seconds or Milliseconds otherwise it is not a valid type.

https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Time32

However, the arrow-rs library doesn't encode this in the types so I think this is a reasonable behavior

@alamb alamb merged commit d624c0d into apache:main Jul 1, 2024
@dharanad dharanad deleted the time-parquet-data-page-statistics branch July 1, 2024 18:57
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* add parquet page stats for time

* return empty array instead of null

* fix typos
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Time Parquet Data Page Statistics

3 participants