Skip to content

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 2, 2025

Which issue does this PR close?

Closes #1709

Rationale for this change

Keep up-to-date with upstream changes.

What changes are included in this PR?

How are these changes tested?

@andygrove
Copy link
Member Author

failures are due to return_field_from_args should be called instead

@codecov-commenter
Copy link

codecov-commenter commented May 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.60%. Comparing base (f09f8af) to head (2b2b085).
Report is 200 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1710      +/-   ##
============================================
+ Coverage     56.12%   58.60%   +2.47%     
- Complexity      976     1138     +162     
============================================
  Files           119      130      +11     
  Lines         11743    12663     +920     
  Branches       2251     2362     +111     
============================================
+ Hits           6591     7421     +830     
- Misses         4012     4058      +46     
- Partials       1140     1184      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove marked this pull request as ready for review May 20, 2025 14:36
fun_expr,
vec![left, right],
data_type,
Field::new("foo", data_type, true),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been changes in DF to use Field rather than DataType. Comet does not use the field name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use name not_used or to put the comment like that field name is not used in Comet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type is still used, so what about leaving a small comment instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to use the function name as the return field name

let arg_fields = coerced_types
.iter()
.enumerate()
.map(|(i, dt)| Field::new(format!("arg{i}"), dt.clone(), true))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have to provide Field instead of DataType

if let (Some(filter), Some(data_schema)) = (cnf_data_filters, &data_schema) {
parquet_source = parquet_source.with_predicate(Arc::clone(data_schema), filter);
if let Some(filter) = cnf_data_filters {
parquet_source = parquet_source.with_predicate(filter);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were changes to the predicate push-down API

)))
}

fn statistics(&self) -> DataFusionResult<Statistics> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also rename it to partition_statistics ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


fn statistics(&self) -> Result<Statistics> {
self.input.statistics()
self.input.partition_statistics(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to partition_statistics ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove just some minor comments

arrow = { version = "55.1.0", features = ["prettyprint", "ffi", "chrono-tz"] }
async-trait = { version = "0.1" }
bytes = { version = "1.10.0" }
parquet = { version = "55.0.0", default-features = false, features = ["experimental"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquet should be 55.1.0 ?

&self,
_file_col_statistics: &[ColumnStatistics],
) -> datafusion::common::Result<Vec<ColumnStatistics>> {
Ok(vec![])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this unused or just placeholder...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column statistics are not used in Comet

@parthchandra
Copy link
Contributor

Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. The perf improvement also addresses the unsigned int issue which will allow us to clean up a lot of code/tests.

@andygrove
Copy link
Member Author

Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. The perf improvement also addresses the unsigned int issue which will allow us to clean up a lot of code/tests.

DataFusion main branch already uses arrow-rs 55.1.0: https://github.com/apache/datafusion/blob/963b649d1e021e2516c2d652a8aa9fc486a5a1ba/Cargo.toml#L91

@andygrove andygrove merged commit 7d9e3b2 into apache:main May 20, 2025
78 checks passed
@andygrove andygrove deleted the df-48.0.0 branch May 20, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate impact of breaking changes in DataFusion since 47.0.0 release
5 participants