-
Notifications
You must be signed in to change notification settings - Fork 245
chore: Prepare for DataFusion 48.0.0 #1710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
failures are due to |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1710 +/- ##
============================================
+ Coverage 56.12% 58.60% +2.47%
- Complexity 976 1138 +162
============================================
Files 119 130 +11
Lines 11743 12663 +920
Branches 2251 2362 +111
============================================
+ Hits 6591 7421 +830
- Misses 4012 4058 +46
- Partials 1140 1184 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
native/core/src/execution/planner.rs
Outdated
fun_expr, | ||
vec![left, right], | ||
data_type, | ||
Field::new("foo", data_type, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There have been changes in DF to use Field rather than DataType. Comet does not use the field name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use name not_used
or to put the comment like that field name is not used in Comet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type is still used, so what about leaving a small comment instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this to use the function name as the return field name
let arg_fields = coerced_types | ||
.iter() | ||
.enumerate() | ||
.map(|(i, dt)| Field::new(format!("arg{i}"), dt.clone(), true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have to provide Field instead of DataType
if let (Some(filter), Some(data_schema)) = (cnf_data_filters, &data_schema) { | ||
parquet_source = parquet_source.with_predicate(Arc::clone(data_schema), filter); | ||
if let Some(filter) = cnf_data_filters { | ||
parquet_source = parquet_source.with_predicate(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were changes to the predicate push-down API
))) | ||
} | ||
|
||
fn statistics(&self) -> DataFusionResult<Statistics> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also rename it to partition_statistics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
||
fn statistics(&self) -> Result<Statistics> { | ||
self.input.statistics() | ||
self.input.partition_statistics(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to partition_statistics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove just some minor comments
native/Cargo.toml
Outdated
arrow = { version = "55.1.0", features = ["prettyprint", "ffi", "chrono-tz"] } | ||
async-trait = { version = "0.1" } | ||
bytes = { version = "1.10.0" } | ||
parquet = { version = "55.0.0", default-features = false, features = ["experimental"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parquet should be 55.1.0
?
&self, | ||
_file_col_statistics: &[ColumnStatistics], | ||
) -> datafusion::common::Result<Vec<ColumnStatistics>> { | ||
Ok(vec![]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this unused or just placeholder...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column statistics are not used in Comet
Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. The perf improvement also addresses the unsigned int issue which will allow us to clean up a lot of code/tests. |
DataFusion main branch already uses arrow-rs 55.1.0: https://github.com/apache/datafusion/blob/963b649d1e021e2516c2d652a8aa9fc486a5a1ba/Cargo.toml#L91 |
Which issue does this PR close?
Closes #1709
Rationale for this change
Keep up-to-date with upstream changes.
What changes are included in this PR?
How are these changes tested?