feat: Support ANSI mode sum expr #2600
base: main
Conversation
Draft PR to support sum function - WIP
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2600 +/- ##
============================================
+ Coverage 56.12% 58.86% +2.74%
- Complexity 976 1470 +494
============================================
Files 119 164 +45
Lines 11743 13999 +2256
Branches 2251 2371 +120
============================================
+ Hits 6591 8241 +1650
- Misses 4012 4561 +549
- Partials 1140 1197 +57
☔ View full report in Codecov by Sentry.
for i in 0..int_array.len() {
    if !int_array.is_null(i) {
        let v = int_array.value(i).to_i64().ok_or_else(|| {
            DataFusionError::Internal("Failed to convert value to i64".to_string())
It would be helpful to print the problematic value in the error message.
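For illustration, a minimal sketch of what the suggestion could look like (a hypothetical helper; the `num::ToPrimitive`-style conversion is an assumption based on the hunk above, not the PR's exact code):

```rust
use datafusion::error::{DataFusionError, Result};
use num::ToPrimitive; // assumed: the snippet above appears to use a to_i64-style conversion
use std::fmt::Display;

// Hypothetical helper: carry the offending value in the error message
// instead of a generic "Failed to convert" string.
fn to_i64_or_err<T: ToPrimitive + Display>(v: T) -> Result<i64> {
    v.to_i64()
        .ok_or_else(|| DataFusionError::Internal(format!("Failed to convert value {v} to i64")))
}
```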
Thank you for the review. I believe we wouldn't need these checks at all, given that we check the data types multiple times (in the planning phase and in the internal functions).
        running_sum,
    )?,
    _ => {
        panic!("Unsupported data type {}", values.data_type())
At line 138 it returns an Err when the conversion fails. Would it make sense to return an Err here too instead of panicking?
Thank you. I made sure it is all error-driven and does not panic.
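As a minimal sketch of the error-driven approach (not the PR's exact code; the function name and the supported-type list are assumptions):

```rust
use arrow::datatypes::DataType;
use datafusion::error::{DataFusionError, Result};

// Sketch: the fallback arm returns an Err instead of panicking, so an
// unexpected input type surfaces as a query error rather than a crash.
fn ensure_supported_sum_type(data_type: &DataType) -> Result<()> {
    match data_type {
        DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => Ok(()),
        other => Err(DataFusionError::Internal(format!(
            "Unsupported data type {other} for integer SUM"
        ))),
    }
}
```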
Force-pushed from 970d693 to c0715aa
@andygrove, @comphead I believe this should be ready for review (pending CI).
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (outdated comment, resolved)
That's actually interesting. Why would window fall back? We are planning to fall back on windows in #2726, but the test would still preserve windows, so I think we need to check the fallback reason.
Force-pushed from 6dcf47b to 13b7d68
This is no longer a failing test after I rebased with the main branch.
@andygrove @comphead, all the tests passed and the PR is ready for review.
Which issue does this PR close?
Closes #531 (partially). The ANSI changes to support AVG will be tracked in a new PR.
Rationale for this change
DataFusion's default SUM doesn't match Spark's overflow semantics. This implementation ensures we get exactly the same behavior as Spark for integer overflow in all three eval modes (ANSI, Legacy, and Try).
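To make the intended semantics concrete, here is an illustrative sketch of how an i64 addition could behave per eval mode when it overflows (`EvalMode` and `add_with_mode` are assumptions for this example, not the PR's API):

```rust
use datafusion::error::{DataFusionError, Result};

// Illustrative only; names are assumptions, not the PR's exact code.
enum EvalMode {
    Legacy, // wrap around on overflow, like Spark's non-ANSI integral SUM
    Ansi,   // raise an arithmetic-overflow error
    Try,    // return NULL (None) on overflow
}

fn add_with_mode(a: i64, b: i64, mode: &EvalMode) -> Result<Option<i64>> {
    match a.checked_add(b) {
        Some(sum) => Ok(Some(sum)),
        None => match mode {
            EvalMode::Legacy => Ok(Some(a.wrapping_add(b))),
            EvalMode::Ansi => Err(DataFusionError::Execution(
                "integer overflow in SUM (ANSI mode)".to_string(),
            )),
            EvalMode::Try => Ok(None),
        },
    }
}
```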
What changes are included in this PR?
This PR adds a native Rust implementation of SUM aggregation for integer types (byte, short, int, long).
Native changes (Rust):
- Added a new `SumInteger` aggregate function that handles SUM for all integer types, in line with Spark (inspired by `SumDecimal` and Spark's SUM implementation).
- Implemented the code in a similar fashion to Spark, using `Option<i64>` to represent NULL and numeric sum values, plus an additional parameter called `has_all_nulls`, which is used in Try mode to distinguish whether a NULL sum is caused by all-NULL inputs or by the sum overflowing. (Spark does this with `shouldTrackIsEmpty` and by assigning NULL to the running sum, which is a long.) See the sketch after this list.
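A rough sketch of the state described above (field and method names are illustrative assumptions, not the PR's exact code):

```rust
// `sum == None` can mean "overflowed in Try mode", so `has_all_nulls`
// disambiguates that from "no non-null input seen yet".
struct SumIntegerState {
    sum: Option<i64>,    // becomes None after a Try-mode overflow
    has_all_nulls: bool, // stays true until the first non-null value
}

impl SumIntegerState {
    fn new() -> Self {
        Self { sum: Some(0), has_all_nulls: true }
    }

    // Try mode: once the running sum overflows it stays NULL (None).
    fn update_try_mode(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            self.has_all_nulls = false;
            self.sum = self.sum.and_then(|s| s.checked_add(v));
        }
    }

    fn evaluate(&self) -> Option<i64> {
        if self.has_all_nulls {
            None // all inputs were NULL -> SQL NULL
        } else {
            self.sum // Some(total), or None if the sum overflowed
        }
    }
}
```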
Scala side changes:
- Updated `CometSum` to add ANSI support (ANSI and Try).
- Pass `eval_mode` instead of `fail_on_error` to support the Legacy, ANSI, and Try eval modes.
- Use `java.lang.Long` (to avoid Scala's auto-boxing, which auto-casts objects to primitive types and thereby casts NULLs to 0s) in NULL handling for both simple and GROUP BY aggregations.
How are these changes tested?