-
Notifications
You must be signed in to change notification settings - Fork 1.8k
revert some code for #4726 / remove unnecessary coercion in physical plans #4741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| }, | ||
| (Timestamp(_, tz), Utf8) => Some(Timestamp(TimeUnit::Nanosecond, tz.clone())), | ||
| (Utf8, Timestamp(_, tz)) => Some(Timestamp(TimeUnit::Nanosecond, tz.clone())), | ||
| // TODO: need to investigate the result type for the comparison between timestamp and date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#4644 (comment)
we need to get the behavior for the comparison in the PG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liukun4515 comparison is usually boolean type, did you mean what is the common datatype for casting between timestamp and date in diff scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
the result of comparison is bool type, but if left and right are not the same data type, we should get the common type for left and right through the type coercion.
For example, INT32>INT64, the common type is INT64, and we should cast the left to INT64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of type coercion is decided by input types and op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liukun4515 comparison is usually boolean type, did you mean what is the common datatype for casting between timestamp and date in diff scenarios?
I think it's not about casting from one type to other type, it is related the type inference for different input types and different operation or functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try the comparison timestamp with date in the Spark, and find the coerced type is timestamp.
spark-sql> explain extended select now()>cast('2000-01-01' as date);
== Parsed Logical Plan ==
'Project [unresolvedalias(('now() > cast(2000-01-01 as date)), None)]
+- OneRowRelation
== Analyzed Logical Plan ==
(now() > CAST(2000-01-01 AS DATE)): boolean
Project [(now() > cast(cast(2000-01-01 as date) as timestamp)) AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation
== Optimized Logical Plan ==
Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation
== Physical Plan ==
*(1) Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- *(1) Scan OneRowRelation[]
But current implementation of coerced type is date, we can fix this in the follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timestamp is probably consistent with other coercions that ensure no data is truncated (so going Date --> Timestamp is good as the the Timestamp could be losslessly converted back to a Date). However, going from Timestamp --> Date would truncate the precision of the original timesteamp,
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like it
(I made a similar version here #4745 before I found this PR)
| Timestamp(TimeUnit::Nanosecond, None) => { | ||
| matches!(type_from, Null | Timestamp(_, None)) | ||
| } | ||
| Date32 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
Though I admit that the fact that all tests still pass is concerning 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added from the commit https://github.com/comphead/arrow-datafusion/blob/4761802a4bc46d377406aa20767c4af66b80d65b/datafusion/expr/src/type_coercion/functions.rs#L180 in the #4726
I just revert useless codes adding from the PR #4726
| /// Returns a common coerced datatype between 2 given datatypes | ||
| /// | ||
| /// See the module level documentation for more detail on coercion. | ||
| pub fn get_common_coerced_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method can be of some interest, if the user wants to check the common coerced type between 2 datatypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth waiting for a specific use before leaving it in 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method can be of some interest, if the user wants to check the common coerced type between 2 datatypes?
the common coerced type will be decided by input exprs/input types and op, if the op is different and the same input types/input exprs may get the different coerced type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this PR then -- we can always recover get_common_coerced_type from this PR or git history if we need it.
|
Benchmark runs are scheduled for baseline = d3ca9b0 and contender = 760f108. 760f108 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
from the #4726 (comment)
If we want to compare the timestamp with date, we just need to add logic in
type_coercion, and don't need to do type coercion in the physical phase.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?