Skip to content

Conversation

@berkaysynnada
Copy link
Contributor

Which issue does this PR close?

Closes #6155.

Rationale for this change

#6196 PR has solved the issue of temporal scalars cannot appear as the LHS argument in an arithmetic operation. But there is an unoptimized solution such as creating an array from a scalar value as the length of the RHS array, and apply array vs array operations. This PR solves this issue in a similar way of LHS:Array and RHS:Scalar arithmetic.

What changes are included in this PR?

In the datetime.rs, there are two functions that handle arithmetic operations between arrays and arrays, and between scalars and arrays. To determine the appropriate operation to perform, a flag is used to hold information about the left-hand side (LHS) and right-hand side (RHS) operands.

In the kernel_arrow.rs, type matching is performed to identify the specific operation to execute. Finally, the actual computation is carried out in the corresponding function within the binary.rs.

Are these changes tested?

Yes.

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules physical-expr Changes to the physical-expr crates labels May 24, 2023
@alamb
Copy link
Contributor

alamb commented May 24, 2023

I plan to review this PR tomorrow. Perhaps @tustvold is interested as I believe he is planning some work to improve the temporal kernel support in arrow-rs

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @berkaysynnada -- in general this code is looking quite good. I think it is structured nicely and is improving

My one big concern is that the new code seems to be mostly be untested -- specifically there are a lot of new functions and code that looks similar to me such as ts_ms_interval_mdn_op and ts_us_interval_mdn_op

Do you know if this code covered by existing tests?

I realize it might be repetitive to write tests for this code, but I think it is is really important to avoid regressions in the future. Perhaps there is some way to reduce the code replication (and therefore also the testing burden) 🤔

@berkaysynnada
Copy link
Contributor Author

berkaysynnada commented May 29, 2023

Thank you @berkaysynnada -- in general this code is looking quite good. I think it is structured nicely and is improving

My one big concern is that the new code seems to be mostly be untested -- specifically there are a lot of new functions and code that looks similar to me such as ts_ms_interval_mdn_op and ts_us_interval_mdn_op

Do you know if this code covered by existing tests?

I realize it might be repetitive to write tests for this code, but I think it is is really important to avoid regressions in the future. Perhaps there is some way to reduce the code replication (and therefore also the testing burden) 🤔

Thank you for the review ☺️ I have applied your suggestions. I have added some tests that compare the results of a(ScalarValue) vs b(another ScalarValue) and a vs c(Array having one element and the same value of b). I thought this would suffice, as the ScalarValue tests are wide-ranging and testing edge cases. I have also tried to reduce arithmetic functions as much as possible, but I would appreciate it if you let me know if anything else catches your eye.

https://github.com/apache/arrow-datafusion/blob/c768c9b234538be2a2640d689c87edcbce46d94a/datafusion/core/tests/sqllogictests/test_files/interval.slt#L359

https://github.com/synnada-ai/arrow-datafusion/blob/99b88618dc62c0917165079374c8b0ff21b27556/datafusion/physical-expr/src/expressions/datetime.rs#LL674C12-L674C12

These are the tests for Scalar vs Array operations.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @berkaysynnada -- I think this is an improvement. I hope we can eventually move some of this logic upstream into arrow-rs but for now it seems good in DataFusion

@mustafasrepo if you are good with this PR, perhaps you can merge it?

@mustafasrepo mustafasrepo merged commit 0abd4a2 into apache:main May 30, 2023
@mustafasrepo
Copy link
Contributor

Thank you @berkaysynnada -- I think this is an improvement. I hope we can eventually move some of this logic upstream into arrow-rs but for now it seems good in DataFusion

@mustafasrepo if you are good with this PR, perhaps you can merge it?

Sure, I have reviewed this Pr in our internal repo.

@alamb
Copy link
Contributor

alamb commented May 30, 2023

Thanks everyone!

@berkaysynnada berkaysynnada deleted the feature/refactor-temporal-arithmetic branch May 31, 2023 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

arithmetic operations on timestamps fail when now() is used as the LHS argument

6 participants