Skip to content

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Apr 30, 2019

What changes were proposed in this pull request?

This PR adds a new FilterReduction rule that can reduce BinaryComparison filters like these:

SELECT * FROM table WHERE i <= 5 AND i = 5        => ... WHERE i = 5
SELECT * FROM table WHERE i < j AND ... AND i > j => ... WHERE false

How was this patch tested?

Existing and new UTs.

@peter-toth
Copy link
Contributor Author

peter-toth commented Apr 30, 2019

I filed https://issues.apache.org/jira/browse/SPARK-27604 to cover some improvements regarding constraints. This PR adds some improvement although there is room for more.
I'm interested in feedbacks about this PR and also in suggestions how we could take this further.

cc. @gengliangwang @cloud-fan @mgaido91 @viirya @hvanhovell @gatorsmile @dongjoon-hyun as I saw you worked on similar topics before.

Change-Id: I8031a37f380d5fe5fff37488460b790dd65c0161
Change-Id: I79ec5286b4762d98abaea638f8a9e5e9f7f753d0
* SELECT * FROM table WHERE i = 5 AND j = i + 3
* ==> SELECT * FROM table WHERE i = 5 AND j = 8
* SELECT * FROM table WHERE i = 5 AND j = i + 3 => SELECT * FROM table WHERE i = 5 AND j = 8
* SELECT * FROM table WHERE i <= 5 AND i = 5 => SELECT * FROM table WHERE i = 5
Copy link
Member

@dongjoon-hyun dongjoon-hyun Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @peter-toth . I don't think this aims the equal goal. Could you make a separate rule instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter reduction should be a separate rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the feedback. The 2 rules seemed similar and easy to combine to me, but I will not mix them then.

Copy link
Contributor Author

@peter-toth peter-toth May 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated the 2 rules. I added some improvement to the ConstantPropagation and created a new FilterReduction.

Copy link
Contributor Author

@peter-toth peter-toth May 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved all new filter reduction related logic to FilterReduction and extracted constant propagation enhancements to a separate PR.

* SELECT * FROM table WHERE i = 5 AND j = i + 3
* ==> SELECT * FROM table WHERE i = 5 AND j = 8
* SELECT * FROM table WHERE i = 5 AND j = i + 3 => SELECT * FROM table WHERE i = 5 AND j = 8
* SELECT * FROM table WHERE i <= 5 AND i = 5 => SELECT * FROM table WHERE i = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

peter-toth added 2 commits May 6, 2019 13:35
Change-Id: Ia68242ba5067e84143e35a1126a49017c5a203f6
val newCondition =
splitConjunctivePredicates(condition)
.map(rewriteEqual)
.sortBy(p => scala.util.hashing.MurmurHash3.seqHash(Seq(p.getClass, p)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the class of expression to the sorting order here and in rewriteEqual() as hashCode() of TreeNode doesn't take the class of the expression into account and
comparePlans(testRelation.where('a < 2 && 'a === 2).analyze, testRelation.where('a === 2 && 'a < 2).analyze) would fail.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented May 7, 2019

Hi, @peter-toth . Thank you for splitting the rule.
Please split ConstantPropagation improvement into a separate PR. That doesn't depend on the new FilterReduction, right? That makes the review easier and faster.

@peter-toth peter-toth changed the title [SPARK-27604][SQL] Enhance constant propagation to constraint propagation [SPARK-27604][SQL] Add filter reduction May 8, 2019
@peter-toth
Copy link
Contributor Author

Thanks @dongjoon-hyun, I extracted the improved ConstantPropagation rule to here: #24553

* - Using this mapping, replace occurrence of the expressions with the corresponding constant
* values in the AND node.
*/
object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since new PR is created, we had better remove this change from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, reverted.
Please note that some of the new UTs will require the enhanced constant propagation to work as expected.

Change-Id: I8ba31ae5043184d53662a4cedfe011a36d63759f
Batch("AnalysisNodes", Once,
EliminateSubqueryAliases) ::
Batch("FilterReduction", FixedPoint(10),
ConstantPropagation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test case requires this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like a <= 2 AND 2 <= a where I would expect a = 2 reduced result. FilterReduction can transforms it into a <= 2 AND 2 = a. Then ConstantPropagation (only the enhanced in #24553) can transform it into 2 <= 2 AND 2 = a ...

Change-Id: Ifef383125dd8827fdeea90e8101cb432de9e704e
case c EqualNullSafe d if planEqual(a, d) => EqualTo(c, d)
}
case a EqualTo b => expression transformUp {
case c LessThan d if planEqual(b, d) && planEqual(a, c) => Literal.FalseLiteral
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually these too are wrong with nulls, as they should return null instead of false and the difference is significant (consider the case of a Not in front of it..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think you are right and this is an issue at many other places too.

Copy link
Contributor Author

@peter-toth peter-toth May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution could be checking if a, b, c and d not nullable and then we can use transform. Otherwise limiting a top-down way replacement until we have any of And, Or, CaseWhen or If nodes and can't go deeper if we hit something else. This is what ReplaceNullWithFalseInPredicate rule does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issue and added some new UTs.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented May 23, 2019

Test build #105735 has finished for PR 24495 at commit 0972a29.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Copy link
Contributor Author

Thanks @HyukjinKwon for enabling the tests. Some UT failures are expected until #24553 is not merged.

@cloud-fan
Copy link
Contributor

can we enrich the PR description with how it is implemented? e.g. the overall design, the algorithm that is used.

@peter-toth
Copy link
Contributor Author

can we enrich the PR description with how it is implemented? e.g. the overall design, the algorithm that is used.

@cloud-fan, I'm happy to take this up again, but there is a prerequisite PR to this one which I would like to polish first: #24553

@github-actions
Copy link

github-actions bot commented Apr 6, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 6, 2020
@github-actions github-actions bot closed this Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants