Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -62,11 +62,17 @@ trait ConstraintHelper {
*/
def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = {
var inferredConstraints = Set.empty[Expression]
constraints.foreach {
// IsNotNull should be constructed by `constructIsNotNullConstraints`.
val predicates = constraints.filterNot(_.isInstanceOf[IsNotNull])
predicates.foreach {
case eq @ EqualTo(l: Attribute, r: Attribute) =>
val candidateConstraints = constraints - eq
val candidateConstraints = predicates - eq
inferredConstraints ++= replaceConstraints(candidateConstraints, l, r)
inferredConstraints ++= replaceConstraints(candidateConstraints, r, l)
case eq @ EqualTo(l @ Cast(_: Attribute, _, _), r: Attribute) =>
inferredConstraints ++= replaceConstraints(predicates - eq, r, l)
Copy link
Contributor

@cloud-fan cloud-fan Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to https://github.com/apache/spark/pull/27252/files#r378111623

If we have cast(a, dt) = b and b = 1, we can definitely infer cast(a, dt) = 1.
If we have cast(a, dt) = b and a = 1, seems we can also infer cast(1, dt) = b.

But I'm a bit unsure about how to do it. We may need a variant of replaceConstraints, which holds an expression builder "e => cast(e, dt) = b". It looks for attribute a, and replace it with cast(1, dt) = b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to touch replaceConstraints at all. Please check #27518 that will do the trick because a = 1 will be "substituted" into cast(a, dt) = b as a new cast(1, dt) = b constraint.

Copy link
Contributor

@peter-toth peter-toth Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both this PR and #27518 are beneficial. But I would use val originalLeft = testRelation1.where('a < 1).subquery('left) in test cases of this one instead of val originalLeft = testRelation1.where('a === 1).subquery('left) to avoid confusion.

Copy link
Member Author

@wangyum wangyum Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan This PR support cast(1, dt) = b before:
https://github.com/apache/spark/compare/048a0ecc65763c6feaa939938e2dec6f4040d939..7dcfe915087dbe274b470928600197745a645f5e

I removed it because:

  1. It may be broken the plan. This is how I handled it before.
  2. For cast(a, dt) = b, we support inferring many predicates, for example: a > 1, a < 1, a in (2, 3). I'm not sure if it's safe.

How about only supporting cast(a, dt) = 1 now?

@peter-toth I'd like to support these cases in #27518:

a < b && b < c infer a < c
a < b && b <= c infer a < c
a < b && b = c infer a < c
...

Copy link
Contributor

@peter-toth peter-toth Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangyum I see, but I think currently you are doing something very different in #27518 see details here: #27518 (comment)

I would suggest keeping your #27518 in its current form (but amending its title) and open a new one to address inequalities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about only supporting cast(a, dt) = 1 now?

+1 for supporting the limited case only in this pr. Since this part of optimization can affect many queries, I think we need exhaustive discussions and tests for supporting wider cases.

case eq @ EqualTo(l: Attribute, r @ Cast(_: Attribute, _, _)) =>
inferredConstraints ++= replaceConstraints(predicates - eq, l, r)
case _ => // No inference
}
inferredConstraints -- constraints
Expand All @@ -75,7 +81,7 @@ trait ConstraintHelper {
private def replaceConstraints(
constraints: Set[Expression],
source: Expression,
destination: Attribute): Set[Expression] = constraints.map(_ transform {
destination: Expression): Set[Expression] = constraints.map(_ transform {
case e: Expression if e.semanticEquals(source) => destination
})

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{IntegerType, LongType}

class InferFiltersFromConstraintsSuite extends PlanTest {

Expand All @@ -46,8 +47,8 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
y: LogicalPlan,
expectedLeft: LogicalPlan,
expectedRight: LogicalPlan,
joinType: JoinType) = {
val condition = Some("x.a".attr === "y.a".attr)
joinType: JoinType,
condition: Option[Expression] = Some("x.a".attr === "y.a".attr)) = {
val originalQuery = x.join(y, joinType, condition).analyze
val correctAnswer = expectedLeft.join(expectedRight, joinType, condition).analyze
val optimized = Optimize.execute(originalQuery)
Expand Down Expand Up @@ -263,4 +264,56 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
val y = testRelation.subquery('y)
testConstraintsAfterJoin(x, y, x.where(IsNotNull('a)), y, RightOuter)
}

test("Constraints should be inferred from cast equality constraint(filter higher data type)") {
val testRelation1 = LocalRelation('a.int)
val testRelation2 = LocalRelation('b.long)
val originalLeft = testRelation1.subquery('left)
val originalRight = testRelation2.where('b === 1L).subquery('right)

val left = testRelation1.where(IsNotNull('a) && 'a.cast(LongType) === 1L).subquery('left)
val right = testRelation2.where(IsNotNull('b) && 'b === 1L).subquery('right)

Seq(Some("left.a".attr.cast(LongType) === "right.b".attr),
Some("right.b".attr === "left.a".attr.cast(LongType))).foreach { condition =>
testConstraintsAfterJoin(originalLeft, originalRight, left, right, Inner, condition)
Copy link
Member

@maropu maropu Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of test coverage, its better to test both cases (left/right-side casts)? I have the same comment in the test below.

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also test cast(int) here. The key is: we should test both left side cast and right side cast, as @maropu said.

"left.a".attr.cast(LongType) === "right.b".attr and "right.b".attr === "left.a".attr.cast(LongType) only test left side cast (join left side, not EqualTo left side).


Seq(Some("left.a".attr === "right.b".attr.cast(IntegerType)),
Some("right.b".attr.cast(IntegerType) === "left.a".attr)).foreach { condition =>
testConstraintsAfterJoin(
originalLeft,
originalRight,
testRelation1.where(IsNotNull('a)).subquery('left),
right,
Inner,
condition)
}
}

test("Constraints shouldn't be inferred from cast equality constraint(filter lower data type)") {
val testRelation1 = LocalRelation('a.int)
val testRelation2 = LocalRelation('b.long)
val originalLeft = testRelation1.where('a === 1).subquery('left)
val originalRight = testRelation2.subquery('right)

val left = testRelation1.where(IsNotNull('a) && 'a === 1).subquery('left)
val right = testRelation2.where(IsNotNull('b)).subquery('right)

Seq(Some("left.a".attr.cast(LongType) === "right.b".attr),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but I find these test cases a bit confusing because left and right have equality filters. Eg. here 'a === 1 in left so actually it would be correct to infer 1.cast(LongType) === 'b for right. This PR doesn't do that obviously (#27518 will address that) but probably using inequalities ('a < 1) would be easier to follow.

Some("right.b".attr === "left.a".attr.cast(LongType))).foreach { condition =>
testConstraintsAfterJoin(originalLeft, originalRight, left, right, Inner, condition)
}

Seq(Some("left.a".attr === "right.b".attr.cast(IntegerType)),
Some("right.b".attr.cast(IntegerType) === "left.a".attr)).foreach { condition =>
testConstraintsAfterJoin(
originalLeft,
originalRight,
left,
testRelation2.where(IsNotNull('b) && 'b.attr.cast(IntegerType) === 1).subquery('right),
Inner,
condition)
}
}
}