Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -675,12 +675,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled

object ExtractAttribute {
var castToStr = false

def unapply(expr: Expression): Option[Attribute] = {
expr match {
case attr: Attribute => Some(attr)
case attr: Attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we simply do case attr: Attribute if attr.dataType == StringType?

Copy link
Member Author

@turboFei turboFei May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to test that, but it can't pass some tests in PartitionedTablePerfStatsSuite.
It seems that such as p1 = '0' (p1 is a Int partition key and '0' is an Integer) can filter some partitions, but p1 = "0" (p1 is a Int partition key and "0" is a String) can't be pushed down.
So, I add the prediction here to judge whether it has a castToString operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the !partitionKeys.contains(attr.name), it is false always.
I did't have a good understanding about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point to the problematic test cases?

Copy link
Member Author

@turboFei turboFei May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such as(PartitionedTablePerfStatsSuite) :

genericTest("lazy partition pruning reads only necessary partition data")

Relative query is( partCol1 is an Int type partition key):

          spark.sql("select * from test where partCol1 = 999").count()
          assert(HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount() == 0)

The relative log is

5 did not equal 0
ScalaTestFailureLocation: org.apache.spark.sql.hive.PartitionedTablePerfStatsSuite at (PartitionedTablePerfStatsSuite.scala:139)
Expected :0
Actual   :5
<Click to see difference>

org.scalatest.exceptions.TestFailedException: 5 did not equal 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm a little confused. Seems hive does support to filter non-string-type partition columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we need to revisit #19602

Copy link
Member Author

@turboFei turboFei May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you execute a sql likes

sql("SELECT c1 FROM t1 WHERE CAST(p1 as STRING) = '5'").show

It will throw an exception:

Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
	... 57 more
Caused by: MetaException(message:Filtering is supported only on partition keys of type string)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

if (!castToStr || attr.dataType == StringType) =>
castToStr = false
Some(attr)
case Cast(child @ AtomicType(), dt: AtomicType, _)
if Cast.canUpCast(child.dataType.asInstanceOf[AtomicType], dt) => unapply(child)
case _ => None
if Cast.canUpCast(child.dataType.asInstanceOf[AtomicType], dt) =>
castToStr = (castToStr || dt == StringType)
unapply(child)
case _ =>
castToStr = false
None
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2384,4 +2384,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
}

test("SPARK-27814: test cast operation for partition key") {
withTable("t1") {
sql("CREATE TABLE t1(c1 INT, c2 STRING) PARTITIONED BY (p1 INT)")
sql("INSERT INTO TABLE t1 PARTITION (p1 = 5) values(1, 'str')")
checkAnswer(sql("SELECT c1 FROM t1 WHERE CAST(p1 as STRING) = '5'"), Row(1))
checkAnswer(sql("SELECT c1 FROM t1 WHERE CAST( CAST(p1 AS BIGINT) AS STRING) = '5'"), Row(1))
}
}
}