
Commit ffee4f1

liancheng authored and cloud-fan committed
[SPARK-19905][SQL] Bring back Dataset.inputFiles for Hive SerDe tables
## What changes were proposed in this pull request?

`Dataset.inputFiles` works by matching `FileRelation`s in the query plan. In Spark 2.1, Hive SerDe tables are represented by `MetastoreRelation`, which inherits from `FileRelation`. In Spark 2.2, however, Hive SerDe tables are represented by `CatalogRelation`, which no longer inherits from `FileRelation` due to the unification of Hive SerDe tables and data source tables. This change breaks `Dataset.inputFiles` for Hive SerDe tables.

This PR fixes the issue by explicitly matching `CatalogRelation`s that are Hive SerDe tables in `Dataset.inputFiles`. Note that we can't make `CatalogRelation` inherit from `FileRelation`, since not all `CatalogRelation`s are file based (e.g., JDBC data source tables).

## How was this patch tested?

New test case added in `HiveDDLSuite`.

Author: Cheng Lian <[email protected]>

Closes #17247 from liancheng/spark-19905-hive-table-input-files.
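For context, here is a minimal spark-shell style sketch of the behavior this change restores. It mirrors the new `HiveDDLSuite` test below; the table and view names are illustrative (not part of the patch) and a Hive-enabled Spark build is assumed.

```scala
// Run in a Hive-enabled spark-shell; `spark` is the predefined SparkSession.
// Table and view names are illustrative only.
spark.range(10).createOrReplaceTempView("demo_view")
spark.sql("CREATE TABLE demo_hive_table STORED AS RCFILE AS SELECT * FROM demo_view")

// Without this fix, inputFiles comes back empty for Hive SerDe tables on
// Spark 2.2 snapshots; with it, the table's storage location URI is reported.
spark.table("demo_hive_table").inputFiles.foreach(println)

spark.sql("DROP TABLE demo_hive_table")
```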
1 parent bc30351 commit ffee4f1

2 files changed: +14 −0 lines changed


sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

Lines changed: 3 additions & 0 deletions
```diff
@@ -36,6 +36,7 @@ import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst._
 import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.catalog.CatalogRelation
 import org.apache.spark.sql.catalyst.encoders._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
@@ -2734,6 +2735,8 @@ class Dataset[T] private[sql](
         fsBasedRelation.inputFiles
       case fr: FileRelation =>
         fr.inputFiles
+      case r: CatalogRelation if DDLUtils.isHiveTable(r.tableMeta) =>
+        r.tableMeta.storage.locationUri.map(_.toString).toArray
     }.flatten
     files.toSet.toArray
   }
```
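The new case clause reports the table's storage location rather than enumerating leaf files. Below is a small, self-contained sketch of just that `Option`-to-`Array` step; the `CatalogTable`/`CatalogStorageFormat` case classes here are illustrative stand-ins (not Spark's real definitions), and `locationUri` is modeled as `Option[java.net.URI]` to match the `.map(_.toString)` call in the diff.

```scala
import java.net.URI

// Illustrative stand-ins for the fields the new case clause touches.
case class CatalogStorageFormat(locationUri: Option[URI])
case class CatalogTable(storage: CatalogStorageFormat)

object LocationUriDemo extends App {
  val tableMeta = CatalogTable(CatalogStorageFormat(
    Some(new URI("hdfs://namenode:8020/user/hive/warehouse/spark_19905"))))

  // Option[URI] -> Array[String]: one entry when the table has a location,
  // an empty array otherwise, which the surrounding .flatten simply drops.
  val inputPaths: Array[String] = tableMeta.storage.locationUri.map(_.toString).toArray

  assert(inputPaths.sameElements(Array("hdfs://namenode:8020/user/hive/warehouse/spark_19905")))
}
```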

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

Lines changed: 11 additions & 0 deletions
```diff
@@ -1865,4 +1865,15 @@ class HiveDDLSuite
       }
     }
   }
+
+  test("SPARK-19905: Hive SerDe table input paths") {
+    withTable("spark_19905") {
+      withTempView("spark_19905_view") {
+        spark.range(10).createOrReplaceTempView("spark_19905_view")
+        sql("CREATE TABLE spark_19905 STORED AS RCFILE AS SELECT * FROM spark_19905_view")
+        assert(spark.table("spark_19905").inputFiles.nonEmpty)
+        assert(sql("SELECT input_file_name() FROM spark_19905").count() > 0)
+      }
+    }
+  }
 }
```
