Conversation

@gatorsmile
Member

gatorsmile commented Jan 16, 2017

What changes were proposed in this pull request?

So far, the test cases in DDLSuite only verify the behaviors of InMemoryCatalog; they do not cover the scenarios that use HiveExternalCatalog. Thus, we need to improve the existing test suite to also run these cases against the Hive metastore.

While porting these test cases, a bug in SET LOCATION was found: the path property is not set when the location is changed.

The changes made in this PR are summarized below:

  • DDLSuite becomes an abstract class. Both InMemoryCatalogedDDLSuite and HiveCatalogedDDLSuite extend it: InMemoryCatalogedDDLSuite uses InMemoryCatalog, while HiveCatalogedDDLSuite uses HiveExternalCatalog (a sketch of the resulting hierarchy follows this list).
  • InMemoryCatalogedDDLSuite contains all the existing test cases in DDLSuite.
  • HiveCatalogedDDLSuite contains a subset of DDLSuite. The following test cases are excluded:
  1. These test cases only make sense for InMemoryCatalog:
  test("desc table for parquet data source table using in-memory catalog")
  test("create a managed Hive source table") {
  test("create an external Hive source table")
  test("Create Hive Table As Select")
  2. These test cases cannot be ported because we cannot alter the table provider when using the Hive metastore. Future PRs should improve them so that altering the table provider is not needed:
  test("alter table: set location (datasource table)")
  test("alter table: set properties (datasource table)")
  test("alter table: unset properties (datasource table)")
  test("alter table: set serde (datasource table)")
  test("alter table: set serde partition (datasource table)")
  test("alter table: change column (datasource table)")
  test("alter table: add partition (datasource table)")
  test("alter table: drop partition (datasource table)")
  test("alter table: rename partition (datasource table)")
  test("drop table - data source table")

TODO: in future PRs, we need to remove HiveDDLSuite and move its test cases to DDLSuite, InMemoryCatalogedDDLSuite, or HiveCatalogedDDLSuite.

How was this patch tested?

N/A

}.getMessage
assert(e.contains("Hive support is required to use CREATE Hive TABLE AS SELECT"))
}
}
Member Author

The above four cases are copied from the existing ones in DDLSuite. These test cases only make sense for InMemoryCatalog.


test("drop table - data source table") {
testDropTable(isDatasourceTable = true)
}
Member Author

The above 10 test cases currently run with InMemoryCatalog only, because HiveExternalCatalog does not allow users to change the table provider from hive to anything else. We can fix this in future PRs.

@gatorsmile
Member Author

cc @cloud-fan I think we really need to do this ASAP to improve the test coverage of DDL commands; I hit this while working on PR #16587.

@SparkQA

SparkQA commented Jan 16, 2017

Test build #71420 has finished for PR 16592 at commit 0133463.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with BeforeAndAfterEach
  • abstract class DDLSuite extends QueryTest with SQLTestUtils
  • class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeAndAfterEach

@cloud-fan
Contributor

can we wait for #16597?

@gatorsmile
Member Author

Sure, no problem.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jan 17, 2017

Test build #71526 has finished for PR 16592 at commit 0133463.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with BeforeAndAfterEach
  • abstract class DDLSuite extends QueryTest with SQLTestUtils
  • class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeAndAfterEach

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71651 has finished for PR 16592 at commit 0133463.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with BeforeAndAfterEach
  • abstract class DDLSuite extends QueryTest with SQLTestUtils
  • class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeAndAfterEach

@gatorsmile
Member Author

ping @cloud-fan


private val escapedIdentifier = "`(.+)`".r

def normalizeCatalogTable(table: CatalogTable): CatalogTable = {
Contributor

I think we only need this in HiveCatalogedDDLSuite?

Member Author

Yes. Let me move it to HiveCatalogedDDLSuite.
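For context, a minimal sketch of what such a normalization could look like once moved into HiveCatalogedDDLSuite; the stripped fields and property names are illustrative (the property names are drawn from the nondeterministic-property discussion further down), not the exact PR code.

// Hypothetical sketch: normalize fields the Hive metastore fills in
// nondeterministically so CatalogTable values compare equal across catalogs.
def normalizeCatalogTable(table: CatalogTable): CatalogTable = {
  val nondeterministicProps = Set(
    "CreateTime", "transient_lastDdlTime", "totalSize", "numFiles",
    "maxFileSize", "minFileSize")
  table.copy(
    createTime = 0L,
    lastAccessTime = 0L,
    properties = table.properties.filterKeys(!nondeterministicProps.contains(_)).toMap)
}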

intercept[DatabaseAlreadyExistsException] {
sql(s"CREATE DATABASE $dbName")
if (isUsingHiveMetastore) {
val e = intercept[AnalysisException] {
Contributor

we can write intercept[AnalysisException] and leave a TODO saying that HiveExternalCatalog should throw DatabaseAlreadyExistsException

Member Author

Sure.
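A minimal sketch of the suggested pattern (surrounding test body elided):

// TODO: HiveExternalCatalog should throw DatabaseAlreadyExistsException,
// as InMemoryCatalog does; until then, catch the broader AnalysisException.
val e = intercept[AnalysisException] {
  sql(s"CREATE DATABASE $dbName")
}
// then assert on e.getMessage as before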

val dbPath = new Path(spark.sessionState.conf.warehousePath)
s"${dbPath.toUri.getPath.stripSuffix("/")}/$dbName.db"
} else {
s"spark-warehouse/$dbName.db"
Contributor

why hardcode it?

Member Author

: ) copied from the original place. Let me clean it. Thanks!

@SparkQA

SparkQA commented Jan 20, 2017

Test build #71740 has finished for PR 16592 at commit b905191.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226
Contributor

xwu0226 commented Feb 24, 2017

@gatorsmile is this PR going to be merged soon? The ALTER ADD column PR #16626 also depends on it to create test cases for InMemoryCatalog. Thanks!

@gatorsmile
Member Author

Let me continue the work now

@SparkQA

SparkQA commented Mar 6, 2017

Test build #73975 has started for PR 16592 at commit b186dcd.

}
}

Seq("a b", "a:b", "a%b").foreach { specialChars =>
Member Author

This test case is a duplicate of the one in HiveCatalogedDDLSuite. Removed.

}

Seq("a b", "a:b", "a%b").foreach { specialChars =>
test(s"location uri contains $specialChars for database") {
Member Author

This test case is a duplicate of the one in HiveCatalogedDDLSuite. Removed.

@SparkQA

SparkQA commented Mar 6, 2017

Test build #74037 has finished for PR 16592 at commit d19d725.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74116 has finished for PR 16592 at commit ec914dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db")
val dbPath = new Path(spark.sessionState.conf.warehousePath)
val expectedDBLocation = s"file:${dbPath.toUri.getPath.stripSuffix("/")}/$dbName.db"
val expectedDBUri = CatalogUtils.stringToURI(expectedDBLocation)
Contributor

how about

val expectedDBPath = new Path(spark.sessionState.conf.warehousePath, s"$dbName.db").toUri

name: TableIdentifier): CatalogTable = {
val storage =
CatalogStorageFormat(
locationUri = Some(catalog.defaultTablePath(name)),
Contributor

nit: CatalogStorageFormat.copy(locationUri = xxx)
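A sketch of that nit, assuming the CatalogStorageFormat.empty template object rather than spelling out every field:

// Start from the empty template and set only the location.
val storage = CatalogStorageFormat.empty.copy(
  locationUri = Some(catalog.defaultTablePath(name)))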

val dbPath = new Path(spark.sessionState.conf.warehousePath)
val expectedDBLocation =
s"file:${dbPath.toUri.getPath.stripSuffix("/")}/$dbNameWithoutBackTicks.db"
val expectedDBUri = CatalogUtils.stringToURI(expectedDBLocation)
Contributor

same here

val dbNameWithoutBackTicks = cleanIdentifier(dbName)
val location = makeQualifiedPath(s"spark-warehouse/$dbNameWithoutBackTicks.db")
val dbPath = new Path(spark.sessionState.conf.warehousePath)
val expectedDBLocation =
Contributor

same here

sql(s"DROP DATABASE $dbName")
}.getMessage
assert(message.contains(s"Database '$dbNameWithoutBackTicks' not found"))
if (isUsingHiveMetastore) {
Contributor

add a TODO to unify the exception?

sql(s"DROP DATABASE $dbName RESTRICT")
}.getMessage
assert(message.contains(s"Database '$dbName' is not empty. One or more tables exist"))
if (isUsingHiveMetastore) {
Contributor

add a TODO to unify the exception?

assert(catalog.getPartition(tableIdent, part1).parameters("numFiles") == "1")
assert(catalog.getPartition(tableIdent, part2).parameters("numFiles") == "2")
if (!isUsingHiveMetastore) {
assert(catalog.getPartition(tableIdent, part1).parameters("numFiles") == "1")
Contributor

why only test it for InMemoryExternalCatalog?

Member Author

After ALTER TABLE, the statistics info of the first partition is removed by the Hive metastore. :(

Member Author

Let me write a comment in the code.

Member Author

Done.
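A minimal sketch of how the guarded assertion with that comment might read (the comment wording is assumed, not the exact committed text):

if (!isUsingHiveMetastore) {
  // After ALTER TABLE, the Hive metastore drops the statistics (e.g. numFiles)
  // of the first partition, so only verify them against InMemoryCatalog.
  assert(catalog.getPartition(tableIdent, part1).parameters("numFiles") == "1")
}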

}

private def testDropTable(isDatasourceTable: Boolean): Unit = {
protected def testDropTable(isDatasourceTable: Boolean): Unit = {
Contributor

if it's only used in InMemoryCatalogedDDLSuite, we should move this method there

Member Author

This is not only for InMemoryCatalogedDDLSuite. We have another call here.

"maxFileSize",
"minFileSize",
// EXTERNAL is not non-deterministic, but it is filtered out for external tables.
"EXTERNAL"
Contributor

this is gone now.

@SparkQA

SparkQA commented Mar 8, 2017

Test build #74160 has finished for PR 16592 at commit 64a8f5a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Mar 8, 2017

Test build #74213 has finished for PR 16592 at commit 64a8f5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val partitionLocation = if (isUsingHiveMetastore) {
val tableLocation = catalog.getTableMetadata(tableIdent).storage.locationUri
assert(tableLocation.isDefined)
makeQualifiedPath(tableLocation.get + "/paris")
Contributor

we should avoid hardcoding the file separator
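One way to do that, as a sketch: let Hadoop's Path join the child directory instead of concatenating with "/" (assuming the same imports the suite already uses):

// Path(parent, child) inserts the appropriate separator itself.
makeQualifiedPath(new Path(tableLocation.get.toString, "paris").toString)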

@SparkQA

SparkQA commented Mar 9, 2017

Test build #74223 has finished for PR 16592 at commit 9d4211c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public static class LongWrapper
  • public static class IntWrapper
  • case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] with PredicateHelper
  • case class JoinPlan(itemIds: Set[Int], plan: LogicalPlan, joinConds: Set[Expression], cost: Cost)
  • case class Cost(rows: BigInt, size: BigInt)
  • abstract class RepartitionOperation extends UnaryNode
  • case class FlatMapGroupsWithState(
  • trait WatermarkSupport extends UnaryExecNode
  • case class FlatMapGroupsWithStateExec(

@windpiger
Contributor

windpiger commented Mar 9, 2017

great~ waiting for this to merge so I can fix my conflicts~

@gatorsmile
Member Author

ping @cloud-fan

}

private def getDBPath(dbName: String): URI = {
val warehousePath = s"file:${spark.sessionState.conf.warehousePath.stripPrefix("file:")}"
Contributor

we can call makeQualifiedPath here. We can fix it in a follow-up.
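A sketch of that follow-up, using the String overload of makeQualifiedPath discussed just below:

// Replace the manual "file:" prefix handling with the existing helper.
private def getDBPath(dbName: String): URI = {
  makeQualifiedPath(new Path(spark.sessionState.conf.warehousePath, s"$dbName.db").toString)
}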

fs.makeQualified(hadoopPath).toUri
}

def makeQualifiedPath(path: Path): URI = {
Contributor

Actually we don't need this; we already have def makeQualifiedPath(path: String): URI. Let's remove it in a follow-up.

Contributor

@windpiger Mar 9, 2017

I have fixed the above two issues in #17095

@cloud-fan
Contributor

great work! merging to master!

@asfgit asfgit closed this in 09829be Mar 9, 2017
}
// TODO(gatorsmile): fix the bug in alter table set location.
// if (isUsingHiveMetastore) {
// assert(storageFormat.properties.get("path") === expected)
Contributor

Do we need to fix this bug and satisfy this test case? The PR description says: "While porting these test cases, a bug in SET LOCATION was found: the path property is not set when the location is changed."

Member Author

Yeah. We need to fix it. Thanks for working on it.
