Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

After #16552, CreateHiveTableAsSelectCommand becomes very similar to CreateDataSourceTableAsSelectCommand, and we can simplify it further by creating the table only in the table-not-exist branch.

This PR also adds Hive provider checking in the DataStream reader/writer, which was missed in #16552.
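
For reviewers, a minimal sketch of the simplified control flow (illustrative only: ignoreIfExists and the lookupRelation helper are stand-ins, not necessarily the exact names in the merged code). The table is created only in the table-not-exist branch; an existing table either short-circuits or is reported as a conflict.

override def run(sparkSession: SparkSession): Seq[Row] = {
  val catalog = sparkSession.sessionState.catalog

  if (catalog.tableExists(tableDesc.identifier)) {
    // The table already exists, so there is nothing to create. With
    // CREATE TABLE ... IF NOT EXISTS semantics we simply do nothing.
    if (ignoreIfExists) return Seq.empty
    throw new AnalysisException(s"${tableDesc.identifier} already exists.")
  }

  // Table-not-exist branch: the only place the table is created, using the
  // schema of the query that feeds it, followed by a plain insert.
  catalog.createTable(tableDesc.copy(schema = query.schema), ignoreIfExists = false)
  sparkSession.sessionState.executePlan(InsertIntoTable(
    lookupRelation(sparkSession), // stand-in: resolve the just-created table
    Map(), query, overwrite = false, ifNotExists = false)).toRdd

  Seq.empty[Row]
}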

How was this patch tested?

N/A

@cloud-fan
Contributor Author

cc @gatorsmile @windpiger

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71933 has finished for PR 16693 at commit db00cf9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def start(): StreamingQuery = {
  if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
    throw new AnalysisException("Hive data source can only be used with tables, you can not " +
      "read files of Hive data source directly.")
Member

This is not to read, but to write the results to Hive tables, right?

def load(): DataFrame = {
  if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
    throw new AnalysisException("Hive data source can only be used with tables, you can not " +
      "write files of Hive data source directly.")
Member

This is to read the streaming data from Hive tables, right? I think we need to fix the error message.
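
For reference, swapping the two verbs makes each message match what the method actually does; a sketch of the corrected checks (not necessarily the final merged wording):

// DataStreamReader.load() reads, so its message should mention reading:
if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
  throw new AnalysisException("Hive data source can only be used with tables, you can not " +
    "read files of Hive data source directly.")
}

// DataStreamWriter.start() writes, so its message should mention writing:
if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
  throw new AnalysisException("Hive data source can only be used with tables, you can not " +
    "write files of Hive data source directly.")
}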

    return Seq.empty
  }
  sparkSession.sessionState.executePlan(InsertIntoTable(
    metastoreRelation, Map(), query, overwrite = false, ifNotExists = false)).toRdd
Member

Previously, we tried to create the table even if it already existed. A good change!

val withSchema = if (withFormat.schema.isEmpty) {
  tableDesc.copy(schema = query.schema)
} else {
  withFormat
Member

To the other reviewers: this check is not needed, because the schema is always empty when we need to create a table. See the assert here.
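
Put differently, a tiny illustration of the invariant being described (the actual assertion lives elsewhere in the command; this is only a restatement, not the merged code):

// When the create-table path runs, the incoming table description carries no
// schema, so copying in the query's schema is the only case that matters.
assert(tableDesc.schema.isEmpty,
  "the CTAS table description is expected to have an empty schema here")
val withSchema = tableDesc.copy(schema = query.schema)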

  tableDesc.storage.outputFormat
    .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
  serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
  compressed = tableDesc.storage.compressed)
Member

Actually, after the code refactoring, this is always ensured in the rule DetermineHiveSerde.
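
For context, the defaulting that rule guarantees looks roughly like this (a standalone sketch of the behaviour, with class names spelled out as strings; not the rule's actual code):

import org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat

// Fill in Hive's default text input/output format and serde when the user did
// not specify a storage format, so the command no longer needs these fallbacks.
def withDefaultHiveStorage(storage: CatalogStorageFormat): CatalogStorageFormat = {
  storage.copy(
    inputFormat = storage.inputFormat
      .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
    outputFormat = storage.outputFormat
      .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    serde = storage.serde
      .orElse(Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")))
}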

@gatorsmile
Member

LGTM except for two minor comments on the error messages.

@cloud-fan
Contributor Author

@gatorsmile thanks for adding comments about why the cleanup is safe!

@SparkQA

SparkQA commented Jan 26, 2017

Test build #72022 has finished for PR 16693 at commit f4a9342.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@gatorsmile
Member

retest this please

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Jan 28, 2017

Test build #72109 has finished for PR 16693 at commit f4a9342.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

asfgit closed this in f7c07db on Jan 29, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
## What changes were proposed in this pull request?

After apache#16552, `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can simplify it further by creating the table only in the table-not-exist branch.

This PR also adds Hive provider checking in the DataStream reader/writer, which was missed in apache#16552.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#16693 from cloud-fan/minor.