
Conversation

@cloud-fan

No description provided.

Author

Make a method for it, so that we can reuse it in SparkSqlAstBuilder.

Author

visitReplaceTableHeader simply returns false for these 3 properties, so it's simpler to use an assert here.
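The shape of that pattern, as a minimal sketch (the names and return type below are simplified stand-ins, not Spark's actual signatures): the REPLACE TABLE header grammar cannot produce TEMPORARY, IF NOT EXISTS, or EXTERNAL, so the visitor returns three fixed false flags and the caller asserts instead of re-validating.

```scala
// Hypothetical sketch, not the real Spark code: the REPLACE TABLE header
// visitor returns fixed `false` flags because the grammar cannot produce
// TEMPORARY, IF NOT EXISTS, or EXTERNAL for this statement.
object ReplaceTableHeaderSketch {
  def visitReplaceTableHeader(table: String): (String, Boolean, Boolean, Boolean) =
    (table, false, false, false) // (name, temporary, ifNotExists, external)

  def buildReplaceTable(table: String): String = {
    val (name, temporary, ifNotExists, external) = visitReplaceTableHeader(table)
    // The grammar guarantees all three flags are false, so an assert suffices.
    assert(!temporary && !ifNotExists && !external)
    s"REPLACE TABLE $name"
  }
}
```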

Owner

I think it is bad practice for a method to assume it will get certain results from other methods. While using an assert handles correctness, if there is a change that violates the assertion, a user would get a nearly unusable error.

I think this change should be reverted so that the error messages are meaningful.

Author

@cloud-fan Nov 16, 2020

Previously, when the table column list was not specified, we ignored partition columns with data types. That was fine before syntax merging, as there were no partition columns with data types in REPLACE TABLE. But now it's better to make it consistent with CREATE TABLE. I also added a test to check it: https://github.com/rdblue/spark/pull/8/files#diff-b9e91f767e5562861565b0ce78759af3bcb7fff405a81e928894641147db2ae4R293

Author

It's a bit weird to print a Scala Option directly.
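For example, interpolating an Option straight into a message leaks the wrapper into user-facing text (the `location` field here is a made-up illustration, not the actual code under review):

```scala
// Interpolating an Option directly leaks the wrapper into the output.
val location: Option[String] = Some("/tmp/tbl")
val raw = s"location: $location"                            // "location: Some(/tmp/tbl)"
// Unwrapping first produces a readable message.
val readable = s"location: ${location.getOrElse("<none>")}" // "location: /tmp/tbl"
```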

Author

Otherwise the assert below is useless.

Author

I did some refactoring to avoid duplicated code between CREATE TABLE and CTAS (buildCatalogTable and buildHiveCatalogTable).

Owner

Looks fine to me. I like that it should have fewer changes from master.

Author

This method closely follows the original logic for creating a Hive table in SparkSqlAstBuilder.

Author

The logic here roughly merges 3 serde infos: the user-specified one, the one inferred from STORED AS, and the default one.
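As a hedged sketch of that precedence (a simplified stand-in for Spark's serde-info class, not its actual fields), each field falls back with `orElse`: user-specified first, then the STORED AS-inferred value, then the default:

```scala
// Simplified stand-in for the serde info being merged. Precedence per field:
// user-specified ROW FORMAT, then the format inferred from STORED AS,
// then the session default.
case class SerDeInfo(serde: Option[String], inputFormat: Option[String])

def mergeSerDe(user: SerDeInfo, storedAs: SerDeInfo, default: SerDeInfo): SerDeInfo =
  SerDeInfo(
    serde = user.serde.orElse(storedAs.serde).orElse(default.serde),
    inputFormat = user.inputFormat.orElse(storedAs.inputFormat).orElse(default.inputFormat))
```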

```scala
  }
}

private def toStorageFormat(
```
Author

We can get rid of the duplicated code here once we move CREATE TABLE LIKE and INSERT DIRECTORY to v2 commands.

```scala
  operationNotAllowed("INSERT OVERWRITE DIRECTORY must be accompanied by path", ctx)
}

val default = HiveSerDe.getDefaultStorage(conf)
```
Author

I don't know why INSERT DIRECTORY considers the default serde info but CREATE TABLE LIKE does not. I'm going to fix it later; here I keep the behavior unchanged.

Owner

@rdblue left a comment

@cloud-fan, it mostly looks good, but I did find a few things to change and at least one bug from using `_._1` instead of `_._2`. Let me know when you've had time to update this.
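That kind of mix-up is easy to make because both tuple accessors type-check when the pair's elements share a type; an illustrative snippet (the `props` data is hypothetical, not from this PR):

```scala
// Both accessors compile when key and value share a type, so picking the
// wrong one is a silent logic bug rather than a type error.
val props: Seq[(String, String)] = Seq("owner" -> "alice", "format" -> "parquet")
val values = props.map(_._2)        // intended: the values
val keysByMistake = props.map(_._1) // compiles equally well: the keys
```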


```scala
    serde = serdeInfo.serde.orElse(hiveSerde.serde),
    properties = serdeInfo.serdeProperties)
case _ =>
  operationNotAllowed(s"STORED AS with file format '${serdeInfo.storedAs.get}'", ctx)
```
Owner

This is okay, but my intent was to avoid mixing parsing logic and validation where possible. The parser should return what was parsed to Spark, which should decide whether it is supported.

I don't think this is a blocker because it is in the SparkSqlParser instead of the one in catalyst. We can fix this when moving these commands to v2.


@cloud-fan
Author

@rdblue thanks for the suggestions! I pushed a commit to update it; please take another look, thanks!

```scala
  "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead",
  ctx)
operationNotAllowed("CREATE OR REPLACE TEMPORARY TABLE is not supported yet. " +
  "Please use CREATE OR REPLACE TEMPORARY VIEW as an alternative.", ctx)
```
Owner

I think this should use the original error message, "CREATE OR REPLACE TEMPORARY TABLE ..., use CREATE TEMPORARY VIEW instead".

Using operationNotAllowed means that the message is prefixed with "Operation not allowed: ", so adding "is not supported yet." is not helpful and just makes the message harder to read. In addition, "yet" implies that this will be supported, and there is no reason to imply that.
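To make the redundancy concrete, here is a sketch of the prefixing behavior described above (this models only the message construction, not Spark's actual operationNotAllowed implementation, which also raises an exception):

```scala
// Sketch of the message shape: operationNotAllowed prefixes its argument,
// so "... is not supported yet" adds nothing the prefix doesn't already say.
def operationNotAllowedMessage(message: String): String =
  s"Operation not allowed: $message"

// e.g. "Operation not allowed: CREATE TEMPORARY TABLE ... AS ...,
//       use CREATE TEMPORARY VIEW instead"
```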

```scala
  .getOrElse(StructType(partCols))
if (temp) {
  operationNotAllowed("CREATE TEMPORARY TABLE is not supported yet. " +
    "Please use CREATE TEMPORARY VIEW as an alternative.", ctx)
```
Owner

This should use the other error message, "CREATE TEMPORARY TABLE ... AS ..., use CREATE TEMPORARY VIEW instead".

As I noted on the similar case, "is not supported yet" is both redundant and misleading. I don't think that Spark intends to implement CREATE TEMPORARY TABLE. Even if it may be implemented, it has not been supported for years, so there is no value in implying that it will be supported.

Please update to use the simpler and clearer error message.

Owner

@rdblue left a comment

I think the only things that need to be fixed are the error messages that were changed and are now longer and less clear.

@rdblue merged this pull request into rdblue:unify-create-table Nov 23, 2020
Owner

@rdblue commented Nov 23, 2020

Thanks, @cloud-fan. I've merged this.

rdblue pushed a commit that referenced this pull request Nov 23, 2020