-
Couldn't load subscription status.
- Fork 28.9k
[SPARK-19257][SQL]location for table/partition/database should be java.net.URI #17149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
38436e8
69a1646
890327a
e792cb6
f2b9bd8
109e2b5
dc0a37b
b6bc466
abfb6f5
80f2c40
681db88
5b92620
5b423f5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -386,7 +386,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { | |
| "LOCATION and 'path' in OPTIONS are both used to indicate the custom table path, " + | ||
| "you can only specify one of them.", ctx) | ||
| } | ||
| val customLocation = storage.locationUri.orElse(location) | ||
| val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_))) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: |
||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems this has a double-de/encoding problem when the input path is an URI. scala> new org.apache.hadoop.fs.Path(new java.io.File("a b").toURI.toString).toUri.toString
res1: String = file:/.../a%2520bA space character in URI is encoded as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, it seems we have a similar util with @windpiger could you double check my comments and open a followup if I was not wrong? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
FYI.. cc @sarutak There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you reproduce it as a bug, please submit a PR to fix it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how could the input path be a URI string? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm.. is the input path always expected to be a path? I thought we support both URI and path forms. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's ambiguous to support both, what if users do wanna create a path There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Up to my knowledge, HDFS's fully qualified path is a URI form. If we do not support this, that virtually means we are going to disallow the fully qualified path. I understand it sounds ambiguous but disallowing does not look a good solution. Also, if users might want to access to local files or S3 when default scheme is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In addition, I guess we already have a lot of tests with URI input paths and I did many of them to pass the tests on Windows, which I guess implicitly some committers do not disagree with this. IMHO, I guess URIs should be supported first correctly because those local path form is abbreviation of the fully qualified path. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
How about external tables on S3? |
||
| val tableType = if (customLocation.isDefined) { | ||
| CatalogTableType.EXTERNAL | ||
|
|
@@ -1080,8 +1080,10 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { | |
| if (external && location.isEmpty) { | ||
| operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx) | ||
| } | ||
|
|
||
| val locUri = location.map(CatalogUtils.stringToURI(_)) | ||
| val storage = CatalogStorageFormat( | ||
| locationUri = location, | ||
| locationUri = locUri, | ||
| inputFormat = fileStorage.inputFormat.orElse(defaultStorage.inputFormat), | ||
| outputFormat = fileStorage.outputFormat.orElse(defaultStorage.outputFormat), | ||
| serde = rowStorage.serde.orElse(fileStorage.serde).orElse(defaultStorage.serde), | ||
|
|
@@ -1132,7 +1134,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { | |
| // At here, both rowStorage.serdeProperties and fileStorage.serdeProperties | ||
| // are empty Maps. | ||
| val newTableDesc = tableDesc.copy( | ||
| storage = CatalogStorageFormat.empty.copy(locationUri = location), | ||
| storage = CatalogStorageFormat.empty.copy(locationUri = locUri), | ||
| provider = Some(conf.defaultDataSourceName)) | ||
| CreateTable(newTableDesc, mode, Some(q)) | ||
| } else { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can simply it and write
Utils.createTempDir().toURIThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utils.createTempDir().toURIhas a suffix '/', here we have to strip itThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan after I try to use Path to Compare , I think stripSuffix here is the simple way.
Path with a
Stringtype constructor will be equal when one has a/, and another does not.Path with a
URItype constructor will be not equal when one has a/, and another does not.So just the Path with
Stringtype can ignore the suffix/, then if we have aURIin hand, and we want to compare with anotherURI, we should first transform them toString, and use this String to constructor a Path, after this two actions, we can compare them with ignore the suffix/.But I think it is more complicate, here we normalize the URI with stripSuffix from the Orignal then compare two URI directly, it is more simple.
should we must to convert it to Path to compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok if we always compare URI with Path, instead of converting it to string.