-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19257][SQL]location for table/partition/database should be java.net.URI #17149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #73840 has finished for PR 17149 at commit
|
|
Test build #73841 has finished for PR 17149 at commit
|
|
Test build #73842 has finished for PR 17149 at commit
|
|
Test build #73855 has finished for PR 17149 at commit
|
| } | ||
| validateName(dbName) | ||
| val qualifiedPath = makeQualifiedPath(dbDefinition.locationUri).toString | ||
| val qualifiedPath = makeQualifiedPath(new Path(dbDefinition.locationUri).toString).toUri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makeQualifiedPath should accept URI now
| Map("partCol1" -> "1", "partCol2" -> "2")).location | ||
| val tableLocation = catalog.getTable("db1", "tbl").location | ||
| val defaultPartitionLocation = new Path(new Path(tableLocation, "partCol1=1"), "partCol2=2") | ||
| val tableLocationPath = new Path(catalog.getTable("db1", "tbl").location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tableLocationPath sound weird, let's keep the previous name
| val partition1 = | ||
| CatalogTablePartition(Map("partCol1" -> "1", "partCol2" -> "2"), | ||
| storageFormat.copy(locationUri = Some(newLocationPart1))) | ||
| storageFormat.copy(locationUri = Some(new Path(newLocationPart1).toUri))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't newLocationPart1 already a URI?
| catalog.createTable(table, ignoreIfExists = false) | ||
|
|
||
| val tableLocation = catalog.getTable("db1", "tbl").location | ||
| val tableLocationPath = new Path(catalog.getTable("db1", "tbl").location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep the previous name
| val catalog = newBasicCatalog() | ||
| try { | ||
| val newLocation = newUriForDatabase() | ||
| val newLocationUri = new Path(newUriForDatabase()).toUri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep the previous name
| def newFunc(): CatalogFunction = newFunc("funcName") | ||
|
|
||
| def newUriForDatabase(): String = Utils.createTempDir().toURI.toString.stripSuffix("/") | ||
| def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can simply it and write Utils.createTempDir().toURI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utils.createTempDir().toURI has a suffix '/', here we have to strip it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan after I try to use Path to Compare , I think stripSuffix here is the simple way.
Path with a String type constructor will be equal when one has a /, and another does not.
scala> val x = new Path("/ab/c/")
x: org.apache.hadoop.fs.Path = /ab/c
scala> val y = new Path("/ab/c")
y: org.apache.hadoop.fs.Path = /ab/c
scala> x == y
res0: Boolean = true
Path with a URI type constructor will be not equal when one has a /, and another does not.
scala> val x =new URI("/a/b/c/")
x: java.net.URI = /a/b/c/
scala> val y =new URI("/a/b/c")
y: java.net.URI = /a/b/c
scala> x == y
res1: Boolean = false
scala> val x1 =new Path(x)
x1: org.apache.hadoop.fs.Path = /a/b/c/
scala> val y1 =new Path(y)
y1: org.apache.hadoop.fs.Path = /a/b/c
scala> x1 == y1
res2: Boolean = false
So just the Path with String type can ignore the suffix /, then if we have a URI in hand, and we want to compare with another URI, we should first transform them to String , and use this String to constructor a Path, after this two actions, we can compare them with ignore the suffix /.
But I think it is more complicate, here we normalize the URI with stripSuffix from the Orignal then compare two URI directly, it is more simple.
should we must to convert it to Path to compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok if we always compare URI with Path, instead of converting it to string.
|
|
||
| package org.apache.spark.sql.catalog | ||
|
|
||
| import java.net.URI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary import?
| identifier = table, | ||
| tableType = tableType, | ||
| storage = storage.copy(locationUri = customLocation), | ||
| storage = storage.copy(locationUri = customLocation.map{ loc => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
storage.copy(
locationUri = xxx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add comments to explain why we don't create URI directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this pattern appears many times in this PR, how about we create a util method toURI in CatalogUtils and add comments there?
| dir.delete | ||
| val tableLocFile = new File(table.location) | ||
| assert(!tableLocFile.exists) | ||
| // val tableLocFile = new File(table.location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why comment it out?
| val storageWithPathOption = tableDefinition.storage.copy( | ||
| properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _)) | ||
| properties = | ||
| tableDefinition.storage.properties ++newLocation.map("path" -> new Path(_).toString)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also add a util method URIToString and add comments
|
Test build #73976 has started for PR 17149 at commit |
|
Test build #73977 has started for PR 17149 at commit |
|
|
||
| /** | ||
| * Convert URI to String. | ||
| * Since URI.toString does not decode for the uri string, we need to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since URI.toString does not decode the uri, e.g. you may see `%21` instead of `!`.
Here we create a hadoop Path with the given URI, ad rely on Path.toString to decode the uri
|
Test build #73979 has started for PR 17149 at commit |
| * @return the String of the path | ||
| */ | ||
| def URIToString(uri: Option[URI]): Option[String] = { | ||
| uri.map(new Path(_).toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try java.net.URLDecoder? I'd like to avoid hadoop dependency on this kind of basic functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about StringToURI? it still import Path
| * FileSystem is changed. | ||
| */ | ||
| private def makeQualifiedPath(path: String): Path = { | ||
| private def makeQualifiedPath(path: URI): Path = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method can return URI too
| val catalog = newBasicCatalog() | ||
| try { | ||
| val newLocation = newUriForDatabase() | ||
| val newLocation = new Path(newUriForDatabase()).toUri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change?
|
|
||
| import org.antlr.v4.runtime.{ParserRuleContext, Token} | ||
| import org.antlr.v4.runtime.tree.TerminalNode | ||
| import org.apache.hadoop.fs.Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this
|
|
||
| val tableLocation = if (table.tableType == CatalogTableType.MANAGED) { | ||
| Some(sessionState.catalog.defaultTablePath(table.identifier)) | ||
| Some(new Path(sessionState.catalog.defaultTablePath(table.identifier)).toUri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the defaultTablePath should also return URI now.
| Row("Database Name", dbMetadata.name) :: | ||
| Row("Description", dbMetadata.description) :: | ||
| Row("Location", dbMetadata.locationUri) :: Nil | ||
| Row("Location", new Path(dbMetadata.locationUri).toString) :: Nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use CatalogUtil.URIToString?
| databaseName, | ||
| comment.getOrElse(""), | ||
| path.getOrElse(catalog.getDefaultDBPath(databaseName)), | ||
| new Path(path.getOrElse(catalog.getDefaultDBPath(databaseName))).toUri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getDefaultDBPath should return URI now.
| None | ||
| } else { | ||
| Some(s"path '${escapeSingleQuotedString(location)}'") | ||
| Some(s"path '${escapeSingleQuotedString(new Path(location).toString)}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use URIToString?
| name = metadata.name, | ||
| description = metadata.description, | ||
| locationUri = metadata.locationUri) | ||
| locationUri = new Path(metadata.locationUri).toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URIToString?
| { | ||
| val defaultDbDefinition = CatalogDatabase( | ||
| SessionCatalog.DEFAULT_DATABASE, "default database", warehousePath, Map()) | ||
| SessionCatalog.DEFAULT_DATABASE, "default database", new Path(warehousePath).toUri, Map()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringToURI?
| private def createDatabase(catalog: SessionCatalog, name: String): Unit = { | ||
| catalog.createDatabase( | ||
| CatalogDatabase(name, "", spark.sessionState.conf.warehousePath, Map()), | ||
| CatalogDatabase(name, "", new Path(spark.sessionState.conf.warehousePath).toUri, Map()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringToURI?
| val storage = | ||
| CatalogStorageFormat( | ||
| locationUri = Some(catalog.defaultTablePath(name)), | ||
| locationUri = Some(new Path(catalog.defaultTablePath(name)).toUri), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringToURI?
| sql(s"CREATE DATABASE $dbName") | ||
| val db1 = catalog.getDatabaseMetadata(dbName) | ||
| val expectedLocation = makeQualifiedPath(s"spark-warehouse/$dbName.db") | ||
| val expectedLocation = new Path(makeQualifiedPath(s"spark-warehouse/$dbName.db")).toUri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringToURI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just implement this function with a option[String], I think for a String without a option, it is already simple, we still to implement one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think StringToURI should convert String to URI, not Option[String].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, maybe we should implement them all? option[string] also exists in many places, and more complicate
|
Test build #74000 has finished for PR 17149 at commit
|
|
Test build #74004 has finished for PR 17149 at commit
|
|
Test build #73995 has finished for PR 17149 at commit
|
|
Test build #73996 has finished for PR 17149 at commit
|
|
Test build #74012 has finished for PR 17149 at commit
|
| "you can only specify one of them.", ctx) | ||
| } | ||
| val customLocation = storage.locationUri.orElse(location) | ||
| val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: xx.map(CatalogUtils.stringToURI)
| } | ||
| } | ||
|
|
||
| Seq("a b", "a:b", "a%b").foreach { specialChars => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a TODO to remove these duplicated tests when we merge DDLSuite and HiveDDLSuite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces the conflict in my PR #16592. Let me remove the duplicate there.
|
thanks, merging to master! |
| } | ||
| val customLocation = storage.locationUri.orElse(location) | ||
| val customLocation = storage.locationUri.orElse(location.map(CatalogUtils.stringToURI(_))) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this has a double-de/encoding problem when the input path is an URI.
scala> new org.apache.hadoop.fs.Path(new java.io.File("a b").toURI.toString).toUri.toString
res1: String = file:/.../a%2520bA space character in URI is encoded as %20 and % character is encoded as %25. It seems a URI has a %20 and then is url-encoded from %20 to %2520.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it seems we have a similar util with CatalogUtils.stringToURI in org.apache.spark.util.Utils.resolveURI. Could we consolidate them? I did not replace it when I identified this because some existing tests in org.apache.spark.util.UtilsSuite were broken but I guess it would be fine if there is a coherent reason. These broken cases might be bugs.
@windpiger could you double check my comments and open a followup if I was not wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not replace it when I identified this because some existing tests in org.apache.spark.util.UtilsSuite were broken but I guess it would be fine if there is a coherent reason. These broken cases might be bugs.
FYI.. cc @sarutak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you reproduce it as a bug, please submit a PR to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how could the input path be a URI string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. is the input path always expected to be a path? I thought we support both URI and path forms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ambiguous to support both, what if users do wanna create a path /tmp/%25?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to my knowledge, HDFS's fully qualified path is a URI form. If we do not support this, that virtually means we are going to disallow the fully qualified path. I understand it sounds ambiguous but disallowing does not look a good solution.
Also, if users might want to access to local files or S3 when default scheme is hdfs, I guess it requires changing the default scheme every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, I guess we already have a lot of tests with URI input paths and I did many of them to pass the tests on Windows, which I guess implicitly some committers do not disagree with this.
IMHO, I guess URIs should be supported first correctly because those local path form is abbreviation of the fully qualified path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how could the input path be a URI string?
How about external tables on S3?
| // compatible format, which means the data source is file-based and must have a `path`. | ||
| require(table.storage.locationUri.isDefined, | ||
| "External file-based data source table must have a `path` entry in storage properties.") | ||
| Some(new Path(table.location).toUri.toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon previously we also encode the table location string, I think we never supported uri path string before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it seems the problem itself seems existing before.
I found this while running related tests on Windows which it looks related with this PR (special character stuff) so I think this is a proper JIRA to make a followup with and PR to note this.
It seems we do have tests that use URIs already whether it was mistakenly supported or not. Should we ask it to dev-mailing list? I think this is an important decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, please go ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for confirming. Let me send a mail soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I did a bit of search more about what org.apache.hadoop.fs.Path expects. It seems the documentation says:
Construct a path from a String. Path strings are URIs, but with unescaped elements and some additional normalization.
It seems those strings are expected to be unescaped. So, it seems we support URI with unescaped characters, which is inherited from Path.
I want to be sure on this because I have fixed many tests to use URIs to pass on Windows and I am about to fix them further in this way. @steveloughran, do you mind if I cc you and ask to take a look and help to confirm this please? I know no one who knows better about Hadoop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah, paths, especially on windows make me weep. Note also lots of fun related to encoded secrets in s3n/s3a/s3 urls (HADOOP-3733 and the associated regression HADOOP-14114)
Because of those patches, and the fact that if you want to embed a / in an S3A URI you must escape it, I don't know what to say here.
If you can come up with a simple set of questions to ask someone who doesn't know this code but could understand Path's logic, I could pass them on to some people I think may know the answer. Maybe @cnauroth might have some insight on what windows gets up to
BTW, Path in Hadoop 2.9 is Serializable. you can pass them down to closures working on RDDs, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @steveloughran. Let me try to produce a simple set of questions purely about org.apache.hadoop.fs.Path soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have two questions as below:
-
Is it okay to use both URIs and local file paths for the input string for
org.apache.hadoop.fs.Pathin general (when they are expected to be unescaped)? -
What
org.apache.hadoop.fs.Pathexpects for input string in definition? It seems URI strings but with unescaped characters per https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/Path.html and there are special cases that requires escaped characters such as s3n/s3a/s3 urls. Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- there's actually some discussion about defining this properly, related to the problem of "colons in object store paths". That's not going to help directly, but a sign of something which has historically been underspecified
- regarding URI strings + escaping, that S3 encoded password is a bit of an outlier. I plan to cut it ASAP —we just need a release of hadoop with it printing out warnings first & supporting the per-bucket configuration people need for the use case "auth as somebody else on a specific bucket". Given Hadoop 2.8 meets these criteria, Hadoop 3.x is the obvious target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Your answer and the pointer helped me a lot to understand.
So, up to my understanding...
There are few exceptional cases, for example, special characters or encoded characters in those URIs. Also, there are some corner cases that are not clearly defined about what Path expects.
Nevertheless, the baseline is fully qualified URI with unescaped characters or the unescaped path part in URI as the documentation says but there may be some variants for now or in the future.
|
@HyukjinKwon , nice to meet you! I see I got notified here for a bit of Hadoop
Yes, this is correct. Specifically on the topic of Windows, The standard invocation for doing this in the Hadoop code is |
|
@cnauroth Thank you so much for your help. FWIW, actually, in Spark's case, it seems we can't use directly use local paths on Windows via the parser because it seems Spark's parser tries to escape |
|
(I think I should cc @srowen too FYI because he reviewed all my PRs fixing the tests on Windows. The point here seems to me okay to use URIs in general.) |
|
Our parser might need a change regarding escape handling. We are having a related discussion in another PR: #15398 |
|
@gatorsmile, Thanks for your pointer. There is a good discussion there. |
What changes were proposed in this pull request?
Currently we treat the location of table/partition/database as URI string.
It will be safer if we can make the type of location as java.net.URI.
In this PR, there are following classes changes:
1. CatalogDatabase
2. CatalogStorageFormat
Before and After this PR, it is transparent for user, there is no change that the user should concern. The
StringtoURIjust happened in SparkSQL internally.Here list some operation related location:
1. whitespace in the location
e.g.
/a/b c/dFor both table location and partition location,
After
CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b c/d',then
DESC EXTENDED tshow the location is/a/b c/d,and the real path in the FileSystem also show
/a/b c/d2. colon(:) in the location
e.g.
/a/b:c/dFor both table location and partition location,
when
CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b:c/d',In linux file system
DESC EXTENDED tshow the location is/a/b:c/d,and the real path in the FileSystem also show
/a/b:c/din HDFS throw exception:
java.lang.IllegalArgumentException: Pathname /a/b:c/d from hdfs://iZbp1151s8hbnnwriekxdeZ:9000/a/b:c/d is not a valid DFS filename.while After
INSERT INTO TABLE t PARTITION(a="a:b") SELECT 1then
DESC EXTENDED tshow the location is/xxx/a=a%3Ab,and the real path in the FileSystem also show
/xxx/a=a%3Ab3. percent sign(%) in the location
e.g.
/a/b%c/dFor both table location and partition location,
After
CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b%c/d',then
DESC EXTENDED tshow the location is/a/b%c/d,and the real path in the FileSystem also show
/a/b%c/d4. encoded(%25) in the location
e.g.
/a/b%25c/dFor both table location and partition location,
After
CREATE TABLE t... (PARTITIONED BY ...) LOCATION '/a/b%25c/d',then
DESC EXTENDED tshow the location is/a/b%25c/d,and the real path in the FileSystem also show
/a/b%25c/dwhile After
INSERT INTO TABLE t PARTITION(a="%25") SELECT 1then
DESC EXTENDED tshow the location is/xxx/a=%2525,and the real path in the FileSystem also show
/xxx/a=%2525Additionally, except the location, there are two other factors will affect the location of the table/partition. one is the table name which does not allowed to have special characters, and the other is
partition namewhich have the same actions withpartition value, andpartition namewith special character situation has add some testcase and resolve a bug in PRSummary:
After
CREATE TABLE t... (PARTITIONED BY ...) LOCATION path,the path which we get from
DESC TABLEandreal path in FileSystemare all the same with theCREATE TABLEcommand(different filesystem has different action that allow what kind of special character to create the path, e.g. HDFS does not allow colon, but linux filesystem allow it ).DataBasealso have the same logic withCREATE TABLEwhile if the
partition valuehas some special character like%:#etc, then we will get the path with encodedpartition valuelike/xxx/a=A%25BfromDESC TABLEandreal path in FileSystemIn this PR, the core change code is using
new Path(str).toUriandnew Path(uri).toStringwhich transfrom
str to urioruri to str.for example:
when we restore table/partition from metastore, or get the location from
CREATE TABLEcommand, we can use it as above to change string to urinew Path(str).toUriHow was this patch tested?
unit test added.
The
current master branchalsopassed all the test casesadded in this PR by a litter change.https://github.com/apache/spark/pull/17149/files#diff-b7094baa12601424a5d19cb930e3402fR1764
here
toURI->toStringwhen test in master branch.This can show that this PR is transparent for user.