@@ -17,6 +17,8 @@

 package org.apache.spark.sql.catalyst.analysis

+import java.util.Locale
+
 import org.apache.spark.sql.catalyst.expressions.Expression
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
 import org.apache.spark.sql.catalyst.rules._

@@ -103,7 +105,7 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {

   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) =>
-      builtinFunctions.get(u.functionName.toLowerCase()) match {
+      builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
         case Some(tvf) =>
           val resolved = tvf.flatMap { case (argList, resolver) =>
             argList.implicitCast(u.functionArgs) match {
@@ -114,14 +114,14 @@ class SessionCatalog(

    * Format table name, taking into account case sensitivity.
    */
   protected[this] def formatTableName(name: String): String = {
-    if (conf.caseSensitiveAnalysis) name else name.toLowerCase
+    if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
Member:
The problem, I think, is that this affects user apps, and we were trying to avoid changes like this. The change was only about internal strings.

I would imagine the fix belongs in a test, not the main code?

gatorsmile (Member, Author), Apr 17, 2017:

We have restrictions on database/table names. That is, the names can only contain characters matching "[a-zA-Z_0-9]+".

Without the fix in this PR, users are not allowed to read/write/create a table whose name contains I, because toLowerCase will convert it to ı when the locale is tr, which makes the name illegal. Is my understanding right?
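For illustration, a minimal REPL sketch of the behavior described above (the string is a hypothetical table name, not taken from the PR):

scala> import java.util.Locale
import java.util.Locale

scala> "TAB_I".toLowerCase(new Locale("tr"))
res0: String = tab_ı

scala> "TAB_I".toLowerCase(Locale.ROOT)
res1: String = tab_i

The Turkish-locale result contains the dotless ı (U+0131), which falls outside "[a-zA-Z_0-9]+"; Locale.ROOT yields the plain i and keeps the identifier legal.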

Member:

Yes, you are correct then, if these identifiers always contain only alphanumeric characters; in that case there's no situation where lower-casing the table name should be locale-sensitive.

Is this true of column names?

It won't be true of data, and that's the case I was trying to leave alone, along with user-supplied table and column names, but maybe the latter two aren't locale-sensitive.

HyukjinKwon (Member), Apr 18, 2017:

I don't think column names have such restrictions. Judging by #7165, it seems we support other characters in column names. Here are several cases where data becomes column names:

scala> Seq("아").toDF("a").groupBy("a").pivot("a").count().show()
+---+---+
|  a| 아|
+---+---+
| 아|  1|
+---+---+

scala> import org.apache.spark.sql.functions
import org.apache.spark.sql.functions

scala> spark.range(1).select(functions.lit("아")).show()
+---+
| 아|
+---+
| 아|
+---+

It seems the parser does not allow such characters, though:

scala> sql("SELECT 아 FROM tbl")
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'SELECT 아'(line 1, pos 7)

== SQL ==
SELECT 아 FROM tbl
-------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:210)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:112)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:66)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:622)
  ... 48 elided

EDITED: We can use backquotes instead in this case:

scala> sql("SELECT `아` FROM tbl")
res29: org.apache.spark.sql.DataFrame = [아: bigint]

It seems we can still select it:

scala> Seq("아").toDF("a").groupBy("a").pivot("a").count().createOrReplaceTempView("tbl")

scala> sql("SELECT * FROM tbl").show()
+---+---+
|  a| 아|
+---+---+
| 아|  1|
+---+---+

If these were mistakenly supported, the restrictions should be added there first.

gatorsmile (Member, Author):

Have you tried to use backticks to quote the column names?

HyukjinKwon (Member), Apr 18, 2017:

Do you mean selectExpr("`아`") via the parser? Ah, sorry, it seems to work if we use backquotes:

scala> Seq(1).toDF("아").selectExpr("`아`")
res14: org.apache.spark.sql.DataFrame = [아: int]

gatorsmile (Member, Author):

Then, it works as expected.

Member:

Yup, so it seems column names don't have such restrictions. I added the case you mentioned to my comment above.

   }

   /**
    * Format database name, taking into account case sensitivity.
    */
   protected[this] def formatDatabaseName(name: String): String = {
-    if (conf.caseSensitiveAnalysis) name else name.toLowerCase
+    if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
   }

   /**
@@ -17,6 +17,8 @@

 package org.apache.spark.sql.internal

+import java.util.Locale
+
 import scala.reflect.ClassTag
 import scala.util.control.NonFatal

@@ -114,7 +116,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {

   // System preserved database should not exists in metastore. However it's hard to guarantee it
   // for every session, because case-sensitivity differs. Here we always lowercase it to make our
   // life easier.
-  val globalTempDB = sparkContext.conf.get(GLOBAL_TEMP_DATABASE).toLowerCase
+  val globalTempDB = sparkContext.conf.get(GLOBAL_TEMP_DATABASE).toLowerCase(Locale.ROOT)
   if (externalCatalog.databaseExists(globalTempDB)) {
     throw new SparkException(
       s"$globalTempDB is a system preserved database, please rename your existing database " +
@@ -2295,5 +2295,24 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
         }
       }
     }
+
+    test(s"basic DDL using locale tr - caseSensitive $caseSensitive") {
+      withSQLConf(SQLConf.CASE_SENSITIVE.key -> s"$caseSensitive") {
+        withLocale("tr") {
+          val dbName = "DaTaBaSe_I"
+          withDatabase(dbName) {
+            sql(s"CREATE DATABASE $dbName")
+            sql(s"USE $dbName")
+
+            val tabName = "tAb_I"
+            withTable(tabName) {
+              sql(s"CREATE TABLE $tabName(col_I int) USING PARQUET")
+              sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
+              checkAnswer(sql(s"SELECT col_I FROM $tabName"), Row(1) :: Nil)
+            }
+          }
+        }
+      }
+    }
   }
 }
@@ -20,7 +20,7 @@ package org.apache.spark.sql.test

 import java.io.File
 import java.net.URI
 import java.nio.file.Files
-import java.util.UUID
+import java.util.{Locale, UUID}

 import scala.language.implicitConversions
 import scala.util.control.NonFatal

@@ -228,6 +228,32 @@ private[sql] trait SQLTestUtils
     }
   }

+  /**
+   * Drops the databases in `dbNames` after calling `f`.
+   */
+  protected def withDatabase(dbNames: String*)(f: => Unit): Unit = {
Member:

I'm fine with this, but there's only one usage of it, and other tests don't seem to bother to drop their tables. Is it necessary within the context of one run, or could you just inline this? I don't feel strongly; you can leave it too.

gatorsmile (Member, Author):

In the future, we will use it more when refactoring the test cases.

+    try f finally {
+      dbNames.foreach { name =>
+        spark.sql(s"DROP DATABASE IF EXISTS $name")
+      }
+    }
+  }
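As a side note, a hypothetical usage sketch of this helper, mirroring the DDLSuite test above (the database and table names are illustrative):

    withDatabase("demo_db") {
      sql("CREATE DATABASE demo_db")
      sql("USE demo_db")
      withTable("demo_tab") {
        sql("CREATE TABLE demo_tab(i int) USING PARQUET")
        // ... run assertions here ...
      }
    }
    // the finally block issues DROP DATABASE IF EXISTS demo_db,
    // even if the body above throws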

+  /**
+   * Sets the JVM default locale to `language` before executing `f`, then restores the original
+   * default locale after `f` returns.
+   */
+  protected def withLocale(language: String)(f: => Unit): Unit = {
+    val originalLocale = Locale.getDefault
+    try {
+      Locale.setDefault(new Locale(language))
+      f
+    } finally {
+      Locale.setDefault(originalLocale)
+    }
+  }
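As an aside, a hedged sketch of how withLocale composes with the lower-casing behavior this PR is about (these assertions are illustrative, not from the PR):

    withLocale("tr") {
      // the JVM default locale is now Turkish, so plain toLowerCase
      // maps 'I' to the dotless 'ı' (U+0131)
      assert("I".toLowerCase == "ı")
      // passing Locale.ROOT keeps the result locale-insensitive
      assert("I".toLowerCase(Locale.ROOT) == "i")
    }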

   /**
    * Activates database `db` before executing `f`, then switches back to `default` database after
    * `f` returns.