[SPARK-12270][SQL]remove empty space after getString from database #10262
You could just do:
case StringConversion =>
  val newString = Option(rs.getString(pos)).map { s => UTF8String.fromString(s.trim) }.orNull
  mutableRow.update(i, newString)
@andrewor14
Jenkins, ok to test
Test build #47763 has finished for PR 10262 at commit
Can you add a test?
@yhuai
I guess we can try to add a Docker test. Can you try adding one in https://github.com/apache/spark/tree/master/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc?
@yhuai
@huaxingao Do you mind if I submit a PR based on this if you are not working on it? (It looks like it has been inactive for a few months.)
@HyukjinKwon
Hi @huaxingao, would this be better closed for now? |
{code}
// Create a table with a fixed-length CHAR column and insert one row.
conn.prepareStatement(
  "create table people (name char(32))").executeUpdate()
conn.prepareStatement("insert into people values ('fred')").executeUpdate()
// Expose the table to Spark SQL through the JDBC data source.
sql(
  s"""
    |CREATE TEMPORARY TABLE foobar
    |USING org.apache.spark.sql.jdbc
    |OPTIONS (url '$url', dbtable 'PEOPLE', user 'testuser', password 'testpassword')
  """.stripMargin.replaceAll("\n", " "))
val df = sqlContext.sql("SELECT * FROM foobar WHERE NAME = 'fred'")
{code}
I expect to see one row with the content 'fred' in df, but no row is returned. If I change the data type to varchar(32) in the CREATE TABLE DDL, I get the row back correctly.

The cause is that DB2 defines char(num) as a fixed-length character string. With char(32), "SELECT * FROM foobar WHERE NAME = 'fred'" makes DB2 return 'fred' padded with 28 spaces. Spark does not treat 'fred' padded with spaces as equal to 'fred', so df has no rows. With varchar(32), DB2 returns just 'fred' and df has the right row. To make DB2 char(num) work with Spark, I suggest changing the Spark code to trim the padding spaces after getting the data from the database.
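To make the padding problem concrete, here is a minimal sketch in plain Scala, with no Spark or JDBC involved. `padChar` and `rtrim` are hypothetical helpers written only for this illustration: `padChar` imitates how a fixed-length CHAR(n) column hands back its value, and `rtrim` is one possible null-safe way to drop the padding. It is not the code in this PR. Unlike `String.trim` in the suggestion above, this variant removes trailing spaces only, so leading whitespace in the stored value survives.

```scala
// Sketch only: simulates CHAR(n) padding with plain strings.
// `padChar` and `rtrim` are hypothetical helpers, not Spark or JDBC APIs.
object CharPaddingSketch {
  // Imitate a fixed-length CHAR(n) column: the value comes back right-padded with spaces.
  def padChar(value: String, n: Int): String = value.padTo(n, ' ')

  // Null-safe removal of trailing spaces; leading whitespace is kept.
  def rtrim(s: String): String =
    Option(s).map(_.replaceAll(" +$", "")).orNull

  def main(args: Array[String]): Unit = {
    val fromDb = padChar("fred", 32)   // what a CHAR(32) column would return
    println(fromDb.length)             // 32
    println(fromDb == "fred")          // false: the padded value does not match
    println(rtrim(fromDb) == "fred")   // true: matches once the padding is removed
    println(rtrim(null) == null)       // true: a SQL NULL stays null
  }
}
```

Whether to trim both ends or only the trailing padding is a design choice; trailing-only is the narrower change, since CHAR padding is added on the right.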