
Conversation

@huaxingao
Contributor

{code}
conn.prepareStatement(
  "create table people (name char(32))").executeUpdate()
conn.prepareStatement("insert into people values ('fred')").executeUpdate()
sql(
  s"""
  |CREATE TEMPORARY TABLE foobar
  |USING org.apache.spark.sql.jdbc
  |OPTIONS (url '$url', dbtable 'PEOPLE', user 'testuser', password 'testpassword')
  """.stripMargin.replaceAll("\n", " "))
val df = sqlContext.sql("SELECT * FROM foobar WHERE NAME = 'fred'")
{code}
I expect df to contain one row with the value 'fred'. However, no row is returned. If I change the data type to varchar(32) in the CREATE TABLE DDL, the row comes back correctly. The cause is that DB2 defines char(n) as a fixed-length character string, so with char(32), when running "SELECT * FROM foobar WHERE NAME = 'fred'", DB2 returns 'fred' padded with 28 trailing spaces. Spark does not treat the padded 'fred' as equal to 'fred', so df has no rows. With varchar(32), DB2 returns just 'fred' for the select statement and df has the right row. To make DB2 char(n) work with Spark, I suggest changing the Spark code to trim trailing spaces after reading the data from the database.
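To illustrate the padding behavior outside of Spark, here is a minimal sketch; the padded string below stands in for what the DB2 JDBC driver returns for a CHAR(32) column:

```java
public class CharPaddingDemo {
    public static void main(String[] args) {
        // A CHAR(32) column stores 'fred' padded to 32 characters,
        // i.e. followed by 28 trailing spaces.
        String fromDb2 = String.format("%-32s", "fred");

        System.out.println(fromDb2.equals("fred"));        // false: padded value != 'fred'
        System.out.println(fromDb2.trim().equals("fred")); // true: trimming restores equality
    }
}
```

This is exactly the mismatch the filter hits: the string comparison in Spark sees the padded value, while the literal in the predicate is unpadded.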

@andrewor14
Contributor

you could just do

case StringConversion =>
  val newString = Option(rs.getString(pos)).map { s => UTF8String.fromString(s.trim) }.orNull
  mutableRow.update(i, newString)
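For readers less familiar with the Scala idiom above: it trims only when the column value is non-null and passes null through unchanged. A rough Java equivalent is sketched below (trimOrNull is a hypothetical helper name, not Spark code):

```java
import java.util.Optional;

public class TrimConversion {
    // Null-safe trim, mirroring Option(rs.getString(pos)).map(s => s.trim).orNull:
    // a null column value stays null; a non-null value loses trailing/leading spaces.
    static String trimOrNull(String value) {
        return Optional.ofNullable(value).map(String::trim).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimOrNull("fred                            ") + "]"); // [fred]
        System.out.println(trimOrNull(null));                                           // null
    }
}
```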

@andrewor14
Contributor

@yhuai

@huaxingao
Contributor Author

@andrewor14
Thanks a lot for your comment. I will change it to what you suggested.
In the same method, case DecimalConversion has similar code. Should I change that one too?

@andrewor14
Contributor

Jenkins, ok to test

@SparkQA

SparkQA commented Dec 16, 2015

Test build #47763 has finished for PR 10262 at commit 63a3d83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Dec 17, 2015

Can you add a test?

@huaxingao
Contributor Author

@yhuai
JDBCSuite uses the H2 database. It seems that for the char(n) data type, either H2 doesn't pad, or the H2 JDBC driver already trims trailing spaces in ResultSet.getString, so H2 doesn't exhibit this problem. Reproducing it requires DB2 and the DB2 JDBC driver (I guess Oracle has the same problem too), but I don't think the test system has the DB2 JDBC driver. So maybe there is no need to add a test?

@yhuai
Contributor

yhuai commented Dec 20, 2015

I guess we can try to add a docker test. Can you try to add one in https://github.com/apache/spark/tree/master/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc?

@huaxingao
Contributor Author

@yhuai
Sorry for the late reply. I waited for my coworker Luciano to come back from vacation today so I could check with him on the status of his DB2 docker test. He has a PR to add a DB2 docker test, #9893, which is still pending because of the DB2 JDBC driver dependency. After his PR is merged, I will add a test to his DB2 docker test suite.

@HyukjinKwon
Member

@huaxingao Do you mind if I submit a PR based on this if you are not working on it? (It looks like it has been inactive for a few months!)

@huaxingao
Contributor Author

@HyukjinKwon
I will continue working on this and finish the work this week.

@HyukjinKwon
Member

Hi @huaxingao, would this be better closed for now?

@huaxingao huaxingao closed this Feb 9, 2017
