
Conversation

@huaxingao
Contributor

{code}
conn.prepareStatement(
  "create table people (name char(32))").executeUpdate()
conn.prepareStatement("insert into people values ('fred')").executeUpdate()
sql(
  s"""
  |CREATE TEMPORARY TABLE foobar
  |USING org.apache.spark.sql.jdbc
  |OPTIONS (url '$url', dbtable 'PEOPLE', user 'testuser', password 'testpassword')
  """.stripMargin.replaceAll("\n", " "))
val df = sqlContext.sql("SELECT * FROM foobar WHERE NAME = 'fred'")
{code}
I expect df to contain one row with the value 'fred'. However, no row is returned. If I change the data type to varchar(32) in the CREATE TABLE DDL, the row comes back correctly. The cause is that DB2 defines char(n) as a fixed-length character string, so with char(32), when running "SELECT * FROM foobar WHERE NAME = 'fred'", DB2 returns 'fred' padded with 28 trailing spaces. Spark does not treat the padded 'fred' as equal to 'fred', so df has no rows. With varchar(32), DB2 returns just 'fred' for the select statement and df has the right row. To make DB2 char(n) work with Spark, I suggest changing the Spark code to trim trailing spaces after reading the data from the database.
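To illustrate the padding behavior outside of Spark, here is a minimal sketch; the padded string below stands in for what the DB2 JDBC driver returns for a CHAR(32) column:

```java
public class CharPaddingDemo {
    public static void main(String[] args) {
        // A CHAR(32) column stores 'fred' padded to 32 characters,
        // i.e. followed by 28 trailing spaces.
        String fromDb2 = String.format("%-32s", "fred");

        System.out.println(fromDb2.equals("fred"));        // false: padded value != 'fred'
        System.out.println(fromDb2.trim().equals("fred")); // true: trimming restores equality
    }
}
```

This is exactly the mismatch the filter hits: the string comparison in Spark sees the padded value, while the literal in the predicate is unpadded.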

@andrewor14
Contributor

you could just do

case StringConversion =>
  val newString = Option(rs.getString(pos)).map { s => UTF8String.fromString(s.trim) }.orNull
  mutableRow.update(i, newString)
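For readers less familiar with the Scala idiom above: it trims only when the column value is non-null and passes null through unchanged. A rough Java equivalent is sketched below (trimOrNull is a hypothetical helper name, not Spark code):

```java
import java.util.Optional;

public class TrimConversion {
    // Null-safe trim, mirroring Option(rs.getString(pos)).map(s => s.trim).orNull:
    // a null column value stays null; a non-null value loses trailing/leading spaces.
    static String trimOrNull(String value) {
        return Optional.ofNullable(value).map(String::trim).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimOrNull("fred                            ") + "]"); // [fred]
        System.out.println(trimOrNull(null));                                           // null
    }
}
```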

@andrewor14
Contributor

@yhuai

@huaxingao
Contributor Author

@andrewor14
Thanks a lot for your comment. I will change it to what you suggested.
In the same method, case DecimalConversion has similar code. Should I change that one too?

@andrewor14
Contributor

Jenkins, ok to test

@SparkQA

SparkQA commented Dec 16, 2015

Test build #47763 has finished for PR 10262 at commit 63a3d83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Dec 17, 2015

Can you add a test?

@huaxingao
Contributor Author

@yhuai
JDBCSuite uses the H2 database. It seems that for the char(n) data type, either H2 doesn't pad, or the H2 JDBC driver already trims trailing spaces in ResultSet.getString, so H2 doesn't exhibit this problem. Reproducing it requires DB2 and the DB2 JDBC driver (I guess Oracle has the same problem too), but I don't think the test system has the DB2 JDBC driver. So maybe there is no need to add a test?

@yhuai
Contributor

yhuai commented Dec 20, 2015

I guess we can try to add a docker test. Can you try to add one in https://github.com/apache/spark/tree/master/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc?

@huaxingao
Contributor Author

@yhuai
Sorry for the late reply. I waited for my coworker Luciano to come back from vacation today so I could check with him on the status of his DB2 docker test. He has a PR to add a DB2 docker test, #9893, which is still pending because of the DB2 JDBC driver dependency. After his PR is merged, I will add a test to his DB2 docker test suite.

@HyukjinKwon
Member

@huaxingao Do you mind if I submit a PR based on this if you are not working on it? (It looks like it has been inactive for a few months!)

@huaxingao
Contributor Author

@HyukjinKwon
I will continue working on this and finish the work this week.

@HyukjinKwon
Member

Hi @huaxingao, would this be better closed for now?

@huaxingao huaxingao closed this Feb 9, 2017
