[SPARK-5090][examples] The improvement of python converter for hbase #3920

GenTang · 2015-01-07T02:14:28Z

Hi,

Following the discussion at http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:

1. HBaseConverters.scala: the new converter converts all the records in an HBase Result into a single string.
2. hbase_inputformat.py: as the value string may contain several records, we can use the ast package to convert the string into a dict.
3. HBaseTest.scala: as the examples package uses HBase 0.98.7, the original HTableDescriptor constructor is deprecated, so the code is updated to use the new constructor.

AmplabJenkins · 2015-01-07T02:17:09Z

Can one of the admins verify this patch?

GenTang · 2015-01-07T02:20:27Z

@MLnick
Hi Nick Pentreath, please review. Thanks ^^

davies · 2015-01-09T21:44:08Z

OK to test.

MLnick · 2015-01-15T07:36:25Z

@GenTang Sorry for the delay, have been travelling. I will try to get to this in the next few days.

MLnick · 2015-01-22T19:37:49Z

examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala

Minor spelling error in columnFamily here.

MLnick · 2015-01-22T19:46:11Z

@GenTang overall looks fine to me - just noted a few minor comments / questions, and in particular it would be good to expand on the sample data in the example just to show multiple records per Result.

I will give it a final once-over after you've had a look at the above comments.

GenTang · 2015-01-22T23:33:26Z

@MLnick I changed the sample data to show that we can have several records in one column family.

In fact, in HBase 0.96 and newer, the default maximum number of versions (timestamps) was changed to 1. To show all the versions stored in an HBase table, we need to add {"hbase.mapreduce.scan.maxversions":"num"} to the conf. To me, that is an HBase topic rather than a Spark one, which is why the example does not show several versions of a record in a Result.
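For readers who do want multiple versions, the conf entry mentioned above could be added roughly as follows. This is a minimal sketch, not code from the PR: the helper name `build_hbase_conf` is hypothetical, and the `hbase.zookeeper.quorum` and `hbase.mapreduce.inputtable` keys are assumed from how HBase's TableInputFormat is usually configured. Only `hbase.mapreduce.scan.maxversions` comes from the comment itself.

```python
def build_hbase_conf(host, table, max_versions=None):
    """Build a conf dict for reading an HBase table (hypothetical helper)."""
    conf = {
        # Assumed keys: ZooKeeper quorum host and the table to scan.
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
    }
    if max_versions is not None:
        # Key named in the comment above; configuration values are strings.
        conf["hbase.mapreduce.scan.maxversions"] = str(max_versions)
    return conf


# Ask the scan for up to three versions of each cell.
conf = build_hbase_conf("localhost", "test", max_versions=3)
```

A dict like this would be passed as the `conf` argument when creating the input RDD. Leaving `max_versions` unset keeps the HBase default of one version.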

MLnick · 2015-01-26T15:19:37Z

@GenTang ok, thanks for updating the example data. I checked out the PR and tested quickly locally, the new example and data works for me. Looks good.

@davies +1 to merge.

davies · 2015-02-05T18:07:39Z

examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala

Is this an API change in HBase?

Yes, the old one is deprecated.

GenTang · 2015-02-11T14:28:41Z

@davies @MLnick
This may not be the right place to discuss it, but I tried the script hbase_outputformat.py with Spark 1.2.0 and it failed with java.lang.IncompatibleClassChangeError: Implementing class. I guess it is caused by a conflict between the dependencies and the Java version.
Do you have any idea what causes this error? Or should we create a JIRA for it?

GenTang · 2015-02-14T22:06:18Z

@davies
The converter now returns the string in JSON format.
As a result, special characters such as ' or " no longer cause problems.
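The quoting issue can be illustrated with a small, self-contained sketch. The record fields below are illustrative only and are not taken from the PR; the "naive" string stands in for any hand-built, dict-like text and is not claimed to be what the earlier converter produced.

```python
import ast
import json

# Illustrative record whose value contains both quote characters.
record = {"qualifier": "a", "value": "it's a \"quoted\" value"}

# Hand-built dict-like text: the quotes inside the value break the literal.
naive = "{'value': '%s'}" % record["value"]
try:
    ast.literal_eval(naive)
    naive_ok = True
except (SyntaxError, ValueError):
    naive_ok = False

# JSON escapes the embedded double quotes, so the round trip is lossless.
encoded = json.dumps(record)
decoded = json.loads(encoded)

print(naive_ok)           # False: the hand-built literal cannot be parsed
print(decoded == record)  # True: JSON preserves the value exactly
```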

davies · 2015-02-15T03:26:17Z

examples/src/main/python/hbase_inputformat.py

simplejson is not built into Python 2.6; we should use json.

davies · 2015-02-23T06:26:39Z

examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala

How about JSONObject(output).toString()?

output is a Buffer[Map[String, String]], since there are several records in an HBase Result.
However, the only constructor JSONObject has is JSONObject(obj: Map[String, Any]), so JSONObject(output).toString() would fail to compile. ^^

That makes sense. JSON escapes \n inside strings, so it is safe to use \n as the separator.

Great! In fact, HBase itself escapes \n too. That's why I chose \n in the first place.
Thanks!
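The separator argument can be checked with a short sketch on the Python side. It assumes, as discussed above, that each record is serialized as one JSON object and that records are joined with a newline. The record contents are made up for illustration.

```python
import json

# Two illustrative records; the first value contains a real newline.
records = [
    {"qualifier": "a", "value": "line one\nline two"},
    {"qualifier": "b", "value": "plain"},
]

# json.dumps writes the embedded newline as the two characters \ and n,
# so the only real newlines in the joined string are the separators.
joined = "\n".join(json.dumps(r) for r in records)

# Splitting on the separator therefore yields exactly one line per record.
lines = joined.split("\n")
parsed = [json.loads(line) for line in lines]

print(len(lines))         # 2
print(parsed == records)  # True
```

In a PySpark job, the same split-then-parse step would typically be applied to each value of the RDD, for example with `flatMapValues` followed by `mapValues(json.loads)`.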

davies · 2015-02-23T16:57:31Z

@GenTang This PR looks good to me now, thanks

@JoshRosen I think it's ready to go.

GenTang · 2015-03-04T21:26:34Z

Hi, I am sorry to bother you all.
But is this pull request OK for merging, please?

AmplabJenkins · 2015-04-27T18:22:42Z

Can one of the admins verify this patch?

MLnick · 2015-05-04T07:03:02Z

@davies should be good to merge now

davies · 2015-05-23T03:23:25Z

@GenTang Can you check that these examples work with the latest master? If yes, I will merge it now.

GenTang · 2015-05-23T04:43:26Z

@davies I just tested it with an assembly built from the master branch, and it works.
Thanks

JoshRosen · 2015-05-23T04:47:00Z

Jenkins, retest this please.

AmplabJenkins · 2015-05-23T04:47:12Z

Merged build triggered.

AmplabJenkins · 2015-05-23T04:47:17Z

Merged build started.

SparkQA · 2015-05-23T04:47:43Z

Test build #33391 has started for PR 3920 at commit d2153df.

SparkQA · 2015-05-23T06:34:31Z

Test build #33391 has finished for PR 3920 at commit d2153df.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-23T06:34:36Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-23T06:34:36Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33391/

Hi,

Following the discussion at http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:

1. HBaseConverters.scala: the new converter converts all the records in an HBase Result into a single string.
2. hbase_inputformat.py: as the value string may contain several records, we can use the ast package to convert the string into a dict.
3. HBaseTest.scala: as the examples package uses HBase 0.98.7, the original HTableDescriptor constructor is deprecated, so the code is updated to use the new constructor.

Author: GenTang

Closes apache#3920 from GenTang/master and squashes the following commits:

d2153df [GenTang] import JSONObject precisely
4802481 [GenTang] dump the result into a singl String
62df7f0 [GenTang] remove the comment
21de653 [GenTang] return the string in json format
15b1fe3 [GenTang] the modification of comments
5cbbcfc [GenTang] the improvement of pythonconverter
ceb31c5 [GenTang] the modification for adapting updation of hbase
3253b61 [GenTang] the modification accompanying the improvement of pythonconverter

GenTang added 3 commits January 7, 2015 03:04

the modification accompanying the improvement of pythonconverter

3253b61

the modification for adapting updation of hbase

ceb31c5

the improvement of pythonconverter

5cbbcfc

GenTang changed the title ~~The improvement of python converter for hbase(examples)~~ [SPARK-5090] The improvement of python converter for hbase(examples) Jan 7, 2015

GenTang changed the title ~~[SPARK-5090] The improvement of python converter for hbase(examples)~~ [SPARK-5090] [examples] The improvement of python converter for hbase Jan 7, 2015

GenTang changed the title ~~[SPARK-5090] [examples] The improvement of python converter for hbase~~ [SPARK-5090][examples] The improvement of python converter for hbase Jan 7, 2015

GenTang force-pushed the master branch from 5d63c1c to 5cbbcfc Compare January 9, 2015 00:11

MLnick reviewed Jan 22, 2015

the modification of comments

15b1fe3

davies reviewed Feb 5, 2015

return the string in json format

21de653

remove the comment

62df7f0

davies reviewed Feb 15, 2015

dump the result into a singl String

4802481

davies reviewed Feb 23, 2015

import JSONObject precisely

d2153df

asfgit closed this in 4583cf4 May 23, 2015
