[SPARK-5090][examples] The improvement of python converter for hbase #3920
Conversation
|
Can one of the admins verify this patch? |
|
@MLnick |
|
OK to test. |
|
@GenTang Sorry for the delay, have been travelling. I will try to get to this in the next few days. |
minor spelling error of columnFamily here
|
@GenTang overall looks fine to me - just noted a few minor comments / questions, and in particular it would be good to expand on the sample data in the example just to show multiple records per I will give it a final once over after you've had a look at the above comments. |
|
@MLnick I changed the sample data to show that we can have several records in one columnFamily. In fact, in HBase 0.96 and newer, the default maximum number of versions (timestamps) has been changed to 1, and to show all the versions in an HBase table we need to add {"hbase.mapreduce.scan.maxversions":"num"} to the conf. To me, that is an HBase topic rather than a Spark one. That's why I don't show that we can have several versions of records in |
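As a rough illustration of the setting mentioned above, the sketch below builds the kind of conf dictionary a PySpark HBase read might receive. The helper name `build_hbase_conf`, the host `localhost`, and the table name `test` are hypothetical; only the three property keys are taken from HBase and this discussion.

```python
def build_hbase_conf(quorum, table, max_versions=None):
    """Build a conf dict for reading an HBase table (illustrative helper, not part of the PR)."""
    conf = {
        "hbase.zookeeper.quorum": quorum,
        "hbase.mapreduce.inputtable": table,
    }
    # HBase 0.96+ returns one version per cell by default; ask for more explicitly.
    if max_versions is not None:
        # Hadoop configuration values are passed as strings.
        conf["hbase.mapreduce.scan.maxversions"] = str(max_versions)
    return conf


# Hypothetical host and table names, for illustration only.
conf = build_hbase_conf("localhost", "test", max_versions=3)
```

Leaving `max_versions` unset keeps the HBase default, which is why the example in this PR shows several records per columnFamily rather than several versions per cell.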
Is this an API change in HBase?
Yes. The old one is deprecated
|
@davies @MLnick |
|
@davies |
simplejson is not built into Python 2.6; we should use json.
How about JSONObject(output).toString()?
Output is a Buffer[Map[String, String]], since there are several records in an HBase Result.
However, JSONObject has only one constructor, JSONObject(obj: Map[String, Any]), so JSONObject(output).toString() would fail to compile. ^^
That makes sense. JSON will escape the \n in a String, so it's safe to use \n as the separator.
Great! In fact, HBase itself will escape \n too. That's why I chose \n in the first place.
Thanks!
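The point about \n being a safe separator can be checked with a small sketch on the Python side. The record fields and values below are made up for illustration and are not the converter's actual output; the sketch only assumes that each record is serialized as one JSON object and that the objects are joined with a newline.

```python
import json

# Hypothetical records, standing in for the cells of one HBase Result.
records = [
    {"row": "row1", "columnFamily": "f1", "qualifier": "a",
     "value": "line one\nline two"},  # value contains a real newline
    {"row": "row1", "columnFamily": "f1", "qualifier": "b",
     "value": "value2"},
]

# json.dumps escapes the newline inside the value as the two characters \ and n,
# so the only real newlines in `joined` are the separators between records.
joined = "\n".join(json.dumps(r) for r in records)

# Splitting on the separator and decoding each piece recovers the records.
decoded = [json.loads(line) for line in joined.split("\n")]
```

Because the escaping happens inside each JSON string, splitting on \n never cuts a record in half, even when a value itself contains a line break.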
|
@GenTang This PR looks good to me now, thanks. @JoshRosen I think it's ready to go. |
|
Hi, I am sorry to bother you all. |
|
Can one of the admins verify this patch? |
|
@davies should be good to merge now |
|
@GenTang Can you check that these examples work with the latest master? If yes, I will merge it now. |
|
@davies I just tested it with the assembly of master branch, it works. |
|
Jenkins, retest this please. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #33391 has started for PR 3920 at commit |
|
Test build #33391 has finished for PR 3920 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
Hi,

Following the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:

1. HBaseConverters.scala: the new converter converts all the records in an HBase Result into a single string.
2. hbase_input.py: as the value string may contain several records, we can use the ast package to convert the string into a dict.
3. HBaseTest.scala: as the examples package uses HBase 0.98.7, the original HTableDescriptor constructor is deprecated, so it is updated to the new constructor.

Author: GenTang <[email protected]>

Closes apache#3920 from GenTang/master and squashes the following commits:

d2153df [GenTang] import JSONObject precisely
4802481 [GenTang] dump the result into a singl String
62df7f0 [GenTang] remove the comment
21de653 [GenTang] return the string in json format
15b1fe3 [GenTang] the modification of comments
5cbbcfc [GenTang] the improvement of pythonconverter
ceb31c5 [GenTang] the modification for adapting updation of hbase
3253b61 [GenTang] the modification accompanying the improvement of pythonconverter