@GenTang
Contributor

@GenTang GenTang commented Jan 7, 2015

Hi,

Following the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:

  1. HBaseConverters.scala: the new converter converts all the records in an HBase Result into a single string
  2. hbase_input.py: since the value string may contain several records, we can use the ast package to convert the string into a dict
  3. HBaseTest.scala: since the examples package uses HBase 0.98.7, in which the original HTableDescriptor constructor is deprecated, I updated it to use the new constructor
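For illustration only, here is a minimal Python sketch of the "single string" idea, not the actual Scala converter: each cell of one Result becomes a JSON object and the objects are joined with "\n" (the JSON plus "\n" format is what the PR settled on later in this thread). The helper result_to_string, the cell tuples, and the field names are assumptions made for this example.

```python
import json

def result_to_string(cells):
    """Hypothetical Python mirror of the converter: serialize every cell of
    one HBase Result as a JSON object and join them into a single string."""
    records = []
    for row, family, qualifier, timestamp, cell_type, value in cells:
        record = {
            "row": row,
            "columnFamily": family,
            "qualifier": qualifier,
            "timestamp": str(timestamp),
            "type": cell_type,
            "value": value,
        }
        # sort_keys only makes the output deterministic for this sketch
        records.append(json.dumps(record, sort_keys=True))
    # One record per line; a newline is the record separator
    return "\n".join(records)


# Two made-up cells from the same row and column family
cells = [
    ("row1", "f1", "a", 1420588800000, "Put", "value1"),
    ("row1", "f1", "b", 1420588800000, "Put", "value2"),
]
print(result_to_string(cells))
```

Because every cell is kept, a Result with several cells no longer collapses to a single value on the Python side.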

@GenTang GenTang changed the title The improvement of python converter for hbase(examples) [SPARK-5090] The improvement of python converter for hbase(examples) Jan 7, 2015
@AmplabJenkins

Can one of the admins verify this patch?

@GenTang
Contributor Author

GenTang commented Jan 7, 2015

@MLnick
Hi Nick Pentreath, please review. Thanks ^^

@GenTang GenTang changed the title [SPARK-5090] The improvement of python converter for hbase(examples) [SPARK-5090] [examples] The improvement of python converter for hbase Jan 7, 2015
@GenTang GenTang changed the title [SPARK-5090] [examples] The improvement of python converter for hbase [SPARK-5090][examples] The improvement of python converter for hbase Jan 7, 2015
@davies
Contributor

davies commented Jan 9, 2015

OK to test.

@MLnick
Contributor

MLnick commented Jan 15, 2015

@GenTang Sorry for the delay, have been travelling. I will try to get to this in the next few days.

Contributor

Minor spelling error in columnFamily here.

@MLnick
Contributor

MLnick commented Jan 22, 2015

@GenTang overall looks fine to me - just noted a few minor comments / questions, and in particular it would be good to expand on the sample data in the example just to show multiple records per Result.

I will give it a final once over after you've had a look at the above comments.

@GenTang
Contributor Author

GenTang commented Jan 22, 2015

@MLnick I changed the sample data to show that we can have several records in one column family.

In fact, since HBase 0.96 the default maximum number of versions (timestamps) has been 1. To show all the versions in an HBase table, we need to add {"hbase.mapreduce.scan.maxversions":"num"} to the conf. To me this is more of an HBase topic than a Spark one, which is why I don't show several versions of a record in a Result.
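A sketch of that conf as a Python dict for newAPIHadoopRDD. The key hbase.mapreduce.scan.maxversions is the one named above; hbase.zookeeper.quorum and hbase.mapreduce.inputtable are the usual TableInputFormat settings. The helper hbase_read_conf, the host "localhost" and the table "test" are hypothetical placeholders.

```python
def hbase_read_conf(quorum, table, max_versions=None):
    """Hypothetical helper: build the conf dict for reading an HBase table."""
    conf = {
        "hbase.zookeeper.quorum": quorum,
        "hbase.mapreduce.inputtable": table,
    }
    if max_versions is not None:
        # Ask the scan for up to max_versions versions per column instead of 1.
        # Hadoop conf values are passed as strings.
        conf["hbase.mapreduce.scan.maxversions"] = str(max_versions)
    return conf


conf = hbase_read_conf("localhost", "test", max_versions=3)
# With a live SparkContext `sc`, the dict would be passed roughly like this:
# rdd = sc.newAPIHadoopRDD(
#     "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
#     "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
#     "org.apache.hadoop.hbase.client.Result",
#     keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
#     valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
#     conf=conf)
print(conf)
```

Leaving max_versions unset keeps HBase's default of one version per column.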

@MLnick
Contributor

MLnick commented Jan 26, 2015

@GenTang ok, thanks for updating the example data. I checked out the PR and tested it quickly locally; the new example and data work for me. Looks good.

@davies +1 to merge.

Contributor

Is this an API change in HBase?

Contributor Author

Yes. The old one is deprecated.

@GenTang
Contributor Author

GenTang commented Feb 11, 2015

@davies @MLnick
Perhaps this isn't the right place to discuss it, but I tried the script hbase_outputformat.py with Spark 1.2.0 and it threw java.lang.IncompatibleClassChangeError: Implementing class. I suspect it is caused by a conflict between the dependency versions and the Java version.
Do you have any idea what causes this error? Or should we create a JIRA for it?

@GenTang
Contributor Author

GenTang commented Feb 14, 2015

@davies
The converter now returns the string in JSON format,
so special characters such as ' or " no longer cause problems.
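To illustrate why quotes were a problem, the sketch below contrasts a hypothetical hand-built dict literal parsed with ast.literal_eval against a json.dumps/json.loads round trip. The naive format is an assumption for illustration, not the converter's earlier output.

```python
import ast
import json

value = 'it\'s a "quoted" value'

# Hypothetical naive rendering: wrapping the value in single quotes by hand.
# The apostrophe in "it's" terminates the literal early.
naive = "{'value': '" + value + "'}"
try:
    ast.literal_eval(naive)
    naive_ok = True
except SyntaxError:
    naive_ok = False

# JSON escapes the embedded double quotes, so the value round-trips intact.
encoded = json.dumps({"value": value})
decoded = json.loads(encoded)

print(naive_ok)                   # False
print(decoded["value"] == value)  # True
```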

Contributor

simplejson is not built into Python 2.6; we should use json.

Contributor

How about JSONObject(output).toString()

Contributor Author

output is a Buffer[Map[String, String]], since an HBase Result can contain several records.
However, JSONObject's only constructor is JSONObject(obj: Map[String, Any]), so JSONObject(output).toString() would fail to compile. ^^

Contributor

That makes sense. JSON escapes \n inside strings, so it's safe to use \n as the separator.
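A quick check of this claim, using Python's json module as a stand-in for the Scala JSONObject serializer (an assumption; the Scala side is not exercised here): a value containing a newline is encoded as a backslash followed by n, so splitting the joined string on real newlines recovers exactly one record per line.

```python
import json

cell = {"qualifier": "note", "value": "line one\nline two"}
encoded = json.dumps(cell)

# The newline inside the value is written as backslash + "n",
# so the encoded record itself contains no line break.
assert "\n" not in encoded

joined = encoded + "\n" + json.dumps({"qualifier": "x", "value": "y"})
decoded = [json.loads(line) for line in joined.split("\n")]
print(len(decoded), decoded[0]["value"] == cell["value"])  # 2 True
```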

Contributor Author

Great! In fact, HBase itself escapes \n too. That's why I chose \n in the first place.
Thanks!

@davies
Contributor

davies commented Feb 23, 2015

@GenTang This PR looks good to me now, thanks

@JoshRosen I think it's ready to go.

@GenTang
Contributor Author

GenTang commented Mar 4, 2015

Hi, sorry to bother you all,
but is this pull request OK to merge?

@AmplabJenkins

Can one of the admins verify this patch?

@MLnick
Contributor

MLnick commented May 4, 2015

@davies should be good to merge now

@davies
Contributor

davies commented May 23, 2015

@GenTang Can you check that these examples work with the latest master? If yes, I will merge it now.

@GenTang
Contributor Author

GenTang commented May 23, 2015

@davies I just tested it with an assembly built from the master branch, and it works.
Thanks

@JoshRosen
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 23, 2015

Test build #33391 has started for PR 3920 at commit d2153df.

@SparkQA

SparkQA commented May 23, 2015

Test build #33391 has finished for PR 3920 at commit d2153df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33391/

@asfgit asfgit closed this in 4583cf4 May 23, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Author: GenTang <[email protected]>

Closes apache#3920 from GenTang/master and squashes the following commits:

d2153df [GenTang] import JSONObject precisely
4802481 [GenTang] dump the result into a singl String
62df7f0 [GenTang] remove the comment
21de653 [GenTang] return the string in json format
15b1fe3 [GenTang] the modification of comments
5cbbcfc [GenTang] the improvement of pythonconverter
ceb31c5 [GenTang] the modification for adapting updation of hbase
3253b61 [GenTang] the modification accompanying the improvement of pythonconverter
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015

6 participants