Skip to content

Conversation

@pjfanning
Copy link
Member

What changes were proposed in this pull request?

[SPARK-15615] add new json function that takes Dataset[String] as input and deprecate the existing RDD based functions

How was this patch tested?

Changed the existing unit tests

PJ Fanning added 3 commits May 28, 2016 08:07
…o json-dataset

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
@HyukjinKwon
Copy link
Member

I guess it would be nicer if the title has the form such as [SPARK-15615][SQL] ... according to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.

*
* @param jsonRDD input RDD with one JSON object per record
* @since 1.4.0
* @since 1.4.0*
Copy link
Member

@HyukjinKwon HyukjinKwon Jun 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is a typo here, *.

@pjfanning pjfanning changed the title Json dataset [SPARK-15615] [SQL] Json dataset Jun 2, 2016
@pjfanning pjfanning changed the title [SPARK-15615] [SQL] Json dataset [SPARK-15615] [SQL] Support Json input from Dataset[String] Jun 2, 2016
def complexFieldAndType1: RDD[String] =
spark.sparkContext.parallelize(
def complexFieldAndType1: Dataset[String] =
sqlContext.createDataset(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is preferred to use spark rather than sqlContext..

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jun 2, 2016

I just fetched this PR and run tests. It seems "SPARK-7565 MapType in JsonRDD" test is being failed. Maybe this should be solved. Also, if json(jsonRDD: JavaRDD[String]) is being deprecated, then I think its usages should be changed to the new ones in this PR or in a follow-up.

BTW, I guess it is arguable to add a new API. For me, I feel it is a bit questionable to add this API because there is already rdd one, json(jsonRDD: RDD[String]). Dataset one might be able to be easily done with this API like the blow:

json(jsonDataset.rdd)

I guess APIs would not be added only for consistency.

Maybe I think we should wait for a committer's call.

@pjfanning
Copy link
Member Author

@HyukjinKwon this change in input parameter relates to #13300. There was a request there to treat Dataset[String] as a preferred input to RDD[String].

@pjfanning
Copy link
Member Author

@HyukjinKwon all the JsonSuite tests pass for me on my laptop - would it be feasible to get this reviewed again?

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants