@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR adds support for a DDL-formatted string in DataStreamReader.schema.
This lets users define a schema easily without importing the type classes.

For example,

scala> spark.readStream.schema("col0 INT, col1 DOUBLE").load("/tmp/abc").printSchema()
root
 |-- col0: integer (nullable = true)
 |-- col1: double (nullable = true)
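
For comparison, here is a minimal sketch of the same two-column schema written with the type classes and as a DDL string. It assumes a Spark version that provides `StructType.fromDDL`; the object name `DdlSchemaSketch` is made up for illustration, and this PR does not state that `DataStreamReader.schema` is implemented this way.

```scala
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}

// Illustrative object name; not part of Spark or of this PR.
object DdlSchemaSketch {
  // Schema built by hand, which requires importing the type classes.
  val explicitSchema: StructType = StructType(Seq(
    StructField("col0", IntegerType, nullable = true),
    StructField("col1", DoubleType, nullable = true)))

  // The same columns described as a DDL-formatted string.
  // Assumption: StructType.fromDDL is available in the Spark version in use.
  val ddlSchema: StructType = StructType.fromDDL("col0 INT, col1 DOUBLE")

  def main(args: Array[String]): Unit = {
    // Compare field names and data types of the two schemas.
    assert(ddlSchema.fieldNames.toSeq == explicitSchema.fieldNames.toSeq)
    assert(ddlSchema.map(_.dataType) == explicitSchema.map(_.dataType))
    println(ddlSchema.treeString)
  }
}
```

The DDL form needs no type imports at the call site, which is the convenience this change is after.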

How was this patch tested?

Added tests in DataStreamReaderWriterSuite.

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78352 has started for PR 18373 at commit 8df0cd2.

@HyukjinKwon
Member Author

cc @gatorsmile, @zsxwing and @maropu.

@HyukjinKwon HyukjinKwon changed the title [SPARK-20431][FOLLOWUP][SS] Specify a schema by using a DDL-formatted string in DataStreamReader [SPARK-20431][SS][FOLLOWUP] Specify a schema by using a DDL-formatted string in DataStreamReader Jun 21, 2017
test("SPARK-20431: Specify a schema by using a DDL-formatted string") {
  spark.readStream
    .format("org.apache.spark.sql.streaming.test")
    .schema("aa integer")
Member

Nit: Capitalize integer? (I remember @gatorsmile suggested before)
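
As a side note on the nit, the sketch below checks whether the lower-case and upper-case spellings of the type name parse to the same schema. It assumes `StructType.fromDDL` is available and that the parser treats type names case-insensitively; if so, the capitalization is purely stylistic. The object name `DdlCaseSketch` is hypothetical.

```scala
import org.apache.spark.sql.types.StructType

// Illustrative object name; not part of Spark or of this PR.
object DdlCaseSketch {
  // Same column, with the type name in lower case and in upper case.
  // Assumption: StructType.fromDDL exists in the Spark version in use.
  val lower: StructType = StructType.fromDDL("aa integer")
  val upper: StructType = StructType.fromDDL("aa INTEGER")

  def main(args: Array[String]): Unit = {
    // Both spellings are expected to resolve to the same field type.
    assert(lower.map(_.dataType) == upper.map(_.dataType))
    println(upper.treeString)
  }
}
```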

@maropu
Member

maropu commented Jun 21, 2017

Thanks for the follow-up! IMHO this syntax helps a lot, so could we support it in other places where schemas are defined (e.g., spark.createDataFrame)? (I think it's impossible to support it in all cases, but is it possible in the frequently used APIs?)

@shaneknapp
Contributor

test this please

@HyukjinKwon
Member Author

HyukjinKwon commented Jun 21, 2017

I skimmed SparkSession, Dataset and functions.scala. I think I can add it, in SparkSession, to

def createDataFrame(rowRDD: RDD[Row], schema: StructType)
def createDataFrame(rowRDD: JavaRDD[Row], schema: StructType)
def createDataFrame(rows: java.util.List[Row], schema: StructType)

Please let me know. If we are not sure for now, I guess it might be okay to leave this PR as is and handle that in a proper follow-up.
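
To make the idea concrete, here is a rough sketch of how a DDL string could be passed to one of the existing `createDataFrame` signatures today. The helper `createDataFrameFromDdl` and the object name are hypothetical and not part of Spark; the sketch only parses the string with `StructType.fromDDL` (assumed to be available) and delegates to the existing `java.util.List[Row]` overload.

```scala
import java.util.Arrays

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

// Illustrative object name; not part of Spark or of this PR.
object CreateDataFrameDdlSketch {
  // Hypothetical helper: parse the DDL string, then delegate to the existing
  // createDataFrame(rows: java.util.List[Row], schema: StructType).
  def createDataFrameFromDdl(
      spark: SparkSession,
      rows: java.util.List[Row],
      ddl: String): DataFrame = {
    spark.createDataFrame(rows, StructType.fromDDL(ddl))
  }

  def main(args: Array[String]): Unit = {
    // Local session used only for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ddl-schema-sketch")
      .getOrCreate()
    try {
      val rows = Arrays.asList(Row(1, 1.5), Row(2, 2.5))
      val df = createDataFrameFromDdl(spark, rows, "col0 INT, col1 DOUBLE")
      df.printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

A real overload inside SparkSession would need its own API review, which is presumably why it is deferred here.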

@HyukjinKwon
Member Author

Thanks for triggering tests and fixing Jenkins @shaneknapp.

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78354 has finished for PR 18373 at commit 8df0cd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78367 has finished for PR 18373 at commit 55f1d07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

@amoussoubaruch, that sounds unrelated to this PR. The question should probably go to the mailing list.

@HyukjinKwon
Member Author

cc @cloud-fan, would you have some time to look into this maybe?

@cloud-fan
Contributor

thanks, merging to master!

@amoussoubaruch please post your question to dev list, instead of randomly picking a PR...

@asfgit asfgit closed this in 7525ce9 Jun 24, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
… string in DataStreamReader

Author: hyukjinkwon <[email protected]>

Closes apache#18373 from HyukjinKwon/SPARK-20431.
@HyukjinKwon HyukjinKwon deleted the SPARK-20431 branch January 2, 2018 03:38