[SPARK-22076][SQL] Expand.projections should not be a Stream #19289

cloud-fan · 2017-09-20T06:56:43Z

What changes were proposed in this pull request?

Spark with Scala 2.10 fails with a group by cube:

spark.range(1).select($"id" as "a", $"id" as "b").write.partitionBy("a").mode("overwrite").saveAsTable("rollup_bug")
spark.sql("select 1 from rollup_bug group by rollup ()").show

It can be traced back to #15484, which made Expand.projections a lazy Stream for group by cube.

In Scala 2.10, a Stream captures much of its enclosing context. In this case it captures the entire query plan, which contains some non-serializable parts.

This change also benefits the master branch, because it reduces the serialized size of Expand.projections.
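
To illustrate the difference between a lazy Stream and a strict collection, here is a minimal sketch in plain Scala (2.10 to 2.12 assumed; no Spark involved). The names `StreamLaziness`, `countEvaluations`, and `project` are hypothetical and are not part of this patch. It only shows that mapping over a Stream leaves the tail as a pending closure, while `toIndexedSeq` materializes every element.

```scala
object StreamLaziness {
  // Returns (elements computed right after `map`, elements computed after forcing).
  def countEvaluations(n: Int): (Int, Int) = {
    var evaluated = 0
    // Hypothetical stand-in for building one projection list.
    def project(i: Int): Seq[Int] = { evaluated += 1; Seq(i, i + 1) }

    // Lazy: the tail is a suspended closure that holds on to `project`
    // and everything `project` references.
    val lazyProjections = (1 to n).toStream.map(project)
    val afterMap = evaluated

    // Strict: toIndexedSeq computes every element into a Vector, which
    // holds plain data and no pending closures.
    val strictProjections = lazyProjections.toIndexedSeq
    require(strictProjections.length == n)
    (afterMap, evaluated)
  }

  def main(args: Array[String]): Unit = {
    val (afterMap, afterForce) = countEvaluations(4)
    println(s"computed after map: $afterMap, after toIndexedSeq: $afterForce")
  }
}
```

Because the strict result carries no deferred computation, serializing it does not drag along whatever the mapping function closed over.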

How was this patch tested?

Manually verified with Spark built with Scala 2.10.

cloud-fan · 2017-09-20T06:57:32Z

cc @liufengdb @gatorsmile @jiangxb1987

SparkQA · 2017-09-20T07:04:45Z

Test build #81974 has finished for PR 19289 at commit 20ea0c4.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-09-20T07:08:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

cubeExprs -> cubeExprs0 ?

gatorsmile · 2017-09-20T07:08:32Z

retest this please

SparkQA · 2017-09-20T09:51:43Z

Test build #81977 has finished for PR 19289 at commit 20ea0c4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-20T09:58:42Z

Test build #81979 has finished for PR 19289 at commit 518fe49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

Spark with Scala 2.10 fails with a group by cube:

```
spark.range(1).select($"id" as "a", $"id" as "b").write.partitionBy("a").mode("overwrite").saveAsTable("rollup_bug")
spark.sql("select 1 from rollup_bug group by rollup ()").show
```

It can be traced back to #15484 , which made `Expand.projections` a lazy `Stream` for group by cube.

In scala 2.10 `Stream` captures a lot of stuff, and in this case it captures the entire query plan which has some un-serializable parts.

This change is also good for master branch, to reduce the serialized size of `Expand.projections`.

## How was this patch tested?

manually verified with Spark with Scala 2.10.

Author: Wenchen Fan <[email protected]>

Closes #19289 from cloud-fan/bug.

(cherry picked from commit ce6a71e)
Signed-off-by: gatorsmile <[email protected]>

gatorsmile · 2017-09-20T16:02:16Z

LGTM

Thanks! Merged to master/2.2.

## What changes were proposed in this pull request?

This is a follow-up of apache#19289. We missed another place: `rollup`. `Seq.inits.toSeq` also returns a `Stream`, so we should fix it too.

## How was this patch tested?

manually

Author: Wenchen Fan <[email protected]>

Closes apache#19298 from cloud-fan/bug.
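
The follow-up's point about `rollup` can be sketched as below. This assumes the rollup grouping sets are the prefixes of the expression list, produced with the standard-library `inits` method. `RollupPrefixes` and `rollupSets` are hypothetical names, and the actual patch may differ. In Scala 2.10 to 2.12, `inits` returns an `Iterator`, and calling `toSeq` on an `Iterator` yields a lazy `Stream`; `toIndexedSeq` yields a strict `Vector` instead.

```scala
object RollupPrefixes {
  // Hypothetical helper: all prefixes of `exprs`, longest first,
  // materialized strictly so the result is never a Stream.
  def rollupSets[A](exprs: Seq[A]): IndexedSeq[Seq[A]] =
    exprs.inits.toIndexedSeq

  def main(args: Array[String]): Unit = {
    // Strings stand in for grouping expressions.
    println(rollupSets(Seq("a", "b", "c")))
  }
}
```

Declaring the return type as `IndexedSeq` makes the strictness visible in the signature, so a later refactoring cannot silently reintroduce a lazy collection.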



Expand.projections should not be a Stream

518fe49

cloud-fan force-pushed the bug branch from 20ea0c4 to 518fe49 on September 20, 2017 07:21

asfgit closed this in ce6a71e Sep 20, 2017

cloud-fan mentioned this pull request Sep 21, 2017

[SPARK-22076][SQL][followup] Expand.projections should not be a Stream #19298

Closed
