[SPARK-12289][SQL] Support UnsafeRow in TakeOrderedAndProject/Limit #10330

viirya · 2015-12-16T16:52:13Z

JIRA: https://issues.apache.org/jira/browse/SPARK-12289

This change is needed for #10283. Without it, JavaDataFrameSuite will fail once we support UnsafeRow in LocalTableScan.

Support in Limit is added first. TakeOrderedAndProject will be added later.

SparkQA · 2015-12-16T17:29:22Z

Test build #47820 has finished for PR 10330 at commit acb3a58.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2015-12-17T04:00:11Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

Can InterpretedProjection be replaced by UnsafeProjection?

I think it is ok.

Sorry, I still have a dumb question. When calling the eval of each of the specified expressions, how can we know they can process unsafe rows? Why does the planner insert unsafe->safe conversion in the original design of TakeOrderedAndProject?

I think it is just because not all expressions supported unsafe rows before.

So all the expressions support unsafe rows now?

I think so. If there are expressions that still do not support unsafe rows, we should make them support it.

Got it. Thank you!

SparkQA · 2015-12-17T04:45:13Z

Test build #47892 has finished for PR 10330 at commit ecf7ec8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-17T05:50:52Z

Test build #47890 has finished for PR 10330 at commit 304c94a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-17T06:59:32Z

Test build #47901 has finished for PR 10330 at commit ba06795.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-17T10:04:09Z

Test build #47916 has finished for PR 10330 at commit c343447.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2015-12-22T03:37:48Z

cc @davies @cloud-fan @marmbrus

cloud-fan · 2015-12-22T06:50:49Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

what if the projectList is None?

I added a few lines to check whether we need to do an extra unsafe projection for it.

What you have done when projectList is None is exactly the same as ConvertToUnsafe, right? How about we change this to if (projectList.isDefined) true else child.outputsUnsafeRows, so that our framework can insert ConvertToUnsafe if necessary?

OK. I've updated it.
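For illustration, the rule suggested above can be sketched with a simplified stand-in for a plan node. `Plan`, `Leaf`, and `TakeOrdered` below are hypothetical types invented for this sketch, not Spark's actual SparkPlan classes; only the expression `if (projectList.isDefined) true else child.outputsUnsafeRows` is taken from the comment. The assumption is that a defined projectList goes through an unsafe projection, so the output is unsafe regardless of the child.

```scala
// Hypothetical, simplified model of a physical plan node (not Spark's SparkPlan).
sealed trait Plan {
  def outputsUnsafeRows: Boolean
}

// A leaf whose output row format is fixed up front.
final case class Leaf(outputsUnsafeRows: Boolean) extends Plan

// Stand-in for TakeOrderedAndProject. Assumption: when a projectList is
// present, rows pass through an unsafe projection, so the output is unsafe.
// Otherwise the operator forwards whatever format its child produces.
final case class TakeOrdered(projectList: Option[Seq[String]], child: Plan) extends Plan {
  def outputsUnsafeRows: Boolean =
    if (projectList.isDefined) true else child.outputsUnsafeRows
}
```

With this shape, a node without a projectList simply reports its child's format, which leaves it to the planner to decide whether a ConvertToUnsafe step is needed.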

SparkQA · 2015-12-24T08:43:59Z

Test build #48296 has finished for PR 10330 at commit fdc0097.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2015-12-25T03:12:47Z

@cloud-fan any other comments?

SparkQA · 2015-12-25T12:09:10Z

Test build #48323 has finished for PR 10330 at commit 04eb37e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2015-12-25T14:50:18Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

Do we need the copy here? We have already copied the rows when getting the data.

OK. I thought the returned row needed to be copied because it is the same object each time. But after checking GenerateUnsafeProjection, it looks like it creates a new row every time. I've updated it.

It seems there is a problem without this copy(): HiveCompatibilitySuite fails.

[info] key value
[info] !== HIVE - 5 row(s) ==   == CATALYST - 5 row(s) ==
[info] !0 val_0                 4 val_4
[info] !0 val_0                 4 val_4
[info] !0 val_0                 4 val_4
[info] !2 val_2                 4 val_4
[info]  4 val_4                 4 val_4

I re-checked GenerateUnsafeProjection: it returns the same unsafe row each time, so we need another copy() here.

Oh sorry, I misread the code. The copy is needed even though we already copied before takeOrdered.
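The failure mode in this thread can be reproduced without Spark. The following is a minimal sketch, assuming a projection that writes each result into one reused mutable buffer, which is how the thread describes GenerateUnsafeProjection. `MutableRow`, `ReusingProjection`, and `CopyDemo` are hypothetical names made up for this example, and the input rows mirror the expected HIVE column in the test output.

```scala
// Hypothetical stand-in for a mutable row; not Spark's UnsafeRow.
final class MutableRow(var key: Int, var value: String) {
  def copy(): MutableRow = new MutableRow(key, value)
  override def toString: String = s"$key $value"
}

// Models a projection that writes every result into one reused buffer
// and returns that same object on each call.
final class ReusingProjection {
  private val buffer = new MutableRow(0, "")
  def apply(input: (Int, String)): MutableRow = {
    buffer.key = input._1
    buffer.value = input._2
    buffer
  }
}

object CopyDemo {
  val input: List[(Int, String)] =
    List((0, "val_0"), (0, "val_0"), (0, "val_0"), (2, "val_2"), (4, "val_4"))

  // Collects five references to the same buffer, which ends up holding the last row.
  def withoutCopy: List[String] = {
    val project = new ReusingProjection
    val rows = input.map(r => project(r))
    rows.map(_.toString)
  }

  // Snapshots each result before the next call overwrites the buffer.
  def withCopy: List[String] = {
    val project = new ReusingProjection
    val rows = input.map(r => project(r).copy())
    rows.map(_.toString)
  }
}
```

Under these assumptions, `CopyDemo.withoutCopy` yields five rows that all read `4 val_4`, matching the CATALYST column in the failure, while `CopyDemo.withCopy` preserves the five distinct input rows.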

cloud-fan · 2015-12-25T15:21:35Z

sql/core/src/test/scala/org/apache/spark/sql/execution/RowFormatConvertersSuite.scala

cc @marmbrus @yhuai should we remove this test?

Are there any other cases where we will have the conversion? Alternatively, we can create a dummy operator that only accepts safe rows, so we can still test the logic of adding conversions.

Added a dummy node for it. Thanks.

cloud-fan · 2015-12-25T15:21:55Z

overall LGTM

SparkQA · 2015-12-25T16:56:38Z

Test build #48324 has finished for PR 10330 at commit 6a519d8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-12-26T02:50:26Z

Test build #48331 has finished for PR 10330 at commit f3054d6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2015-12-28T14:16:25Z

ping @marmbrus @yhuai Please take a look and see whether this patch looks OK to you. Thanks.

SparkQA · 2015-12-29T10:34:19Z

Test build #48409 has finished for PR 10330 at commit c44c93a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class DummySafeNode(limit: Int, child: SparkPlan) extends UnaryNode

viirya · 2015-12-29T10:35:49Z

@yhuai @marmbrus Please see whether this patch is now good to merge. Thanks.

davies · 2015-12-29T19:20:32Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

Would it be easier to just process UnsafeRow and output UnsafeRow?

So I need to close this again.....

…ut UnsafeRow

It's confusing that some operators output UnsafeRow and some do not, which makes it easy to make mistakes. This PR changes all operators (SparkPlan) to output only UnsafeRow and removes the rule that inserts Unsafe/Safe conversions. For operators that can't output UnsafeRow directly, an UnsafeProjection is added to them.

Closes apache#10330

cc JoshRosen rxin

Author: Davies Liu <[email protected]>

Closes apache#10511 from davies/unsafe_row.

Support UnsafeRow in Limit.

acb3a58

viirya added 2 commits December 17, 2015 11:13

Ignore the test because Limit node accepts UnsafeRow now.

304c94a

Support UnsafeRow in TakeOrderedAndProject.

ecf7ec8

viirya changed the title ~~[SPARK-12289][WIP][SQL] Support UnsafeRow in TakeOrderedAndProject/Limit~~ [SPARK-12289][SQL] Support UnsafeRow in TakeOrderedAndProject/Limit Dec 17, 2015

gatorsmile reviewed Dec 17, 2015

Add copy().

ba06795

add copy().

c343447

cloud-fan reviewed Dec 22, 2015

Add extra unsafe projection for the case projectList is None.

fdc0097

Set proper outputsUnsafeRows when projectList is None.

04eb37e

cloud-fan reviewed Dec 25, 2015

Remove unnecessary copy() and test cases.

6a519d8

cloud-fan reviewed Dec 25, 2015

Add copy() back.

f3054d6

Add a Dummy Node to test.

c44c93a

davies reviewed Dec 29, 2015

davies mentioned this pull request Dec 29, 2015

[SPARK-12286] [SPARK-12290] [SPARK-12294] [SPARK-12284] [SQL] always output UnsafeRow #10511

Closed

viirya closed this Dec 30, 2015

viirya deleted the limit-outputunsafe branch December 27, 2023 18:32
