Skip to content

Conversation

icexelloss
Copy link
Contributor

What changes were proposed in this pull request?

test("withColumn doesn't invalidate cached dataframe") in CachedTableSuite doesn't not work because:

The UDF is executed and test count incremented when "df.cache()" is called and the subsequent "df.collect()" has no effect on the test result.

This PR fixed this test and add another test for caching UDF.

How was this patch tested?

Add new tests.

var evalCount = 0
val myUDF = udf((x: String) => { evalCount += 1; "result" })
val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s"))
df.cache()
Copy link
Contributor Author

@icexelloss icexelloss Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is when evalCount is incremented (because the optimizer will execute the UDF locally on driver), the rest of the tests have no effect on the value of evalCount.

evalCount is copied (by value) to the closure and therefore it's not possible to change the value evalCount defined in the driver by incrementing it in the UDF (because each closure has a different copy of evalCount)

@gatorsmile
Copy link
Member

cc @maryannxue

@icexelloss icexelloss changed the title [SPARK-24521][SQL] Fix ineffective test in CachedTableSuite [SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite Jun 11, 2018
@SparkQA
Copy link

SparkQA commented Jun 11, 2018

Test build #91672 has finished for PR 21531 at commit 658539a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@icexelloss
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 12, 2018

Test build #91713 has finished for PR 21531 at commit 658539a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@icexelloss
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 12, 2018

Test build #91719 has finished for PR 21531 at commit 658539a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link

@maryannxue-databricks maryannxue-databricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nits inlined. Other than that, LGTM.


assertCached(df2)

failAfter(5 seconds) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here, like "udf has been evaluated during caching, and thus should not be re-evaluated."


test("persist and then withColumn") {
val df = Seq(("test", 1)).toDF("s", "i")
// We should not invalidate the cached DataFrame

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment? Or what is it were you trying to say?

@icexelloss
Copy link
Contributor Author

Thanks for the review! I addressed the comments

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91800 has finished for PR 21531 at commit c9db68d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

Jenkins, retest this please

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91819 has finished for PR 21531 at commit c9db68d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@icexelloss
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91848 has finished for PR 21531 at commit c9db68d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master/2.3

@asfgit asfgit closed this in a78a904 Jun 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants