-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite #21531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var evalCount = 0 | ||
val myUDF = udf((x: String) => { evalCount += 1; "result" }) | ||
val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s")) | ||
df.cache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is when evalCount is incremented (because the optimizer will execute the UDF locally on driver), the rest of the tests have no effect on the value of evalCount.
evalCount
is copied (by value) to the closure and therefore it's not possible to change the value evalCount
defined in the driver by incrementing it in the UDF (because each closure has a different copy of evalCount
)
cc @maryannxue |
Test build #91672 has finished for PR 21531 at commit
|
retest this please |
Test build #91713 has finished for PR 21531 at commit
|
retest this please |
Test build #91719 has finished for PR 21531 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor nits inlined. Other than that, LGTM.
|
||
assertCached(df2) | ||
|
||
failAfter(5 seconds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here, like "udf has been evaluated during caching, and thus should not be re-evaluated."
|
||
test("persist and then withColumn") { | ||
val df = Seq(("test", 1)).toDF("s", "i") | ||
// We should not invalidate the cached DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this comment? Or what is it were you trying to say?
Thanks for the review! I addressed the comments |
Test build #91800 has finished for PR 21531 at commit
|
Jenkins, retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #91819 has finished for PR 21531 at commit
|
retest this please |
Test build #91848 has finished for PR 21531 at commit
|
LGTM Thanks! Merged to master/2.3 |
What changes were proposed in this pull request?
test("withColumn doesn't invalidate cached dataframe") in CachedTableSuite doesn't not work because:
The UDF is executed and test count incremented when "df.cache()" is called and the subsequent "df.collect()" has no effect on the test result.
This PR fixed this test and add another test for caching UDF.
How was this patch tested?
Add new tests.