Skip to content

Conversation

srowen
Copy link
Member

@srowen srowen commented Jun 6, 2017

What changes were proposed in this pull request?

Don't leave thread pool running from AlterTableRecoverPartitionsCommand DDL command

How was this patch tested?

Existing tests.

@srowen
Copy link
Member Author

srowen commented Jun 6, 2017

@davies could you help me review as you wrote the original code?
CC @viirya because it's similar to #18100

@SparkQA
Copy link

SparkQA commented Jun 6, 2017

Test build #77773 has finished for PR 18216 at commit 27a76c1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 6, 2017

Test build #77774 has finished for PR 18216 at commit ca6d3cd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 6, 2017

Test build #3779 has finished for PR 18216 at commit ca6d3cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jun 6, 2017

The change looks good. Shall we add a test for it?

@viirya
Copy link
Member

viirya commented Jun 6, 2017

Ah, I remember that it is a bit hard to automatically test it.

@viirya
Copy link
Member

viirya commented Jun 7, 2017

The fix looks straightforward as previous fix. LGTM. Should be better if we can manually verify it like before. cc @cloud-fan

@cloud-fan
Copy link
Contributor

LGTM, can we add a manual test like #18100 ?

@srowen
Copy link
Member Author

srowen commented Jun 10, 2017

@viirya / @cloud-fan what do you mean here by a manual test?

@viirya
Copy link
Member

viirya commented Jun 10, 2017

@srowen In #18073, I ran the code snippet which can leak the pools, and checked the number of threads before/after the fix by using some tools like jconsole.

@srowen
Copy link
Member Author

srowen commented Jun 11, 2017

OK, I tried this as a test:

import scala.collection.JavaConverters._

spark.sql("DROP TABLE IF EXISTS my_tab")
spark.sql("CREATE TABLE my_tab(a INT, b STRING, c INT, d String) USING parquet PARTITIONED BY (a, b, c)")
spark.sql("INSERT INTO TABLE my_tab VALUES (1, 'foo', 4, 'bing')")
spark.sql("INSERT INTO TABLE my_tab VALUES (2, 'bar', 5, 'bing')")
spark.sql("INSERT INTO TABLE my_tab VALUES (3, 'baz', 6, 'bing')")

Thread.getAllStackTraces.keySet().asScala.map(_.getName).filter(_.contains("ForkJoinPool"))

res5: scala.collection.mutable.Set[String] = Set(ForkJoinPool-1-worker-13)

spark.sql("ALTER TABLE my_tab RECOVER PARTITIONS")

Thread.getAllStackTraces.keySet().asScala.map(_.getName).filter(_.contains("ForkJoinPool"))

res10: scala.collection.mutable.Set[String] = Set(ForkJoinPool-1-worker-13, ForkJoinPool-1-worker-9, ForkJoinPool-2-worker-13)

spark.sql("ALTER TABLE my_tab RECOVER PARTITIONS")
spark.sql("ALTER TABLE my_tab RECOVER PARTITIONS")
spark.sql("ALTER TABLE my_tab RECOVER PARTITIONS")

res23: scala.collection.mutable.Set[String] = Set(ForkJoinPool-5-worker-13, ForkJoinPool-3-worker-13, ForkJoinPool-2-worker-13, ForkJoinPool-1-worker-11, ForkJoinPool-4-worker-13, ForkJoinPool-1-worker-3)

Before the change you can see that it does look like one thread is left from each of several ForkJoinPools.

After the change, the result is Set(ForkJoinPool-1-worker-13) every time. I think that does suggest this fixes the issue.

@viirya
Copy link
Member

viirya commented Jun 13, 2017

Thanks @srowen. I think it looks good.

asfgit pushed a commit that referenced this pull request Jun 13, 2017
…bles with many partitions

## What changes were proposed in this pull request?

Don't leave thread pool running from AlterTableRecoverPartitionsCommand DDL command

## How was this patch tested?

Existing tests.

Author: Sean Owen <[email protected]>

Closes #18216 from srowen/SPARK-20920.

(cherry picked from commit 7b7c85e)
Signed-off-by: Sean Owen <[email protected]>
@srowen
Copy link
Member Author

srowen commented Jun 13, 2017

Merged to master/2.2/2.1

@asfgit asfgit closed this in 7b7c85e Jun 13, 2017
asfgit pushed a commit that referenced this pull request Jun 13, 2017
…bles with many partitions

## What changes were proposed in this pull request?

Don't leave thread pool running from AlterTableRecoverPartitionsCommand DDL command

## How was this patch tested?

Existing tests.

Author: Sean Owen <[email protected]>

Closes #18216 from srowen/SPARK-20920.

(cherry picked from commit 7b7c85e)
Signed-off-by: Sean Owen <[email protected]>
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
…bles with many partitions

## What changes were proposed in this pull request?

Don't leave thread pool running from AlterTableRecoverPartitionsCommand DDL command

## How was this patch tested?

Existing tests.

Author: Sean Owen <[email protected]>

Closes apache#18216 from srowen/SPARK-20920.
@srowen srowen deleted the SPARK-20920 branch June 16, 2017 10:03
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
…bles with many partitions

Don't leave thread pool running from AlterTableRecoverPartitionsCommand DDL command

Existing tests.

Author: Sean Owen <[email protected]>

Closes apache#18216 from srowen/SPARK-20920.

(cherry picked from commit 7b7c85e)
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants