
Conversation

@HyukjinKwon
Member

HyukjinKwon commented Nov 16, 2018

What changes were proposed in this pull request?

The resource package is a Unix-specific package. See https://docs.python.org/2/library/resource.html and https://docs.python.org/3/library/resource.html.
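The change guards the worker's rlimit logic behind an import check. A minimal sketch of that pattern (illustrative names, not the exact worker.py code):

```python
# Sketch of the guard pattern for a Unix-only module: import 'resource'
# defensively and skip the rlimit logic when it is unavailable (e.g. Windows).
try:
    import resource
    has_resource_module = True
except ImportError:
    has_resource_module = False

def set_python_memory_limit(memory_limit_mb):
    """Best-effort address-space limit; a no-op where 'resource' is missing."""
    if memory_limit_mb > 0 and has_resource_module:
        new_limit = memory_limit_mb * 1024 * 1024  # MiB -> bytes
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if soft == resource.RLIM_INFINITY or new_limit < soft:
            resource.setrlimit(resource.RLIMIT_AS, (new_limit, new_limit))
```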

Note that we document Windows support:

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS).

This should be backported into branch-2.4 to restore Windows support in Spark 2.4.1.

How was this patch tested?

Manually tested by mocking the changed logic.
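One way to exercise the Windows path on a Unix machine is to hide the resource module and reload the worker. A hypothetical sketch (it assumes Python 3 and that the worker module exposes a has_resource_module flag as in this patch):

```python
import importlib
import sys

import pyspark.worker as worker

# Setting a sys.modules entry to None makes the next 'import resource'
# raise ImportError (Python 3), approximating what happens on Windows.
sys.modules["resource"] = None
try:
    importlib.reload(worker)  # re-runs the worker's guarded import
    assert worker.has_resource_module is False
finally:
    del sys.modules["resource"]
    importlib.reload(worker)  # restore the normal state
```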

@HyukjinKwon
Member Author

cc @rdblue, @vanzin and @haydenjeune

HyukjinKwon changed the title from "[SPARK-26080][SQL] Disable 'spark.executor.pyspark.memory' always on Windows" to "[SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.memory' always on Windows" Nov 16, 2018
@rdblue
Contributor

rdblue commented Nov 16, 2018

Thanks for fixing this so quickly, @HyukjinKwon! I'd like a couple of changes, but overall it is going in the right direction.

We should also plan on porting this to the 2.4 branch when it is committed since it is a regression.

@SparkQA

SparkQA commented Nov 16, 2018

Test build #98892 has finished for PR 23055 at commit 2d3315a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 17, 2018

Test build #98952 has finished for PR 23055 at commit 52a91cc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Nov 17, 2018

Test build #98962 has finished for PR 23055 at commit 52a91cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Adding @BryanCutler and @ueshin as well, FYI.

@viirya
Member

viirya commented Nov 18, 2018

BTW, there is a typo in the PR description: "a Unit specific package".

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Nov 21, 2018

Test build #99120 has finished for PR 23055 at commit 52a91cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Sorry, may I ask for another look, please? I think it's quite a straightforward fix, done in a way consistent with how similar issues have been fixed.

@SparkQA

SparkQA commented Nov 22, 2018

Test build #99148 has finished for PR 23055 at commit fd92a4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@HyukjinKwon
Member Author

Let me get this in in a few days if there are no more comments.

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99278 has finished for PR 23055 at commit fd92a4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon changed the title from "[SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.memory' always on Windows" to "[SPARK-26080][PYTHON] Skips Python resource limit on Windows in Python worker" Nov 30, 2018
@SparkQA

SparkQA commented Nov 30, 2018

Test build #99490 has finished for PR 23055 at commit 741d64a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 30, 2018

Test build #99491 has finished for PR 23055 at commit bd1acf2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor

rdblue commented Nov 30, 2018

+1 once the docs are updated to note that resource requests still include Python memory, even on Windows.

@HyukjinKwon
Member Author

HyukjinKwon commented Dec 1, 2018

@vanzin and @rdblue, I updated the doc because the suggestion doesn't sound wrong to me. But, for clarification, we shouldn't really document support for something that isn't tested (in particular in a case like this one, where a failure was actually found). Also, IMHO, when there's a Windows issue it's better to keep things simple in terms of maintenance, since Spark doesn't have many Windows maintainers.

If I'm not mistaken, the main functionality of that configuration is limiting resources; the allocation behavior in some modes is secondary. Technically, someone should test it and fix the doc to show how it works in another PR.
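For reference, this is the configuration under discussion; a minimal usage sketch (the app name and the 2g value are made up):

```python
from pyspark.sql import SparkSession

# Request a 2 GiB Python memory limit per executor. On Unix-like systems
# the Python worker enforces it via resource.setrlimit; with this patch,
# Windows workers simply skip that step.
spark = (
    SparkSession.builder
    .appName("pyspark-memory-demo")
    .config("spark.executor.pyspark.memory", "2g")
    .getOrCreate()
)
```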

@SparkQA

SparkQA commented Dec 1, 2018

Test build #99531 has finished for PR 23055 at commit 5fe1c09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor

rdblue commented Dec 1, 2018

+1 with the latest changes. Thanks for taking care of this, @HyukjinKwon!

The functionality is in two parts: changing the resource requests (which this patch doesn't change) and limiting memory use in Python.

It is too bad that this broke, but I'm not sure how to deal with a platform that, as you say, has few contributors. I certainly wouldn't want to gate a feature like this on making sure someone tested it on Windows, unless we have CI set up for Windows builds.

@HyukjinKwon
Member Author

Thanks, @rdblue.

Merged to master and branch-2.4.

asfgit closed this in 9cda9a8 Dec 2, 2018
@rdblue
Contributor

rdblue commented Dec 3, 2018

@HyukjinKwon, for the future, I should note that I'm not a committer so my +1 for a PR is not binding. I'm fairly sure @vanzin would +1 this commit as well, but it's best not to merge based on my approval.

@vanzin
Contributor

vanzin commented Dec 3, 2018

(Belated +1.) The doc update looks fine. The previous one was misleading for the reasons Ryan explains above; it has nothing to do with whether it's Windows or not.

@HyukjinKwon
Member Author

The last comment was a minor comment about the doc - actually the whole point was a minor one. It does relate to Windows.

@manojfaria

manojfaria commented Feb 22, 2019

I am new to this, so this may be a simple/stupid question. When is this fix getting released? I suppose it is planned for Apache Spark 2.4.1, for which pre-built binaries are not yet available for download (sounds right?). If I need this fix before a stable release is out, how do I get pre-built binaries for 2.4.1?

@BryanCutler
Member

@manojfaria, this fix will be in the 2.4.1 and 3.0.0 releases; see SPARK-26080. If you want it before then, you will have to apply the patch and build Spark yourself. The patch can be downloaded by appending .diff to the pull request URL. Hope that helps!
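A quick sketch of that workflow (the local file name is arbitrary; the build command assumes a Spark source checkout):

```python
import urllib.request

# GitHub serves a unified diff when ".diff" is appended to a PR URL.
url = "https://github.com/apache/spark/pull/23055.diff"
urllib.request.urlretrieve(url, "SPARK-26080.diff")

# Then, from a Spark source checkout:
#   git apply SPARK-26080.diff
#   ./build/mvn -DskipTests clean package
```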

@parshvms

@HyukjinKwon thanks for the fix. Workers run well with 2.4.0 now.
