-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-26080][PYTHON] Skips Python resource limit on Windows in Python worker #23055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @rdblue, @vanzin and @haydenjeune |
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
Outdated
Show resolved
Hide resolved
|
Thanks for fixing this so quickly, @HyukjinKwon! I'd like a couple of changes, but overall it is going in the right direction. We should also plan on porting this to the 2.4 branch when it is committed since it is a regression. |
|
Test build #98892 has finished for PR 23055 at commit
|
|
Test build #98952 has finished for PR 23055 at commit
|
|
retest this please |
|
Test build #98962 has finished for PR 23055 at commit
|
|
adding @BryanCutler and @ueshin as well FYI. |
|
btw, there is a typo: |
|
Retest this please. |
|
Test build #99120 has finished for PR 23055 at commit
|
|
Sorry, may I ask to take another look please? I thought it's quite a straightforward fix by a consistent way of fixing. |
|
Test build #99148 has finished for PR 23055 at commit
|
|
retest this please |
|
Let me get this in in few days if there are no more comments. |
|
Test build #99278 has finished for PR 23055 at commit
|
|
Test build #99490 has finished for PR 23055 at commit
|
|
Test build #99491 has finished for PR 23055 at commit
|
|
+1 once the docs are updated to note that resource requests still include python memory, even in Windows. |
|
@vanzin and @rdblue, I updated the doc because it sounds not wrong to me. But, for clarification, we shouldn't really document we support something that's not tested (in particular such case above that the failure case was found). Also, IMHO, it's better to make it simple when there's Windows issue in terms of maintenance since not so many Windows maintainers exist in Spark. The main functionality of that configuration is limiting resource if i'm not mistaken. The allocation ones in some modes is secondary if I am not mistaken. Technically someone should test it and fix the doc with showing how it works in another PR. |
|
Test build #99531 has finished for PR 23055 at commit
|
|
+1 with the latest changes. Thanks for taking care of this, @HyukjinKwon! Functionality is in two parts: changing the resource requests (which doesn't change) and limiting memory use in python. It is too bad that this broke, but I'm not sure how to deal with a platform that, as you say, has few contributors. I certainly wouldn't want to gate a feature like this on making sure someone tested it in Windows, unless we have CI set up for Windows builds. |
|
Thanks, @rdblue. Merged to master and branch-2.4. |
|
@HyukjinKwon, for the future, I should note that I'm not a committer so my +1 for a PR is not binding. I'm fairly sure @vanzin would +1 this commit as well, but it's best not to merge based on my approval. |
|
(Belated +1.) Doc update looks fine. The previous one was misleading for reasons that Ryan explains above, it has nothing to do with whether it's Windows or not. |
|
Last comment was a minor comment for a doc - actually the whole point was a minor one. It does related with Windows. |
|
I am new to this. So this may be a simple/stupid question. When is this fix getting released? I suppose this fix is planned for Apache Spark 2.4.1 for which pre-built binaries are not yet available for download (sounds right?). If I need this fix (even if it is not a stable release) how do I get the pre-built binaries for 2.4.1 release? |
|
@manojfaria , this fix will be in the 2.4.1 and 3.0.0 releases, see SPARK-26080. If you want it before then, you will have to apply the patch and build Spark yourself. The patch can be downloaded by adding a |
|
@HyukjinKwon thanks for the fix. Workers run well with 2.4.0 now. |
What changes were proposed in this pull request?
resourcepackage is a Unix specific package. See https://docs.python.org/2/library/resource.html and https://docs.python.org/3/library/resource.html.Note that we document Windows support:
This should be backported into branch-2.4 to restore Windows support in Spark 2.4.1.
How was this patch tested?
Manually mocking the changed logics.