[SPARK-11653][Deploy] Allow spark-daemon.sh to run in the foreground #15338
Conversation
…reground operations.
Thanks for working on this. Looks good overall, I have some minor suggestions.
sbin/spark-daemon.sh (outdated):

```
execute_command() {
  command="$@"
  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
```
This only checks whether the variable is non-empty. Setting it to the empty string will not work; e.g. `export SPARK_NO_DAEMONIZE=''` will not satisfy the condition.

Something like `if [ -z ${SPARK_NO_DAEMONIZE+set} ];` will also handle the null case (see this SO thread). Another, more elegant, solution would be to use `[[ -v SPARK_NO_DAEMONIZE ]]`, however that requires at least bash 4.2 and is probably not available on all systems that Spark runs on.
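For illustration, here is a minimal sketch of the difference between the two tests (the variable name is arbitrary; this uses `-n`, the complement of the `-z` test above, so that "set" is the positive case):

```shell
FLAG=''                               # set, but empty

# The non-empty test treats an empty-but-set variable as if it were unset:
[ "$FLAG" != "" ] && echo "non-empty: yes" || echo "non-empty: no"

# ${FLAG+set} expands to "set" whenever FLAG is set at all, even to '':
[ -n "${FLAG+set}" ] && echo "plus-expansion: set" || echo "plus-expansion: unset"

unset FLAG
[ -n "${FLAG+set}" ] && echo "plus-expansion: set" || echo "plus-expansion: unset"
```

Running this prints `non-empty: no`, then `plus-expansion: set`, then `plus-expansion: unset`: only the `${FLAG+set}` form distinguishes set-but-empty from unset.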
sbin/spark-daemon.sh (outdated):

```
execute_command() {
  command="$@"
  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
    eval $command
```
I don't think eval is required here
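As a sketch of the alternative (the function name is illustrative, not the PR's code): when the arguments are kept as separate words, they can be executed directly via `"$@"` with no eval:

```shell
# Illustrative: run the function's arguments directly as a command.
run_it() {
  "$@"    # first argument is the program, the rest are its arguments
}

run_it echo "hello world"    # prints: hello world
```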
sbin/spark-daemon.sh (outdated):

```
  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
    eval $command
  else
    eval "nohup $command >> \"$log\" 2>&1 < /dev/null &"
```
same here
sbin/spark-daemon.sh (outdated):

```
  (class)
    nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
    newpid="$!"
    execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-class\" $command $@"
```
The outer-most quotes aren't required without the eval; `execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}/bin/spark-class" $command $@` should do the trick.
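To illustrate why the outer quotes only make sense together with eval (names here are hypothetical, not the PR's code): without eval, a command wrapped into a single string is looked up as one program name and fails, while separate words run fine:

```shell
runner() { "$@"; }

# One flat string: the whole of "echo hello" is treated as a program name.
runner "echo hello" 2>/dev/null || echo "flat string: command not found"

# Separate words: runs as expected.
runner echo hello
```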
sbin/spark-daemon.sh (outdated):

```
  (submit)
    nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
    newpid="$!"
    execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-submit\" --class $command $@"
```
same as above
sbin/spark-daemon.sh (outdated):

```
  fi

execute_command() {
  command="$@"
```
Spark scripts require bash as interpreter, so we can make this a local variable
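A small bash-only sketch of what `local` buys here (names illustrative): the function's assignment no longer leaks into the caller's scope:

```shell
command="outer"

set_cmd() {
  local command="$*"      # bash-ism: confined to this function
  echo "inside: $command"
}

set_cmd some args
echo "after: $command"    # prints: after: outer
```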
Jenkins add to whitelist

Test build #66759 has finished for PR 15338 at commit
… make variables local.
Thanks for the review @srowen. Those changes are in.

Test build #66770 has finished for PR 15338 at commit
If we are adding a new variable that is now part of the interface, we should probably also add it to spark-env.sh.template.
That said, I am generally not a fan of adding new interface options via environment variables (as opposed to via explicit command-line options) even though I appreciate that environment variables are the easiest to use at this time in this script.
I would rather we used getopt() or even converted this script to Python, but that's probably more work than we may be looking to do at this time. Anyway, I just wanted to raise this point since, if we add this in, we are committing to it through the 2.x lifecycle and probably beyond.
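For comparison, a command-line flag parsed with bash's built-in getopts might look roughly like this; the `-f` (foreground) flag is purely illustrative, not something this PR adds:

```shell
# Hypothetical sketch: a foreground flag via getopts instead of an env var.
foreground=false
while getopts "f" opt; do
  case "$opt" in
    f) foreground=true ;;
    *) echo "usage: $0 [-f] <command>" >&2; exit 1 ;;
  esac
done
shift $((OPTIND - 1))            # drop the parsed options
echo "foreground=$foreground args=$*"
```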
sbin/spark-daemon.sh (outdated):

```
  (class)
    nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
    newpid="$!"
    execute_command nice -n $SPARK_NICENESS ${SPARK_HOME}/bin/spark-class $command $@
```
If SPARK_HOME contains spaces, this will break. I recommend quoting both SPARK_HOME and SPARK_NICENESS as they were before.
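The breakage is easy to reproduce with a hypothetical install path containing a space:

```shell
count_args() { echo "$#"; }

dir="/opt/spark home"                 # hypothetical SPARK_HOME with a space

count_args "$dir"/bin/spark-class     # quoted: 1 argument
count_args  $dir/bin/spark-class      # unquoted: word-split into 2 arguments
```

The first call prints 1, the second 2: word splitting turns the unquoted path into two separate arguments.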
sbin/spark-daemon.sh (outdated):

```
  (submit)
    nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
    newpid="$!"
    execute_command nice -n $SPARK_NICENESS bash ${SPARK_HOME}/bin/spark-submit --class $command $@
```
Same here: I would quote the SPARK_ environment variables.
sbin/spark-daemon.sh (outdated):

```
echo "$newpid" > "$pid"

#Poll for up to 5 seconds for the java process to start
```
Nit: Space after #. (I know it was like this before your PR.)
… spark-env.sh.template file
LGTM, given my earlier caveats about the interface. That's probably something for a committer to comment on.
Test build #66969 has finished for PR 15338 at commit
You raise a good point about having to commit to supporting environment variables in future versions. A command-line option would be preferable as a long-term solution. Nevertheless, command-line options and environment variables aren't mutually exclusive, so it would also be possible to merge this as it stands and add command-line options later. I guess the question comes down to how urgent this feature is and whether it warrants the extra maintenance.
Agree that env variables are not a great way to do anything, but that ship has sailed. I don't think we'd remove the 10 that are already here, and if this changed, it would be a total rewrite. For that reason I don't mind piling on the problem a tiny bit to add this feature.
Thanks for the input guys. I'm new to the community, so I'm unfamiliar …
The practical answer is that someone who can commit it has to want to commit it, with nobody objecting. I can commit and am mildly in favor. I tend to leave it open a day or two to let any discussion play out and make the change most acceptable to anyone interested enough to comment, and then merge it if there are no objections.

no objections from me ;)

Merged to master
…ows Unrecognized option

## What changes were proposed in this pull request?

spark-daemon.sh will lose the single quotes around arguments after #15338, as follows:

```
execute_command nice -n 0 bash /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
```

With this fix, as follows:

```
execute_command nice -n 0 bash /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 'Thrift JDBC/ODBC Server' --conf 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'
```

## How was this patch tested?

- Manual tests
- Build the package and start-thriftserver.sh with `--conf 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'`

Author: Yuming Wang <[email protected]>

Closes #16079 from wangyum/SPARK-18645.

(cherry picked from commit 2ab8551)
Signed-off-by: Sean Owen <[email protected]>
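The root cause can be sketched in a few lines (function names illustrative): flattening `"$@"` into a single string erases the argument boundaries that the caller's quotes established:

```shell
count_args() { echo "$#"; }

flatten_then_run() {
  command="$@"          # boundaries lost here
  count_args $command   # unquoted re-expansion word-splits the string
}

pass_through() {
  count_args "$@"       # boundaries preserved
}

flatten_then_run --name "Thrift JDBC/ODBC Server"   # counts 4 words
pass_through     --name "Thrift JDBC/ODBC Server"   # counts 2 arguments
```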
## What changes were proposed in this pull request?

Add a SPARK_NO_DAEMONIZE environment variable flag to spark-daemon.sh that causes the process it would run to be run in the foreground.

It looks like there has been some prior work in #3881, but there was some talk about these being refactored. I'm not sure whether that happened, but that PR is almost 2 years old at this point, so it was worth revisiting.

## How was this patch tested?

./dev/run-tests still seems to work. It doesn't look like these scripts have tests, but if I missed them just let me know.