Conversation

@mikejihbe
Contributor

What changes were proposed in this pull request?

Add a SPARK_NO_DAEMONIZE environment variable flag to spark-daemon.sh that causes the process it would run to be run in the foreground.

It looks like there has been some prior work in #3881, but there was some talk about these being refactored. I'm not sure if that happened or not, but that PR is almost 2 years old at this point so it was worth revisiting.

How was this patch tested?

./dev/run-tests still seems to work. It doesn't look like these scripts have tests, but if I missed them just let me know.

Member

@jodersky jodersky left a comment


Thanks for working on this. Looks good overall, I have some minor suggestions.


```
execute_command() {
  command="$@"
  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
```
Member

@jodersky jodersky Oct 11, 2016


This only checks that the variable is non-empty, so setting it to the empty string will not work: e.g. `export SPARK_NO_DAEMONIZE=''` does not satisfy the condition.

Something like `if [ -n "${SPARK_NO_DAEMONIZE+set}" ]` handles the null case as well (see this SO thread). A more elegant solution would be `[[ -v SPARK_NO_DAEMONIZE ]]`, but that requires at least bash 4.2 and is probably not available on all systems that Spark runs on.
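The `${VAR+set}` expansion discussed above can be sketched in a few lines (illustrative shell, not Spark code): it expands to `set` whenever the variable is set, even to the empty string, and to nothing when the variable is unset.

```shell
unset SPARK_NO_DAEMONIZE
echo "[${SPARK_NO_DAEMONIZE+set}]"   # prints []: variable is unset

SPARK_NO_DAEMONIZE=''
echo "[${SPARK_NO_DAEMONIZE+set}]"   # prints [set]: empty still counts as set

if [ -n "${SPARK_NO_DAEMONIZE+set}" ]; then
  echo "variable is set"             # reached even for the empty string
fi
```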

```
execute_command() {
  command="$@"
  if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
    eval $command
```
Member


I don't think `eval` is required here.

```
if [ "$SPARK_NO_DAEMONIZE" != "" ]; then
  eval $command
else
  eval "nohup $command >> \"$log\" 2>&1 < /dev/null &"
```
Member


same here

(class)
```
nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
newpid="$!"
execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-class\" $command $@"
```
Member

@jodersky jodersky Oct 11, 2016


The outermost quotes aren't required without the `eval`.

`execute_command nice -n "$SPARK_NICENESS" "${SPARK_HOME}/bin/spark-class" $command $@` should do the trick.
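A minimal sketch of how this could look without `eval` (illustrative, not the actual spark-daemon.sh; `$log` stands in for the script's log file): when the command arrives as separate arguments, the function can run it directly via `"$@"` with no string re-parsing.

```shell
execute_command() {
  if [ -n "${SPARK_NO_DAEMONIZE+set}" ]; then
    "$@"                                      # run in the foreground
  else
    # daemonize as before; $log is the daemon's log file
    nohup "$@" >> "$log" 2>&1 < /dev/null &
    newpid="$!"
  fi
}

# Foreground path, for illustration:
SPARK_NO_DAEMONIZE=1
log=/tmp/example.log
execute_command echo "running in foreground"  # prints: running in foreground
```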

(submit)
```
nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
newpid="$!"
execute_command "nice -n \"$SPARK_NICENESS\" \"${SPARK_HOME}/bin/spark-submit\" --class $command $@"
```
Member


same as above

fi

```
execute_command() {
  command="$@"
```
Member

@jodersky jodersky Oct 11, 2016


Spark scripts require bash as the interpreter, so we can make this a local variable.

@srowen
Member

srowen commented Oct 11, 2016

Jenkins add to whitelist

@SparkQA

SparkQA commented Oct 11, 2016

Test build #66759 has finished for PR 15338 at commit cb89755.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • execute_command \"nice -n \"$SPARK_NICENESS\" \"$

@mikejihbe
Contributor Author

Thanks for the review @srowen. Those changes are in.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #66770 has finished for PR 15338 at commit 42c9874.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • execute_command nice -n $SPARK_NICENESS $
    • execute_command nice -n $SPARK_NICENESS bash $

@jodersky
Member

LGTM. Let's see if @nchammas has some feedback. He had some very helpful comments in #3881

Contributor

@nchammas nchammas left a comment


If we are adding a new variable that is now part of the interface, we should probably also add it to spark-env.sh.template.

That said, I am generally not a fan of adding new interface options via environment variables (as opposed to via explicit command-line options) even though I appreciate that environment variables are the easiest to use at this time in this script.

I would rather we used getopt() or even converted this script to Python, but that's probably more work than we may be looking to do at this time. Anyway, I just wanted to raise this point since, if we add this in, we are committing to it through the 2.x lifecycle and probably beyond.
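For illustration, a hypothetical short flag could be parsed with bash's built-in `getopts` rather than an environment variable (a sketch only; none of this is in the merged patch, and `parse_args`, `-f`, `FOREGROUND`, and `REMAINING` are made-up names):

```shell
# Hypothetical sketch: a -f (foreground) CLI flag instead of SPARK_NO_DAEMONIZE.
parse_args() {
  FOREGROUND=false
  local OPTIND opt
  while getopts "f" opt; do
    case "$opt" in
      f) FOREGROUND=true ;;
    esac
  done
  shift $((OPTIND - 1))
  REMAINING=("$@")             # the command to run, after flag parsing
}

parse_args -f start-master
echo "$FOREGROUND"             # prints: true
```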

(class)
```
nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
newpid="$!"
execute_command nice -n $SPARK_NICENESS ${SPARK_HOME}/bin/spark-class $command $@
```
Contributor


If SPARK_HOME contains spaces, this will break. I recommend quoting both SPARK_HOME and SPARK_NICENESS as they were before.
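A short illustration of the breakage (hypothetical path, not Spark's defaults): an unquoted expansion is word-split on the space, so the path arrives as two arguments.

```shell
SPARK_HOME="/opt/my spark"             # hypothetical path containing a space

set -- $SPARK_HOME/bin/spark-class     # unquoted: word-split into two arguments
echo $#                                # prints: 2

set -- "$SPARK_HOME/bin/spark-class"   # quoted: one argument, as intended
echo $#                                # prints: 1
```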

(submit)
```
nohup nice -n "$SPARK_NICENESS" "${SPARK_HOME}"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
newpid="$!"
execute_command nice -n $SPARK_NICENESS bash ${SPARK_HOME}/bin/spark-submit --class $command $@
```
Contributor


Same here: I would quote the SPARK_ environment variables.


```
echo "$newpid" > "$pid"

#Poll for up to 5 seconds for the java process to start
```
Contributor


Nit: Space after #. (I know it was like this before your PR.)

Contributor

@nchammas nchammas left a comment


LGTM, given my earlier caveats about the interface. That's probably something for a committer to comment on.

@SparkQA

SparkQA commented Oct 14, 2016

Test build #66969 has finished for PR 15338 at commit 7010b28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • execute_command nice -n \"$SPARK_NICENESS\" \"$
    • execute_command nice -n \"$SPARK_NICENESS\" bash \"$

@jodersky
Member

jodersky commented Oct 14, 2016

You raise a good point about having to commit to support environment variables in future versions. A command-line option would be preferable as a long-term solution.

Nevertheless, command-line options and environment variables aren't mutually exclusive, so it would also be possible to merge this as it stands and add command-line options later. I guess the question comes down to how urgent this feature is and whether it warrants the extra maintenance.

@srowen
Member

srowen commented Oct 18, 2016

Agree that env variables are not a great way to do anything, but that ship has sailed. I don't think we'd remove the 10 that are already here, and if this changed, would be a total rewrite. For that reason I don't mind piling on the problem a tiny bit to add this feature.

@mikejihbe
Contributor Author

Thanks for the input, guys. I'm new to the community, so I'm unfamiliar with the process. Who makes the final decision here? I have a slight preference for environment variables, but if you'd like a CLI flag I can do that too.


@srowen
Member

srowen commented Oct 18, 2016

The practical answer is that someone who can commit it has to want to commit it, with nobody objecting. I can commit and am mildly in favor. I tend to leave it open a day or two to let any discussion play out and make the change most acceptable to anyone interested enough to comment, and then merge it if there are no objections.

@jodersky
Member

no objections from me ;)

@srowen
Member

srowen commented Oct 20, 2016

Merged to master

@asfgit asfgit closed this in c2c107a Oct 20, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

Add a SPARK_NO_DAEMONIZE environment variable flag to spark-daemon.sh that causes the process it would run to be run in the foreground.

It looks like there has been some prior work in apache#3881, but there was some talk about these being refactored. I'm not sure if that happened or not, but that PR is almost 2 years old at this point so it was worth revisiting.

## How was this patch tested?

./dev/run-tests still seems to work. It doesn't look like these scripts have tests, but if I missed them just let me know.

Author: Mike Ihbe <[email protected]>

Closes apache#15338 from mikejihbe/SPARK-11653.
asfgit pushed a commit that referenced this pull request Dec 1, 2016
…ows Unrecognized option

## What changes were proposed in this pull request?

spark-daemon.sh loses the single quotes around arguments after #15338, as follows:
```
execute_command nice -n 0 bash /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
```
With this fix, as follows:
```
execute_command nice -n 0 bash /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 'Thrift JDBC/ODBC Server' --conf 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'
```
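The underlying shell behavior can be sketched in a few lines (illustrative only; `count_args` is a hypothetical helper, not part of spark-daemon.sh): flattening arguments into one string and re-expanding it unquoted discards the original word boundaries, while passing real arguments with quoting preserves them.

```shell
count_args() { echo $#; }     # hypothetical helper: counts its arguments

name='Thrift JDBC/ODBC Server'
flat="--name $name"           # word boundaries are lost inside the string

count_args $flat              # prints: 4 (the name split into three words)
count_args --name "$name"     # prints: 2 (quoting preserved)
```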

## How was this patch tested?

- Manual tests
- Build the package and start-thriftserver.sh with `--conf 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'`

Author: Yuming Wang <[email protected]>

Closes #16079 from wangyum/SPARK-18645.

(cherry picked from commit 2ab8551)
Signed-off-by: Sean Owen <[email protected]>
asfgit pushed a commit that referenced this pull request Dec 1, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017