[SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell #22919
Conversation
cc @dbtsai, @dongjoon-hyun, @gatorsmile and @cloud-fan

Note that some commands like

cc @felixcheung as well. This is a similar code path to apache/zeppelin#3206.

Test build #98350 has finished for PR 22919 at commit
Hmm, I think this follows the previous behavior, but I'm wondering: should we follow the Scala REPL and replace `-i` with `-I`?

That's exactly what I initially thought. At least we could do that in 3.0.0, but I think we should keep it as is in 2.4.x.
So we would support both `-i` and `-I`?

Yup, that's what I thought, and treat them as the same.
Personally, I think the Spark shell should be consistent with the upstream Scala shell; otherwise we may get another ticket complaining about why we didn't follow it...

I was actually thinking hard about that. My conclusion was that basically we should, but in 3.0.0 I actually think we should disallow both.

Note that

Also, there is the difference introduced between `-i` and `-I`. I don't feel super strongly about it, but it was my conclusion. I will defer to you guys here about whether to go ahead in this way or not.
+1 for @HyukjinKwon's opinion. I also want to know the community's decision.

I'm also on @cloud-fan's side: we should keep it consistent with the upstream Scala shell. However, we should document it in `--help`.

Thanks, @dbtsai. That sounds reasonable and helpful for the users.
Tough call. At least it's worth documenting that this behavior changed because so did the Scala shell's behavior, and I'd support that. I'd also support supporting both, if there's no real downside to that option? |
This is weird. The most obvious downside is that there's no way to get the new `-i` behavior.

True, but it's a bit difficult to say it's a downside. For documentation, I can mention that we can use the Scala shell flags. Looks like @cloud-fan is going to leave this JIRA as a known issue, so I guess we are good. Maybe I can leave an additional note in the migration guide as well if we're not going ahead with a fix.

Let me go ahead and document it tomorrow, then.
Force-pushed from 17b3c6c to 0e318ac.
```cmd
rem two empty lines are required
set _SPARK_CMD_USAGE=Usage: .\bin\spark-shell.cmd [options]^%LF%%LF%^%LF%%LF%^
Scala REPL options:^%LF%%LF%^
```
There seems to be no cleverer way than this to set newlines in variables in batch files.
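For reference, here is a minimal standalone version of the trick; the variable name `USAGE` and the `mytool` usage text are made up for illustration:

```cmd
@echo off
setlocal enabledelayedexpansion

rem Capture a single linefeed character in LF: the caret escapes the line
rem break, and the two empty lines after this "set" are required.
set LF=^


rem Each trailing ^%LF%%LF%^ embeds one newline in the value and continues
rem the "set" statement onto the next physical line.
set USAGE=Usage: mytool [options]^%LF%%LF%^
Extra options:^%LF%%LF%^
  -I file    preload file, enforcing line-by-line interpretation

rem Delayed expansion preserves the embedded newlines when printing.
echo(!USAGE!
endlocal
```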
Should be ready for review.
Script-specific information looks included in `_SPARK_CMD_USAGE`; it seems a more appropriate place than somewhere in `SparkSubmitArguments.printUsageAndExit`.

Test build #98425 has finished for PR 22919 at commit

Let me get this into master and branch-2.4 in a few days if there are no more comments.
```bash
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]
Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation"
```
Where do we define the other options?
I tested the other options, and this one looks like the only valid one. I described it in the PR description.
I mean, I didn't find

```
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
```

in the shell script. Where do we define them?
Oh, haha, sorry. That's in `SparkSubmitArguments.printUsageAndExit`. That's why I left #22919 (comment).
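For anyone following along, a simplified sketch of how the pieces fit together (not the verbatim `bin/spark-shell` script): the shell exports the REPL-specific usage header in `_SPARK_CMD_USAGE`, and the generic `Options:` section is appended by `SparkSubmitArguments.printUsageAndExit` on the JVM side.

```bash
#!/usr/bin/env bash
# Sketch: spark-shell overrides the usage header via an environment
# variable, then delegates everything else (including --help) to
# spark-submit, which prints _SPARK_CMD_USAGE plus the generic options.
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]

Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation"

# spark-shell is essentially spark-submit with the REPL main class.
exec "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
```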
Shall we also document the `-i` behavior here? I think for now this option is also accepted by the REPL.
Yeah, but `-i` doesn't handle implicits like `toDF` or symbols, which are pretty basic ones (unless they're explicitly imported within the user's program). I think we'd better avoid documenting it for now.
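A quick way to see the difference (`init.scala` is a hypothetical file name; `toDF` comes from the shell's implicit import of `spark.implicits._`):

```bash
# The script relies on the shell's predefined `spark` session and its
# implicit imports (e.g. the toDF extension method).
cat > init.scala <<'EOF'
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
df.show()
EOF

# Works: -I interprets the file line by line, with the shell's implicits in scope.
./bin/spark-shell -I init.scala

# Fails to resolve toDF: since Scala 2.11.12, -i pastes the whole file,
# which bypasses those implicits unless the script imports them itself.
./bin/spark-shell -i init.scala
```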
Merged to master and branch-2.4.
[SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell
## What changes were proposed in this pull request?
This PR documents the `-I` option from Spark 2.4.x (previously the `-i` option, until Spark 2.3.x).
After we upgraded Scala to 2.11.12, the `-i` option (`:load` behavior) was replaced by `-I` (SI-7898). The existing `-i` became `:paste`, which does not respect Spark's implicit imports (for instance `toDF`, symbols as columns, etc.). Therefore, the `-i` option does not work correctly from Spark 2.4.x, and it is not documented.
I checked the other Scala REPL options, but from quick tests they look either not applicable or not working. This PR only documents `-I` for now.
## How was this patch tested?
Manually tested.
**Mac:**
```bash
$ ./bin/spark-shell --help
Usage: ./bin/spark-shell [options]
Scala REPL options:
-I <file> preload <file>, enforcing line-by-line interpretation
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
...
```
**Windows:**
```cmd
C:\...\spark>.\bin\spark-shell --help
Usage: .\bin\spark-shell.cmd [options]
Scala REPL options:
-I <file> preload <file>, enforcing line-by-line interpretation
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
...
```
Closes #22919 from HyukjinKwon/SPARK-25906.
Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit cc38abc)
Signed-off-by: hyukjinkwon <[email protected]>