
@HyukjinKwon commented Nov 1, 2018

What changes were proposed in this pull request?

This PR documents the -I option from Spark 2.4.x (previously the -i option, until Spark 2.3.x).
After we upgraded Scala to 2.11.12, the -i option (:load) was replaced by -I (SI-7898). The existing -i became :paste, which does not respect Spark's implicit imports (for instance toDF, symbol as column, etc.). Therefore, the -i option does not work correctly from Spark 2.4.x, and it is not documented.

I checked the other Scala REPL options, but they look either not applicable or not working in quick tests. This PR only documents -I for now.
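For illustration, a preload file for -I might look like the following (file name and contents are hypothetical; the spark-shell invocation requires a Spark 2.4+ installation, so it is shown commented out):

```shell
# Hypothetical preload file for spark-shell's -I option. Because -I uses
# line-by-line interpretation (:load), Spark's implicit imports such as
# toDF are still in effect without an explicit import.
cat > /tmp/preload.scala <<'EOF'
val df = Seq(1, 2, 3).toDF("id")
df.show()
EOF

# Invocation (requires a Spark installation):
# ./bin/spark-shell -I /tmp/preload.scala
```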

How was this patch tested?

Manually tested.

Mac:

$ ./bin/spark-shell --help
Usage: ./bin/spark-shell [options]

Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
...

Windows:

C:\...\spark>.\bin\spark-shell --help
Usage: .\bin\spark-shell.cmd [options]

Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
...

@HyukjinKwon
Member Author

cc @dbtsai, @dongjoon-hyun, @gatorsmile and @cloud-fan

@HyukjinKwon HyukjinKwon changed the title [SPARK-25906][SHELL] Revive -i option in spark-shell [SPARK-25906][SHELL] Revives '-i' option in spark-shell Nov 1, 2018
@HyukjinKwon
Member Author

Note that some commands like :replay also wouldn't work if the -i option is given in the shell, since :paste itself looks broken. This PR also fixes that.

@HyukjinKwon
Member Author

cc @felixcheung as well. This follows a similar code path to apache/zeppelin#3206.

@SparkQA

SparkQA commented Nov 1, 2018

Test build #98350 has finished for PR 22919 at commit 17b3c6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Nov 1, 2018

Hmm, I think this follows the previous behavior, but I'm wondering whether we should follow the Scala REPL and replace -i with -I?

@HyukjinKwon
Member Author

That's exactly what I initially thought. At least we could do that in 3.0.0, but I think we should keep the current behavior in 2.4.x.

@HyukjinKwon HyukjinKwon changed the title [SPARK-25906][SHELL] Revives '-i' option in spark-shell [SPARK-25906][SHELL] Restores '-i' option's behaviour in spark-shell Nov 1, 2018
@cloud-fan
Contributor

so we would support both -i and -I in 2.4?

@HyukjinKwon
Member Author

Yup, that's what I thought: support both and treat them the same.

@cloud-fan
Contributor

Personally, I think the Spark shell should be consistent with the upstream Scala shell; otherwise we may get another ticket complaining about why we didn't follow it...

@HyukjinKwon
Member Author

HyukjinKwon commented Nov 1, 2018

I was actually thinking hard about that. My conclusion was that basically we should, but spark-shell is a Spark entry point that we ship in a release, so I thought it's better to keep the old behavior in 2.4.x.

In 3.0.0, I actually think we should disallow both -i and -I, like the pyspark and sparkr shells (we allowed a file argument before, but not anymore).

@HyukjinKwon
Member Author

Note that the plain Python shell takes a file but the pyspark shell doesn't:

$ cat tmp.py
print "a"
$ python2 tmp.py
a
$ ./bin/pyspark tmp.py
Running python applications through 'pyspark' is not supported as of Spark 2.0.
Use ./bin/spark-submit <python file>

@HyukjinKwon
Member Author

HyukjinKwon commented Nov 1, 2018

Also, the difference introduced between -i and -I is really a bug fix (SI-7898) for behavior that already existed in the previous shell. I ended up thinking it is more important that end users can upgrade Spark without facing such issues than syncing with the Scala 2.11.12 shell in Spark 2.4.x.

I don't feel super strongly about it, but that was my conclusion. I will defer to you all on whether to go ahead this way or not.

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 1, 2018

+1 for @HyukjinKwon's opinion. I also want to know the community's decision.

@dbtsai
Member

dbtsai commented Nov 1, 2018

I'm also on @cloud-fan's side: we should keep it consistent with the upstream Scala shell. However, we should document it in ./bin/spark-shell --help, so when a user complains or files a ticket, we can refer them to the doc. Thanks.

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 1, 2018

Thanks, @dbtsai. That sounds reasonable and helpful for the users.

@HyukjinKwon
Member Author

Thanks, @dbtsai. Also adding @vanzin and @srowen: what do you think about the -i option in 2.4.x and 3.0.0?

@srowen
Member

srowen commented Nov 1, 2018

Tough call. At least it's worth documenting that this behavior changed because the Scala shell's behavior did, and I'd support that. I'd also support keeping both, if there's no real downside to that option.

@vanzin
Contributor

vanzin commented Nov 1, 2018

This is weird. The most obvious downside is that there would be no way to get the new -i functionality in case anyone needs it. A blurb in --help, or even a release note explaining the changed behavior, would be good enough, I think.

@HyukjinKwon
Member Author

HyukjinKwon commented Nov 1, 2018

True, but it's hard to call that a downside, because -i has a quite major issue as described above: a user's code suddenly stops working with implicit methods (like encoders, the symbol 'id, or .toDF, which are pretty common) after a minor release bump. Also, it looks like the only reason :load was changed to :paste for the -i option was a bug fix (SI-7898).

For documentation, I can mention that the Scala shell flags can be used. It looks like @cloud-fan is going to leave this JIRA as a known issue, so I guess we are good. Maybe I can leave an additional note in the migration guide as well if we don't go ahead and fix it.

@HyukjinKwon
Member Author

Let me go ahead with documenting it tomorrow, then.

@HyukjinKwon HyukjinKwon changed the title [SPARK-25906][SHELL] Restores '-i' option's behaviour in spark-shell [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell Nov 3, 2018
@HyukjinKwon HyukjinKwon closed this Nov 3, 2018
@HyukjinKwon HyukjinKwon reopened this Nov 3, 2018

rem two empty lines are required
set _SPARK_CMD_USAGE=Usage: .\bin\spark-shell.cmd [options]^%LF%%LF%^%LF%%LF%^
Scala REPL options:^%LF%%LF%^
Member Author

There seems to be no cleverer way than this to set newlines in variables in batch files.
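For comparison, the same multi-line usage string needs no special trick in a POSIX shell, where newlines can be embedded directly inside a quoted string (a minimal sketch, not the actual bin/spark-shell code):

```shell
# POSIX-shell analogue of the batch %LF% trick: a double-quoted string
# may simply span multiple lines, embedding the newlines literally.
_SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]

Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation"

printf '%s\n' "$_SPARK_CMD_USAGE"
```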

@HyukjinKwon
Member Author

Should be ready for a review.


rem two empty lines are required
set _SPARK_CMD_USAGE=Usage: .\bin\spark-shell.cmd [options]^%LF%%LF%^%LF%%LF%^
Scala REPL options:^%LF%%LF%^
Member Author

Script-specific information is included in _SPARK_CMD_USAGE; this looks like a more appropriate place than somewhere in SparkSubmitArguments.printUsageAndExit.

@SparkQA

SparkQA commented Nov 3, 2018

Test build #98425 has finished for PR 22919 at commit 5f3cb87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Let me get this into master and branch-2.4 in a few days if there are no more comments.

export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]

Scala REPL options:
  -I <file>                   preload <file>, enforcing line-by-line interpretation"
Contributor

Where do we define the other options?

Member Author

I tested the other options, and this looks like the only valid one. I described it in the PR description.

Contributor

I mean, I didn't find

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).

in the shell script. Where do we define them?

Member Author

Oh haha, sorry. That's in SparkSubmitArguments.printUsageAndExit. That's why I left #22919 (comment).

Member

Shall we also document -i's behavior here? I think for now this option is also accepted by the REPL.

Member Author

@HyukjinKwon commented Nov 5, 2018

Yeah, but -i doesn't handle implicits like toDF or symbols, which are pretty basic (unless they're explicitly imported within the user's program). I think we'd better avoid documenting it for now.
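As a workaround sketch (untested; file name and contents are hypothetical), a script run via -i would need to import the implicits explicitly, since :paste skips Spark's automatic imports:

```shell
# Under -i (:paste semantics), Spark's automatic implicit imports are
# skipped, so the preload script must bring them in itself.
cat > /tmp/init.scala <<'EOF'
import spark.implicits._   // restores toDF, symbol-as-column, encoders, etc.
val df = Seq(("a", 1)).toDF("key", "value")
EOF

# Hypothetical invocation (requires a Spark installation):
# ./bin/spark-shell -i /tmp/init.scala
```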

@HyukjinKwon
Member Author

Merged to master and branch-2.4.

asfgit pushed a commit that referenced this pull request Nov 6, 2018
…-shell

Closes #22919 from HyukjinKwon/SPARK-25906.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit cc38abc)
Signed-off-by: hyukjinkwon <[email protected]>
@asfgit asfgit closed this in cc38abc Nov 6, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…-shell

Closes apache#22919 from HyukjinKwon/SPARK-25906.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…-shell

Closes apache#22919 from HyukjinKwon/SPARK-25906.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit cc38abc)
Signed-off-by: hyukjinkwon <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…-shell

Closes apache#22919 from HyukjinKwon/SPARK-25906.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit cc38abc)
Signed-off-by: hyukjinkwon <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-25906 branch March 3, 2020 01:20