
Conversation

@libratiger

There is a typo on lines 167 & 168 of the HistoryServer.scala file:
"./sbin/spark-history-server.sh" should be "./sbin/start-history-server.sh".

pwendell and others added 30 commits August 27, 2014 23:28
In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However, if we run a Python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to time out.

We only need to do this for the PySpark shell because Spark submit runs as a python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL.
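
For illustration, a minimal sketch of the guard this fix adds (the object name and the shell-detection flag are assumptions here, not the bootstrapper's actual code):

```scala
object BootstrapperSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical detection flag; the real bootstrapper derives this from the submit arguments.
    val isPySparkShell = sys.env.get("PYSPARK_SHELL").contains("1")

    // Launch the application as a subprocess, forwarding its output.
    val builder = new ProcessBuilder(args: _*)
    builder.redirectOutput(ProcessBuilder.Redirect.INHERIT)
    builder.redirectError(ProcessBuilder.Redirect.INHERIT)
    val process = builder.start()

    if (isPySparkShell) {
      // Only the PySpark shell signals shutdown by closing our stdin,
      // so only in that case do we block waiting for EOF.
      while (System.in.read() != -1) {}
      process.destroy()
    }
    sys.exit(process.waitFor())
  }
}
```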

Thanks davies for reporting this.

Author: Andrew Or <[email protected]>

Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:

42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
(cherry picked from commit dafe343)

Signed-off-by: Patrick Wendell <[email protected]>
It is not safe to run the closure cleaner on slaves.  #2153 introduced this which broke all UDF execution on slaves.  Will re-add cleaning of UDF closures in a follow-up PR.

Author: Michael Armbrust <[email protected]>

Closes #2174 from marmbrus/fixUdfs and squashes the following commits:

55406de [Michael Armbrust] [HOTFIX] Remove cleaning of UDFs
(cherry picked from commit 024178c)

Signed-off-by: Patrick Wendell <[email protected]>
Author: Cheng Lian <[email protected]>

Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:

115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf

(cherry picked from commit 68f75dc)
Signed-off-by: Michael Armbrust <[email protected]>
We need to convert the case classes into Rows.
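
As a loose, self-contained illustration of what "converting into Rows" means (this is not the PR's Catalyst code; `Row` is modeled as a plain `Seq[Any]` here):

```scala
object StructUdfSketch {
  // Model a Row as an ordered sequence of column values.
  type Row = Seq[Any]

  case class Point(x: Int, y: Int)

  // Any case class is a Product, so its fields can be flattened into a Row.
  def caseClassToRow(p: Product): Row = p.productIterator.toSeq

  def main(args: Array[String]): Unit = {
    println(caseClassToRow(Point(1, 2)))  // List(1, 2)
  }
}
```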

Author: Michael Armbrust <[email protected]>

Closes #2133 from marmbrus/structUdfs and squashes the following commits:

189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
8e29b1c [Michael Armbrust] Use existing function
d8d0b76 [Michael Armbrust] Fix udfs that return structs

(cherry picked from commit 76e3ba4)
Signed-off-by: Michael Armbrust <[email protected]>
…alizing default values in DriverInfo.init()

The issue happens when Spark runs standalone on a cluster.
When the master and the driver fail simultaneously on one node of the cluster, the master tries to recover its state and restart the Spark driver.
While restarting the driver, it fails with an NPE (stacktrace below).
After failing, the master restarts, tries to recover its state, and restarts the Spark driver again; this repeats in an infinite cycle.
Namely, Spark tries to read the DriverInfo state from ZooKeeper, but after reading, DriverInfo.worker happens to be null.

https://issues.apache.org/jira/browse/SPARK-3150
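
A trimmed-down sketch of the pattern this fix uses (class and field names are simplified; the real `DriverInfo` lives in the standalone master): transient fields come back null after deserialization from ZooKeeper, so `readObject` re-runs `init()` to restore defaults.

```scala
import java.io.ObjectInputStream

class DriverInfoSketch(val id: String) extends Serializable {
  @transient var worker: Option[String] = _
  @transient var state: String = _

  init()  // set defaults on normal construction

  private def init(): Unit = {
    worker = None
    state = "SUBMITTED"
  }

  // Called by Java serialization when the object is read back from ZooKeeper;
  // without this, `worker` stays null and the master NPEs during recovery.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    init()
  }
}
```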

Author: Tatiana Borisova <[email protected]>

Closes #2062 from tanyatik/spark-3150 and squashes the following commits:

9936043 [Tatiana Borisova] Add initializing default values in DriverInfo.init()

(cherry picked from commit 70d8146)
Signed-off-by: Josh Rosen <[email protected]>
The executors and the driver may not share the same Spark home. There is currently one way to set the executor-side Spark home in Mesos: setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config, `spark.mesos.executor.home`, and exposes it to the user.
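
A minimal usage sketch (the Mesos master URL and the path are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("mesos://zk://zk-host:2181/mesos")    // placeholder master URL
  .setAppName("MesosHomeExample")
  .set("spark.mesos.executor.home", "/opt/spark")  // executor-side Spark home
val sc = new SparkContext(conf)
```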

liancheng tnachen

Author: Andrew Or <[email protected]>

Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:

b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
f6abb2e [Andrew Or] Document spark.mesos.executor.home
ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos

(cherry picked from commit 41dc598)
Signed-off-by: Andrew Or <[email protected]>
VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them.

The following code previously returned -10000000. After applying this PR, it returns the correct answer of 5000000000 (5 billion).

```scala
val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
  .flatMap(start => start until (start + 10000000)).map(x => (x, x))
VertexRDD(pairs).count()
```
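
The essence of the fix in isolation (a sketch, not the PR's actual code): widen the partition sizes to Long before summing.

```scala
val partitionSizes: Array[Int] = Array.fill(500)(10000000)

val overflowed: Int = partitionSizes.sum               // wraps past Int.MaxValue
val correct: Long   = partitionSizes.map(_.toLong).sum // 5000000000L
```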

Author: Ankur Dave <[email protected]>

Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:

641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()

(cherry picked from commit 96df929)
Signed-off-by: Josh Rosen <[email protected]>
…sted queue doesn't exist

Author: Sandy Ryza <[email protected]>

Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:

fe08c37 [Sandy Ryza] Remove log message entirely
85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist

(cherry picked from commit 92af231)
Signed-off-by: Andrew Or <[email protected]>
**Summary of the changes**

The bulk of this PR is comprised of tests and documentation; the actual fix is really just adding 1 line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, and this would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing code to test spilling using all compression codecs known to Spark, including `LZ4`.

**The bug itself**

In `DiskBlockObjectWriter`, we only report the shuffle bytes written before we close the streams. With `LZ4`, all the bytes written reported by our metrics were 0 because `flush()` was not taking effect for some reason. In general, compression codecs may write additional bytes to the file after we call `close()`, and so we must also capture those bytes in our shuffle write metrics.
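
A hedged sketch of the one-line idea (the helper name is ours, not Spark's): take the file's final size after `close()` as the ground truth for bytes written.

```scala
import java.io.File

// After close(), a codec may flush additional bytes; the file's final length,
// not the running count reported before close(), is what the metric should use.
def finalBytesWritten(file: File, initialPosition: Long): Long =
  file.length() - initialPosition
```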

Thanks mridulm and pwendell for help with debugging.

Author: Andrew Or <[email protected]>
Author: Patrick Wendell <[email protected]>

Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:

1b54bdc [Andrew Or] Speed up tests by not compressing everything
1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
6b2e7d1 [Andrew Or] Fix compilation error
92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
a1ad536 [Andrew Or] Fix tests
089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
1bfa743 [Andrew Or] Add more information to assert for better debugging
Andrew Or and others added 21 commits November 17, 2014 11:49
This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

Author: Andrew Or <[email protected]>

Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:

486fc49 [Andrew Or] Reset `elementsRead`
…sks; use HashedWheelTimer (For branch-1.1)

This patch is intended to fix a subtle memory leak in ConnectionManager's ACK-timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask isn't necessarily garbage-collected until it is scheduled to run. This caused huge buildups of messages that weren't garbage-collected until their timeouts expired, leading to OOMs.

This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
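
A sketch of the pattern under Netty 3 (which branch-1.1 used); the names here are illustrative, not ConnectionManager's actual fields:

```scala
import java.io.IOException
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit
import org.jboss.netty.util.{HashedWheelTimer, Timeout, TimerTask}
import scala.concurrent.Promise

object AckTimeoutSketch {
  val timer = new HashedWheelTimer()

  // Capture only the message ID and a WeakReference to the promise; the
  // message itself is never reachable from the timer task.
  def scheduleAckTimeout(messageId: Int, promise: Promise[Unit], seconds: Long): Timeout = {
    val promiseRef = new WeakReference(promise)
    timer.newTimeout(new TimerTask {
      override def run(timeout: Timeout): Unit = {
        // If the promise has already been collected, there is nothing to fail.
        Option(promiseRef.get()).foreach { p =>
          p.tryFailure(new IOException(s"Ack for message $messageId timed out"))
        }
      }
    }, seconds, TimeUnit.SECONDS)
  }
}
```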

Author: Kousuke Saruta <[email protected]>

Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:

786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
Spark hangs with the following code:

~~~
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
~~~

This is because ZippedWithIndexRDD triggers a job in getPartitions, which causes a deadlock in DAGScheduler.getPreferredLocs (a synchronized method). The fix is to compute `startIndices` during construction.
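
The shape of the fix as a standalone sketch (the real change is inside ZippedWithIndexRDD): compute the prefix sums eagerly at construction so `getPartitions` never runs a job.

```scala
class StartIndicesSketch(partitionSizes: Array[Long]) {
  // Eager: computed once when the RDD is constructed, not inside getPartitions.
  val startIndices: Array[Long] = partitionSizes.scanLeft(0L)(_ + _).init

  def getPartitions: Array[Long] = startIndices  // no job triggered here
}
```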

This should be applied to branch-1.0, branch-1.1, and branch-1.2.

pwendell

Author: Xiangrui Meng <[email protected]>

Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:

c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex

(cherry picked from commit bb46046)
Signed-off-by: Xiangrui Meng <[email protected]>

Author: Cheng Lian <[email protected]>

Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:

bd17512 [Cheng Lian] Backports #3334 to branch-1.1
This is the branch-1.1 version of #3243.

Author: Andrew Or <[email protected]>

Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:

36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

Author: Andrew Or <[email protected]>

Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:

f2e552c [Andrew Or] Fix tests
7012595 [Andrew Or] Avoid many small spills
…treamFunctions.saveAsNewAPIHadoopFiles

Solves two JIRAs in one shot (see the sketch after this list):
- Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
- Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
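
Both ideas in one hedged sketch (`SerializableWritable` is a real Spark helper; the method shape is simplified):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SerializableWritable

// 1. Default to Spark's Hadoop configuration instead of a fresh Configuration.
// 2. Wrap it in SerializableWritable so the foreachRDD closure can be checkpointed.
def resolveConf(sparkHadoopConf: Configuration,
                userConf: Option[Configuration]): SerializableWritable[Configuration] = {
  new SerializableWritable(userConf.getOrElse(sparkHadoopConf))
}
```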

Author: Tathagata Das <[email protected]>

Closes #3457 from tdas/savefiles-fix and squashes the following commits:

bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.

(cherry picked from commit 8838ad7)
Signed-off-by: Tathagata Das <[email protected]>
This commit provides a script that computes the contributors list
by linking the github commits with JIRA issues. Automatically
translating github usernames remains a TODO at this point.
…) registered with the scheduler

v1.1 backport for #3483

Author: roxchkplusony <[email protected]>

Closes #3503 from roxchkplusony/bugfix/4626-1.1 and squashes the following commits:

234d350 [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
…empDir()

`File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too.
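
A sketch of the corrected loop (simplified from Utils.createTempDir; the constants are assumptions):

```scala
import java.io.{File, IOException}
import java.util.UUID

def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(s"Failed to create a temp directory after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      // exists() and mkdirs() report failure via return values...
      if (dir.exists() || !dir.mkdirs()) {
        dir = null  // reset so the next iteration tries a fresh name
      }
    } catch {
      // ...and throw SecurityException (not IOException) on denial.
      case _: SecurityException => dir = null  // reset here too
    }
  }
  dir
}
```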

Author: Liang-Chi Hsieh <[email protected]>

Closes #3449 from viirya/fix_createtempdir and squashes the following commits:

36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.

(cherry picked from commit 49fe879)
Signed-off-by: Josh Rosen <[email protected]>
This PR adds the Spark version number to the UI footer; this is how it looks:

![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)

Author: Sean Owen <[email protected]>

Closes #3410 from srowen/SPARK-2143 and squashes the following commits:

e9b3a7a [Sean Owen] Add Spark version to footer
org.apache.spark.SPARK_VERSION is new in 1.2; in earlier versions,
we have to use SparkContext.SPARK_VERSION.
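
In other words, the branch-1.1 backport reads the constant like this (a one-line sketch):

```scala
import org.apache.spark.SparkContext

// On 1.2+ this would be org.apache.spark.SPARK_VERSION instead.
val footerText = s"Spark ${SparkContext.SPARK_VERSION}"
```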

Author: Cheng Lian <[email protected]>

Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:

865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide

(cherry picked from commit 2a4d389)
Signed-off-by: Josh Rosen <[email protected]>
The link points to the old scala programming guide; it should point to the submitting applications page.

This should be backported to 1.1.2 (it's been broken as of 1.0).

Author: Kay Ousterhout <[email protected]>

Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits:

a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken

(cherry picked from commit d9a148b)
Signed-off-by: Kay Ousterhout <[email protected]>
@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

Hey @djvulee this is opened against the wrong branch! Please close this issue and open up a new PR against the correct one.

tsudukim and others added 5 commits December 3, 2014 12:08
Fixed a typo.

Author: Masayoshi TSUZUKI <[email protected]>

Closes #3560 from tsudukim/feature/SPARK-4701 and squashes the following commits:

ed2a3f1 [Masayoshi TSUZUKI] Another whitespace position error.
1af3a35 [Masayoshi TSUZUKI] [SPARK-4701] Typo in sbt/sbt

(cherry picked from commit 96786e3)
Signed-off-by: Andrew Or <[email protected]>
ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`.
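
A hedged sketch of the clamp (the real method tracks per-task memory; the parameters here are simplified):

```scala
// Grant at most what was requested and what is actually free, never less than zero.
def tryToAcquire(numBytes: Long, maxMemory: Long, memoryInUse: Long): Long = {
  val available = maxMemory - memoryInUse
  math.max(0L, math.min(numBytes, available))  // the fix: clamp at 0
}
```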

Author: zsxwing <[email protected]>

Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits:

a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value
…N document.

Added descriptions for these parameters (see the usage sketch below):
- spark.yarn.queue

Modified the description of the default value of this parameter:
- spark.yarn.submit.file.replication
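
A usage sketch of the two properties being documented (the values are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.queue", "default")              // YARN queue to submit the app to
  .set("spark.yarn.submit.file.replication", "3")  // HDFS replication for uploaded files
```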

Author: Masayoshi TSUZUKI <[email protected]>

Closes #3500 from tsudukim/feature/SPARK-4642 and squashes the following commits:

ce99655 [Masayoshi TSUZUKI] better gramatically.
21cf624 [Masayoshi TSUZUKI] Removed intentionally undocumented properties.
88cac9b [Masayoshi TSUZUKI] [SPARK-4642] Documents about running-on-YARN needs update
…ver adds Executor

The ExecutorInfo only reaches the RUNNING state if the driver is alive to send the ExecutorStateChanged message to the master. Otherwise, appInfo.resetRetryCount() is never called, and failing executors eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting.
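
A sketch of the state change (a simplified enumeration; the real states live in ExecutorState):

```scala
object ExecutorStateSketch extends Enumeration {
  val LAUNCHING, RUNNING, FAILED = Value
}

class ExecutorInfoSketch {
  import ExecutorStateSketch._

  // Before the fix this field started out as RUNNING; now it stays LAUNCHING
  // until the driver's ExecutorStateChanged message reaches the master.
  var state: Value = LAUNCHING

  def onExecutorStateChanged(newState: Value): Unit = {
    state = newState  // becomes RUNNING only once the driver reports it
  }
}
```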

Author: Mark Hamstra <[email protected]>

Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits:

8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
This commit involves three main changes:

(1) It separates the translation of contributor names from the
generation of the contributors list. This is largely motivated
by the Github API limit; even if we exceed this limit, we should
at least be able to proceed manually as before. This is why the
translation logic is abstracted into its own script
translate-contributors.py.

(2) When we look for candidate replacements for invalid author
names, we should look for the assignees of the associated JIRAs
too. As a result, the intermediate file must keep track of these.

(3) This provides an interactive mode with which the user can
sit at the terminal and manually pick the candidate replacement
that he/she thinks makes the most sense. As before, there is a
non-interactive mode that picks the first candidate that the
script considers "valid."

TODO: We should have a known_contributors file that stores
known mappings so we don't have to go through all of this
translation every time. This is also valuable because some
contributors simply cannot be automatically translated.

Conflicts:
	.gitignore
@asfgit closed this in 3cdae03 on Dec 4, 2014