
Conversation

@libratiger

There is a typo on lines 167 & 168 of the HistoryServer.scala file:
"./sbin/spark-history-server.sh" should be "./sbin/start-history-server.sh".

pwendell and others added 30 commits August 27, 2014 23:28
In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However, if we run a Python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to time out.

We only need to do this for the PySpark shell because Spark submit runs as a python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL.
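
For illustration, a minimal sketch of the guard this fix adds (the object name and the shell-detection flag are assumptions here, not the bootstrapper's actual code):

```scala
object BootstrapperSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical detection flag; the real bootstrapper derives this from the submit arguments.
    val isPySparkShell = sys.env.get("PYSPARK_SHELL").contains("1")

    // Launch the application as a subprocess, forwarding its output.
    val builder = new ProcessBuilder(args: _*)
    builder.redirectOutput(ProcessBuilder.Redirect.INHERIT)
    builder.redirectError(ProcessBuilder.Redirect.INHERIT)
    val process = builder.start()

    if (isPySparkShell) {
      // Only the PySpark shell signals shutdown by closing our stdin,
      // so only in that case do we block waiting for EOF.
      while (System.in.read() != -1) {}
      process.destroy()
    }
    sys.exit(process.waitFor())
  }
}
```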

Thanks davies for reporting this.

Author: Andrew Or <[email protected]>

Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:

42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
(cherry picked from commit dafe343)

Signed-off-by: Patrick Wendell <[email protected]>
It is not safe to run the closure cleaner on slaves.  #2153 introduced this which broke all UDF execution on slaves.  Will re-add cleaning of UDF closures in a follow-up PR.

Author: Michael Armbrust <[email protected]>

Closes #2174 from marmbrus/fixUdfs and squashes the following commits:

55406de [Michael Armbrust] [HOTFIX] Remove cleaning of UDFs
(cherry picked from commit 024178c)

Signed-off-by: Patrick Wendell <[email protected]>
Author: Cheng Lian <[email protected]>

Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:

115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf

(cherry picked from commit 68f75dc)
Signed-off-by: Michael Armbrust <[email protected]>
We need to convert the case classes into Rows.
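
As a loose, self-contained illustration of what "converting into Rows" means (this is not the PR's Catalyst code; `Row` is modeled as a plain `Seq[Any]` here):

```scala
object StructUdfSketch {
  // Model a Row as an ordered sequence of column values.
  type Row = Seq[Any]

  case class Point(x: Int, y: Int)

  // Any case class is a Product, so its fields can be flattened into a Row.
  def caseClassToRow(p: Product): Row = p.productIterator.toSeq

  def main(args: Array[String]): Unit = {
    println(caseClassToRow(Point(1, 2)))  // List(1, 2)
  }
}
```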

Author: Michael Armbrust <[email protected]>

Closes #2133 from marmbrus/structUdfs and squashes the following commits:

189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
8e29b1c [Michael Armbrust] Use existing function
d8d0b76 [Michael Armbrust] Fix udfs that return structs

(cherry picked from commit 76e3ba4)
Signed-off-by: Michael Armbrust <[email protected]>
…alizing default values in DriverInfo.init()

The issue happens when Spark runs standalone on a cluster.
When the master and the driver fail simultaneously on one node of the cluster, the master tries to recover its state and restart the Spark driver.
While restarting the driver, it fails with an NPE (stacktrace below).
After failing, the master restarts, tries to recover its state, and restarts the Spark driver again; this repeats in an infinite cycle.
Namely, Spark tries to read the DriverInfo state from ZooKeeper, but after reading, DriverInfo.worker happens to be null.

https://issues.apache.org/jira/browse/SPARK-3150
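
A trimmed-down sketch of the pattern this fix uses (class and field names are simplified; the real `DriverInfo` lives in the standalone master): transient fields come back null after deserialization from ZooKeeper, so `readObject` re-runs `init()` to restore defaults.

```scala
import java.io.ObjectInputStream

class DriverInfoSketch(val id: String) extends Serializable {
  @transient var worker: Option[String] = _
  @transient var state: String = _

  init()  // set defaults on normal construction

  private def init(): Unit = {
    worker = None
    state = "SUBMITTED"
  }

  // Called by Java serialization when the object is read back from ZooKeeper;
  // without this, `worker` stays null and the master NPEs during recovery.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    init()
  }
}
```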

Author: Tatiana Borisova <[email protected]>

Closes #2062 from tanyatik/spark-3150 and squashes the following commits:

9936043 [Tatiana Borisova] Add initializing default values in DriverInfo.init()

(cherry picked from commit 70d8146)
Signed-off-by: Josh Rosen <[email protected]>
The executors and the driver may not share the same Spark home. There is currently one way to set the executor-side Spark home in Mesos: setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config, `spark.mesos.executor.home`, and exposes it to the user.
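
A minimal usage sketch (the Mesos master URL and the path are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("mesos://zk://zk-host:2181/mesos")    // placeholder master URL
  .setAppName("MesosHomeExample")
  .set("spark.mesos.executor.home", "/opt/spark")  // executor-side Spark home
val sc = new SparkContext(conf)
```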

liancheng tnachen

Author: Andrew Or <[email protected]>

Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:

b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
f6abb2e [Andrew Or] Document spark.mesos.executor.home
ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos

(cherry picked from commit 41dc598)
Signed-off-by: Andrew Or <[email protected]>
VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them.

The following code previously returned -10000000. After applying this PR, it returns the correct answer of 5000000000 (5 billion).

```scala
val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
  .flatMap(start => start until (start + 10000000)).map(x => (x, x))
VertexRDD(pairs).count()
```
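
The essence of the fix in isolation (a sketch, not the PR's actual code): widen the partition sizes to Long before summing.

```scala
val partitionSizes: Array[Int] = Array.fill(500)(10000000)

val overflowed: Int = partitionSizes.sum               // wraps past Int.MaxValue
val correct: Long   = partitionSizes.map(_.toLong).sum // 5000000000L
```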

Author: Ankur Dave <[email protected]>

Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:

641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()

(cherry picked from commit 96df929)
Signed-off-by: Josh Rosen <[email protected]>
…sted queue doesn't exist

Author: Sandy Ryza <[email protected]>

Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:

fe08c37 [Sandy Ryza] Remove log message entirely
85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist

(cherry picked from commit 92af231)
Signed-off-by: Andrew Or <[email protected]>
**Summary of the changes**

The bulk of this PR is comprised of tests and documentation; the actual fix is really just adding 1 line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, and this would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing code to test spilling using all compression codecs known to Spark, including `LZ4`.

**The bug itself**

In `DiskBlockObjectWriter`, we only report the shuffle bytes written before we close the streams. With `LZ4`, all the bytes written reported by our metrics were 0 because `flush()` was not taking effect for some reason. In general, compression codecs may write additional bytes to the file after we call `close()`, and so we must also capture those bytes in our shuffle write metrics.
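
A hedged sketch of the one-line idea (the helper name is ours, not Spark's): take the file's final size after `close()` as the ground truth for bytes written.

```scala
import java.io.File

// After close(), a codec may flush additional bytes; the file's final length,
// not the running count reported before close(), is what the metric should use.
def finalBytesWritten(file: File, initialPosition: Long): Long =
  file.length() - initialPosition
```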

Thanks mridulm and pwendell for help with debugging.

Author: Andrew Or <[email protected]>
Author: Patrick Wendell <[email protected]>

Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:

1b54bdc [Andrew Or] Speed up tests by not compressing everything
1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
6b2e7d1 [Andrew Or] Fix compilation error
92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
a1ad536 [Andrew Or] Fix tests
089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
1bfa743 [Andrew Or] Add more information to assert for better debugging
Andrew Or and others added 21 commits November 17, 2014 11:49
This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

Author: Andrew Or <[email protected]>

Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:

486fc49 [Andrew Or] Reset `elementsRead`
…sks; use HashedWheelTimer (For branch-1.1)

This patch is intended to fix a subtle memory leak in ConnectionManager's ACK-timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask isn't necessarily garbage-collected until it is scheduled to run. This caused huge buildups of messages that weren't garbage-collected until their timeouts expired, leading to OOMs.

This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
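
A sketch of the pattern under Netty 3 (which branch-1.1 used); the names here are illustrative, not ConnectionManager's actual fields:

```scala
import java.io.IOException
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit
import org.jboss.netty.util.{HashedWheelTimer, Timeout, TimerTask}
import scala.concurrent.Promise

object AckTimeoutSketch {
  val timer = new HashedWheelTimer()

  // Capture only the message ID and a WeakReference to the promise; the
  // message itself is never reachable from the timer task.
  def scheduleAckTimeout(messageId: Int, promise: Promise[Unit], seconds: Long): Timeout = {
    val promiseRef = new WeakReference(promise)
    timer.newTimeout(new TimerTask {
      override def run(timeout: Timeout): Unit = {
        // If the promise has already been collected, there is nothing to fail.
        Option(promiseRef.get()).foreach { p =>
          p.tryFailure(new IOException(s"Ack for message $messageId timed out"))
        }
      }
    }, seconds, TimeUnit.SECONDS)
  }
}
```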

Author: Kousuke Saruta <[email protected]>

Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:

786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
Spark hangs with the following code:

~~~
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
~~~

This is because ZippedWithIndexRDD triggers a job in getPartitions, which causes a deadlock in DAGScheduler.getPreferredLocs (a synchronized method). The fix is to compute `startIndices` during construction.
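
The shape of the fix as a standalone sketch (the real change is inside ZippedWithIndexRDD): compute the prefix sums eagerly at construction so `getPartitions` never runs a job.

```scala
class StartIndicesSketch(partitionSizes: Array[Long]) {
  // Eager: computed once when the RDD is constructed, not inside getPartitions.
  val startIndices: Array[Long] = partitionSizes.scanLeft(0L)(_ + _).init

  def getPartitions: Array[Long] = startIndices  // no job triggered here
}
```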

This should be applied to branch-1.0, branch-1.1, and branch-1.2.

pwendell

Author: Xiangrui Meng <[email protected]>

Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:

c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex

(cherry picked from commit bb46046)
Signed-off-by: Xiangrui Meng <[email protected]>

Author: Cheng Lian <[email protected]>

Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:

bd17512 [Cheng Lian] Backports #3334 to branch-1.1
This is the branch-1.1 version of #3243.

Author: Andrew Or <[email protected]>

Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:

36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

Author: Andrew Or <[email protected]>

Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:

f2e552c [Andrew Or] Fix tests
7012595 [Andrew Or] Avoid many small spills
…treamFunctions.saveAsNewAPIHadoopFiles

Solves two JIRAs in one shot (see the sketch after this list):
- Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
- Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
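
Both ideas in one hedged sketch (`SerializableWritable` is a real Spark helper; the method shape is simplified):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SerializableWritable

// 1. Default to Spark's Hadoop configuration instead of a fresh Configuration.
// 2. Wrap it in SerializableWritable so the foreachRDD closure can be checkpointed.
def resolveConf(sparkHadoopConf: Configuration,
                userConf: Option[Configuration]): SerializableWritable[Configuration] = {
  new SerializableWritable(userConf.getOrElse(sparkHadoopConf))
}
```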

Author: Tathagata Das <[email protected]>

Closes #3457 from tdas/savefiles-fix and squashes the following commits:

bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.

(cherry picked from commit 8838ad7)
Signed-off-by: Tathagata Das <[email protected]>
This commit provides a script that computes the contributors list
by linking the github commits with JIRA issues. Automatically
translating github usernames remains a TODO at this point.
…) registered with the scheduler

v1.1 backport for #3483

Author: roxchkplusony <[email protected]>

Closes #3503 from roxchkplusony/bugfix/4626-1.1 and squashes the following commits:

234d350 [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
…empDir()

`File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too.
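
A sketch of the corrected loop (simplified from Utils.createTempDir; the constants are assumptions):

```scala
import java.io.{File, IOException}
import java.util.UUID

def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(s"Failed to create a temp directory after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      // exists() and mkdirs() report failure via return values...
      if (dir.exists() || !dir.mkdirs()) {
        dir = null  // reset so the next iteration tries a fresh name
      }
    } catch {
      // ...and throw SecurityException (not IOException) on denial.
      case _: SecurityException => dir = null  // reset here too
    }
  }
  dir
}
```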

Author: Liang-Chi Hsieh <[email protected]>

Closes #3449 from viirya/fix_createtempdir and squashes the following commits:

36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.

(cherry picked from commit 49fe879)
Signed-off-by: Josh Rosen <[email protected]>
This PR adds the Spark version number to the UI footer; this is how it looks:

![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)

Author: Sean Owen <[email protected]>

Closes #3410 from srowen/SPARK-2143 and squashes the following commits:

e9b3a7a [Sean Owen] Add Spark version to footer
org.apache.spark.SPARK_VERSION is new in 1.2; in earlier versions,
we have to use SparkContext.SPARK_VERSION.
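
In other words, the branch-1.1 backport reads the constant like this (a one-line sketch):

```scala
import org.apache.spark.SparkContext

// On 1.2+ this would be org.apache.spark.SPARK_VERSION instead.
val footerText = s"Spark ${SparkContext.SPARK_VERSION}"
```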

Author: Cheng Lian <[email protected]>

Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:

865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide

(cherry picked from commit 2a4d389)
Signed-off-by: Josh Rosen <[email protected]>
The link points to the old scala programming guide; it should point to the submitting applications page.

This should be backported to 1.1.2 (it's been broken as of 1.0).

Author: Kay Ousterhout <[email protected]>

Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits:

a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken

(cherry picked from commit d9a148b)
Signed-off-by: Kay Ousterhout <[email protected]>
@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

Hey @djvulee this is opened against the wrong branch! Please close this issue and open up a new PR against the correct one.

tsudukim and others added 5 commits December 3, 2014 12:08
Fixed a typo.

Author: Masayoshi TSUZUKI <[email protected]>

Closes #3560 from tsudukim/feature/SPARK-4701 and squashes the following commits:

ed2a3f1 [Masayoshi TSUZUKI] Another whitespace position error.
1af3a35 [Masayoshi TSUZUKI] [SPARK-4701] Typo in sbt/sbt

(cherry picked from commit 96786e3)
Signed-off-by: Andrew Or <[email protected]>
ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`.
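
A hedged sketch of the clamp (the real method tracks per-task memory; the parameters here are simplified):

```scala
// Grant at most what was requested and what is actually free, never less than zero.
def tryToAcquire(numBytes: Long, maxMemory: Long, memoryInUse: Long): Long = {
  val available = maxMemory - memoryInUse
  math.max(0L, math.min(numBytes, available))  // the fix: clamp at 0
}
```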

Author: zsxwing <[email protected]>

Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits:

a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value
…N document.

Added descriptions for these parameters (see the usage sketch below):
- spark.yarn.queue

Modified the description of the default value of this parameter:
- spark.yarn.submit.file.replication
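
A usage sketch of the two properties being documented (the values are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.queue", "default")              // YARN queue to submit the app to
  .set("spark.yarn.submit.file.replication", "3")  // HDFS replication for uploaded files
```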

Author: Masayoshi TSUZUKI <[email protected]>

Closes #3500 from tsudukim/feature/SPARK-4642 and squashes the following commits:

ce99655 [Masayoshi TSUZUKI] better gramatically.
21cf624 [Masayoshi TSUZUKI] Removed intentionally undocumented properties.
88cac9b [Masayoshi TSUZUKI] [SPARK-4642] Documents about running-on-YARN needs update
…ver adds Executor

The ExecutorInfo only reaches the RUNNING state if the driver is alive to send the ExecutorStateChanged message to the master. Otherwise, appInfo.resetRetryCount() is never called, and failing executors eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting.
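
A sketch of the state change (a simplified enumeration; the real states live in ExecutorState):

```scala
object ExecutorStateSketch extends Enumeration {
  val LAUNCHING, RUNNING, FAILED = Value
}

class ExecutorInfoSketch {
  import ExecutorStateSketch._

  // Before the fix this field started out as RUNNING; now it stays LAUNCHING
  // until the driver's ExecutorStateChanged message reaches the master.
  var state: Value = LAUNCHING

  def onExecutorStateChanged(newState: Value): Unit = {
    state = newState  // becomes RUNNING only once the driver reports it
  }
}
```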

Author: Mark Hamstra <[email protected]>

Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits:

8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
This commit involves three main changes:

(1) It separates the translation of contributor names from the
generation of the contributors list. This is largely motivated
by the Github API limit; even if we exceed this limit, we should
at least be able to proceed manually as before. This is why the
translation logic is abstracted into its own script
translate-contributors.py.

(2) When we look for candidate replacements for invalid author
names, we should look for the assignees of the associated JIRAs
too. As a result, the intermediate file must keep track of these.

(3) This provides an interactive mode with which the user can
sit at the terminal and manually pick the candidate replacement
that he/she thinks makes the most sense. As before, there is a
non-interactive mode that picks the first candidate that the
script considers "valid."

TODO: We should have a known_contributors file that stores
known mappings so we don't have to go through all of this
translation every time. This is also valuable because some
contributors simply cannot be automatically translated.

Conflicts:
	.gitignore
@asfgit closed this in 3cdae03 on Dec 4, 2014