
Commit bd5365b

CodingCat authored and srowen committed
[SPARK-13803] restore the changes in SPARK-3411
## What changes were proposed in this pull request?

This patch balances the load of cluster-mode drivers across workers. It restores the changes from #1106, which were erased when #731 was merged.

## How was this patch tested?

Tested with existing test cases.

Author: CodingCat <[email protected]>

Closes #11702 from CodingCat/SPARK-13803.
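The imbalance this patch addresses can be reproduced with a small model of the pre-patch loop. In that loop, workers form the outer loop and waiting drivers the inner one. `SimWorker`, `SimDriver`, and `OldDriverPlacement` are hypothetical names, not Spark APIs. The sketch assumes every worker is alive and uses a fixed worker order in place of `Random.shuffle`. It also models launching a driver as subtracting the driver's memory and cores from the worker.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's WorkerInfo and DriverInfo: only the
// fields the scheduling loop reads are modeled.
final class SimWorker(val id: String, var memoryFree: Int, var coresFree: Int)
final case class SimDriver(id: String, mem: Int, cores: Int)

object OldDriverPlacement {
  // Same loop shape as the pre-patch schedule(): workers outer, drivers inner.
  // Returns (driverId -> workerId for launched drivers, drivers still waiting).
  def place(workers: Seq[SimWorker], drivers: Seq[SimDriver]): (Map[String, String], List[SimDriver]) = {
    val waiting = mutable.ArrayBuffer.empty[SimDriver]
    waiting ++= drivers
    val placed = mutable.LinkedHashMap.empty[String, String]
    for (worker <- workers) { // assumed: already shuffled, all alive
      // Iterate over a copy so that removing from `waiting` is safe.
      for (driver <- waiting.toList) {
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          // Model "launching" as reserving the driver's resources on this worker.
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placed(driver.id) = worker.id
          waiting -= driver
        }
      }
    }
    (placed.toMap, waiting.toList)
  }
}
```

With three identical workers and three small drivers, every driver lands on the first worker. The inner loop keeps placing drivers on that worker until its resources run out, and only then moves on to the next worker.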
1 parent dafd70f commit bd5365b


1 file changed: 17 additions, 4 deletions

core/src/main/scala/org/apache/spark/deploy/master/Master.scala

```diff
@@ -727,15 +727,28 @@ private[deploy] class Master(
    * every time a new app joins or resource availability changes.
    */
   private def schedule(): Unit = {
-    if (state != RecoveryState.ALIVE) { return }
+    if (state != RecoveryState.ALIVE) {
+      return
+    }
     // Drivers take strict precedence over executors
-    val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
-    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
-      for (driver <- waitingDrivers) {
+    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
+    val numWorkersAlive = shuffledAliveWorkers.size
+    var curPos = 0
+    for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
+      // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
+      // start from the last worker that was assigned a driver, and continue onwards until we have
+      // explored all alive workers.
+      var launched = false
+      var numWorkersVisited = 0
+      while (numWorkersVisited < numWorkersAlive && !launched) {
+        val worker = shuffledAliveWorkers(curPos)
+        numWorkersVisited += 1
         if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
           launchDriver(worker, driver)
           waitingDrivers -= driver
+          launched = true
         }
+        curPos = (curPos + 1) % numWorkersAlive
       }
     }
     startExecutorsOnWorkers()
```
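The patched loop can be modeled the same way. This sketch keeps the patch's variable names (`numWorkersAlive`, `curPos`, `launched`, `numWorkersVisited`) but replaces Spark's worker and driver objects with the hypothetical `SimWorker` and `SimDriver`. `RoundRobinDriverPlacement` is likewise a hypothetical name. The sketch takes an already shuffled sequence of alive workers as input and models launching as reserving resources on the chosen worker.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's WorkerInfo and DriverInfo.
final class SimWorker(val id: String, var memoryFree: Int, var coresFree: Int)
final case class SimDriver(id: String, mem: Int, cores: Int)

object RoundRobinDriverPlacement {
  // Round-robin placement in the style of the patched schedule().
  // Returns (driverId -> workerId for launched drivers, drivers still waiting).
  def place(aliveWorkers: IndexedSeq[SimWorker], drivers: Seq[SimDriver]): (Map[String, String], List[SimDriver]) = {
    val waiting = mutable.ArrayBuffer.empty[SimDriver]
    waiting ++= drivers
    val placed = mutable.LinkedHashMap.empty[String, String]
    val numWorkersAlive = aliveWorkers.size
    var curPos = 0 // persists across drivers, so each search resumes where the last stopped
    for (driver <- waiting.toList) { // iterate over a copy of `waiting`
      var launched = false
      var numWorkersVisited = 0
      // Visit each alive worker at most once per driver; with no alive workers
      // the loop body never runs, so the modulo below never divides by zero.
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = aliveWorkers(curPos)
        numWorkersVisited += 1
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placed(driver.id) = worker.id
          waiting -= driver
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    (placed.toMap, waiting.toList)
  }
}
```

`curPos` advances after every visit and is never reset between drivers, so consecutive drivers begin their search at different workers. Three small drivers on three identical workers therefore land on three different workers. A driver that fits nowhere is checked against each alive worker at most once and stays in the waiting list for a later `schedule()` call.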
