[SPARK-4029][Streaming] Update streaming driver to reliably save and recover received block metadata on driver failures #3026
Conversation
@pwendell @JoshRosen @harishreedharan

Test build #22571 has started for PR 3026 at commit

Test build #22571 has finished for PR 3026 at commit

Test FAILed.

Test build #22572 has started for PR 3026 at commit

Test build #22572 has finished for PR 3026 at commit

Test FAILed.

Test build #22574 has started for PR 3026 at commit

Test build #22574 has finished for PR 3026 at commit

Test FAILed.

Jenkins, test this please.
All the functionality to keep track of block-to-batch allocations has been moved from ReceiverInputDStream to ReceivedBlockTracker, so that all actions on the block metadata (including block-to-batch allocations) can be logged at a central location.
Test build #22585 has started for PR 3026 at commit

Test build #22587 has started for PR 3026 at commit
All the functionality to keep track of received block metadata has been moved from ReceiverTracker to ReceivedBlockTracker, so that all actions on the block metadata (including block-to-batch allocations) can be logged at a central location.
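A minimal sketch of the write-then-apply pattern this centralization enables. The names below (TrackerLogEvent, SimpleTracker, and its methods) are hypothetical stand-ins for the real ReceivedBlockTracker and its write ahead log; the point is only that every state change is logged before it is applied, so replaying the log after a driver restart rebuilds the same state.

import scala.collection.mutable

// Hypothetical event types; one per kind of state change the tracker makes.
sealed trait TrackerLogEvent
case class BlockAdded(streamId: Int, blockId: String) extends TrackerLogEvent
case class BlocksAllocatedToBatch(batchTime: Long, blockIds: Seq[String]) extends TrackerLogEvent

class SimpleTracker {
  // Stands in for the write ahead log; the real tracker persists this durably.
  private val log = mutable.ArrayBuffer[TrackerLogEvent]()
  private val unallocatedBlocks = mutable.Queue[String]()
  private val batchToBlocks = mutable.Map[Long, Seq[String]]()

  def addBlock(streamId: Int, blockId: String): Unit = {
    log += BlockAdded(streamId, blockId)     // log the action first...
    unallocatedBlocks.enqueue(blockId)       // ...then apply it in memory
  }

  def allocateBlocksToBatch(batchTime: Long): Unit = {
    val blocks = unallocatedBlocks.dequeueAll(_ => true)
    log += BlocksAllocatedToBatch(batchTime, blocks)
    batchToBlocks(batchTime) = blocks
  }

  // On restart, replaying the logged events in order reconstructs both the
  // unallocated-block queue and the batch-to-block allocations.
  def replay(events: Seq[TrackerLogEvent]): Unit = events.foreach {
    case BlockAdded(_, blockId) =>
      unallocatedBlocks.enqueue(blockId)
    case BlocksAllocatedToBatch(batchTime, blockIds) =>
      unallocatedBlocks.dequeueAll(b => blockIds.contains(b))
      batchToBlocks(batchTime) = blockIds
  }
}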
Test build #22585 has finished for PR 3026 at commit

Test PASSed.

Test build #22587 has finished for PR 3026 at commit

Test PASSed.
What about naming this something like LogEvent? It wasn't clear to me when I looked at this what it meant by "Action".
How about ReceivedBlockTrackerLogEvent? I am not so sure about giving it such a generic name as LogEvent; it becomes hard to immediately identify what module the class is related to.
Sure - having LogEvent somewhere in there would just be helpful.
Is this exposed only for testing? If so, can you note that down?
Added.
Refactored ReceivedBlockTracker API a bit to make things a little cleaner for users of the tracker.
This is a really dense expression. Can this be broken out into simpler expressions that make it easier to read?

val streamsWithBlocks = streamIds.map { streamId =>
  (streamId, getReceivedBlockQueue(streamId).dequeueAll(_ => true))
}
val streamToBlocks = streamsWithBlocks.toMap
val allocatedBlocks = AllocatedBlocks(streamToBlocks)
Haha, I had felt the same, so I had changed it. With your other suggestion incorporated, it's cleaner.
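For reference, the combined form of the suggestion might look something like this (a sketch based on the snippet above, not necessarily the exact code that was committed):

// Build the stream-id-to-blocks map in one readable pipeline, then wrap it.
val streamIdToBlocks = streamIds.map { streamId =>
  (streamId, getReceivedBlockQueue(streamId).dequeueAll(_ => true))
}.toMap
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)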
This looks good overall. I like the clean-up on the allocation code path. Just left minor comments.
Test build #22652 has started for PR 3026 at commit

Test build #22654 has started for PR 3026 at commit
Does the receiver tracker read from HDFS each time getBlocksOfBatch is called (sorry, I don't remember if it does)? If it does, then this call incurs more HDFS reads than required when there are several streams in the same app, correct?
Ignore this. Verified it does not.
This looks good. Apart from the one question I had above, it's good to go.

Test build #22654 has finished for PR 3026 at commit

Test PASSed.

Test build #22652 timed out for PR 3026 at commit

Test FAILed.

Test build #22895 has started for PR 3026 at commit
Just wondering - why does this need to be set here? Who consumes this?
This was added when SparkEnv needed to be set for launching jobs on non-main threads. Since the JobGenerator is a background thread that actually submits the jobs, the SparkEnv needed to be set on it. But since we have removed the whole thread-local stuff from SparkEnv, this is probably not needed any more. We can either remove this (scary) or document it as potentially removable.
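To illustrate the concern, here is a generic, self-contained sketch of the thread-local pattern involved (this is not the actual JobGenerator or SparkEnv code; Env and ThreadLocalDemo are made up for the example): a value set on the main thread via a ThreadLocal is not visible from a background thread unless it is set again on that thread, which is why the env had to be set where the jobs are actually submitted.

object Env {
  private val local = new ThreadLocal[String]
  def set(v: String): Unit = local.set(v)
  def get: Option[String] = Option(local.get)
}

object ThreadLocalDemo extends App {
  Env.set("driver-env")                    // set on the main thread
  val background = new Thread {
    override def run(): Unit = {
      println(s"before set: ${Env.get}")   // prints: before set: None
      Env.set("driver-env")                // must be re-set on this thread
      println(s"after set: ${Env.get}")    // prints: after set: Some(driver-env)
    }
  }
  background.start()
  background.join()
}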
Hey @tdas, this LGTM. The only question was around the setting of the SparkEnv... it might be good to document what consumes that downstream.
Test build #22895 has finished for PR 3026 at commit

Test PASSed.
@pwendell, added the comment. Merging this. Thanks @pwendell and @JoshRosen for reviewing this PR and the previous ones for the streaming driver HA.
Test build #22906 has started for PR 3026 at commit

Test build #22906 has finished for PR 3026 at commit

Test PASSed.
[SPARK-4029][Streaming] Update streaming driver to reliably save and recover received block metadata on driver failures

As part of the initiative of preventing data loss on driver failure, this JIRA tracks the sub-task of modifying the streaming driver to reliably save received block metadata and recover it on driver restart.

This was solved by introducing a `ReceivedBlockTracker` that takes all the responsibility of managing the metadata of received blocks (i.e. `ReceivedBlockInfo`), and any actions on them (e.g., allocating blocks to batches). All actions on block info get written out to a write ahead log (using `WriteAheadLogManager`). On recovery, all the actions are replayed to recreate the pre-failure state of the `ReceivedBlockTracker`, which includes the batch-to-block allocations and the unallocated blocks.

Furthermore, the `ReceiverInputDStream` was modified to create `WriteAheadLogBackedBlockRDD`s when file segment info is present in the `ReceivedBlockInfo`. After recovery of all the block info (through recovery of the `ReceivedBlockTracker`), the `WriteAheadLogBackedBlockRDD`s get recreated with the recovered info, and jobs are submitted. The data of the blocks gets pulled from the write ahead logs, thanks to the segment info present in the `ReceivedBlockInfo`.

This is still a WIP. Things that are still missing:
- *End-to-end integration tests:* Unit tests that test driver recovery, by killing and restarting the streaming context, and verifying that all the input data gets processed. This has been implemented but not included in this PR yet. A sneak peek of that DriverFailureSuite can be found in this PR (on my personal repo): tdas#25. I can either include it in this PR, or submit that as a separate PR after this gets in.
- *WAL cleanup:* Cleaning up the received data write ahead log, by calling `ReceivedBlockHandler.cleanupOldBlocks`. This is being worked on.

Author: Tathagata Das <[email protected]>

Closes #3026 from tdas/driver-ha-rbt and squashes the following commits:

a8009ed [Tathagata Das] Added comment
1d704bb [Tathagata Das] Enabled storing recovered WAL-backed blocks to BM
2ee2484 [Tathagata Das] More minor changes based on PR
47fc1e3 [Tathagata Das] Addressed PR comments.
9a7e3e4 [Tathagata Das] Refactored ReceivedBlockTracker API a bit to make things a little cleaner for users of the tracker.
af63655 [Tathagata Das] Minor changes.
fce2b21 [Tathagata Das] Removed commented lines
59496d3 [Tathagata Das] Changed class names, made allocation more explicit and added cleanup
19aec7d [Tathagata Das] Fixed casting bug.
f66d277 [Tathagata Das] Fix line lengths.
cda62ee [Tathagata Das] Added license
25611d6 [Tathagata Das] Minor changes before submitting PR
7ae0a7f [Tathagata Das] Transferred changes from driver-ha-working branch

(cherry picked from commit 5f13759)

Signed-off-by: Tathagata Das <[email protected]>
As part of the initiative of preventing data loss on driver failure, this JIRA tracks the sub-task of modifying the streaming driver to reliably save received block metadata and recover it on driver restart.

This was solved by introducing a `ReceivedBlockTracker` that takes all the responsibility of managing the metadata of received blocks (i.e. `ReceivedBlockInfo`), and any actions on them (e.g., allocating blocks to batches). All actions on block info get written out to a write ahead log (using `WriteAheadLogManager`). On recovery, all the actions are replayed to recreate the pre-failure state of the `ReceivedBlockTracker`, which includes the batch-to-block allocations and the unallocated blocks.

Furthermore, the `ReceiverInputDStream` was modified to create `WriteAheadLogBackedBlockRDD`s when file segment info is present in the `ReceivedBlockInfo`. After recovery of all the block info (through recovery of the `ReceivedBlockTracker`), the `WriteAheadLogBackedBlockRDD`s get recreated with the recovered info, and jobs are submitted. The data of the blocks gets pulled from the write ahead logs, thanks to the segment info present in the `ReceivedBlockInfo` (a sketch of this decision follows below).

This is still a WIP. Things that are still missing:
- *End-to-end integration tests:* Unit tests that test driver recovery, by killing and restarting the streaming context, and verifying that all the input data gets processed. This has been implemented but not included in this PR yet. A sneak peek of that DriverFailureSuite can be found in this PR (on my personal repo): tdas#25. I can either include it in this PR, or submit that as a separate PR after this gets in.
- *WAL cleanup:* Cleaning up the received data write ahead log, by calling `ReceivedBlockHandler.cleanupOldBlocks`. This is being worked on.
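A rough, self-contained sketch of the recovery-time decision described above. The class and field names here (ReceivedBlockMeta, LogSegmentInfo, and the RDD placeholders) are hypothetical and only illustrate the idea: when recovered metadata carries write ahead log segment info, the block data can be rebuilt from the log files; otherwise the code falls back to blocks still held by the BlockManager.

// Hypothetical stand-ins for the real metadata and RDD types.
case class LogSegmentInfo(path: String, offset: Long, length: Int)
case class ReceivedBlockMeta(blockId: String, segment: Option[LogSegmentInfo])

sealed trait BatchRDD
case class BlockManagerBackedRDD(blockIds: Seq[String]) extends BatchRDD
case class WALBackedRDD(blockIds: Seq[String], segments: Seq[LogSegmentInfo]) extends BatchRDD

def rddForBatch(blockInfos: Seq[ReceivedBlockMeta]): BatchRDD = {
  if (blockInfos.nonEmpty && blockInfos.forall(_.segment.isDefined)) {
    // Segment info survived the failure in the tracker's log, so the block
    // data itself can be read back out of the write ahead log files.
    WALBackedRDD(blockInfos.map(_.blockId), blockInfos.flatMap(_.segment))
  } else {
    // No segment info: rely on the blocks still being in the BlockManager.
    BlockManagerBackedRDD(blockInfos.map(_.blockId))
  }
}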