Commit 8d6ff9b

Added migration guide to Spark Streaming.
1 parent 7d171df commit 8d6ff9b

2 files changed

+57
-5
lines changed

docs/streaming-custom-receivers.md

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ should stop receiving data.
2323

2424
Once the data is received, that data can be stored inside Spark
2525
by calling `store(data)`, which is a method provided by the
26-
[Receiver](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver).
26+
[Receiver](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver) class.
2727
There are a number of flavours of `store()` which allow you to store the received data
2828
record-at-a-time or as a whole collection of objects / serialized bytes.
2929

@@ -196,7 +196,7 @@ The full source code is in the example [JavaCustomReceiver.java](https://github.
196196
### Implementing and Using a Custom Actor-based Receiver
197197

198198
Custom [Akka Actors](http://doc.akka.io/docs/akka/2.2.4/scala/actors.html) can also be used to
199-
receive data. The [ActorHelper](api/scala/index.html#org.apache.spark.streaming.receiver.ActorHelper)
199+
receive data. The [`ActorHelper`](api/scala/index.html#org.apache.spark.streaming.receiver.ActorHelper)
200200
trait can be applied on any Akka actor, which allows received data to be stored in Spark using
201201
`store(...)` methods. The supervisor strategy of this actor can be configured to handle failures, etc.
202202

docs/streaming-programming-guide.md

Lines changed: 55 additions & 3 deletions
@@ -1291,12 +1291,64 @@ in the file. This is what the sequence of outputs would be with and without a dr
12911291
If the driver had crashed in the middle of the processing of time 3, then it will process time 3
12921292
and output 30 after recovery.
12931293

1294+
***************************************************************************************************
1295+
1296+
# Migration Guide from 0.9.1 or below to 1.x
1297+
Between Spark 0.9.1 and Spark 1.0, there were a few API changes made to ensure future API stability.
1298+
This section describes the steps required to migrate your existing code to 1.0.
1299+
1300+
**Input DStreams**: All operations that create an input stream (e.g., `StreamingContext.socketStream`,
1301+
`FlumeUtils.createStream`, etc.) now return
1302+
[InputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream) /
1303+
[ReceiverInputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream)
1304+
(instead of DStream) for Scala, and [JavaInputDStream](api/java/org/apache/spark/streaming/api/java/JavaInputDStream.html) /
1305+
[JavaPairInputDStream](api/java/org/apache/spark/streaming/api/java/JavaPairInputDStream.html) /
1306+
[JavaReceiverInputDStream](api/java/org/apache/spark/streaming/api/java/JavaReceiverInputDStream.html) /
1307+
[JavaPairReceiverInputDStream](api/java/org/apache/spark/streaming/api/java/JavaPairReceiverInputDStream.html)
1308+
(instead of JavaDStream) for Java. This ensures that functionality specific to input streams can
1309+
be added to these classes in the future without breaking binary compatibility.
1310+
Note that your existing Spark Streaming applications should not require any change
1311+
(as these new classes are subclasses of DStream/JavaDStream) but may require recompilation with Spark 1.0.
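
As a minimal sketch of what this means in practice, the following assumes Spark 1.x on the classpath and uses `socketTextStream` as the input source; the host, port, application name and the object name `InputStreamTypes` are placeholders chosen for illustration. The returned stream can be typed as the new input-stream class, and it is still usable wherever a plain `DStream` is expected.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

// Hypothetical example object; names and addresses are placeholders.
object InputStreamTypes {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("InputStreamTypes")
    val ssc = new StreamingContext(conf, Seconds(1))

    // In Spark 1.x the input stream has the more specific type.
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 9999)

    // Code written against DStream keeps compiling, because the new class is a subclass.
    val sameLines: DStream[String] = lines
    sameLines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Code that only ever declared the result as `DStream[String]` needs no source change, only recompilation against Spark 1.0.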
1312+
1313+
**Custom Network Receivers**: Since the first release of Spark Streaming, custom network receivers could be defined
1314+
in Scala using the class NetworkReceiver. However, the API was limited in terms of error handling
1315+
and reporting, and could not be used from Java. Starting with Spark 1.0, this class has been
1316+
replaced by [Receiver](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver) which has
1317+
the following advantages.
1318+
1319+
* Methods like `stop` and `restart` have been added for better control of the lifecycle of a receiver. See
1320+
the [custom receiver guide](streaming-custom-receivers.html) for more details.
1321+
* Custom receivers can be implemented using both Scala and Java.
1322+
1323+
To migrate your existing custom receivers from the earlier NetworkReceiver to the new Receiver, you have
1324+
to do the following.
1325+
1326+
* Make your custom receiver class extend
1327+
[`org.apache.spark.streaming.receiver.Receiver`](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver)
1328+
instead of `org.apache.spark.streaming.dstream.NetworkReceiver`.
1329+
* Earlier, a BlockGenerator object had to be created by the custom receiver, to which received data was
1330+
added for being stored in Spark. It had to be explicitly started and stopped from `onStart()` and `onStop()`
1331+
methods. The new Receiver class makes this unnecessary as it adds a set of methods named `store(<data>)`
1332+
that can be called to store the data in Spark. So, to migrate your custom network receiver, remove any
1333+
BlockGenerator object (it no longer exists in Spark 1.0 anyway), and use `store(...)` methods on
1334+
received data.
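
The migration steps above can be sketched as follows. This is an illustrative receiver under stated assumptions, not the code from the custom receiver guide: the class name `LineReceiver`, the line-oriented socket protocol, and the choice of storage level are hypothetical. It extends the new `Receiver` class, creates no BlockGenerator, and hands each record to Spark with `store(...)`.

```scala
import java.io.{BufferedReader, IOException, InputStreamReader}
import java.net.Socket

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that reads newline-delimited text from a socket.
class LineReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // onStart() must not block, so receiving happens on a separate thread.
  override def onStart(): Unit = {
    new Thread("Line Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  // Nothing to clean up here: the receiving thread checks isStopped() and exits on its own.
  override def onStop(): Unit = {}

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val reader = new BufferedReader(new InputStreamReader(socket.getInputStream, "UTF-8"))
      try {
        var line = reader.readLine()
        while (!isStopped() && line != null) {
          store(line) // replaces adding data to a BlockGenerator
          line = reader.readLine()
        }
      } finally {
        reader.close()
        socket.close()
      }
      // The connection ended; ask Spark to restart the receiver.
      restart("Trying to connect again")
    } catch {
      case e: IOException => restart("Error receiving data", e)
    }
  }
}
```

Assuming an existing `StreamingContext` named `ssc`, such a receiver would be attached with `ssc.receiverStream(new LineReceiver("localhost", 9999))`.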
1335+
1336+
**Actor-based Receivers**: Previously, data could be received using any Akka actor by extending the actor class with the
1337+
`org.apache.spark.streaming.receivers.Receiver` trait. This has been renamed to
1338+
[`org.apache.spark.streaming.receiver.ActorHelper`](api/scala/index.html#org.apache.spark.streaming.receiver.ActorHelper)
1339+
and the `pushBlock(...)` methods used to store received data have been renamed to `store(...)`. Other helper classes in
1340+
the `org.apache.spark.streaming.receivers` package were also moved
1341+
to [`org.apache.spark.streaming.receiver`](api/scala/index.html#org.apache.spark.streaming.receiver.package)
1342+
package and renamed for better clarity.
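
A migrated actor-based receiver might look like the sketch below, assuming Spark 1.x with its bundled Akka dependency. The names `StringActor`, `StringActorStream` and `"StringActorReceiver"` are hypothetical; the actor mixes in `ActorHelper` and calls `store(...)` where older code would have called `pushBlock(...)`.

```scala
import akka.actor.{Actor, Props}
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.ActorHelper

// Hypothetical actor that forwards every String message it receives to Spark.
class StringActor extends Actor with ActorHelper {
  def receive = {
    case data: String => store(data) // was pushBlock(data) before Spark 1.0
  }
}

object StringActorStream {
  // Creates an input stream backed by the actor above on an existing context.
  def create(ssc: StreamingContext): ReceiverInputDStream[String] =
    ssc.actorStream[String](Props[StringActor], "StringActorReceiver")
}
```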
1343+
1344+
***************************************************************************************************
1345+
12941346
# Where to Go from Here
12951347

12961348
* API documentation
12971349
- Scala docs
12981350
* [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and
1299-
[DStream](api/scala/index.html#org.apache.spark.streaming.DStream)
1351+
[DStream](api/scala/index.html#org.apache.spark.streaming.dstream.DStream)
13001352
* [KafkaUtils](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$),
13011353
[FlumeUtils](api/scala/index.html#org.apache.spark.streaming.flume.FlumeUtils$),
13021354
[TwitterUtils](api/scala/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
@@ -1314,5 +1366,5 @@ and output 30 after recovery.
13141366

13151367
* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
13161368
and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
1317-
* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) describing Spark Streaming.
1318-
* [Video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
1369+
* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and
1370+
[video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
