@@ -115,16 +115,14 @@ ssc.awaitTermination() // Wait for the computation to terminate
 {% endhighlight %}

 The complete code can be found in the Spark Streaming example
-[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/
-org/apache/spark/streaming/examples/NetworkWordCount.scala).
+[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala).
 <br>

 </div>
 <div data-lang="java" markdown="1">

 First, we create a
-[JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java
-.JavaStreamingContext) object,
+[JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext) object,
 which is the main entry point for all streaming
 functionality. Besides Spark's configuration, we specify that any DStream will be processed
 in 1-second batches.
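For quick reference, here is a minimal Scala sketch of the equivalent setup; the master URL and application name are placeholder values, and the Java snippet that follows in this section remains the authoritative version:

{% highlight scala %}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Create a context that batches incoming data into 1-second intervals.
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
{% endhighlight %}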
@@ -184,8 +182,8 @@ wordCount.print(); // Print a few of the counts to the console
 {% endhighlight %}

 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
-1)` pairs, using a [PairFunction](api/core/index.html#org.apache.spark.api.java.function
-.PairFunction) object. Then, it is reduced to get the frequency of words in each batch of data,
+1)` pairs, using a [PairFunction](api/core/index.html#org.apache.spark.api.java.function.PairFunction)
+object. Then, it is reduced to get the frequency of words in each batch of data,
 using a [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2) object.
 Finally, `wordCounts.print()` will print a few of the counts generated every second.

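The Scala equivalent of this map-and-reduce step, assuming a `words: DStream[String]` produced by the earlier flatMap, may make the pairing and reduction easier to follow:

{% highlight scala %}
// Map each word to a (word, 1) pair, then sum the counts within each batch.
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()  // Print a few of the counts generated every second.
{% endhighlight %}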
@@ -199,8 +197,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
 {% endhighlight %}

 The complete code can be found in the Spark Streaming example
-[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/
-org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
+[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
 <br>

 </div>
@@ -319,8 +316,7 @@ new StreamingContext(master, appName, batchDuration, [sparkHome], [jars])
 <div data-lang="java" markdown="1">

 To initialize a Spark Streaming program in Java, a
-[`JavaStreamingContext`](api/streaming/index.html#org.apache.spark.streaming.api
-.java.JavaStreamingContext)
+[`JavaStreamingContext`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
 object has to be created, which is the main entry point of all Spark Streaming functionality.
 A `JavaStreamingContext` object can be created by using

@@ -334,8 +330,9 @@ The `master` parameter is a standard [Spark cluster URL](scala-programming-guide
 and can be "local" for local testing. The `appName` is the name of your program,
 which will be shown on your cluster's web UI. The `batchInterval` is the size of the batches,
 as explained earlier. Finally, the last two parameters are needed to deploy your code to a cluster
-if running in distributed mode, as described in the [Spark programming guide](scala-programming-guide
-.html#deploying-code-on-a-cluster). Additionally, the underlying SparkContext can be accessed as
+if running in distributed mode, as described in the
+[Spark programming guide](scala-programming-guide.html#deploying-code-on-a-cluster).
+Additionally, the underlying SparkContext can be accessed as
 `streamingContext.sparkContext`.

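As an illustration, a hedged Scala sketch of these parameters (the Java constructor takes them analogously); the cluster URL, application name, Spark home, and jar path are placeholder assumptions:

{% highlight scala %}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  "spark://host:7077",        // master: a standard Spark cluster URL
  "MyStreamingApp",           // appName: shown on the cluster's web UI
  Seconds(1),                 // batchInterval: the size of the batches
  "/opt/spark",               // sparkHome: needed when deploying to a cluster
  Seq("myStreamingApp.jar"))  // jars: code to be shipped to the cluster

val sc = ssc.sparkContext     // the underlying SparkContext
{% endhighlight %}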
 The batch interval must be set based on the latency requirements of your application
@@ -407,8 +404,8 @@ and process any files created in that directory. Note that
 For more details on streams from files, Akka actors, and sockets,
 see the API documentation of the relevant functions in
 [StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) for
-Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.
-java.JavaStreamingContext) for Java.
+Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
+for Java.

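For instance, a minimal sketch of a file-based stream in Scala, assuming an existing `ssc` and a placeholder directory:

{% highlight scala %}
// Monitor a directory and process any new text files created in it.
val lines = ssc.textFileStream("hdfs://namenode:8020/incoming/")
{% endhighlight %}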
 Additional functionality for creating DStreams from sources such as Kafka, Flume, and Twitter
 can be imported by adding the right dependencies as explained in an
@@ -578,8 +575,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFu
 The update function will be called for each word, with `newValues` having a sequence of 1's (from
 the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete
 Scala code, take a look at the example
-[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/
-main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
+[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).

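In Scala, the update function described above can be sketched as follows, assuming a `pairs: DStream[(String, Int)]` of `(word, 1)` pairs:

{% highlight scala %}
// Add the new 1's from the current batch to the previous running count, if any.
def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
  Some(newValues.sum + runningCount.getOrElse(0))
}

val runningCounts = pairs.updateStateByKey[Int](updateFunction _)
{% endhighlight %}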
 <h4>Transform Operation</h4>

@@ -781,7 +777,7 @@ output operators are defined:
 The complete list of DStream operations is available in the API documentation. For the Scala API,
 see [DStream](api/streaming/index.html#org.apache.spark.streaming.dstream.DStream)
 and [PairDStreamFunctions](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
-For the Java API, see [JavaDStream](api/streaming/index.html#org.apache.spark.api.java.dstream.DStream)
+For the Java API, see [JavaDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.dstream.DStream)
 and [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream).
 Specifically for the Java API, see [Spark's Java programming guide](java-programming-guide.html)
 for more information.
@@ -858,7 +854,7 @@ Cluster resources may be under-utilized if the number of parallel tasks used in a
 computation is not high enough. For example, for distributed reduce operations like `reduceByKey`
 and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of
 parallelism as an argument (see the
-[`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.PairDStreamFunctions)
+[`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
 documentation), or set the [config property](configuration.html#spark-properties)
 `spark.default.parallelism` to change the default.

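For example, a sketch of passing an explicit parallelism level to a reduce; the value 16 is arbitrary:

{% highlight scala %}
// Use 16 parallel tasks for this shuffle instead of the default.
val wordCounts = pairs.reduceByKey(_ + _, 16)
{% endhighlight %}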
@@ -902,7 +898,8 @@ A good approach to figure out the right batch size for your application is to te
 conservative batch size (say, 5-10 seconds) and a low data rate. To verify whether the system
 is able to keep up with the data rate, you can check the value of the end-to-end delay experienced
 by each processed batch (either look for "Total delay" in Spark driver log4j logs, or use the
-[StreamingListener](streaming/index.html#org.apache.spark.scheduler.StreamingListener) interface).
+[StreamingListener](api/streaming/index.html#org.apache.spark.streaming.scheduler.StreamingListener)
+interface).
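A hedged sketch of reading that delay programmatically; the `addStreamingListener` wiring and the `totalDelay` field are assumptions based on the API linked above:

{% highlight scala %}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Print the end-to-end delay of every completed batch.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) {
    batchCompleted.batchInfo.totalDelay.foreach(d => println("Total delay: " + d + " ms"))
  }
})
{% endhighlight %}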
 If the delay is maintained to be comparable to the batch size, then the system is stable. Otherwise,
 if the delay is continuously increasing, it means that the system is unable to keep up and is
 therefore unstable. Once you have an idea of a stable configuration, you can try increasing the
@@ -1050,10 +1047,9 @@ context.awaitTermination()
 If the `checkpointDirectory` exists, then the context will be recreated from the checkpoint data.
 If the directory does not exist (i.e., running for the first time),
 then the function `functionToCreateContext` will be called to create a new
-context and set up the DStreams. See the Scala example [RecoverableNetworkWordCount](https://github
-.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/
-RecoverableNetworkWordCount.scala?source=c). This example appends the word counts of network
-data into a file.
+context and set up the DStreams. See the Scala example
+[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala).
+This example appends the word counts of network data into a file.

 You can also explicitly create a `StreamingContext` from the checkpoint data and start the
 computation by using `new StreamingContext(checkpointDirectory)`.
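A minimal sketch of that explicit form, with a placeholder checkpoint path:

{% highlight scala %}
// Recreate the context directly from existing checkpoint data and resume.
val context = new StreamingContext("hdfs://namenode:8020/checkpoint")
context.start()
{% endhighlight %}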
@@ -1090,11 +1086,10 @@ context.awaitTermination();
 If the `checkpointDirectory` exists, then the context will be recreated from the checkpoint data.
 If the directory does not exist (i.e., running for the first time),
 then the function `contextFactory` will be called to create a new
-context and set up the DStreams. See the Scala example [JavaRecoverableWordCount](https://github
-.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/
-JavaRecoverableWordCount.scala?source=c) (note that this example is missing in the 0.9 release,
-so you can test it using the master branch). This example appends the word counts of network
-data into a file.
+context and set up the DStreams. See the Scala example
+[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/JavaRecoverableWordCount.scala)
+(note that this example is missing in the 0.9 release, so you can test it using the master branch).
+This example appends the word counts of network data into a file.

 You can also explicitly create a `JavaStreamingContext` from the checkpoint data and start
 the computation by using `new JavaStreamingContext(checkpointDirectory)`.
@@ -1222,16 +1217,15 @@ and output 30 after recovery.
12221217# Where to Go from Here
12231218
12241219* API documentation
1225- - Main docs of StreamingContext and DStreams in [ Scala] (api/streaming/index.html#org.apache
1226- .spark.streaming.package) and [ Java] ( api/streaming/index.html#org.apache.spark.streaming.api.java.package )
1227- - Additional docs for [ Kafka] (api/external/kafka/index.html#org.apache.spark.streaming.kafka
1228- .KafkaUtils$), [ Flume] (api/external/flume/index.html#org.apache.spark.streaming.flume
1229- .FlumeUtils$), [ Twitter] (api/external/twitter/index.html#org.apache.spark.streaming.twitter
1230- .TwitterUtils$), [ ZeroMQ] (api/external/zeromq/index.html#org.apache.spark.streaming.zeromq
1231- .ZeroMQUtils$), and [ MQTT] ( api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$ )
1232-
1233-
1234- * More examples in [ Scala] ({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/
1235- scala/org/apache/spark/streaming/examples) and [ Java] ({{site.SPARK_GITHUB_URL}}/
1236- tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
1220+ - Main docs of StreamingContext and DStreams in [ Scala] ( api/streaming/index.html#org.apache.spark.streaming.package )
1221+ and [ Java] ( api/streaming/index.html#org.apache.spark.streaming.api.java.package )
1222+ - Additional docs for
1223+ [ Kafka] ( api/external/kafka/index.html#org.apache.spark.streaming.kafka.KafkaUtils$ ) ,
1224+ [ Flume] ( api/external/flume/index.html#org.apache.spark.streaming.flume.FlumeUtils$ ) ,
1225+ [ Twitter] ( api/external/twitter/index.html#org.apache.spark.streaming.twitter.TwitterUtils$ ) ,
1226+ [ ZeroMQ] ( api/external/zeromq/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$ ) , and
1227+ [ MQTT] ( api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$ )
1228+
1229+ * More examples in [ Scala] ( {{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples )
1230+ and [ Java] ( {{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples )
 * [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) describing Spark Streaming