
Commit 18ff105

Fixed a lot of broken links.
1 parent 34a5a60 commit 18ff105

1 file changed: docs/streaming-programming-guide.md (34 additions, 40 deletions)

@@ -115,16 +115,14 @@ ssc.awaitTermination() // Wait for the computation to terminate
 {% endhighlight %}
 
 The complete code can be found in the Spark Streaming example
-[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/
-org/apache/spark/streaming/examples/NetworkWordCount.scala).
+[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala).
 <br>
 
 </div>
 <div data-lang="java" markdown="1">
 
 First, we create a
-[JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java
-.JavaStreamingContext) object,
+[JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext) object,
 which is the main entry point for all streaming
 functionality. Besides Spark's configuration, we specify that any DStream would be processed
 in 1 second batches.
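
For orientation, here is a minimal sketch of the kind of program the linked NetworkWordCount example implements (Scala, 0.9-era API; the master URL, app name, and port are illustrative, not taken from the commit):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // implicit pair-DStream operations

    // Create a local context that batches incoming data into 1-second intervals.
    val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))

    // Count words arriving on a TCP socket and print a few counts per batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination() // Wait for the computation to terminate
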
@@ -184,8 +182,8 @@ wordCount.print(); // Print a few of the counts to the console
 {% endhighlight %}
 
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
-1)` pairs, using a [PairFunction](api/core/index.html#org.apache.spark.api.java.function
-.PairFunction) object. Then, it is reduced to get the frequency of words in each batch of data,
+1)` pairs, using a [PairFunction](api/core/index.html#org.apache.spark.api.java.function.PairFunction)
+object. Then, it is reduced to get the frequency of words in each batch of data,
 using a [Function2](api/core/index.html#org.apache.spark.api.java.function.Function2) object.
 Finally, `wordCounts.print()` will print a few of the counts generated every second.
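
For comparison, the Scala API expresses the same two steps with closures instead of PairFunction and Function2 objects (a sketch, assuming the `words` DStream from the Scala walkthrough earlier in the guide):

    val pairs = words.map(word => (word, 1)) // one-to-one map to (word, 1) pairs
    val wordCounts = pairs.reduceByKey(_ + _) // per-batch word frequencies
    wordCounts.print() // print a few counts every second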

@@ -199,8 +197,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
 {% endhighlight %}
 
 The complete code can be found in the Spark Streaming example
-[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/
-org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
+[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
 <br>
 
 </div>
@@ -319,8 +316,7 @@ new StreamingContext(master, appName, batchDuration, [sparkHome], [jars])
 <div data-lang="java" markdown="1">
 
 To initialize a Spark Streaming program in Java, a
-[`JavaStreamingContext`](api/streaming/index.html#org.apache.spark.streaming.api
-.java.JavaStreamingContext)
+[`JavaStreamingContext`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
 object has to be created, which is the main entry point of all Spark Streaming functionality.
 A `JavaStreamingContext` object can be created by using
 
@@ -334,8 +330,9 @@ The `master` parameter is a standard [Spark cluster URL](scala-programming-guide
 and can be "local" for local testing. The `appName` is a name of your program,
 which will be shown on your cluster's web UI. The `batchInterval` is the size of the batches,
 as explained earlier. Finally, the last two parameters are needed to deploy your code to a cluster
-if running in distributed mode, as described in the [Spark programming guide](scala-programming-guide
-.html#deploying-code-on-a-cluster). Additionally, the underlying SparkContext can be accessed as
+if running in distributed mode, as described in the
+[Spark programming guide](scala-programming-guide.html#deploying-code-on-a-cluster).
+Additionally, the underlying SparkContext can be accessed as
 `streamingContext.sparkContext`.
 
 The batch interval must be set based on the latency requirements of your application
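
A short sketch of the constructor parameters this hunk describes (Scala; the master URL and app name are illustrative):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // "local[2]" master for testing, an app name for the web UI, 1-second batches.
    val ssc = new StreamingContext("local[2]", "MyStreamingApp", Seconds(1))

    // The underlying SparkContext is accessible for non-streaming work.
    val sc = ssc.sparkContext
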
@@ -407,8 +404,8 @@ and process any files created in that directory. Note that
 For more details on streams from files, Akka actors and sockets,
 see the API documentations of the relevant functions in
 [StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) for
-Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.
-java.JavaStreamingContext) for Java.
+Scala and [JavaStreamingContext](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaStreamingContext)
+for Java.
 
 Additional functionality for creating DStreams from sources such as Kafka, Flume, and Twitter
 can be imported by adding the right dependencies as explained in an
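
The file and socket sources this hunk mentions look like this in the Scala API (a sketch; host, port, and directory are illustrative, and `ssc` is an existing StreamingContext):

    // Text stream from a TCP socket.
    val socketLines = ssc.socketTextStream("localhost", 9999)

    // Watch a directory and process any text files created in it.
    val fileLines = ssc.textFileStream("hdfs://namenode:8020/incoming")
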
@@ -578,8 +575,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFu
 The update function will be called for each word, with `newValues` having a sequence of 1's (from
 the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete
 Scala code, take a look at the example
-[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/
-main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
+[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
 
 <h4>Transform Operation</h4>
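
The Scala form of the update function this hunk describes, following the running-count pattern of StatefulNetworkWordCount (a sketch, assuming `pairs` is a DStream of (word, 1) tuples):

    // newValues: the 1's observed for a word in the current batch;
    // runningCount: the count carried over from previous batches.
    def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
      Some(runningCount.getOrElse(0) + newValues.sum)
    }

    val runningCounts = pairs.updateStateByKey[Int](updateFunction _)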

@@ -781,7 +777,7 @@ output operators are defined:
 The complete list of DStream operations is available in the API documentation. For the Scala API,
 see [DStream](api/streaming/index.html#org.apache.spark.streaming.dstream.DStream)
 and [PairDStreamFunctions](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
-For the Java API, see [JavaDStream](api/streaming/index.html#org.apache.spark.api.java.dstream.DStream)
+For the Java API, see [JavaDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.dstream.DStream)
 and [JavaPairDStream](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaPairDStream).
 Specifically for the Java API, see [Spark's Java programming guide](java-programming-guide.html)
 for more information.
@@ -858,7 +854,7 @@ Cluster resources maybe under-utilized if the number of parallel tasks used in a
 computation is not high enough. For example, for distributed reduce operations like `reduceByKey`
 and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of
 parallelism as an argument (see the
-[`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.PairDStreamFunctions)
+[`PairDStreamFunctions`](api/streaming/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
 documentation), or set the [config property](configuration.html#spark-properties)
 `spark.default.parallelism` to change the default.
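
Passing an explicit level of parallelism, as described here, looks like this in Scala (16 is an arbitrary illustrative value):

    // Run the distributed reduce with 16 parallel tasks instead of the default 8.
    val wordCounts = pairs.reduceByKey((a: Int, b: Int) => a + b, 16)
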

@@ -902,7 +898,8 @@ A good approach to figure out the right batch size for your application is to te
902898
conservative batch size (say, 5-10 seconds) and a low data rate. To verify whether the system
903899
is able to keep up with data rate, you can check the value of the end-to-end delay experienced
904900
by each processed batch (either look for "Total delay" in Spark driver log4j logs, or use the
905-
[StreamingListener](streaming/index.html#org.apache.spark.scheduler.StreamingListener) interface).
901+
[StreamingListener](api/streaming/index.html#org.apache.spark.streaming.scheduler.StreamingListener)
902+
interface).
906903
If the delay is maintained to be comparable to the batch size, then system is stable. Otherwise,
907904
if the delay is continuously increasing, it means that the system is unable to keep up and it
908905
therefore unstable. Once you have an idea of a stable configuration, you can try increasing the
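
A sketch of watching that end-to-end delay through the listener interface linked above (method and field names follow the 0.9-era API and should be checked against your Spark version; `ssc` is an existing StreamingContext):

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    class DelayLogger extends StreamingListener {
      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) {
        // totalDelay is an Option[Long] in milliseconds, set once the batch finishes.
        batchCompleted.batchInfo.totalDelay.foreach { delayMs =>
          println("Total delay: " + delayMs + " ms")
        }
      }
    }

    ssc.addStreamingListener(new DelayLogger)
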
@@ -1050,10 +1047,9 @@ context.awaitTermination()
 If the `checkpointDirectory` exists, then the context will be recreated from the checkpoint data.
 If the directory does not exist (i.e., running for the first time),
 then the function `functionToCreateContext` will be called to create a new
-context and set up the DStreams. See the Scala example [RecoverableNetworkWordCount](https://github
-.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/
-RecoverableNetworkWordCount.scala?source=c). This example appends the word counts of network
-data into a file.
+context and set up the DStreams. See the Scala example
+[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala).
+This example appends the word counts of network data into a file.
 
 You can also explicitly create a `StreamingContext` from the checkpoint data and start the
 computation by using `new StreamingContext(checkpointDirectory)`.
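
A self-contained sketch of the recovery pattern this hunk documents, using the getOrCreate idiom from the surrounding guide text (the checkpoint directory, master URL, and source are illustrative):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDirectory = "/tmp/streaming-checkpoint" // illustrative path

    def functionToCreateContext(): StreamingContext = {
      val ssc = new StreamingContext("local[2]", "RecoverableApp", Seconds(1)) // new context
      val lines = ssc.socketTextStream("localhost", 9999) // set up the DStreams
      lines.print()
      ssc.checkpoint(checkpointDirectory)
      ssc
    }

    // Recreate the context from checkpoint data if it exists, else build a fresh one.
    val context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
    context.start()
    context.awaitTermination()
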
@@ -1090,11 +1086,10 @@ context.awaitTermination();
 If the `checkpointDirectory` exists, then the context will be recreated from the checkpoint data.
 If the directory does not exist (i.e., running for the first time),
 then the function `contextFactory` will be called to create a new
-context and set up the DStreams. See the Scala example [JavaRecoverableWordCount](https://github
-.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/
-JavaRecoverableWordCount.scala?source=c) (note that this example is missing in the 0.9 release,
-so you can test it using the master branch). This example appends the word counts of network
-data into a file.
+context and set up the DStreams. See the Scala example
+[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/JavaRecoverableWordCount.scala)
+(note that this example is missing in the 0.9 release, so you can test it using the master branch).
+This example appends the word counts of network data into a file.
 
 You can also explicitly create a `JavaStreamingContext` from the checkpoint data and start
 the computation by using `new JavaStreamingContext(checkpointDirectory)`.
@@ -1222,16 +1217,15 @@ and output 30 after recovery.
 # Where to Go from Here
 
 * API documentation
-  - Main docs of StreamingContext and DStreams in [Scala](api/streaming/index.html#org.apache
-.spark.streaming.package) and [Java](api/streaming/index.html#org.apache.spark.streaming.api.java.package)
-  - Additional docs for [Kafka](api/external/kafka/index.html#org.apache.spark.streaming.kafka
-.KafkaUtils$), [Flume](api/external/flume/index.html#org.apache.spark.streaming.flume
-.FlumeUtils$), [Twitter](api/external/twitter/index.html#org.apache.spark.streaming.twitter
-.TwitterUtils$), [ZeroMQ](api/external/zeromq/index.html#org.apache.spark.streaming.zeromq
-.ZeroMQUtils$), and [MQTT](api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
-
-
-* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/
-scala/org/apache/spark/streaming/examples) and [Java]({{site.SPARK_GITHUB_URL}}/
-tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
+  - Main docs of StreamingContext and DStreams in [Scala](api/streaming/index.html#org.apache.spark.streaming.package)
+    and [Java](api/streaming/index.html#org.apache.spark.streaming.api.java.package)
+  - Additional docs for
+    [Kafka](api/external/kafka/index.html#org.apache.spark.streaming.kafka.KafkaUtils$),
+    [Flume](api/external/flume/index.html#org.apache.spark.streaming.flume.FlumeUtils$),
+    [Twitter](api/external/twitter/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
+    [ZeroMQ](api/external/zeromq/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$), and
+    [MQTT](api/external/mqtt/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
+
+* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
+  and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
 * [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) describing Spark Streaming
