57 changes: 29 additions & 28 deletions docs/java-programming-guide.md
@@ -17,36 +17,21 @@ which support the same methods as their Scala counterparts but take Java functions and return
Java data and collection types. The main differences have to do with passing functions to RDD
operations (e.g. map) and handling RDDs of different types, as discussed next.

# Upgrading from pre-1.0 versions of Spark

The following API changes apply to codebases written against pre-1.0 versions of Spark.

* All `org.apache.spark.api.java.function.*` abstract classes are now interfaces.
This means that concrete implementations of these `Function` abstract classes will
use `implements` instead of `extends`.
* The APIs of map and flatMap in core, and of map, flatMap and transform in streaming,
are now defined on the basis of the passed anonymous function's
return type: for example, `mapToPair(...)` or `flatMapToPair` returns
[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD),
and similarly `mapToDouble` and `flatMapToDouble` return
[`JavaDoubleRDD`](api/core/index.html#org.apache.spark.api.java.JavaDoubleRDD).
Please check the API documentation for more details.

# Key Differences in the Java API

There are a few key differences between the Java and Scala APIs:

* Java does not support anonymous or first-class functions, so functions must
be implemented by extending the
* Java does not support anonymous or first-class functions, so functions are passed
using anonymous classes that implement the
[`org.apache.spark.api.java.function.Function`](api/core/index.html#org.apache.spark.api.java.function.Function),
[`Function2`](api/core/index.html#org.apache.spark.api.java.function.Function2), etc.
classes.
interfaces.
* To maintain type safety, the Java API defines specialized Function and RDD
classes for key-value pairs and doubles. For example,
[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD)
stores key-value pairs.
* To support Java 8 lambda expressions, methods are defined on the basis of
the passed anonymous function's (a.k.a. lambda expression) return type,
* Some methods are defined on the basis of the passed anonymous function's
(a.k.a. lambda expression) return type:
for example, `mapToPair(...)` or `flatMapToPair` returns
[`JavaPairRDD`](api/core/index.html#org.apache.spark.api.java.JavaPairRDD),
and similarly `mapToDouble` and `flatMapToDouble` return
@@ -74,10 +59,10 @@ each specialized RDD class, so filtering a `PairRDD` returns a new `PairRDD`,
etc. (this achieves the "same-result-type" principle used by the [Scala collections
framework](http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html)).
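
To make this concrete, here is a minimal sketch of how the specialized RDD types are
preserved across transformations (assuming an existing `JavaSparkContext` named `sc` and an
input file `words.txt`, both of which are hypothetical):

{% highlight java %}
import scala.Tuple2;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;

JavaRDD<String> words = sc.textFile("words.txt");

// mapToPair yields a JavaPairRDD rather than a plain JavaRDD
JavaPairRDD<String, Integer> pairs = words.mapToPair(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String w) {
      return new Tuple2<String, Integer>(w, 1);
    }
  });

// filtering a JavaPairRDD returns another JavaPairRDD ("same-result-type")
JavaPairRDD<String, Integer> longWords = pairs.filter(
  new Function<Tuple2<String, Integer>, Boolean>() {
    public Boolean call(Tuple2<String, Integer> pair) {
      return pair._1().length() > 3;
    }
  });
{% endhighlight %}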

## Function Classes
## Function Interfaces

The following table lists the function classes used by the Java API. Each
class has a single abstract method, `call()`, that must be implemented.
The following table lists the function interfaces used by the Java API. Each
interface has a single abstract method, `call()`, that must be implemented.

<table class="table">
<tr><th>Class</th><th>Function Type</th></tr>
@@ -106,6 +91,21 @@ The Java API supports other Spark features, including
[broadcast variables](scala-programming-guide.html#broadcast-variables), and
[caching](scala-programming-guide.html#rdd-persistence).
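
For instance, a minimal sketch of broadcast variables and caching through the Java API
(assuming an existing `JavaSparkContext` named `sc` and a `JavaRDD<String>` named `lines`,
both hypothetical) might look like this:

{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.broadcast.Broadcast;

// Broadcast a read-only value once to all worker nodes
Broadcast<int[]> factors = sc.broadcast(new int[] {1, 2, 3});
int first = factors.value()[0];

// Persist an RDD in memory so that later actions reuse it
JavaRDD<String> cached = lines.cache();
long count = cached.count();
{% endhighlight %}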

# Upgrading From Pre-1.0 Versions of Spark

In version 1.0 of Spark the Java API was refactored to better support Java 8
lambda expressions. Users upgrading from older versions of Spark should note
the following changes:

* All `org.apache.spark.api.java.function.*` classes have been changed from abstract
classes to interfaces. This means that concrete implementations of these
`Function` classes will need to use `implements` rather than `extends`.
* Certain transformation functions now have multiple versions depending
on the return type. In Spark core, the map functions (`map`, `flatMap`, and
`mapPartitions`) have type-specific versions, e.g.
[`mapToPair`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToPair[K2,V2](f:org.apache.spark.api.java.function.PairFunction[T,K2,V2]):org.apache.spark.api.java.JavaPairRDD[K2,V2])
and [`mapToDouble`](api/core/index.html#org.apache.spark.api.java.JavaRDD@mapToDouble[R](f:org.apache.spark.api.java.function.DoubleFunction[T]):org.apache.spark.api.java.JavaDoubleRDD).
Spark Streaming also uses the same approach, e.g. [`transformToPair`](api/streaming/index.html#org.apache.spark.streaming.api.java.JavaDStream@transformToPair[K2,V2](transformFunc:org.apache.spark.api.java.function.Function[R,org.apache.spark.api.java.JavaPairRDD[K2,V2]]):org.apache.spark.streaming.api.java.JavaPairDStream[K2,V2]). A brief sketch of the new style appears after this list.
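
As a brief sketch of the new naming scheme (assuming an existing `JavaRDD<String>` named
`lines`; the variable names are only illustrative), a double-returning map is now written as
follows, and `mapToPair`/`flatMapToPair` follow the same pattern with `PairFunction`:

{% highlight java %}
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.function.DoubleFunction;

// mapToDouble takes a DoubleFunction and yields a JavaDoubleRDD;
// the anonymous class implements (rather than extends) the interface
JavaDoubleRDD lengths = lines.mapToDouble(
  new DoubleFunction<String>() {
    public double call(String line) {
      return line.length();
    }
  });
{% endhighlight %}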

# Example

@@ -147,8 +147,8 @@ class Split extends FlatMapFunction<String, String> {
JavaRDD<String> words = lines.flatMap(new Split());
{% endhighlight %}

Java 8+ users can also possibly write the above `FlatMapFunction` in a more concise way using
lambda expression as follows:
Java 8+ users can also write the above `FlatMapFunction` in a more concise way using
a lambda expression:

{% highlight java %}
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.split(" ")));
@@ -209,10 +209,11 @@ just a matter of style.

We currently provide documentation for the Java API as Scaladoc, in the
[`org.apache.spark.api.java` package](api/core/index.html#org.apache.spark.api.java.package), because
some of the classes are implemented in Scala. The main downside is that the types and function
some of the classes are implemented in Scala. It is important to note that the types and function
definitions show Scala syntax (for example, `def reduce(func: Function2[T, T, T]): T` instead of
`T reduce(Function2<T, T, T> func)`).
We hope to generate documentation with Java-style syntax in the future.
`T reduce(Function2<T, T, T> func)`). In addition, the Scala `trait` modifier is used for Java
interface classes. We hope to generate documentation with Java-style syntax in the future to
avoid these quirks.


# Where to Go from Here
1 change: 1 addition & 0 deletions extras/README.md
@@ -0,0 +1 @@
This directory contains build components not included by default in Spark's build.
25 changes: 17 additions & 8 deletions extras/java8-tests/README.md
@@ -1,15 +1,24 @@
# Java 8 test suites.
# Java 8 Test Suites

These tests are bundled with Spark and run if you have Java 8 installed as the system default or your `JAVA_HOME` points to a Java 8 (or higher) installation. `JAVA_HOME` is preferred over the system default JDK installation. Since these tests require JDK 8 or higher, they are defined as optional in the build system.
These tests require having Java 8 installed and are isolated from the main Spark build.
If Java 8 is not your system's default Java version, you will need to point Spark's build
to your Java location. The set-up depends a bit on the build system:

* For sbt users, the build automatically detects the presence of Java 8 based on either the `JAVA_HOME` environment variable or the default installed JDK, and runs these tests. A Java home passed as follows takes the highest precedence.
* Sbt users can either set `JAVA_HOME` to the location of a Java 8 JDK or explicitly pass
`-java-home` to the sbt launch script. If a Java 8 JDK is detected, sbt will automatically
include the Java 8 test project.

`$ sbt/sbt -java-home "/path/to/jdk1.8.0"`
`$ JAVA_HOME=/opt/jdk1.8.0/ sbt/sbt clean "test-only org.apache.spark.JavaAPISuite"`
**Owner comment:** Not sure, but it should have been Java8APISuite.
* For maven users,
* For Maven users,

This automatic detection is not possible, and thus the user has to ensure that either the `JAVA_HOME` environment variable or the default installed JDK points to JDK 8.
Maven users can also refer to their Java 8 directory using `JAVA_HOME`. However, Maven will not
automatically detect the presence of a Java 8 JDK, so a special build profile `-Pjava8-tests`
must be used.

`$ mvn install -Pjava8-tests`
`$ JAVA_HOME=/opt/jdk1.8.0/ mvn clean install -DskipTests`
`$ JAVA_HOME=/opt/jdk1.8.0/ mvn test -Pjava8-tests -DwildcardSuites=org.apache.spark.JavaAPISuite`
**Owner comment:** Here too (i.e., this should also be Java8APISuite).
The above command can only be run from the project root directory since this module depends on both core and the test-jars of core and streaming. These jars are installed the first time the above command is run, as the java8-tests profile enables local publishing of test-jar artifacts as well. Once these artifacts are published, these tests can be run from this module's directory as well.
Note that the above command can only be run from the project root directory since this module
depends on core and the test-jars of core and streaming. This means an install step is
required to make the test dependencies visible to the Java 8 sub-project.
12 changes: 9 additions & 3 deletions extras/java8-tests/pom.xml
@@ -20,8 +20,8 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent</artifactId>
<version>1.0.0-incubating-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

<groupId>org.apache.spark</groupId>
@@ -77,12 +77,18 @@
</execution>
</executions>
<configuration>
<systemPropertyVariables>
<!-- For some reason surefire isn't setting this log4j file on the
test classpath automatically. So we add it manually. -->
<log4j.configuration>
file:src/test/resources/log4j.properties
</log4j.configuration>
</systemPropertyVariables>
<skipTests>false</skipTests>
<includes>
<include>**/Suite*.java</include>
<include>**/*Suite.java</include>
</includes>
<redirectTestOutputToFile>true</redirectTestOutputToFile>
</configuration>
</plugin>
<plugin>
28 changes: 28 additions & 0 deletions extras/java8-tests/src/test/resources/log4j.properties
@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file target/unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=false
log4j.appender.file.file=target/unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
3 changes: 2 additions & 1 deletion sbt/sbt-launch-lib.bash
@@ -18,7 +18,8 @@ declare -a scalac_args
declare -a sbt_commands

if test -x "$JAVA_HOME/bin/java"; then
echo -e "using $JAVA_HOME for launching sbt."
echo -e "Using $JAVA_HOME as default JAVA_HOME."
echo "Note, this will be overridden by -java-home if it is set."
declare java_cmd="$JAVA_HOME/bin/java"
else
declare java_cmd=java