1 change: 1 addition & 0 deletions make-distribution.sh
@@ -181,6 +181,7 @@ echo "Spark $VERSION$GITREVSTRING built for Hadoop $SPARK_HADOOP_VERSION" > "$DI
# Copy jars
cp "$FWDIR"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/"
cp "$FWDIR"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/"
cp "$FWDIR"/network/yarn/target/scala*/spark-*-yarn-shuffle.jar "$DISTDIR/lib/"

# Copy example sources (needed for python and SQL)
mkdir -p "$DISTDIR/examples/src/main"
5 changes: 3 additions & 2 deletions network/common/pom.xml
@@ -41,12 +41,13 @@
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
Contributor

Why did you move the comment up here? Should this be at the provided scope?

Contributor Author

Yeah, actually YARN already provides slf4j, so it doesn't need to be a compile-scope dependency of this module. For standalone mode it's also already required by Spark, so it doesn't need to be a compile-scope dependency there either. However, I just realized I forgot to actually make it provided by adding the tag.

Contributor Author

Ok I added the tag

<scope>provided</scope>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
5 changes: 3 additions & 2 deletions network/shuffle/pom.xml
@@ -42,12 +42,13 @@
<artifactId>spark-network-common_2.10</artifactId>
<version>${project.version}</version>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<scope>provided</scope>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
33 changes: 33 additions & 0 deletions network/yarn/pom.xml
@@ -54,5 +54,38 @@
<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
Contributor

Would you mind adding a comment on what this is doing? I don't know much about the shading plugin, and accordingly I'm not certain why shading is equivalent to producing an assembly jar.

Contributor Author

My understanding is that the shading plugin is primarily used to create uber jars, and shading (relocating) dependencies is just a generally useful thing to do in that process: http://maven.apache.org/plugins/maven-shade-plugin/. This is how we create assembly jars in, say, the examples and core modules; the only difference here is that we don't actually need to shade any dependencies. I think this is a pretty standard thing to do, and I'm not sure a comment is necessary.

Contributor

Hey, I agree with @andrewor14 on this one: it's used pervasively in our Maven build, and I'm not sure we want to comment it everywhere.

Contributor

Gotcha. Perhaps just <!-- Creates a fat jar, sourced from assembly/pom.xml -->?

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<outputFile>${project.build.directory}/scala-${scala.binary.version}/spark-${project.version}-yarn-shuffle.jar</outputFile>
<artifactSet>
<includes>
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
19 changes: 13 additions & 6 deletions project/SparkBuild.scala
@@ -38,12 +38,12 @@ object BuildCommons {
"streaming-flume", "streaming-kafka", "streaming-mqtt", "streaming-twitter",
"streaming-zeromq").map(ProjectRef(buildLocation, _))

val optionallyEnabledProjects@Seq(yarn, yarnStable, yarnAlpha, networkYarn, java8Tests,
sparkGangliaLgpl, sparkKinesisAsl) = Seq("yarn", "yarn-stable", "yarn-alpha", "network-yarn",
val optionallyEnabledProjects@Seq(yarn, yarnStable, yarnAlpha, java8Tests,
sparkGangliaLgpl, sparkKinesisAsl) = Seq("yarn", "yarn-stable", "yarn-alpha",
"java8-tests", "ganglia-lgpl", "kinesis-asl").map(ProjectRef(buildLocation, _))

val assemblyProjects@Seq(assembly, examples) = Seq("assembly", "examples")
.map(ProjectRef(buildLocation, _))
val assemblyProjects@Seq(assembly, examples, networkYarn) =
Seq("assembly", "examples", "network-yarn").map(ProjectRef(buildLocation, _))
Contributor

It might be good to make sure the name here matches what is in the Maven build. You can modify the name using the "jarName in assembly" setting. You'll want to modify it just for the network-yarn project, though.
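
For illustration, a minimal sketch of the per-project approach suggested above, written in the same sbt 0.13 style as the rest of SparkBuild.scala. The project/task scoping shown here is an assumption for illustration, not part of the patch; the actual change below instead branches on the module name inside the shared assembly settings.

  // Hypothetical sketch: override the sbt-assembly jar name only for the network-yarn
  // project, leaving the default Hadoop-suffixed name for the other assembly projects.
  jarName in (networkYarn, assembly) <<= version map { v =>
    "spark-" + v + "-yarn-shuffle.jar"
  }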


val tools = ProjectRef(buildLocation, "tools")
// Root project.
@@ -270,8 +270,15 @@ object Assembly {

lazy val settings = assemblySettings ++ Seq(
test in assembly := {},
jarName in assembly <<= (version, moduleName) map { (v, mName) => mName + "-"+v + "-hadoop" +
Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar" },
jarName in assembly <<= (version, moduleName) map { (v, mName) =>
if (mName.contains("network-yarn")) {
// This must match the same name used in maven (see network/yarn/pom.xml)
"spark-" + v + "-yarn-shuffle.jar"
} else {
mName + "-" + v + "-hadoop" +
Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar"
}
},
mergeStrategy in assembly := {
case PathList("org", "datanucleus", xs @ _*) => MergeStrategy.discard
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard