213 commits
e1e09e0
SPARK-977 Added Python RDD.zip function
Mar 10, 2014
f551898
[SPARK-972] Added detailed callsite info for ValueError in context.py…
jyotiska Mar 10, 2014
a59419c
SPARK-1168, Added foldByKey to pyspark.
ScrapCodes Mar 10, 2014
2a51617
SPARK-1205: Clean up callSite/origin/generator.
pwendell Mar 10, 2014
2a2c964
SPARK-1211. In ApplicationMaster, set spark.master system property to…
sryza Mar 11, 2014
16788a6
SPARK-1167: Remove metrics-ganglia from default build due to LGPL iss…
pwendell Mar 11, 2014
2409af9
SPARK-1064
sryza Mar 12, 2014
af7f2f1
Spark-1163, Added missing Python RDD functions
Mar 12, 2014
c8c59b3
[SPARK-1232] Fix the hadoop 0.23 yarn build
tgravescs Mar 12, 2014
b5162f4
[SPARK-1233] Fix running hadoop 0.23 due to java.lang.NoSuchFieldExce…
tgravescs Mar 12, 2014
5d1ec64
Fix #SPARK-1149 Bad partitioners can cause Spark to hang
Mar 12, 2014
b8afe30
SPARK-1162 Added top in python.
ScrapCodes Mar 12, 2014
9032f7c
SPARK-1160: Deprecate toArray in RDD
CodingCat Mar 13, 2014
31a7040
Fix example bug: compile error
Mar 13, 2014
6bd2eaa
hot fix for PR105 - change to Java annotation
CodingCat Mar 13, 2014
4ea23db
SPARK-1019: pyspark RDD take() throws an NPE
pwendell Mar 13, 2014
e4e8d8f
[SPARK-1237, 1238] Improve the computation of YtY for implicit ALS
mengxr Mar 13, 2014
6983732
SPARK-1183. Don't use "worker" to mean executor
sryza Mar 13, 2014
ca4bf8c
SPARK-1236 - Upgrade Jetty to 9.1.3.v20140225.
rxin Mar 13, 2014
181b130
[bugfix] wrong client arg, should use executor-cores
tsdeng Mar 14, 2014
e19044c
Fix serialization of MutablePair. Also provide an interface for easy …
marmbrus Mar 14, 2014
97e4459
SPARK-1254. Consolidate, order, and harmonize repository declarations…
srowen Mar 15, 2014
f5486e9
SPARK-1255: Allow user to pass Serializer object instead of class nam…
rxin Mar 16, 2014
dc96546
SPARK-1240: handle the case of empty RDD when takeSample
CodingCat Mar 17, 2014
796977a
SPARK-1244: Throw exception if map output status exceeds frame size
pwendell Mar 17, 2014
087eedc
[Spark-1261] add instructions for running python examples to doc over…
Mar 18, 2014
e3681f2
Spark 1246 add min max to stat counter
dwmclary Mar 18, 2014
e7423d4
Revert "SPARK-1236 - Upgrade Jetty to 9.1.3.v20140225."
pwendell Mar 18, 2014
2fa26ec
SPARK-1102: Create a saveAsNewAPIHadoopDataset method
CodingCat Mar 18, 2014
79e547f
Update copyright year in NOTICE to 2014
mateiz Mar 18, 2014
e108b9a
[SPARK-1260]: faster construction of features with intercept
mengxr Mar 18, 2014
f9d8a83
[SPARK-1266] persist factors in implicit ALS
mengxr Mar 19, 2014
cc2655a
Fix SPARK-1256: Master web UI and Worker web UI returns a 404 error
witgo Mar 19, 2014
a18ea00
Bundle tachyon: SPARK-1269
nicklan Mar 19, 2014
d55ec86
bugfix: Wrong "Duration" in "Active Stages" in stages page
BlackNiuza Mar 19, 2014
6112270
SPARK-1203 fix saving to hdfs from yarn
tgravescs Mar 19, 2014
ab747d3
Bugfixes/improvements to scheduler
mridulm Mar 19, 2014
79d07d6
[SPARK-1132] Persisting Web UI through refactoring the SparkListener …
andrewor14 Mar 19, 2014
67fa71c
Added doctest for map function in rdd.py
jyotiska Mar 19, 2014
1678931
SPARK-1099:Spark's local mode should probably respect spark.cores.max…
Mar 19, 2014
ffe272d
Revert "SPARK-1099:Spark's local mode should probably respect spark.c…
aarondav Mar 20, 2014
66a03e5
Principal Component Analysis
rezazadeh Mar 20, 2014
ca76423
[Hot Fix #42] Do not stop SparkUI if bind() is not called
andrewor14 Mar 20, 2014
9aadcff
SPARK-1251 Support for optimizing and executing structured queries
marmbrus Mar 21, 2014
e09139d
Fix maven jenkins: Add explicit init for required tables in SQLQueryS…
marmbrus Mar 21, 2014
7e17fe6
Add hive test files to repository. Remove download script.
marmbrus Mar 21, 2014
2c0aa22
SPARK-1279: Fix improper use of SimpleDateFormat
zsxwing Mar 21, 2014
dab5439
Make SQL keywords case-insensitive
mateiz Mar 21, 2014
d780983
Add asCode function for dumping raw tree representations.
marmbrus Mar 21, 2014
646e554
Fix to Stage UI to display numbers on progress bar
emtiazahmed Mar 22, 2014
abf6714
SPARK-1254. Supplemental fix for HTTPS on Maven Central
srowen Mar 23, 2014
57a4379
[SPARK-1292] In-memory columnar representation for Spark SQL
liancheng Mar 23, 2014
8265dc7
Fixed coding style issues in Spark SQL
liancheng Mar 23, 2014
80c2968
[SPARK-1212] Adding sparse data support and update KMeans
mengxr Mar 24, 2014
21109fb
SPARK-1144 Added license and RAT to check licenses.
ScrapCodes Mar 24, 2014
56db8a2
HOT FIX: Exclude test files from RAT
pwendell Mar 24, 2014
8043b7b
SPARK-1294 Fix resolution of uppercase field names using a HiveContext.
marmbrus Mar 25, 2014
dc126f2
SPARK-1094 Support MiMa for reporting binary compatibility accross ve…
pwendell Mar 25, 2014
5140598
SPARK-1128: set hadoop task properties when constructing HadoopRDD
CodingCat Mar 25, 2014
b637f2d
Unify the logic for column pruning, projection, and filtering of tabl…
marmbrus Mar 25, 2014
007a733
SPARK-1286: Make usage of spark-env.sh idempotent
aarondav Mar 25, 2014
134ace7
Add more hive compatability tests to whitelist
marmbrus Mar 25, 2014
71d4ed2
SPARK-1316. Remove use of Commons IO
srowen Mar 25, 2014
f8111ea
SPARK-1319: Fix scheduler to account for tasks using > 1 CPUs.
shivaram Mar 25, 2014
8237df8
Avoid Option while generating call site
witgo Mar 25, 2014
4f7d547
Initial experimentation with Travis CI configuration
marmbrus Mar 26, 2014
b859853
SPARK-1321 Use Guava's top k implementation rather than our BoundedPr…
rxin Mar 26, 2014
a0853a3
SPARK-1322, top in pyspark should sort result in descending order.
ScrapCodes Mar 26, 2014
345825d
Unified package definition format in Spark SQL
liancheng Mar 26, 2014
32cbdfd
[SQL] Un-ignore a test that is now passing.
marmbrus Mar 27, 2014
e15e574
[SQL] Add a custom serializer for maps since they do not have a no-ar…
marmbrus Mar 27, 2014
be6d96c
SPARK-1324: SparkUI Should Not Bind to SPARK_PUBLIC_DNS
pwendell Mar 27, 2014
3e63d98
Spark 1095 : Adding explicit return types to all public methods
NirmalReddy Mar 27, 2014
1fa48d9
SPARK-1325. The maven build error for Spark Tools
srowen Mar 27, 2014
d679843
[SPARK-1327] GLM needs to check addIntercept for intercept and weights
mengxr Mar 27, 2014
5b2d863
Cut down the granularity of travis tests.
marmbrus Mar 27, 2014
426042a
SPARK-1330 removed extra echo from comput_classpath.sh
tgravescs Mar 27, 2014
53953d0
SPARK-1335. Also increase perm gen / code cache for scalatest when in…
srowen Mar 27, 2014
6f986f0
[SPARK-1268] Adding XOR and AND-NOT operations to spark.util.collecti…
Mar 27, 2014
3d89043
[SPARK-1210] Prevent ContextClassLoader of Actor from becoming ClassL…
ueshin Mar 28, 2014
632c322
Make sed do -i '' on OSX
nicklan Mar 28, 2014
60abc25
SPARK-1096, a space after comment start style checker.
ScrapCodes Mar 28, 2014
75d46be
fix path for jar, make sed actually work on OSX
nicklan Mar 28, 2014
56cc7fb
First cut implementation of Streaming UI.
tdas Mar 28, 2014
3738f24
SPARK-1345 adding missing dependency on avro for hadoop 0.23 to the n…
tgravescs Mar 29, 2014
1617816
SPARK-1126. spark-app preliminary
sryza Mar 29, 2014
af3746c
Implement the RLike & Like in catalyst
chenghao-intel Mar 29, 2014
fda86d8
[SPARK-1186] : Enrich the Spark Shell to support additional arguments.
berngp Mar 30, 2014
92b8395
Don't swallow all kryo errors, only those that indicate we are out of…
marmbrus Mar 30, 2014
2861b07
[SQL] SPARK-1354 Fix self-joins of parquet relations
marmbrus Mar 30, 2014
df1b9f7
SPARK-1336 Reducing the output of run-tests script.
ScrapCodes Mar 30, 2014
95d7d2a
[SPARK-1354][SQL] Add tableName as a qualifier for SimpleCatelogy
jerryshao Mar 30, 2014
d666053
SPARK-1352 - Comment style single space before ending */ check.
ScrapCodes Mar 30, 2014
841721e
SPARK-1352: Improve robustness of spark-submit script
pwendell Mar 31, 2014
5731af5
[SQL] Rewrite join implementation to allow streaming of one relation.
marmbrus Mar 31, 2014
33b3c2a
SPARK-1365 [HOTFIX] Fix RateLimitedOutputStream test
pwendell Mar 31, 2014
93f1c69
Added network receiver information to the Streaming UI.
tdas Mar 31, 2014
564f1c1
SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
sryza Apr 1, 2014
94fe7fd
[SPARK-1377] Upgrade Jetty to 8.1.14v20131031
andrewor14 Apr 1, 2014
ada310a
[Hot Fix #42] Persisted RDD disappears on storage page if re-used
andrewor14 Apr 1, 2014
4d86e98
Added basic stats to the StreamingUI and refactored the UI to a Page …
tdas Apr 1, 2014
db27bad
Added last batch processing time to StreamingUI.
tdas Apr 1, 2014
f5c418d
[SQL] SPARK-1372 Support for caching and uncaching tables in a SQLCon…
marmbrus Apr 1, 2014
aef4dd5
Added Apache licenses.
tdas Apr 1, 2014
764353d
[SPARK-1342] Scala 2.10.4
markhamstra Apr 2, 2014
afb5ea6
[Spark-1134] only call ipython if no arguments are given; remove IPYT…
Apr 2, 2014
45df912
Revert "[Spark-1134] only call ipython if no arguments are given; rem…
mateiz Apr 2, 2014
8b3045c
MLI-1 Decision Trees
manishamde Apr 2, 2014
ea9de65
Remove * from test case golden filename.
marmbrus Apr 2, 2014
11973a7
Renamed stageIdToActiveJob to jobIdToActiveJob.
kayousterhout Apr 2, 2014
de8eefa
[SPARK-1385] Use existing code for JSON de/serialization of BlockId
andrewor14 Apr 2, 2014
7823633
Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
darabos Apr 2, 2014
1faa579
[SPARK-1371][WIP] Compression support for Spark SQL in-memory columna…
liancheng Apr 2, 2014
ed730c9
StopAfter / TopK related changes
rxin Apr 2, 2014
9c65fa7
[SPARK-1212, Part II] Support sparse data in MLlib
mengxr Apr 2, 2014
7d57444
Refactoring the UI interface to add flexibility
andrewor14 Apr 2, 2014
47ebea5
[SQL] SPARK-1364 Improve datatype and test coverage for ScalaReflecti…
marmbrus Apr 3, 2014
cd000b0
Merge github.com:apache/spark into ui-refactor
andrewor14 Apr 3, 2014
a37ad4f
Comments, imports and formatting (minor)
andrewor14 Apr 3, 2014
ed25dfc
Generalize SparkUI header to display tabs dynamically
andrewor14 Apr 3, 2014
92a86b2
[SPARK-1398] Removed findbugs jsr305 dependency
markhamstra Apr 3, 2014
fbebaed
Spark parquet improvements
AndreSchumacher Apr 3, 2014
5d1feda
[SPARK-1360] Add Timestamp Support for SQL
chenghao-intel Apr 3, 2014
53be2c5
Minor style updates.
tdas Apr 3, 2014
61358e3
Merge remote-tracking branch 'apache-github/master' into streaming-we…
tdas Apr 3, 2014
c1ea3af
Spark 1162 Implemented takeOrdered in pyspark.
ScrapCodes Apr 3, 2014
b8f5341
[SQL] SPARK-1333 First draft of java API
marmbrus Apr 3, 2014
a599e43
[SPARK-1134] Fix and document passing of arguments to IPython
Apr 3, 2014
9a48fa1
Allow adding tabs to SparkUI dynamically + add example
andrewor14 Apr 3, 2014
0d61ee8
Merge branch 'streaming-web-ui' of github.com:tdas/spark into ui-refa…
andrewor14 Apr 3, 2014
d94826b
[BUILD FIX] Fix compilation of Spark SQL Java API.
marmbrus Apr 3, 2014
8f7323b
End of file new lines, indentation, and imports (minor)
andrewor14 Apr 3, 2014
9231b01
Fix jenkins from giving the green light to builds that don't compile.
marmbrus Apr 3, 2014
33e6361
Revert "[SPARK-1398] Removed findbugs jsr305 dependency"
pwendell Apr 4, 2014
ee6e9e7
SPARK-1337: Application web UI garbage collects newest stages
pwendell Apr 4, 2014
7f32fd4
SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
sryza Apr 4, 2014
01cf4c4
SPARK-1404: Always upgrade spark-env.sh vars to environment vars
aarondav Apr 4, 2014
f1fa617
[SPARK-1133] Add whole text files reader in MLlib
yinxusen Apr 4, 2014
16b8308
SPARK-1375. Additional spark-submit cleanup
sryza Apr 4, 2014
a02b535
Don't create SparkContext in JobProgressListenerSuite.
pwendell Apr 4, 2014
198892f
[SPARK-1198] Allow pipes tasks to run in different sub-directories
tgravescs Apr 5, 2014
d956cc2
[SQL] Minor fixes.
marmbrus Apr 5, 2014
60e18ce
SPARK-1414. Python API for SparkContext.wholeTextFiles
mateiz Apr 5, 2014
5f3c1bb
Add test utility for generating Jar files with compiled classes.
pwendell Apr 5, 2014
1347ebd
[SPARK-1419] Bumped parent POM to apache 14
markhamstra Apr 5, 2014
b50ddfd
SPARK-1305: Support persisting RDD's directly to Tachyon
haoyuan Apr 5, 2014
8de038e
[SQL] SPARK-1366 Consistent sql function across different types of SQ…
marmbrus Apr 5, 2014
0acc7a0
small fix ( proogram -> program )
prabeesh Apr 5, 2014
7c18428
HOTFIX for broken CI, by SPARK-1336
ScrapCodes Apr 5, 2014
2d0150c
Remove the getStageInfo() method from SparkContext.
kayousterhout Apr 5, 2014
6e88583
[SPARK-1371] fix computePreferredLocations signature to not depend on…
Apr 5, 2014
890d63b
Fix for PR #195 for Java 6
srowen Apr 6, 2014
0b85516
SPARK-1421. Make MLlib work on Python 2.6
mateiz Apr 6, 2014
7012ffa
Fix SPARK-1420 The maven build error for Spark Catalyst
witgo Apr 6, 2014
e258e50
[SPARK-1259] Make RDD locally iterable
epahomov Apr 6, 2014
856c50f
SPARK-1387. Update build plugins, avoid plugin version warning, centr…
srowen Apr 7, 2014
7ce52c4
SPARK-1349: spark-shell gets its own command history
aarondav Apr 7, 2014
4106558
SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging
aarondav Apr 7, 2014
1440154
SPARK-1154: Clean up app folders in worker nodes
Apr 7, 2014
87d0928
SPARK-1431: Allow merging conflicting pull requests
pwendell Apr 7, 2014
accd099
[SQL] SPARK-1371 Hash Aggregation Improvements
marmbrus Apr 7, 2014
b5bae84
[SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands.
marmbrus Apr 7, 2014
a3c51c6
SPARK-1432: Make sure that all metadata fields are properly cleaned
Apr 7, 2014
83f2a2f
[sql] Rename Expression.apply to eval for better readability.
rxin Apr 7, 2014
9dd8b91
SPARK-1252. On YARN, use container-log4j.properties for executors
sryza Apr 7, 2014
2a2ca48
HOTFIX: Disable actor input stream test.
pwendell Apr 7, 2014
0307db0
SPARK-1099: Introduce local[*] mode to infer number of cores
aarondav Apr 7, 2014
c78c92d
Remove outdated comment
andrewor14 Apr 7, 2014
14c9238
[sql] Rename execution/aggregates.scala Aggregate.scala, and added a …
rxin Apr 8, 2014
55dfd5d
Removed the default eval implementation from Expression, and added a …
rxin Apr 8, 2014
31e6fff
Added eval for Rand (without any support for user-defined seed).
rxin Apr 8, 2014
f27e56a
Change timestamp cast semantics. When cast to numeric types, return t…
rxin Apr 8, 2014
0d0493f
[SPARK-1402] Added 3 more compression schemes
liancheng Apr 8, 2014
11eabbe
[SPARK-1103] Automatic garbage collection of RDD, shuffle and broadca…
tdas Apr 8, 2014
83ac9a4
[SPARK-1331] Added graceful shutdown to Spark Streaming
tdas Apr 8, 2014
6dc5f58
[SPARK-1396] Properly cleanup DAGScheduler on job cancellation.
kayousterhout Apr 8, 2014
3bc0548
Remove extra semicolon in import statement and unused import in Appli…
hsaputra Apr 8, 2014
a8d86b0
SPARK-1348 binding Master, Worker, and App Web UI to all interfaces
kanzhang Apr 8, 2014
e25b593
SPARK-1445: compute-classpath should not print error if lib_managed n…
aarondav Apr 8, 2014
fac6085
[SPARK-1397] Notify SparkListeners when stages fail or are cancelled.
kayousterhout Apr 8, 2014
12c077d
SPARK-1433: Upgrade Mesos dependency to 0.17.0
techaddict Apr 8, 2014
ce8ec54
Spark 1271: Co-Group and Group-By should pass Iterable[X]
holdenk Apr 9, 2014
b9e0c93
[SPARK-1434] [MLLIB] change labelParser from anonymous function to trait
mengxr Apr 9, 2014
fa0524f
Spark-939: allow user jars to take precedence over spark jars
holdenk Apr 9, 2014
9689b66
[SPARK-1390] Refactoring of matrices backed by RDDs
mengxr Apr 9, 2014
87bd1f9
SPARK-1093: Annotate developer and experimental API's
pwendell Apr 9, 2014
bde9cc1
[SPARK-1357] [MLLIB] Annotate developer and experimental APIs
mengxr Apr 9, 2014
eb5f2b6
SPARK-1407 drain event queue before stopping event logger
kanzhang Apr 9, 2014
0adc932
[SPARK-1357 (fix)] remove empty line after :: DeveloperApi/Experiment…
mengxr Apr 10, 2014
8ca3b2b
SPARK-729: Closures not always serialized at capture time
willb Apr 10, 2014
3e986f8
Merge remote-tracking branch 'apache/master' into streaming-web-ui
tdas Apr 10, 2014
168fe86
Merge pull request #2 from andrewor14/ui-refactor
tdas Apr 10, 2014
827e81a
Merge branch 'streaming-web-ui' of github.com:tdas/spark into streami…
tdas Apr 10, 2014
1af239b
Changed streaming UI to attach itself as a tab with the Spark UI.
tdas Apr 10, 2014
1c0bcef
Refactored streaming UI into two files.
tdas Apr 10, 2014
fa760fe
Fixed long line.
tdas Apr 10, 2014
e55cc4b
SPARK-1446: Spark examples should not do a System.exit
techaddict Apr 10, 2014
e6d4a74
Revert "SPARK-729: Closures not always serialized at capture time"
pwendell Apr 10, 2014
a74fbbb
Fix SPARK-1413: Parquet messes up stdout and stdin when used in Spark…
witgo Apr 10, 2014
79820fe
[SPARK-1276] Add a HistoryServer to render persisted UI
andrewor14 Apr 10, 2014
3bd3129
SPARK-1428: MLlib should convert non-float64 NumPy arrays to float64 …
techaddict Apr 10, 2014
ee6543f
Minor changes based on Andrew's comments.
tdas Apr 10, 2014
6de06b0
Merge remote-tracking branch 'apache/master' into streaming-web-ui
tdas Apr 10, 2014
548c98c
Wide refactoring of WebUI, UITab, and UIPage (see commit message)
andrewor14 Apr 11, 2014
914b8ff
Moved utils functions to UIUtils.
tdas Apr 11, 2014
585cd65
Merge pull request #5 from andrewor14/ui-refactor
tdas Apr 11, 2014
caa5e05
Merge branch 'streaming-web-ui' of github.com:tdas/spark into streami…
tdas Apr 11, 2014
f8e1053
Added Spark and Streaming UI unit tests.
tdas Apr 11, 2014
aa396d4
Rename tabs and pages (No more IndexPage.scala)
andrewor14 Apr 11, 2014
2fc09c8
Added binary check exclusions
tdas Apr 11, 2014
72fe256
Merge pull request #6 from andrewor14/ui-refactor
tdas Apr 11, 2014
89dae36
Merge branch 'streaming-web-ui' of github.com:tdas/spark into streami…
tdas Apr 11, 2014
90feb8d
Address Patrick's comments
andrewor14 Apr 11, 2014
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,6 +7,7 @@
sbt/*.jar
.settings
.cache
.mima-excludes
/build/
work/
out/
@@ -45,3 +46,5 @@ dist/
spark-*-bin.tar.gz
unit-tests.log
/lib/
rat-results.txt
scalastyle.txt
42 changes: 42 additions & 0 deletions .rat-excludes
@@ -0,0 +1,42 @@
target
.gitignore
.project
.classpath
.mima-excludes
.rat-excludes
.*md
derby.log
TAGS
RELEASE
control
docs
fairscheduler.xml.template
log4j.properties
log4j.properties.template
metrics.properties.template
slaves
spark-env.sh
spark-env.sh.template
log4j-defaults.properties
sorttable.js
.*txt
.*data
.*log
cloudpickle.py
join.py
SparkExprTyper.scala
SparkILoop.scala
SparkILoopInit.scala
SparkIMain.scala
SparkImports.scala
SparkJLineCompletion.scala
SparkJLineReader.scala
SparkMemberHandlers.scala
sbt
sbt-launch-lib.bash
plugins.sbt
work
.*\.q
golden
test.out/*
.*iml
32 changes: 32 additions & 0 deletions .travis.yml
@@ -0,0 +1,32 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

language: scala
scala:
- "2.10.3"
jdk:
- oraclejdk7
env:
matrix:
- TEST="scalastyle assembly/assembly"
- TEST="catalyst/test sql/test streaming/test mllib/test graphx/test bagel/test"
- TEST=hive/test
cache:
directories:
- $HOME/.m2
- $HOME/.ivy2
- $HOME/.sbt
script:
- "sbt ++$TRAVIS_SCALA_VERSION $TEST"
11 changes: 10 additions & 1 deletion NOTICE
@@ -1,5 +1,14 @@
Apache Spark
Copyright 2013 The Apache Software Foundation.
Copyright 2014 The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

In addition, this product includes:

- JUnit (http://www.junit.org) is a testing framework for Java. We included it
under the terms of the Eclipse Public License v1.0.

- JTransforms (https://sites.google.com/site/piotrwendykier/software/jtransforms)
provides fast transforms in Java. It is tri-licensed, and we included it under
the terms of the Mozilla Public License v1.1.
27 changes: 26 additions & 1 deletion assembly/pom.xml
@@ -79,6 +79,11 @@
<artifactId>spark-graphx_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>net.sf.py4j</groupId>
<artifactId>py4j</artifactId>
@@ -158,6 +163,26 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>hive</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>spark-ganglia-lgpl</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-ganglia-lgpl_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>bigtop-dist</id>
<!-- This profile uses the assembly plugin to create a special "dist" package for BigTop
@@ -193,7 +218,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>buildnumber-maven-plugin</artifactId>
<version>1.1</version>
<version>1.2</version>
<executions>
<execution>
<phase>validate</phase>
20 changes: 12 additions & 8 deletions bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala
@@ -220,27 +220,31 @@ object Bagel extends Logging {
*/
private def comp[K: Manifest, V <: Vertex, M <: Message[K], C](
sc: SparkContext,
grouped: RDD[(K, (Seq[C], Seq[V]))],
grouped: RDD[(K, (Iterable[C], Iterable[V]))],
compute: (V, Option[C]) => (V, Array[M]),
storageLevel: StorageLevel
): (RDD[(K, (V, Array[M]))], Int, Int) = {
var numMsgs = sc.accumulator(0)
var numActiveVerts = sc.accumulator(0)
val processed = grouped.flatMapValues {
case (_, vs) if vs.size == 0 => None
case (c, vs) =>
val processed = grouped.mapValues(x => (x._1.iterator, x._2.iterator))
.flatMapValues {
case (_, vs) if !vs.hasNext => None
case (c, vs) => {
val (newVert, newMsgs) =
compute(vs(0), c match {
case Seq(comb) => Some(comb)
case Seq() => None
})
compute(vs.next,
c.hasNext match {
case true => Some(c.next)
case false => None
}
)

numMsgs += newMsgs.size
if (newVert.active) {
numActiveVerts += 1
}

Some((newVert, newMsgs))
}
}.persist(storageLevel)

// Force evaluation of processed RDD for accurate performance measurements
39 changes: 31 additions & 8 deletions bin/compute-classpath.sh
@@ -25,35 +25,55 @@ SCALA_VERSION=2.10
# Figure out where Spark is installed
FWDIR="$(cd `dirname $0`/..; pwd)"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
. $FWDIR/bin/load-spark-env.sh

# Build up classpath
CLASSPATH="$SPARK_CLASSPATH:$FWDIR/conf"

ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"

# First check if we have a dependencies jar. If so, include binary classes with the deps jar
if [ -f "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar ]; then
if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
CLASSPATH="$CLASSPATH:$FWDIR/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/repl/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/mllib/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/bagel/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/graphx/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/streaming/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/tools/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"

DEPS_ASSEMBLY_JAR=`ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar`
DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar`
CLASSPATH="$CLASSPATH:$DEPS_ASSEMBLY_JAR"
else
# Else use spark-assembly jar from either RELEASE or assembly directory
if [ -f "$FWDIR/RELEASE" ]; then
ASSEMBLY_JAR=`ls "$FWDIR"/jars/spark-assembly*.jar`
ASSEMBLY_JAR=`ls "$FWDIR"/jars/spark*-assembly*.jar`
else
ASSEMBLY_JAR=`ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*.jar`
ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark*-assembly*hadoop*.jar`
fi
CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
fi

# When Hive support is needed, Datanucleus jars must be included on the classpath.
# Datanucleus jars do not work if only included in the uber jar as plugin.xml metadata is lost.
# Both sbt and maven will populate "lib_managed/jars/" with the datanucleus jars when Spark is
# built with Hive, so first check if the datanucleus jars exist, and then ensure the current Spark
# assembly is built for Hive, before actually populating the CLASSPATH with the jars.
# Note that this check order is faster (by up to half a second) in the case where Hive is not used.
num_datanucleus_jars=$(ls "$FWDIR"/lib_managed/jars/ 2>/dev/null | grep "datanucleus-.*\\.jar" | wc -l)
if [ $num_datanucleus_jars -gt 0 ]; then
AN_ASSEMBLY_JAR=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
num_hive_files=$(jar tvf "$AN_ASSEMBLY_JAR" org/apache/hadoop/hive/ql/exec 2>/dev/null | wc -l)
if [ $num_hive_files -gt 0 ]; then
echo "Spark assembly has been built with Hive, including Datanucleus jars on classpath" 1>&2
DATANUCLEUSJARS=$(echo "$FWDIR/lib_managed/jars"/datanucleus-*.jar | tr " " :)
CLASSPATH=$CLASSPATH:$DATANUCLEUSJARS
fi
fi

# Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1
if [[ $SPARK_TESTING == 1 ]]; then
CLASSPATH="$CLASSPATH:$FWDIR/core/target/scala-$SCALA_VERSION/test-classes"
@@ -62,6 +82,9 @@ if [[ $SPARK_TESTING == 1 ]]; then
CLASSPATH="$CLASSPATH:$FWDIR/bagel/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/graphx/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/streaming/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/test-classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/test-classes"
fi

# Add hadoop conf dir if given -- otherwise FileSystem.*, etc fail !
38 changes: 38 additions & 0 deletions bin/load-spark-env.sh
@@ -0,0 +1,38 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script loads spark-env.sh if it exists, and ensures it is only loaded once.
# spark-env.sh is loaded from SPARK_CONF_DIR if set, or within the current directory's
# conf/ subdirectory.

if [ -z "$SPARK_ENV_LOADED" ]; then
export SPARK_ENV_LOADED=1

# Returns the parent of the directory this script lives in.
parent_dir="$(cd `dirname $0`/..; pwd)"

use_conf_dir=${SPARK_CONF_DIR:-"$parent_dir/conf"}

if [ -f "${use_conf_dir}/spark-env.sh" ]; then
# Promote all variable declarations to environment (exported) variables
set -a
. "${use_conf_dir}/spark-env.sh"
set +a
fi
fi
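
The `set -a` / `set +a` pair is what does the promotion: while it is active, any plain shell assignment in spark-env.sh is automatically exported to child processes. A minimal sketch of the effect (variable values hypothetical):

set -a
SPARK_WORKER_CORES=4        # behaves as `export SPARK_WORKER_CORES=4`
set +a
SPARK_LOG_DIR=/tmp/logs     # a bare assignment again: not visible to children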
8 changes: 3 additions & 5 deletions bin/pyspark
@@ -36,10 +36,7 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
fi
fi

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
. $FWDIR/bin/load-spark-env.sh

# Figure out which Python executable to use
if [ -z "$PYSPARK_PYTHON" ] ; then
@@ -58,7 +55,8 @@ if [ -n "$IPYTHON_OPTS" ]; then
IPYTHON=1
fi

if [[ "$IPYTHON" = "1" ]] ; then
# Only use ipython if no command line arguments were provided [SPARK-1134]
if [[ "$IPYTHON" = "1" && $# = 0 ]] ; then
exec ipython $IPYTHON_OPTS
else
exec "$PYSPARK_PYTHON" "$@"
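
The added `$# = 0` condition restricts IPython to interactive use: with no arguments pyspark execs ipython, while any argument falls through to the plain Python interpreter, per SPARK-1134. A sketch of the resulting behavior (script name hypothetical):

# No arguments: guard passes, ipython starts an interactive shell.
IPYTHON=1 ./bin/pyspark
# One argument: guard fails, $PYSPARK_PYTHON runs the script instead.
IPYTHON=1 ./bin/pyspark my_job.py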
5 changes: 1 addition & 4 deletions bin/run-example
@@ -30,10 +30,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
. $FWDIR/bin/load-spark-env.sh

if [ -z "$1" ]; then
echo "Usage: run-example <example-class> [<args>]" >&2
18 changes: 8 additions & 10 deletions bin/spark-class
@@ -30,10 +30,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
. $FWDIR/conf/spark-env.sh
fi
. $FWDIR/bin/load-spark-env.sh

if [ -z "$1" ]; then
echo "Usage: spark-class <class> [<args>]" >&2
@@ -50,9 +47,9 @@ DEFAULT_MEM=${SPARK_MEM:-512m}

SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"

# Add java opts and memory settings for master, worker, executors, and repl.
# Add java opts and memory settings for master, worker, history server, executors, and repl.
case "$1" in
# Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
# Master, Worker, and HistoryServer use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
'org.apache.spark.deploy.master.Master')
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_MASTER_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
@@ -61,6 +58,10 @@ case "$1" in
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_WORKER_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
;;
'org.apache.spark.deploy.history.HistoryServer')
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_HISTORY_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
;;

# Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
'org.apache.spark.executor.CoarseGrainedExecutorBackend')
@@ -137,8 +138,7 @@ fi

# Compute classpath using external script
CLASSPATH=`$FWDIR/bin/compute-classpath.sh`

if [ "$1" == "org.apache.spark.tools.JavaAPICompletenessChecker" ]; then
if [[ "$1" =~ org.apache.spark.tools.* ]]; then
CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
fi

@@ -158,5 +158,3 @@ if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
fi

exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"


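The last hunk in spark-class replaces an exact string comparison with `=~`, so the tools jar is added for any class under org.apache.spark.tools. In bash, the unquoted right-hand side of `=~` is an unanchored extended regex, so the dots match any character. A small sketch of the match behavior:

# Matches: the tools jar would be appended to the classpath.
[[ "org.apache.spark.tools.JavaAPICompletenessChecker" =~ org.apache.spark.tools.* ]] && echo matched
# Does not match: an ordinary application class leaves the classpath alone.
[[ "org.apache.spark.examples.SparkPi" =~ org.apache.spark.tools.* ]] || echo "not matched"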