Conversation

nchammas
Contributor

@nchammas nchammas commented Dec 2, 2014

This is currently a work in progress to experiment with various options for parallelizing our Scala/Java tests using sbt.
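One option for this kind of experiment is sbt's `testGrouping` setting. It splits the test suites into groups, and each group can run in a separate forked JVM. The grouping step can be sketched on its own in plain Scala. The round-robin strategy, the `TestGroups` object, and the suite names below are illustrative assumptions, not what this PR does.

```scala
// Hypothetical helper: split test suite names into groups, one group per forked JVM.
object TestGroups {
  /** Assigns suites to `numGroups` buckets round-robin, preserving input order within each bucket.
    * Buckets that would be empty (fewer suites than groups) are simply not returned. */
  def roundRobin(suites: Seq[String], numGroups: Int): Seq[Seq[String]] = {
    require(numGroups > 0, "numGroups must be positive")
    suites.zipWithIndex
      .groupBy { case (_, i) => i % numGroups } // bucket index -> (suite, position) pairs
      .toSeq
      .sortBy(_._1)                             // stable bucket order: 0, 1, 2, ...
      .map { case (_, members) => members.map(_._1) }
  }
}
```

In a build definition, each inner `Seq` would become one `Tests.Group`. How its forked JVM is configured differs between sbt versions, so that part is left out here. Round-robin ignores suite durations, so a single slow suite can still dominate its group's wall-clock time.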

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24061 has finished for PR 3564 at commit c50ec06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24062 has finished for PR 3564 at commit 5032cfa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24063 has finished for PR 3564 at commit 9d39f22.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24066 timed out for PR 3564 at commit 0a099c1 after a configured wait of 120m.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24078 timed out for PR 3564 at commit 0709468 after a configured wait of 120m.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24145 has finished for PR 3564 at commit ef705a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24146 has finished for PR 3564 at commit bf1d46f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24147 has finished for PR 3564 at commit ab127b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

nchammas commented Dec 6, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 6, 2014

Test build #24203 has finished for PR 3564 at commit ab127b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

nchammas commented Dec 6, 2014

Hmm, this error from the latest test is interesting:

[info] - Read with RegexSerDe *** FAILED *** (2 seconds, 339 milliseconds)
[info]   Failed to generate golden answer for query:
[info]   Error: src/test/resources/golden/Read with RegexSerDe-0-9b96fab8d55a0e19fae00d8adb57ffaa (No such file or directory)
[info]   java.io.FileNotFoundException: src/test/resources/golden/Read with RegexSerDe-0-9b96fab8d55a0e19fae00d8adb57ffaa (No such file or directory)

When I comment out the line that groups the tests, everything runs fine.

Are the forked JVMs somehow not picking up paths or whatnot correctly?
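Here is one possible explanation, offered as an assumption rather than something confirmed in this thread. A relative path like `src/test/resources/golden/...` is resolved against the JVM's working directory (the `user.dir` system property). If the forked JVMs start in the repository root rather than in the Hive module's directory, the golden files would not be found. The hypothetical `GoldenPaths` helper below contrasts the two ways of resolving the path. The `/path/to/spark/sql/hive` directory is a placeholder.

```scala
import java.io.File

// Hypothetical helper contrasting two ways of resolving a relative test-resource path.
object GoldenPaths {
  val relativeGolden = "src/test/resources/golden"

  /** Resolves `relative` against an explicit base directory (e.g. the module's directory). */
  def resolveAgainst(baseDir: File, relative: String): File =
    new File(baseDir, relative)

  /** Where `new File(relative)` actually points: the JVM's current working directory. */
  def resolveAgainstWorkingDir(relative: String): File =
    new File(System.getProperty("user.dir"), relative)
}

// Example: the same relative path lands in different places depending on the base directory.
// GoldenPaths.resolveAgainst(new File("/path/to/spark/sql/hive"), GoldenPaths.relativeGolden)
// GoldenPaths.resolveAgainstWorkingDir(GoldenPaths.relativeGolden)
```

If this is the cause, there are two ways to stop depending on where the JVM was launched. You could give the forked JVMs the module's base directory as their working directory, or you could resolve golden files against an explicit base directory.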

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24271 has finished for PR 3564 at commit ac73262.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24289 has finished for PR 3564 at commit 2f24a84.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24300 timed out for PR 3564 at commit b583f81 after a configured wait of 120m.

@JoshRosen
Contributor

I tried running sbt/sbt core/test with this PR but noticed some weird output interleaving from multiple test suites:

[info] HashShuffleSuite:
[info] ThreadingSuite:
[info] ProactiveClosureSerializationSuite:
[info] ShuffleNettySuite:
[info] ExecutorAllocationManagerSuite:
[info] FailureSuite:
[info] FileSuite:
[info] SparkContextSuite:
[info] - accessing SparkContext form a different thread (9 seconds, 997 milliseconds)
[info] - throws expected serialization exceptions on actions (158 milliseconds)
[info] - mapPartitions transformations throw proactive serialization exceptions (29 milliseconds)
[info] - map transformations throw proactive serialization exceptions (57 milliseconds)
[info] - filter transformations throw proactive serialization exceptions (29 milliseconds)
[info] - flatMap transformations throw proactive serialization exceptions (58 milliseconds)
[info] - groupByKey without compression (10 seconds, 994 milliseconds)
[info] - mapPartitionsWithIndex transformations throw proactive serialization exceptions (165 milliseconds)
[info] - text files (9 seconds, 609 milliseconds)
[info] KryoSerializerDistributedSuite:
[info] - failure in a single-stage job (11 seconds, 604 milliseconds)
[info] - groupByKey without compression (12 seconds, 363 milliseconds)
[info] - accessing SparkContext form multiple threads (2 seconds, 19 milliseconds)
[info] - verify min/max executors (16 seconds, 63 milliseconds)
[info] - text files (compressed) (3 seconds, 731 milliseconds)
[info] - failure in a two-stage job (3 seconds, 84 milliseconds)
[info] - starting state (520 milliseconds)
[info] - Only one SparkContext may be active at a time (14 seconds, 482 milliseconds)
[info] - accessing multi-threaded SparkContext form multiple threads (2 seconds, 198 milliseconds)
[info] - SequenceFiles (724 milliseconds)
[info] - add executors (541 milliseconds)
[info] - Can still construct a new SparkContext after failing to construct a previous one (583 milliseconds)
[info] - parallel job execution (874 milliseconds)
[info] - failure in a map stage (1 second, 388 milliseconds)
[info] - SequenceFile (compressed) (772 milliseconds)
[info] - add executors capped by num pending tasks (846 milliseconds)
[info] - set local properties in different thread (822 milliseconds)
[info] - failure because task results are not serializable (1 second, 392 milliseconds)
[info] - SequenceFile with writable key (1 second, 143 milliseconds)
[info] - Check for multiple SparkContexts can be disabled via undocumented debug option (2 seconds, 587 milliseconds)
[info] - set and get local properties in parent-children thread (606 milliseconds)
[info] - remove executors (1 second, 202 milliseconds)
[info] - BytesWritable implicit conversion is correct (66 milliseconds)
[info] - failure because task closure is not serializable (2 seconds, 67 milliseconds)
[info] - interleaving add and remove (860 milliseconds)
[info] - SequenceFile with writable value (1 second, 489 milliseconds)
[info] - starting/canceling add timer (483 milliseconds)
[info] - SequenceFile with writable key and value (667 milliseconds)
[info] - starting/canceling remove timers (850 milliseconds)
[info] - implicit conversions in reading SequenceFiles (1 second, 191 milliseconds)
[...]

You also see this in Jenkins. This is the sort of output-interleaving problem that I mentioned on the JIRA. I wish there were a way to work around it, since it can make failures really hard to debug.
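One generic way to stop concurrently running suites from interleaving their console output is to buffer each suite's lines. The buffer is then printed as a single block when the suite finishes. The sketch below shows that technique in isolation, using `Future`s as stand-ins for parallel suites. `BufferedSuiteLog` is hypothetical; it is not an sbt or ScalaTest API.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper: each suite logs into a private buffer, and the buffer is
// printed as one contiguous block when that suite finishes.
object BufferedSuiteLog {
  private val printLock = new Object

  /** Runs one suite body with a logger that appends to a private buffer; returns the lines in order. */
  def runBuffered(suiteName: String)(body: (String => Unit) => Unit): Seq[String] = {
    val buffer = ArrayBuffer(s"[info] $suiteName:")
    body(line => buffer += s"[info] - $line")
    buffer.toList
  }

  /** Prints a finished suite's lines while holding a lock, so no other suite can print in between. */
  def flush(lines: Seq[String]): Unit = printLock.synchronized { lines.foreach(println) }

  /** Runs suites concurrently. Console output is grouped per suite (in completion order);
    * the returned lines are in the same order as the input suites. */
  def runAll(suites: Seq[(String, (String => Unit) => Unit)]): Seq[Seq[String]] = {
    val running = suites.map { case (name, body) =>
      Future {
        val lines = runBuffered(name)(body)
        flush(lines)
        lines
      }
    }
    Await.result(Future.sequence(running), 1.minute)
  }
}
```

The trade-off is that a suite's output appears only after the suite finishes, so a hung suite prints nothing until it is killed.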

@JoshRosen
Contributor

Actually, maybe the interleaving isn't a big deal as long as we can rely on the test report XML to be properly displayed in Jenkins.
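The XML reports help here because they are written per suite, so each suite's results stay together however the console output was interleaved. Below is a small sketch that summarizes one JUnit-style report with the JDK's DOM parser. It assumes the common `<testsuite>`/`<testcase>`/`<failure>` layout. The `JUnitReport` object and the example report are made up for illustration.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

/** Per-suite totals plus the names of the test cases that failed or errored. */
final case class SuiteSummary(name: String, tests: Int, failures: Int, errors: Int, failedCases: Seq[String])

// Hypothetical helper for reading one JUnit-style XML report whose root element is <testsuite>.
object JUnitReport {
  def summarize(xml: String): SuiteSummary = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val suite = doc.getDocumentElement
    // Missing numeric attributes are treated as 0.
    def intAttr(attr: String): Int = {
      val v = suite.getAttribute(attr)
      if (v.isEmpty) 0 else v.toInt
    }
    val cases = suite.getElementsByTagName("testcase")
    val failed = (0 until cases.getLength).map(i => cases.item(i)).collect {
      case e: Element if hasChild(e, "failure") || hasChild(e, "error") => e.getAttribute("name")
    }
    SuiteSummary(suite.getAttribute("name"), intAttr("tests"), intAttr("failures"), intAttr("errors"), failed)
  }

  private def hasChild(e: Element, tag: String): Boolean = e.getElementsByTagName(tag).getLength > 0
}
```

You would feed it the contents of each report file, such as the `TEST-*.xml` files that sbt commonly writes under `target/test-reports`. It then lists the failing cases for each suite.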

@nchammas
Contributor Author

Yeah, I'm ignoring the problem of interleaved output for now, since I'm hitting two more critical problems first: 1) the tests aren't running successfully, and 2) the tests aren't running faster than before.

That said, I'm slowly building a better mental model of how sbt works and how it can be configured. Unfortunately, there aren't many examples out there of how to do this stuff, so I often end up trawling through posts like this one and this one to see how others have parallelized their tests.

@SparkQA
Copy link

SparkQA commented Dec 12, 2014

Test build #24395 has finished for PR 3564 at commit f6bd3ed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 12, 2014

Test build #24398 has finished for PR 3564 at commit 9a22bd4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Dec 12, 2014

Test build #24400 has finished for PR 3564 at commit 815c377.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Dec 12, 2014

Test build #24401 has finished for PR 3564 at commit 815c377.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 12, 2014

Test build #24402 has finished for PR 3564 at commit 1c8081a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Dec 12, 2014

Test build #24404 has finished for PR 3564 at commit 1c8081a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas nchammas changed the title [SPARK-3431] [WIP] Parallelize test execution [SPARK-3431] [WIP] Parallelize Scala/Java test execution Dec 21, 2014
@nchammas
Contributor Author

Hmm, taking a second look at how the Python tests are invoked, I wonder if we can use nose. Maybe GNU parallel is the way to go.

@nchammas
Contributor Author

By the way, I opened a question on Stack Overflow about some kind of "show execution plan" feature in sbt. It would make understanding what sbt is doing easier as we refine the build configuration.

@SparkQA

SparkQA commented Dec 29, 2014

Test build #24852 has finished for PR 3564 at commit 00e2b93.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25115 has finished for PR 3564 at commit 00e2b93.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

We've been slowly making progress on the streaming test refactoring and it looks like we're down to only three failing tests on this branch:

 org.apache.spark.network.nio.ConnectionManagerSuite.security auth off
 org.apache.spark.streaming.CheckpointSuite.recovery with saveAsHadoopFiles operation
 org.apache.spark.streaming.CheckpointSuite.recovery with saveAsHadoopFile inside transform operation

I should be able to address the saveAsHadoopFile* ones in another followup PR. So, we're still a little far away from being able to merge this, but just wanted to give an update to say that we're making steady progress.

@nchammas
Contributor Author

nchammas commented Jan 7, 2015

Sounds good. Thanks for following up; I haven't had a chance to look over this PR in a while.

(Btw, I rebased for kicks. You can disregard the coming test run.)

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25136 has finished for PR 3564 at commit f697a55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 17, 2015

Test build #25706 has finished for PR 3564 at commit f697a55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25870 has finished for PR 3564 at commit f697a55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

Not sure what to make of this, @JoshRosen, but the most recent run has several more test failures than the 3 we thought we were down to.

For example, why would this DriverSuite test fail?

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 23, 2015

Test build #26005 has finished for PR 3564 at commit f697a55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 30, 2015

Test build #26430 has finished for PR 3564 at commit 5ef856d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor Author

nchammas commented Feb 8, 2015

Jenkinmensch, retest this please.

@SparkQA

SparkQA commented Feb 8, 2015

Test build #27031 timed out for PR 3564 at commit 5ef856d after a configured wait of 120m.

@SparkQA

SparkQA commented Feb 25, 2015

Test build #27961 has finished for PR 3564 at commit 5f7683b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pwendell
Contributor

pwendell commented Jun 4, 2015

I'd like to close this issue for now pending further development.

@nchammas nchammas closed this Jun 4, 2015
@nchammas nchammas deleted the parallel-test branch December 4, 2019 02:59

5 participants