
Conversation

@wangyum (Member) commented Jun 30, 2017

What changes were proposed in this pull request?

This can be reproduced on a Spark cluster, but not in local mode:

  1. Start a Spark context with spark.reducer.maxReqSizeShuffleToMem=1K and spark.serializer=org.apache.spark.serializer.KryoSerializer:
$ spark-shell --conf spark.reducer.maxReqSizeShuffleToMem=1K --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
  2. Run a shuffle:
scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()
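
For reference, this is a programmatic equivalent of the spark-shell repro above (a sketch only, not part of this PR; the app name is arbitrary and the job still has to be submitted to a cluster, since local mode does not reproduce the failure):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-repro")  // arbitrary name, for illustration only
  .set("spark.reducer.maxReqSizeShuffleToMem", "1K")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// The same shuffle as above: 3,000,000 elements repartitioned into 2001 partitions.
sc.parallelize(0 until 3000000, 10).repartition(2001).count()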

The error messages:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
        at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:808)
        at org.apache.spark.MapOutputTracker$$anonfun$convertMapStatuses$2.apply(MapOutputTracker.scala:804)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at org.apache.spark.MapOutputTracker$.convertMapStatuses(MapOutputTracker.scala:804)
        at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:618)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1802)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1159)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1159)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2065)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2065)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:341)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

)
...

This PR fixes this issue.

How was this patch tested?

Unit tests will be added later.

@SparkQA commented Jun 30, 2017

Test build #78987 has finished for PR 18490 at commit 1c837b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing (Member) commented Jun 30, 2017

I don't get it. Could you point out where MapStatus is serialized using Kryo? NVM

@zsxwing (Member) commented Jun 30, 2017

I cannot reproduce this issue. Could you provide a unit test to reproduce this?

Anyway, I suggest using kryo.register(classOf[HighlyCompressedMapStatus], new KryoJavaSerializer()) to force Kryo to use Java serialization, so that we don't need to worry about whether the Kryo default serializer works or not.
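
For context, here is a minimal sketch of the registration being suggested, assuming it lives inside Spark's own KryoSerializer#newKryo (HighlyCompressedMapStatus is private[spark], so this only compiles from within Spark itself):

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.{JavaSerializer => KryoJavaSerializer}
import org.apache.spark.scheduler.HighlyCompressedMapStatus

val kryo = new Kryo()  // in Spark, the instance built by KryoSerializer#newKryo
// Fall back to plain Java serialization for HighlyCompressedMapStatus so the
// Kryo default (field) serializer never has to handle its internal state.
kryo.register(classOf[HighlyCompressedMapStatus], new KryoJavaSerializer())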

@asfgit closed this in 3a45c7f on Aug 5, 2017
@wangyum deleted the missing-output-location branch on September 11, 2017, 22:44
