Closed
Labels
P0 (must have for release), Spark 3.1+ (bugs only related to Spark 3.1 or higher), bug (something isn't working), shuffle (things that impact the shuffle plugin)
Description
ShuffleBlockResolver.getBlockData has the following signature:
def getBlockData(blockId: BlockId, dirs: Option[Array[String]] = None): ManagedBuffer
The GpuShuffleBlockResolver.getBlockData wrapper was not passing the optional dirs argument to the underlying resolver. Once Spark 3.1.x made local shuffle reads available with the shuffle service disabled (apache/spark#28911), this bug caused a non-GPU shuffle (that is, a regular Exchange) that wanted a local block to look in an invalid directory for the peer's blockmgr path.
This manifests in the following exception:
java.nio.file.NoSuchFileException: spark-1ded244a-3067-4806-813b-289f63b661b7/executor-b0f161a5-ea01-4373-ba5f-a8966cb0b7ea/blockmgr-36b0f938-a1a0-4ef5-9a6e-b4f5658b0210/01/shuffle_4_407_0.index
The fix is to pass dirs to the underlying getBlockData.
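The difference between the buggy and fixed wrappers can be sketched as follows. This is a minimal, self-contained illustration and not the plugin's actual code: BlockId, ManagedBuffer, and the ShuffleBlockResolver trait are simplified stand-ins for the Spark types, and WrappedResolver, BuggyGpuResolver, FixedGpuResolver, and the directory paths are hypothetical names chosen for the example. Only the getBlockData signature is taken from the description above.

```scala
// Simplified stand-ins for the Spark types (assumptions, not the real classes).
final case class BlockId(name: String)
final case class ManagedBuffer(path: String)

trait ShuffleBlockResolver {
  // Signature as quoted in the issue description.
  def getBlockData(blockId: BlockId, dirs: Option[Array[String]] = None): ManagedBuffer
}

// Hypothetical underlying resolver: uses the caller-supplied dirs when present,
// otherwise falls back to this executor's own directory.
class WrappedResolver(ownDir: String) extends ShuffleBlockResolver {
  override def getBlockData(blockId: BlockId, dirs: Option[Array[String]]): ManagedBuffer = {
    val base = dirs.flatMap(_.headOption).getOrElse(ownDir)
    ManagedBuffer(s"$base/${blockId.name}.index")
  }
}

// Buggy wrapper: drops dirs, so the underlying resolver never sees the peer's directories.
class BuggyGpuResolver(wrapped: ShuffleBlockResolver) extends ShuffleBlockResolver {
  override def getBlockData(blockId: BlockId, dirs: Option[Array[String]]): ManagedBuffer =
    wrapped.getBlockData(blockId)
}

// Fixed wrapper: forwards dirs unchanged.
class FixedGpuResolver(wrapped: ShuffleBlockResolver) extends ShuffleBlockResolver {
  override def getBlockData(blockId: BlockId, dirs: Option[Array[String]]): ManagedBuffer =
    wrapped.getBlockData(blockId, dirs)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val wrapped  = new WrappedResolver("/local/blockmgr-a")   // hypothetical path
    val peerDirs = Some(Array("/peer/blockmgr-b"))            // hypothetical path
    val block    = BlockId("shuffle_4_407_0")
    println(new BuggyGpuResolver(wrapped).getBlockData(block, peerDirs).path)
    println(new FixedGpuResolver(wrapped).getBlockData(block, peerDirs).path)
  }
}
```

In this sketch the buggy wrapper resolves the block against the local directory even though the caller supplied the peer's directories, which mirrors the kind of wrong-path lookup that produces the NoSuchFileException above.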
Until the fix is in place, an ugly workaround when the RapidsShuffleManager is enabled is to disable host-local disk reads:
--conf spark.shuffle.readHostLocalDisk=false