
[BUG] RapidsShuffleManager didn't pass dirs to getBlockData from a wrapped ShuffleBlockResolver #2001

@abellina

Description


`ShuffleBlockResolver.getBlockData` has the following signature:

```scala
def getBlockData(blockId: BlockId, dirs: Option[Array[String]] = None): ManagedBuffer
```

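The `GpuShuffleBlockResolver.getBlockData` wrapper was not forwarding the optional `dirs` argument to the wrapped resolver, so the wrapped resolver always received the default `None`. Spark 3.1.x added host-local shuffle reads when the external shuffle service is disabled (apache/spark#28911). With that feature, a non-GPU shuffle (i.e., a regular Exchange) reading a block written by another executor on the same host looked for the peer's shuffle files under an invalid blockmgr directory, because without `dirs` the resolver falls back to the local executor's own directories.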

This manifests in the following exception:

```
java.nio.file.NoSuchFileException: spark-1ded244a-3067-4806-813b-289f63b661b7/executor-b0f161a5-ea01-4373-ba5f-a8966cb0b7ea/blockmgr-36b0f938-a1a0-4ef5-9a6e-b4f5658b0210/01/shuffle_4_407_0.index
```

The fix is to pass `dirs` through to the wrapped resolver's `getBlockData`.
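Because `dirs` has a default value, a wrapper that calls `wrapped.getBlockData(blockId)` still compiles and silently passes `None`, which is why this is easy to miss. The following is a minimal, hypothetical model of the bug and the fix, assuming simplified stand-in types: `BlockResolver`, `DiskResolver`, `BuggyWrapper` and `FixedWrapper` are illustrative names, not Spark or spark-rapids classes, and a "buffer" is just the resolved path string so the sketch runs without Spark.

```scala
// Hypothetical stand-in for Spark's ShuffleBlockResolver (not the real API).
// Block IDs and buffers are plain Strings so the sketch runs without Spark.
trait BlockResolver {
  // Same shape as ShuffleBlockResolver.getBlockData: `dirs` is optional and
  // defaults to None, so a caller can drop it without a compile error.
  def getBlockData(blockId: String, dirs: Option[Array[String]] = None): String
}

// Hypothetical disk resolver: uses caller-supplied dirs when present (e.g. a
// peer executor's blockmgr dirs on the same host), else its own local dirs.
class DiskResolver(localDirs: Array[String]) extends BlockResolver {
  override def getBlockData(blockId: String, dirs: Option[Array[String]]): String = {
    val root = dirs.getOrElse(localDirs).head
    s"$root/$blockId"
  }
}

// Buggy wrapper (hypothetical): compiles, but drops `dirs`, so the wrapped
// resolver always falls back to the local executor's directories.
class BuggyWrapper(wrapped: BlockResolver) extends BlockResolver {
  override def getBlockData(blockId: String, dirs: Option[Array[String]]): String =
    wrapped.getBlockData(blockId)
}

// Fixed wrapper (hypothetical): forwards `dirs` unchanged.
class FixedWrapper(wrapped: BlockResolver) extends BlockResolver {
  override def getBlockData(blockId: String, dirs: Option[Array[String]]): String =
    wrapped.getBlockData(blockId, dirs)
}
```

With a peer's directories supplied, `BuggyWrapper` resolves the block under the local (hypothetical) `/local/blockmgr-A` directory, while `FixedWrapper` resolves it under the peer's `/local/blockmgr-B`, which mirrors the `NoSuchFileException` above.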

An ugly workaround, when the RapidsShuffleManager is enabled, is to disable host-local disk reads:

```
--conf spark.shuffle.readHostLocalDisk=false
```


Labels: P0 (Must have for release), Spark 3.1+ (Bugs only related to Spark 3.1 or higher), bug (Something isn't working), shuffle (things that impact the shuffle plugin)
