You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-45678][CORE] Cover BufferReleasingInputStream.available/reset under tryOrFetchFailedException
### What changes were proposed in this pull request?
This patch proposes to wrap `BufferReleasingInputStream.available/reset` under `tryOrFetchFailedException`. So `IOException` during `available`/`reset` call will be rethrown as `FetchFailedException`.
### Why are the changes needed?
We have encountered shuffle data corruption issue:
```
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:504)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:543)
at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:450)
at org.xerial.snappy.SnappyInputStream.available(SnappyInputStream.java:497)
at org.apache.spark.storage.BufferReleasingInputStream.available(ShuffleBlockFetcherIterator.scala:1356)
```
Spark shuffle has capacity to detect corruption for a few stream op like `read` and `skip`, such `IOException` in the stack trace will be rethrown as `FetchFailedException` that will re-try the failed shuffle task. But in the stack trace it is `available` that is not covered by the mechanism. So no-retry has been happened and the Spark application just failed.
As the `available`/`reset` op will also involve data decompression and throw `IOException`, we should be able to check it like `read` and `skip` do.
### Does this PR introduce _any_ user-facing change?
Yes. Data corruption during `available`/`reset` op is now causing `FetchFailedException` like `read` and `skip` that can be retried instead of `IOException`.
### How was this patch tested?
Added test.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes#43543 from viirya/add_available.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
0 commit comments