Skip to content

Commit 271b338

Browse files
committed
[SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack
### What changes were proposed in this pull request? This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264. The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter). ### Why are the changes needed? There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904 ### Does this PR introduce _any_ user-facing change? Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command. ### How was this patch tested? Manually tested in local, and existing test cases should cover. Closes #35946 from HyukjinKwon/SPARK-38631. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]> (cherry picked from commit 057c051) Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent c0d7a45 commit 271b338

File tree

1 file changed

+28
-2
lines changed

1 file changed

+28
-2
lines changed

core/src/main/scala/org/apache/spark/util/Utils.scala

Lines changed: 28 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -592,16 +592,42 @@ private[spark] object Utils extends Logging {
592592
if (lowerSrc.endsWith(".jar")) {
593593
RunJar.unJar(source, dest, RunJar.MATCH_ANY)
594594
} else if (lowerSrc.endsWith(".zip")) {
595+
// TODO(SPARK-37677): should keep file permissions. Java implementation doesn't.
595596
FileUtil.unZip(source, dest)
596-
} else if (
597-
lowerSrc.endsWith(".tar.gz") || lowerSrc.endsWith(".tgz") || lowerSrc.endsWith(".tar")) {
597+
} else if (lowerSrc.endsWith(".tar.gz") || lowerSrc.endsWith(".tgz")) {
598598
FileUtil.unTar(source, dest)
599+
} else if (lowerSrc.endsWith(".tar")) {
600+
// TODO(SPARK-38632): should keep file permissions. Java implementation doesn't.
601+
unTarUsingJava(source, dest)
599602
} else {
600603
logWarning(s"Cannot unpack $source, just copying it to $dest.")
601604
copyRecursive(source, dest)
602605
}
603606
}
604607

608+
/**
609+
* The method below was copied from `FileUtil.unTar` but uses Java-based implementation
610+
* to work around a security issue, see also SPARK-38631.
611+
*/
612+
private def unTarUsingJava(source: File, dest: File): Unit = {
613+
if (!dest.mkdirs && !dest.isDirectory) {
614+
throw new IOException(s"Mkdirs failed to create $dest")
615+
} else {
616+
try {
617+
// Should not fail because all Hadoop 2.1+ (from HADOOP-9264)
618+
// have 'unTarUsingJava'.
619+
val mth = classOf[FileUtil].getDeclaredMethod(
620+
"unTarUsingJava", classOf[File], classOf[File], classOf[Boolean])
621+
mth.setAccessible(true)
622+
mth.invoke(null, source, dest, java.lang.Boolean.FALSE)
623+
} catch {
624+
// Re-throw the original exception.
625+
case e: java.lang.reflect.InvocationTargetException if e.getCause != null =>
626+
throw e.getCause
627+
}
628+
}
629+
}
630+
605631
/** Records the duration of running `body`. */
606632
def timeTakenMs[T](body: => T): (T, Long) = {
607633
val startTime = System.nanoTime()

0 commit comments

Comments
 (0)