[SPARK-49533][CORE][TESTS] Change default `ivySettings` in the `IvyTestUtils#withRepository` function to use `.ivy2.5.2` as the Default Ivy User Dir
### What changes were proposed in this pull request?
This pull request changes the default value of the `ivySettings` parameter of the `IvyTestUtils#withRepository` function. When the default `IvySettings` object is constructed, its `DefaultIvyUserDir` and `DefaultCache` are now adjusted through an additional call to `MavenUtils.processIvyPathArg` (a sketch follows this list):
1. `DefaultIvyUserDir` is set to `${user.home}/.ivy2.5.2`.
2. `DefaultCache` is set to the `cache` directory under the modified Ivy user dir, i.e. `${user.home}/.ivy2.5.2/cache`, replacing the previous default of `${user.home}/.ivy2/cache`.
These changes address a bad case in the testing process.
Additionally, so that `IvyTestUtils` can invoke `MavenUtils.processIvyPathArg`, the visibility of `processIvyPathArg` has been relaxed from `private` to `private[util]`.
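As a rough sketch of what the new default construction might look like, based on the description above (the helper name `defaultIvySettings` and the exact `processIvyPathArg` signature are assumptions inferred from the text, not copied from the diff; the code would need to live in the `org.apache.spark.util` package to see the `private[util]` member):
```scala
import org.apache.ivy.core.settings.IvySettings

// Hypothetical helper illustrating the new default value of the
// `ivySettings` parameter in IvyTestUtils#withRepository.
private def defaultIvySettings(): IvySettings = {
  val settings = new IvySettings
  // Callable from test code now that its visibility is `private[util]`.
  // With no explicit ivy path, this points DefaultIvyUserDir at
  // ${user.home}/.ivy2.5.2 and DefaultCache at ${user.home}/.ivy2.5.2/cache.
  MavenUtils.processIvyPathArg(ivySettings = settings, ivyPath = None)
  settings
}
```
With this in place, `withRepository` and `purgeLocalIvyCache` agree on the same cache directory.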
### Why are the changes needed?
To fix a bad case in testing; the reproduction steps are as follows:
1. Clean up the files and directories related to `mylib-0.1.jar` under `~/.ivy2.5.2`.
2. Execute the following tests using Java 21:
```
java -version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu21.36+17-CA (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Zulu21.36+17-CA (build 21.0.4+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```
```
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/ added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/Users/yangjie01/Library/Caches/Coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/apache/ivy/ivy/2.5.2/ivy-2.5.2.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/yangjie01/.ivy2.5.2/cache
The jars for the packages stored in: /Users/yangjie01/.ivy2.5.2/jars
my.great.lib#mylib added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4;1.0
confs: [default]
found my.great.lib#mylib;0.1 in repo-1
downloading file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-2a9107ea-4e09-4dfe-a270-921d799837fb/my/great/lib/mylib/0.1/mylib-0.1.jar ...
[SUCCESSFUL ] my.great.lib#mylib;0.1!mylib.jar (1ms)
:: resolution report :: resolve 4325ms :: artifacts dl 2ms
:: modules in use:
my.great.lib#mylib;0.1 from repo-1 in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 1 | 1 | 0 || 1 | 1 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-5827ff8a-7a85-4598-8ced-e949457752e4
confs: [default]
1 artifacts copied, 0 already retrieved (0kB/6ms)
Deleting /Users/yangjie01/.ivy2/cache/my.great.lib, exists: false
[info] - External JAR (6 seconds, 288 milliseconds)
...
[info] Run completed in 40 seconds, 441 milliseconds.
[info] Total number of tests run: 26
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 26, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
3. Re-execute the above tests using Java 17:
```
java -version
openjdk version "17.0.12" 2024-07-16 LTS
OpenJDK Runtime Environment Zulu17.52+17-CA (build 17.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.52+17-CA (build 17.0.12+7-LTS, mixed mode, sharing)
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.application.ReplE2ESuite" -Phive
```
```
[info] - External JAR *** FAILED *** (1 second, 626 milliseconds)
[info] isContain was false Ammonite output did not contain 'Array[Int] = Array(1, 2, 3, 4, 5)':
[info] scala>
[info] scala> // this import will fail
[info] scala> import my.great.lib.MyLib
[info] scala>
[info] scala> // making library available in the REPL to compile UDF
[info] scala> import coursierapi.{Credentials, MavenRepository}
import coursierapi.{Credentials, MavenRepository}
[info]
[info] scala> interp.repositories() ++= Seq(MavenRepository.of("file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/"))
[info]
[info] scala> import $ivy.`my.great.lib:mylib:0.1`
import $ivy.$
[info]
[info] scala>
[info] scala> val func = udf((a: Int) => {
[info] import my.great.lib.MyLib
[info] MyLib.myFunc(a)
[info] })
func: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction(
[info] f = ammonite.$sess.cmd28$Helper$$Lambda$3059/0x0000000801da4218721b2487,
[info] dataType = IntegerType,
[info] inputEncoders = ArraySeq(Some(value = PrimitiveIntEncoder)),
[info] outputEncoder = Some(value = BoxedIntEncoder),
[info] givenName = None,
[info] nullable = true,
[info] deterministic = true
[info] )
[info]
[info] scala>
[info] scala> // add library to the Executor
[info] scala> spark.addArtifact("ivy://my.great.lib:mylib:0.1?repos=file:/Users/yangjie01/SourceCode/git/spark-sbt/target/tmp/spark-6e6bc234-758f-44f1-a8b3-fbb79ed74647/")
[info]
[info] scala>
[info] scala> spark.range(5).select(func(col("id"))).as[Int].collect()
[info] scala>
[info] scala> semaphore.release()
[info] Error Output: Compiling (synthetic)/ammonite/predef/ArgsPredef.sc
[info] Compiling /Users/yangjie01/SourceCode/git/spark-sbt/connector/connect/client/jvm/(console)
[info] cmd25.sc:1: not found: value my
[info] import my.great.lib.MyLib
[info] ^
[info] Compilation Failed
[info] org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] User defined function (` (cmd28$Helper$$Lambda$3054/0x0000007002189800)`: (int) => int) failed due to: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0. SQLSTATE: 39000
[info] org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:195)
[info] org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
[info] org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:114)
[info] org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info] org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info] scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info] scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info] scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info] scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info] scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info] scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info] scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info] org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info] org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] org.apache.spark.scheduler.Task.run(Task.scala:146)
[info] org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.lang.Thread.run(Thread.java:840)
[info] org.apache.spark.SparkException: java.lang.UnsupportedClassVersionError: my/great/lib/MyLib has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0
[info] java.lang.ClassLoader.defineClass1(Native Method)
[info] java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
[info] java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
[info] java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
[info] java.net.URLClassLoader$1.run(URLClassLoader.java:427)
[info] java.net.URLClassLoader$1.run(URLClassLoader.java:421)
[info] java.security.AccessController.doPrivileged(AccessController.java:712)
[info] java.net.URLClassLoader.findClass(URLClassLoader.java:420)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info] org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:55)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:579)
[info] org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info] org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:592)
[info] java.lang.ClassLoader.loadClass(ClassLoader.java:525)
[info] ammonite.$sess.cmd28$Helper.$anonfun$func$1(cmd28.sc:3)
[info] ammonite.$sess.cmd28$Helper.$anonfun$func$1$adapted(cmd28.sc:1)
[info] org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:112)
[info] org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
[info] org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchIterator.hasNext(ArrowConverters.scala:100)
[info] scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
[info] scala.collection.mutable.Growable.addAll(Growable.scala:61)
[info] scala.collection.mutable.Growable.addAll$(Growable.scala:57)
[info] scala.collection.mutable.ArrayBuilder.addAll(ArrayBuilder.scala:75)
[info] scala.collection.IterableOnceOps.toArray(IterableOnce.scala:1505)
[info] scala.collection.IterableOnceOps.toArray$(IterableOnce.scala:1498)
[info] scala.collection.AbstractIterator.toArray(Iterator.scala:1303)
[info] org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.$anonfun$processAsArrowBatches$5(SparkConnectPlanExecution.scala:183)
[info] org.apache.spark.SparkContext.$anonfun$submitJob$1(SparkContext.scala:2608)
[info] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info] org.apache.spark.scheduler.Task.run(Task.scala:146)
[info] org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info] org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info] org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)
[info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.lang.Thread.run(Thread.java:840) (ReplE2ESuite.scala:117)
```
I suspect the causes of the aforementioned bad case are as follows:
1. Following #45075, to address compatibility issues, Spark 4.0 adopted `~/.ivy2.5.2` as the default Ivy user directory. When the tests are executed with Java 21, the compiled `mylib-0.1.jar` is published to `~/.ivy2.5.2/cache/my.great.lib/mylib/jars`.
2. However, the `getDefaultCache` method of the default `IvySettings` instance still returns `~/.ivy2/cache`. Consequently, when the `purgeLocalIvyCache` function is called within `withRepository`, it attempts to clean the `artifact` and `deps` directories under `~/.ivy2/cache` and therefore fails to remove the Java 21-compiled `mylib-0.1.jar` at `~/.ivy2.5.2/cache/my.great.lib/mylib/jars`. When the tests are subsequently executed with Java 17 and attempt to load that jar, they fail with the `UnsupportedClassVersionError` shown above. (A sketch of this path mismatch follows the permalinks below.)
https://github.com/apache/spark/blob/9269a0bfed56429e999269dfdfd89aefcb1b7261/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala#L361-L371
https://github.com/apache/spark/blob/9269a0bfed56429e999269dfdfd89aefcb1b7261/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala#L392-L403
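To make the mismatch concrete, here is a minimal standalone sketch (illustrative only, not the test code itself; it uses only standard `org.apache.ivy.core.settings.IvySettings` methods, and the default paths assume `ivy.home` is not set):
```scala
import java.io.File
import org.apache.ivy.core.settings.IvySettings

object CacheMismatchSketch {
  def main(args: Array[String]): Unit = {
    val userHome = System.getProperty("user.home")

    // A freshly constructed IvySettings typically defaults to ~/.ivy2/cache,
    // which is the directory purgeLocalIvyCache ended up scanning ...
    val stale = new IvySettings
    println(stale.getDefaultCache) // e.g. /Users/<user>/.ivy2/cache

    // ... while the test jars were actually resolved into ~/.ivy2.5.2/cache,
    // the directory Spark 4.0 uses as its default Ivy user dir.
    val fixed = new IvySettings
    fixed.setDefaultIvyUserDir(new File(userHome, ".ivy2.5.2"))
    fixed.setDefaultCache(new File(fixed.getDefaultIvyUserDir, "cache"))
    println(fixed.getDefaultCache) // e.g. /Users/<user>/.ivy2.5.2/cache
  }
}
```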
To address this issue, the pull request modifies the default configuration of the `IvySettings` instance so that `purgeLocalIvyCache` can properly clean up the corresponding cache files under `~/.ivy2.5.2/cache`, which resolves the problem above.
### Does this PR introduce _any_ user-facing change?
No, this change is test-only.
### How was this patch tested?
1. Pass GitHub Actions.
2. Manually re-ran the tests described above: they now succeed, and the `~/.ivy2.5.2/cache/my.great.lib` directory is confirmed to be cleaned up promptly.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48006 from LuciferYang/IvyTestUtils-withRepository.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
File tree: 2 files changed (+14, -2)
- common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala (+1, -1)
- common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala (+13, -1)