[WIP][SPARK-1875]: NoClassDefFoundError: StringUtils when building against Hadoop 1 #820
Conversation
Can one of the admins verify this patch?

@pwendell

Oh, I'm sorry.

Jenkins, ok to test.

Merged build triggered.

Merged build started.

Merged build finished. All automated tests passed.

All automated tests passed.

Thanks @witgo for the quick turnaround -- will take a look at it. One thing that concerns me slightly now is that we always need to specify a Maven profile. Is there a problem if we add commons-lang in the Hadoop 2 profiles too? Do those break?

Basically one specific problem here is that when we publish to Maven Central and people link to our POM, they will get a dependency on Spark without any profiles activated. So if they test their app, they might run into this error. If that's not the case, I'd at least like to understand why. Our build should work by default without any profiles for this reason.

@mateiz
So if a user adds a dependency in their project on spark-core 1.0.0, spark-hive 1.0.0 and a hadoop-client that Hive supports, will things work? I'll try this too but it's good to test locally. You need to do mvn -DskipTests install to publish the packages into your local Maven cache, then build a standalone Spark app against it. We need to test both a dependency on spark-core by itself and one on spark-core with spark-hive.
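For illustration, a minimal local smoke test along the lines described above could look like the sketch below. The sbt coordinates, the `SparkSmokeTest` name, and the trivial job are assumptions made for the example, not part of this PR.
```scala
// Hypothetical standalone app used only to check that spark-core (and, in a
// second run, spark-hive) resolved from the local Maven cache pulls in its
// transitive dependencies (e.g. commons-lang's StringUtils) at runtime.
//
// Assumed build.sbt settings (illustrative versions):
//   resolvers += Resolver.mavenLocal
//   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
//   // and, for the second test: "org.apache.spark" %% "spark-hive" % "1.0.0"
import org.apache.spark.{SparkConf, SparkContext}

object SparkSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("smoke-test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Run a trivial job; a missing transitive dependency would surface as a
    // NoClassDefFoundError at this point.
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}
```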
I disagree with this change. At least please see comment at https://issues.apache.org/jira/browse/SPARK-1875 first.

The fix suggested there does seem better, namely don't exclude commons-lang.

@mateiz How about this solution?
…nProtoConverter
### What changes were proposed in this pull request?
This PR refactors the `LiteralExpressionProtoConverter` to use `CatalystTypeConverters` for consistent type conversion, eliminating code duplication and improving maintainability.
**Key changes:**
1. **Simplified `LiteralExpressionProtoConverter.toCatalystExpression()`**: Replaced the large switch statement (86 lines) with a clean 3-line implementation that leverages existing conversion utilities (a rough sketch follows this list)
2. **Added TIME type support**: Added missing TIME literal type conversion in `LiteralValueProtoConverter.toScalaValue()`
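As a rough illustration of the direction in item 1 (not the PR's actual code), the conversion can lean on `CatalystTypeConverters` once the proto literal has been decoded into a Scala value and its `DataType`; the `toCatalystLiteral` helper below is hypothetical.
```scala
// Minimal sketch, assuming the proto literal has already been decoded into an
// external Scala value (e.g. via LiteralValueProtoConverter.toScalaValue) and
// its Catalyst DataType.
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.DataType

def toCatalystLiteral(scalaValue: Any, dataType: DataType): Literal = {
  // createToCatalystConverter is type-driven, so nested structures such as
  // arrays of case classes, maps, and Decimal values are converted to
  // Catalyst's internal representation uniformly -- this is what replaces
  // the old per-type switch.
  val toCatalyst = CatalystTypeConverters.createToCatalystConverter(dataType)
  Literal(toCatalyst(scalaValue), dataType)
}
```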
### Why are the changes needed?
1. **Type conversion issues**: Some complex nested data structures (e.g., arrays of case classes) failed to convert to Catalyst's internal representation when using `expressions.Literal.create(...)`.
2. **Inconsistent behaviors**: Differences in behavior between Spark Connect and classic Spark for the same data types (e.g., Decimal).
### Does this PR introduce _any_ user-facing change?
**Yes** - Users can now successfully use `typedlit` with an array of case classes. Previously, attempting to use `typedlit(Array(CaseClass(1, "a")))` would fail (see the code snippet below for details), but now it works correctly and converts case classes to proper struct representations.
```scala
import org.apache.spark.sql.functions.typedlit
case class CaseClass(a: Int, b: String)
spark.sql("select 1").select(typedlit(Array(CaseClass(1, "a")))).collect()
// Below is the error message:
"""
org.apache.spark.SparkIllegalArgumentException: requirement failed: Literal must have a corresponding value to array<struct<a:int,b:string>>, but class GenericArrayData found.
scala.Predef$.require(Predef.scala:337)
org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:306)
org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:456)
org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:206)
org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:103)
"""
```
In addition, some Catalyst values (e.g., Decimal 89.97620 -> 89.976200000000000000) have changed. Please see the changes in `explain-results/` for details.
```scala
import org.apache.spark.sql.functions.typedlit
spark.sql("select 1").select(typedlit(BigDecimal(8997620, 5)),typedlit(Array(BigDecimal(8997620, 5), BigDecimal(8997621, 5)))).explain()
// Current explain() output:
"""
Project [89.97620 AS 89.97620#819, [89.97620,89.97621] AS ARRAY(89.97620BD, 89.97621BD)#820]
"""
// Expected explain() output (i.e., same as the classic mode):
"""
Project [89.976200000000000000 AS 89.976200000000000000#132, [89.976200000000000000,89.976210000000000000] AS ARRAY(89.976200000000000000BD, 89.976210000000000000BD)#133]
"""
```
### How was this patch tested?
`build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.4.5
Closes #52188 from heyihong/SPARK-53438.
Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>