Conversation

@witgo (Contributor) commented May 19, 2014

...t Hadoop 1

@AmplabJenkins

Can one of the admins verify this patch?

@witgo (Contributor, Author) commented May 19, 2014

@pwendell
Do you have time to review the code?

@rxin (Contributor) commented May 19, 2014

Tagging the right people: @pwendell, and @mateiz, who reported the problem initially.

@witgo (Contributor, Author) commented May 19, 2014

Oh, I'm sorry.

@aarondav (Contributor)

Jenkins, ok to test.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15075/

@mateiz (Contributor) commented May 19, 2014

Thanks @witgo for the quick turnaround -- will take a look at it. One thing that concerns me slightly now is that we always need to specify a Maven profile. Is there a problem if we add commons-lang in the Hadoop 2 profiles too? Do those break?

@mateiz (Contributor) commented May 19, 2014

Basically one specific problem here is that when we publish to Maven Central, and people link to our POM, they will get a dependency on Spark without any profiles activated. So if they test their app, they might run into this error. If that's not the case I'd at least like to understand why. Our build should work by default without any profiles for this reason.
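
For illustration only, here is roughly how a downstream user ends up in that situation; a hypothetical build.sbt sketch with the Spark 1.0.0 coordinates, where no Maven profiles are active and the app gets whatever the default POM declares:

```scala
// Hypothetical downstream build.sbt (sbt 0.13-era syntax). Depending on the
// published spark-core artifact resolves the default, profile-less POM.
name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
```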

@witgo (Contributor, Author) commented May 19, 2014

@mateiz
This problem only occurs in spark-assembly_2.10 with -Phive and -Dhadoop.version=1.x; it will not affect user testing.

```
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-assembly_2.10 ---
[INFO] org.apache.spark:spark-assembly_2.10:pom:1.0.1-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.0.1-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.0.4:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:compile
[INFO] |  |     +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |     +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |     +- commons-el:commons-el:jar:1.0:compile
[INFO] |  |     +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO] |  |     \- oro:oro:jar:2.0.8:compile
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1
.......
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-core_2.10 ---
[INFO] org.apache.spark:spark-core_2.10:jar:1.0.0-SNAPSHOT
[INFO] +- org.apache.hadoop:hadoop-client:jar:1.0.4:compile
[INFO] |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:compile
[INFO] |     +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |     +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |     +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |     |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |     |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |     |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |     |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |     |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |     +- commons-el:commons-el:jar:1.0:compile
....
```

@mateiz (Contributor) commented May 19, 2014

So if a user adds a dependency in their project on spark-core 1.0.0, spark-hive 1.0.0, and a hadoop-client that Hive supports, will things work? I'll try this too, but it's good to test locally. You need to do mvn -DskipTests install to publish the packages into your local Maven cache, then build a standalone Spark app against it.

We need to test both a dependency on spark-core by itself and one on spark-core with spark-hive.
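
A minimal sketch of the standalone test project described above, assuming the snapshot artifacts have already been installed locally with mvn -DskipTests install; the hadoop-client version here is an assumption, and the idea is to run it once with spark-core alone and once with spark-hive added:

```scala
// Hypothetical build.sbt for the local smoke test; assumes the Spark snapshot
// artifacts are already in the local Maven cache (~/.m2/repository).
name := "spark-smoke-test"

scalaVersion := "2.10.4"

resolvers += Resolver.mavenLocal // resolve the locally installed Spark artifacts

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.0.0-SNAPSHOT",
  // Second pass: uncomment to cover the spark-core + spark-hive combination.
  // "org.apache.spark" %% "spark-hive"   % "1.0.0-SNAPSHOT",
  // Assumption: a Hadoop 1.x client version that Hive supports.
  "org.apache.hadoop" %  "hadoop-client" % "1.2.1"
)
```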

@srowen (Member) commented May 19, 2014

I disagree with this change. At least please see comment at https://issues.apache.org/jira/browse/SPARK-1875 first.

@mateiz (Contributor) commented May 19, 2014

The fix suggested there does seem better, namely not excluding commons-lang.

@witgo (Contributor, Author) commented May 19, 2014

@mateiz
spark-examples_2.10 matches the situation you describe:

```
[INFO]
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-examples_2.10 ---
[INFO] org.apache.spark:spark-examples_2.10:jar:1.0.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.2.1:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.2.1:provided
[INFO] |  |     +- xmlenc:xmlenc:jar:0.52:provided
[INFO] |  |     +- org.apache.commons:commons-math:jar:2.1:provided
[INFO] |  |     +- hsqldb:hsqldb:jar:1.8.0.10:provided
[INFO] |  |     \- oro:oro:jar:2.0.8:provided
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:runtime

How about this solution?

@srowen (Member) commented May 19, 2014

@witgo This shows no dependency on commons-lang. Surely this is a problem just for the reason @mateiz found? Hadoop does depend on it from the client side.

@witgo (Contributor, Author) commented May 19, 2014

@srowen
I agree with what @mateiz said. We should not exclude commons-lang.

@witgo closed this May 20, 2014
@witgo deleted the SPARK-1875 branch May 20, 2014 02:49
cloud-fan pushed a commit that referenced this pull request Sep 17, 2025
…nProtoConverter

### What changes were proposed in this pull request?

This PR refactors the `LiteralExpressionProtoConverter` to use `CatalystTypeConverters` for consistent type conversion, eliminating code duplication and improving maintainability.

**Key changes:**
1. **Simplified `LiteralExpressionProtoConverter.toCatalystExpression()`**: Replaced the large switch statement (86 lines) with a clean 3-line implementation that leverages existing conversion utilities (a rough illustrative sketch follows after this list)
2. **Added TIME type support**: Added missing TIME literal type conversion in `LiteralValueProtoConverter.toScalaValue()`
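
As a rough, self-contained illustration of the utility that item 1 leans on (this is not the PR's actual code), `CatalystTypeConverters.createToCatalystConverter` turns an external Scala value, including nested case classes, into Catalyst's internal representation, which a `Literal` can then carry directly:

```scala
// Hypothetical sketch run on a Spark driver classpath; it only demonstrates the
// converter utility, not the real LiteralExpressionProtoConverter code.
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types._

case class CaseClass(a: Int, b: String)

val dataType = ArrayType(StructType(Seq(
  StructField("a", IntegerType, nullable = false),
  StructField("b", StringType))))

// The converter maps each case class to an InternalRow and the array to
// GenericArrayData, so the resulting value is valid for a Literal of this type.
val toCatalyst = CatalystTypeConverters.createToCatalystConverter(dataType)
val literal    = Literal(toCatalyst(Array(CaseClass(1, "a"))), dataType)
```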

### Why are the changes needed?

1. **Type conversion issues**: Some complex nested data structures (e.g., arrays of case classes) failed to convert to Catalyst's internal representation when using `expressions.Literal.create(...)`.
2. **Inconsistent behaviors**: Differences in behavior between Spark Connect and classic Spark for the same data types (e.g., Decimal).

### Does this PR introduce _any_ user-facing change?

**Yes** - Users can now successfully use `typedLit` with an array of case classes. Previously, attempting to use `typedlit(Array(CaseClass(1, "a")))` would fail (see the error in the snippet below), but it now works correctly and converts case classes to proper struct representations.

```scala
import org.apache.spark.sql.functions.typedlit
case class CaseClass(a: Int, b: String)
spark.sql("select 1").select(typedlit(Array(CaseClass(1, "a")))).collect()

// Below is the error message:
"""
org.apache.spark.SparkIllegalArgumentException: requirement failed: Literal must have a corresponding value to array<struct<a:int,b:string>>, but class GenericArrayData found.
  scala.Predef$.require(Predef.scala:337)
  org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:306)
  org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:456)
  org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:206)
  org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:103)
"""
```

In addition, some Catalyst values have changed (e.g., Decimal 89.97620 -> 89.976200000000000000). Please see the changes in `explain-results/` for details.
```scala
import org.apache.spark.sql.functions.typedlit

spark.sql("select 1").select(typedlit(BigDecimal(8997620, 5)),typedlit(Array(BigDecimal(8997620, 5), BigDecimal(8997621, 5)))).explain()
// Current explain() output:
"""
Project [89.97620 AS 89.97620#819, [89.97620,89.97621] AS ARRAY(89.97620BD, 89.97621BD)#820]
"""
// Expected explain() output (i.e., same as the classic mode):
"""
Project [89.976200000000000000 AS 89.976200000000000000#132, [89.976200000000000000,89.976210000000000000] AS ARRAY(89.976200000000000000BD, 89.976210000000000000BD)#133]
"""
```
### How was this patch tested?

`build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.4.5

Closes #52188 from heyihong/SPARK-53438.

Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>