
Conversation

CodingCat
Contributor

No description provided.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@CodingCat
Contributor Author

Hi @pwendell and @aarondav, is this good to go?

@pwendell
Contributor

LGTM @aarondav?

@aarondav
Contributor

LGTM, merged into master. Thanks!!

asfgit closed this in 6bd2eaa on Mar 13, 2014
@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13146/

CodingCat deleted the SPARK-1160-2 branch on March 17, 2014 at 17:21
jhartlaub referenced this pull request in jhartlaub/spark May 27, 2014
update default github

(cherry picked from commit 41ead7a)
Signed-off-by: Reynold Xin <[email protected]>
ericl pushed a commit to ericl/spark that referenced this pull request Dec 21, 2016
## What changes were proposed in this pull request?
Currently streaming tests fail often because multiple SparkContexts are created in a single VM. This is a problem in both DB Spark and Apache Spark. This PR fixes it by making sure the previous SparkContext is always destroyed before a new one is created, and preferably by reusing the existing SparkContext.
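
A minimal sketch of that pattern (illustrative helper names using plain Spark APIs; not the actual refactor in this PR):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Preferred: reuse the active SparkContext if one exists, otherwise create one.
def sharedContext(conf: SparkConf): SparkContext =
  SparkContext.getOrCreate(conf)

// When a test really needs its own context, stop it when the test is done so
// the next test never sees two live SparkContexts in the same JVM.
def withFreshContext[T](conf: SparkConf)(body: SparkContext => T): T = {
  val sc = new SparkContext(conf)
  try body(sc) finally sc.stop()
}
```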

I went kind of overboard on the refactor.

## How was this patch tested?
These are only tests.

Author: Herman van Hovell <[email protected]>

Closes apache#133 from hvanhovell/fix_streaming_tests.
jlopezmalla pushed a commit to jlopezmalla/spark that referenced this pull request Jan 30, 2018
* removed mesos security

* Removed mesos security from HS

* added clear message
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
* table fix

* another effort to fix the table (a successful one)
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Some refactor for cloud-provider-openstack-acceptance-test-e2e-conformance
cloud-fan pushed a commit that referenced this pull request Jul 27, 2020
### What changes were proposed in this pull request?

Currently `BroadcastHashJoinExec` and `ShuffledHashJoinExec` do not preserve their children's output ordering information (they inherit from `SparkPlan.outputOrdering`, which is Nil). This can add unnecessary sorts in complex queries involving multiple joins.

Example:

```
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50") {
  val df1 = spark.range(100).select($"id".as("k1"))
  val df2 = spark.range(100).select($"id".as("k2"))
  val df3 = spark.range(3).select($"id".as("k3"))
  val df4 = spark.range(100).select($"id".as("k4"))
  val plan = df1.join(df2, $"k1" === $"k2")
    .join(df3, $"k1" === $"k3")
    .join(df4, $"k1" === $"k4")
    .queryExecution
    .executedPlan
}
```

Current physical plan (extra sort on `k1` before top sort merge join):

```
*(9) SortMergeJoin [k1#220L], [k4#232L], Inner
:- *(6) Sort [k1#220L ASC NULLS FIRST], false, 0
:  +- *(6) BroadcastHashJoin [k1#220L], [k3#228L], Inner, BuildRight
:     :- *(6) SortMergeJoin [k1#220L], [k2#224L], Inner
:     :  :- *(2) Sort [k1#220L ASC NULLS FIRST], false, 0
:     :  :  +- Exchange hashpartitioning(k1#220L, 5), true, [id=#128]
:     :  :     +- *(1) Project [id#218L AS k1#220L]
:     :  :        +- *(1) Range (0, 100, step=1, splits=2)
:     :  +- *(4) Sort [k2#224L ASC NULLS FIRST], false, 0
:     :     +- Exchange hashpartitioning(k2#224L, 5), true, [id=#134]
:     :        +- *(3) Project [id#222L AS k2#224L]
:     :           +- *(3) Range (0, 100, step=1, splits=2)
:     +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#141]
:        +- *(5) Project [id#226L AS k3#228L]
:           +- *(5) Range (0, 3, step=1, splits=2)
+- *(8) Sort [k4#232L ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(k4#232L, 5), true, [id=#148]
      +- *(7) Project [id#230L AS k4#232L]
         +- *(7) Range (0, 100, step=1, splits=2)
```

Ideal physical plan (no extra sort on `k1` before top sort merge join):

```
*(9) SortMergeJoin [k1#220L], [k4#232L], Inner
:- *(6) BroadcastHashJoin [k1#220L], [k3#228L], Inner, BuildRight
:  :- *(6) SortMergeJoin [k1#220L], [k2#224L], Inner
:  :  :- *(2) Sort [k1#220L ASC NULLS FIRST], false, 0
:  :  :  +- Exchange hashpartitioning(k1#220L, 5), true, [id=#127]
:  :  :     +- *(1) Project [id#218L AS k1#220L]
:  :  :        +- *(1) Range (0, 100, step=1, splits=2)
:  :  +- *(4) Sort [k2#224L ASC NULLS FIRST], false, 0
:  :     +- Exchange hashpartitioning(k2#224L, 5), true, [id=#133]
:  :        +- *(3) Project [id#222L AS k2#224L]
:  :           +- *(3) Range (0, 100, step=1, splits=2)
:  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#140]
:     +- *(5) Project [id#226L AS k3#228L]
:        +- *(5) Range (0, 3, step=1, splits=2)
+- *(8) Sort [k4#232L ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(k4#232L, 5), true, [id=#146]
      +- *(7) Project [id#230L AS k4#232L]
         +- *(7) Range (0, 100, step=1, splits=2)
```
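
To see the difference locally, one can count the `SortExec` nodes in the executed plan. The sketch below is illustrative only (it assumes a local `SparkSession` with adaptive execution disabled; it is not part of the patch):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SortExec

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

// Keep the plan non-adaptive and force the small broadcast threshold from the example.
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "50")

val df1 = spark.range(100).select($"id".as("k1"))
val df2 = spark.range(100).select($"id".as("k2"))
val df3 = spark.range(3).select($"id".as("k3"))
val df4 = spark.range(100).select($"id".as("k4"))

val plan = df1.join(df2, $"k1" === $"k2")
  .join(df3, $"k1" === $"k3")
  .join(df4, $"k1" === $"k4")
  .queryExecution.executedPlan

// Per the plans above: 4 SortExec nodes without the fix, 3 with it
// (the extra Sort on k1 above the broadcast join goes away).
println(plan.collect { case s: SortExec => s }.size)
```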

### Why are the changes needed?

To avoid unnecessary sorts in queries; this has the most impact when users read sorted bucketed tables.
Although the unnecessary sort operates on already-sorted data, it can have an obvious negative impact on IO and query run time if the data is large and external sorting kicks in.
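
As a rough illustration of that bucketed case (a sketch for spark-shell; table names are made up, and whether a Sort still appears depends on the Spark version and this fix):

```scala
// Both sides are bucketed and sorted on the join key, so the scans already
// produce sorted output; explain() shows whether a Sort still gets inserted.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.range(100).withColumnRenamed("id", "k")
  .write.bucketBy(8, "k").sortBy("k").saveAsTable("t1")
spark.range(100).withColumnRenamed("id", "k")
  .write.bucketBy(8, "k").sortBy("k").saveAsTable("t2")
spark.table("t1").join(spark.table("t2"), "k").explain()
```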

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit test in `JoinSuite`.

Closes #29181 from c21/ordering.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Sep 17, 2025
…nProtoConverter

### What changes were proposed in this pull request?

This PR refactors the `LiteralExpressionProtoConverter` to use `CatalystTypeConverters` for consistent type conversion, eliminating code duplication and improving maintainability.
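
As a rough sketch of the conversion path this relies on (a hypothetical helper built on public Catalyst utilities, not the actual implementation in this PR):

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.DataType

// Convert an external Scala value to Catalyst's internal representation for the
// given data type, then wrap it in a Literal expression.
def toCatalystLiteral(value: Any, dataType: DataType): Literal = {
  val toCatalyst = CatalystTypeConverters.createToCatalystConverter(dataType)
  Literal(toCatalyst(value), dataType)
}
```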

**Key changes:**
1. **Simplified `LiteralExpressionProtoConverter.toCatalystExpression()`**: Replaced the large switch statement (86 lines) with a clean 3-line implementation that leverages existing conversion utilities
2. **Added TIME type support**: Added missing TIME literal type conversion in `LiteralValueProtoConverter.toScalaValue()`

### Why are the changes needed?

1. **Type conversion issues**: Some complex nested data structures (e.g., arrays of case classes) failed to convert to Catalyst's internal representation when using `expressions.Literal.create(...)`.
2. **Inconsistent behaviors**: Differences in behavior between Spark Connect and classic Spark for the same data types (e.g., Decimal).

### Does this PR introduce _any_ user-facing change?

**Yes** - Users can now successfully use `typedLit` with an array of case classes. Previously, attempting to use `typedlit(Array(CaseClass(1, "a")))` would fail (see the code snippet below for details), but now it works correctly and converts case classes to proper struct representations.

```scala
import org.apache.spark.sql.functions.typedlit
case class CaseClass(a: Int, b: String)
spark.sql("select 1").select(typedlit(Array(CaseClass(1, "a")))).collect()

// Below is the error message:
"""
org.apache.spark.SparkIllegalArgumentException: requirement failed: Literal must have a corresponding value to array<struct<a:int,b:string>>, but class GenericArrayData found.
  scala.Predef$.require(Predef.scala:337)
  org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:306)
  org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:456)
  org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:206)
  org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:103)
"""
```

In addition, some Catalyst values (e.g., Decimal 89.97620 -> 89.976200000000000000) have changed; please see the changes in `explain-results/` for details.
```scala
import org.apache.spark.sql.functions.typedlit

spark.sql("select 1").select(typedlit(BigDecimal(8997620, 5)),typedlit(Array(BigDecimal(8997620, 5), BigDecimal(8997621, 5)))).explain()
// Current explain() output:
"""
Project [89.97620 AS 89.97620#819, [89.97620,89.97621] AS ARRAY(89.97620BD, 89.97621BD)#820]
"""
// Expected explain() output (i.e., same as the classic mode):
"""
Project [89.976200000000000000 AS 89.976200000000000000#132, [89.976200000000000000,89.976210000000000000] AS ARRAY(89.976200000000000000BD, 89.976210000000000000BD)#133]
"""
```
### How was this patch tested?

`build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.4.5

Closes #52188 from heyihong/SPARK-53438.

Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>