@scwf scwf commented Apr 23, 2015

No description provided.

@scwf scwf closed this Apr 23, 2015
adrian-wang pushed a commit that referenced this pull request May 4, 2015
SQL
```
select key from (select key from src limit 100) t2 limit 10
```
Optimized Logical Plan before the change
```
== Optimized Logical Plan ==
Limit 10
 Limit 100
  Project key#3
   MetastoreRelation default, src, None
```
Optimized Logical Plan after the change
```
== Optimized Logical Plan ==
Limit 10
 Project [key#1]
  MetastoreRelation default, src, None
```
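
The rewrite shown in the two plans can be sketched on a toy plan tree. This is only an illustration: `Plan`, `Relation`, `Project`, `Limit` and `optimize` below are hypothetical stand-ins written for this example, not Spark's Catalyst classes, and the sketch assumes both limits are plain integer literals.

```scala
// Toy logical plan; all names are illustrative, not Spark's Catalyst API.
object CombineLimitsSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: List[String], child: Plan) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan

  // Rewrite children first so that chains of limits collapse in one pass.
  def optimize(plan: Plan): Plan = plan match {
    case Limit(n, child) =>
      optimize(child) match {
        // Two adjacent limits return at most min(outer, inner) rows.
        case Limit(m, grandChild) => Limit(math.min(n, m), grandChild)
        case other                => Limit(n, other)
      }
    case Project(cols, child) => Project(cols, optimize(child))
    case r: Relation          => r
  }
}
```

Applied to `Limit(10, Limit(100, Project(List("key"), Relation("src"))))`, the sketch yields a single `Limit(10, ...)` over the projection, which mirrors the "after" plan.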

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5770 from DoingDone9/limitOptimizer and squashes the following commits:

c68eaa7 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
97e18cf [Zhongshuai Pei] Update Optimizer.scala
19ab875 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
7db4566 [Zhongshuai Pei] Update CombiningLimitsSuite.scala
e2a491d [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
adrian-wang pushed a commit that referenced this pull request May 4, 2015
SQL
```
select key from (select key,value from t1 limit 100) t2 limit 10
```
Optimized Logical Plan before the change
```
== Optimized Logical Plan ==
Limit 10
  Project key#228
    Limit 100
      MetastoreRelation default, t1, None
```
Optimized Logical Plan after the change
```
== Optimized Logical Plan ==
Limit 10
  Limit 100
    Project key#228
      MetastoreRelation default, t1, None
```
After this change the two limits are adjacent, so they can be combined.
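
A minimal sketch of the reordering, using a toy plan tree. The types and the `pushProjectBelowLimit` function are hypothetical names made up for this example, and the sketch assumes the projection is a deterministic, row-by-row operation so that swapping it with a limit does not change the result.

```scala
// Toy logical plan; all names are illustrative, not Spark's Catalyst API.
object ProjectLimitSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: List[String], child: Plan) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan

  def pushProjectBelowLimit(plan: Plan): Plan = plan match {
    // Project over Limit becomes Limit over Project; keep pushing downwards.
    case Project(cols, Limit(n, child)) =>
      Limit(n, pushProjectBelowLimit(Project(cols, child)))
    case Project(cols, child) => Project(cols, pushProjectBelowLimit(child))
    case Limit(n, child)      => Limit(n, pushProjectBelowLimit(child))
    case r: Relation          => r
  }
}
```

For the query above, `Limit(10, Project(List("key"), Limit(100, Relation("t1"))))` becomes `Limit(10, Limit(100, Project(List("key"), Relation("t1"))))`, leaving the two limits next to each other.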

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5797 from DoingDone9/ProjectLimit and squashes the following commits:

70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala
dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala
485c61c [Zhongshuai Pei] Update Optimizer.scala
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
adrian-wang pushed a commit that referenced this pull request May 12, 2015
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5893).

One point to make clear: the `buckets` parameter, an array of `Double`, acts as a list of split points. For example,

```scala
buckets = Array(-0.5, 0.0, 0.5)
```

splits the real line into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3.
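
The encoding can be sketched with a binary search over the split points. `BucketSketch` and `bucketIndex` are hypothetical names for this example, and the boundary handling (upper bound inclusive) follows the ranges listed above. The merged `Bucketizer` may treat boundaries differently.

```scala
object BucketSketch {
  // Returns the bucket index of x for sorted split points s0 < s1 < ...,
  // using the ranges (-inf, s0], (s0, s1], ..., (s_last, +inf).
  def bucketIndex(splits: Array[Double], x: Double): Int = {
    // Binary search for the first split point that is >= x.
    var lo = 0
    var hi = splits.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (splits(mid) < x) lo = mid + 1 else hi = mid
    }
    lo // equals splits.length when x is above every split point
  }
}
```

With `Array(-0.5, 0.0, 0.5)`, the values -0.5, 0.0, 0.5 and 0.7 fall into buckets 0, 1, 2 and 3 respectively.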

Author: Xusen Yin <[email protected]>
Author: Joseph K. Bradley <[email protected]>

Closes apache#5980 from yinxusen/SPARK-5893 and squashes the following commits:

dc8c843 [Xusen Yin] Merge pull request #4 from jkbradley/yinxusen-SPARK-5893
1ca973a [Joseph K. Bradley] one more bucketizer test
34f124a [Joseph K. Bradley] Removed lowerInclusive, upperInclusive params from Bucketizer, and used splits instead.
eacfcfa [Xusen Yin] change ML attribute from splits into buckets
c3cc770 [Xusen Yin] add more unit test for binary search
3a16cc2 [Xusen Yin] refine comments and names
ac77859 [Xusen Yin] fix style error
fb30d79 [Xusen Yin] fix and test binary search
2466322 [Xusen Yin] refactor Bucketizer
11fb00a [Xusen Yin] change it into an Estimator
998bc87 [Xusen Yin] check buckets
4024cf1 [Xusen Yin] add test suite
5fe190e [Xusen Yin] add bucketizer
adrian-wang pushed a commit that referenced this pull request May 12, 2015
…" into true or false directly

SQL
```
select key from src where 3 in (4, 5);
```
Before
```
== Optimized Logical Plan ==
Project [key#12]
 Filter 3 INSET (5,4)
  MetastoreRelation default, src, None
```

After
```
== Optimized Logical Plan ==
LocalRelation [key#228], []
```
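
The folding step can be illustrated on a toy expression tree. Everything named here (`Expr`, `Literal`, `Column`, `In`, `fold`) is a hypothetical stand-in for this example, and the sketch deliberately ignores SQL `NULL` semantics, which a real rule would have to handle.

```scala
// Toy expressions; illustrative only, not Spark's Catalyst classes.
object InFoldingSketch {
  sealed trait Expr
  case class Literal(value: Any) extends Expr
  case class Column(name: String) extends Expr
  case class In(value: Expr, list: List[Expr]) extends Expr

  // Fold `literal IN (literal, ...)` into a boolean literal.
  // Assumption: no NULLs appear on either side.
  def fold(e: Expr): Expr = e match {
    case In(Literal(v), list) if list.forall(_.isInstanceOf[Literal]) =>
      Literal(list.contains(Literal(v)))
    case other => other // non-constant expressions are left untouched
  }
}
```

Once the predicate is the literal `false`, the filter can never pass a row, which is consistent with the empty `LocalRelation` in the "after" plan.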

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#5972 from DoingDone9/InToFalse and squashes the following commits:

4c722a2 [Zhongshuai Pei] Update predicates.scala
abe2bbb [Zhongshuai Pei] Update Optimizer.scala
fa461a5 [Zhongshuai Pei] Update Optimizer.scala
e34c28a [Zhongshuai Pei] Update predicates.scala
24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala
f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala
35ceb7a [Zhongshuai Pei] Update Optimizer.scala
36c194e [Zhongshuai Pei] Update Optimizer.scala
2e8f6ca [Zhongshuai Pei] Update Optimizer.scala
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
adrian-wang pushed a commit that referenced this pull request Jun 12, 2015
…into a single batch.

SQL
```
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```
Plan before the change
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
Plan after the change
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

`CombineLimits` produces `Limit(If(LessThan(ne, le), ne, le), grandChild)`, and `LessThan` is handled by `BooleanSimplification`, so `CombineLimits` must run before `BooleanSimplification`, and `BooleanSimplification` must run before `PushPredicateThroughJoin`.
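
The join condition in both plans has the common conjunct `a > 3` factored out of the `OR`. A minimal sketch of that kind of factoring on a toy expression tree follows. The types and the `factor` function are hypothetical names for this example and are not the code of `BooleanSimplification`.

```scala
// Toy boolean expressions; illustrative only.
object CommonFactorSketch {
  sealed trait Expr
  case class Pred(text: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr

  // Flatten nested ANDs into a list of conjuncts.
  def conjuncts(e: Expr): List[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => List(other)
  }

  private def all(es: List[Expr]): Expr = es.reduce[Expr](And(_, _))

  // (c && x) || (c && y)  ==>  c && (x || y)
  def factor(e: Expr): Expr = e match {
    case Or(l, r) =>
      val (lc, rc) = (conjuncts(l), conjuncts(r))
      val common = lc.filter(c => rc.contains(c))
      val lRest = lc.filterNot(c => common.contains(c))
      val rRest = rc.filterNot(c => common.contains(c))
      if (common.isEmpty) e
      else if (lRest.isEmpty || rRest.isEmpty) all(common) // c || (c && y) == c
      else And(all(common), Or(all(lRest), all(rRest)))
    case other => other
  }
}
```

Once `a > 3` stands alone as a top-level conjunct, a later pushdown rule can move it below the join, which is why the order of the rules matters.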

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#6351 from DoingDone9/master and squashes the following commits:

20de7be [Zhongshuai Pei] Update Optimizer.scala
7bc7d28 [Zhongshuai Pei] Merge pull request apache#17 from apache/master
0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
a04ffae [Zhongshuai Pei] Update Optimizer.scala
11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
b01e622 [Zhongshuai Pei] Merge pull request apache#15 from apache/master
8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
fa65718 [Zhongshuai Pei] Update Optimizer.scala
ab8e9a6 [Zhongshuai Pei] Merge pull request apache#14 from apache/master
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
adrian-wang pushed a commit that referenced this pull request Jun 19, 2015
…l operators

This patch introduces `SparkPlanTest`, a base class for unit tests of SparkPlan physical operators.  This is analogous to Spark SQL's existing `QueryTest`, which does something similar for end-to-end tests with actual queries.

These helper methods provide nicer error output when tests fail and help developers to avoid writing lots of boilerplate in order to execute manually constructed physical plans.

Author: Josh Rosen <[email protected]>
Author: Josh Rosen <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#6885 from JoshRosen/spark-plan-test and squashes the following commits:

f8ce275 [Josh Rosen] Fix some IntelliJ inspections and delete some dead code
84214be [Josh Rosen] Add an extra column which isn't part of the sort
ae1896b [Josh Rosen] Provide implicits automatically
a80f9b0 [Josh Rosen] Merge pull request #4 from marmbrus/pr/6885
d9ab1e4 [Michael Armbrust] Add simple resolver
c60a44d [Josh Rosen] Manually bind references
996332a [Josh Rosen] Add types so that tests compile
a46144a [Josh Rosen] WIP
adrian-wang pushed a commit that referenced this pull request Apr 14, 2016
…l` in IF/CASEWHEN

## What changes were proposed in this pull request?

Currently, `SimplifyConditionals` handles `true` and `false` to optimize branches. This PR improves `SimplifyConditionals` to take advantage of `null` conditions for `if` and `CaseWhen` expressions, too.

**Before**
```
scala> sql("SELECT IF(null, 1, 0)").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [if (null) 1 else 0 AS (IF(CAST(NULL AS BOOLEAN), 1, 0))#4]
:     +- INPUT
+- Scan OneRowRelation[]
scala> sql("select case when cast(null as boolean) then 1 else 2 end").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [CASE WHEN null THEN 1 ELSE 2 END AS CASE WHEN CAST(NULL AS BOOLEAN) THEN 1 ELSE 2 END#14]
:     +- INPUT
+- Scan OneRowRelation[]
```

**After**
```
scala> sql("SELECT IF(null, 1, 0)").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [0 AS (IF(CAST(NULL AS BOOLEAN), 1, 0))#4]
:     +- INPUT
+- Scan OneRowRelation[]
scala> sql("select case when cast(null as boolean) then 1 else 2 end").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [2 AS CASE WHEN CAST(NULL AS BOOLEAN) THEN 1 ELSE 2 END#4]
:     +- INPUT
+- Scan OneRowRelation[]
```

**Hive**
```
hive> select if(null,1,2);
OK
2
hive> select case when cast(null as boolean) then 1 else 2 end;
OK
2
```
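
The simplification can be sketched on a toy expression tree. The names below (`Lit`, `Col`, `If`, `CaseWhen`, `simplify`) are hypothetical stand-ins for this example, with `Lit(null)` modelling a SQL `NULL` condition. This is not the code of `SimplifyConditionals`.

```scala
// Toy expressions; illustrative only.
object SimplifyConditionalsSketch {
  sealed trait Expr
  case class Lit(value: Any) extends Expr // Lit(null) models SQL NULL
  case class Col(name: String) extends Expr
  case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr
  case class CaseWhen(branches: List[(Expr, Expr)], otherwise: Option[Expr]) extends Expr

  def simplify(e: Expr): Expr = e match {
    case If(Lit(true), t, _)              => t
    // A NULL condition is not true, so the else branch is taken.
    case If(Lit(false) | Lit(null), _, f) => f
    case CaseWhen(branches, otherwise) =>
      // Branches whose condition is FALSE or NULL can never fire.
      val live = branches.filterNot { case (c, _) => c == Lit(false) || c == Lit(null) }
      live match {
        case Nil                 => otherwise.getOrElse(Lit(null))
        case (Lit(true), v) :: _ => v
        case _                   => CaseWhen(live, otherwise)
      }
    case other => other
  }
}
```

In this sketch `IF(null, 1, 0)` reduces to `0` and `CASE WHEN null THEN 1 ELSE 2 END` reduces to `2`, matching the "After" plans and the Hive output above.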

## How was this patch tested?

Pass the Jenkins tests (including new extended test cases).

Author: Dongjoon Hyun <[email protected]>

Closes apache#12122 from dongjoon-hyun/SPARK-14338.
adrian-wang pushed a commit that referenced this pull request Feb 28, 2018
…pressions

## What changes were proposed in this pull request?

This PR changes the direction of expression transformation in the DecimalPrecision rule. Previously, the expressions were transformed down, which led to incorrect result types when decimal expressions had other decimal expressions as their operands. The root cause of this issue was in visiting outer nodes before their children. Consider the example below:

```
    val inputSchema = StructType(StructField("col", DecimalType(26, 6)) :: Nil)
    val sc = spark.sparkContext
    val rdd = sc.parallelize(1 to 2).map(_ => Row(BigDecimal(12)))
    val df = spark.createDataFrame(rdd, inputSchema)

    // Works correctly since no nested decimal expression is involved
    // Expected result type: (26, 6) * (26, 6) = (38, 12)
    df.select($"col" * $"col").explain(true)
    df.select($"col" * $"col").printSchema()

    // Gives a wrong result since there is a nested decimal expression that should be visited first
    // Expected result type: ((26, 6) * (26, 6)) * (26, 6) = (38, 12) * (26, 6) = (38, 18)
    df.select($"col" * $"col" * $"col").explain(true)
    df.select($"col" * $"col" * $"col").printSchema()
```

The example above gives the following output:

```
// Correct result without sub-expressions
== Parsed Logical Plan ==
'Project [('col * 'col) AS (col * col)#4]
+- LogicalRDD [col#1]

== Analyzed Logical Plan ==
(col * col): decimal(38,12)
Project [CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS (col * col)#4]
+- LogicalRDD [col#1]

== Optimized Logical Plan ==
Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
+- LogicalRDD [col#1]

== Physical Plan ==
*Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
+- Scan ExistingRDD[col#1]

// Schema
root
 |-- (col * col): decimal(38,12) (nullable = true)

// Incorrect result with sub-expressions
== Parsed Logical Plan ==
'Project [(('col * 'col) * 'col) AS ((col * col) * col)#11]
+- LogicalRDD [col#1]

== Analyzed Logical Plan ==
((col * col) * col): decimal(38,12)
Project [CheckOverflow((promote_precision(cast(CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS ((col * col) * col)#11]
+- LogicalRDD [col#1]

== Optimized Logical Plan ==
Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11]
+- LogicalRDD [col#1]

== Physical Plan ==
*Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11]
+- Scan ExistingRDD[col#1]

// Schema
root
 |-- ((col * col) * col): decimal(38,12) (nullable = true)
```
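
The expected types quoted in the example comments can be reproduced by typing the expression bottom-up, children before parents. The sketch below is an illustration only. `Col`, `Mul` and `resultType` are hypothetical names, and capping both precision and scale at 38 is an assumption chosen to match the expected types stated above, since the exact overflow handling may differ between Spark versions.

```scala
// Toy decimal typing; illustrative only, not the DecimalPrecision rule itself.
object DecimalTypeSketch {
  val MaxPrecision = 38

  sealed trait Expr
  case class Col(precision: Int, scale: Int) extends Expr
  case class Mul(left: Expr, right: Expr) extends Expr

  // Result type of an expression, resolving the operands first (bottom-up).
  def resultType(e: Expr): (Int, Int) = e match {
    case Col(p, s) => (p, s)
    case Mul(l, r) =>
      val (p1, s1) = resultType(l)
      val (p2, s2) = resultType(r)
      // Unbounded product type is (p1 + p2 + 1, s1 + s2); cap both at 38.
      (math.min(p1 + p2 + 1, MaxPrecision), math.min(s1 + s2, MaxPrecision))
  }
}
```

With `val col = Col(26, 6)`, `resultType(Mul(col, col))` gives `(38, 12)` and `resultType(Mul(Mul(col, col), col))` gives `(38, 18)`. The second value is the expected type that the top-down traversal failed to produce.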

## How was this patch tested?

This PR was tested with available unit tests. Moreover, there are tests to cover previously failing scenarios.

Author: aokolnychyi <[email protected]>

Closes apache#18583 from aokolnychyi/spark-21332.

(cherry picked from commit 0be5fb4)
Signed-off-by: gatorsmile <[email protected]>
adrian-wang pushed a commit that referenced this pull request May 13, 2024
### What changes were proposed in this pull request?

In the `Window` node, both `partitionSpec` and `orderSpec` must be orderable, but the current type check only verifies `orderSpec` is orderable. This can cause an error in later optimizing phases.

Given a query:

```
with t as (select id, map(id, id) as m from range(0, 10))
select rank() over (partition by m order by id) from t
```

Before the PR, it fails with an `INTERNAL_ERROR`:

```
org.apache.spark.SparkException: [INTERNAL_ERROR] grouping/join/window partition keys cannot be map type. SQLSTATE: XX000
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.needNormalize(NormalizeFloatingNumbers.scala:103)
at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.org$apache$spark$sql$catalyst$optimizer$NormalizeFloatingNumbers$$needNormalize(NormalizeFloatingNumbers.scala:94)
...
```

After the PR, it fails with an `EXPRESSION_TYPE_IS_NOT_ORDERABLE` error, which is expected:

```
  org.apache.spark.sql.catalyst.ExtendedAnalysisException: [EXPRESSION_TYPE_IS_NOT_ORDERABLE] Column expression "m" cannot be sorted because its type "MAP<BIGINT, BIGINT>" is not orderable. SQLSTATE: 42822; line 2 pos 53;
Project [RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4]
+- Project [id#1L, m#0, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4]
   +- Window [rank(id#1L) windowspecdefinition(m#0, id#1L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4], [m#0], [id#1L ASC NULLS FIRST]
      +- Project [id#1L, m#0]
         +- SubqueryAlias t
            +- SubqueryAlias t
               +- Project [id#1L, map(id#1L, id#1L) AS m#0]
                  +- Range (0, 10, step=1, splits=None)
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
...
```
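
The idea of the fix, checking both specs rather than only `orderSpec`, can be sketched with a toy type model. The types and the `checkWindow` function below are hypothetical names for this example, and the message text is simplified. The sketch also assumes that maps are the only non-orderable type, which is narrower than Spark's real rules.

```scala
// Toy data types; illustrative only, not Spark's type-check code.
object OrderableCheckSketch {
  sealed trait DataType
  case object LongType extends DataType
  case object StringType extends DataType
  case class ArrayType(element: DataType) extends DataType
  case class MapType(key: DataType, value: DataType) extends DataType

  // Assumption for this sketch: only map types are non-orderable.
  def isOrderable(t: DataType): Boolean = t match {
    case MapType(_, _)  => false
    case ArrayType(elem) => isOrderable(elem)
    case _              => true
  }

  // Returns the first type-check error, looking at BOTH specs.
  def checkWindow(
      partitionSpec: Seq[(String, DataType)],
      orderSpec: Seq[(String, DataType)]): Option[String] =
    (partitionSpec ++ orderSpec).collectFirst {
      case (name, t) if !isOrderable(t) =>
        s"""Column expression "$name" cannot be sorted because its type is not orderable."""
    }
}
```

For the query above, partitioning by the map column `m` is reported as an error at check time, while partitioning and ordering by `id` passes.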

### How was this patch tested?

Unit test.

Closes apache#45730 from chenhao-db/SPARK-47572.

Authored-by: Chenhao Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>