[SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set operations #16882

HyukjinKwon · 2017-02-10T05:05:25Z

What changes were proposed in this pull request?

This PR proposes to fix the error message shown when some column data types are compatible and others are not in set/union operations.

Currently, the code below:

Seq((1,("a", 1))).toDF.union(Seq((1L,("a", "b"))).toDF)

throws an exception saying that LongType and IntegerType are incompatible types. It should instead report the incompatible StructTypes, in a more readable format, as below:

Before

Union can only be performed on tables with the compatible column types.
LongType <> IntegerType at the first column of the second table;;

After

Union can only be performed on tables with the compatible column types.
struct<_1:string,_2:string> <> struct<_1:string,_2:int> at the second column of the second table;;

*I manually inserted a newline in the messages above for readability only in this PR description.
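To make the before/after difference concrete, here is a minimal, self-contained sketch of a column-wise compatibility check. It is not Spark's implementation: `DType`, `StructT`, `widerType`, `SetOpCheck`, and `SetOpDemo` are hypothetical stand-ins, and the widening rule is a toy (only int/bigint widen, structs must match exactly). The point is that a column is reported only when no common wider type exists, so the int/bigint pair in the first column is skipped and the struct column is the one that appears in the message.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType classes.
sealed trait DType { def catalogString: String }
case object IntT extends DType { val catalogString = "int" }
case object LongT extends DType { val catalogString = "bigint" }
case object StringT extends DType { val catalogString = "string" }
final case class StructT(fields: Seq[(String, DType)]) extends DType {
  // Renders the struct in a compact "struct<name:type,...>" form.
  def catalogString: String =
    fields.map { case (n, t) => s"$n:${t.catalogString}" }.mkString("struct<", ",", ">")
}

object SetOpCheck {
  // Toy widening rule (an assumption for this sketch): identical types are
  // compatible, int and bigint widen to bigint, everything else is incompatible.
  def widerType(a: DType, b: DType): Option[DType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case _ => None
  }

  private val ordinals = Vector("first", "second", "third", "fourth", "fifth")
  def ordinal(i: Int): String = ordinals.lift(i).getOrElse(s"${i + 1}th")

  // Returns a message for the first column that has no common wider type.
  // `tableIdx` is the zero-based position of `other` among the operands.
  def firstMismatch(op: String, ref: Seq[DType], other: Seq[DType], tableIdx: Int): Option[String] =
    ref.zip(other).zipWithIndex.collectFirst {
      case ((refT, otherT), ci) if widerType(otherT, refT).isEmpty =>
        s"$op can only be performed on tables with the compatible column types. " +
          s"${otherT.catalogString} <> ${refT.catalogString} " +
          s"at the ${ordinal(ci)} column of the ${ordinal(tableIdx)} table"
    }
}

object SetOpDemo {
  // Mirrors the shapes in the example above: (int, struct<string,int>) vs (bigint, struct<string,string>).
  val left: Seq[DType] = Seq(IntT, StructT(Seq("_1" -> StringT, "_2" -> IntT)))
  val right: Seq[DType] = Seq(LongT, StructT(Seq("_1" -> StringT, "_2" -> StringT)))
  val message: Option[String] = SetOpCheck.firstMismatch("Union", left, right, tableIdx = 1)
}
```

With these inputs, `SetOpDemo.message` names the second column and renders both struct types, rather than stopping at the int/bigint pair in the first column.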

How was this patch tested?

Unit tests in AnalysisErrorSuite, manual tests and build with Scala 2.10.

HyukjinKwon · 2017-02-10T05:08:19Z

cc @hvanhovell could you please take a look?

SparkQA · 2017-02-10T07:09:22Z

Test build #72683 has finished for PR 16882 at commit 07e6984.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-10T07:23:37Z

Test build #72691 has started for PR 16882 at commit c2fe413.

SparkQA · 2017-02-10T07:27:37Z

Test build #72692 has started for PR 16882 at commit 073c6f9.

SparkQA · 2017-02-10T07:33:40Z

Test build #72693 has started for PR 16882 at commit 20b8e77.

HyukjinKwon · 2017-02-10T10:35:30Z

retest this please

hvanhovell · 2017-02-10T12:05:44Z

@HyukjinKwon Can you also improve the error message? I don't think StructType(StructField(_1,StringType,true), StructField(_2,StringType,true)) <> StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false)) is readable at all. Can we use the catalog string instead?

Other than that, this looks fine.

HyukjinKwon · 2017-02-10T12:14:07Z

Oh, sure. Let me give it a shot. Thank you for your quick review!

SparkQA · 2017-02-10T13:20:40Z

Test build #72701 has finished for PR 16882 at commit 20b8e77.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-02-11T13:35:27Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

                      |${operator.nodeName} can only be performed on tables with the compatible
-                      |column types. $dt1 <> $dt2 at the ${ordinalNumber(ci)} column of
-                      |the ${ordinalNumber(ti + 1)} table
+                      |column types. ${dt1.simpleString} <> ${dt2.simpleString} at the


(I used simpleString for consistency with other code here.)

StructType.simpleString gets truncated, so it might be hard/impossible to find the error. Can you use StructType.catalogString?

For better UX it might be an idea to render the path to the offending variable, instead of printing the entire struct.

Sure, let me change.
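The suggestion above about rendering the path to the offending field could look roughly like the following. This is only a sketch of the idea and was not part of this PR as far as the thread shows: `Ty`, `Prim`, `Struct`, and `MismatchPath` are hypothetical names, and fields are compared by position, which is an assumption of this example.

```scala
// Hypothetical mini type model, not Spark's DataType hierarchy.
sealed trait Ty
final case class Prim(name: String) extends Ty
final case class Struct(fields: Seq[(String, Ty)]) extends Ty

object MismatchPath {
  // Compact rendering, used only for the leaf types that differ.
  def render(t: Ty): String = t match {
    case Prim(n) => n
    case Struct(fs) => fs.map { case (n, ty) => s"$n:${render(ty)}" }.mkString("struct<", ",", ">")
  }

  // Returns (dotted path, left type, right type) for the first differing field.
  // Structs of equal arity are compared field by field, by position; the
  // field name of the left operand is used to build the path.
  def find(a: Ty, b: Ty, path: List[String] = Nil): Option[(String, String, String)] = (a, b) match {
    case (Struct(fa), Struct(fb)) if fa.length == fb.length =>
      fa.zip(fb).iterator
        .map { case ((name, ta), (_, tb)) => find(ta, tb, name :: path) }
        .collectFirst { case Some(m) => m }
    case (x, y) if x == y => None
    case (x, y) =>
      val where = if (path.isEmpty) "<root>" else path.reverse.mkString(".")
      Some((where, render(x), render(y)))
  }
}
```

For the structs in the PR description, `MismatchPath.find` would point at `_2` with `string` versus `int`, which stays short even when the struct has many fields.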

SparkQA · 2017-02-11T15:01:49Z

Test build #72737 has finished for PR 16882 at commit ad2fe5f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2017-02-12T02:21:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

   * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
   * loss of precision when widening decimal and double, and promotion to string.
   */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {


Hi, @HyukjinKwon. Should this be private[analysis]?
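For readers less familiar with Scala's qualified access modifiers, here is a small sketch of what `private[analysis]` buys. The names `Coercion`, `Check`, and the toy widening rule are hypothetical, and an enclosing `object analysis` stands in for the real `analysis` package so the snippet stays self-contained; the qualifier works the same way for an enclosing package.

```scala
// `analysis` is an object here purely for illustration; in Spark it is a package.
object analysis {
  object Coercion {
    // Visible anywhere inside `analysis`, hidden from code outside it.
    private[analysis] def widerType(a: String, b: String): Option[String] =
      if (a == b) Some(a)
      else if (Set(a, b) == Set("int", "bigint")) Some("bigint") // toy rule, an assumption
      else None
  }

  object Check {
    // Allowed: Check lives inside `analysis`, so it can call widerType.
    def compatible(a: String, b: String): Boolean =
      Coercion.widerType(a, b).isDefined
  }
}
```

With a plain `private`, the call from `Check` would not compile, while a fully public method would widen the API more than the caller needs.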

SparkQA · 2017-02-12T07:15:55Z

Test build #72756 has finished for PR 16882 at commit 03ec9de.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-13T15:05:08Z

Test build #72817 has finished for PR 16882 at commit 4ef5c07.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2017-02-13T15:07:56Z

LGTM - merging to master. Thanks!

HyukjinKwon · 2017-02-13T17:50:48Z

Thank you @hvanhovell

…ompatible and others are not in set operations

## What changes were proposed in this pull request?

This PR proposes to fix the error message when some data types are compatible and others are not in set/union operation.

Currently, the code below:

```scala
Seq((1,("a", 1))).toDF.union(Seq((1L,("a", "b"))).toDF)
```

throws an exception saying `LongType` and `IntegerType` are incompatible types. It should say something about `StructType`s with more readable format as below:

**Before**

```
Union can only be performed on tables with the compatible column types.
LongType <> IntegerType at the first column of the second table;;
```

**After**

```
Union can only be performed on tables with the compatible column types.
struct<_1:string,_2:string> <> struct<_1:string,_2:int> at the second column of the second table;;
```

*I manually inserted a newline in the messages above for readability only in this PR description.

## How was this patch tested?

Unit tests in `AnalysisErrorSuite`, manual tests and build with Scala 2.10.

Author: hyukjinkwon

Closes apache#16882 from HyukjinKwon/SPARK-19544.

HyukjinKwon changed the title ~~[SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set/union operations~~ [SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set operations Feb 10, 2017

HyukjinKwon force-pushed the SPARK-19544 branch 2 times, most recently from c2fe413 to 073c6f9 Compare February 10, 2017 07:23

Improve error message when some column types are compatible and other…

20b8e77

…s are not in set/union operations

HyukjinKwon force-pushed the SPARK-19544 branch from 073c6f9 to 20b8e77 Compare February 10, 2017 07:27

Use simpleString to make the error messages more readable

ad2fe5f

HyukjinKwon commented Feb 11, 2017

dongjoon-hyun reviewed Feb 12, 2017

Add access modifier, private[analysis], to findWiderTypeForTwo

03ec9de

Use catalogString instead of simpleString

4ef5c07

asfgit closed this in 4321ff9 Feb 13, 2017

hvanhovell mentioned this pull request Mar 10, 2017

[SPARK-19893][SQL] should not run DataFrame set oprations with map type #17236

Closed

HyukjinKwon deleted the SPARK-19544 branch January 2, 2018 03:38
