Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Feb 10, 2017

What changes were proposed in this pull request?

This PR proposes to fix the error message when some data types are compatible and others are not in set/union operation.

Currently, the code below:

Seq((1,("a", 1))).toDF.union(Seq((1L,("a", "b"))).toDF)

throws an exception saying LongType and IntegerType are incompatible types. It should say something about StructTypes with more readable format as below:

Before

Union can only be performed on tables with the compatible column types.
LongType <> IntegerType at the first column of the second table;;

After

Union can only be performed on tables with the compatible column types.
struct<_1:string,_2:string> <> struct<_1:string,_2:int> at the second column of the second table;;

*I manually inserted a newline in the messages above for readability only in this PR description.

How was this patch tested?

Unit tests in AnalysisErrorSuite, manual tests and build wth Scala 2.10.

@HyukjinKwon
Copy link
Member Author

cc @hvanhovell could you please take a look?

@HyukjinKwon HyukjinKwon changed the title [SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set/union operations [SPARK-19544][SQL] Improve error message when some column types are compatible and others are not in set operations Feb 10, 2017
@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72683 has finished for PR 16882 at commit 07e6984.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon force-pushed the SPARK-19544 branch 2 times, most recently from c2fe413 to 073c6f9 Compare February 10, 2017 07:23
@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72691 has started for PR 16882 at commit c2fe413.

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72692 has started for PR 16882 at commit 073c6f9.

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72693 has started for PR 16882 at commit 20b8e77.

@HyukjinKwon
Copy link
Member Author

retest this please

@hvanhovell
Copy link
Contributor

@HyukjinKwon Can you also improve the error message. I don't think StructType(StructField(_1,StringType,true), StructField(_2,StringType,true)) <> StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false)) is readable at all. Can we use catalog string instead?

Other than that, this looks fine.

@HyukjinKwon
Copy link
Member Author

Oh, sure. Let me give a shot. Thank you for your quick review!

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72701 has finished for PR 16882 at commit 20b8e77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

|${operator.nodeName} can only be performed on tables with the compatible
|column types. $dt1 <> $dt2 at the ${ordinalNumber(ci)} column of
|the ${ordinalNumber(ti + 1)} table
|column types. ${dt1.simpleString} <> ${dt2.simpleString} at the
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I used simpleString for consistency with other codes in this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructType.simpleString gets truncated, so it might be hard/impossible to find the error. Can you use StructType.catalogString?

For better UX it might be an idea to render the path to the offending variable, instead of printing the entire struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me change.

@SparkQA
Copy link

SparkQA commented Feb 11, 2017

Test build #72737 has finished for PR 16882 at commit ad2fe5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* i.e. the main difference with [[findTightestCommonType]] is that here we allow some
* loss of precision when widening decimal and double, and promotion to string.
*/
private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @HyukjinKwon .
private[analysis]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Added)

@SparkQA
Copy link

SparkQA commented Feb 12, 2017

Test build #72756 has finished for PR 16882 at commit 03ec9de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 13, 2017

Test build #72817 has finished for PR 16882 at commit 4ef5c07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 4321ff9 Feb 13, 2017
@HyukjinKwon
Copy link
Member Author

Thank you @hvanhovell

cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…ompatible and others are not in set operations

## What changes were proposed in this pull request?

This PR proposes to fix the error message when some data types are compatible and others are not in set/union operation.

Currently, the code below:

```scala
Seq((1,("a", 1))).toDF.union(Seq((1L,("a", "b"))).toDF)
```

throws an exception saying `LongType` and `IntegerType` are incompatible types. It should say something about `StructType`s with more readable format as below:

**Before**

```
Union can only be performed on tables with the compatible column types.
LongType <> IntegerType at the first column of the second table;;
```

**After**

```
Union can only be performed on tables with the compatible column types.
struct<_1:string,_2:string> <> struct<_1:string,_2:int> at the second column of the second table;;
```

*I manually inserted a newline in the messages above for readability only in this PR description.

## How was this patch tested?

Unit tests in `AnalysisErrorSuite`, manual tests and build wth Scala 2.10.

Author: hyukjinkwon <[email protected]>

Closes apache#16882 from HyukjinKwon/SPARK-19544.
@HyukjinKwon HyukjinKwon deleted the SPARK-19544 branch January 2, 2018 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants