[SPARK-18674][SQL] improve the error message of using join #16100

cloud-fan · 2016-12-01T14:43:44Z

What changes were proposed in this pull request?

The current error message for a USING join is quite confusing. For example:

scala> val df1 = List(1,2,3).toDS.withColumnRenamed("value", "c1")
df1: org.apache.spark.sql.DataFrame = [c1: int]

scala> val df2 = List(1,2,3).toDS.withColumnRenamed("value", "c2")
df2: org.apache.spark.sql.DataFrame = [c2: int]

scala> df1.join(df2, usingColumn = "c1")
org.apache.spark.sql.AnalysisException: using columns ['c1] can not be resolved given input columns: [c1, c2] ;;
'Join UsingJoin(Inner,List('c1))
:- Project [value#1 AS c1#3]
:  +- LocalRelation [value#1]
+- Project [value#7 AS c2#9]
   +- LocalRelation [value#7]

After this PR, it becomes:

scala> val df1 = List(1,2,3).toDS.withColumnRenamed("value", "c1")
df1: org.apache.spark.sql.DataFrame = [c1: int]

scala> val df2 = List(1,2,3).toDS.withColumnRenamed("value", "c2")
df2: org.apache.spark.sql.DataFrame = [c2: int]

scala> df1.join(df2, usingColumn = "c1")
org.apache.spark.sql.AnalysisException: USING column `c1` can not be resolved with the right join side, the right output is: [c2];
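For readers who want to see the shape of the check behind the new message, here is a minimal sketch in plain Scala with no Spark dependency. It is not the analyzer code from this PR: `UsingJoinCheck`, `Side`, and `resolveUsingColumns` are hypothetical names, each join side is reduced to a list of top-level column names, and the name comparison is a plain exact match.

```scala
// Illustrative sketch only: plain Scala, no Spark dependency.
// `UsingJoinCheck`, `Side` and `resolveUsingColumns` are hypothetical names, not Spark APIs.
object UsingJoinCheck {
  // A join side reduced to a label ("left"/"right") and its top-level output column names.
  final case class Side(label: String, output: Seq[String])

  // Returns Left(message) for the first USING column that a side cannot resolve,
  // or Right(usingColumns) when every column exists on both sides.
  def resolveUsingColumns(
      left: Side,
      right: Side,
      usingColumns: Seq[String]): Either[String, Seq[String]] = {
    // Check each (column, side) pair and keep a message for every pair that fails.
    val failures = for {
      col <- usingColumns
      side <- Seq(left, right)
      if !side.output.contains(col)
    } yield s"USING column `$col` can not be resolved with the ${side.label} join side, " +
      s"the ${side.label} output is: [${side.output.mkString(", ")}]"

    failures.headOption.toLeft(usingColumns)
  }
}
```

Calling `UsingJoinCheck.resolveUsingColumns(Side("left", Seq("c1")), Side("right", Seq("c2")), Seq("c1"))` yields a `Left` naming only the right side, which is the point of the change: the message says which side is missing the column instead of listing the combined input columns.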

How was this patch tested?

updated tests

cloud-fan · 2016-12-01T14:45:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala

  override def sql: String = "NATURAL " + tpe.sql
 }

-case class UsingJoin(tpe: JoinType, usingColumns: Seq[UnresolvedAttribute]) extends JoinType {


A USING column can never have a qualifier or be a nested field, so we don't need to use UnresolvedAttribute here.

Yeah, we do not support nested fields. This also fails with your newly changed error:

sql("CREATE TABLE complexTypeTable (s struct<i: string>)")
val df = table("complexTypeTable")
df.as("b").join(df.as("a"), "s.i").show()

Could you add the test case for it? Thanks!

Submitted a follow-up PR #16110 for the test case of nested fields. When we implemented USING join, we did not add any test cases for nested fields, so this was not covered before.
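To illustrate why a name like `s.i` cannot resolve once USING columns are plain strings, here is a small sketch in plain Scala with no Spark dependency. `NestedUsingColumn` and `findTopLevel` are hypothetical names, and the case-insensitive comparison is an assumption made for the example rather than something stated in this thread.

```scala
// Illustrative sketch only: plain Scala, no Spark dependency.
// `NestedUsingColumn` and `findTopLevel` are hypothetical names, not Spark APIs.
object NestedUsingColumn {
  // Look a USING column up by its whole name among the top-level columns.
  // The name is never split on '.', so "s.i" is compared as a single name.
  // Assumption: names are compared case-insensitively.
  def findTopLevel(output: Seq[String], usingColumn: String): Option[String] =
    output.find(_.equalsIgnoreCase(usingColumn))
}
```

With a table whose only top-level column is `s`, `findTopLevel(Seq("s"), "s.i")` finds nothing, while `findTopLevel(Seq("s"), "S")` finds `s`. Under these assumptions, a nested field used as a USING column ends up in the same "can not be resolved" path as a missing column.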

cloud-fan · 2016-12-01T14:45:33Z

cc @rxin @gatorsmile

gatorsmile · 2016-12-01T16:29:25Z

This is a good fix! Will review it today.

SparkQA · 2016-12-01T17:09:39Z

Test build #69481 has finished for PR 16100 at commit 9461839.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UsingJoin(tpe: JoinType, usingColumns: Seq[String]) extends JoinType

hvanhovell · 2016-12-01T19:48:14Z

LGTM

hvanhovell · 2016-12-01T19:52:35Z

Merging to master and 2.1. Thanks!

## What changes were proposed in this pull request?

The current error message of USING join is quite confusing, for example:
```
scala> val df1 = List(1,2,3).toDS.withColumnRenamed("value", "c1")
df1: org.apache.spark.sql.DataFrame = [c1: int]

scala> val df2 = List(1,2,3).toDS.withColumnRenamed("value", "c2")
df2: org.apache.spark.sql.DataFrame = [c2: int]

scala> df1.join(df2, usingColumn = "c1")
org.apache.spark.sql.AnalysisException: using columns ['c1] can not be resolved given input columns: [c1, c2] ;;
'Join UsingJoin(Inner,List('c1))
:- Project [value#1 AS c1#3]
:  +- LocalRelation [value#1]
+- Project [value#7 AS c2#9]
   +- LocalRelation [value#7]
```

after this PR, it becomes:
```
scala> val df1 = List(1,2,3).toDS.withColumnRenamed("value", "c1")
df1: org.apache.spark.sql.DataFrame = [c1: int]

scala> val df2 = List(1,2,3).toDS.withColumnRenamed("value", "c2")
df2: org.apache.spark.sql.DataFrame = [c2: int]

scala> df1.join(df2, usingColumn = "c1")
org.apache.spark.sql.AnalysisException: USING column `c1` can not be resolved with the right join side, the right output is: [c2];
```

## How was this patch tested?

updated tests

Author: Wenchen Fan

Closes #16100 from cloud-fan/natural.

(cherry picked from commit e653484)
Signed-off-by: Herman van Hovell


improve the error message of natural join

9461839

cloud-fan changed the title ~~[SPARK-18674][SQL] improve the error message of natural join~~ [SPARK-18674][SQL] improve the error message of using join Dec 1, 2016

asfgit closed this in e653484 Dec 1, 2016
