Skip to content

Conversation

@nsyca
Copy link
Contributor

@nsyca nsyca commented Jan 26, 2017

What changes were proposed in this pull request?

This PR adds new test cases for scalar subquery in SELECT clause.

How was this patch tested?

The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct.

nsyca added 16 commits July 29, 2016 17:43
…rrect results

## What changes were proposed in this pull request?

This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.

## How was this patch tested?
./dev/run-tests
a new unit test on the problematic pattern.
…rrect results

## What changes were proposed in this pull request?

This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.

## How was this patch tested?
./dev/run-tests
a new unit test on the problematic pattern.
@nsyca
Copy link
Contributor Author

nsyca commented Jan 26, 2017

Attached are a slightly modified version of the submitted test file to adapt to IBM DB2 syntax, and the result of the run.
Modified version of the test file
Run result from DB2

@nsyca
Copy link
Contributor Author

nsyca commented Jan 26, 2017

cc: @kevinyu98, @gatorsmile. Also FYI to @hvanhovell.

@SparkQA
Copy link

SparkQA commented Jan 26, 2017

Test build #72035 has finished for PR 16712 at commit 48ff3c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

t1d NULL
t1e 10
t1e 10
t1e 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have compared this result set with the attached DB2's result set, they are equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @kevinyu98.

("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null),
("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null),
("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'),
("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasons we use the column names as the value of t3a, t2a and t1a? It looks confusing when reading the queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. I just followed the convention used in #16337 that you reviewed and merged. Please suggest a pattern if you want to have this changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please change it to something like val3a. It will be easy for reviewers to review the changes if you just change the prefix.

Copy link
Member

@gatorsmile gatorsmile Jan 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this issue in #16337

(SELECT max(t2h) FROM t2) max_t2h
FROM t1
WHERE t1a = 't1c'
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style issue. How about following the other test cases? Do not put ; as a separate ilne?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to have ; not as part of the last line is so we can add additions to the query without the need to edit the last line. I will make change to satisfy your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

ON t2a = t1a
WHERE t2c = t3c)
AND t3a = t1a)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove this empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do.

@gatorsmile
Copy link
Member

What is the part-2 for scalar subquery test cases?

@nsyca
Copy link
Contributor Author

nsyca commented Jan 30, 2017

The part-2 is for scalar subquery in predicates.

@nsyca
Copy link
Contributor Author

nsyca commented Jan 30, 2017

Thank you, @gatorsmile, for your time reviewing this test PR. I will wait for your suggestion on the pattern of the literals in the first columns of the tables if you do need to have them changed. Then I will make necessary changes to the files and do another push.

@gatorsmile
Copy link
Member

uh, I see. Maybe you can improve the PR title to

[SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2) - scalar subquery in SELECT clause

@nsyca nsyca changed the title [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2) [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2) - scalar subquery in SELECT clause Jan 30, 2017
@nsyca
Copy link
Contributor Author

nsyca commented Jan 30, 2017

These two commands should give you the delta of the changes I made to address your comments.

0db0bc3

818df9e

@SparkQA
Copy link

SparkQA commented Jan 30, 2017

Test build #72171 has finished for PR 16712 at commit 818df9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM. cc @hvanhovell for final sign off

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@gatorsmile
Copy link
Member

: ) Let me merge it to master.

@asfgit asfgit closed this in 266c1e7 Feb 8, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…f 2) - scalar subquery in SELECT clause

## What changes were proposed in this pull request?
This PR adds new test cases for scalar subquery in SELECT clause.

## How was this patch tested?
The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct.

Author: Nattavut Sutyanyong <[email protected]>

Closes apache#16712 from nsyca/18873.
asfgit pushed a commit that referenced this pull request Mar 14, 2017
…ll up to Optimizer phase

## What changes were proposed in this pull request?
Currently Analyzer as part of ResolveSubquery, pulls up the correlated predicates to its
originating SubqueryExpression. The subquery plan is then transformed to remove the correlated
predicates after they are moved up to the outer plan. In this PR, the task of pulling up
correlated predicates is deferred to Optimizer. This is the initial work that will allow us to
support the form of correlated subqueries that we don't support today. The design document
from nsyca can be found in the following link :
[DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#)

The brief description of code changes (hopefully to aid with code review) can be be found in the
following link:
[CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)

## How was this patch tested?
The test case PRs were submitted earlier using.
[16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)

Author: Dilip Biswal <[email protected]>

Closes #16954 from dilipbiswal/SPARK-18874.
@nsyca nsyca deleted the 18873 branch March 14, 2017 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants