-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2) - scalar subquery in SELECT clause #16712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rrect results ## What changes were proposed in this pull request? This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase. ## How was this patch tested? ./dev/run-tests a new unit test on the problematic pattern.
…rrect results ## What changes were proposed in this pull request? This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase. ## How was this patch tested? ./dev/run-tests a new unit test on the problematic pattern.
|
Attached are a slightly modified version of the submitted test file to adapt to IBM DB2 syntax, and the result of the run. |
|
cc: @kevinyu98, @gatorsmile. Also FYI to @hvanhovell. |
|
Test build #72035 has finished for PR 16712 at commit
|
| t1d NULL | ||
| t1e 10 | ||
| t1e 10 | ||
| t1e 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have compared this result set with the attached DB2's result set, they are equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, @kevinyu98.
| ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), | ||
| ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), | ||
| ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), | ||
| ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasons we use the column names as the value of t3a, t2a and t1a? It looks confusing when reading the queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason. I just followed the convention used in #16337 that you reviewed and merged. Please suggest a pattern if you want to have this changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please change it to something like val3a. It will be easy for reviewers to review the changes if you just change the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this issue in #16337
| (SELECT max(t2h) FROM t2) max_t2h | ||
| FROM t1 | ||
| WHERE t1a = 't1c' | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The style issue. How about following the other test cases? Do not put ; as a separate ilne?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to have ; not as part of the last line is so we can add additions to the query without the need to edit the last line. I will make change to satisfy your comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| ON t2a = t1a | ||
| WHERE t2c = t3c) | ||
| AND t3a = t1a) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also remove this empty line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do.
|
What is the part-2 for scalar subquery test cases? |
|
The part-2 is for scalar subquery in predicates. |
|
Thank you, @gatorsmile, for your time reviewing this test PR. I will wait for your suggestion on the pattern of the literals in the first columns of the tables if you do need to have them changed. Then I will make necessary changes to the files and do another push. |
|
uh, I see. Maybe you can improve the PR title to |
|
Test build #72171 has finished for PR 16712 at commit
|
|
LGTM. cc @hvanhovell for final sign off |
|
LGTM - merging to master. Thanks! |
|
: ) Let me merge it to master. |
…f 2) - scalar subquery in SELECT clause ## What changes were proposed in this pull request? This PR adds new test cases for scalar subquery in SELECT clause. ## How was this patch tested? The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct. Author: Nattavut Sutyanyong <[email protected]> Closes apache#16712 from nsyca/18873.
…ll up to Optimizer phase ## What changes were proposed in this pull request? Currently Analyzer as part of ResolveSubquery, pulls up the correlated predicates to its originating SubqueryExpression. The subquery plan is then transformed to remove the correlated predicates after they are moved up to the outer plan. In this PR, the task of pulling up correlated predicates is deferred to Optimizer. This is the initial work that will allow us to support the form of correlated subqueries that we don't support today. The design document from nsyca can be found in the following link : [DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#) The brief description of code changes (hopefully to aid with code review) can be be found in the following link: [CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#) ## How was this patch tested? The test case PRs were submitted earlier using. [16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802) Author: Dilip Biswal <[email protected]> Closes #16954 from dilipbiswal/SPARK-18874.
What changes were proposed in this pull request?
This PR adds new test cases for scalar subquery in SELECT clause.
How was this patch tested?
The test result is compared with the result run from another SQL engine (in this case is IBM DB2). If the result are equivalent, we assume the result is correct.