[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3rd batch #16841
get latest code from upstream
adding trim characters support
get latest code for pr12646
merge latest code
merge upstream/master
| struct<t1a:string,max(t1b):smallint> | ||
| -- !query 11 output | ||
| val1a 16 | ||
| val1d 10 | 
The results match with the ones from DB2.
| -- !query 13 output | ||
| val1b 8 16 1 10 12 | ||
| val1b 8 16 1 8 16 | ||
| val1b 8 16 1 NULL 16 | 
The results match with the ones from DB2.
| val1b 8 val1b 8 | ||
| val1b 8 val1c 8 | ||
| val1c 8 val1b 8 | ||
| val1c 8 val1c 8 | 
The results match with the ones from DB2.
| test this please | 
| -- !query 3 output | ||
| val1a 16 2014-06-04 01:02:00.001 | ||
| val1a 16 2014-07-04 01:01:00 | ||
| val1a 6 2014-04-04 01:00:00 | 
This does not match the DB2 output.
This is the same input-data issue. After I updated the data for the DB2 test cases, the results are now the same. I have uploaded the DB2 files.
| -- It includes correlated cases. | ||
|  | ||
| create temporary view t1 as select * from values | ||
| ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), | 
The timestamp values do not match what you inserted into DB2.
Hello Sean: Thanks for catching this. The DB2 data is
insert into t1 values ('val1a', 6, 8, 10, float(15.0), 20, 20E2, timestamp('2014-04-04 00:00:00.000'), date('2014-04-04'))
I have verified that this is the only difference in the data. I have updated the DB2 data, rerun all the test files against DB2, and verified that the results are the same. I have uploaded the updated DB2 files.
Thanks!
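For readers following this thread, a mismatch like the one above can be located by comparing the two input rows column by column. The following is only a sketch: the helper name `diff_columns` is hypothetical, and the SQL literals from the Spark test file and the DB2 insert statement are simplified to plain Python values.

```python
def diff_columns(row_a, row_b):
    """Return (index, value_a, value_b) for every column where the rows differ."""
    if len(row_a) != len(row_b):
        raise ValueError("rows have different numbers of columns")
    return [(i, a, b) for i, (a, b) in enumerate(zip(row_a, row_b)) if a != b]


# First row of t1 as written in the Spark test file (literals simplified).
spark_row = ("val1a", 6, 8, 10, 15.0, 20.0, 2000.0,
             "2014-04-04 01:00:00.000", "2014-04-04")

# The same row as originally inserted into DB2, per the comment above.
db2_row = ("val1a", 6, 8, 10, 15.0, 20.0, 2000.0,
           "2014-04-04 00:00:00.000", "2014-04-04")

# Only the timestamp column (index 7) differs between the two rows.
mismatches = diff_columns(spark_row, db2_row)
```

Running the same comparison over every row of the input would confirm whether the timestamp is the only difference.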
| Test build #72833 has finished for PR 16841 at commit  
 | 
| ok to test | 
| Test build #72984 has finished for PR 16841 at commit  
 | 
| LGTM | 
| Thanks! Merging to master. | 
…atch
## What changes were proposed in this pull request?
This is the third batch of test cases for IN/NOT IN subqueries. This PR adds these test files:
`in-having.sql`
`in-joins.sql`
`in-multiple-columns.sql`
These are the queries and results from running on DB2.
[in-having DB2 version](https://github.com/apache/spark/files/772668/in-having.sql.db2.txt)
[output of in-having](https://github.com/apache/spark/files/772670/in-having.sql.db2.out.txt)
[in-joins DB2 version](https://github.com/apache/spark/files/772672/in-joins.sql.db2.txt)
[output of in-joins](https://github.com/apache/spark/files/772673/in-joins.sql.db2.out.txt)
[in-multiple-columns DB2 version](https://github.com/apache/spark/files/772678/in-multiple-columns.sql.db2.txt)
[output of in-multiple-columns](https://github.com/apache/spark/files/772680/in-multiple-columns.sql.db2.out.txt)
## How was this patch tested?
This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the results are correct.
Author: Kevin Yu <[email protected]>
Closes apache#16841 from kevinyu98/spark-18871-33.
| @kevinyu98 Several of the new tests fail on big-endian platforms. It appears that rows are returned in a slightly different order but are still correct output for the query. For example in-joins query 4: -- !query 4 on little endian returns whereas on big endian returns: I believe GROUP BY does not define any ordering, so both of these outputs are valid for the query, as the ORDER BY is only on t1a. However, the big-endian output does not match your expected output, so the test fails. I'm trying to determine why execution on big endian returns the rows in a different order. | 
| Hello Pete: Thanks for running the test cases. Can you send me the failing test case file? I can also provide new test files with their output files; can you help test them on your platforms? Thanks. | 
| To make the results consistent between big endian and little endian, we can add extra ORDER BY clauses to the queries. @robbinspg Which queries failed? @kevinyu98 Can you collect the failed cases and submit another PR to resolve the issues? Thanks! | 
| @gatorsmile sure, I will do that. Thanks. | 
| OK I'll raise a separate Jira, document the differences and submit a PR | 
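The ordering point in this exchange can be illustrated with a small sketch. It runs on Python's built-in `sqlite3` rather than Spark, and the table `t1`, its two columns, and its rows are made up for illustration; it is not one of the failing queries. The assumption being demonstrated is only the general SQL rule: when ORDER BY covers just `t1a`, rows that tie on `t1a` may come back in any order, whereas ordering by all the grouping columns pins the order down.

```python
import sqlite3

# Hypothetical rows, loosely shaped like the test data; not the PR's actual input.
ROWS = [("val1b", 8), ("val1a", 16), ("val1a", 6), ("val1c", 8), ("val1a", 16)]


def run(query, rows):
    """Load rows into an in-memory table t1 and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (t1a TEXT, t1b INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# ORDER BY on t1a only: the relative order of the two val1a groups is unspecified.
PARTIAL = "SELECT t1a, t1b, count(*) FROM t1 GROUP BY t1a, t1b ORDER BY t1a"

# ORDER BY on every grouping column: each output row has a unique sort key.
TOTAL = "SELECT t1a, t1b, count(*) FROM t1 GROUP BY t1a, t1b ORDER BY t1a, t1b"

partial = run(PARTIAL, ROWS)
total = run(TOTAL, ROWS)
total_reversed_input = run(TOTAL, list(reversed(ROWS)))
```

With the fully specified ORDER BY, the result is the same regardless of the order in which the input rows were loaded, which is the property a golden-file test needs. The partially ordered query returns the same set of rows, but its exact sequence depends on the engine and platform.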
…ll up to Optimizer phase
## What changes were proposed in this pull request?
Currently the Analyzer, as part of ResolveSubquery, pulls up the correlated predicates to their originating SubqueryExpression. The subquery plan is then transformed to remove the correlated predicates after they are moved up to the outer plan. In this PR, the task of pulling up correlated predicates is deferred to the Optimizer. This is the initial work that will allow us to support forms of correlated subqueries that we don't support today. The design document from nsyca can be found at the following link: [DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#) A brief description of the code changes (hopefully to aid code review) can be found at the following link: [CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)
## How was this patch tested?
The test case PRs were submitted earlier: [16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)
Author: Dilip Biswal <[email protected]>
Closes #16954 from dilipbiswal/SPARK-18874.
What changes were proposed in this pull request?
This is the third batch of test cases for IN/NOT IN subqueries. This PR adds these test files:
in-having.sql
in-joins.sql
in-multiple-columns.sql
These are the queries and results from running on DB2.
in-having DB2 version
output of in-having
in-joins DB2 version
output of in-joins
in-multiple-columns DB2 version
output of in-multiple-columns
How was this patch tested?
This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the results are correct.
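The comparison described above can be sketched as follows. This is not the tooling used for this PR (the comparison may well have been done by hand against the uploaded DB2 output files); the function names and sample rows are hypothetical. The sketch separates two notions of "the same": the same rows regardless of order, and the same rows in the same order.

```python
from collections import Counter


def same_rows(actual, expected):
    """True when both results hold the same rows with the same multiplicities, ignoring order."""
    return Counter(actual) == Counter(expected)


def same_rows_in_order(actual, expected):
    """True only when the rows also appear in the same sequence."""
    return list(actual) == list(expected)


# Hypothetical result sets, shaped like the output excerpts in this PR.
spark_result = [("val1b", 8, "val1b", 8), ("val1b", 8, "val1c", 8),
                ("val1c", 8, "val1b", 8), ("val1c", 8, "val1c", 8)]
db2_result = [("val1c", 8, "val1c", 8), ("val1b", 8, "val1b", 8),
              ("val1c", 8, "val1b", 8), ("val1b", 8, "val1c", 8)]

order_insensitive_match = same_rows(spark_result, db2_result)
order_sensitive_match = same_rows_in_order(spark_result, db2_result)
```

An order-insensitive check is enough to validate correctness against another RDBMS when the query has no total ORDER BY; an expected-output file compared line by line behaves like the order-sensitive check.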