[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3rd batch #16841

kevinyu98 · 2017-02-07T19:34:54Z

What changes were proposed in this pull request?

This is the third batch of test cases for IN/NOT IN subqueries. This PR adds these test files:

in-having.sql
in-joins.sql
in-multiple-columns.sql

These are the queries and results from running on DB2.
in-having DB2 version
output of in-having
in-joins DB2 version
output of in-joins
in-multiple-columns DB2 version
output of in-multiple-columns

How was this patch tested?

This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the Spark results are correct.

nsyca · 2017-02-08T14:30:16Z

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-having.sql.out

+struct<t1a:string,max(t1b):smallint>
+-- !query 11 output
+val1a	16
+val1d	10


The results match with the ones from DB2.

nsyca · 2017-02-08T14:32:18Z

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out

+-- !query 13 output
+val1b	8	16	1	10	12
+val1b	8	16	1	8	16
+val1b	8	16	1	NULL	16


The results match with the ones from DB2.

nsyca · 2017-02-08T14:33:10Z

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out

+val1b	8	val1b	8
+val1b	8	val1c	8
+val1c	8	val1b	8
+val1c	8	val1c	8


The results match with the ones from DB2.

gatorsmile · 2017-02-13T21:45:58Z

test this please

gatorsmile · 2017-02-13T21:49:19Z

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out

+-- !query 3 output
+val1a	16	2014-06-04 01:02:00.001
+val1a	16	2014-07-04 01:01:00
+val1a	6	2014-04-04 01:00:00


This does not match the DB2 output.

This is the same issue with the input data. After I updated the data for the DB2 test cases, the results now match. I have uploaded the DB2 files.

gatorsmile · 2017-02-13T21:50:59Z

sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-multiple-columns.sql

+-- It includes correlated cases.
+
+create temporary view t1 as select * from values
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),


The timestamp values do not match what you inserted into DB2.

Hello Sean: Thanks for catching this. The DB2 data is
insert into t1 values ('val1a', 6, 8, 10, float(15.0), 20, 20E2, timestamp('2014-04-04 00:00:00.000'), date('2014-04-04'))
I have verified that this is the only difference in the data. I have updated the DB2 data, rerun all the test files against DB2, and verified that the results are the same. I have uploaded the updated DB2 files.

SparkQA · 2017-02-14T00:21:07Z

Test build #72833 has finished for PR 16841 at commit 850aacd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-02-16T05:22:27Z

ok to test

SparkQA · 2017-02-16T07:59:25Z

Test build #72984 has finished for PR 16841 at commit 850aacd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-02-16T08:00:31Z

LGTM

gatorsmile · 2017-02-16T08:00:57Z

Thanks! Merging to master.

…atch

## What changes were proposed in this pull request?

This is the third batch of test cases for IN/NOT IN subqueries. This PR adds these test files:

`in-having.sql`
`in-joins.sql`
`in-multiple-columns.sql`

These are the queries and results from running on DB2.
[in-having DB2 version](https://github.com/apache/spark/files/772668/in-having.sql.db2.txt)
[output of in-having](https://github.com/apache/spark/files/772670/in-having.sql.db2.out.txt)
[in-joins DB2 version](https://github.com/apache/spark/files/772672/in-joins.sql.db2.txt)
[output of in-joins](https://github.com/apache/spark/files/772673/in-joins.sql.db2.out.txt)
[in-multiple-columns DB2 version](https://github.com/apache/spark/files/772678/in-multiple-columns.sql.db2.txt)
[output of in-multiple-columns](https://github.com/apache/spark/files/772680/in-multiple-columns.sql.db2.out.txt)

## How was this patch tested?

This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the Spark results are correct.

Author: Kevin Yu <[email protected]>

Closes apache#16841 from kevinyu98/spark-18871-33.

robbinspg · 2017-02-22T16:19:35Z

@kevinyu98 Several of the new tests fail on big endian platforms. It appears that rows are returned in a slightly different order, but the output is still correct for the query. For example, in-joins query 4:

-- !query 4
SELECT Count(DISTINCT(t1a)),
       t1b,
       t3a,
       t3b,
       t3c
FROM   t1 natural left JOIN t3
WHERE  t1a IN
       (
              SELECT t2a
              FROM   t2
              WHERE  t1d = t2d)
AND    t1b > t3b
GROUP  BY t1a,
          t1b,
          t3a,
          t3b,
          t3c
ORDER  BY t1a DESC

on little endian returns:
1 10 val3b 8 NULL
1 10 val1b 8 16
1 10 val3a 6 12
1 8 val3a 6 12
1 8 val3a 6 12

whereas on big endian it returns:
1 10 val3a 6 12
1 10 val3b 8 NULL
1 10 val1b 8 16
1 8 val3a 6 12
1 8 val3a 6 12

I believe GROUP BY does not define any ordering, so both of these outputs are valid for the query: the ORDER BY is only on t1a, which leaves the order of rows with the same t1a undefined. The big endian output does not match your expected output, however, so the test fails.

I'm trying to determine why the execution on big endian returns the rows in a different order.
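
The observation that both outputs hold the same rows and differ only in order can be checked with an order-insensitive comparison. The following is a minimal sketch in Python using the two result sets quoted above. The variable and function names are hypothetical, `None` stands in for SQL NULL, and this is not part of Spark's test harness:

```python
from collections import Counter

# Rows quoted above for in-joins query 4 (None stands in for SQL NULL).
little_endian = [
    (1, 10, "val3b", 8, None),
    (1, 10, "val1b", 8, 16),
    (1, 10, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
]
big_endian = [
    (1, 10, "val3a", 6, 12),
    (1, 10, "val3b", 8, None),
    (1, 10, "val1b", 8, 16),
    (1, 8, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
]


def same_rows_ignoring_order(a, b):
    """Compare two result sets as multisets, so duplicate rows are counted."""
    return Counter(a) == Counter(b)


# Order-sensitive comparison, which is what a line-by-line diff of output files does.
print(little_endian == big_endian)
# Multiset comparison, which ignores row order but still counts duplicates.
print(same_rows_ignoring_order(little_endian, big_endian))
```

A multiset is used rather than a set because the result contains the row `1 8 val3a 6 12` twice, and a set would hide a missing duplicate.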

kevinyu98 · 2017-02-23T05:48:33Z

Hello Pete: Thanks for running the test cases. Can you send me the failing test case file? I can also provide new test files with their output files; can you help test them on your platform? Thanks.

gatorsmile · 2017-02-23T05:53:18Z

To make the results consistent between big endian and little endian, we can improve the queries by adding extra ORDER BY clauses.

@robbinspg Which queries failed? @kevinyu98 Can you collect the failed cases and submit another PR for resolving the issues? Thanks!
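
Why extra ordering columns help can be sketched outside SQL: once rows are sorted on every output column, no two distinct rows tie, so the order no longer depends on how the engine happened to produce them. The Python sketch below uses the rows quoted earlier in this thread. The names are hypothetical, placing NULLs first is an assumption of the sketch, and the columns the follow-up PR actually adds to each ORDER BY are not stated here:

```python
# The two row orders reported for in-joins query 4 (None stands in for SQL NULL).
run_a = [
    (1, 10, "val3b", 8, None),
    (1, 10, "val1b", 8, 16),
    (1, 10, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
]
run_b = [
    (1, 10, "val3a", 6, 12),
    (1, 10, "val3b", 8, None),
    (1, 10, "val1b", 8, 16),
    (1, 8, "val3a", 6, 12),
    (1, 8, "val3a", 6, 12),
]


def total_order_key(row):
    """Sort key over every column; the leading flag makes None sort first
    and avoids comparing None with an int."""
    return tuple((value is not None, value) for value in row)


ordered_a = sorted(run_a, key=total_order_key)
ordered_b = sorted(run_b, key=total_order_key)

# With every column acting as a tie-breaker, both runs produce the same sequence.
print(ordered_a == ordered_b)
for row in ordered_a:
    print(row)
```

Identical duplicate rows may still swap places with each other, but that has no visible effect on the output.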

kevinyu98 · 2017-02-23T06:04:20Z

@gatorsmile sure, I will do that. Thanks.

robbinspg · 2017-02-23T09:22:54Z

OK I'll raise a separate Jira, document the differences and submit a PR

https://issues.apache.org/jira/browse/SPARK-19710
PR #17039

…ll up to Optimizer phase

## What changes were proposed in this pull request?

Currently the Analyzer, as part of ResolveSubquery, pulls up the correlated predicates to the originating SubqueryExpression. The subquery plan is then transformed to remove the correlated predicates after they are moved up to the outer plan. In this PR, the task of pulling up correlated predicates is deferred to the Optimizer. This is the initial work that will allow us to support the forms of correlated subqueries that we don't support today. The design document from nsyca can be found at the following link: [DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#)

A brief description of the code changes (hopefully to aid with code review) can be found at the following link: [CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)

## How was this patch tested?

The test case PRs were submitted earlier in: [16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)

Author: Dilip Biswal <[email protected]>

Closes #16954 from dilipbiswal/SPARK-18874.

kevinyu98 added 30 commits April 20, 2016 11:06

adding testcase

3b44c59

Merge remote-tracking branch 'upstream/master'

18b4a31

Merge remote-tracking branch 'upstream/master'

4f4d1c8

get latest code from upstream

Merge remote-tracking branch 'upstream/master'

f5f0cbe

adding trim characters support

Merge remote-tracking branch 'upstream/master'

d8b2edb

get latest code for pr12646

Merge remote-tracking branch 'upstream/master'

196b6c6

merge latest code

Merge remote-tracking branch 'upstream/master'

f37a01e

merge upstream/master

Merge remote-tracking branch 'upstream/master'

bb5b01f

Merge remote-tracking branch 'upstream/master'

bde5820

Merge remote-tracking branch 'upstream/master'

5f7cd96

Merge remote-tracking branch 'upstream/master'

893a49a

Merge remote-tracking branch 'upstream/master'

4bbe1fd

Merge remote-tracking branch 'upstream/master'

b2dd795

Merge remote-tracking branch 'upstream/master'

8c3e5da

Merge remote-tracking branch 'upstream/master'

a0eaa40

Merge remote-tracking branch 'upstream/master'

d03c940

Merge remote-tracking branch 'upstream/master'

d728d5e

Merge remote-tracking branch 'upstream/master'

ea104dd

Merge remote-tracking branch 'upstream/master'

6ab1215

Merge remote-tracking branch 'upstream/master'

0c56653

Merge remote-tracking branch 'upstream/master'

d7a1874

Merge remote-tracking branch 'upstream/master'

85d3500

Merge remote-tracking branch 'upstream/master'

c056f91

Merge remote-tracking branch 'upstream/master'

0b8189d

Merge remote-tracking branch 'upstream/master'

c2ea31d

Merge remote-tracking branch 'upstream/master'

a2d3056

Merge remote-tracking branch 'upstream/master'

39e5648

Merge remote-tracking branch 'upstream/master'

b9370a3

Merge remote-tracking branch 'upstream/master'

01224a4

Merge remote-tracking branch 'upstream/master'

d05d39a

kevinyu98 added 10 commits November 4, 2016 10:23

Merge remote-tracking branch 'upstream/master'

5525dff

Merge remote-tracking branch 'upstream/master'

63715e4

Merge remote-tracking branch 'upstream/master'

a084410

Merge remote-tracking branch 'upstream/master'

b6e5b97

Merge remote-tracking branch 'upstream/master'

bdd5423

Merge remote-tracking branch 'upstream/master'

6638336

Merge remote-tracking branch 'upstream/master'

f89863e

Merge remote-tracking branch 'upstream/master'

b48993c

Merge remote-tracking branch 'upstream/master'

9324514

add IN subquery test case 3ird batch

850aacd

nsyca reviewed Feb 8, 2017

gatorsmile reviewed Feb 13, 2017

kevinyu98 changed the title ~~[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3ird batch~~ [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3rd batch Feb 13, 2017

asfgit closed this in 3871d94 Feb 16, 2017

dilipbiswal mentioned this pull request Feb 16, 2017

[SPARK-18874][SQL] First phase: Deferring the correlated predicate pull up to Optimizer phase #16954

Closed
