@kevinyu98
Contributor

@kevinyu98 kevinyu98 commented Feb 7, 2017

What changes were proposed in this pull request?

This is the 3rd batch of test cases for IN/NOT IN subqueries. This PR adds these test files:

in-having.sql
in-joins.sql
in-multiple-columns.sql

These are the queries and results from running on DB2.
in-having DB2 version
output of in-having
in-joins DB2 version
output of in-joins
in-multiple-columns DB2 version
output of in-multiple-columns

How was this patch tested?

This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the result is correct.
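A minimal sketch of the comparison described above, assuming both engines' outputs are available as plain text with one row per line. The helper name `diff_results` is hypothetical and this is not how the Spark test harness is implemented; the sample rows are the two rows of query 11 that were reported as matching DB2.

```python
import difflib


def diff_results(spark_out, reference_out):
    """Return unified-diff lines between two result dumps (empty list if identical)."""
    spark_lines = [line.strip() for line in spark_out.strip().splitlines()]
    ref_lines = [line.strip() for line in reference_out.strip().splitlines()]
    return list(
        difflib.unified_diff(
            ref_lines, spark_lines, fromfile="reference", tofile="spark", lineterm=""
        )
    )


# Rows taken from the query 11 output quoted in this thread.
spark_out = "val1a 16\nval1d 10"
db2_out = "val1a 16\nval1d 10"

print(diff_results(spark_out, db2_out))  # [] because the dumps are identical
```

Note that this comparison is order-sensitive: the same rows in a different order produce a non-empty diff.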

get latest code from upstream
adding trim characters support
struct<t1a:string,max(t1b):smallint>
-- !query 11 output
val1a 16
val1d 10
Contributor

The results match with the ones from DB2.

-- !query 13 output
val1b 8 16 1 10 12
val1b 8 16 1 8 16
val1b 8 16 1 NULL 16
Contributor

The results match with the ones from DB2.

val1b 8 val1b 8
val1b 8 val1c 8
val1c 8 val1b 8
val1c 8 val1c 8
Contributor

The results match with the ones from DB2.

@gatorsmile
Member

test this please

-- !query 3 output
val1a 16 2014-06-04 01:02:00.001
val1a 16 2014-07-04 01:01:00
val1a 6 2014-04-04 01:00:00
Member

This does not match the DB2 output.

Contributor Author

This is the same input data issue. After I updated the data for the DB2 test cases, the results are now the same. I have uploaded the DB2 files.

-- It includes correlated cases.

create temporary view t1 as select * from values
("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'),
Member

The timestamp values do not match what you inserted into DB2.

Contributor Author

Hello Sean: Thanks for catching this. The DB2 data is
insert into t1 values ('val1a', 6, 8, 10, float(15.0), 20, 20E2, timestamp('2014-04-04 00:00:00.000'), date('2014-04-04'))..
I have verified that this is the only difference in the data. I have updated the DB2 data, rerun all the test files against DB2, and verified that the results are the same. I have uploaded the updated DB2 files.
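The single-field difference between the two seed rows can be checked mechanically. The sketch below is illustrative only: it transcribes the Spark and DB2 literals quoted in this thread into plain Python values (typed suffixes such as `6S` and `10L` are dropped), and the helper name `differing_positions` is hypothetical.

```python
# First t1 row as written in the Spark test file.
spark_row = ("val1a", 6, 8, 10, 15.0, 20.0, 20e2, "2014-04-04 01:00:00.000", "2014-04-04")
# The same row as originally inserted into DB2.
db2_row = ("val1a", 6, 8, 10, 15.0, 20.0, 20e2, "2014-04-04 00:00:00.000", "2014-04-04")


def differing_positions(a, b):
    """Return the 0-based positions at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of fields")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_positions(spark_row, db2_row))  # [7], the timestamp field
```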

Member

Thanks!

@kevinyu98 changed the title from "[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3ird batch" to "[SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN subquery 3rd batch" on Feb 13, 2017
@SparkQA

SparkQA commented Feb 14, 2017

Test build #72833 has finished for PR 16841 at commit 850aacd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Feb 16, 2017

Test build #72984 has finished for PR 16841 at commit 850aacd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 3871d94 Feb 16, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 16, 2017
…atch

## What changes were proposed in this pull request?

This is the 3rd batch of test cases for IN/NOT IN subqueries. This PR adds these test files:

`in-having.sql`
`in-joins.sql`
`in-multiple-columns.sql`

These are the queries and results from running on DB2.
[in-having DB2 version](https://github.com/apache/spark/files/772668/in-having.sql.db2.txt)
[output of in-having](https://github.com/apache/spark/files/772670/in-having.sql.db2.out.txt)
[in-joins DB2 version](https://github.com/apache/spark/files/772672/in-joins.sql.db2.txt)
[output of in-joins](https://github.com/apache/spark/files/772673/in-joins.sql.db2.out.txt)
[in-multiple-columns DB2 version](https://github.com/apache/spark/files/772678/in-multiple-columns.sql.db2.txt)
[output of in-multiple-columns](https://github.com/apache/spark/files/772680/in-multiple-columns.sql.db2.out.txt)

## How was this patch tested?
This PR adds new test cases. We compare the results from Spark with the results from another RDBMS (we used DB2 LUW). If the results are the same, we assume the result is correct.

Author: Kevin Yu

Closes apache#16841 from kevinyu98/spark-18871-33.
@robbinspg
Member

@kevinyu98 Several of the new tests fail on Big Endian platforms. It appears that rows are returned in a slightly different order but are still a correct output from the query. For example in-joins query 4:

-- !query 4
SELECT Count(DISTINCT(t1a)),
       t1b,
       t3a,
       t3b,
       t3c
FROM   t1 natural left JOIN t3
WHERE  t1a IN
       (
              SELECT t2a
              FROM   t2
              WHERE  t1d = t2d)
AND    t1b > t3b
GROUP  BY t1a,
          t1b,
          t3a,
          t3b,
          t3c
ORDER  BY t1a DESC

On little endian this returns:
1 10 val3b 8 NULL
1 10 val1b 8 16
1 10 val3a 6 12
1 8 val3a 6 12
1 8 val3a 6 12

whereas on big endian it returns:
1 10 val3a 6 12
1 10 val3b 8 NULL
1 10 val1b 8 16
1 8 val3a 6 12
1 8 val3a 6 12

I believe GROUP BY does not define any ordering, and the ORDER BY is only on t1a, so both of these outputs are valid for the query. However, the big endian output does not match your expected output, so the test fails.

I'm trying to determine why the execution on big endian returns the rows in a different order.
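As a quick check of the claim that the two outputs contain the same rows and differ only in order, the sketch below compares the little endian and big endian results quoted above, first as multisets and then as ordered lists. The values are kept as strings exactly as printed; the variable names are illustrative.

```python
from collections import Counter

little_endian = [
    ("1", "10", "val3b", "8", "NULL"),
    ("1", "10", "val1b", "8", "16"),
    ("1", "10", "val3a", "6", "12"),
    ("1", "8", "val3a", "6", "12"),
    ("1", "8", "val3a", "6", "12"),
]
big_endian = [
    ("1", "10", "val3a", "6", "12"),
    ("1", "10", "val3b", "8", "NULL"),
    ("1", "10", "val1b", "8", "16"),
    ("1", "8", "val3a", "6", "12"),
    ("1", "8", "val3a", "6", "12"),
]

# Counter keeps duplicate rows, so this is a multiset comparison.
same_rows = Counter(little_endian) == Counter(big_endian)
# Plain list equality also requires the same row order.
same_order = little_endian == big_endian

print(same_rows, same_order)  # True False
```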

@kevinyu98
Contributor Author

Hello Pete: Thanks for running the test case. Can you send the failing test case file to me? I can also provide new test files with the output files. Can you help test them on your platforms? Thanks.

@gatorsmile
Member

To make the results consistent between big endian and little endian, we can add extra ORDER BY clauses to the queries.

@robbinspg Which queries failed? @kevinyu98 Can you collect the failed cases and submit another PR for resolving the issues? Thanks!
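A small illustration of why extra ORDER BY columns make the output deterministic. This is only a sketch: it uses SQLite through Python's `sqlite3` module rather than Spark, the table `r` and its four rows are a hypothetical simplification loosely based on the in-joins query 4 output, and it is not necessarily the change made in the follow-up PR. Sorting on every output column means ties on the leading key no longer depend on the order in which rows arrive.

```python
import sqlite3

# Hypothetical rows (t1b, t3a, t3b, t3c); None stands for SQL NULL.
ROWS = [
    (10, "val3b", 8, None),
    (10, "val1b", 8, 16),
    (10, "val3a", 6, 12),
    (8, "val3a", 6, 12),
]

# Every selected column appears in ORDER BY, so no ties are left unresolved.
FULL_ORDER = "SELECT t1b, t3a, t3b, t3c FROM r ORDER BY t1b DESC, t3a, t3b, t3c"


def run(query, rows):
    """Load rows into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (t1b INTEGER, t3a TEXT, t3b INTEGER, t3c INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


forward = run(FULL_ORDER, ROWS)
backward = run(FULL_ORDER, list(reversed(ROWS)))

print(forward == backward)  # True: insertion order no longer matters
```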

@kevinyu98
Contributor Author

@gatorsmile sure, I will do that. Thanks.

@robbinspg
Member

robbinspg commented Feb 23, 2017

OK I'll raise a separate Jira, document the differences and submit a PR

https://issues.apache.org/jira/browse/SPARK-19710
PR #17039

asfgit pushed a commit that referenced this pull request Mar 14, 2017
…ll up to Optimizer phase

## What changes were proposed in this pull request?
Currently the Analyzer, as part of ResolveSubquery, pulls up the correlated predicates to its
originating SubqueryExpression. The subquery plan is then transformed to remove the correlated
predicates after they are moved up to the outer plan. In this PR, the task of pulling up
correlated predicates is deferred to Optimizer. This is the initial work that will allow us to
support the form of correlated subqueries that we don't support today. The design document
from nsyca can be found in the following link :
[DesignDoc](https://docs.google.com/document/d/1QDZ8JwU63RwGFS6KVF54Rjj9ZJyK33d49ZWbjFBaIgU/edit#)

The brief description of code changes (hopefully to aid with code review) can be found in the
following link:
[CodeChanges](https://docs.google.com/document/d/18mqjhL9V1An-tNta7aVE13HkALRZ5GZ24AATA-Vqqf0/edit#)

## How was this patch tested?
The test case PRs were submitted earlier:
[16337](#16337) [16759](#16759) [16841](#16841) [16915](#16915) [16798](#16798) [16712](#16712) [16710](#16710) [16760](#16760) [16802](#16802)

Author: Dilip Biswal

Closes #16954 from dilipbiswal/SPARK-18874.