Skip to content

FULL OUTER JOIN and LIMIT produces wrong results #14335

@alamb

Description

@alamb

Describe the bug

LIMITs are incorrectly pushed through FULL OUTER Joins

To Reproduce

COPY (values (1), (2), (3), (4), (5))  TO '/tmp/t1.csv' STORED AS CSV;
-- store t2 in different order so the top N rows are not the same as the top N rows of t1
COPY (values (5), (4), (3), (2), (1))  TO '/tmp/t2.csv' STORED AS CSV;

statement ok
create external table t1(a int) stored as CSV location '/tmp/t1.csv';

statement ok
create external table t2(b int) stored as CSV location '/tmp/t2.csv';

-- FULL join produces 5 rows (all have matches)
select * from t1 FULL JOIN t2 ON t1.a = t2.b;
/*
+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+---+---+
*/

-- the output of this query should be two rows from the previous query, there
-- should be no nulls
select * from t1 FULL JOIN t2 ON t1.a = t2.b LIMIT 2;

/*
+------+------+
| a    | b    |
+------+------+
| 1    | NULL |
| NULL | 4    |
+------+------+
*/

Expected behavior

Any two rows from

+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+---+---+

There should be no nulls introduced

Additional context

Found while working with @zhuqi-lucas on:

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions