Skip to content

Find a safe alternative to LogicalPlan::using_columns() #14118

@jonahgao

Description

@jonahgao

Describe the bug

Background

Select with wildcard over a USING/NATURAL JOIN should deduplicate join columns.
For example:

with t as(select 1 as a, 2 as b) select * from t t1 join t t2 using(a)

This query above should output the column 'a' only once.

LogicalPlan::using_columns() is used to find these join columns and to help exclude duplicated columns when expanding wildcards.

Problem

using_columns() works by traversing the plan tree. This manner might be unsafe as it could incorrectly find columns that are not relevant to the current SQL context. This may lead to some output columns being incorrectly excluded.

For example, the result of the query below is different from other databases.

create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;

To Reproduce

Run query in CLI (compiled from the latest main: 722307f)

> create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;
0 row(s) fetched.
Elapsed 0.008 seconds.

+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.015 seconds.

++
++
++
3 row(s) fetched.
Elapsed 0.013 seconds.

It outputs no columns.

Expected behavior

In PostgreSQL it does output one column.

psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
Type "help" for help.

psql=> create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;
CREATE TABLE
INSERT 0 3
 a
---
 3
 4
 5
(3 rows)

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions