-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I noticed that the behavior of some tests is unusual.
datafusion/datafusion/sqllogictest/test_files/group_by.slt
Lines 3481 to 3487 in b5d7931
| SELECT r.sn, r.amount, SUM(r.amount) | |
| FROM (SELECT * | |
| FROM sales_global_with_pk as l | |
| LEFT JOIN sales_global_with_pk as r | |
| ON l.amount >= r.amount + 10) | |
| GROUP BY r.sn | |
| ORDER BY r.sn |
Typically, non-aggregation expressions should appear in the GROUP BY clause. Other databases do not allow this behavior.
DuckDB
D CREATE TABLE sales_global_with_pk (zip_code INT,
country VARCHAR(3),
sn INT,
ts TIMESTAMP,
currency VARCHAR(3),
amount FLOAT,
primary key(sn)
);
D SELECT r.sn, r.amount, SUM(r.amount)
FROM (SELECT *
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10) r
GROUP BY r.sn
ORDER BY r.sn;
Binder Error: column "amount" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(amount)" if the exact value of "amount" is not important.
LINE 1: SELECT r.sn, r.amount, SUM(r.amount)
^Postgres
test= SELECT x.sn, x.amount, SUM(x.amount)
FROM (SELECT l.sn, l.amount
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10) x
GROUP BY x.sn
ORDER BY x.sn;
ERROR: column "x.amount" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT x.sn, x.amount, SUM(x.amount)I modified the SQL to fix the scoping issue. However, both DuckDB and Postgres require that the dimensions (sn and amount) appear in the GROUP BY clause. In my experience, most databases follow similar behavior.
To Reproduce
As the above.
Expected behavior
This case should fail:
SELECT r.sn, r.amount, SUM(r.amount)
FROM (SELECT *
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10)
GROUP BY r.sn
ORDER BY r.sn;We should provide all non-aggregation expressions in the group-by clause
SELECT r.sn, r.amount, SUM(r.amount)
FROM (SELECT *
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10)
GROUP BY r.sn, r.amount
ORDER BY r.sn;Additional context
I'm working on #11681 now. I guess it can also fix this issue partially. However, another case as below won't be fixed.
SELECT r.sn, r.amount, SUM(r.amount)
FROM (SELECT r.sn, r.amount
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10)
GROUP BY sn
ORDER BY r.snMetadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working