Skip to content

Non-aggregation expressions should be included in the grouping keys #11903

@goldmedal

Description

@goldmedal

Describe the bug

I noticed that the behavior of some tests is unusual.

SELECT r.sn, r.amount, SUM(r.amount)
FROM (SELECT *
FROM sales_global_with_pk as l
LEFT JOIN sales_global_with_pk as r
ON l.amount >= r.amount + 10)
GROUP BY r.sn
ORDER BY r.sn

Typically, non-aggregation expressions should appear in the GROUP BY clause. Other databases do not allow this behavior.

DuckDB

D CREATE TABLE sales_global_with_pk (zip_code INT,
            country VARCHAR(3),
            sn INT,
            ts TIMESTAMP,
            currency VARCHAR(3),
            amount FLOAT,
            primary key(sn)
          );
D SELECT r.sn, r.amount, SUM(r.amount)
    FROM (SELECT *
      FROM sales_global_with_pk as l
      LEFT JOIN sales_global_with_pk as r
      ON l.amount >= r.amount + 10) r
    GROUP BY r.sn
  ORDER BY r.sn;
Binder Error: column "amount" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(amount)" if the exact value of "amount" is not important.
LINE 1: SELECT r.sn, r.amount, SUM(r.amount)
                     ^

Postgres

test= SELECT x.sn, x.amount, SUM(x.amount)                
  FROM (SELECT l.sn, l.amount
    FROM sales_global_with_pk as l
    LEFT JOIN sales_global_with_pk as r
    ON l.amount >= r.amount + 10) x
  GROUP BY x.sn  
ORDER BY x.sn;
ERROR:  column "x.amount" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT x.sn, x.amount, SUM(x.amount)

I modified the SQL to fix the scoping issue. However, both DuckDB and Postgres require that the dimensions (sn and amount) appear in the GROUP BY clause. In my experience, most databases follow similar behavior.

To Reproduce

As the above.

Expected behavior

This case should fail:

SELECT r.sn, r.amount, SUM(r.amount)
    FROM (SELECT *
      FROM sales_global_with_pk as l
      LEFT JOIN sales_global_with_pk as r
      ON l.amount >= r.amount + 10)
    GROUP BY r.sn
  ORDER BY r.sn;

We should provide all non-aggregation expressions in the group-by clause

SELECT r.sn, r.amount, SUM(r.amount)
    FROM (SELECT *
      FROM sales_global_with_pk as l
      LEFT JOIN sales_global_with_pk as r
      ON l.amount >= r.amount + 10)
    GROUP BY r.sn, r.amount
  ORDER BY r.sn;

Additional context

I'm working on #11681 now. I guess it can also fix this issue partially. However, another case as below won't be fixed.

SELECT r.sn, r.amount, SUM(r.amount)
  FROM (SELECT r.sn, r.amount
    FROM sales_global_with_pk as l
    LEFT JOIN sales_global_with_pk as r
    ON l.amount >= r.amount + 10)
  GROUP BY sn
ORDER BY r.sn

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions