Skip to content

Some aggregates silently ignore IGNORE NULLS and ORDER BY on arguments #9924

@alamb

Description

@alamb

Describe the bug

DataFusion now supports providing ordering and nulls information to aggregates, but some aggregates ignore the flags silently

For example

select first_value(column1 ORDER BY column2) FROM (values (1,2), (3,4), (-1,0)) ;
+----------------------+
| FIRST_VALUE(column1) |
+----------------------+
| -1                   |
+----------------------+select first_value(column1 ORDER BY column2) IGNORE NULLS FROM (values (1,2), (3,4), (null,0)) ;
+----------------------+
| FIRST_VALUE(column1) |
+----------------------+
| 1                    |
+----------------------+
1 row in set. Query took 0.002 seconds.

To Reproduce

select count(*) from (values (1), (null), (2));
+----------+
| COUNT(*) |
+----------+
| 3        |
+----------+
1 row in set. Query took 0.039 seconds.

❯ select count(*) IGNORE NULLS from (values (1), (null), (2));
+----------+
| COUNT(*) |
+----------+
| 3        |
+----------+
1 row in set. Query took 0.001 seconds.

Also, for ordering

select avg(column1 ORDER BY column2) FROM (values (1,2), (3,4), (null,0)) ;
+--------------+
| AVG(column1) |
+--------------+
| 2.0          |
+--------------+
1 row in set. Query took 0.008 seconds.

Expected behavior

I expect

select count(*) IGNORE NULLS from (values (1), (null), (2));

To error with "IGNORE NULLS is not supported

I also expect

select avg(column1 ORDER BY column2) FROM (values (1,2), (3,4), (null,0)) ;

to error with "ORDER BY" not supported for avg

Additional context

@jayzhan211 has some good ideas at #9920 (comment) about how to make checking this easier / harder to miss

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions