Grouping by column position #110

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10374

It would be great to support grouping by column position (the 1-based index of an item in the select list) in addition to grouping by the exact expression. For example:

SELECT state, COUNT(*) FROM customers GROUP BY 1

As a more concrete case, take a query like

> select database_name, storage, sum(estimated_bytes) from chunks group by database_name, storage;
+-----------------------------------+---------------------+----------------------+
| database_name                     | storage             | SUM(estimated_bytes) |
+-----------------------------------+---------------------+----------------------+
| 844910ece80be8bc_cac95fa59126cd01 | OpenMutableBuffer   | 109737               |
| 844910ece80be8bc_05d1e95653672000 | OpenMutableBuffer   | 2337719              |
| 844910ece80be8bc_7be09b71c487d5d3 | ClosedMutableBuffer | 799682176            |
+-----------------------------------+---------------------+----------------------+

The same query could be written with numbers in the GROUP BY clause that refer to the positions of items in the select list.

However, this does not work today in DataFusion:

> select database_name, storage, sum(estimated_bytes) from chunks group by 1, 2;
Plan("Projection references non-aggregate values")
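One possible way to support this is to rewrite positional GROUP BY items into the select-list expressions they point at before the aggregate is planned. The sketch below only illustrates that idea and is not DataFusion code: `GroupByItem` and `resolve_group_by_positions` are hypothetical names, and expressions are kept as plain strings rather than real expression types. It also does not check whether a position points at an aggregate such as `SUM(estimated_bytes)`, which a real implementation would presumably need to reject.

```rust
/// A GROUP BY item as written in the query (hypothetical type for this sketch):
/// either a 1-based position into the select list or an ordinary expression,
/// kept as text here for simplicity.
#[derive(Debug, Clone, PartialEq)]
enum GroupByItem {
    Position(usize),
    Expr(String),
}

/// Replace each positional item with the select-list expression it refers to.
/// Positions are 1-based; anything outside 1..=select_exprs.len() is an error.
fn resolve_group_by_positions(
    select_exprs: &[String],
    group_by: &[GroupByItem],
) -> Result<Vec<String>, String> {
    group_by
        .iter()
        .map(|item| match item {
            // In-range position: substitute the matching select expression.
            GroupByItem::Position(n) if *n >= 1 && *n <= select_exprs.len() => {
                Ok(select_exprs[*n - 1].clone())
            }
            // Out-of-range position (including 0): report it to the user.
            GroupByItem::Position(n) => Err(format!(
                "GROUP BY position {} is not in select list (valid range 1..={})",
                n,
                select_exprs.len()
            )),
            // Ordinary expressions pass through unchanged.
            GroupByItem::Expr(e) => Ok(e.clone()),
        })
        .collect()
}

fn main() {
    // Select list of the failing query from this issue.
    let select_exprs = vec![
        "database_name".to_string(),
        "storage".to_string(),
        "SUM(estimated_bytes)".to_string(),
    ];
    // GROUP BY 1, 2
    let group_by = vec![GroupByItem::Position(1), GroupByItem::Position(2)];

    let resolved = resolve_group_by_positions(&select_exprs, &group_by).unwrap();
    println!("{:?}", resolved);
}
```

With this rewrite, `group by 1, 2` would reach the planner as `group by database_name, storage`, which is the form that already works.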
