Skip to content

Unexpected results with group by and random() #7876

@Blajda

Description

@Blajda

Describe the bug

I have table t1 with a column called file_path
I want to obtain a list of file_paths where each element is unique and then take a random subset of those columns.
I thought that this could be achieved with the following code.

  let files = ctx.sql("select file_path from t1 group by file_path").await.unwrap()
      .with_column("r", random() ).unwrap()
      .filter(col("r").lt_eq(lit(0.2))).unwrap();
  files.show().await.unwrap();

However in the output of my query I see the following entries which contains a record that should be filtered out.

| A                    | 0.8023022275259943   |
| B                    | 0.05829777789599211  |
| C                    | 0.14330028518553894  |

This is the calculated logical plan

Projection: t1.file_path, random() AS r
    Aggregate: groupBy=[[t1.file_path]], aggr=[[]]
        Filter: random() <= Float64(0.2) 
           TableScan: t1 projection=[file_path]

In this case I would expect the filter to occur after the aggregate operation not before.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions