-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I have table t1 with a column called file_path
I want to obtain a list of file_paths where each element is unique and then take a random subset of those columns.
I thought that this could be achieved with the following code.
let files = ctx.sql("select file_path from t1 group by file_path").await.unwrap()
.with_column("r", random() ).unwrap()
.filter(col("r").lt_eq(lit(0.2))).unwrap();
files.show().await.unwrap();However in the output of my query I see the following entries which contains a record that should be filtered out.
| A | 0.8023022275259943 |
| B | 0.05829777789599211 |
| C | 0.14330028518553894 |
This is the calculated logical plan
Projection: t1.file_path, random() AS r
Aggregate: groupBy=[[t1.file_path]], aggr=[[]]
Filter: random() <= Float64(0.2)
TableScan: t1 projection=[file_path]
In this case I would expect the filter to occur after the aggregate operation not before.
To Reproduce
No response
Expected behavior
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working