-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I have this csv file:
a,b
x,NaN
x,NaN
x,NaN
With a simple select query, DF says there is only 1 distinct value for column b (which, I think is correct).
> select count(distinct b) from 'nan.csv';
+---------------------------+
| count(DISTINCT nan.csv.b) |
+---------------------------+
| 1 |
+---------------------------+
However, in an aggregate query, DF says there are 3 distinct values:
> select a, count(distinct b) from 'nan.csv' group by 1 order by 1;
+---+---------------------------+
| a | count(DISTINCT nan.csv.b) |
+---+---------------------------+
| x | 3 |
+---+---------------------------+
This behavior seems inconsistent. I would expect the aggregate query to also report that there is one distinct value (in Spark, the behavior is consistent between the two queries).
To Reproduce
No response
Expected behavior
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working