Miscellaneous ntile function bugs, possible incorrect results #8284

@msirek

Describe the bug

While reviewing #8270 I found some bugs with the ntile window function.

  1. Possibly incorrect results when the ntile argument is larger than the number of rows
  2. An internal error, rather than a standard error message, when the argument is too large to convert to i64
  3. A panic that crashes the DataFusion CLI when the argument is negative

To Reproduce

DataFusion CLI v33.0.0
❯ create table t1 (a int);
0 rows in set. Query took 0.005 seconds.

❯ insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.006 seconds.

-- Do these results make sense?  All other databases return ntile values 1,2,3.
-- Tested at https://dbfiddle.uk/
❯ select ntile(9223377) OVER(ORDER BY a) from t1;
+--------------------------------------------------------------------------------------------------------+
| NTILE(Int64(9223377)) ORDER BY [t1.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
+--------------------------------------------------------------------------------------------------------+
| 1                                                                                                      |
| 3074460                                                                                                |
| 6148919                                                                                                |
+--------------------------------------------------------------------------------------------------------+

-- This should return a regular error instead of an internal error
❯ select ntile(9223372036854775809) OVER(ORDER BY a) from t1;
Internal error: Cannot convert UInt64(9223372036854775809) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

-- This should not panic and crash the datafusion cli
❯ select ntile(-922337203685477580) OVER(ORDER BY a) from t1;
thread 'main' panicked at /home/ms/git/arrow-datafusion/datafusion/physical-expr/src/window/ntile.rs:100:23:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/home/ms/git/arrow-datafusion/datafusion-cli (ntile_output_type ✔) ᐅ 

Expected behavior

Covered in the SQL script comments
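
To make the expected behavior concrete, here is a minimal Rust sketch of an ntile computation that validates its argument up front and avoids the multiplication that overflowed. It is an illustration only, not the code in `ntile.rs`: the names `validate_ntile_arg` and `ntile`, the `i128` input type, and the error messages are all assumptions made for this example.

```rust
/// Hypothetical helper (not DataFusion's API): validates the raw NTILE
/// argument and returns it as a positive bucket count.
fn validate_ntile_arg(raw: i128) -> Result<u64, String> {
    // Values outside the i64 range become an ordinary error, not an internal one.
    let n = i64::try_from(raw)
        .map_err(|_| format!("NTILE argument {raw} does not fit in a 64-bit signed integer"))?;
    // Zero and negative bucket counts are rejected before any arithmetic runs.
    if n <= 0 {
        return Err(format!("NTILE requires a positive integer, but got {n}"));
    }
    Ok(n as u64)
}

/// Hypothetical sketch: assigns each of `num_rows` ordered rows to one of `n`
/// buckets, with the larger buckets first.
fn ntile(raw_n: i128, num_rows: u64) -> Result<Vec<u64>, String> {
    let n = validate_ntile_arg(raw_n)?;
    let q = num_rows / n; // base bucket size
    let r = num_rows % n; // number of buckets that hold one extra row
    // Rows before `boundary` fall into the larger buckets.
    // r * (q + 1) <= num_rows, so this product cannot overflow.
    let boundary = r * (q + 1);
    Ok((0..num_rows)
        .map(|i| {
            if i < boundary {
                i / (q + 1) + 1
            } else {
                // Only reached when q > 0, so the division is safe.
                r + (i - boundary) / q + 1
            }
        })
        .collect())
}
```

With this sketch, `ntile(9223377, 3)` yields 1, 2, 3, and the arguments from the other two queries produce plain `Err` values instead of an internal error or a panic. For reference, the values returned by the first query above (1, 3074460, 6148919) match `i * n / num_rows + 1` for row index i = 0, 1, 2; whether that is the exact formula in `ntile.rs` is an assumption.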


Labels: bug