Closed
Labels: enhancement (New feature or request), good first issue (Good for newcomers)
Description
Is your feature request related to a problem or challenge?
After #7721, a SortExec with a limit will use a special TopK. We have basic unit tests, but I think the coverage could be improved, specifically with fuzz testing.
Describe the solution you'd like
What I would like is a new fuzz test added to the existing fuzz cases: https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/fuzz_cases
The structure of SortTest in https://github.com/apache/arrow-datafusion/blob/e95a24b5a260e0e2f603d52682d36cce192676f8/datafusion/core/tests/fuzz_cases/sort_fuzz.rs#L111 might be a good one to follow
The basic outline would be:
- Create an input with several columns (integers, strings, floats)
- Reorder the input randomly
- Divide the input into multiple batches using make_staggered_batches
- Run a query like SELECT * FROM t ORDER BY <col(s)> LIMIT <N> and collect the output
- Compute the expected result programmatically (e.g. by sorting the data prior to creating RecordBatches)
- Ensure the output matches the expected result
Input size: 1000 rows
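The "compute the expected result" step could be sketched roughly as below. This uses only the Rust standard library; `Row`, `SortCol`, `compare_rows`, and `expected_topk` are hypothetical names chosen for illustration and are not part of DataFusion. It assumes ascending order, no nulls, and no NaN values.

```rust
use std::cmp::Ordering;

/// One row of the hypothetical test table: an int, a string, and a float column.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    i: i64,
    s: String,
    f: f64,
}

/// Hypothetical identifiers for the columns that can appear in ORDER BY.
#[derive(Debug, Clone, Copy)]
enum SortCol {
    Int,
    Str,
    Float,
}

/// Compare two rows lexicographically on the given sort columns (ascending).
fn compare_rows(a: &Row, b: &Row, cols: &[SortCol]) -> Ordering {
    for col in cols {
        let ord = match col {
            SortCol::Int => a.i.cmp(&b.i),
            SortCol::Str => a.s.cmp(&b.s),
            // total_cmp gives floats a total order; this sketch assumes no NaNs.
            SortCol::Float => a.f.total_cmp(&b.f),
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Reference answer for `ORDER BY <cols> LIMIT n`: fully sort, keep the first n rows.
fn expected_topk(rows: &[Row], cols: &[SortCol], n: usize) -> Vec<Row> {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|a, b| compare_rows(a, b, cols));
    sorted.truncate(n);
    sorted
}
```

When several rows tie on the sort columns, which of them land inside the limit is not determined by the query, so the comparison against the actual output should probably look only at the sort-key columns, or the generated keys should be made unique.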
Parameters to vary
- sort cols: (int), (string), (float), (int, string), (string, int), etc.
- N: 1, 10, 100, 300 (aka how many are kept)
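One way to enumerate these parameters is a plain cross product of sort-column sets and limits, producing one query per combination. The sketch below is standard-library Rust only; the column names `i`, `s`, and `f` and the helper `topk_queries` are assumptions made for illustration, and the column list covers only the combinations spelled out above.

```rust
/// Hypothetical column names for table `t`: `i` (int), `s` (string), `f` (float).
const SORT_COLS: &[&[&str]] = &[&["i"], &["s"], &["f"], &["i", "s"], &["s", "i"]];

/// Values of N, i.e. how many rows are kept.
const LIMITS: &[usize] = &[1, 10, 100, 300];

/// Build one query string per (sort columns, limit) combination.
fn topk_queries() -> Vec<String> {
    let mut queries = Vec::new();
    for cols in SORT_COLS {
        for n in LIMITS {
            queries.push(format!(
                "SELECT * FROM t ORDER BY {} LIMIT {}",
                cols.join(", "),
                n
            ));
        }
    }
    queries
}
```

Keeping the column sets in a single table like this means a new column or type only needs a new entry in `SORT_COLS`.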
Bonus points
Make it easy to add new columns / types (e.g. a string dictionary).
Describe alternatives you've considered
No response
Additional context
No response