Skip to content

Conversation

@medwards
Copy link
Contributor

@medwards medwards commented Apr 28, 2021

Closes #235

What changes are included in this PR?

The result slice in sort_primitive is the size of all the values being sorted even when a limit is provided. Change the size to be the maximum possible result (thus avoiding the panic in #235)

There are other slice size errors as well (such as when the number of nulls exceeds the limit) that are included.

Are there any user-facing changes?

No

Notes

I've provided a reproducing test, but skipped other variants of limits /w nulls. Those should be easy so let me know if they're of interest. Additionally it is possible that result is intentionally over-large in which case I can try alternative fixes.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find @medwards -- thank you for the contribution. I am not sure about the use of saturating_sub but I have another idea too (see medwards#1).

👍

FYI @sundy-li who I think contributed the original limit code in apache/arrow#9602

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry -- I think this PR is good. Any other tests you are willing to write would be appreciated.

BTW if you rebase this PR now to pick up the latest changes on master it should pass the CI checks.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @medwards , thanks a lot for this PR! It looks great to me 💯

I left two comments that I think we may aim to, but please do challenge them if you disagree :)

@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2021

Codecov Report

Merging #236 (e5840d7) into master (d008f31) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #236   +/-   ##
=======================================
  Coverage   82.51%   82.52%           
=======================================
  Files         162      162           
  Lines       43655    43672   +17     
=======================================
+ Hits        36022    36039   +17     
  Misses       7633     7633           
Impacted Files Coverage Δ
arrow/src/compute/kernels/sort.rs 94.41% <100.00%> (+0.08%) ⬆️
parquet/src/encodings/encoding.rs 94.86% <0.00%> (-0.20%) ⬇️
arrow/src/array/transform/fixed_binary.rs 84.21% <0.00%> (+5.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d008f31...e5840d7. Read the comment docs.

@alamb
Copy link
Contributor

alamb commented Apr 30, 2021

BTW we are working on the Dev PR / Process failing check in #242

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks a lot for solving this and for your contribution 👍

@alamb
Copy link
Contributor

alamb commented May 1, 2021

Thanks again @medwards !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sort with limit panics for the limit includes some but not all nulls, for large arrays

4 participants