-
Notifications
You must be signed in to change notification settings - Fork 28
Description
For example, radixsort has been employed in SortingLab.jl to sort strings using radixsort.
Currently, it is not possible to do anything other than bits types, the reason is very simple.
In the radixsort algorithm uint_mapping(Forward, val)
is to used to map every value in an array to a UInt
. This mapping preserves the order, so build a counting table (a crucial step in the sort) with the mapped value and then sort the array is good enough (you need to understand how radixsort works to understand this counting table bit).
However, a shortcoming is that many datasets can't be mapped to UInt
, e.g. an array of strings whose elements are longer than 8 bytes. So how do we deal with this? We can introduce a function called number_unit_mapping_required
which can a look at a value and return how many UInt
values are needed to map it correctly, e.g. a string of 16 bytes in sizes will return 2, 4 bytes will return 1, 25 bytes will return 5 etc.
We can run this n = maximum(number_unit_mapping_required, array)
to return the number of rounds required.
Then we need to modify uint_mapping(..., iteration = 1)
to accept one more parameter call iteration
which defaults to 1. The ideas is that we increase i = 1
to n
and call uint_mapping(val, iteration = i)
extract the i
th UInt
required. And then run the loop 5 times to sort things like strings.
This way, radixsort can be employed to sort many things, and the user only has to overload
number_unit_mapping_required
and uint_mapping
for those types to achieve it.
The other option that may be worth exploring is using https://github.com/rfourquet/BitIntegers.jl
to define larger integers that ``uint_mappingcan map to. But then for things like
String`s the output may be type-unstable. Also, I had a few segfaults with it and the bugs are like Heisenburg so it's a more risky option which I am not comfortable with atm.