Skip to content

RFC: Extending Radixsort interface to allow for sorting of non-bits types. #38

@xiaodaigh

Description

@xiaodaigh

For example, radixsort has been employed in SortingLab.jl to sort strings using radixsort.

Currently, it is not possible to do anything other than bits types, the reason is very simple.

In the radixsort algorithm uint_mapping(Forward, val) is to used to map every value in an array to a UInt. This mapping preserves the order, so build a counting table (a crucial step in the sort) with the mapped value and then sort the array is good enough (you need to understand how radixsort works to understand this counting table bit).

However, a shortcoming is that many datasets can't be mapped to UInt, e.g. an array of strings whose elements are longer than 8 bytes. So how do we deal with this? We can introduce a function called number_unit_mapping_required which can a look at a value and return how many UInt values are needed to map it correctly, e.g. a string of 16 bytes in sizes will return 2, 4 bytes will return 1, 25 bytes will return 5 etc.

We can run this n = maximum(number_unit_mapping_required, array) to return the number of rounds required.

Then we need to modify uint_mapping(..., iteration = 1) to accept one more parameter call iteration which defaults to 1. The ideas is that we increase i = 1 to n and call uint_mapping(val, iteration = i) extract the ith UInt required. And then run the loop 5 times to sort things like strings.

This way, radixsort can be employed to sort many things, and the user only has to overload

number_unit_mapping_required and uint_mapping for those types to achieve it.

The other option that may be worth exploring is using https://github.com/rfourquet/BitIntegers.jl to define larger integers that ``uint_mappingcan map to. But then for things likeString`s the output may be type-unstable. Also, I had a few segfaults with it and the bugs are like Heisenburg so it's a more risky option which I am not comfortable with atm.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions