Skip to content

sort(collection; by=transformation) should invoke transformation only once per element #34947

@lassepe

Description

@lassepe

Currently, when sorting a collection with sort(collection; by=transformation), the by callable can be called multiple times on the same element (essentially whenever it is compared to another element within the sort algorithm).

MWE

using Random
struct Counter
    d::Dict
end
Counter() = Counter(Dict{Any, Int}())
updatecount(counter::Counter, x) = counter.d[x] = get(counter.d, x, 0) + 1
getcount(counter::Counter, x) = get(counter.d, x, 0)

function counted_transformation(x; counter)
    updatecount(counter, x)
    return x
end

@show collection = shuffle(1:10)
counter = Counter()
sort(collection; by=x->counted_transformation(x; counter=counter))
@show [getcount(counter, x) for x in collection]

Output:

collection = shuffle(1:10) = [2, 6, 9, 10, 3, 5, 7, 8, 1, 4]
[getcount(counter, x) for x = collection] = [3, 7, 8, 7, 7, 6, 6, 5, 8, 7]

As you can see, for some elements by has been called 7 or 8 times.
This can be quite expensive if the by transformation involves some non-trivial computation.
IMHO, it would be preferable if sort transforms these values only once before sorting.

From the high-level, this could look like this (this allocates a bunch, so this is not a good workaround)

# single transformation sort
function sort_custom(collection; by=identity, kwargs...)
    collection_with_transformation = map(x->x=>by(x), collection)
    sort!(collection_with_transformation; by=x->last(x))
    return map(x -> first(x), collection_with_transformation)
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    docsThis change adds or pertains to documentationsortingPut things in order

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions