Improve foldl with tail-call function-barrier #34293

tkf · 2020-01-07T00:43:07Z

This PR improves foldl for type-based filtering by using a function barrier at tail-call positions. This speeds up the following benchmark (equivalent to the one I mentioned in #33526):

```julia
using BenchmarkTools
xs = [abs(x) < 1 ? x : missing for x in randn(1000)];
@btime sum(x for x in $xs if x !== missing)
```

Before (589a6d8): 1.222 μs (0 allocations: 0 bytes)
After (14839b2): 680.954 ns (0 allocations: 0 bytes)
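As a quick check, the ratio of the two timings gives the roughly 1.8x speedup. The variable names are only for illustration.

```julia
# Timings from the benchmark above, in nanoseconds
before_ns = 1222.0     # 1.222 μs at 589a6d8
after_ns  = 680.954    # 680.954 ns at 14839b2

speedup = before_ns / after_ns
println(round(speedup, digits = 2))   # prints 1.79
```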

This benchmark is included in JuliaCI/BaseBenchmarks.jl#254

To be honest, I am not 100% sure this is the best approach. Maybe this puts too much strain on the compiler? Should we care about generated code size? Does a 1.8x speedup justify that? Since some versions of the Julia compiler were able to generate comparable code, would it be better to tweak the compiler to recover the performance?

Implementation notes

As I discussed in this Discourse thread, using this trick in a generic foldl is harder than in implementations of collect-like functions. This is because it is easy to hit a stack overflow unless the accumulator's type changes in some monotonic way. This is why I keep a counter at the type level. At the moment the type can be refined at most three times (counter = Val(3)), which is arbitrary but, I think, conservative.
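A minimal sketch of the idea, written as a standalone reimplementation rather than the actual code in base/reduce.jl. The names `foldl_barrier`, `_foldl_loop`, and `_foldl_widen` are hypothetical. Whenever the accumulator changes type, the loop re-enters itself in tail position (the function barrier), so the remainder of the fold is compiled for the new type. A `Val` counter caps how many times this can happen. Once the budget is spent, the sketch falls back to a plain loop whose accumulator type is not narrowed.

```julia
# Hypothetical sketch, not Base's implementation.
function foldl_barrier(op, init, itr)
    y = iterate(itr)
    y === nothing && return init
    v = op(init, y[1])
    # Start with a budget of three type refinements, mirroring counter = Val(3).
    return _foldl_loop(op, v, itr, y[2], Val(3))
end

function _foldl_loop(op, acc::T, itr, state, ::Val{N}) where {T, N}
    y = iterate(itr, state)
    while y !== nothing
        x, state = y
        v = op(acc, x)
        if v isa T
            acc = v  # type unchanged: stay in this specialized loop
        elseif N > 0
            # Type changed: tail-call through a function barrier so the rest of
            # the loop is compiled for typeof(v), spending one unit of budget.
            return _foldl_loop(op, v, itr, state, Val(N - 1))
        else
            # Budget exhausted: finish with an un-narrowed accumulator.
            return _foldl_widen(op, v, itr, state)
        end
        y = iterate(itr, state)
    end
    return acc
end

function _foldl_widen(op, acc, itr, state)
    y = iterate(itr, state)
    while y !== nothing
        x, state = y
        acc = op(acc, x)
        y = iterate(itr, state)
    end
    return acc
end
```

With `Any[1, 2.5, 3]` and `+`, the accumulator starts as an `Int` and is refined once to `Float64`. With a tuple-growing `op`, its type changes on every step, so the counter runs out and the widened loop finishes the fold.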

Aside: it would be great if the Julia compiler could generate a finite state machine from tail calls, as Guy Steele argued in the talk I linked in the Discourse thread.

ararslan · 2020-01-07T00:46:13Z

base/reduce.jl

```diff
+    if v isa T
+        return _foldl_impl(op, v, itr, y[2])
+    else
+        return _foldl_impl(op, v, itr, y[2])
```


Isn't this exactly the same as what's done in the other branch of the conditional? 🤔

Yes, but it helps the compiler. Removing this branch introduces an allocation and makes the code a bit slower:

```julia
julia> @eval Base function _foldl_impl(op::OP, init::T, itr) where {OP, T}
           y = iterate(itr)
           y === nothing && return init
           v = op(init, y[1])
           return _foldl_impl(op, v, itr, y[2])
       end
_foldl_impl (generic function with 4 methods)

julia> @btime sum(x for x in $xs if x !== missing)
  780.859 ns (1 allocation: 16 bytes)
```

But the speedup is not as drastic as it felt while I was playing with it, so I'm OK with removing this micro-optimization.

Actually, since this "no-op if branch" eliminates some constant-time overhead, the difference becomes large for shorter arrays:

```julia
julia> xs = [abs(x) < 1 ? x : missing for x in randn(10)];

julia> @btime sum(x for x in $xs if x !== missing)
  8.405 ns (0 allocations: 0 bytes)
1.5696312630017393

julia> @eval Base function _foldl_impl(op::OP, init::T, itr) where {OP, T}
           y = iterate(itr)
           y === nothing && return init
           v = op(init, y[1])
           return _foldl_impl(op, v, itr, y[2])
       end
_foldl_impl (generic function with 4 methods)

julia> @btime sum(x for x in $xs if x !== missing)
  29.120 ns (1 allocation: 16 bytes)
1.5696312630017393
```
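To put the two measurements side by side, here are the slowdown ratios from removing the branch, using the timings reported in this thread. The inputs are different random arrays, so the comparison is only rough, and the variable names are illustrative.

```julia
# Timings (ns) with and without the "no-op if branch", as reported above
long_with, long_without   = 680.954, 780.859   # 1000-element input
short_with, short_without = 8.405, 29.120      # 10-element input

println(round(long_without / long_with, digits = 2))    # prints 1.15
println(round(short_without / short_with, digits = 2))  # prints 3.46
```

The branch saves about 15% on 1000 elements but is a roughly 3.5x difference on 10 elements, consistent with it removing per-call overhead rather than per-element work.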

That is incredibly bizarre. 😳

I thought I was just doing union splitting manually.
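A minimal illustration of why two textually identical branches can still differ. It assumes the benefit comes from type narrowing, as the comment above suggests, which is not verified here. Inside `if v isa T`, the compiler treats `v` as `T`, so the call in that branch can be resolved at compile time. The `else` branch keeps the general, dynamically dispatched call. `continue_fold` and `step_split` are hypothetical names.

```julia
# Hypothetical stand-in for the recursive call in the diff.
continue_fold(acc::Int, x) = acc + x
continue_fold(acc, x) = acc + x

function step_split(acc::T, x) where {T}
    # Ref{Any}(...)[] hides the value's type from inference, so `v` is inferred
    # as Any, mimicking an accumulator whose type the compiler cannot pin down.
    v = Ref{Any}(acc + x)[]
    if v isa T
        # Inference knows v::T here, so this call can be dispatched statically.
        return continue_fold(v, x)
    else
        # Same source text, but v::Any here, so dispatch happens at run time.
        return continue_fold(v, x)
    end
end
```

Running `@code_warntype step_split(1, 2)` should show `v` narrowed to `Int64` in the first branch only.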

adienes · 2025-06-20T14:30:38Z

While I'm sure there is room for improvement in foldl, the diff here, rebased on nightly, no longer appears to be a performance improvement.

Improve foldl with tail-call function-barrier

14839b2

tkf mentioned this pull request Jan 7, 2020

Transducer as an optimization: map, filter and flatten #33526

Merged

ararslan reviewed Jan 7, 2020

tkf added the fold (sum, maximum, reduce, foldl, etc.) and performance (Must go faster) labels Jun 13, 2020

tkf mentioned this pull request Jun 13, 2020

Defining zero() seriously #34003

Open

adienes closed this Jun 20, 2025

3 participants