Improve foldl with tail-call function-barrier #34293
Closed
This PR improves `foldl` for type-based filtering by using a function barrier at tail-call positions. This improves the performance of a benchmark equivalent to the one I mentioned in #33526. This benchmark is included in JuliaCI/BaseBenchmarks.jl#254.
To be honest, I am not 100% sure this is the best approach. Maybe this puts too much strain on the compiler? Should we care about generated code size? Does a 1.8x speedup justify that? Since some versions of the Julia compiler were able to generate comparable code, would it be better to tweak the compiler to recover the performance?
Implementation notes
As I discussed in this discourse thread, using this trick in generic `foldl` is harder than in implementations of `collect`-like functions. This is because it is easy to cause a stack overflow unless there is some kind of monotonicity in the way the accumulator changes its type. This is why I keep the counter at the type level. At the moment the type can be refined only up to three times (`counter = Val(3)`), which is arbitrary but, I think, conservative.

Aside: it would be great if the Julia compiler generated a finite state machine from the tail calls, as Guy Steele argued in the talk I linked in the discourse thread.
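
To make the counter-limited function barrier described in the implementation notes concrete, here is a rough sketch of the idea. It is not the code from this PR: the names `barrier_foldl` and `_fold` are made up for illustration, and the real change to `Base` may be structured quite differently. The only details taken from the description are the tail-call position of the barrier and the type-level budget of `Val(3)`.

```julia
# Hypothetical sketch, NOT the implementation in this PR.
# `barrier_foldl` and `_fold` are illustrative names only.

# Entry point: start with a budget of three type refinements.
function barrier_foldl(op, init, xs)
    return _fold(op, init, xs, Val(3), iterate(xs))
end

# `T` is the accumulator type this method instance is compiled for.
# `N` is the remaining number of allowed refinements (a type-level counter).
function _fold(op, acc::T, xs, ::Val{N}, next) where {T,N}
    while next !== nothing
        x, state = next
        acc_new = op(acc, x)
        next = iterate(xs, state)
        if N > 0 && !(acc_new isa T)
            # The accumulator changed its type. Continue the fold through a
            # function barrier in tail position, so the rest of the loop is
            # compiled for the new type. Decrementing the counter bounds the
            # recursion depth, which avoids stack overflow when the type
            # keeps changing.
            return _fold(op, acc_new, xs, Val(N - 1), next)
        end
        # Same type, or budget exhausted: stay in this loop.
        acc = acc_new
    end
    return acc
end

# Type-based filtering: skip `missing` values while summing.
xs = Union{Int,Missing}[1, missing, 2, 3]
total = barrier_foldl(+, 0, (x for x in xs if x !== missing))
```

Once the budget reaches `Val(0)`, the sketch simply keeps looping with a possibly type-unstable accumulator, so an operator that changes the accumulator type on every step (for example, growing a tuple) still terminates without deep recursion.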