thomasdullien

When scanning lots of data, the existing code can spend significant amounts of time in the loop that checks for non-ASCII characters.

Unfortunately, all compilers I tested failed to properly vectorize the loop. On the other hand, it is easy to use a bitmask plus a popcount instruction to check 8 characters per loop iteration. This should also be friendly to speculative execution, as there are no longer any branches to mispredict inside the loop. In my benchmarks, the new version is about 6-8x faster.
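For readers who want to see the general shape of the bitmask + popcount idea, here is a minimal sketch. It is not the code from this PR: the function name `count_non_ascii` is hypothetical, and it assumes the goal is to count bytes with the high bit set. It also assumes GCC or Clang, since `__builtin_popcountll` is a compiler builtin rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the PR): counts bytes whose high bit is set,
 * i.e. bytes outside the 7-bit ASCII range. */
static size_t count_non_ascii(const unsigned char *buf, size_t len)
{
    /* One 0x80 per byte lane: selects the top bit of each of 8 bytes. */
    const uint64_t high_bits = 0x8080808080808080ULL;
    size_t count = 0;
    size_t i = 0;

    /* Main loop: 8 bytes per iteration, no data-dependent branches. */
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word); /* safe unaligned load */
        /* GCC/Clang builtin; counts the set high bits in this word. */
        count += (size_t)__builtin_popcountll(word & high_bits);
    }

    /* Tail: handle the remaining 0-7 bytes one at a time. */
    for (; i < len; i++)
        count += buf[i] >> 7;

    return count;
}
```

Because only the top bit of each byte is inspected, byte order does not matter here. On targets without a hardware popcount instruction the builtin falls back to a software routine, which is consistent with the possible regression on older CPUs mentioned in this description.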

This may lead to a performance regression on pre-2010 Intel CPUs, but I am not sure whether that still matters.

…t_bytes to be vectorized; provides ~8x speedup on modern CPUs.