Description
During code review, I've prevented two bugs where usage of post-SSE2 instructions was being incorrectly guarded with _Use_sse2() - see #4384 (comment) and #4495 (comment). This is extremely hazardous, and the correctness of the STL shouldn't depend on whether I've had 270 mg of caffeine every single time I've reviewed a vectorization PR.
At this time, we still need to support the tiny fraction (~0.7%, I've heard) of processors that have SSE2 but not SSE4.2. However, we don't need to extend novel optimizations to them - they were perfectly happy running classic STL algorithms until 2019.
We should prevent this class of mistakes by removing the distinction between SSE2 and SSE4.2 in vector_algorithms.cpp. That is, we should test only for the presence of SSE4.2 before using any instruction up to and including SSE4.2. (This will supersede the error-prone _Traits::_Sse_available().)
We'll still need a distinction between "SSE4.2 is available" and "AVX2 is available", but I consider this to be much less dangerous, because AVX/AVX2 intrinsics and types (_mm256_* functions, __m256i) are very distinctive.
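A minimal sketch of the proposed gating, assuming a hypothetical byte-count algorithm. Everything in it is illustrative: `isa_tier`, `choose_tier`, `detect_tier`, and the `count_byte*` functions are made-up names that do not come from vector_algorithms.cpp, detection uses GCC/Clang's `__builtin_cpu_supports` rather than whatever the STL actually uses, and the vector kernels are placeholders that forward to the scalar loop. The point is the shape of the dispatch. There are only two questions ("SSE4.2?" and "AVX2?"), and there is no SSE2-only tier at all.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical dispatch tiers. There is deliberately no "SSE2 only" tier:
// a CPU without SSE4.2 always takes the classic scalar path.
enum class isa_tier { scalar, sse42, avx2 };

// Hypothetical rule: everything from SSE2 up to and including SSE4.2 is gated on
// a single SSE4.2 check, and the AVX2 tier additionally requires SSE4.2.
constexpr isa_tier choose_tier(bool has_sse42, bool has_avx2) noexcept {
    return !has_sse42 ? isa_tier::scalar : (has_avx2 ? isa_tier::avx2 : isa_tier::sse42);
}

static_assert(choose_tier(false, false) == isa_tier::scalar, "SSE2-only CPUs get no vector path");
static_assert(choose_tier(true, false) == isa_tier::sse42, "SSE4.2 unlocks everything up to SSE4.2");
static_assert(choose_tier(true, true) == isa_tier::avx2, "AVX2 is the only other distinction");

// Runtime detection via GCC/Clang builtins (an assumption for this sketch).
// Other compilers or non-x86 targets conservatively report the scalar tier.
inline isa_tier detect_tier() noexcept {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return choose_tier(__builtin_cpu_supports("sse4.2") != 0, __builtin_cpu_supports("avx2") != 0);
#else
    return isa_tier::scalar;
#endif
}

// Classic loop: the only path an SSE2-only CPU ever executes.
inline std::size_t count_byte_scalar(
    const unsigned char* first, const unsigned char* last, unsigned char val) noexcept {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == val) {
            ++n;
        }
    }
    return n;
}

// Placeholder kernels. A real SSE4.2 kernel could use any intrinsic up to and
// including SSE4.2 with no further checks; these stand-ins just forward to the
// scalar loop so the sketch stays portable.
inline std::size_t count_byte_sse42(
    const unsigned char* first, const unsigned char* last, unsigned char val) noexcept {
    return count_byte_scalar(first, last, val);
}

inline std::size_t count_byte_avx2(
    const unsigned char* first, const unsigned char* last, unsigned char val) noexcept {
    return count_byte_scalar(first, last, val);
}

// Dispatcher taking an explicit tier (convenient for testing every path).
inline std::size_t count_byte(
    const unsigned char* first, const unsigned char* last, unsigned char val, isa_tier tier) noexcept {
    assert(first <= last);
    switch (tier) {
    case isa_tier::avx2:
        return count_byte_avx2(first, last, val);
    case isa_tier::sse42:
        return count_byte_sse42(first, last, val);
    case isa_tier::scalar:
        break;
    }
    return count_byte_scalar(first, last, val);
}

// Convenience overload: detect the tier once and cache it.
inline std::size_t count_byte(
    const unsigned char* first, const unsigned char* last, unsigned char val) noexcept {
    static const isa_tier tier = detect_tier();
    return count_byte(first, last, val, tier);
}
```

Because the dispatcher never asks "is SSE2 available?", a kernel in the `sse42` tier cannot be reached on an SSE2-only machine, so a reviewer only has to confirm that code outside the `avx2` tier stays within SSE4.2, not which of SSE2/SSSE3/SSE4.1/SSE4.2 it touches.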