-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
faster isfinite
#46166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster isfinite
#46166
Conversation
|
Nice! It seems brave to assume that I think this method should apply only to |
| isnan(x::AbstractFloat) = (x != x)::Bool | ||
| isnan(x::Number) = false | ||
|
|
||
| isfinite(x::IEEEFloat) = x - x === zero(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said in #46163 (comment), I feel like this should be iszero(x - x), and also iszero can be optimised for IEEFloat with x === zero(x), instead of x == zero(x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> iszero(-0.0)
true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that would break the optimization. The reason for this change is that === on floating points can turn into a bit-compare rather than a more expensive floating point compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sad 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead then:
julia> Base.iszero(x::Float64) = reinterpret(UInt64, x) & 0x7fff_ffff_ffff_ffff === UInt(0)
(btime cannot distinguish these three approaches in performance, and returns 3.5 ns for all of them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll define the initially-proposed solution
isfinite_1(x::IEEEFloat) = x-x===zero(x).
Another contender, related to the iszero(x-x) variant but using the other half of the fact that x-x can only take the values +0.0 or NaN, is
isfinite_2(x::IEEEFloat) = !isnan(x-x).
Of course, now that we're limited to IEEEFloat we can instead reach for bit-twiddling
isfinite_3(x::IEEEFloat) = (reinterpret(Unsigned,x) & Base.exponent_mask(typeof(x))) != Base.exponent_mask(typeof(x))
julia> code_native(isfinite_1,(Float64,);debuginfo=:none)
vsubsd %xmm0, %xmm0, %xmm0
vmovq %xmm0, %rax
testq %rax, %rax
sete %al
retq
julia> code_native(isfinite_2,(Float64,);debuginfo=:none)
vsubsd %xmm0, %xmm0, %xmm0
vucomisd %xmm0, %xmm0
setnp %al
retq
julia> code_native(isfinite_3,(Float64,);debuginfo=:none)
vmovq %xmm0, %rax
movabsq $9218868437227405312, %rcx # imm = 0x7FF0000000000000
andnq %rcx, %rax, %rax
setne %al
retq
Compared to isfinite_1, isfinite_2 is one instruction shorter in isolation (on my x86). isfinite_3 is the same number of instructions as isfinite_1 but one of those is a hoistable movabsq. Another advantage is that it requires no floating point operations. Note that, after inlining, this may not be what these functions look like in the wild.
Keep in mind that mytest! is not a very canonical use of isfinite. Applying a @code_native shows this fact. That said, I see the _1 and _2 variants benchmarking identically and _3 about 15% faster.
EDIT:
Actually, I'm increasingly a fan of the !isnan(x-x) variant. While it's not quite as fast as bit twiddling in this nanobenchmark, I think that it has the benefit of clarity. Further, I think that it /would/ be a suitable ::AbstractFloat definition.
|
How about this?
|
|
I would think that wouldn't be better since |
|
Well observed... The point was just that it this handles the |
This has been tested exhaustively for
Float32andFloat16