Description
Is your feature request related to a problem? Please describe.
To get kernel performance that matches clang, we have had to add fast-math flags such as contract (which clang and nvcc enable by default). Currently we do this with an ugly hack; see for example:
Lines 21 to 57 in bb37b50
    # HACK: module-local versions of core arithmetic; needed to get FMA
    for (jlf, f) in zip((:+, :*, :-), (:add, :mul, :sub))
        for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
            ir = """
                %x = f$f contract nsz $llvmT %0, %1
                ret $llvmT %x
            """
            @eval begin
                # the @pure is necessary so that we can constant propagate.
                @inline Base.@pure function $jlf(a::$T, b::$T)
                    Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
                end
            end
        end
        @eval function $jlf(args...)
            Base.$jlf(args...)
        end
    end
    let (jlf, f) = (:div_arcp, :div)
        for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
            ir = """
                %x = f$f fast $llvmT %0, %1
                ret $llvmT %x
            """
            @eval begin
                # the @pure is necessary so that we can constant propagate.
                @inline Base.@pure function $jlf(a::$T, b::$T)
                    Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
                end
            end
        end
        @eval function $jlf(args...)
            Base.$jlf(args...)
        end
    end
    rcp(x) = div_arcp(one(x), x) # still leads to rcp.rn which is also a function call
Describe the solution you'd like
I would like a macro like @fastmath that offers fine-grained control over which fast-math flags are applied.
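To make the request concrete, here is a rough sketch of what such a macro could look like. Everything in it is hypothetical: the names `@fmflags`, `flagged_op`, and `apply_flags` are made up for illustration and are not part of CUDA.jl, KernelAbstractions, or Base. It only rewrites scalar `+`, `-`, `*`, and `/` on `Float32`/`Float64`, reuses the same `llvmcall` trick as the hack above, and has not been checked inside a GPU kernel.

```julia
# Hypothetical helper: one binary floating-point operation emitted as LLVM IR
# that carries the requested fast-math flags (e.g. contract, nsz, arcp).
@generated function flagged_op(::Val{op}, ::Val{flags}, a::T, b::T) where {op, flags, T<:Union{Float32, Float64}}
    llvmT = T === Float32 ? "float" : "double"
    inst = op === :+ ? "fadd" :
           op === :- ? "fsub" :
           op === :* ? "fmul" :
           op === :/ ? "fdiv" : error("unsupported operation: $op")
    ir = """
        %x = $inst $(join(flags, " ")) $llvmT %0, %1
        ret $llvmT %x
        """
    return :(Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b))
end

# Fallback for every other argument type: defer to the ordinary Base operation.
flagged_op(::Val{op}, ::Val, a, b) where {op} = getfield(Base, op)(a, b)

# Hypothetical rewriter: walk an expression and replace calls to +, -, *, /
# (with two or more operands) by left-folded calls to `flagged_op`.
function apply_flags(ex, flags)
    ex isa Expr || return ex
    args = Any[apply_flags(a, flags) for a in ex.args]
    if ex.head === :call && args[1] in (:+, :-, :*, :/) && length(args) >= 3
        op, vflags = Val(args[1]), Val(flags)
        return foldl((l, r) -> :($flagged_op($op, $vflags, $l, $r)), args[2:end])
    end
    return Expr(ex.head, args...)
end

# Hypothetical macro: `@fmflags (flag1, flag2, ...) expr` or `@fmflags flag expr`.
macro fmflags(flags, ex)
    fl = flags isa Symbol ? (flags,) : Tuple(flags.args)
    return esc(apply_flags(ex, fl))
end

# Usage: request only contraction and no-signed-zeros, nothing else.
muladd_like(a, b, c) = @fmflags (contract, nsz) a * b + c
```

Calling `@code_llvm muladd_like(1f0, 2f0, 3f0)` should show whether the flags ended up on the `fmul`/`fadd` instructions. Like any macro-based approach, this sketch only affects the code written inside the macro call and does not reach into callees.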
Describe alternatives you've considered
KernelAbstractions used to do this with https://github.com/JuliaLabs/Cassette.jl, and other people use macros (although this opens up fewer optimizations and is thus not desirable) https://github.com/JuliaLabs/Cassette.jl. I don't know whether https://github.com/JuliaDebug/CassetteOverlay.jl can be used with kernels, but it might be a possible way to implement this.
It would be nice if this functionality eventually got added to Base Julia.