add FZ16 to fpcr_fz_mask for aarch64 #40221
Have you checked whether writing to reserved bits is defined behavior on chips without the FP16 feature? (I'll assume you are writing the correct bits.)
Works like a charm on A64FX:
```julia
julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
1.1125369292536007e-308
5.877472f-39
Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
0.0
0.0f0
Float16(0.0)
```
Many thanks, Valentin! I can't check what happens on other aarch64 chips, though.
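The first REPL call shows that `floatmin(T)/2` is still a nonzero, subnormal value; after `set_zero_subnormals(true)` the same division returns zero. A minimal C sketch of the unflushed case for double and float, assuming the default floating-point environment (no flush-to-zero); the helper names are hypothetical, and Float16 is left out because standard C has no portable half-precision type:

```c
#include <float.h>
#include <math.h>

/* Half of the smallest normal double. With flush-to-zero off this is
   subnormal; with FZ set in FPCR the division would produce 0.0. */
static double half_dbl_min(void)
{
    volatile double x = DBL_MIN;  /* volatile: divide at run time, not at compile time */
    return x / 2;
}

/* Same for float: FLT_MIN / 2 is subnormal unless flush-to-zero is on. */
static float half_flt_min(void)
{
    volatile float x = FLT_MIN;
    return x / 2;
}
```

`fpclassify` is type-generic, so `fpclassify(half_flt_min())` classifies the value as a float; converting it to double first would make it look normal.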
One last check: this PR also fixes the performance problem with subnormals. Float16 now gives a decent 4x speed-up over Float64, both without and with subnormals, whereas previously subnormals slowed Float16 down by 5x:
```julia
julia> benchmark_float16_subnormals(100_000)
Without subnormals
Float64: 88.040 μs (0 allocations: 0 bytes)
Float32: 44.390 μs (0 allocations: 0 bytes)
Float16: 22.351 μs (0 allocations: 0 bytes)
With 1% subnormals
Float64: 88.240 μs (0 allocations: 0 bytes)
Float32: 44.360 μs (0 allocations: 0 bytes)
Float16: 22.400 μs (0 allocations: 0 bytes)
```
This just benchmarks the element-wise addition of two `Vector{Float64/32/16}` of length 100_000.
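The `benchmark_float16_subnormals` helper itself is not shown in the thread. As a rough C analogue for the double case only, assuming (hypothetically) that "1% subnormals" means every 100th element of both inputs is subnormal, the inputs and the timed kernel could look like this; the function names are illustrative and the timing harness is left out:

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical input setup: mostly 1.0, with every `stride`-th element of
   both vectors set to a subnormal. Returns how many subnormal slots were set. */
static size_t fill_inputs(double *a, double *b, size_t n, size_t stride)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (stride != 0 && i % stride == 0) {
            a[i] = DBL_MIN / 4;  /* subnormal input */
            b[i] = DBL_MIN / 4;  /* their sum, DBL_MIN / 2, is subnormal too */
            count++;
        } else {
            a[i] = 1.0;
            b[i] = 1.0;
        }
    }
    return count;
}

/* The kernel being timed: element-wise c = a + b. */
static void add_vectors(double *restrict c, const double *restrict a,
                        const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With `fill_inputs(a, b, 100000, 100)`, 1000 of the 100_000 elements are subnormal, and `add_vectors` produces a subnormal sum at exactly those positions unless flush-to-zero is enabled.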
The resource I based this on was https://developer.arm.com/documentation/100403/0201/register-descriptions/advanced-simd-and-floating-point-registers/aarch64-register-descriptions/fpcr--floating-point-control-register. If we want to be safe, I can guard this behind a check for FP16 support (i.e. ID_AA64PFR0_EL1).
I couldn't find anything saying that the reserved bit is safe to access, so I think we should guard this. Initializing a variable holding this flag is probably OK.
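A minimal sketch of the guard discussed here, computing the mask once and including FZ16 only on chips with FP16 support. The bit positions (FZ at bit 24, FZ16 at bit 19) follow the Arm FPCR description linked above. As a swapped-in technique, the feature check uses Linux's `HWCAP_FPHP` auxiliary-vector bit instead of reading ID_AA64PFR0_EL1 directly; that choice, and the names `cpu_has_fp16` and `make_fpcr_fz_mask`, are assumptions for illustration, not the PR's code:

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

#define FPCR_FZ   (UINT64_C(1) << 24)  /* flush-to-zero, single and double precision */
#define FPCR_FZ16 (UINT64_C(1) << 19)  /* flush-to-zero, half precision (FP16 extension) */

/* Assumption: on Linux/aarch64, HWCAP_FPHP signals half-precision FP support.
   On any other platform this conservatively reports no support. */
static bool cpu_has_fp16(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_FPHP)
    return (getauxval(AT_HWCAP) & HWCAP_FPHP) != 0;
#else
    return false;
#endif
}

/* FZ16 is included only when FP16 is supported, so the (otherwise reserved)
   bit 19 is never written on chips without the feature. */
static uint64_t make_fpcr_fz_mask(bool has_fp16)
{
    return has_fp16 ? (FPCR_FZ | FPCR_FZ16) : FPCR_FZ;
}
```

The runtime would then initialize a variable once, e.g. `fpcr_fz_mask = make_fpcr_fz_mask(cpu_has_fp16());`, and reuse it whenever subnormal flushing is toggled.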
Branch updated from 486ab69 to 02249d5.
@milankl, it would be good to check whether this still works on A64FX.
No problem! Will do that tomorrow.
@vchuravy Yes, still works:
```julia
julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
1.1125369292536007e-308
5.877472f-39
Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
0.0
0.0f0
Float16(0.0)
```
Fixes #40151 by also setting the FZ16 bit in the mask.
@milankl, can you test on your A64FX machine?
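A minimal sketch of what "also setting the FZ16 mask" amounts to at the register level: a read-modify-write of FPCR that sets or clears every bit in the mask. The bit positions follow the Arm FPCR description (FZ is bit 24, FZ16 is bit 19); the helper names are hypothetical, and the inline assembly is compiled only on aarch64:

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCR_FZ   (UINT64_C(1) << 24)  /* flush-to-zero for single/double precision */
#define FPCR_FZ16 (UINT64_C(1) << 19)  /* flush-to-zero for half precision */

/* Pure helper: return `fpcr` with the bits in `fz_mask` set or cleared,
   leaving every other FPCR bit unchanged. */
static uint64_t fpcr_with_fz(uint64_t fpcr, uint64_t fz_mask, bool enable)
{
    return enable ? (fpcr | fz_mask) : (fpcr & ~fz_mask);
}

#if defined(__aarch64__)
/* Hypothetical runtime hook: read FPCR, update the flush-to-zero bits,
   and write the result back. */
static void set_flush_to_zero(uint64_t fz_mask, bool enable)
{
    uint64_t fpcr;
    __asm__ volatile("mrs %0, fpcr" : "=r"(fpcr));
    fpcr = fpcr_with_fz(fpcr, fz_mask, enable);
    __asm__ volatile("msr fpcr, %0" : : "r"(fpcr));
}
#endif
```

Before this change only `FPCR_FZ` would be in the mask, so half-precision results were still produced as subnormals; passing `FPCR_FZ | FPCR_FZ16` flushes Float16 as well.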