@vchuravy
Member

Fixes #40151 by also setting the FZ16 mask

@milankl can you test on your A64FX machine?

@vchuravy vchuravy added the system:arm ARMv7 and AArch64 label Mar 26, 2021
@vchuravy vchuravy requested a review from yuyichao March 26, 2021 20:17
@yuyichao
Contributor

Have you checked if it is defined to write to reserved bits on chips without the fp16 feature? (I'll assume you are writing the correct bits)

@milankl

milankl commented Mar 27, 2021

Works like a charm on A64FX:

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 1.1125369292536007e-308
 5.877472f-39
  Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 0.0
 0.0f0
  Float16(0.0)

Many thanks, Valentin! I can't check what happens on other aarch64 chips, though.
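
For readers wondering why `floatmin(T)/2` is the test value: halving the smallest normal number gives a subnormal, which is exactly what flush-to-zero replaces with zero. The following is a minimal sketch in Python (standard library only) that models this for half precision. The helper names `half_bits`, `is_subnormal_half`, and `flush_to_zero_half` are made up for illustration, and the model only covers flushing a result; it is not how the hardware or Julia implements it.

```python
import math
import struct

def half_bits(x: float) -> int:
    """Encode x as IEEE 754 binary16 and return the raw 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def is_subnormal_half(x: float) -> bool:
    """True if x, rounded to binary16, has exponent field 0 and a nonzero fraction."""
    bits = half_bits(x)
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits
    fraction = bits & 0x3FF          # 10 fraction bits
    return exponent == 0 and fraction != 0

def flush_to_zero_half(x: float) -> float:
    """Illustrative model: a subnormal binary16 value becomes a zero of the same sign."""
    if is_subnormal_half(x):
        return math.copysign(0.0, x)
    # Otherwise just round-trip through binary16.
    return struct.unpack("<e", struct.pack("<e", x))[0]

floatmin16 = 2.0 ** -14        # smallest normal binary16 value
half_of_min = floatmin16 / 2   # 2^-15, roughly 3.05e-5

print(is_subnormal_half(floatmin16), is_subnormal_half(half_of_min))
print(flush_to_zero_half(half_of_min))
```

The same reasoning applies to Float32 and Float64, only with wider exponent and fraction fields.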

@milankl

milankl commented Mar 27, 2021

One last check: this PR also fixes the performance issues with subnormals. Float16 is now a decent 4x faster than Float64 both without and with subnormals, whereas subnormals previously slowed things down by 5x.

julia> benchmark_float16_subnormals(100_000)
Without subnormals
  Float64:   88.040 μs (0 allocations: 0 bytes)
  Float32:   44.390 μs (0 allocations: 0 bytes)
  Float16:   22.351 μs (0 allocations: 0 bytes)
With 1% subnormals
  Float64:   88.240 μs (0 allocations: 0 bytes)
  Float32:   44.360 μs (0 allocations: 0 bytes)
  Float16:   22.400 μs (0 allocations: 0 bytes)

This just benchmarks element-wise addition of two vectors (Vector{Float64}, Vector{Float32}, or Vector{Float16}) of length 100_000.

@vchuravy
Member Author

> Have you checked if it is defined to write to reserved bits on chips without the fp16 feature? (I'll assume you are writing the correct bits)

The resource I based this off was https://developer.arm.com/documentation/100403/0201/register-descriptions/advanced-simd-and-floating-point-registers/aarch64-register-descriptions/fpcr--floating-point-control-register

If we want to be safe I can guard this behind a check for FP16 support (i.e. ID_AA64PFR0_EL1.)

@yuyichao
Contributor

> If we want to be safe I can guard this behind a check for FP16 support (i.e. ID_AA64PFR0_EL1.)

I couldn't find anything that says the reserved bit is safe to access, so I think we should safeguard this. Initializing a variable holding this flag is probably OK.
The system register isn't accessible from EL0 (as the name suggests); the thing to check is the fullfp16 feature.
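
The guard discussed here can be sketched as follows. This is a Python model of the bit logic only, not the code in this PR: it assumes the architectural FPCR layout (FZ at bit 24, FZ16 at bit 19) and, as one possible user-space check on Linux, the `fphp` and `asimdhp` entries in the `/proc/cpuinfo` Features line. The function names are hypothetical.

```python
FPCR_FZ = 1 << 24    # flush-to-zero for single and double precision
FPCR_FZ16 = 1 << 19  # flush-to-zero for half precision

def has_fullfp16(cpuinfo_text: str) -> bool:
    """Assumption: the kernel lists 'fphp' and 'asimdhp' when FP16 arithmetic is available."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features"):
            flags = line.split(":", 1)[-1].split()
            return "fphp" in flags and "asimdhp" in flags
    return False

def zero_subnormals_mask(fullfp16: bool) -> int:
    """Only include FZ16 when the feature is present, so the reserved bit is never written."""
    return FPCR_FZ | (FPCR_FZ16 if fullfp16 else 0)

def updated_fpcr(fpcr: int, enable: bool, fullfp16: bool) -> int:
    """Return the FPCR value to write back; bits outside the mask are left untouched."""
    mask = zero_subnormals_mask(fullfp16)
    return (fpcr | mask) if enable else (fpcr & ~mask)

# Hypothetical Features line, for illustration only.
sample = "Features\t: fp asimd fphp asimdhp"
print(hex(updated_fpcr(0, True, has_fullfp16(sample))))
```

Computing the mask once and reusing it corresponds to the "variable holding this flag" suggestion: on a chip without fullfp16 the FZ16 bit never enters the mask, so it is neither set nor cleared.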

@vchuravy vchuravy force-pushed the vc/aarch_ftz_fp16 branch from 486ab69 to 02249d5 Compare March 29, 2021 17:03
@vchuravy
Member Author

@milankl it would be good to check whether this still works on A64FX.

@milankl

milankl commented Mar 30, 2021

No problem! Will do that tomorrow

@milankl

milankl commented Mar 31, 2021

@vchuravy Yes, still works

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 1.1125369292536007e-308
 5.877472f-39
  Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 0.0
 0.0f0
  Float16(0.0)

@vchuravy vchuravy merged commit 15d876f into master Mar 31, 2021
@vchuravy vchuravy deleted the vc/aarch_ftz_fp16 branch March 31, 2021 14:20

Successfully merging this pull request may close these issues.

Set_zero_subnormals ineffective for Float16
