add FZ16 to fpcr_fz_mask for aarch64 #40221

vchuravy · 2021-03-26T20:17:08Z

Fixes #40151 by also setting the FZ16 mask

@milankl can you test on your A64fx machine?
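
For readers unfamiliar with the FPCR layout, here is a minimal sketch of the idea behind the change. It is not the actual patch: `fpcr_fz_mask` is the name from the PR title, while `fpcr_fz16_mask`, `fpcr_fz_all_mask`, and `with_zero_subnormals` are hypothetical names chosen for illustration. The bit positions (FZ at bit 24, FZ16 at bit 19) follow the Arm FPCR register description.

```cpp
#include <cstdint>

// FPCR.FZ (bit 24): flush-to-zero for single and double precision.
constexpr std::uint32_t fpcr_fz_mask = UINT32_C(1) << 24;
// FPCR.FZ16 (bit 19): flush-to-zero for half precision.
// Hypothetical name; the real source may spell it differently.
constexpr std::uint32_t fpcr_fz16_mask = UINT32_C(1) << 19;
// Both bits together, so that one call covers Float64, Float32 and Float16.
constexpr std::uint32_t fpcr_fz_all_mask = fpcr_fz_mask | fpcr_fz16_mask;

// Hypothetical helper: compute the new FPCR value from the old one,
// setting or clearing both flush-to-zero bits and leaving all other bits alone.
constexpr std::uint32_t with_zero_subnormals(std::uint32_t fpcr, bool enable)
{
    return enable ? (fpcr | fpcr_fz_all_mask) : (fpcr & ~fpcr_fz_all_mask);
}
```

The helper is pure bit arithmetic, so it can be checked on any machine; actually reading and writing FPCR needs `mrs`/`msr` on an AArch64 target.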

yuyichao · 2021-03-26T20:33:22Z

Have you checked if it is defined to write to reserved bits on chips without the fp16 feature? (I'll assume you are writing the correct bits)

milankl · 2021-03-27T02:02:52Z

Works like a charm on a64fx

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 1.1125369292536007e-308
 5.877472f-39
  Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 0.0
 0.0f0
  Float16(0.0)

Many thanks, Valentin! I can't check what happens on other aarch64 chips, though.
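
As a side note on why `floatmin(T)/2` is the test value: halving the smallest normal number gives a subnormal, which is exactly what flush-to-zero replaces with zero. The sketch below models that in portable C++ for `float` and `double` only; `is_subnormal` and `flush_to_zero` are hypothetical helpers, not part of Julia or of this PR, and the model covers a single value rather than the hardware behaviour in full.

```cpp
#include <limits>

// True if x is nonzero but smaller in magnitude than the smallest normal
// value of its type. NaN fails the magnitude comparison and is not counted.
template <typename T>
constexpr bool is_subnormal(T x)
{
    return x != T(0) && (x < T(0) ? -x : x) < std::numeric_limits<T>::min();
}

// Hypothetical model of flush-to-zero on one value: a subnormal becomes a
// zero of the same sign, everything else passes through unchanged.
template <typename T>
constexpr T flush_to_zero(T x)
{
    return is_subnormal(x) ? (x < T(0) ? -T(0) : T(0)) : x;
}
```

Here `std::numeric_limits<T>::min()` plays the role of Julia's `floatmin(T)`.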

milankl · 2021-03-27T02:35:43Z

And a last check: this PR also addresses the performance issue with subnormals. Float16 is now a decent 4x faster than Float64 both without and with subnormals, whereas subnormals previously slowed things down by 5x.

julia> benchmark_float16_subnormals(100_000)
Without subnormals
  Float64:   88.040 μs (0 allocations: 0 bytes)
  Float32:   44.390 μs (0 allocations: 0 bytes)
  Float16:   22.351 μs (0 allocations: 0 bytes)
With 1% subnormals
  Float64:   88.240 μs (0 allocations: 0 bytes)
  Float32:   44.360 μs (0 allocations: 0 bytes)
  Float16:   22.400 μs (0 allocations: 0 bytes)

This just benchmarks element-wise addition of two vectors (Float64, Float32, or Float16) of length 100_000.
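
The source of `benchmark_float16_subnormals` is not shown in this thread, so the following is only a rough sketch of that kind of measurement, under several assumptions: it uses `float` because standard C++ has no portable half-precision type, it places a subnormal at every 100th element to get 1%, and `make_input`, `add_elementwise`, and `time_add_us` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical input generator: all elements are 1.0f, except that every
// `stride`-th element is a subnormal (half the smallest normal float).
// A stride of 0 produces no subnormals at all.
std::vector<float> make_input(std::size_t n, std::size_t stride)
{
    std::vector<float> v(n, 1.0f);
    const float sub = std::numeric_limits<float>::min() / 2;
    for (std::size_t i = 0; stride != 0 && i < n; i += stride)
        v[i] = sub;
    return v;
}

// Element-wise c[i] = a[i] + b[i]; all three vectors must have equal length.
void add_elementwise(const std::vector<float> &a, const std::vector<float> &b,
                     std::vector<float> &c)
{
    assert(a.size() == b.size() && a.size() == c.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = a[i] + b[i];
}

// Wall-clock time of a single add_elementwise call, in microseconds.
double time_add_us(const std::vector<float> &a, const std::vector<float> &b,
                   std::vector<float> &c)
{
    const auto t0 = std::chrono::steady_clock::now();
    add_elementwise(a, b, c);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

A single timed call is noisy; a real comparison would repeat the call many times and take the minimum, as a benchmarking package would.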

vchuravy · 2021-03-27T03:58:54Z

> Have you checked if it is defined to write to reserved bits on chips without the fp16 feature? (I'll assume you are writing the correct bits)

The resource I based this off was https://developer.arm.com/documentation/100403/0201/register-descriptions/advanced-simd-and-floating-point-registers/aarch64-register-descriptions/fpcr--floating-point-control-register

If we want to be safe I can guard this behind a check for FP16 support (i.e. ID_AA64PFR0_EL1.)

yuyichao · 2021-03-27T20:30:53Z

> If we want to be safe I can guard this behind a check for FP16 support (i.e. ID_AA64PFR0_EL1.)

I couldn't find anything that says that the reserved bit is safe to access, so I think we should safeguard this. Initializing a variable holding this flag is probably OK.
The system register isn't accessible from EL0 (as the name suggests); the feature to check is fullfp16.
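
One way to picture the suggested safeguard is sketched below. This is an assumption-laden illustration, not the code that was merged: it detects the feature through the Linux auxiliary vector (`getauxval(AT_HWCAP)` with the `HWCAP_FPHP` bit, which is bit 9 on arm64 Linux), whereas Julia has its own CPU feature detection. `cpu_has_fullfp16`, `fz_mask_for`, and `zero_subnormals_mask` are hypothetical names.

```cpp
#include <cstdint>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> // getauxval, AT_HWCAP
#endif

constexpr std::uint32_t fpcr_fz_mask = UINT32_C(1) << 24;   // FPCR.FZ
constexpr std::uint32_t fpcr_fz16_mask = UINT32_C(1) << 19; // FPCR.FZ16

// arm64 Linux reports scalar half-precision arithmetic (fullfp16)
// as HWCAP_FPHP, bit 9 of AT_HWCAP.
constexpr unsigned long hwcap_fphp = 1UL << 9;

// Hypothetical feature probe that works from user space (EL0).
inline bool cpu_has_fullfp16()
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & hwcap_fphp) != 0;
#else
    return false; // conservative default: never touch the FZ16 bit
#endif
}

// Only include FZ16 when the CPU implements it, so the bit is never
// written on chips where it is reserved.
constexpr std::uint32_t fz_mask_for(bool has_fullfp16)
{
    return fpcr_fz_mask | (has_fullfp16 ? fpcr_fz16_mask : UINT32_C(0));
}

// The mask is computed once and cached in a static variable, in the spirit
// of "initializing a variable holding this flag".
inline std::uint32_t zero_subnormals_mask()
{
    static const std::uint32_t mask = fz_mask_for(cpu_has_fullfp16());
    return mask;
}
```

On targets other than AArch64 Linux the probe returns `false`, so the sketch degrades to the FZ-only mask.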

src/processor_arm.cpp

vchuravy · 2021-03-30T17:44:55Z

@milankl it would be good to check whether this still works on A64FX.

milankl · 2021-03-30T20:08:17Z

No problem! Will do that tomorrow

milankl · 2021-03-31T11:06:45Z

@vchuravy Yes, still works

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 1.1125369292536007e-308
 5.877472f-39
  Float16(3.05e-5)

julia> set_zero_subnormals(true)
true

julia> [floatmin(T)/2 for T in [Float64,Float32,Float16]]
3-element Vector{AbstractFloat}:
 0.0
 0.0f0
  Float16(0.0)

vchuravy added the system:arm ARMv7 and AArch64 label Mar 26, 2021

vchuravy requested a review from yuyichao March 26, 2021 20:17

yuyichao reviewed Mar 29, 2021

add FZ16 to fpcr_fz_mask for aarch64

02249d5

vchuravy force-pushed the vc/aarch_ftz_fp16 branch from 486ab69 to 02249d5 Compare March 29, 2021 17:03

vchuravy merged commit 15d876f into master Mar 31, 2021

vchuravy deleted the vc/aarch_ftz_fp16 branch March 31, 2021 14:20
