This repository was archived by the owner on Apr 28, 2025. It is now read-only.
Commit 51718a1
Add assembly version of simple operations on aarch64
For aarch64 and arm64ec with Neon, add assembly versions of the
following:
* `ceil`
* `ceilf`
* `fabs`
* `fabsf`
* `floor`
* `floorf`
* `fma`
* `fmaf`
* `round`
* `roundf`
* `sqrt`
* `sqrtf`
* `trunc`
* `truncf`
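As a rough illustration of what an assembly version of one of these operations can look like, here is a minimal sketch for `ceil` using the AArch64 `frintp` instruction (round to integral, toward plus infinity). It is not taken from the diff: the `cfg` gating and the portable fallback are assumptions added so the snippet builds on any target.

```rust
/// Round toward positive infinity using a single AArch64 instruction.
/// Sketch only; the cfg condition is an assumption, not copied from the commit.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn ceil(mut x: f64) -> f64 {
    // SAFETY: `frintp` only touches the named register, accesses no memory,
    // and is available whenever floating-point/Neon support is enabled.
    unsafe {
        core::arch::asm!(
            // `:d` selects the 64-bit (double) view of the vector register.
            "frintp {x:d}, {x:d}",
            x = inout(vreg) x,
            options(pure, nomem, nostack)
        );
    }
    x
}

/// Fallback for every other target, so the example compiles everywhere.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn ceil(x: f64) -> f64 {
    x.ceil()
}
```

The other operations in the list would follow the same shape with a different instruction. For example, `frintm` rounds toward minus infinity and `fsqrt` computes a square root.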
If the `fp16` target feature is available, which implies `neon`, also
include the following:
* `ceilf16`
* `fabsf16`
* `floorf16`
* `rintf16`
* `sqrtf16`
* `truncf16`
Additionally, replace `core::arch` versions of the following with
handwritten assembly (which avoids issues with `aarch64be`):
* `rint`
* `rintf`
Instructions for `fmax` and `fmin` are also available, but they seem to
produce different results depending on whether NaN inputs are signaling or
quiet. Our current implementation does not make this distinction, so omit
these for now.
1 parent bc6a615 commit 51718a1
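To make the NaN concern from the commit message concrete, the sketch below shows a portable maximum that treats every NaN the same way, whether quiet or signaling. The function name, the constants, and the exact branching are hypothetical and only approximate the behavior the message describes. They are not the crate's actual `fmax`. The bit patterns assume the usual IEEE 754 binary64 encoding, in which the top mantissa bit marks a NaN as quiet.

```rust
/// A quiet NaN: exponent all ones, top mantissa bit set.
pub const QUIET_NAN_BITS: u64 = 0x7FF8_0000_0000_0000;
/// A signaling NaN: exponent all ones, top mantissa bit clear, nonzero payload.
pub const SIGNALING_NAN_BITS: u64 = 0x7FF0_0000_0000_0001;

/// Hypothetical portable maximum that ignores the kind of NaN:
/// if exactly one argument is NaN, the other argument is returned.
pub fn fmax_ignoring_nan_kind(x: f64, y: f64) -> f64 {
    if x.is_nan() {
        y
    } else if y.is_nan() {
        x
    } else if x > y {
        x
    } else {
        y
    }
}
```

With this definition, passing `f64::from_bits(QUIET_NAN_BITS)` or `f64::from_bits(SIGNALING_NAN_BITS)` alongside `1.0` yields `1.0` in both cases. An instruction that distinguishes the two kinds of NaN would not be a drop-in replacement.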
File tree (25 files changed, +391 -36 lines):
- etc
- src/math
- arch