
Add support for AMX instructions (bf16 and/or int8) #2555

@WilliamTambellini

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Please provide a detailed written description of what you were trying to do:
Running any ggml model at int8 precision.
What you expected llama.cpp to do:
Use AMX acceleration.
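
For context, the int8 operation that AMX accelerates is a tile dot product: groups of four signed 8-bit values are multiplied pairwise and accumulated into 32-bit integers (the TDPBSSD instruction). Below is a scalar reference sketch in C of that arithmetic. It is not llama.cpp or ggml code; the function name `tile_dot_s8_ref` and the row-major layout parameters are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for a signed int8 tile dot product.
 * Hypothetical helper, not part of ggml or llama.cpp.
 *   a: M rows x (4*K4) int8 values, row-major
 *   b: K4 rows x (4*N) int8 values; each row holds N groups of 4 bytes
 *   c: M x N int32 accumulators, row-major (updated in place)
 */
void tile_dot_s8_ref(int32_t *c, const int8_t *a, const int8_t *b,
                     size_t M, size_t N, size_t K4)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t m = 0; m < M; m++) {
        for (size_t k = 0; k < K4; k++) {
            for (size_t n = 0; n < N; n++) {
                int32_t acc = 0;
                /* Multiply four int8 pairs, widened to 32 bits. */
                for (size_t i = 0; i < 4; i++) {
                    acc += (int32_t)a[m * 4 * K4 + 4 * k + i] *
                           (int32_t)b[k * 4 * N + 4 * n + i];
                }
                c[m * N + n] += acc;
            }
        }
    }
}
```

An AMX code path would be expected to produce the same accumulators as this loop, so a reference like this could serve as a correctness check for any future kernel.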

Current Behavior

llama.cpp does not seem to use any AMX instructions.

Environment and Context

Physical (or virtual) hardware you are using, e.g. for Linux:
    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Address sizes:         46 bits physical, 48 bits virtual
    Byte Order:            Little Endian
    CPU(s):                8
    On-line CPU(s) list:   0-7
    Vendor ID:             GenuineIntel
    Model name:            Intel(R) Xeon(R) Platinum 8488C
    CPU family:            6
    Model:                 143
    Thread(s) per core:    2
    Core(s) per socket:    4
    Socket(s):             1
    Stepping:              8
    BogoMIPS:              4800.00
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
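
The Flags line above includes `amx_tile`, `amx_int8` and `amx_bf16`, so the CPU and kernel both report AMX. As a rough sketch of how a tool could check such a flags string for whole-token matches, here is a small C helper. The function names `cpu_flags_have` and `cpu_flags_have_amx_int8` are hypothetical and are not part of llama.cpp or ggml; the flag names are the ones printed by `lscpu` above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `flag` occurs as a whole whitespace-separated token in `flags`.
 * Hypothetical helper for illustration only. */
int cpu_flags_have(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0) {
        return 0;
    }
    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at a token boundary... */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and end at one, so "avx512" does not match "avx512_vnni". */
        char end = p[len];
        int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts && ends) {
            return 1;
        }
        p += 1; /* partial match: keep searching after this position */
    }
    return 0;
}

/* int8 AMX kernels need both the tile state and the int8 instructions. */
int cpu_flags_have_amx_int8(const char *flags)
{
    return cpu_flags_have(flags, "amx_tile") &&
           cpu_flags_have(flags, "amx_int8");
}
```

Passing the Flags line from the `lscpu` output above to `cpu_flags_have_amx_int8` returns 1. A flags string is only a hint: a real runtime check would normally query CPUID directly and, on Linux, request permission to use the tile state before executing any AMX instruction.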

Operating System, e.g. for Linux:

    Fedora 37
    $ uname -a
    Linux  6.1.9-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb  2 00:21:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
    $ g++ --version
    11.3

ref:
https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html

https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-ec2-m7i-flex-m7i-instances/
