pytensor and BLAS problems on macOS 15 Sequoia with Apple Silicon  #1005

@danieltomasz

Describe the issue:

Since updating to macOS 15, I have had a problem using Apple's implementation of BLAS (Accelerate).
Installing pytensor with the conda from miniconda3-3.12-24.7.1-0 via conda create -n voxel-bayes-3.12 -c conda-forge pytensor seems to install OpenBLAS instead of Accelerate.

~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12   -c conda-forge  pytensor
Channels:
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12

  added / updated specs:
    - pytensor


The following NEW packages will be INSTALLED:

  accelerate         conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0 
  blas               conda-forge/osx-arm64::blas-2.124-openblas 
  blas-devel         conda-forge/osx-arm64::blas-devel-3.9.0-24_osxarm64_openblas 
  brotli-python      conda-forge/osx-arm64::brotli-python-1.1.0-py312hde4cb15_2 
  bzip2              conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7 
  ca-certificates    conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0 
  cctools_osx-arm64  conda-forge/osx-arm64::cctools_osx-arm64-1010.6-h4208deb_1 
  certifi            conda-forge/noarch::certifi-2024.8.30-pyhd8ed1ab_0 
  cffi               conda-forge/osx-arm64::cffi-1.17.1-py312h0fad829_0 
  charset-normalizer conda-forge/noarch::charset-normalizer-3.3.2-pyhd8ed1ab_0 
  clang              conda-forge/osx-arm64::clang-17.0.6-default_h360f5da_7 
  clang-17           conda-forge/osx-arm64::clang-17-17.0.6-default_h146c034_7 
  clang_impl_osx-ar~ conda-forge/osx-arm64::clang_impl_osx-arm64-17.0.6-he47c785_19 
  clang_osx-arm64    conda-forge/osx-arm64::clang_osx-arm64-17.0.6-h54d7cd3_19 
  clangxx            conda-forge/osx-arm64::clangxx-17.0.6-default_h360f5da_7 
  clangxx_impl_osx-~ conda-forge/osx-arm64::clangxx_impl_osx-arm64-17.0.6-h50f59cd_19 
  clangxx_osx-arm64  conda-forge/osx-arm64::clangxx_osx-arm64-17.0.6-h54d7cd3_19 
  colorama           conda-forge/noarch::colorama-0.4.6-pyhd8ed1ab_0 
  compiler-rt        conda-forge/osx-arm64::compiler-rt-17.0.6-h856b3c1_2 
  compiler-rt_osx-a~ conda-forge/noarch::compiler-rt_osx-arm64-17.0.6-h832e737_2 
  cons               conda-forge/noarch::cons-0.4.6-pyhd8ed1ab_0 
  etuples            conda-forge/noarch::etuples-0.3.9-pyhd8ed1ab_0 
  filelock           conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0 
  fsspec             conda-forge/noarch::fsspec-2024.9.0-pyhff2d567_0 
  gmp                conda-forge/osx-arm64::gmp-6.3.0-h7bae524_2 
  gmpy2              conda-forge/osx-arm64::gmpy2-2.1.5-py312h87fada9_2 
  h2                 conda-forge/noarch::h2-4.1.0-pyhd8ed1ab_0 
  hpack              conda-forge/noarch::hpack-4.0.0-pyh9f0ad1d_0 
  huggingface_hub    conda-forge/noarch::huggingface_hub-0.25.1-pyhd8ed1ab_0 
  hyperframe         conda-forge/noarch::hyperframe-6.0.1-pyhd8ed1ab_0 
  icu                conda-forge/osx-arm64::icu-75.1-hfee45f7_0 
  idna               conda-forge/noarch::idna-3.10-pyhd8ed1ab_0 
  jinja2             conda-forge/noarch::jinja2-3.1.4-pyhd8ed1ab_0 
  ld64_osx-arm64     conda-forge/osx-arm64::ld64_osx-arm64-951.9-hc81425b_1 
  libabseil          conda-forge/osx-arm64::libabseil-20240116.2-cxx17_h00cdb27_1 
  libblas            conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas 
  libcblas           conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas 
  libclang-cpp17     conda-forge/osx-arm64::libclang-cpp17-17.0.6-default_h146c034_7 
  libcxx             conda-forge/osx-arm64::libcxx-19.1.0-ha82da77_0 
  libcxx-devel       conda-forge/osx-arm64::libcxx-devel-17.0.6-h86353a2_6 
  libexpat           conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0 
  libffi             conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5 
  libgfortran        conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3 
  libgfortran5       conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3 
  libiconv           conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2 
  liblapack          conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas 
  liblapacke         conda-forge/osx-arm64::liblapacke-3.9.0-24_osxarm64_openblas 
  libllvm17          conda-forge/osx-arm64::libllvm17-17.0.6-h5090b49_2 
  libopenblas        conda-forge/osx-arm64::libopenblas-0.3.27-openmp_h517c56d_1 
  libprotobuf        conda-forge/osx-arm64::libprotobuf-4.25.3-hc39d83c_1 
  libsqlite          conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0 
  libtorch           conda-forge/osx-arm64::libtorch-2.4.0-cpu_generic_h4365fe2_1 
  libuv              conda-forge/osx-arm64::libuv-1.49.0-hd74edd7_0 
  libxml2            conda-forge/osx-arm64::libxml2-2.12.7-h01dff8b_4 
  libzlib            conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1 
  llvm-openmp        conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1 
  llvm-tools         conda-forge/osx-arm64::llvm-tools-17.0.6-h5090b49_2 
  logical-unificati~ conda-forge/noarch::logical-unification-0.4.6-pyhd8ed1ab_0 
  macosx_deployment~ conda-forge/noarch::macosx_deployment_target_osx-arm64-11.0-h6553868_1 
  markupsafe         conda-forge/osx-arm64::markupsafe-2.1.5-py312h024a12e_1 
  minikanren         conda-forge/noarch::minikanren-1.0.3-pyhd8ed1ab_0 
  mpc                conda-forge/osx-arm64::mpc-1.3.1-h8f1351a_1 
  mpfr               conda-forge/osx-arm64::mpfr-4.2.1-hb693164_3 
  mpmath             conda-forge/noarch::mpmath-1.3.0-pyhd8ed1ab_0 
  multipledispatch   conda-forge/noarch::multipledispatch-0.6.0-pyhd8ed1ab_1 
  ncurses            conda-forge/osx-arm64::ncurses-6.5-h7bae524_1 
  networkx           conda-forge/noarch::networkx-3.3-pyhd8ed1ab_1 
  nomkl              conda-forge/noarch::nomkl-1.0-h5ca1d4c_0 
  numpy              conda-forge/osx-arm64::numpy-1.26.4-py312h8442bc7_0 
  openblas           conda-forge/osx-arm64::openblas-0.3.27-openmp_h560b219_1 
  openssl            conda-forge/osx-arm64::openssl-3.3.2-h8359307_0 
  packaging          conda-forge/noarch::packaging-24.1-pyhd8ed1ab_0 
  pip                conda-forge/noarch::pip-24.2-pyh8b19718_1 
  psutil             conda-forge/osx-arm64::psutil-6.0.0-py312h024a12e_1 
  pycparser          conda-forge/noarch::pycparser-2.22-pyhd8ed1ab_0 
  pysocks            conda-forge/noarch::pysocks-1.7.1-pyha2e5f31_6 
  pytensor           conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0 
  pytensor-base      conda-forge/osx-arm64::pytensor-base-2.25.4-py312h02baea5_0 
  python             conda-forge/osx-arm64::python-3.12.6-h739c21a_1_cpython 
  python_abi         conda-forge/osx-arm64::python_abi-3.12-5_cp312 
  pytorch            conda-forge/osx-arm64::pytorch-2.4.0-cpu_generic_py312h6bd8f41_1 
  pyyaml             conda-forge/osx-arm64::pyyaml-6.0.2-py312h024a12e_1 
  readline           conda-forge/osx-arm64::readline-8.2-h92ec313_1 
  requests           conda-forge/noarch::requests-2.32.3-pyhd8ed1ab_0 
  safetensors        conda-forge/osx-arm64::safetensors-0.4.5-py312he431725_0 
  scipy              conda-forge/osx-arm64::scipy-1.14.1-py312heb3a901_0 
  setuptools         conda-forge/noarch::setuptools-75.1.0-pyhd8ed1ab_0 
  sigtool            conda-forge/osx-arm64::sigtool-0.1.3-h44b9a77_0 
  six                conda-forge/noarch::six-1.16.0-pyh6c4a22f_0 
  sleef              conda-forge/osx-arm64::sleef-3.7-h7783ee8_0 
  sympy              conda-forge/noarch::sympy-1.13.3-pypyh2585a3b_103 
  tapi               conda-forge/osx-arm64::tapi-1300.6.5-h03f4b80_0 
  tk                 conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1 
  toolz              conda-forge/noarch::toolz-0.12.1-pyhd8ed1ab_0 
  tqdm               conda-forge/noarch::tqdm-4.66.5-pyhd8ed1ab_0 
  typing-extensions  conda-forge/noarch::typing-extensions-4.12.2-hd8ed1ab_0 
  typing_extensions  conda-forge/noarch::typing_extensions-4.12.2-pyha770c72_0 
  tzdata             conda-forge/noarch::tzdata-2024a-h8827d51_1 
  urllib3            conda-forge/noarch::urllib3-2.2.3-pyhd8ed1ab_0 
  wheel              conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0 
  xz                 conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0 
  yaml               conda-forge/osx-arm64::yaml-0.2.5-h3422bc3_2 
  zstandard          conda-forge/osx-arm64::zstandard-0.23.0-py312h15fbf35_1 
  zstd               conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0 


Proceed ([y]/n)? y
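
To read the BLAS backend off the install plan above, here is a minimal sketch. It assumes that conda-forge puts the implementation name in the last underscore-separated token of the libblas, libcblas and liblapack build strings, which matches the `24_osxarm64_openblas` builds shown here and the `libblas=*=*accelerate` selector. The helper name `blas_backend` is made up for illustration.

```python
# Package lines copied from the install plan above.
PLAN_LINES = [
    "libblas            conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas",
    "libcblas           conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas",
    "liblapack          conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas",
    "accelerate         conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0",
]

# Only these packages carry the BLAS implementation in their build string
# (assumption based on the builds listed in the plan).
BLAS_PACKAGES = {"libblas", "libcblas", "liblapack"}


def blas_backend(plan_line):
    """Return the backend token of a BLAS package line, or None for other packages."""
    name, spec = plan_line.split()
    if name not in BLAS_PACKAGES:
        return None
    build = spec.rsplit("-", 1)[1]    # e.g. "24_osxarm64_openblas"
    return build.rsplit("_", 1)[1]    # e.g. "openblas"


for line in PLAN_LINES:
    print(line.split()[0], "->", blas_backend(line))
```

The helper deliberately ignores the `accelerate` line: it is a noarch Python package, and as far as I can tell it is unrelated to Apple's Accelerate framework, so it says nothing about which BLAS is linked.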

Running the BLAS check:

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

        Some results that you can compare against. They were 10 executions
        of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
        All memory layout was in C order.

        CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
                    Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
                    Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
                    Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
                    Core i7 950(3.07GHz, hyper-threads enabled)
                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)


        Libraries tested:
            * numpy with ATLAS from distribution (FC9) package (1 thread)
            * manually compiled numpy and ATLAS with 2 threads
            * goto 1.26 with 1, 2, 4 and 8 threads
            * goto2 1.13 compiled with multiple threads enabled

                          Xeon   Xeon   Xeon  Core2 i7    i7     Xeon   Xeon
        lib/nb threads    E5345  E5430  E5450 E8500 930   950    X5560  X5550

        numpy 1.3.0 blas                                                775.92s
        numpy_FC9_atlas/1 39.2s  35.0s  30.7s 29.6s 21.5s 19.60s
        goto/1            18.7s  16.1s  14.2s 13.7s 16.1s 14.67s
        numpy_MAN_atlas/2 12.0s  11.6s  10.2s  9.2s  9.0s
        goto/2             9.5s   8.1s   7.1s  7.3s  8.1s  7.4s
        goto/4             4.9s   4.4s   3.7s  -     4.1s  3.8s
        goto/8             2.7s   2.4s   2.0s  -     4.1s  3.8s
        openblas/1                                        14.04s
        openblas/2                                         7.16s
        openblas/4                                         3.71s
        openblas/8                                         3.70s
        mkl 11.0.083/1            7.97s
        mkl 10.2.2.025/1                                         13.7s
        mkl 10.2.2.025/2                                          7.6s
        mkl 10.2.2.025/4                                          4.0s
        mkl 10.2.2.025/8                                          2.0s
        goto2 1.13/1                                                     14.37s
        goto2 1.13/2                                                      7.26s
        goto2 1.13/4                                                      3.70s
        goto2 1.13/8                                                      1.94s
        goto2 1.13/16                                                     3.16s

        Test time in float32. There were 10 executions of gemm in
        float32 with matrices of shape 5000x5000 (M=N=K=5000)
        All memory layout was in C order.


        cuda version      8.0    7.5    7.0
        gpu
        M40               0.45s  0.47s
        k80               0.92s  0.96s
        K6000/NOECC       0.71s         0.69s
        P6000/NOECC       0.25s

        Titan X (Pascal)  0.28s
        GTX Titan X       0.45s  0.45s  0.47s
        GTX Titan Black   0.66s  0.64s  0.64s
        GTX 1080          0.35s
        GTX 980 Ti               0.41s
        GTX 970                  0.66s
        GTX 680                         1.57s
        GTX 750 Ti               2.01s  2.01s
        GTX 750                  2.46s  2.37s
        GTX 660                  2.32s  2.32s
        GTX 580                  2.42s
        GTX 480                  2.87s
        TX1                             7.6s (float32 storage and computation)
        GT 610                          33.5s
        
Some PyTensor flags:
    blas__ldflags= -L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib
    compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ]
    sys.prefix= /Users/daniel/.pyenv/versions/voxel-bayes-3.12
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include
    lib directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib
    name: blas
    openblas configuration: unknown
    pc file directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig
    version: 3.9.0
  lapack:
    detection method: internal
    found: true
    include directory: unknown
    lib directory: unknown
    name: dep4569863840
    openblas configuration: unknown
    pc file directory: unknown
    version: 1.26.4
Compilers:
  c:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
      /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
      -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
      -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  c++:
    args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
      -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -mmacosx-version-min=11.0
    commands: arm64-apple-darwin20.0.0-clang++
    linker: ld64
    linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
      -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
      -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
      -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
      -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
      -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
      -mmacosx-version-min=11.0
    name: clang
    version: 16.0.6
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.8
Machine Information:
  build:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
  cross-compiled: true
  host:
    cpu: arm64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python
  version: '3.12'
SIMD Extensions:
  baseline:
  - NEON
  - NEON_FP16
  - NEON_VFPV4
  - ASIMD
  found:
  - ASIMDHP
  not found:
  - ASIMDFHM

Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 31.56s on CPU (with direct PyTensor binding to blas).

Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
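
The blas__ldflags value reported above can be split into its parts to see what PyTensor links against. This is only a sketch that parses the string copied from the output; the helper name `split_ldflags` is made up for illustration, and it does not query PyTensor itself.

```python
import shlex

# Value of blas__ldflags copied from the check_blas.py output above.
LDFLAGS = (
    "-L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib "
    "-llapack -lblas -lcblas -lm "
    "-Wl,-rpath,/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib"
)


def split_ldflags(flags):
    """Split linker flags into library directories, library names and everything else."""
    dirs, libs, other = [], [], []
    for token in shlex.split(flags):
        if token.startswith("-L"):
            dirs.append(token[2:])
        elif token.startswith("-l"):
            libs.append(token[2:])
        else:
            other.append(token)
    return dirs, libs, other


dirs, libs, other = split_ldflags(LDFLAGS)
print("library dirs:", dirs)
print("libraries:   ", libs)
print("other flags: ", other)
```

The flags only name the generic lapack, blas and cblas libraries in the environment's lib directory. So, if I read this correctly, the backend that actually runs is decided by which libblas build is installed there, not by the flags themselves.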

Running the same command in an environment where pytensor was installed with pip gives this:

Some PyTensor flags:
    blas__ldflags= 
    compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= darwin
    sys.version= 3.12.6 (main, Sep 28 2024, 17:45:34) [Clang 15.0.0 (clang-1500.3.9.4)]
    sys.prefix= /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "clang",
      "linker": "ld64",
      "version": "14.0.0",
      "commands": "cc",
      "args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
      "linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.8",
      "commands": "cython"
    },
    "c++": {
      "name": "clang",
      "linker": "ld64",
      "version": "14.0.0",
      "commands": "c++",
      "args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
      "linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "aarch64",
      "family": "aarch64",
      "endian": "little",
      "system": "darwin"
    },
    "build": {
      "cpu": "aarch64",
      "family": "aarch64",
      "endian": "little",
      "system": "darwin"
    }
  },
  "Build Dependencies": {
    "blas": {
      "name": "openblas64",
      "found": true,
      "version": "0.3.23.dev",
      "detection method": "pkgconfig",
      "include directory": "/opt/arm64-builds/include",
      "lib directory": "/opt/arm64-builds/lib",
      "openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3",
      "pc file directory": "/usr/local/lib/pkgconfig"
    },
    "lapack": {
      "name": "dep4335021056",
      "found": true,
      "version": "1.26.4",
      "detection method": "internal",
      "include directory": "unknown",
      "lib directory": "unknown",
      "openblas configuration": "unknown",
      "pc file directory": "unknown"
    }
  },
  "Python Information": {
    "path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "NEON",
      "NEON_FP16",
      "NEON_VFPV4",
      "ASIMD"
    ],
    "found": [
      "ASIMDHP"
    ],
    "not found": [
      "ASIMDFHM"
    ]
  }
}
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 45.75s on CPU (with direct PyTensor binding to blas).

Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
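
To compare the two timings above on a common scale, they can be converted to throughput. This is a rough sketch that assumes each gemm call on n x n matrices costs about 2*n^3 floating point operations; the function name `gemm_gflops` is made up for illustration.

```python
def gemm_gflops(n, calls, seconds):
    """Approximate GFLOP/s for `calls` gemm calls on n x n matrices (assumes 2*n**3 flops per call)."""
    flops = 2.0 * n**3 * calls
    return flops / seconds / 1e9


# Timings copied from the two check_blas.py runs above (10 calls, 5000 x 5000).
conda_openblas = gemm_gflops(5000, 10, 31.56)   # conda-forge environment
pip_install = gemm_gflops(5000, 10, 45.75)      # pip environment

print(f"conda-forge env: {conda_openblas:.1f} GFLOP/s")
print(f"pip env:         {pip_install:.1f} GFLOP/s")
print(f"ratio:           {conda_openblas / pip_install:.2f}x")
```

Under that assumption the conda-forge run works out to roughly 79 GFLOP/s and the pip run to roughly 55 GFLOP/s, both in float64.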

When I try to select Accelerate the old way, via "libblas=*=*accelerate" when creating the conda environment, the check script fails. I copied the output here: https://discourse.pymc.io/t/pytensor-support-to-apple-accelerate-blas-with-conda-forge-on-macos-15/15131/2

Reproducible code example:

from `python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")`

Error message:

No response

PyTensor version information:

conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0

Context for the issue:

No response
