Describe the issue:
Since updating to macOS 15, I have had a problem using Apple's implementation of BLAS (Accelerate).
Installing pytensor from miniconda3-3.12-24.7.1-0 via `conda create -n voxel-bayes-3.12 -c conda-forge pytensor` seems to install openblas instead of accelerate.
~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor
Channels:
- conda-forge
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12
added / updated specs:
- pytensor
The following NEW packages will be INSTALLED:
accelerate conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0
blas conda-forge/osx-arm64::blas-2.124-openblas
blas-devel conda-forge/osx-arm64::blas-devel-3.9.0-24_osxarm64_openblas
brotli-python conda-forge/osx-arm64::brotli-python-1.1.0-py312hde4cb15_2
bzip2 conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7
ca-certificates conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0
cctools_osx-arm64 conda-forge/osx-arm64::cctools_osx-arm64-1010.6-h4208deb_1
certifi conda-forge/noarch::certifi-2024.8.30-pyhd8ed1ab_0
cffi conda-forge/osx-arm64::cffi-1.17.1-py312h0fad829_0
charset-normalizer conda-forge/noarch::charset-normalizer-3.3.2-pyhd8ed1ab_0
clang conda-forge/osx-arm64::clang-17.0.6-default_h360f5da_7
clang-17 conda-forge/osx-arm64::clang-17-17.0.6-default_h146c034_7
clang_impl_osx-ar~ conda-forge/osx-arm64::clang_impl_osx-arm64-17.0.6-he47c785_19
clang_osx-arm64 conda-forge/osx-arm64::clang_osx-arm64-17.0.6-h54d7cd3_19
clangxx conda-forge/osx-arm64::clangxx-17.0.6-default_h360f5da_7
clangxx_impl_osx-~ conda-forge/osx-arm64::clangxx_impl_osx-arm64-17.0.6-h50f59cd_19
clangxx_osx-arm64 conda-forge/osx-arm64::clangxx_osx-arm64-17.0.6-h54d7cd3_19
colorama conda-forge/noarch::colorama-0.4.6-pyhd8ed1ab_0
compiler-rt conda-forge/osx-arm64::compiler-rt-17.0.6-h856b3c1_2
compiler-rt_osx-a~ conda-forge/noarch::compiler-rt_osx-arm64-17.0.6-h832e737_2
cons conda-forge/noarch::cons-0.4.6-pyhd8ed1ab_0
etuples conda-forge/noarch::etuples-0.3.9-pyhd8ed1ab_0
filelock conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0
fsspec conda-forge/noarch::fsspec-2024.9.0-pyhff2d567_0
gmp conda-forge/osx-arm64::gmp-6.3.0-h7bae524_2
gmpy2 conda-forge/osx-arm64::gmpy2-2.1.5-py312h87fada9_2
h2 conda-forge/noarch::h2-4.1.0-pyhd8ed1ab_0
hpack conda-forge/noarch::hpack-4.0.0-pyh9f0ad1d_0
huggingface_hub conda-forge/noarch::huggingface_hub-0.25.1-pyhd8ed1ab_0
hyperframe conda-forge/noarch::hyperframe-6.0.1-pyhd8ed1ab_0
icu conda-forge/osx-arm64::icu-75.1-hfee45f7_0
idna conda-forge/noarch::idna-3.10-pyhd8ed1ab_0
jinja2 conda-forge/noarch::jinja2-3.1.4-pyhd8ed1ab_0
ld64_osx-arm64 conda-forge/osx-arm64::ld64_osx-arm64-951.9-hc81425b_1
libabseil conda-forge/osx-arm64::libabseil-20240116.2-cxx17_h00cdb27_1
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas
libclang-cpp17 conda-forge/osx-arm64::libclang-cpp17-17.0.6-default_h146c034_7
libcxx conda-forge/osx-arm64::libcxx-19.1.0-ha82da77_0
libcxx-devel conda-forge/osx-arm64::libcxx-devel-17.0.6-h86353a2_6
libexpat conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0
libffi conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5
libgfortran conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3
libgfortran5 conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3
libiconv conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2
liblapack conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas
liblapacke conda-forge/osx-arm64::liblapacke-3.9.0-24_osxarm64_openblas
libllvm17 conda-forge/osx-arm64::libllvm17-17.0.6-h5090b49_2
libopenblas conda-forge/osx-arm64::libopenblas-0.3.27-openmp_h517c56d_1
libprotobuf conda-forge/osx-arm64::libprotobuf-4.25.3-hc39d83c_1
libsqlite conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0
libtorch conda-forge/osx-arm64::libtorch-2.4.0-cpu_generic_h4365fe2_1
libuv conda-forge/osx-arm64::libuv-1.49.0-hd74edd7_0
libxml2 conda-forge/osx-arm64::libxml2-2.12.7-h01dff8b_4
libzlib conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1
llvm-openmp conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1
llvm-tools conda-forge/osx-arm64::llvm-tools-17.0.6-h5090b49_2
logical-unificati~ conda-forge/noarch::logical-unification-0.4.6-pyhd8ed1ab_0
macosx_deployment~ conda-forge/noarch::macosx_deployment_target_osx-arm64-11.0-h6553868_1
markupsafe conda-forge/osx-arm64::markupsafe-2.1.5-py312h024a12e_1
minikanren conda-forge/noarch::minikanren-1.0.3-pyhd8ed1ab_0
mpc conda-forge/osx-arm64::mpc-1.3.1-h8f1351a_1
mpfr conda-forge/osx-arm64::mpfr-4.2.1-hb693164_3
mpmath conda-forge/noarch::mpmath-1.3.0-pyhd8ed1ab_0
multipledispatch conda-forge/noarch::multipledispatch-0.6.0-pyhd8ed1ab_1
ncurses conda-forge/osx-arm64::ncurses-6.5-h7bae524_1
networkx conda-forge/noarch::networkx-3.3-pyhd8ed1ab_1
nomkl conda-forge/noarch::nomkl-1.0-h5ca1d4c_0
numpy conda-forge/osx-arm64::numpy-1.26.4-py312h8442bc7_0
openblas conda-forge/osx-arm64::openblas-0.3.27-openmp_h560b219_1
openssl conda-forge/osx-arm64::openssl-3.3.2-h8359307_0
packaging conda-forge/noarch::packaging-24.1-pyhd8ed1ab_0
pip conda-forge/noarch::pip-24.2-pyh8b19718_1
psutil conda-forge/osx-arm64::psutil-6.0.0-py312h024a12e_1
pycparser conda-forge/noarch::pycparser-2.22-pyhd8ed1ab_0
pysocks conda-forge/noarch::pysocks-1.7.1-pyha2e5f31_6
pytensor conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0
pytensor-base conda-forge/osx-arm64::pytensor-base-2.25.4-py312h02baea5_0
python conda-forge/osx-arm64::python-3.12.6-h739c21a_1_cpython
python_abi conda-forge/osx-arm64::python_abi-3.12-5_cp312
pytorch conda-forge/osx-arm64::pytorch-2.4.0-cpu_generic_py312h6bd8f41_1
pyyaml conda-forge/osx-arm64::pyyaml-6.0.2-py312h024a12e_1
readline conda-forge/osx-arm64::readline-8.2-h92ec313_1
requests conda-forge/noarch::requests-2.32.3-pyhd8ed1ab_0
safetensors conda-forge/osx-arm64::safetensors-0.4.5-py312he431725_0
scipy conda-forge/osx-arm64::scipy-1.14.1-py312heb3a901_0
setuptools conda-forge/noarch::setuptools-75.1.0-pyhd8ed1ab_0
sigtool conda-forge/osx-arm64::sigtool-0.1.3-h44b9a77_0
six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0
sleef conda-forge/osx-arm64::sleef-3.7-h7783ee8_0
sympy conda-forge/noarch::sympy-1.13.3-pypyh2585a3b_103
tapi conda-forge/osx-arm64::tapi-1300.6.5-h03f4b80_0
tk conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1
toolz conda-forge/noarch::toolz-0.12.1-pyhd8ed1ab_0
tqdm conda-forge/noarch::tqdm-4.66.5-pyhd8ed1ab_0
typing-extensions conda-forge/noarch::typing-extensions-4.12.2-hd8ed1ab_0
typing_extensions conda-forge/noarch::typing_extensions-4.12.2-pyha770c72_0
tzdata conda-forge/noarch::tzdata-2024a-h8827d51_1
urllib3 conda-forge/noarch::urllib3-2.2.3-pyhd8ed1ab_0
wheel conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0
xz conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0
yaml conda-forge/osx-arm64::yaml-0.2.5-h3422bc3_2
zstandard conda-forge/osx-arm64::zstandard-0.23.0-py312h15fbf35_1
zstd conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0
Proceed ([y]/n)? y
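To make the BLAS selection in a plan like the one above easier to spot, here is a minimal sketch that extracts the variant from the build strings. It assumes the variant is the last underscore-separated token of the build string, as it is in the lines above; the names `PLAN_LINE`, `BLAS_PACKAGES`, and `blas_variants` are hypothetical helpers, not part of conda or PyTensor. Note that the `accelerate` entry in the plan is a noarch Python package and appears to be unrelated to Apple's Accelerate framework, so it is deliberately not treated as a BLAS package here.

```python
import re

# Matches one line of conda's "NEW packages will be INSTALLED" listing, e.g.
#   libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
PLAN_LINE = re.compile(r"^\s*(?P<name>\S+)\s+\S+::(?P<dist>\S+)\s*$")

# Packages whose build string names the BLAS implementation in the plan above.
# (Assumption: this set is chosen by hand from the plan in this issue.)
BLAS_PACKAGES = {"libblas", "libcblas", "liblapack", "liblapacke", "blas-devel"}


def blas_variants(plan_text):
    """Return {package: variant} for the BLAS-related lines of a package plan."""
    variants = {}
    for line in plan_text.splitlines():
        match = PLAN_LINE.match(line)
        if match and match.group("name") in BLAS_PACKAGES:
            # Assumed convention: the variant is the last "_"-separated token.
            variants[match.group("name")] = match.group("dist").rsplit("_", 1)[-1]
    return variants


plan = """\
accelerate conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas
liblapack conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas
"""
print(blas_variants(plan))
```

For the plan in this issue, every BLAS-related package resolves to the openblas variant.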
Running the BLAS check:
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
Some results that you can compare against. They were 10 executions
of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
All memory layout was in C order.
CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Libraries tested:
* numpy with ATLAS from distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads
* goto2 1.13 compiled with multiple threads enabled
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s
goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s
goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s
openblas/1 14.04s
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32. There were 10 executions of gemm in
float32 with matrices of shape 5000x5000 (M=N=K=5000)
All memory layout was in C order.
cuda version 8.0 7.5 7.0
gpu
M40 0.45s 0.47s
k80 0.92s 0.96s
K6000/NOECC 0.71s 0.69s
P6000/NOECC 0.25s
Titan X (Pascal) 0.28s
GTX Titan X 0.45s 0.45s 0.47s
GTX Titan Black 0.66s 0.64s 0.64s
GTX 1080 0.35s
GTX 980 Ti 0.41s
GTX 970 0.66s
GTX 680 1.57s
GTX 750 Ti 2.01s 2.01s
GTX 750 2.46s 2.37s
GTX 660 2.32s 2.32s
GTX 580 2.42s
GTX 480 2.87s
TX1 7.6s (float32 storage and computation)
GT 610 33.5s
Some PyTensor flags:
blas__ldflags= -L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib
compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ]
sys.prefix= /Users/daniel/.pyenv/versions/voxel-bayes-3.12
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
blas:
detection method: pkgconfig
found: true
include directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include
lib directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib
name: blas
openblas configuration: unknown
pc file directory: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig
version: 3.9.0
lapack:
detection method: internal
found: true
include directory: unknown
lib directory: unknown
name: dep4569863840
openblas configuration: unknown
pc file directory: unknown
version: 1.26.4
Compilers:
c:
args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-mmacosx-version-min=11.0
commands: arm64-apple-darwin20.0.0-clang
linker: ld64
linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
-L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-mmacosx-version-min=11.0
name: clang
version: 16.0.6
c++:
args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
-fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-mmacosx-version-min=11.0
commands: arm64-apple-darwin20.0.0-clang++
linker: ld64
linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
-L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib,
-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
-fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include,
-mmacosx-version-min=11.0
name: clang
version: 16.0.6
cython:
commands: cython
linker: cython
name: cython
version: 3.0.8
Machine Information:
build:
cpu: aarch64
endian: little
family: aarch64
system: darwin
cross-compiled: true
host:
cpu: arm64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python
version: '3.12'
SIMD Extensions:
baseline:
- NEON
- NEON_FP16
- NEON_VFPV4
- ASIMD
found:
- ASIMDHP
not found:
- ASIMDFHM
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 31.56s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
When I run the same command in an environment where pytensor was installed with pip, I get this:
Some PyTensor flags:
blas__ldflags=
compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.6 (main, Sep 28 2024, 17:45:34) [Clang 15.0.0 (clang-1500.3.9.4)]
sys.prefix= /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "cc",
"args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "c++",
"args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
}
},
"Machine Information": {
"host": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
}
},
"Build Dependencies": {
"blas": {
"name": "openblas64",
"found": true,
"version": "0.3.23.dev",
"detection method": "pkgconfig",
"include directory": "/opt/arm64-builds/include",
"lib directory": "/opt/arm64-builds/lib",
"openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3",
"pc file directory": "/usr/local/lib/pkgconfig"
},
"lapack": {
"name": "dep4335021056",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/3.12.6/envs/zotero-3.12.6/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 45.75s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
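To compare the two runs, the timing line can be pulled out of each report and the ratio computed. This is a small sketch: the regular expression and the `total_time` helper are hypothetical and only assume the "Total execution time: …s" wording shown in the two outputs above.

```python
import re

# Assumed wording, taken from the check_blas.py output quoted in this issue.
TIME_RE = re.compile(r"Total execution time:\s*([0-9.]+)s")


def total_time(report):
    """Return the total gemm time in seconds found in a check_blas report."""
    match = TIME_RE.search(report)
    if match is None:
        raise ValueError("no 'Total execution time' line found")
    return float(match.group(1))


# The two timing lines reported above (conda-forge env, then pip env).
conda_run = "Total execution time: 31.56s on CPU (with direct PyTensor binding to blas"
pip_run = "Total execution time: 45.75s on CPU (with direct PyTensor binding to blas)."

ratio = total_time(pip_run) / total_time(conda_run)
print(f"pip run took {ratio:.2f}x the time of the conda run")
```

Both environments use OpenBLAS according to the outputs above, so this ratio compares two OpenBLAS setups and says nothing about Accelerate.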
When I specify Accelerate the old way, by adding "libblas=*=*accelerate" when creating the conda environment, the check fails to run. I copied the output here: https://discourse.pymc.io/t/pytensor-support-to-apple-accelerate-blas-with-conda-forge-on-macos-15/15131/2
Reproducible code example:
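For reference, the pinned install can be written as follows. This is only a sketch of how the spec is passed and quoted: it combines the `conda create` command from the top of this issue with the "libblas=*=*accelerate" pin, and `conda_create_command` is a hypothetical helper. It builds the command without running it, and it does not work around the failure described above.

```python
import shlex


def conda_create_command(env_name, blas_pin="libblas=*=*accelerate"):
    """Build (but do not run) a conda create command with a libblas build pin."""
    return [
        "conda", "create",
        "-n", env_name,
        "-c", "conda-forge",
        "pytensor",
        blas_pin,  # build-string pin; needs quoting in a shell because of "*"
    ]


cmd = conda_create_command("voxel-bayes-3.12")
# shlex.join quotes the pin so the shell does not expand the "*" characters.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would avoid shell quoting altogether.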
from `python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")`
Error message:
No response
PyTensor version information:
conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0
Context for the issue:
No response