332 commits
61ebe99
[invoke_subgraph] Do not cache fake tensors for AOTDispatcher first p…
anijain2305 Apr 1, 2025
f09513e
[CUDA]][SymmetricMemory] Interpret empty string as `std::nullopt` in …
eqy Apr 2, 2025
5734909
[Profiler] Fix Empty C Call Queue (#150370)
sraikund16 Apr 2, 2025
063ea5d
[AOTInductor] Modify test for Memory tracking for memory-related (#15…
muchulee8 Apr 1, 2025
25eff6e
[dynamo] add reason field to torch.compiler.disable (#150341)
williamwen42 Apr 1, 2025
3ac5a49
[dynamo] add dynamo disable reasons to codebase (#150440)
williamwen42 Apr 1, 2025
dee016c
[MPSInductor] Add `store_reduce` method (#150457)
malfet Apr 1, 2025
c65de03
Add `Any` return annotation to `__getattr__` methods that return a un…
rchen152 Apr 2, 2025
0da8127
Compare device name of profiler dynamically (#150396)
elpis-furiosa Apr 2, 2025
3f54b14
[CUDAGraph] support meta tensor (#150478)
BoyuanFeng Apr 2, 2025
75f38df
cpp_wrapper: precompile a few more commonly used headers, and improve…
benjaminglass1 Apr 2, 2025
0313873
[AOTI] Emit Triton kernels as comment (#150188)
desertfire Mar 30, 2025
c41fbb4
Change arg_kwarg_vals propagation strategy (#148046)
zou3519 Apr 2, 2025
c69c3c8
Add needs_exact_strides operator tag for Inductor to force exact stri…
zou3519 Apr 2, 2025
4d121d2
Implement needs_exact_strides for mutable custom operators (#148091)
zou3519 Apr 2, 2025
aae3692
Rename node.meta["arg_kwarg_vals"] to node.meta["eager_input_vals"] (…
zou3519 Apr 2, 2025
5f62d07
Fix log2, PowByNatural printing (#147592)
isuruf Mar 3, 2025
82ceebc
[inductor] Lowerings for max_pool3d (#148210)
isuruf Apr 1, 2025
42c7c7f
[invoke_subgraph] Filter out grad_out where fw_out requires_grad is F…
anijain2305 Apr 2, 2025
8102272
[BE] Fix triton windows build (#150512)
chuanqi129 Apr 2, 2025
f38566d
[MPSInductor] Disable mm/bmm decompositions (#150541)
manuelcandales Apr 2, 2025
532530b
Revert "[Profiler] Fix Empty C Call Queue (#150370)"
pytorchmergebot Apr 2, 2025
98453c1
[dynamo] Support Tensor subclass that has dynamic attributes or calls…
StrongerXi Apr 2, 2025
203e1d6
[dynamo] Support `torch.Tensor._make_subclass` and tracing through te…
StrongerXi Apr 2, 2025
7e53c58
[dynamo] Support tensor subclass with overriden tensor methods and pr…
StrongerXi Apr 2, 2025
238109a
[dynamo] Always trace into tensor subclass `__torch_function__` (#149…
StrongerXi Apr 2, 2025
e62d958
[Inductor] Reland Merge Triton ScaledMM as epilogue to MM template #1…
PaulZhang12 Apr 2, 2025
cb4cd61
Address Cmake update issue in windows magma builds (#150549)
atalman Apr 2, 2025
d4298f2
[CI] Use system nccl in build (#150226)
clee2000 Apr 2, 2025
22030ef
expect fail scan test in sigmoid (#150475)
ydwu4 Apr 2, 2025
b03c421
Proactively remove CompiledTritonKernels before loading from cache/st…
jamesjwu Apr 1, 2025
af5c1b9
ci: Set minimum cmake version for halide build (#150560)
seemethere Apr 2, 2025
e545567
Revert "[dynamo] Always trace into tensor subclass `__torch_function_…
pytorchmergebot Apr 2, 2025
01411c7
Revert "[dynamo] Support tensor subclass with overriden tensor method…
pytorchmergebot Apr 2, 2025
18908c8
Revert "[dynamo] Support `torch.Tensor._make_subclass` and tracing th…
pytorchmergebot Apr 2, 2025
03c879d
Revert "[dynamo] Support Tensor subclass that has dynamic attributes …
pytorchmergebot Apr 2, 2025
a8f6b40
[inductor] skip non-trivial tiling if unbacked symints are present (#…
ColinPeppler Apr 1, 2025
85df0dc
[dynamo] emit only 1 graph break message on unrecoverable data-depend…
williamwen42 Apr 1, 2025
33535b3
[dynamo] Support Tensor subclass that has dynamic attributes or calls…
StrongerXi Apr 2, 2025
0d4dbfd
[dynamo] Support `torch.Tensor._make_subclass` and tracing through te…
StrongerXi Apr 2, 2025
3463ea1
[dynamo] Support tensor subclass with overriden tensor methods and pr…
StrongerXi Apr 2, 2025
bb98749
[dynamo] Always trace into tensor subclass `__torch_function__` (#149…
StrongerXi Apr 2, 2025
1017927
multidimensional slicing (#150104)
avikchaudhuri Apr 2, 2025
74aa9f5
ci: Use cache / progress when local docker build (#150551)
seemethere Apr 2, 2025
a677b49
[Profiler] Fix Empty C Call Queue (#150370)
sraikund16 Apr 2, 2025
0bacb90
[invoke_subgraph][min-cut partitioner] Fix bug to use the correct roo…
anijain2305 Apr 2, 2025
8667a00
Add stride + dtype to autotune results (#150419)
PaulZhang12 Apr 1, 2025
0198e44
Update torch-xpu-ops commit pin to 98c808d (#150554)
chuanqi129 Apr 2, 2025
de15ef0
[invoke_subgraph] Force grad_outs to be contiguous at tracing time (#…
anijain2305 Apr 2, 2025
61a1f09
Revert "[cuda] Add new faster gammabeta backward kernel (#148605)"
pytorchmergebot Apr 2, 2025
24f5065
fix bug in logging code (#150518)
exclamaforte Apr 2, 2025
f363fe6
[AOTInductor] Fix autotuning code's codegen (#150522)
muchulee8 Apr 3, 2025
13f4819
Add Chillee as core reviewer (#150579)
zou3519 Apr 2, 2025
77dca39
[aoti] make a check function for each input (#150553)
yushangdi Apr 3, 2025
2e5d95a
[FlexAttention] Remove dead code (#150575)
drisspg Apr 2, 2025
90ddb33
[export] specialize for aten.to (#149235)
pianpwk Apr 3, 2025
fc674b4
[c10d] Add logging for desync debug report (#150513)
fduwjj Apr 3, 2025
c067127
Ensure cuda_dlink_post_cflags are quoted as well (#150151)
saagarjha Apr 3, 2025
9e10601
[XPU] Add an implict conversion from XPUStream to sycl::queue* (#148646)
zhiweij1 Apr 3, 2025
e6e07ec
[ROCm] code cleanup of architecture checks (#150473)
apakbin Apr 3, 2025
6fa1b17
ROCm: Add trailing comma for consistency in gfx architecture list (#1…
jagadish-amd Apr 3, 2025
d4c30b4
[AOTI][dashboard] Update how peak memory is measured (#150534)
desertfire Apr 2, 2025
5d9c7f7
[fbcode]Removing `@NoIntBaseDeprecated` annotation in `evaluation.thr…
Sunnie912 Apr 3, 2025
e0d19cf
Enable weekly test for operator benchmark (#150502)
LifengWang Apr 3, 2025
cbc901f
Implement `raise ... from ...` (#148766)
guilhermeleobas Apr 2, 2025
781d28e
add unit test for preferred_blas_library settings (#150581)
jeffdaily Apr 3, 2025
70b34a4
Add new dependences for gen_pyi.py (#150391)
fffrog Apr 3, 2025
ff783f0
Fix shape guard failure to be valid python (#149149)
isuruf Apr 2, 2025
f9a7eac
use python fallback if there are overflows (#149197)
isuruf Apr 2, 2025
a72b4eb
Support windows in C++ shape guards (#149211)
isuruf Apr 2, 2025
5314a6f
[export] Fix deserialization issue (#150515)
angelayi Apr 3, 2025
440c07e
Fix detection of GPU multicast (#150563)
lw Apr 3, 2025
5be5cfe
[inductor][autotune cache] add torch_key() to configs hash (#150494)
davidberard98 Apr 3, 2025
fa0fdc0
if blaslt fails, fall back to blas (#150147)
jeffdaily Apr 3, 2025
5d36253
Refactoring: fix the python constant check (#150608)
fffrog Apr 3, 2025
78d1165
[DTensor][tp] fix errors in FSDP+TP checkpointing test (#150354)
XilunWu Mar 31, 2025
96f35f5
update get start xpu document for v2.7 (#150397)
ZhaoqiongZ Apr 3, 2025
3b02f79
Add torch._scaled_mm for CPU (#150410)
yanbing-j Apr 3, 2025
1843ad4
[Inductor] Cache CUDA compilation errors (#149716)
kadeng Apr 3, 2025
c1d5035
Enable C++ dynamic shape guards by default (#140756)
isuruf Apr 2, 2025
51da241
[aoti] Fix cannot determine truth value of Relation error when propa…
yushangdi Apr 3, 2025
a3f9e04
[export] Make aoti_call_delegate hop traceable (#148804)
yiming0416 Apr 3, 2025
277369a
Move formulas on separate line in loss.py (#150565)
svekars Apr 3, 2025
d41c22b
Revert "[fx] Move Node._prepend/Node._remove_from_list to C++ (#14826…
jansel Apr 3, 2025
5a654de
Revert "Enable C++ dynamic shape guards by default (#140756)"
pytorchmergebot Apr 3, 2025
941090a
Make sure torch.compiler._is_compiling_flag=True in aoti (#150588)
yushangdi Apr 3, 2025
2abd814
[validations] Run nccl version check on Linux only (#150635)
atalman Apr 3, 2025
c6defa9
[cuda] Add new faster gammabeta backward kernel (#148605) (Reapply wi…
ahmadsharif1 Apr 3, 2025
9e55dae
CUDA CachingHostAllocator tracks registrations to call correct free (…
jeffdaily Apr 3, 2025
76994d4
[pytorch] add experimental TORCH_LIBRARY_THREAD_UNSAFE_LAZY_INIT (#15…
rmaz Apr 3, 2025
c0618a3
Update commitlist.py instructions for the GitHub repo regime (#149535)
janeyx99 Apr 3, 2025
a2dce42
Split up cub-RadixSortPairs.cu to parallelize compilation (#148936)
TovlyFB Apr 3, 2025
118e386
[dynamo] disable new test_assert_failure_in_generic_ctx_mgr internall…
williamwen42 Apr 3, 2025
5cf3029
Remove unused rand call if not fallback to eager for rand (#147790)
henryhu6 Apr 3, 2025
8878289
[aten] 8 bytes aligned vector loads for bf16 and fp16 dtypes in torch…
zhaozhul Apr 3, 2025
1ab6c4f
[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/ (#1…
gmagogsfm Apr 3, 2025
b0e28f6
Revert "add unit test for preferred_blas_library settings (#150581)"
pytorchmergebot Apr 3, 2025
1bc2b2b
bound sympy accuracy (#150383)
avikchaudhuri Apr 4, 2025
d0026fa
[ROCm][TunableOp] Fix UT race condition and reduce UT duration. (#150…
naromero77amd Apr 4, 2025
f9f6c08
support guard or false/true in user code and add tests (#150178)
laithsakka Mar 28, 2025
1979a40
Make CompileEventLogger more defensive w.r.t to AOTAutogradCache and …
jamesjwu Apr 3, 2025
a9e2f22
[Bugfix] Fix compile error with `torch.Tensor.unsqueeze_` and inplace…
Lucaskabela Apr 4, 2025
bd9c42e
[c10d] Surface error type when we unlink and create named pipe for Du…
fduwjj Apr 3, 2025
ed0fd2f
clang-format aten/src/ATen/cpu/vec/*.h (#150426)
swolchok Apr 2, 2025
7df6f93
Adapt test_misc.py for HPUs (#149499)
amathewc Apr 4, 2025
c6d79c1
[dynamic shapes] allow duck typing for 0/1 (#150222)
pianpwk Apr 4, 2025
e6e1f8c
[audio hash update] update the pinned audio hash (#150589)
pytorchupdatebot Apr 4, 2025
98d06b4
[Dynamo] Fix `dict.items()` return type (#150112)
shink Apr 4, 2025
f3cb355
[executorch hash update] update the pinned executorch hash (#149817)
pytorchupdatebot Apr 4, 2025
4854926
Revert "Add torch._scaled_mm for CPU (#150410)"
pytorchmergebot Apr 4, 2025
73358d3
Fix codegen, change str comparison opeator to == for proper equality …
jgrzybek-habana Apr 4, 2025
09c4da9
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm12…
eqy Apr 4, 2025
295b7e2
[MPS/inductor] Add support for hermite_polynomial_h. (#150664)
dcci Apr 4, 2025
1b0a023
[Dynamo][Misc] Apply typing hints for `codegen` (#150289)
shink Apr 4, 2025
07d439e
[aoti] Split ConstantType definition out of model.h (#150545)
zhxchen17 Apr 4, 2025
f443035
Revert "[cuda] Add new faster gammabeta backward kernel (#148605) (Re…
pytorchmergebot Apr 4, 2025
c93e34d
Revert "bound sympy accuracy (#150383)"
pytorchmergebot Apr 4, 2025
c53bc61
caffe2: Fix lint errors in native/xnnpack/Linear.cpp (#150508)
EricGriffith Apr 4, 2025
861d2cc
Add a param for save format in Storage Writer (#150025)
ankitageorge Apr 4, 2025
2a2ddff
[Inductor] Fix consolidating _scaled_mm into mm template TMA error (#…
PaulZhang12 Apr 4, 2025
2e23768
Expose symbols on macos in the xplat pytorch stack (#150487)
stepanhruda Apr 4, 2025
d6887f4
[Inductor] Fallback embedding when sparse is True (#150659)
leslie-fang-intel Apr 4, 2025
2e4ae2a
Fix conv2d strided prologue (#150697)
eellison Apr 4, 2025
3320efe
Refresh expected results. (#150264)
laithsakka Apr 4, 2025
c14977e
Use 'rocm' naming for rocm-related workflows/jobs (#150555)
jithunnair-amd Apr 5, 2025
7ac8186
[MPSInductor] Speedup `sum`/`prod` reductions (#150566)
malfet Apr 5, 2025
60a45eb
[AOTInductor] Introduce MaybeOwningAtenTensorHandle for ConstantMap (…
muchulee8 Apr 4, 2025
cfea55d
[MPS] fix inverse bug for N>1024 (#146754)
Isalia20 Apr 5, 2025
c830c12
[MPSInductor] Fix tiled reduction logic (#150737)
malfet Apr 5, 2025
83b870a
Fix missing braces for clang CUDA (#150736)
r-barnes Apr 6, 2025
15768cc
add unit test for preferred_blas_library settings (#150581)
jeffdaily Apr 6, 2025
2d98a1c
[MTIA] Map names to operand indices when folding submodules (#150692)
klintqinami Apr 6, 2025
caf8d9b
Revert "Fix conv2d strided prologue (#150697)"
pytorchmergebot Apr 6, 2025
55e62ff
bf16 grouped gemm (#150374)
ngimel Apr 6, 2025
49f6cce
[MPS] grad scaler (#150255)
Isalia20 Apr 6, 2025
6c38b9b
[typing] Add type hints to `__init__` methods in `torch.distributions…
randolf-scholz Apr 6, 2025
6a8ab90
[AOTI][dashboard] Fix mis-calculated memory compression ratio (#150695)
desertfire Apr 6, 2025
8adfcd3
[cuDNN][SDPA] Loosen constraints for GQA for cuDNN Attention (#150337)
eqy Apr 6, 2025
912102b
Make at::vec::Vectorized ops work with scalars (#150380)
swolchok Apr 6, 2025
0aaf353
Overload unary - operator on at::vec::Vectorized to call neg() (#150568)
swolchok Apr 6, 2025
47b494e
Add type hints to `_tensor_docs.add_docstr_all` (#150715)
pganssle-google Apr 6, 2025
370ba6b
[codemod] Fix `-Wambiguous-reversed-operator` in aten/src/ATen/cuda/t…
r-barnes Apr 7, 2025
d8d306c
Suppress `-Wunused-function` for DSA (#150735)
r-barnes Apr 7, 2025
d985758
Generalize compile collective to avoid cuda-bias (#150405)
Chao1Han Apr 7, 2025
d86c141
Generalize poison fork logic for each device backend (#144664)
guangyey Apr 6, 2025
b6929ae
Fix conv2d strided prologue (#150697)
eellison Apr 6, 2025
24aadb4
[precompile] Serialization for GlobalStateGuard (#150636)
zhxchen17 Apr 7, 2025
164d2c8
Add check in `test_cow_input` to ensure COW data is never changed (#1…
kurtamohler Apr 5, 2025
25662d3
[xla hash update] update the pinned xla hash (#132021)
pytorchupdatebot Apr 7, 2025
cdf3b63
Update slow tests (#150283)
pytorchupdatebot Apr 7, 2025
e209625
[torchrec] update local_shards_wrapper to latest version (#150469)
iamzainhuda Apr 7, 2025
99c9a31
[submodule] [Snapshot/Profiler] Memory Snapshot On Demand (#150559)
sraikund16 Apr 7, 2025
5e3c821
cpp_wrapper: Re-enable code disabled for forward compatibility (#150671)
benjaminglass1 Apr 7, 2025
f0abbab
AOTI fallback ops: sort alphabetically (#150672)
benjaminglass1 Apr 7, 2025
f813d64
cpp_wrapper: Fix even more tests (#147225)
benjaminglass1 Apr 7, 2025
0ad2c5d
Add RECORD_FUNCTION for AOTI (#150150)
shiyang-weng Apr 7, 2025
56ab71d
[ROCm] Expand workspace size for gfx95 (#150632)
jpvillam-amd Apr 7, 2025
06e9dea
[c10d][fr] Improve FR dump robustness with all watchdog broadcast wai…
fduwjj Apr 5, 2025
957faaa
Avoid overflow in vector_norm for scalar input (#144073)
isuruf Apr 5, 2025
7d2411d
[DCP][OSS] Introduce barrier util in the DistWrapper for rank local c…
saumishr Apr 7, 2025
6fcffd8
Optimize SVE embedding performance (#150176)
annop-w Apr 7, 2025
2a1e2b8
[logging] Add pgo remote get/put timings to dynamo_compile (#150322)
masnesral Apr 4, 2025
f8b53f4
[export] raise when Dim.DYNAMIC 0/1 specializes (#150716)
pianpwk Apr 7, 2025
bf1132c
Revert "Generalize poison fork logic for each device backend (#144664)"
pytorchmergebot Apr 7, 2025
ed0dea3
[AO] update port_metadata_pass to support quant_affine ops (#150642)
mcr229 Apr 7, 2025
5653fb3
[AO] Add Moving Average Affine Observer (#150643)
mcr229 Apr 7, 2025
eba05e2
[AO] Refactor convert and add QuantAffinePlaceholderObserver (#150644)
mcr229 Apr 7, 2025
fbccbfe
[BE] Fix Amp.metal compilation warning (#150783)
malfet Apr 7, 2025
78fe079
Support having no metadata file for HuggingFaceStorageReader (#150701)
ankitageorge Apr 7, 2025
6ea5514
[invoke_subgraph] Lazy backward (#150666)
anijain2305 Apr 7, 2025
91173ff
Fixing NCCL abort hang issue when a ProcessGroupNCCL manages multiple…
hexinw-nvidia Apr 7, 2025
e9e5682
[ROCm] Build Pytorch extensions with amdclang++ (#150451)
akashveramd Apr 7, 2025
5228986
[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150705)
malfet Apr 8, 2025
d7f3cd0
Add Half support for weight_norm on CPU (#148878)
CaoE Apr 8, 2025
c0991b0
README: anaconda license violation / no longer recommend anaconda sin…
morotti Apr 8, 2025
73b4938
[cuda] Add new faster gammabeta backward kernel (#148605) (Reapply wi…
ahmadsharif1 Apr 8, 2025
836955b
[Manylinux 2.28] Correct Linux aarch64 cuda binaries wheel name (#150…
atalman Apr 8, 2025
7e11089
Optimize dataloader Self typing (#146816)
zeshengzong Apr 8, 2025
c9c0f8e
Add plot for `torch.nn.Threshold` and `torch.nn.GLU` (#150171)
zeshengzong Apr 8, 2025
f8aa640
Refactor: add initialization of math.lcm into torch_c_binding_in_grap…
fffrog Apr 7, 2025
58ede0c
[Inductor XPU] Refine `test_mkldnn_pattern_matcher.py` to be reusable…
etaf Apr 7, 2025
a106842
[XPU] Fix XPU unit test on Windows (#150520)
LuFinch Apr 8, 2025
881d994
Add more check for torch.ormqr (#150759)
fffrog Apr 8, 2025
3da14d3
Fix the Problems About Defining Static Variable in Inline Function (#…
fffrog Apr 8, 2025
3649e2e
Safer bookkeeping of NCCL communicators (#150681)
lw Apr 7, 2025
1791b41
Clarify behavior of TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK (#1…
lw Apr 7, 2025
f3b2fb6
Allow trace through unittest (#146500)
guilhermeleobas Apr 7, 2025
ad51618
Update CPython tests for ctx manager to use unittest (#146501)
guilhermeleobas Apr 7, 2025
a402c2f
Remove redundant code in cuda/__init__.py (#150529)
fffrog Apr 8, 2025
05365e3
Remove torch functions that do not support device arguments from _dev…
fffrog Apr 8, 2025
da73225
[Intel GPU] int4 WOQ gemm XPU Support (#137566)
ZhiweiYan-96 Apr 8, 2025
52d172e
Facilitate at::_weight_int4pack_mm_with_scale_and_zeros related regis…
ZhiweiYan-96 Apr 8, 2025
ec5f2e3
[Build] Fix fbgemm build with gcc-12+ (#150847)
malfet Apr 8, 2025
1239260
[Accelerator][Chore] Use existing `acc` when raising an error (#150829)
shink Apr 8, 2025
97f34f0
[ROCm][Windows] Include AOTriton dependent sources in Windows build (…
ikalinic Apr 8, 2025
4447352
Revert "[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#1…
pytorchmergebot Apr 8, 2025
173f126
[invoke_subgraph] Preserve node meta (#150782)
anijain2305 Apr 7, 2025
3e0038a
Fix torch.matmul related out dtype check (#148174)
fffrog Apr 8, 2025
4926bd6
Revert "Fix the Problems About Defining Static Variable in Inline Fun…
pytorchmergebot Apr 8, 2025
9775961
[dynamo] reconstruct functions decorated in the compiled region prope…
williamwen42 Apr 8, 2025
e6bd133
add batching rule for `torch.Tensor.scatter_add_` (#150543)
guilhermeleobas Apr 8, 2025
aafc4b6
Do not depend on numpy during the import (#150816)
basilwong Apr 8, 2025
c36d9b0
[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/torc…
gmagogsfm Apr 8, 2025
901b02c
[Inductor] fix alignement assumption for fallback (#150777)
shunting314 Apr 7, 2025
17f9276
Code Clean: Remove python3.8 specific code because PyTorch now need P…
fffrog Apr 8, 2025
89505f4
[AOTI] Always use oss schema for ExternKernelNodes serialization (#15…
yiming0416 Apr 8, 2025
27ded35
Fix inplacing with multiple, fused uses (#150845)
eellison Apr 8, 2025
d9f47c7
Revert "Fixing NCCL abort hang issue when a ProcessGroupNCCL manages …
pytorchmergebot Apr 9, 2025
5f18b7d
[docs] remove --recursive flag from readme (#150785)
danielvegamyhre Apr 9, 2025
44deb67
Fix _del_library (#150495)
zou3519 Apr 8, 2025
2e7c9d3
Refactor layout constraint selection logic (#148104)
zou3519 Apr 8, 2025
bc47d53
[MPS] Support ArgumentBuffer bindings from C++/Python (#150780)
malfet Apr 9, 2025
4d6ff6c
Fill config2launcher with correct launchers during cache hit coordina…
jamesjwu Apr 8, 2025
b01877a
Fix addbmm & addmv & baddbmm out dtype check (#148176)
fffrog Mar 17, 2025
604467d
Code Clean: Remove specific bytecode support in dynamo for python3.8 …
fffrog Apr 8, 2025
81f60f3
Expand allowed_getattr_types_for_subgm to torch.Tensor (#150867)
SherlockNoMad Apr 9, 2025
142f0f8
Enable modernize-use-default-member-init (#149046)
cyyever Apr 9, 2025
64ac41f
[pytorch] add header docs for TORCH_LIBRARY_THREAD_UNSAFE_LAZY_INIT (…
rmaz Apr 9, 2025
886d9ac
[docs] Add 32-bit complex to the list of dtypes (#144590)
antoinebrl Apr 9, 2025
2299087
[ROCm] Introduce AMD specific inductor gemm tuning (#147315)
jataylo Apr 9, 2025
246f3b6
[Quant][PT2E][X86] enable qconv1d-relu fusion (#150751)
Xia-Weiwen Apr 9, 2025
5a42215
Add `torch.triu_indices`, `torch.tril_indices` dtype description (#15…
zeshengzong Apr 9, 2025
d0e3482
Update triton wheel build, setuptools pin (#150931)
atalman Apr 9, 2025
97a5e5c
Added _fused_sdp_choice_stub dispatcher support for HPU device (#149512)
pralay-das Apr 9, 2025
1a56609
[ONNX] Supporting different opset versions for torchlib registry (#14…
shubhambhokare1 Apr 9, 2025
c8d37b9
[ez][c10d] Disable start event recording for coalesced col and improv…
fduwjj Apr 8, 2025
8aaf296
[c10d][fr] Refactor analysis script for modularization and reusing fo…
fduwjj Apr 9, 2025
72755a4
Avoid circular imports in tracing_state_functions (#150325)
justinchuby Apr 9, 2025
c714d2f
[hop] support base_hop._gen_schema (#149688)
ydwu4 Apr 7, 2025
a4bb2f1
Inductor respects exact strides on custom ops by default (#150511)
zou3519 Apr 8, 2025
a0e796d
Revert "Inductor respects exact strides on custom ops by default (#15…
pytorchmergebot Apr 9, 2025
01568cb
Revert "Refactor layout constraint selection logic (#148104)"
pytorchmergebot Apr 9, 2025
c59aaa0
[DTensor] add _explicit_order_placements util (#150493)
wconstab Apr 4, 2025
6fb089f
[AO] fix per token block size calculation (#150890)
mcr229 Apr 9, 2025
cc185c3
[aoti] Use generate_fake_kernels_from_real_mismatches config for draf…
yushangdi Apr 9, 2025
d04a6ec
add reduce_scatter to symm mem ops (#150813)
ngimel Apr 9, 2025
d3a2872
Hipify global scrach defintion in AOTI codegen (#150893)
zoranzhao Apr 9, 2025
cfab04d
Fix aten.div type promotion for FakeTensor (#150874)
yushangdi Apr 9, 2025
a4545f0
[Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/test…
gmagogsfm Apr 9, 2025
f237ee5
ProcessGroupGloo: support lazy_init (#150801)
d4l3k Apr 9, 2025
ea0cbba
[export] Refine draft-export CVE with Dim.AUTO (#150876)
angelayi Apr 9, 2025
2b9d8a5
Fix `-Wmissing-braces` in a few files (#150802)
r-barnes Apr 9, 2025
860765d
update benchamark result due to <1% regression (#150937)
laithsakka Apr 9, 2025
d751698
Support negative values for fill with uint tensors (#144458)
isuruf Apr 8, 2025
357814c
[AOTI] Remove typedef for half and bfloat16 (#150657)
desertfire Apr 9, 2025
31fe258
[inductor] Add features to docstring_linter (see #142496) (#145834)
rec Apr 9, 2025
087e858
support backed_size_oblivious in guard_or_false/guard_or_true (#150231)
laithsakka Apr 9, 2025
786422a
Remove a workaround added in #149381 (#150693)
tengyifei Apr 9, 2025
cc2decd
[CI][CUDA][Distributed]Update test_composability.py (#148578)
nWEIdia Apr 9, 2025
b347f0c
Add xpu backend for depthwise_conv2d/3d Ops
yucai-intel Mar 13, 2025
12 changes: 11 additions & 1 deletion .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -136,6 +136,9 @@ def complete_wheel(folder: str) -> str:
"""
wheel_name = list_dir(f"/{folder}/dist")[0]

# Please note for cuda we don't run auditwheel since we use custom script to package
# the cuda dependencies to the wheel file using update_wheel() method.
# However we need to make sure filename reflects the correct Manylinux platform.
if "pytorch" in folder and not enable_cuda:
print("Repairing Wheel with AuditWheel")
check_call(["auditwheel", "repair", f"dist/{wheel_name}"], cwd=folder)
@@ -147,7 +150,14 @@ def complete_wheel(folder: str) -> str:
f"/{folder}/dist/{repaired_wheel_name}",
)
else:
repaired_wheel_name = wheel_name
repaired_wheel_name = wheel_name.replace(
"linux_aarch64", "manylinux_2_28_aarch64"
)
print(f"Renaming {wheel_name} wheel to {repaired_wheel_name}")
os.rename(
f"/{folder}/dist/{wheel_name}",
f"/{folder}/dist/{repaired_wheel_name}",
)

print(f"Copying {repaired_wheel_name} to artifacts")
shutil.copy2(
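For context, the hunk above stops returning the wheel name unchanged for non-CUDA aarch64 builds and instead rewrites the platform tag from `linux_aarch64` to `manylinux_2_28_aarch64` before renaming the file on disk. A minimal standalone sketch of that rename step (function name and example filename are illustrative, not from the PR):

```python
def manylinux_wheel_name(wheel_name: str) -> str:
    """Rewrite the platform tag the way the hunk above does."""
    return wheel_name.replace("linux_aarch64", "manylinux_2_28_aarch64")

# Hypothetical filename, for illustration only:
print(manylinux_wheel_name("torch-2.7.0-cp311-cp311-linux_aarch64.whl"))
# -> torch-2.7.0-cp311-cp311-manylinux_2_28_aarch64.whl
```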
2 changes: 2 additions & 0 deletions .ci/docker/almalinux/Dockerfile
@@ -44,6 +44,8 @@ FROM base as cuda
ARG CUDA_VERSION=12.4
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
12 changes: 10 additions & 2 deletions .ci/docker/build.sh
@@ -460,10 +460,18 @@ if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
fi
fi

no_cache_flag=""
progress_flag=""
# Do not use cache and progress=plain when in CI
if [[ -n "${CI:-}" ]]; then
no_cache_flag="--no-cache"
progress_flag="--progress=plain"
fi

# Build image
docker build \
--no-cache \
--progress=plain \
${no_cache_flag} \
${progress_flag} \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
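The build.sh hunk above makes `--no-cache` and `--progress=plain` conditional on the `CI` environment variable, so local rebuilds keep Docker's layer cache while CI builds stay fully reproducible. A sketch of the same flag selection in Python (function name illustrative):

```python
def docker_build_flags(env: dict) -> list:
    """Return extra `docker build` flags: cache disabled only in CI."""
    if env.get("CI"):  # mirrors the [[ -n "${CI:-}" ]] check above
        return ["--no-cache", "--progress=plain"]
    return []

print(docker_build_flags({"CI": "true"}))  # -> ['--no-cache', '--progress=plain']
print(docker_build_flags({}))              # -> []
```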
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/executorch.txt
@@ -1 +1 @@
cedf52aa8e4df879886270a5920da6fe84cbaa67
7e487c24e1c20c3f4606c2d8aca2778873b00b4c
46 changes: 8 additions & 38 deletions .ci/docker/common/install_cuda.sh
@@ -2,7 +2,6 @@

set -ex

NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.5.1.17

function install_cusparselt_040 {
@@ -40,8 +39,7 @@ function install_cusparselt_063 {

function install_118 {
CUDNN_VERSION=9.1.0.70
NCCL_VERSION=v2.21.5-1
echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.4.0"
echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.4.0"
rm -rf /usr/local/cuda-11.8 /usr/local/cuda
# install CUDA 11.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
@@ -59,14 +57,7 @@ function install_118 {
cd ..
rm -rf tmp_cudnn

# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=11.8 bash install_nccl.sh

install_cusparselt_040

@@ -75,7 +66,7 @@

function install_124 {
CUDNN_VERSION=9.1.0.70
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.2"
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.2"
rm -rf /usr/local/cuda-12.4 /usr/local/cuda
# install CUDA 12.4.1 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
@@ -93,22 +84,15 @@ function install_124 {
cd ..
rm -rf tmp_cudnn

# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.4 bash install_nccl.sh

install_cusparselt_062

ldconfig
}

function install_126 {
echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.6 /usr/local/cuda
# install CUDA 12.6.3 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run
@@ -126,14 +110,7 @@ function install_126 {
cd ..
rm -rf tmp_cudnn

# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.6 bash install_nccl.sh

install_cusparselt_063

@@ -241,7 +218,7 @@ function prune_126 {

function install_128 {
CUDNN_VERSION=9.8.0.87
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.8 /usr/local/cuda
# install CUDA 12.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
@@ -259,14 +236,7 @@ function install_128 {
cd ..
rm -rf tmp_cudnn

# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.8 bash install_nccl.sh

install_cusparselt_063

12 changes: 2 additions & 10 deletions .ci/docker/common/install_cuda_aarch64.sh
@@ -3,7 +3,6 @@

set -ex

NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.8.0.87

function install_cusparselt_063 {
@@ -18,7 +17,7 @@ function install_cusparselt_063 {
}

function install_128 {
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.8 /usr/local/cuda
# install CUDA 12.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux_sbsa.run
@@ -36,14 +35,7 @@ function install_128 {
cd ..
rm -rf tmp_cudnn

# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b ${NCCL_VERSION} --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.8 bash install_nccl.sh

install_cusparselt_063

3 changes: 1 addition & 2 deletions .ci/docker/common/install_executorch.sh
@@ -50,8 +50,7 @@ setup_executorch() {
pushd executorch

export PYTHON_EXECUTABLE=python
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"

as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true
popd
4 changes: 3 additions & 1 deletion .ci/docker/common/install_halide.sh
@@ -35,7 +35,9 @@ git clone https://github.com/halide/Halide.git
pushd Halide
git checkout ${COMMIT} && git submodule update --init --recursive
pip_install -r requirements.txt
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
# NOTE: pybind has a requirement for cmake > 3.5 so set the minimum cmake version here with a flag
# Context: https://github.com/pytorch/pytorch/issues/150420
cmake -G Ninja -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build
test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3
cmake --install build --prefix ${CONDA_PREFIX}
26 changes: 26 additions & 0 deletions .ci/docker/common/install_nccl.sh
@@ -0,0 +1,26 @@
#!/bin/bash

set -ex

NCCL_VERSION=""
if [[ ${CUDA_VERSION:0:2} == "11" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu11.txt)
elif [[ ${CUDA_VERSION:0:2} == "12" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu12.txt)
else
echo "Unexpected CUDA_VERSION ${CUDA_VERSION}"
exit 1
fi

if [[ -n "${NCCL_VERSION}" ]]; then
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
pushd nccl
make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
popd
rm -rf nccl
ldconfig
fi
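
The new `install_nccl.sh` branches on the major CUDA version to pick a pinned NCCL tag. The version-selection logic can be exercised standalone; the sketch below mirrors that branch with inline placeholder pins (the real pins live in `ci_commit_pins/nccl-cu11.txt` and `ci_commit_pins/nccl-cu12.txt`; the tag values here are illustrative only, not the actual pinned versions):

```shell
#!/bin/bash
# Minimal sketch of the CUDA-major-version branch in install_nccl.sh.
# Pin values below are placeholders; the real script reads them from
# ci_commit_pins/nccl-cu11.txt and ci_commit_pins/nccl-cu12.txt.
select_nccl_pin() {
  local cuda_version="$1"
  case "${cuda_version:0:2}" in          # first two chars: CUDA major version
    11) echo "v2.21.5-1" ;;              # placeholder for nccl-cu11.txt
    12) echo "v2.25.1-1" ;;              # placeholder for nccl-cu12.txt
    *)  echo "Unexpected CUDA_VERSION ${cuda_version}" >&2
        return 1 ;;
  esac
}

select_nccl_pin "12.8"   # emits the cu12 pin
select_nccl_pin "11.8"   # emits the cu11 pin
```

Keeping the pins in per-CUDA-major text files lets every Dockerfile `COPY ./ci_commit_pins/nccl-cu*` and share one install path, instead of each image hard-coding its own clone-and-build block as `install_cuda.sh` previously did.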
2 changes: 2 additions & 0 deletions .ci/docker/libtorch/Dockerfile
@@ -49,6 +49,8 @@ RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM cpu as cuda
ADD ./common/install_cuda.sh install_cuda.sh
ADD ./common/install_magma.sh install_magma.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
ENV CUDA_HOME /usr/local/cuda

FROM cuda as cuda11.8
4 changes: 3 additions & 1 deletion .ci/docker/linter-cuda/Dockerfile
@@ -30,7 +30,9 @@ RUN bash ./install_python.sh && rm install_python.sh /opt/requirements-ci.txt
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu*
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH

4 changes: 3 additions & 1 deletion .ci/docker/manywheel/Dockerfile
@@ -64,7 +64,9 @@ FROM base as cuda
ARG BASE_CUDA_VERSION=10.2
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu*

FROM base as intel
# MKL
4 changes: 3 additions & 1 deletion .ci/docker/manywheel/Dockerfile_2_28
@@ -36,7 +36,9 @@ FROM base as cuda
ARG BASE_CUDA_VERSION=11.8
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu*

FROM base as intel
# MKL
4 changes: 3 additions & 1 deletion .ci/docker/manywheel/Dockerfile_cuda_aarch64
@@ -67,7 +67,9 @@ FROM base as cuda
ARG BASE_CUDA_VERSION
# Install CUDA
ADD ./common/install_cuda_aarch64.sh install_cuda_aarch64.sh
RUN bash ./install_cuda_aarch64.sh ${BASE_CUDA_VERSION} && rm install_cuda_aarch64.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda_aarch64.sh ${BASE_CUDA_VERSION} && rm install_cuda_aarch64.sh install_nccl.sh /ci_commit_pins/nccl-cu*

FROM base as magma
ARG BASE_CUDA_VERSION
10 changes: 10 additions & 0 deletions .ci/docker/ubuntu-cuda/Dockerfile
@@ -158,6 +158,16 @@ COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash install_cusparselt.sh
RUN rm install_cusparselt.sh

# Install NCCL
ARG CUDA_VERSION
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash install_nccl.sh
RUN rm install_nccl.sh /ci_commit_pins/nccl-cu*
ENV USE_SYSTEM_NCCL=1
ENV NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
ENV NCCL_LIB_DIR="/usr/local/cuda/lib64/"

# Install CUDSS
ARG CUDA_VERSION
COPY ./common/install_cudss.sh install_cudss.sh
9 changes: 8 additions & 1 deletion .ci/docker/ubuntu/Dockerfile
@@ -52,9 +52,16 @@ RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu*
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# No effect if cuda not installed
ENV USE_SYSTEM_NCCL=1
ENV NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
ENV NCCL_LIB_DIR="/usr/local/cuda/lib64/"


# (optional) Install UCC
ARG UCX_COMMIT
50 changes: 3 additions & 47 deletions .ci/pytorch/macos-build.sh
@@ -33,55 +33,11 @@ if which sccache > /dev/null; then
export PATH="${tmp_dir}:$PATH"
fi

cross_compile_arm64() {
# Cross compilation for arm64
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}

compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}

compile_x86_64() {
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel --plat-name=macosx_10_9_x86_64
}

build_lite_interpreter() {
echo "Testing libtorch (lite interpreter)."

CPP_BUILD="$(pwd)/../cpp_build"
# Ensure the removal of the tmp directory
trap 'rm -rfv ${CPP_BUILD}' EXIT
rm -rf "${CPP_BUILD}"
mkdir -p "${CPP_BUILD}/caffe2"

# It looks libtorch need to be built in "${CPP_BUILD}/caffe2 folder.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd "${CPP_BUILD}/caffe2" || exit
VERBOSE=1 DEBUG=1 python "${BUILD_LIBTORCH_PY}"
popd || exit

"${CPP_BUILD}/caffe2/build/bin/test_lite_interpreter_runtime"
}

print_cmake_info

if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then
if [[ $(uname -m) == "arm64" ]]; then
compile_arm64
else
cross_compile_arm64
fi
elif [[ ${BUILD_ENVIRONMENT} = *lite-interpreter* ]]; then
export BUILD_LITE_INTERPRETER=1
build_lite_interpreter
else
compile_x86_64
fi
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel

if which sccache > /dev/null; then
print_sccache_stats