Split IndexToOffset() into offline and online versions #2136
Conversation
Pull Request Overview
Split IndexToOffset into compile-time (static Dims) and runtime (Dims = -1) variants to reduce compilation time, and refactor all kernel call sites to use the new template form. The key changes remove the previous contiguous fast-path flag and introduce dimension template parameters across many XPU SYCL kernels.
- Introduced IndexToOffset<T, IndexType, Dims> and a dynamic specialization for Dims = -1; removed the strict/non-strict contiguous branching (a sketch of the new template shape follows this list).
- Updated all kernel usages to pass an explicit Dims (a positive rank, -1, or the new -2 sentinel) and added indexing_kind template parameters in the RNN kernels.
- Replaced the standard SYCL subgroup-size attribute with the Intel-specific attribute in one kernel.
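To make the split concrete, here is a minimal sketch of what the two variants conventionally look like, modeled on the classic ATen TensorInfo pattern. The TensorInfo layout, the array bound, and the exact loop shape below are assumptions for illustration, not the actual code in src/comm/TensorInfo.h.

```cpp
// Sketch only: assumed TensorInfo layout for illustration; see
// src/comm/TensorInfo.h for the real definition.
template <typename T, typename IndexType>
struct TensorInfo {
  T* data;
  IndexType sizes[12];
  IndexType strides[12];
  int dims;
};

// "Offline" / compile-time variant: Dims is a template constant, so the
// stride walk has a fixed trip count and can be fully unrolled.
template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  static IndexType get(IndexType linearId, const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = Dims - 1; i > 0; --i) {
      IndexType curIdx = linearId % info.sizes[i];
      offset += curIdx * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};

// "Online" / runtime variant (Dims = -1): the dimension count is read from
// info.dims at run time, so a single instantiation serves every rank.
template <typename T, typename IndexType>
struct IndexToOffset<T, IndexType, -1> {
  static IndexType get(IndexType linearId, const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = info.dims - 1; i > 0; --i) {
      IndexType curIdx = linearId % info.sizes[i];
      offset += curIdx * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

The compile-time path gives the compiler a fixed trip count, so the divide/multiply chain can be unrolled and the sizes/strides kept in registers; the -1 path trades that for one instantiation that handles any rank.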
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/comm/TensorInfo.h | Replaced the old IndexToOffset implementation with compile-time and runtime (-1) variants; removed the contiguous fast path. |
| src/ATen/native/xpu/sycl/WeightNormKernels.cpp | Updated all IndexToOffset calls to the new API (runtime -1). |
| src/ATen/native/xpu/sycl/TensorModeKernel.cpp | Switched the subgroup-size attribute to the Intel-specific form and updated IndexToOffset usage. |
| src/ATen/native/xpu/sycl/TensorApplyUtils.h | Updated ApplyOp2 to use the runtime (-1) IndexToOffset. |
| src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp | Added ADims/BDims template parameters and updated IndexToOffset calls. |
| src/ATen/native/xpu/sycl/Sorting.cpp | Passes a compile-time Dim to IndexToOffset. |
| src/ATen/native/xpu/sycl/ScanUtils.h | Migrated to runtime (-1) IndexToOffset calls. |
| src/ATen/native/xpu/sycl/RNNKernels.cpp | Added an indexing_kind template parameter and adjusted macros to the new IndexToOffset signature. |
| src/ATen/native/xpu/sycl/Indexing.h | Updated offset calculations to the new runtime form. |
| src/ATen/native/xpu/sycl/Indexing.cpp | Added DstDim/SrcDim/IdxDim template parameters; macros now emit various Dims (including -2) for IndexToOffset. |
| src/ATen/native/xpu/sycl/Dropout.cpp | Added ADims/BDims template-based offset computation. |
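The per-kernel updates in the table above are mostly mechanical: a kernel functor grows dimension template parameters (ADims, BDims, DstDim, and so on) and forwards them to IndexToOffset instead of relying on the removed fast-path flag. The sketch below shows that shape, reusing the hypothetical TensorInfo/IndexToOffset definitions from the earlier sketch; the functor name and members are invented for illustration and are not the PR's code.

```cpp
// Hypothetical kernel functor after the refactor: the caller decides, per
// tensor, whether the rank is a compile-time constant or -1 (runtime).
template <typename scalar_t, typename IndexType, int ADims, int BDims>
struct ExampleCopyFunctor {
  void operator()(IndexType linearIndex) const {
    // ADims/BDims are threaded straight into IndexToOffset.
    IndexType aOffset =
        IndexToOffset<scalar_t, IndexType, ADims>::get(linearIndex, a_);
    IndexType bOffset =
        IndexToOffset<scalar_t, IndexType, BDims>::get(linearIndex, b_);
    b_.data[bOffset] = a_.data[aOffset];
  }
  TensorInfo<scalar_t, IndexType> a_;
  TensorInfo<scalar_t, IndexType> b_;
};

// Usage: a launcher that knows both tensors are 2-D would instantiate
// ExampleCopyFunctor<float, uint32_t, 2, 2>; a generic path passes -1.
```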
Hi @yucai-intel, I can't build torch-xpu-ops since this commit was merged. Does it require some dependency update?

```
/home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:139:11: error: too few template arguments for class template 'IndexToOffset'
  139 |           ? IndexToOffset<scalar1, IndexType>::get(linearIndex, a)
      |             ^
/home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
  154 | template <typename T, typename IndexType, int Dims>
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  155 | struct IndexToOffset {
      |        ^
In file included from /home/gta/torch-xpu-ops/src/ATen/native/quantized/sycl/FusedObsFakeQuantKernels.cpp:10:
In file included from /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/DistributionTemplates.h:11:
/home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:144:11: error: too few template arguments for class template 'IndexToOffset'
  144 |           ? IndexToOffset<scalar2, IndexType>::get(linearIndex, b)
      |             ^
/home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
  154 | template <typename T, typename IndexType, int Dims>
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  155 | struct IndexToOffset {
      |        ^
2 errors generated.
```
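(Context for the errors above: the copy of TensorApplyUtils.h under pytorch/torch/include predates this PR and still uses the old two-argument form, `IndexToOffset<scalar1, IndexType>::get(linearIndex, a)`. After the split, the template requires an explicit Dims, so the in-tree header presumably now reads `IndexToOffset<scalar1, IndexType, -1>::get(linearIndex, a)`, matching the "runtime (-1)" entry for TensorApplyUtils.h in the file table above; a refreshed include tree resolves the mismatch, as the replies below confirm.)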
@Silv3S Most likely the content of pytorch/torch/include/ is outdated. Manually removing it helped for me, but this file should be updated automatically. It seems that this part of torch-xpu-ops/src/ATen/CMakeLists.txt does not track updates to these files.
Thanks Paweł, it works. The problem is only visible in incremental builds, and this PR didn't break anything. Sorry for the false alarm.
Divide IndexToOffset() into two versions, offline (compile-time Dims) and online (runtime Dims = -1), to reduce runtime overhead as much as possible.