Split IndextoOffset() into offline and online versions #2136

yucai-intel · 2025-10-07T10:14:11Z

Divide indextoOffset() into two versions, offline and online, to reduce runtime overhead and as much as possible.

Copilot

Pull Request Overview

Split IndexToOffset into compile-time (static Dims) and runtime (-1) variants to reduce nvcc compilation time and refactor all kernel call sites to use the new template form. Key changes remove the previous contiguous fast path flag and introduce dimension template parameters across many XPU SYCL kernels.

Introduced IndexToOffset<T, IndexType, Dims> and dynamic specialization with Dims = -1; removed strict/non-strict contiguous branching.
Updated all kernel usages to pass an explicit Dims (positive, -1, or new sentinel -2) and added indexing_kind template parameters in RNN kernels.
Replaced standard SYCL subgroup size attribute with Intel-specific attribute in one kernel.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/comm/TensorInfo.h	Replaced old IndexToOffset implementation with compile-time and runtime (-1) variants, removed contiguous fast path.
src/ATen/native/xpu/sycl/WeightNormKernels.cpp	Updated all IndexToOffset calls to new API (runtime -1).
src/ATen/native/xpu/sycl/TensorModeKernel.cpp	Switched subgroup size attribute to Intel-specific and updated IndexToOffset usage.
src/ATen/native/xpu/sycl/TensorApplyUtils.h	Updated ApplyOp2 to use runtime (-1) IndexToOffset.
src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp	Added ADims/BDims template params and updated IndexToOffset calls.
src/ATen/native/xpu/sycl/Sorting.cpp	Pass compile-time Dim to IndexToOffset.
src/ATen/native/xpu/sycl/ScanUtils.h	Migrated to runtime (-1) IndexToOffset calls.
src/ATen/native/xpu/sycl/RNNKernels.cpp	Added indexing_kind template parameter and adjusted macros to new IndexToOffset signature.
src/ATen/native/xpu/sycl/Indexing.h	Updated offset calculations to new runtime form.
src/ATen/native/xpu/sycl/Indexing.cpp	Added DstDim/SrcDim/IdxDim template params, macros now emit various Dims (including -2) for IndexToOffset.
src/ATen/native/xpu/sycl/Dropout.cpp	Added ADims / BDims template-based offset computation.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

src/comm/TensorInfo.h

src/ATen/native/xpu/sycl/TensorModeKernel.cpp

src/ATen/native/xpu/sycl/Indexing.cpp

Silv3S · 2025-10-20T09:28:31Z

Hi @yucai-intel, I can't build torch-xpu-ops since this commit was merged. Does it require some dependency update?
I'm using Ubuntu 24.04.2 LTS with oneAPI Base Toolkit 2025.2.1

 /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:139:11: error: too few template arguments for class template 'IndexToOffset'
    139 |         ? IndexToOffset<scalar1, IndexType>::get(linearIndex, a)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  In file included from /home/gta/torch-xpu-ops/src/ATen/native/quantized/sycl/FusedObsFakeQuantKernels.cpp:10:
  In file included from /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/DistributionTemplates.h:11:
  /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:144:11: error: too few template arguments for class template 'IndexToOffset'
    144 |         ? IndexToOffset<scalar2, IndexType>::get(linearIndex, b)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  2 errors generated.

PawelSwider2000 · 2025-10-20T10:58:00Z

Hi @yucai-intel, I can't build torch-xpu-ops since this commit was merged. Does it require some dependency update? I'm using Ubuntu 24.04.2 LTS with oneAPI Base Toolkit 2025.2.1

 /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:139:11: error: too few template arguments for class template 'IndexToOffset'
    139 |         ? IndexToOffset<scalar1, IndexType>::get(linearIndex, a)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  In file included from /home/gta/torch-xpu-ops/src/ATen/native/quantized/sycl/FusedObsFakeQuantKernels.cpp:10:
  In file included from /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/DistributionTemplates.h:11:
  /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:144:11: error: too few template arguments for class template 'IndexToOffset'
    144 |         ? IndexToOffset<scalar2, IndexType>::get(linearIndex, b)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  2 errors generated.

@Silv3S Most likely content of pytorch/torch/include/ is outdated. Manually removing it helped for me, but this file should be updated automatically

Seems that this part of: torch-xpu-ops/src/ATen/CMakeLists.txt does not track updates in files:

# ATen XPU headers

macro(install_xpu_headers glob_pattern dest_subdir)
  file(GLOB headers ${glob_pattern})
  if(headers)
    install(FILES ${headers} DESTINATION "${AT_INSTALL_INCLUDE_DIR}/${dest_subdir}")
  endif()
endmacro()

install_xpu_headers("xpu/*.h" "ATen/xpu")
install_xpu_headers("native/xpu/*.h" "ATen/native/xpu")
install_xpu_headers("native/xpu/sycl/*.h" "ATen/native/xpu/sycl")
install_xpu_headers("native/xpu/mkl/*.h" "ATen/native/xpu/mkl")
install_xpu_headers("native/nested/xpu/*.h" "ATen/native/nested/xpu")
install_xpu_headers("native/nested/xpu/sycl/*.h" "ATen/native/nested/xpu/sycl")
install_xpu_headers("native/quantized/*.h" "ATen/native/quantized/xpu")
install_xpu_headers("native/quantized/sycl/*.h" "ATen/native/quantized/xpu/sycl")
install_xpu_headers("native/sparse/xpu/*.h" "ATen/native/sparse/xpu")
install_xpu_headers("native/sparse/xpu/sycl/*.h" "ATen/native/sparse/xpu/sycl")
install_xpu_headers("native/transformers/*.h" "ATen/native/transformers/xpu")
install_xpu_headers("native/transformers/sycl/*.h" "ATen/native/transformers/xpu/sycl")

Silv3S · 2025-10-20T11:36:59Z

Thanks Paweł, it works. Problem is only visible in incremental builds and this PR didn't break anything. Sorry for the false alarm

yucai-intel added 15 commits October 7, 2025 18:06

Update TensorInfo.h

c58232f

Update Dropout.cpp

e504f42

Update Indexing.cpp

bd2afb2

Update Indexing.h

1dd16c0

Update RNNKernels.cpp

12c63f1

Update Sorting.cpp

3396375

Update SummaryOpsKernels.cpp

c8227fb

Update TensorApplyUtils.h

061e5cb

Update TensorModeKernel.cpp

b757a68

Update WeightNormKernels.cpp

adbbf14

Update ScanUtils.h

2499f21

Update ScanUtils.h

31578e3

Merge branch 'main' into yucai/i2o

75185f8

Merge branch 'main' into yucai/i2o

984e54d

Merge branch 'main' into yucai/i2o

9a29f58

jianyizh approved these changes Oct 15, 2025

View reviewed changes

Merge branch 'main' into yucai/i2o

409fd26

CuiYifeng requested a review from Copilot October 16, 2025 08:45

Copilot AI reviewed Oct 16, 2025

View reviewed changes

src/comm/TensorInfo.h Show resolved Hide resolved

src/comm/TensorInfo.h Show resolved Hide resolved

src/ATen/native/xpu/sycl/TensorModeKernel.cpp Outdated Show resolved Hide resolved

src/ATen/native/xpu/sycl/Indexing.cpp Show resolved Hide resolved

yucai-intel added 3 commits October 17, 2025 11:24

Update TensorInfo.h

995a054

Update TensorModeKernel.cpp

8a98735

Merge branch 'main' into yucai/i2o

7be1dc6

CuiYifeng requested a review from guangyey October 17, 2025 05:30

guangyey approved these changes Oct 17, 2025

View reviewed changes

CuiYifeng added this pull request to the merge queue Oct 20, 2025

Merged via the queue into main with commit 369a0c9 Oct 20, 2025
73 of 75 checks passed

CuiYifeng deleted the yucai/i2o branch October 20, 2025 02:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Split IndextoOffset() into offline and online versions #2136

Split IndextoOffset() into offline and online versions #2136

yucai-intel commented Oct 7, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Silv3S commented Oct 20, 2025 •

edited

Loading

Uh oh!

PawelSwider2000 commented Oct 20, 2025 •

edited

Loading

Uh oh!

Silv3S commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Split IndextoOffset() into offline and online versions #2136

Split IndextoOffset() into offline and online versions #2136

Conversation

yucai-intel commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Silv3S commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PawelSwider2000 commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Silv3S commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

yucai-intel commented Oct 7, 2025 •

edited

Loading

Silv3S commented Oct 20, 2025 •

edited

Loading

PawelSwider2000 commented Oct 20, 2025 •

edited

Loading