Skip to content

Conversation

yucai-intel
Copy link
Contributor

@yucai-intel yucai-intel commented Oct 7, 2025

Divide indextoOffset() into two versions, offline and online, to reduce runtime overhead and as much as possible.

@CuiYifeng CuiYifeng requested a review from Copilot October 16, 2025 08:45
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Split IndexToOffset into compile-time (static Dims) and runtime (-1) variants to reduce nvcc compilation time and refactor all kernel call sites to use the new template form. Key changes remove the previous contiguous fast path flag and introduce dimension template parameters across many XPU SYCL kernels.

  • Introduced IndexToOffset<T, IndexType, Dims> and dynamic specialization with Dims = -1; removed strict/non-strict contiguous branching.
  • Updated all kernel usages to pass an explicit Dims (positive, -1, or new sentinel -2) and added indexing_kind template parameters in RNN kernels.
  • Replaced standard SYCL subgroup size attribute with Intel-specific attribute in one kernel.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/comm/TensorInfo.h Replaced old IndexToOffset implementation with compile-time and runtime (-1) variants, removed contiguous fast path.
src/ATen/native/xpu/sycl/WeightNormKernels.cpp Updated all IndexToOffset calls to new API (runtime -1).
src/ATen/native/xpu/sycl/TensorModeKernel.cpp Switched subgroup size attribute to Intel-specific and updated IndexToOffset usage.
src/ATen/native/xpu/sycl/TensorApplyUtils.h Updated ApplyOp2 to use runtime (-1) IndexToOffset.
src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp Added ADims/BDims template params and updated IndexToOffset calls.
src/ATen/native/xpu/sycl/Sorting.cpp Pass compile-time Dim to IndexToOffset.
src/ATen/native/xpu/sycl/ScanUtils.h Migrated to runtime (-1) IndexToOffset calls.
src/ATen/native/xpu/sycl/RNNKernels.cpp Added indexing_kind template parameter and adjusted macros to new IndexToOffset signature.
src/ATen/native/xpu/sycl/Indexing.h Updated offset calculations to new runtime form.
src/ATen/native/xpu/sycl/Indexing.cpp Added DstDim/SrcDim/IdxDim template params, macros now emit various Dims (including -2) for IndexToOffset.
src/ATen/native/xpu/sycl/Dropout.cpp Added ADims / BDims template-based offset computation.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@CuiYifeng CuiYifeng requested a review from guangyey October 17, 2025 05:30
@CuiYifeng CuiYifeng added this pull request to the merge queue Oct 20, 2025
Merged via the queue into main with commit 369a0c9 Oct 20, 2025
73 of 75 checks passed
@CuiYifeng CuiYifeng deleted the yucai/i2o branch October 20, 2025 02:37
@Silv3S
Copy link
Contributor

Silv3S commented Oct 20, 2025

Hi @yucai-intel, I can't build torch-xpu-ops since this commit was merged. Does it require some dependency update?
I'm using Ubuntu 24.04.2 LTS with oneAPI Base Toolkit 2025.2.1

 /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:139:11: error: too few template arguments for class template 'IndexToOffset'
    139 |         ? IndexToOffset<scalar1, IndexType>::get(linearIndex, a)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  In file included from /home/gta/torch-xpu-ops/src/ATen/native/quantized/sycl/FusedObsFakeQuantKernels.cpp:10:
  In file included from /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/DistributionTemplates.h:11:
  /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:144:11: error: too few template arguments for class template 'IndexToOffset'
    144 |         ? IndexToOffset<scalar2, IndexType>::get(linearIndex, b)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  2 errors generated.

@PawelSwider2000
Copy link
Contributor

PawelSwider2000 commented Oct 20, 2025

Hi @yucai-intel, I can't build torch-xpu-ops since this commit was merged. Does it require some dependency update? I'm using Ubuntu 24.04.2 LTS with oneAPI Base Toolkit 2025.2.1

 /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:139:11: error: too few template arguments for class template 'IndexToOffset'
    139 |         ? IndexToOffset<scalar1, IndexType>::get(linearIndex, a)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  In file included from /home/gta/torch-xpu-ops/src/ATen/native/quantized/sycl/FusedObsFakeQuantKernels.cpp:10:
  In file included from /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/DistributionTemplates.h:11:
  /home/gta/pytorch/torch/include/ATen/native/xpu/sycl/TensorApplyUtils.h:144:11: error: too few template arguments for class template 'IndexToOffset'
    144 |         ? IndexToOffset<scalar2, IndexType>::get(linearIndex, b)
        |           ^
  /home/gta/torch-xpu-ops/src/comm/TensorInfo.h:155:8: note: template is declared here
    154 | template <typename T, typename IndexType, int Dims>
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    155 | struct IndexToOffset {
        |        ^
  2 errors generated.

@Silv3S Most likely content of pytorch/torch/include/ is outdated. Manually removing it helped for me, but this file should be updated automatically

Seems that this part of: torch-xpu-ops/src/ATen/CMakeLists.txt does not track updates in files:

# ATen XPU headers

macro(install_xpu_headers glob_pattern dest_subdir)
  file(GLOB headers ${glob_pattern})
  if(headers)
    install(FILES ${headers} DESTINATION "${AT_INSTALL_INCLUDE_DIR}/${dest_subdir}")
  endif()
endmacro()

install_xpu_headers("xpu/*.h" "ATen/xpu")
install_xpu_headers("native/xpu/*.h" "ATen/native/xpu")
install_xpu_headers("native/xpu/sycl/*.h" "ATen/native/xpu/sycl")
install_xpu_headers("native/xpu/mkl/*.h" "ATen/native/xpu/mkl")
install_xpu_headers("native/nested/xpu/*.h" "ATen/native/nested/xpu")
install_xpu_headers("native/nested/xpu/sycl/*.h" "ATen/native/nested/xpu/sycl")
install_xpu_headers("native/quantized/*.h" "ATen/native/quantized/xpu")
install_xpu_headers("native/quantized/sycl/*.h" "ATen/native/quantized/xpu/sycl")
install_xpu_headers("native/sparse/xpu/*.h" "ATen/native/sparse/xpu")
install_xpu_headers("native/sparse/xpu/sycl/*.h" "ATen/native/sparse/xpu/sycl")
install_xpu_headers("native/transformers/*.h" "ATen/native/transformers/xpu")
install_xpu_headers("native/transformers/sycl/*.h" "ATen/native/transformers/xpu/sycl")

@Silv3S
Copy link
Contributor

Silv3S commented Oct 20, 2025

Thanks Paweł, it works. Problem is only visible in incremental builds and this PR didn't break anything. Sorry for the false alarm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants