jianyizh (Contributor)

The previous fix for #1423 was reverted because it could cause performance regressions in other kernels, such as reductions. This change instead sets the GRF size to 128 only for loop kernels that use dynamic casting.

Copilot AI review requested due to automatic review settings, October 16, 2025 13:14
Copilot AI (Contributor) left a comment

Pull Request Overview

Introduce an option to launch certain SYCL loop kernels with a reduced (128) GRF size using experimental SYCL properties, applied selectively to dynamic-cast loop paths to avoid prior regressions.

  • Adds sycl_kernel_submit_small_grf with GRF size property and a distinct kernel name wrapper.
  • Extends existing loop launch templates with a force_small_grf boolean template parameter and applies it (set to true) in specific kernel invocation paths.
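
The compile-time flag described in the bullets above can be illustrated with a minimal sketch. This is not the PR's code: it uses plain C++17 with no SYCL dependency, and `LaunchLog`, `GrfMode`, `submit_default`, `submit_small_grf`, and `launch_loop_kernel` are hypothetical stand-ins that only record which submission path was selected. The only detail taken from the review is the idea of a `force_small_grf` boolean template parameter that routes selected launches to a small-GRF helper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record of which submission path each launch took.
enum class GrfMode { Default, Small128 };

struct LaunchLog {
  std::vector<GrfMode> launches;
};

// Stand-in for the regular submission helper (assumption: the real helper
// would submit a SYCL kernel; here the callable is simply invoked).
template <typename Kernel>
void submit_default(LaunchLog& log, Kernel&& kernel) {
  log.launches.push_back(GrfMode::Default);
  std::forward<Kernel>(kernel)();
}

// Stand-in for a small-GRF submission helper (assumption: the real helper
// would attach a GRF-size launch property; here only the choice is logged).
template <typename Kernel>
void submit_small_grf(LaunchLog& log, Kernel&& kernel) {
  log.launches.push_back(GrfMode::Small128);
  std::forward<Kernel>(kernel)();
}

// Launch template with a compile-time flag. The flag defaults to false, so
// existing call sites keep the regular path unless they opt in explicitly.
template <bool force_small_grf = false, typename Kernel>
void launch_loop_kernel(LaunchLog& log, Kernel&& kernel) {
  if constexpr (force_small_grf) {
    submit_small_grf(log, std::forward<Kernel>(kernel));
  } else {
    submit_default(log, std::forward<Kernel>(kernel));
  }
}

// Helper for inspecting the log: counts launches that took the small-GRF path.
inline std::size_t count_small_grf(const LaunchLog& log) {
  std::size_t n = 0;
  for (GrfMode m : log.launches) {
    if (m == GrfMode::Small128) ++n;
  }
  return n;
}
```

A caller on a dynamic-cast path would write `launch_loop_kernel<true>(log, kernel)`, while all other callers keep `launch_loop_kernel(log, kernel)` unchanged. Because the flag is a template parameter, the choice is made at compile time and kernels that did not opt in are not affected.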

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/comm/SYCLHelpers.h | Adds small-GRF submission helper using experimental launch_config and grf_size property. |
| src/ATen/native/xpu/sycl/Loops.h | Extends kernel launch templates with force_small_grf flag and applies small-GRF launches in scalar/dynamic cast paths. |

jianyizh force-pushed the jianyi/dynamic_cast branch from 0457876 to 4f058c2, October 16, 2025 14:39