Conversation

daisyden
Contributor

Fix #2081

@Copilot Copilot AI review requested due to automatic review settings October 16, 2025 08:43
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

Ports a CUDA-specific large-index warp softmax test to XPU, adding large tensor coverage for 64-bit indexing correctness.

  • Adds import for largeTensorTest and introduces a new large-shape log_softmax gradient correctness test.
  • Registers the test dynamically on TestNNDeviceType with multiple large memory size decorators.
Comments suppressed due to low confidence (1)

test/xpu/test_nn_xpu.py:1

  • [nitpick] The decorator iterates over dtype (torch.float, torch.half) but the input tensor is always created as torch.float16, causing the float case to test a half input with a float output cast instead of a true float input path. For clearer coverage of dtype-dependent kernel behavior, construct x with the selected dtype (e.g., x = torch.randn(shape, device=device, dtype=dtype, requires_grad=True)) unless there is a documented reason to force half.
# Owner(s): ["module: intel"]
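The dtype-parametrized construction the nitpick suggests can be sketched as follows. This is a minimal illustration with a hypothetical `make_input` helper and small CPU shapes, not the actual test:

```python
import torch

# Hypothetical helper illustrating the reviewer's suggestion: build the
# input with the parametrized dtype instead of hard-coding torch.float16.
def make_input(shape, device="cpu", dtype=torch.float32):
    return torch.randn(shape, device=device, dtype=dtype, requires_grad=True)

# Each dtype in the decorator's list now exercises its own input path.
for dtype in (torch.float, torch.half):
    x = make_input((2, 4), dtype=dtype)
    assert x.dtype == dtype
```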


@largeTensorTest("64GB", "cpu")
def _test_warp_softmax_64bit_indexing(self, device, dtype):
    def run_test(*shape):
        x = torch.randn(shape, device="xpu", dtype=torch.float16, requires_grad=True)

Copilot AI Oct 16, 2025


The test receives a device argument (from the instantiated device-type tests and the second @largeTensorTest decorator specifying 'cpu'), but the tensor is hard-coded with device='xpu'. This prevents the CPU-annotated variant from actually exercising a CPU path and can cause mismatched expectations. Use the passed-in device variable for allocation: x = torch.randn(shape, device=device, dtype=torch.float16, requires_grad=True).

Suggested change
-    x = torch.randn(shape, device="xpu", dtype=torch.float16, requires_grad=True)
+    x = torch.randn(shape, device=device, dtype=torch.float16, requires_grad=True)
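Putting the review points together, the gradient correctness check can be sketched at small scale (assumed shapes and a CPU device so it runs without a 64 GB allocation; this is not the actual 64-bit-indexing test). For a row of size n, the gradient of the summed log_softmax has the closed form 1 - n * softmax(x):

```python
import torch
import torch.nn.functional as F

# Small-scale sketch (assumed shapes/device): verify the log_softmax
# gradient against its analytic closed form.
def check_log_softmax_grad(shape, device="cpu", dtype=torch.float32):
    x = torch.randn(shape, device=device, dtype=dtype, requires_grad=True)
    out = F.log_softmax(x, dim=-1)
    out.backward(torch.ones_like(out))
    n = shape[-1]
    # d/dx_i sum_j log_softmax(x)_j = 1 - n * softmax(x)_i, per row
    with torch.no_grad():
        expected = 1.0 - n * F.softmax(x, dim=-1)
    return torch.allclose(x.grad, expected, atol=1e-5)

assert check_log_softmax_grad((4, 8))
```

The real test additionally needs the large shapes from issue #2081 and the passed-in `device`/`dtype` arguments so each instantiated variant exercises its intended path.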




Development

Successfully merging this pull request may close these issues.

New test cases failures with 20250918 pytorch
