
Conversation

jananisriram (Contributor)

Summary:
Move scaling logic for FP8 benchmarks to `get_input_iter()`.

This diff aligns our fp8_gemm benchmarking suite with real-world practice: input tensors are created in high-precision types (`bfloat16`, `float16`), scales are computed on those high-precision tensors, and the tensors are then cast to a lower-precision type (`float8_e4m3fn`).

This diff also avoids performing unsupported operations, such as `torch.max` and `torch.abs`, on low-precision data types.
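For context, here is a minimal sketch of the flow this description refers to, assuming simple per-tensor scaling. `fp8_quantize` is an illustrative helper rather than tritonbench's actual API, and `torch._scaled_mm` is a private PyTorch API whose signature has changed across releases; the single-tensor return below assumes roughly PyTorch 2.4+ on FP8-capable hardware.

```python
import torch

def fp8_quantize(x_hp: torch.Tensor, fp8_dtype=torch.float8_e4m3fn):
    """Hypothetical helper: compute a per-tensor scale on the high-precision
    input, then cast it down to FP8 (the flow this diff moves into
    get_input_iter())."""
    fp8_max = torch.finfo(fp8_dtype).max
    # amax/scale are computed on the bfloat16/float16 tensor, since torch.abs
    # and torch.max are not supported on float8 dtypes.
    amax = x_hp.abs().max().to(torch.float32)
    scale = fp8_max / torch.clamp(amax, min=1e-12)
    x_fp8 = (x_hp.to(torch.float32) * scale).clamp(-fp8_max, fp8_max).to(fp8_dtype)
    # Return the inverse scale; the matmul multiplies it back in to dequantize.
    return x_fp8, scale.reciprocal()

device = "cuda"  # FP8 matmul needs FP8-capable hardware (e.g. H100 / SM 8.9+)
a = torch.randn(1024, 1024, dtype=torch.bfloat16, device=device)
b = torch.randn(1024, 1024, dtype=torch.bfloat16, device=device)
a_fp8, a_inv_scale = fp8_quantize(a)
b_fp8, b_inv_scale = fp8_quantize(b)
# The second operand of torch._scaled_mm must be column-major; transposing a
# contiguous tensor satisfies that, so this computes a @ b.T in FP8.
y = torch._scaled_mm(
    a_fp8, b_fp8.t(),
    scale_a=a_inv_scale, scale_b=b_inv_scale,
    out_dtype=torch.bfloat16,
)
```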

Reviewed By: NikhilAPatel

Differential Revision: D80571223

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D80571223

jananisriram added a commit to jananisriram/tritonbench that referenced this pull request Aug 20, 2025
Summary:
Pull Request resolved: meta-pytorch#338

Move scaling logic for FP8 benchmarks to `get_input_iter()`.

This diff aligns our fp8_gemm benchmarking suite with real-world practice: input tensors are created in high-precision types (`bfloat16`, `float16`), scales are computed on those high-precision tensors, and the tensors are then cast to a lower-precision type (`float8_e4m3fn`).

This diff also avoids performing unsupported operations, such as `torch.max` and `torch.abs`, on low-precision data types.

Reviewed By: NikhilAPatel

Differential Revision: D80571223
jananisriram added a commit to jananisriram/tritonbench that referenced this pull request Aug 21, 2025
Summary:
Move scaling logic for FP8 benchmarks to `get_input_iter()`.

This diff aligns our fp8_gemm benchmarking suite with real-world practice: input tensors are created in high-precision types (`bfloat16`, `float16`), scales are computed on those high-precision tensors, and the tensors are then cast to a lower-precision type (`float8_e4m3fn`).

This diff also avoids performing unsupported operations, such as `torch.max` and `torch.abs`, on low-precision data types.

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Differential Revision: D80571223

Pulled By: jananisriram
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D80571223

Summary:
Move scaling logic for FP8 benchmarks to `get_input_iter()`.

This diff aligns our fp8_gemm benchmarking suite with real-world practice: input tensors are created in high-precision types (`bfloat16`, `float16`), scales are computed on those high-precision tensors, and the tensors are then cast to a lower-precision type (`float8_e4m3fn`).

This diff also avoids performing unsupported operations, such as `torch.max` and `torch.abs`, on low-precision data types.

Pull Request resolved: meta-pytorch#338

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: xuzhao9

Differential Revision: D80571223

Pulled By: jananisriram