[Inductor][ATen][FP8] Add note for supported blockwise scaling strategy pairs (#165450)

Summary: Add a note mentioning which scaling type pairs are supported in the Inductor ATen backend, since this was a source of confusion; it also informs which scaling strategies we choose to support for other backends, like Triton.
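
For context, the simplest supported pair (TensorWise x TensorWise) can be exercised from Python via `torch._scaled_mm`, the user-facing entry point for the ATen path touched here. A minimal sketch, assuming a CUDA device with FP8 support (e.g., SM89+) and a recent PyTorch build:

```python
import torch

M, K, N = 128, 256, 64
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
# _scaled_mm requires mat2 in column-major layout, so allocate (N, K) and
# transpose to get a column-major (K, N) view.
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()

# TensorWise scaling: one scalar float32 scale per operand.
scale_a = torch.tensor(1.0, device="cuda")
scale_b = torch.tensor(1.0, device="cuda")

out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
                       out_dtype=torch.bfloat16)
print(out.shape)  # torch.Size([128, 64])
```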

Test Plan: n/a

Reviewed By: lw

Differential Revision: D84522373

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165450
Approved by: https://github.com/NikhilAPatel
Author: Janani Sriram
Date: 2025-10-14 20:43:58 +00:00
Committed by: PyTorch MergeBot
Parent: 1ec0755a7e
Commit: 382d04a51e

@@ -1273,6 +1273,10 @@ _scaled_mm_out_cuda(const Tensor& mat1, const Tensor& mat2,
 // by decreasing priority. We prefer "simpler" schemes as they are supported
 // more broadly (more GPU archs, more CUDA versions) and because they are more
 // efficient. This tends to matter only for small matmuls (e.g., 1x1x128).
+// List of supported BlockWise pairs for FP8:
+// https://docs.nvidia.com/cuda/cublas/#element-1d-and-128x128-2d-block-scaling-for-fp8-data-types
 auto [scaling_choice_a, scaling_choice_b] = get_joint_scaling(
     {
       std::make_pair(ScalingType::TensorWise, ScalingType::TensorWise),
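
The "decreasing priority" comment above boils down to a first-match scan over an ordered candidate list. A standalone sketch of that selection logic (hypothetical names, not the actual `get_joint_scaling` implementation):

```python
# Candidates are ordered from most to least preferred; the first pair whose
# scaling scheme both operands can satisfy wins. supports_a/supports_b stand
# in for the real shape/dtype checks on the scale tensors.
def choose_joint_scaling(candidates, supports_a, supports_b):
    for scaling_a, scaling_b in candidates:
        if supports_a(scaling_a) and supports_b(scaling_b):
            return scaling_a, scaling_b
    raise ValueError("no supported scaling pair for these scale tensors")

# Usage sketch: prefer TensorWise/TensorWise, fall back to a BlockWise pair.
candidates = [("TensorWise", "TensorWise"),
              ("BlockWise1x128", "BlockWise128x128")]
print(choose_joint_scaling(candidates, lambda s: True, lambda s: True))
```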