[ROCm] Enable USE_FBGEMM_GENAI (#160676)
Summary:
X-link: https://github.com/pytorch/FBGEMM/pull/4703
X-link: https://github.com/facebookresearch/FBGEMM/pull/1728
In this diff we enable support for the new FBGEMM-backed FP8 `_scaled_grouped_mm` on ROCm. For now we only enable support for `gfx942`, as that is the architecture on which we have thoroughly tested performance and correctness.
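As a hedged sketch of the user-facing call this enables (not part of this commit; the scale layout, shape constraints, and the `float8_e4m3fnuz` dtype for gfx942 are assumptions drawn from PyTorch's grouped-GEMM tests, and the private-op signature may differ across versions):

```python
import torch

dev = "cuda"  # ROCm devices are exposed through the CUDA device API
G, M, K, N = 4, 64, 128, 256  # K and N kept multiples of 16 for FP8 kernels

# 2D "jagged" activations stacked over groups; 3D weights, one [K, N] per group.
a = torch.randn(G * M, K, device=dev).to(torch.float8_e4m3fnuz)
# Weights must be column-major in the last two dims, hence the transpose.
b = torch.randn(G, N, K, device=dev).to(torch.float8_e4m3fnuz).transpose(-2, -1)
# offs gives the exclusive end row of each group within `a`.
offs = torch.arange(M, G * M + 1, M, device=dev, dtype=torch.int32)
# Assumed rowwise scales: one float32 scale per row of `a`, and per output
# column of each group of `b`, both flattened to 1D.
scale_a = torch.rand(G * M, device=dev, dtype=torch.float32)
scale_b = torch.rand(G * N, device=dev, dtype=torch.float32)

out = torch._scaled_grouped_mm(a, b, scale_a, scale_b, offs=offs,
                               out_dtype=torch.bfloat16)
print(out.shape)  # [G * M, N]
```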
Rollback Plan:
Differential Revision: D79564024
Test Plan:
Ensure the build succeeds with (see the sketch after this list):
- `USE_FBGEMM_GENAI=1` and without `gfx942`
- `USE_FBGEMM_GENAI=1` and with `gfx942`
- `USE_FBGEMM_GENAI=1` and all current [`PYTORCH_ROCM_ARCH`](9491d289b3/.ci/docker/libtorch/build.sh (L48))
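A hypothetical driver for those three configurations (`USE_FBGEMM_GENAI` and `PYTORCH_ROCM_ARCH` are PyTorch's real build variables; the arch strings and the in-tree `setup.py` invocation are illustrative assumptions):

```python
import os
import subprocess

# Build PyTorch three times with FBGEMM GenAI enabled: without gfx942,
# with gfx942 only, and with a multi-arch list. Run from a pytorch checkout.
for arch in ("gfx90a",                  # without gfx942
             "gfx942",                  # with gfx942
             "gfx90a;gfx942;gfx1100"):  # stand-in for the full arch list
    env = dict(os.environ, USE_FBGEMM_GENAI="1", PYTORCH_ROCM_ARCH=arch)
    subprocess.run(["python", "setup.py", "develop"], env=env, check=True)
```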
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160676
Approved by: https://github.com/drisspg
Committed by: PyTorch MergeBot
Parent: 890626632d
Commit: 69a25f6888
```diff
@@ -880,10 +880,15 @@ cmake_dependent_option(
   USE_FBGEMM_GENAI
   "Whether to build FBGEMM GenAI quantized GEMM kernels.\
     Will be disabled if not supported by the platform"
-  OFF
-  "USE_CUDA OR USE_ROCM"
+  ON
+  "USE_ROCM"
   OFF)
 
+IF(USE_FBGEMM_GENAI AND USE_ROCM AND NOT "gfx942" IN_LIST PYTORCH_ROCM_ARCH)
+  message(WARNING "Unsupported ROCM arch for FBGEMM GenAI, will set USE_FBGEMM_GENAI to OFF")
+  set(USE_FBGEMM_GENAI off)
+endif()
+
 # CAVEAT: Again, Flash Attention2 will error while building for sm52 while Mem
 # Eff Attention won't
 cmake_dependent_option(
```
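The net effect of this hunk: ROCm builds now default `USE_FBGEMM_GENAI` to `ON`, but the guard turns it back `OFF` unless `gfx942` appears in `PYTORCH_ROCM_ARCH`. As a hedged illustration (not from this commit), a runtime check for whether the visible GPU is a gfx942 part:

```python
import torch

# gcnArchName is the ROCm device-properties field reporting the GPU arch,
# e.g. 'gfx942:sramecc+:xnack-'; the getattr default covers non-ROCm builds.
props = torch.cuda.get_device_properties(0)
print(getattr(props, "gcnArchName", "n/a").startswith("gfx942"))
```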