pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Jagadish Krishnamoorthy	8951df03de	test_scaled_matmul_cuda: fix infer_scale_swizzle (#165788 ) Extend #165747 fix to other cases. Add parentheses to clarify operator precedence. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165788 Approved by: https://github.com/jeffdaily, https://github.com/slayton58	2025-10-19 21:42:01 +00:00
Simon Layton	d14cbb4476	Add NVFP4 two-level scaling to scaled_mm (#165774 ) Summary: * Add second-level scaling dispatch to scaled_mm, tying into optional `alpha` passing * Add two-level tests Test Plan: ``` pytest -svv -k "nvfp4_global_scale" test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165774 Approved by: https://github.com/drisspg	2025-10-18 13:06:04 +00:00
Simon Layton	39e0a832c9	Fix B200 test fails in scaled_mm (#165747 ) Summary: PR #165528 changes some scale/swizzle inference behavior in scaled_mm tests - mxfp8 tests on Blackwell can get incorrectly classified, resulting in failures. Fix the scale/swizzle inference code to prevent this. Fixes https://github.com/pytorch/pytorch/issues/165743 Test Plan: ``` pytest -svv test/test_scaled_matmul_cuda.py ``` Reviewers: @jagadish-amd @jeffdaily @drisspg Subscribers: @Aidyn-A Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165747 Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/jeffdaily	2025-10-17 17:52:19 +00:00
Yuanyuan Chen	e925dfcc6b	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhancing code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang, https://github.com/mlazos	2025-10-17 07:27:11 +00:00
Jagadish Krishnamoorthy	7669ac9402	[ROCm] Add scaled_mm v2 support. (#165528 ) Add mx fp4 support in Blas.cpp. Updated the scale_kernel_dispatch array and ScaledGemmImplementation enum to include MXFP4 support. Modify the tests under test_scaled_matmul_cuda accordingly. PYTORCH_TEST_WITH_ROCM=1 python test/test_scaled_matmul_cuda.py -v -k test_blockwise 115 test passed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165528 Approved by: https://github.com/jeffdaily	2025-10-16 18:36:41 +00:00
eqy	a214371008	[FP8] Add other Blackwell compute-capabilities to expected fail `test_honor_sm_carveout` (#165159 ) CUTLASS SM hint also isn't working for other Blackwells, need green context for carveout Pull Request resolved: https://github.com/pytorch/pytorch/pull/165159 Approved by: https://github.com/Skylion007	2025-10-16 18:35:06 +00:00
Simon Layton	066f818eea	Refactor and unify v1/v2 _scaled_mm codes (#165436 ) Summary: * Refactor out some core routines (scaled_gemm, auto-tuned scaled_gemm) * Unify v1/v2 dispatch calls where possible * Simplify call pattern w.r.t. CUDA/ROCM for easier readability. Test Plan: ``` pytest -svv test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165436 Approved by: https://github.com/drisspg	2025-10-15 19:07:05 +00:00
Simon Layton	7c6c5d04fe	Add scaled_grouped_mm_v2 and python API (#165154 ) Summary: * Add `torch._scaled_grouped_mm_v2` with more functionality and extensibility for future formats * Add `torch.nn.functional.scaled_grouped_mm` as public entrypoint * Test both original and v2 functionality Test Plan: ``` pytest -svv -k grouped test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165154 Approved by: https://github.com/drisspg, https://github.com/danielvegamyhre	2025-10-15 17:47:23 +00:00
Jeff Daily	8360f34c36	[ROCm] hotfix test scaled matmul cuda (#165104 ) Refactoring of scaled mm APIs and related tests caused previously passing tests on ROCm to start failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-10 21:06:58 +00:00
Janani Sriram	8f78999d77	[Inductor][ATen] Fix stride rounding on Blockwise128x128 to accommodate for small shapes (#164953 ) Summary: Fix rounding issue on `Blockwise128x128` to accommodate for small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](https://github.com/pytorch/pytorch/pull/164259)). Test Plan: `test_fp8.py` `test_scaled_matmul_cuda.py` Differential Revision: D84103213 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164953 Approved by: https://github.com/slayton58, https://github.com/eqy	2025-10-10 19:12:58 +00:00
PyTorch MergeBot	7ab00c7c17	Revert "Hotfix test scaled matmul cuda (#165104 )" This reverts commit 9aa92f246fa5fe5cfda17970d41d167b19a0612a. Reverted https://github.com/pytorch/pytorch/pull/165104 on behalf of https://github.com/malfet due to Looks like it broke cuda tests, isn't it, see `44b1ff54e9/1` ([comment](https://github.com/pytorch/pytorch/pull/165104#issuecomment-3388247886))	2025-10-10 04:32:18 +00:00
Jeff Daily	9aa92f246f	Hotfix test scaled matmul cuda (#165104 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-09 22:51:30 +00:00
Simon Layton	6a7f5c0d21	Add scaled_mm python API, test (#164142 ) Summary: * Add `torch.nn.functional.scaled_mm` as an abstraction around the C++ methods * Wraps `torch._scaled_mm_v2` API by default, but user can force use of the older `torch._scaled_mm` interface. * Scaled MM tests now run on the new API Test Plan: `pytest test/test_scaled_matmul_cuda.py` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164142 Approved by: https://github.com/drisspg ghstack dependencies: #164141	2025-10-09 12:43:18 +00:00
Jagadish Krishnamoorthy	c7e30ae4dd	MX: Remove redundant PLATFORM_SUPPORTS_MX_GEMM constant (#164320 ) Deleted duplicate definition of PLATFORM_SUPPORTS_MX_GEMM, was introduced in https://github.com/pytorch/pytorch/pull/162209 Also, adjusted BLOCK_SIZE and fp4_scaling_dtype in test_matmul_cuda.py to enable test_blockwise_nvfp4_compile on ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164320 Approved by: https://github.com/jeffdaily	2025-10-02 23:30:56 +00:00
Jeff Daily	7304b9e7d2	[ROCm] fix carveout feature (#164303 ) Fixes #164271. Carveout had been applied with an opposite bitmask. Besides being incorrect, this led to flaky unit test behavior due to carveout being too high. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164303 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-01 19:25:41 +00:00
Simon Layton	8df3f2fa98	Revert new-test part of #163829 (#164259 ) Summary: New test sizes for `test_scaled_mm_vs_emulated_block_wise` all fail with ``` RuntimeError: Invalid scaling configuration ``` Disable these new tests for now (the remaining test is a parametrized version of the original test case) Test Plan: `pytest test/test_scaled_matmul_cuda.py` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164259 Approved by: https://github.com/jananisriram ghstack dependencies: #164266	2025-10-01 02:23:21 +00:00
Simon Layton	7a9119948e	Split scaled-mm tests into separate file (#164266 ) Summary: * Split scaled-mm-specific tests into `test/test_scaled_matmul_cuda.py` Test Plan: ``` pytest test/test_matmul_cuda.py pytest test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164266 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-10-01 02:23:21 +00:00

17 Commits