pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Simon Layton	7c6c5d04fe	Add scaled_grouped_mm_v2 and python API (#165154) Summary: * Add `torch._scaled_grouped_mm_v2` with more functionality and extensibility for future formats * Add `torch.nn.functional.scaled_grouped_mm` as public entrypoint * Test both original and v2 functionality Test Plan: ``` pytest -svv -k grouped test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165154 Approved by: https://github.com/drisspg, https://github.com/danielvegamyhre	2025-10-15 17:47:23 +00:00
Jeff Daily	8360f34c36	[ROCm] hotfix test scaled matmul cuda (#165104) Refactoring of scaled mm APIs and related tests caused previously passing tests on ROCm to start failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-10 21:06:58 +00:00
Janani Sriram	8f78999d77	[Inductor][ATen] Fix stride rounding on Blockwise128x128 to accommodate for small shapes (#164953) Summary: Fix rounding issue on `Blockwise128x128` to accommodate small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](https://github.com/pytorch/pytorch/pull/164259)). Test Plan: `test_fp8.py` `test_scaled_matmul_cuda.py` Differential Revision: D84103213 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164953 Approved by: https://github.com/slayton58, https://github.com/eqy	2025-10-10 19:12:58 +00:00
PyTorch MergeBot	7ab00c7c17	Revert "Hotfix test scaled matmul cuda (#165104)" This reverts commit 9aa92f246fa5fe5cfda17970d41d167b19a0612a. Reverted https://github.com/pytorch/pytorch/pull/165104 on behalf of https://github.com/malfet due to Looks like it broke cuda tests, isn't it, see `44b1ff54e9/1` ([comment](https://github.com/pytorch/pytorch/pull/165104#issuecomment-3388247886))	2025-10-10 04:32:18 +00:00
Jeff Daily	9aa92f246f	Hotfix test scaled matmul cuda (#165104) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-09 22:51:30 +00:00
Simon Layton	6a7f5c0d21	Add scaled_mm python API, test (#164142) Summary: * Add `torch.nn.functional.scaled_mm` as an abstraction around the C++ methods * Wraps `torch._scaled_mm_v2` API by default, but user can force use of the older `torch._scaled_mm` interface. * Scaled MM tests now run on the new API Test Plan: `pytest test/test_scaled_matmul_cuda.py` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164142 Approved by: https://github.com/drisspg ghstack dependencies: #164141	2025-10-09 12:43:18 +00:00
Jagadish Krishnamoorthy	c7e30ae4dd	MX: Remove redundant PLATFORM_SUPPORTS_MX_GEMM constant (#164320) Deleted the duplicate definition of PLATFORM_SUPPORTS_MX_GEMM, which was introduced in https://github.com/pytorch/pytorch/pull/162209. Also adjusted BLOCK_SIZE and fp4_scaling_dtype in test_matmul_cuda.py to enable test_blockwise_nvfp4_compile on ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164320 Approved by: https://github.com/jeffdaily	2025-10-02 23:30:56 +00:00
Jeff Daily	7304b9e7d2	[ROCm] fix carveout feature (#164303) Fixes #164271. Carveout had been applied with an opposite bitmask. Besides being incorrect, this led to flaky unit test behavior due to carveout being too high. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164303 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-01 19:25:41 +00:00
Simon Layton	8df3f2fa98	Revert new-test part of #163829 (#164259) Summary: New test sizes for `test_scaled_mm_vs_emulated_block_wise` all fail with ``` RuntimeError: Invalid scaling configuration ``` Disable these new tests for now (the remaining test is a parametrized version of the original test case) Test Plan: `pytest test/test_scaled_matmul_cuda.py` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164259 Approved by: https://github.com/jananisriram ghstack dependencies: #164266	2025-10-01 02:23:21 +00:00
Simon Layton	7a9119948e	Split scaled-mm tests into separate file (#164266) Summary: * Split scaled-mm-specific tests into `test/test_scaled_matmul_cuda.py` Test Plan: ``` pytest test/test_matmul_cuda.py pytest test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164266 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-10-01 02:23:21 +00:00

10 Commits