8951df03de
test_scaled_matmul_cuda: fix infer_scale_swizzle (#165788)
...
Extend #165747 fix to other cases.
Add parentheses to clarify operator precedence.
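The patched expression isn't reproduced in this log; as a generic illustration of the bug class being fixed (operator precedence differing from the intended grouping), consider:

```python
# Illustrative only -- not the actual infer_scale_swizzle expression.
# In Python, '+' binds tighter than '<<', so without parentheses the
# shift applies to the whole sum rather than to the literal 2 alone.
ambiguous = 1 << 2 + 3    # parsed as 1 << (2 + 3)
intended = (1 << 2) + 3   # explicit grouping makes the intent clear

print(ambiguous, intended)  # 32 7
```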
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165788
Approved by: https://github.com/jeffdaily, https://github.com/slayton58
2025-10-19 21:42:01 +00:00
d14cbb4476
Add NVFP4 two-level scaling to scaled_mm (#165774)
...
Summary:
* Add second-level scaling dispatch to scaled_mm, tying into optional `alpha` passing
* Add two-level tests
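As a rough sketch of what two-level scaling means here (the names and values are illustrative assumptions; the real kernel operates on NVFP4 data with per-block FP8 scales plus a tensor-wide `alpha`):

```python
# Two-level dequantization sketch: each quantized value is rescaled by
# a per-block scale and a single tensor-wide global scale (alpha).
def two_level_dequant(qblocks, block_scales, global_scale):
    out = []
    for block, s in zip(qblocks, block_scales):
        out.extend(q * s * global_scale for q in block)
    return out

vals = [1.0, 2.0, 4.0]
block_scale, global_scale = 0.5, 2.0
q = [v / (block_scale * global_scale) for v in vals]  # quantize
assert two_level_dequant([q], [block_scale], global_scale) == vals
```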
Test Plan:
```
pytest -svv -k "nvfp4_global_scale" test/test_scaled_matmul_cuda.py
```
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165774
Approved by: https://github.com/drisspg
2025-10-18 13:06:04 +00:00
39e0a832c9
Fix B200 test fails in scaled_mm (#165747)
...
Summary:
PR #165528 changes some scale/swizzle inference behavior in scaled_mm
tests - mxfp8 tests on Blackwell can get incorrectly classified,
resulting in failures.
Fix the scale/swizzle inference code to prevent this.
Fixes https://github.com/pytorch/pytorch/issues/165743
Test Plan:
```
pytest -svv test/test_scaled_matmul_cuda.py
```
Reviewers:
@jagadish-amd @jeffdaily @drisspg
Subscribers:
@Aidyn-A
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165747
Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/jeffdaily
2025-10-17 17:52:19 +00:00
e925dfcc6b
Enable all SIM rules except disabled ones (#164645)
...
`SIM` rules simplify boolean expressions and enhance code readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang, https://github.com/mlazos
2025-10-17 07:27:11 +00:00
7669ac9402
[ROCm] Add scaled_mm v2 support. (#165528)
...
Add mx fp4 support in Blas.cpp.
Updated the scale_kernel_dispatch array and ScaledGemmImplementation enum to include MXFP4 support.
Modify the tests under test_scaled_matmul_cuda accordingly.
PYTORCH_TEST_WITH_ROCM=1 python test/test_scaled_matmul_cuda.py -v -k test_blockwise
115 tests passed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165528
Approved by: https://github.com/jeffdaily
2025-10-16 18:36:41 +00:00
a214371008
[FP8] Add other Blackwell compute-capabilities to expected fail test_honor_sm_carveout (#165159)
...
The CUTLASS SM hint also isn't working for other Blackwells; a green context is needed for carveout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165159
Approved by: https://github.com/Skylion007
2025-10-16 18:35:06 +00:00
066f818eea
Refactor and unify v1/v2 _scaled_mm codes (#165436)
...
Summary:
* Refactor out some core routines (scaled_gemm, auto-tuned scaled_gemm)
* Unify v1/v2 dispatch calls where possible
* Simplify call pattern w.r.t. CUDA/ROCM for easier readability.
Test Plan:
```
pytest -svv test/test_scaled_matmul_cuda.py
```
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165436
Approved by: https://github.com/drisspg
2025-10-15 19:07:05 +00:00
7c6c5d04fe
Add scaled_grouped_mm_v2 and python API (#165154)
...
Summary:
* Add `torch._scaled_grouped_mm_v2` with more functionality and
extensibility for future formats
* Add `torch.nn.functional.scaled_grouped_mm` as public entrypoint
* Test both original and v2 functionality
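The exact `torch._scaled_grouped_mm_v2` signature isn't shown in this log; as a pure-Python sketch of the grouped, scaled matmul concept (one independent scaled matmul per group; all names here are illustrative):

```python
# Reference semantics sketch: apply C_g = (A_g @ B_g) * sa_g * sb_g
# independently for each group g.
def scaled_grouped_mm_ref(a_groups, b_groups, scales_a, scales_b):
    outs = []
    for A, B, sa, sb in zip(a_groups, b_groups, scales_a, scales_b):
        m, k, n = len(A), len(B), len(B[0])
        C = [[sum(A[i][t] * B[t][j] for t in range(k)) * sa * sb
              for j in range(n)] for i in range(m)]
        outs.append(C)
    return outs

A = [[[1.0, 2.0]]]      # one group, 1x2
B = [[[3.0], [4.0]]]    # one group, 2x1
assert scaled_grouped_mm_ref(A, B, [2.0], [0.5]) == [[[11.0]]]
```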
Test Plan:
```
pytest -svv -k grouped test/test_scaled_matmul_cuda.py
```
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165154
Approved by: https://github.com/drisspg, https://github.com/danielvegamyhre
2025-10-15 17:47:23 +00:00
8360f34c36
[ROCm] hotfix test scaled matmul cuda (#165104)
...
Refactoring of scaled mm APIs and related tests caused previously passing tests on ROCm to start failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-10 21:06:58 +00:00
8f78999d77
[Inductor][ATen] Fix stride rounding on Blockwise128x128 to accommodate small shapes (#164953)
...
Summary: Fix a rounding issue on `Blockwise128x128` to accommodate small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](https://github.com/pytorch/pytorch/pull/164259)).
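The fix itself isn't reproduced here; as an illustration of why a fixed rounding scheme breaks small shapes while ceiling division does not (a sketch, not the Inductor code):

```python
def ceil_div(a, b):
    return -(-a // b)

def num_scale_blocks(m, n, block=128):
    # Each dimension needs ceil(dim / 128) scale entries; a small dim
    # like 64 must still map to one block, never to zero.
    return ceil_div(m, block), ceil_div(n, block)

assert num_scale_blocks(256, 384) == (2, 3)
assert num_scale_blocks(64, 64) == (1, 1)
```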
Test Plan:
`test_fp8.py`
`test_scaled_matmul_cuda.py`
Differential Revision: D84103213
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164953
Approved by: https://github.com/slayton58, https://github.com/eqy
2025-10-10 19:12:58 +00:00
7ab00c7c17
Revert "Hotfix test scaled matmul cuda (#165104)"
...
This reverts commit 9aa92f246fa5fe5cfda17970d41d167b19a0612a.
Reverted https://github.com/pytorch/pytorch/pull/165104 on behalf of https://github.com/malfet due to Looks like it broke cuda tests, isn't it, see 44b1ff54e9/1
([comment](https://github.com/pytorch/pytorch/pull/165104#issuecomment-3388247886))
2025-10-10 04:32:18 +00:00
9aa92f246f
Hotfix test scaled matmul cuda (#165104)
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165104
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-09 22:51:30 +00:00
6a7f5c0d21
Add scaled_mm python API, test (#164142)
...
Summary:
* Add `torch.nn.functional.scaled_mm` as an abstraction around the C++
methods
* Wraps the `torch._scaled_mm_v2` API by default, but the user can force use of the older `torch._scaled_mm` interface.
* Scaled MM tests now run on the new API
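A minimal sketch of the wrapper pattern described above: a public entrypoint that dispatches to the v2 implementation by default, with an opt-out flag. The private names below are placeholders, not the real torch internals:

```python
# Placeholder implementations standing in for the private C++ bindings.
def _scaled_mm_v1(a, b):
    return ("v1", a, b)

def _scaled_mm_v2(a, b):
    return ("v2", a, b)

def scaled_mm(a, b, use_v2=True):
    # Public API: route to v2 unless the caller forces the legacy path.
    impl = _scaled_mm_v2 if use_v2 else _scaled_mm_v1
    return impl(a, b)

assert scaled_mm(1, 2)[0] == "v2"
assert scaled_mm(1, 2, use_v2=False)[0] == "v1"
```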
Test Plan:
`pytest test/test_scaled_matmul_cuda.py`
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164142
Approved by: https://github.com/drisspg
ghstack dependencies: #164141
2025-10-09 12:43:18 +00:00
c7e30ae4dd
MX: Remove redundant PLATFORM_SUPPORTS_MX_GEMM constant (#164320)
...
Deleted the duplicate definition of PLATFORM_SUPPORTS_MX_GEMM, which was introduced in https://github.com/pytorch/pytorch/pull/162209
Also, adjusted BLOCK_SIZE and fp4_scaling_dtype in test_matmul_cuda.py to enable test_blockwise_nvfp4_compile on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164320
Approved by: https://github.com/jeffdaily
2025-10-02 23:30:56 +00:00
7304b9e7d2
[ROCm] fix carveout feature (#164303)
...
Fixes #164271.
Carveout had been applied with an opposite bitmask. Besides being incorrect, this led to flaky unit test behavior due to the carveout being too high.
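A generic sketch of the "opposite bitmask" bug class (the bit values here are made up, not the actual HIP carveout encoding):

```python
FULL = 0b1111_1111       # pretend 8 compute units are available
CARVEOUT = 0b0000_0011   # intend to reserve the low 2 CUs

# Correct: clear the reserved bits, leaving 6 CUs usable.
usable_correct = FULL & ~CARVEOUT
# Buggy: applying the opposite mask leaves only the 2 reserved CUs,
# i.e. the carveout is far too high -- the symptom described above.
usable_buggy = FULL & CARVEOUT

assert bin(usable_correct).count("1") == 6
assert bin(usable_buggy).count("1") == 2
```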
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164303
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-01 19:25:41 +00:00
8df3f2fa98
Revert new-test part of #163829 (#164259)
...
Summary:
New test sizes for `test_scaled_mm_vs_emulated_block_wise` all fail with
```
RuntimeError: Invalid scaling configuration
```
Disable these new tests for now (the remaining test is a parametrized
version of the original test case)
Test Plan:
`pytest test/test_scaled_matmul_cuda.py`
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164259
Approved by: https://github.com/jananisriram
ghstack dependencies: #164266
2025-10-01 02:23:21 +00:00
7a9119948e
Split scaled-mm tests into separate file (#164266)
...
Summary:
* Split scaled-mm-specific tests into `test/test_scaled_matmul.py`
Test Plan:
```
pytest test/test_matmul_cuda.py
pytest test/test_scaled_matmul_cuda.py
```
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164266
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-10-01 02:23:21 +00:00