pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Deng, Daisy	4fd70d4e7b	[1/N]Enable some tests in test_ops.TestCommon on Intel GPU (#159944 ) For https://github.com/pytorch/pytorch/issues/114850, we will port aten unit tests to Intel GPU. This PR works on some test cases in test/test_ops.py. We enable Intel GPU with the following methods and try our best to keep the original code style: 1. Extended XPUTestBase.get_all_devices to support multiple devices 2. Added skipXPU decorator 3. Extended onlyOn to support a device list 4. Enabled 'xpu' for some test paths 5. Added allow_xpu=True for supported test classes 6. Replaced onlyCUDA with onlyOn(['cuda', 'xpu']) for supported tests 7. Used skipIfXpu and skipXPU to disable unsupported tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159944 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD	2025-09-29 09:08:04 +00:00
Samuel Park	df9a4824e6	Bugfix for doing negative padding (#161639 ) Fixes #161014 This fix is consistent with the existing exception handling. As outlined in issue #161014, there is an edge case where negative padding does not make the tensor size negative but still triggers the exception saying the size is negative. The fix simply changes the check to `new_dim >= 0`, so a zero-size dimension is allowed and the operator returns an empty tensor. In this PR I added a sample for this edge case, so the test now checks negative padding that reduces a dimension to zero. However, the sample covers only the `constant` type of padding. I would like some feedback on whether the same sample is needed for the `reduce` type as well. This is my first PR to PyTorch, and any help/feedback is welcome! Thank you! @malfet @manuelcandales @janeyx99 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/161639 Approved by: https://github.com/manuelcandales	2025-09-19 20:57:05 +00:00
Haifeng Jin	0dcd9304aa	fix high=0 bug in nll_loss test (#162763 ) Minor bug fix for the `nll_loss` test. Before this PR, it runs `torch.randint(high=0)`, which fails because it would need to generate a number x with low <= x < high, i.e. 0 <= x < 0. The test did not catch this because, on CPU, execution stops earlier on an unsupported dtype, so that line is never reached. However, as we support TPUs at Google, this line is reached before the dtype check, which triggers the bug. To my understanding, these OpInfos should be general enough to support different hardware, and fixing this obvious bug makes them more general across different hardware. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162763 Approved by: https://github.com/soulitzer	2025-09-12 21:48:18 +00:00
Jeff Daily	1e9ddf510f	[ROCm] fix hardsigmoid op (#162758 ) Currently std::min -> ::min did not work as expected on ROCm when input values are >= 2147483648. It can be fixed by explicitly typing std::min<opmath_t>. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162758 Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-12 15:07:13 +00:00
Alexander Grund	e532c9d4f1	Relax tolerance for test_quick_baddbmm_cpu_complex64 (#152424 ) On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native targeted optimizations. I.e. it fails with `-march=znver2` but succeeds with `-march=znver1`. I assume some operator fusing is being used by GCC. Small differences like using `vmovdqa` can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb The greatest differences are consistent and the same on both CPU architectures: ``` Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed) Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed) ``` Hence I assume this is in the expected tolerances especially as `complex128` and all other types pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152424 Approved by: https://github.com/malfet	2025-09-04 13:26:42 +00:00
xinan.lin	81b7b16618	Reland "[Fix XPU CI][Inductor UT] Fix test cases broken by community. (#161142 )" (#161949 ) This PR relands #161142, which was reverted so that another PR could be reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161949 Approved by: https://github.com/jansel	2025-09-02 23:43:27 +00:00
PyTorch MergeBot	54e275e0d8	Revert "[Fix XPU CI][Inductor UT] Fix test cases broken by community. (#161142 )" This reverts commit c83cbd2f2a2de2e3258f07de77d8740743df6d2d. Reverted https://github.com/pytorch/pytorch/pull/161142 on behalf of https://github.com/jeanschmidt due to This PR needs to be reverted to be able to revert another PR, this is due to merge conflicts, I am sorry for this. Please feel free to rebase and merge at your earliest convenience ([comment](https://github.com/pytorch/pytorch/pull/161142#issuecomment-3242937640))	2025-09-01 17:03:50 +00:00
xinan.lin	c83cbd2f2a	[Fix XPU CI][Inductor UT] Fix test cases broken by community. (#161142 ) Fixes #161384, Fixes #161162, Fixes #160946, Fixes #160947, Fixes #160948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161142 Approved by: https://github.com/jansel	2025-08-30 11:09:07 +00:00
David Berard	8c506e6310	[easy][test] Add repeat_interleave opinfo that exercises binary search fusion (#161445 ) This adds a configuration that would have caught the need for https://github.com/pytorch/pytorch/pull/159961 when https://github.com/pytorch/pytorch/pull/158462 was landed. Notably: * the test has output_size kwarg specified * the input is 1D plus a size-1 dimension (otherwise, if there are non-size-1 dimensions, then the fusion won't occur) Differential Revision: [D80981715](https://our.internmc.facebook.com/intern/diff/D80981715) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161445 Approved by: https://github.com/eellison, https://github.com/v0i0	2025-08-26 12:32:24 +00:00
Kurt Mohler	6382302990	[MPS] Add `grid_sampler_3d` for MPS (#160541 ) This PR adds support for `grid_sampler_3d` for MPS with "bilinear" interpolation. NOTE: "nearest" interpolation is not yet supported Fixes #159882 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160541 Approved by: https://github.com/malfet	2025-08-15 16:19:25 +00:00
Nikita Shulga	db0b7f1cc9	[BE][CI] Adjust `error_inputs` for cat and complex (#160378 ) MPS backend does not support double, so errors should be different Pull Request resolved: https://github.com/pytorch/pytorch/pull/160378 Approved by: https://github.com/dcci	2025-08-13 18:35:06 +00:00
Nikita Shulga	df55ec7d4b	[OpInfo][BE] Better inputs for addmm (#160234 ) Right now alpha and beta are both less than zero, which makes them useless for all addmm samples for integral types Pull Request resolved: https://github.com/pytorch/pytorch/pull/160234 Approved by: https://github.com/Skylion007 ghstack dependencies: #160228	2025-08-10 01:26:48 +00:00
Xuehai Pan	f5e2de928b	[BE] fix remaining flake8 v7 warnings (#159044 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159044 Approved by: https://github.com/Skylion007 ghstack dependencies: #159043	2025-07-25 02:56:34 +00:00
Mikayla Gawarecki	7f649ed4f8	Add basic torch.hash_tensor op (#154149 ) Added `torch.hash_tensor` reduction function with a `mode` argument that defaults to reduction with xor. - The hash is always uint64. - Integers will be cast to uint64 before performing the xor_sum reduction - Floats will be upcast to double and then bitcast to uint64 before performing the xor_sum reduction Pull Request resolved: https://github.com/pytorch/pytorch/pull/154149 Approved by: https://github.com/albanD	2025-07-23 22:28:03 +00:00
Nikita Shulga	2cdafab0bd	[BE] Raise ValueError from `torch.cat` meta func (#158249 ) Followup after https://github.com/pytorch/pytorch/pull/155460 From [Python documentation](https://docs.python.org/3/library/exceptions.html#ValueError): > Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError. Raise [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) when input-output types are incompatible with each other > Raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch. > This exception may be raised by user code to indicate that an attempted operation on an object is not supported, and is not meant to be. If an object is meant to support a given operation but has not yet provided an implementation, [NotImplementedError](https://docs.python.org/3/library/exceptions.html#NotImplementedError) is the proper exception to raise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158249 Approved by: https://github.com/jbschlosser, https://github.com/Skylion007, https://github.com/albanD	2025-07-20 23:49:18 +00:00
Manuel Candales	794b95d54b	Enable Half dtype for logcumsumexp_backward (#157512 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157512 Approved by: https://github.com/malfet	2025-07-03 18:13:38 +00:00
PyTorch MergeBot	c553c55be7	Revert "Fix full_like decomposition to preserve strides (#144765 )" This reverts commit 01b0f09931d47bd2716398a0c335b2807dc3074d. Reverted https://github.com/pytorch/pytorch/pull/144765 on behalf of https://github.com/jeanschmidt due to Seems to be breaking internal tests see [D77652778](https://www.internalfb.com/diff/D77652778), @jansel may you help get this PR merged? ([comment](https://github.com/pytorch/pytorch/pull/144765#issuecomment-3027975098))	2025-07-02 13:56:03 +00:00
Isuru Fernando	01b0f09931	Fix full_like decomposition to preserve strides (#144765 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144765 Approved by: https://github.com/amjames, https://github.com/jansel	2025-07-01 19:13:22 +00:00
Ryan Guo	a4b59498c5	Fix fake kernel for the `out=...` variant of `unbind_copy` (#156643 ) `unbind_copy(..., out=...)` returns None rather than the `out` argument (see https://github.com/pytorch/pytorch/issues/130829#issuecomment-2283936222), but the old fake kernel didn't account for that and caused an assertion failure in `pushPyOutToStack`. This patch fixes that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156643 Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/bdhirsh ghstack dependencies: #156642	2025-06-27 01:34:07 +00:00
Xuehai Pan	cec2977ed2	[BE][6/16] fix typos in torch/ (#156316 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315	2025-06-23 02:57:34 +00:00
PyTorch MergeBot	3f44fdc03d	Revert "[BE][6/16] fix typos in torch/ (#156316 )" This reverts commit b210cf1ea56bcd9f937a2805d9e70d8684d25ee4. Reverted https://github.com/pytorch/pytorch/pull/156316 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Xuehai Pan	b210cf1ea5	[BE][6/16] fix typos in torch/ (#156316 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315	2025-06-22 08:43:33 +00:00
Nikita Shulga	c2f4cc59a7	[MPS] Fix bug in 3d coords calculation (#156375 ) Which was not caught by CI beforehand, as all 3D examples right now are symmetric, so add an uneven shape to `sample_inputs_interpolate` Though it's indirectly tested by `test_upsample_nearest3d` inductor test Pull Request resolved: https://github.com/pytorch/pytorch/pull/156375 Approved by: https://github.com/atalman	2025-06-19 19:56:15 +00:00
Nikita Shulga	c28e74e457	[MPS] Add nearest_3d forward and backward (#156090 ) Introduce generalizable `UpsampleParams` structure in `UpSample.h`, which could be shared between CPU and MPS Delete `upsample_nearest3d` MPS fallback and replace it with proper shader Pull Request resolved: https://github.com/pytorch/pytorch/pull/156090 Approved by: https://github.com/kulinseth, https://github.com/dcci ghstack dependencies: #156256	2025-06-18 04:48:15 +00:00
Nikita Shulga	b1713c6655	[MPS][Testing][BE] Fix samples for full_like (#156026 ) Now that device is known, one can avoid creating tensors of `torch.double` type Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026 Approved by: https://github.com/dcci ghstack dependencies: #156121	2025-06-17 04:46:26 +00:00
PyTorch MergeBot	03488d820c	Revert "[MPS][Testing][BE] Fix samples for full_like (#156026 )" This reverts commit 2d832c9587fd99db295b62d0c9b459d509c19d06. Reverted https://github.com/pytorch/pytorch/pull/156026 on behalf of https://github.com/atalman due to Sorry breaks MPS tests: test_ops.py::TestMathBitsCPU::test_neg_view_full_like_cpu_float64 [GH job link](https://github.com/pytorch/pytorch/actions/runs/15683608879/job/44182730620) [HUD commit link](`2d832c9587`) ([comment](https://github.com/pytorch/pytorch/pull/156026#issuecomment-2977903074))	2025-06-16 19:50:26 +00:00
Nikita Shulga	2d832c9587	[MPS][Testing][BE] Fix samples for full_like (#156026 ) Now that device is known, one can avoid creating tensors of `torch.double` type Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026 Approved by: https://github.com/dcci	2025-06-16 14:27:42 +00:00
redwrasse	8a22551300	Fixes OpInfo gradient checks for ctc_loss (#154590 ) Fixes #67462 Re-enables `OpInfo` gradient checks for the restricted scenarios where the current `ctc_loss` implementation is accurate and consistent. The desired `ctc_loss` gradient behavior appears to be an ongoing discussion, see https://github.com/pytorch/pytorch/issues/52241. The `OpInfo` gradient checks can be updated if/as the underlying implementation advances. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154590 Approved by: https://github.com/soulitzer	2025-06-10 19:56:39 +00:00
Nikita Shulga	abbdf9f363	[BE][Testing] Unskip `ones_like`/`zeros_like` testing on MPS (#155476 ) But skip `double` dtype from OpInfo variants for this test Pull Request resolved: https://github.com/pytorch/pytorch/pull/155476 Approved by: https://github.com/Skylion007, https://github.com/dcci	2025-06-09 20:37:44 +00:00
Narek Malkhasyan	7999735d23	[CUDA][MPS] Fix torch.arange bound validation for large float inputs (#154320 ) Fixes #153133 Fixes an inconsistency in torch.arange on CUDA and MPS backends when using float32 and large input values. Previously, invalid ranges (e.g., start > end with a positive step) could silently return empty tensors due to precision loss in validation logic. The fix introduces double precision validation for checking whether the step sign is consistent with the range direction. This ensures torch.arange behaves consistently with CPU for large float32 inputs, and raises an appropriate error when the range is invalid. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154320 Approved by: https://github.com/malfet	2025-06-05 14:51:25 +00:00
Natalia Gimelshein	34e3930401	fix numpy compatibility for 2d small list indices (#154806 ) Will fix #119548 and linked issues once we switch from warning to the new behavior, but for now, given how much this syntax was used in our test suite, we suspect a silent change will be disruptive. We will change the behavior after 2.8 branch is cut. Numpy behavior was changed at least in numpy 1.24 (more than 2 years ago) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154806 Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/albanD	2025-06-04 01:58:52 +00:00
Nikita Shulga	e9266f807a	[BE] Use vendored packaging for testing (#154946 ) As the rest of the torch uses it, test should rely on it as well Pull Request resolved: https://github.com/pytorch/pytorch/pull/154946 Approved by: https://github.com/cyyever, https://github.com/Skylion007	2025-06-03 14:22:53 +00:00
nikitaved	edc2d539d1	`torch.tensordot`: performance improvements when contracting to a scalar. (#145936 ) As per title. Fixes https://github.com/pytorch/pytorch/issues/145731 Touches only compute. The CPU overhead can potentially be further reduced. Before: ```python In [3]: n = 512 In [4]: A = torch.rand(n, n) In [5]: B = torch.rand(n, n) In [6]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]]) 2.04 ms ± 70 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [7]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]]) 2.85 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [8]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]]) 2.9 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [9]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]]) 4.07 ms ± 262 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` After ```python In [2]: n = 512 In [3]: A = torch.rand(n, n) In [4]: B = torch.rand(n, n) In [5]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]]) 30.7 µs ± 2.51 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [6]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]]) 141 µs ± 6.52 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [7]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]]) 142 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [8]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]]) 62.8 µs ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145936 Approved by: https://github.com/albanD, https://github.com/ngimel	2025-05-13 10:57:30 +00:00
Jithun Nair	fe8ebacee4	[ROCm] Upgrade ROCm CI to ROCm6.4 (#151368 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151368 Approved by: https://github.com/jeffdaily, https://github.com/malfet Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-05-08 16:12:16 +00:00
Nikita Shulga	9919d6b872	[Testing] Add copysign from scalar regression test (#152997 ) But instead of adding it just for MPS backend, add it to OpInfo Fixes https://github.com/pytorch/pytorch/issues/152582 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152997 Approved by: https://github.com/wdvr	2025-05-07 00:19:42 +00:00
PyTorch MergeBot	cc28b43950	Revert "[ROCm] Upgrade ROCm CI to ROCm6.4 (#151368 )" This reverts commit 844842dfbf937c43b41c528e461d3f3931bca6e9. Reverted https://github.com/pytorch/pytorch/pull/151368 on behalf of https://github.com/malfet due to This broke inductor cpp wrapper ([comment](https://github.com/pytorch/pytorch/pull/151368#issuecomment-2848519706))	2025-05-03 08:31:31 +00:00
eqy	216d81da81	[CUDA][complex] skip `test_reference_numerics_large_jiterator_unary_cuda_complex64` on CUDA (#148024 ) already skipped on ROCM for a similar reason, recent numpy versions changed convention from `nan+infj` to `-inf+infj` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148024 Approved by: https://github.com/nWEIdia, https://github.com/atalman, https://github.com/malfet	2025-05-02 19:11:11 +00:00
Jithun Nair	844842dfbf	[ROCm] Upgrade ROCm CI to ROCm6.4 (#151368 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151368 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-05-02 17:21:18 +00:00
Isuru Fernando	f0c9b3385d	Support more dtypes for input, indices in gather (#151822 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151822 Approved by: https://github.com/ngimel	2025-05-01 16:35:23 +00:00
Eddie Yan	bb90f66e70	[CUDA][conv3d] bump tolerances for `test_variant_consistency_eager` `conv3d` `complex64` (#152203 ) ~1/1000 1.5e-5 mismatch on A100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152203 Approved by: https://github.com/Skylion007, https://github.com/soulitzer	2025-04-28 17:59:37 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
Nikita Shulga	3ef6d6924a	[BE] Switch `TestConsistency` to MPS device (#147893 ) This will eventually allow moving more decorators away from `common_mps.py`. Adjust tolerances accordingly. XFAIL a bunch of tests on MacOS-13, which is going to be deprecated anyway Pull Request resolved: https://github.com/pytorch/pytorch/pull/147893 Approved by: https://github.com/atalman ghstack dependencies: #152204	2025-04-26 01:19:21 +00:00
Will Feng	82200e33b5	Make torch._chunk_cat support non-contiguous inputs (#151263 ) Currently, `torch._chunk_cat` only supports contiguous inputs (due to `.view()` usage in `_pad_chunk()` supporting only contiguous tensor). This doesn't work for internal models where there can be non-contiguous input tensors: - size=[8192, 16416], stride=[16448, 1] # stride[0] is larger than size[1] - size=[1152, 384], stride=[1, 1152] # column-major tensor In this PR, we relax the assumption on contiguous input tensor, by switching from `.view()` to `.reshape()`. Note that since `.reshape()` will try to use `.view()` under the hood whenever possible, this should not cause regression to existing use cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151263 Approved by: https://github.com/BoyuanFeng	2025-04-16 04:18:46 +00:00
Li-Huai (Allan) Lin	ddfc14b3ae	[MPS] Fix where (#151176 ) Fixes #150967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151176 Approved by: https://github.com/kulinseth, https://github.com/malfet	2025-04-13 20:44:50 +00:00
Jiang, Yanbing	1e92579126	Add torch._scaled_mm for CPU (#150410 ) This PR duplicates https://github.com/pytorch/pytorch/pull/139975. It adds torch._scaled_mm for the CPU backend. _scaled_mm_out_cpu and _scaled_mm_cpu are newly added and included in the torch._scaled_mm CPU dispatch. We also add _scaled_mm_out_cpu_emulated as a fallback function if the current platform cannot run FP8 matmul using oneDNN. This PR also updates various UTs related to FP8 to support CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150410 Approved by: https://github.com/atalman	2025-04-11 02:23:03 +00:00
Isuru Fernando	d751698a36	Support negative values for fill with uint tensors (#144458 ) Fixes https://github.com/pytorch/pytorch/issues/144188 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144458 Approved by: https://github.com/amjames, https://github.com/eellison	2025-04-09 21:08:06 +00:00
Guilherme Leobas	e6bd133866	add batching rule for `torch.Tensor.scatter_add_` (#150543 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150543 Approved by: https://github.com/zou3519	2025-04-08 18:00:10 +00:00
FFFrog	881d99495d	Add more check for torch.ormqr (#150759 ) As the title states. Please refer to https://github.com/pytorch/pytorch/issues/150674 for more info. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150759 Approved by: https://github.com/lezcano	2025-04-08 08:26:05 +00:00
PyTorch MergeBot	4854926aeb	Revert "Add torch._scaled_mm for CPU (#150410 )" This reverts commit 3b02f795c5ad2339794b15b370c0e4a235d36adf. Reverted https://github.com/pytorch/pytorch/pull/150410 on behalf of https://github.com/malfet due to It breaks ROCM tests ([comment](https://github.com/pytorch/pytorch/pull/150410#issuecomment-2777704212))	2025-04-04 06:52:54 +00:00
Jiang, Yanbing	3b02f795c5	Add torch._scaled_mm for CPU (#150410 ) This PR duplicates https://github.com/pytorch/pytorch/pull/139975. It adds torch._scaled_mm for the CPU backend. _scaled_mm_out_cpu and _scaled_mm_cpu are newly added and included in the torch._scaled_mm CPU dispatch. We also add _scaled_mm_out_cpu_emulated as a fallback function if the current platform cannot run FP8 matmul using oneDNN. This PR also updates various UTs related to FP8 to support CPU tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150410 Approved by: https://github.com/atalman	2025-04-03 19:43:45 +00:00
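The negative-padding fix in #161639 (listed above) changes the output-size check so that a size of exactly zero is accepted (`new_dim >= 0`) and produces an empty tensor. The plain-Python sketch below models that check for `constant` padding only. The function name `padded_shape` is hypothetical. It assumes the pad tuple lists (left, right) pairs starting from the last dimension, as `torch.nn.functional.pad` does. This is an illustration of the rule, not the ATen implementation.

```python
def padded_shape(shape, pad):
    """Hypothetical model of the constant-padding size check.

    `pad` holds (left, right) pairs starting from the last dimension,
    mirroring the torch.nn.functional.pad convention. Negative entries crop.
    """
    if len(pad) % 2 != 0 or len(pad) // 2 > len(shape):
        raise ValueError("pad must hold (left, right) pairs for at most len(shape) dims")
    out = list(shape)
    for i in range(len(pad) // 2):
        dim = len(shape) - 1 - i
        new_dim = shape[dim] + pad[2 * i] + pad[2 * i + 1]
        # The fix: a size of exactly zero is valid (empty result);
        # only strictly negative sizes are rejected.
        if new_dim < 0:
            raise ValueError(f"padded size of dim {dim} would be negative: {new_dim}")
        out[dim] = new_dim
    return tuple(out)


print(padded_shape((2, 4), (-2, -2)))  # (2, 0): an empty result, not an error
```

With the older strict check (`new_dim > 0`), the `(-2, -2)` case above would have raised even though no size is negative. That is the edge case reported in #161014.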

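The `torch.arange` fix in #154320 (listed above) depends on float32 rounding. Two distinct large values can round to the same float32, which hides the fact that `start > end` with a positive step. The sketch below uses the standard `struct` module to round Python floats (IEEE 754 doubles) to float32. It then runs the same step-direction check twice, once on the rounded values and once on the original doubles. The helper names `to_float32` and `range_is_valid` are hypothetical. The sketch models only the validation idea, not the CUDA or MPS kernels.

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def range_is_valid(start, end, step):
    """Hypothetical check: the step must point from start toward end (start == end is an empty range)."""
    return (step > 0 and end >= start) or (step < 0 and end <= start)


start, end, step = 16777217.0, 16777216.0, 1.0  # 2**24 + 1 and 2**24
print(range_is_valid(to_float32(start), to_float32(end), step))  # True: both round to 2**24
print(range_is_valid(start, end, step))                          # False: start > end is detected
```

In float32, 2**24 + 1 rounds to 2**24, so the check sees `start == end` and lets the invalid range through as an empty result. Performing the comparison in double precision keeps the two values distinct.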

2342 Commits
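The `torch.hash_tensor` entry (#154149) describes an xor_sum reduction. Integers are cast to uint64, and floats are upcast to double and then bitcast to uint64. The pure-Python sketch below follows that description. The name `xor_sum_hash` is hypothetical, and this is not PyTorch's kernel. It assumes Python floats are IEEE 754 doubles and models the integer cast as two's-complement wraparound modulo 2**64.

```python
import struct
from functools import reduce

MASK64 = (1 << 64) - 1


def to_uint64(value):
    """Map one element to a uint64 following the rules in the commit message."""
    if isinstance(value, float):
        # Bitcast the 8 bytes of the IEEE 754 double to an unsigned 64-bit integer.
        return struct.unpack("<Q", struct.pack("<d", value))[0]
    # Integers: wrap into the uint64 range (two's-complement cast).
    return value & MASK64


def xor_sum_hash(values):
    """Hypothetical model of an xor_sum reduction; the result is always a uint64."""
    return reduce(lambda acc, v: acc ^ to_uint64(v), values, 0)


print(hex(xor_sum_hash([1, 2, 3])))  # 0x0, because 1 ^ 2 ^ 3 == 0
print(hex(xor_sum_hash([1.0])))      # 0x3ff0000000000000, the bit pattern of 1.0
```

Because xor is commutative and associative, the result does not depend on reduction order, which suits parallel reductions. The trade-off is that repeated values cancel (`x ^ x == 0`), so inputs such as `[5, 5]` and `[]` hash to the same value.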