Commit Graph

666 Commits

Author SHA1 Message Date
e925dfcc6b Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhancing code readability.
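
For illustration (a hypothetical snippet, not taken from this PR), this is the kind of rewrite the `SIM` rules enforce, e.g. SIM103:

```py
# Before: flagged by SIM103 (needless bool in a return).
def is_positive(x: int) -> bool:
    if x > 0:
        return True
    else:
        return False

# After: return the condition directly.
def is_positive(x: int) -> bool:
    return x > 0
```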

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang, https://github.com/mlazos
2025-10-17 07:27:11 +00:00
de8d81275a Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can still force the
decomposition by putting it in the dispatch table, but if eager mode
wouldn't have decomposed (because it went to the fused kernel), we now
preserve the fused call by default.
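
A minimal sketch of the property being preserved, assuming a CUDA build where eager dispatches to the fused `rms_norm` kernel and that kernel is deterministic (illustrative only, not this PR's test plan):

```py
import torch
import torch.nn.functional as F

x = torch.randn(4, 64, device="cuda")
w = torch.randn(64, device="cuda")

eager = F.rms_norm(x, (64,), w)
compiled = torch.compile(F.rms_norm)(x, (64,), w)

# With the fused call preserved (no decomposition), the compiled result
# should match eager bitwise, not merely within tolerance.
assert torch.equal(eager, compiled)
```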

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
2025-10-11 01:03:55 +00:00
5c3fe9fb30 Revert "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)"
This reverts commit a6fa4f9c283971c0fb6f60a89674a1f35370ac79.

Reverted https://github.com/pytorch/pytorch/pull/164939 on behalf of https://github.com/izaitsevfb due to introduces numeric issues internally, see [D84326613](https://www.internalfb.com/diff/D84326613) ([comment](https://github.com/pytorch/pytorch/pull/164939#issuecomment-3392203314))
2025-10-10 20:21:12 +00:00
a6fa4f9c28 Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can still force the
decomposition by putting it in the dispatch table, but if eager mode
wouldn't have decomposed (because it went to the fused kernel), we now
preserve the fused call by default.

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
2025-10-10 00:15:00 +00:00
06d86e58d0 Revert "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)"
This reverts commit d40a9bfb8da0dc1ac1e6e56b33a25979112874de.

Reverted https://github.com/pytorch/pytorch/pull/164939 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164939#issuecomment-3385056722))
2025-10-09 09:50:59 +00:00
d40a9bfb8d Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939)
This fixes AOTAutograd rms_norm not being bitwise equivalent to
eager, because it avoids a decomposition.  You can still force the
decomposition by putting it in the dispatch table, but if eager mode
wouldn't have decomposed (because it went to the fused kernel), we now
preserve the fused call by default.

This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939
Approved by: https://github.com/bdhirsh
ghstack dependencies: #164573
2025-10-09 04:49:44 +00:00
a029675f6f More ruff SIM fixes (#164695)
This PR applies ruff `SIM` rules to more files. Most changes simplify `dict.get` calls by dropping an explicit `None` argument, since `None` is already the default value.
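
For example (a hypothetical snippet, not from the diff), the dict-get-with-none-default simplification looks like:

```py
config = {"device": "cpu"}

# Before: passing None is redundant, because None is already dict.get's default.
device = config.get("device", None)

# After the SIM fix.
device = config.get("device")
```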

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164695
Approved by: https://github.com/ezyang
2025-10-09 03:24:50 +00:00
5d7360bb03 Revert "Enable all SIM rules except disabled ones (#164645)"
This reverts commit 321e6026925f6b6e8a36e3a8b7c0295cd7541911.

Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))
2025-10-05 19:32:21 +00:00
321e602692 Enable all SIM rules except disabled ones (#164645)
`SIM` rules are useful for simplifying boolean expressions and enhancing code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
1f8ee5da11 [TorchGen] Remove unused variables and function imports (#164538)
This PR removes unused code in torchgen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164538
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-10-03 20:49:36 +00:00
a43c4c3972 [5/N] Apply ruff UP035 rule (#164423)
Continued code migration to enable ruff `UP035`. Most changes move `Callable` imports from `typing` to `collections.abc`.
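
A hedged example of the import migration involved (hypothetical function):

```py
# Before (flagged by UP035): from typing import Callable
from collections.abc import Callable  # preferred import location

def apply(fn: Callable[[int], int], x: int) -> int:
    return fn(x)

print(apply(lambda v: v + 1, 41))  # 42
```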

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423
Approved by: https://github.com/ezyang
2025-10-02 07:31:11 +00:00
7afcb030d8 Back out "Revert D81959389" (#163905)
Summary:
Original commit changeset: 06888d7ebff0

Original Phabricator Diff: D82932788

Restricted the test to SM90 for scaled_grouped_mm

Test Plan: TBD (will share the linux CI results)

Differential Revision: D83283991

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163905
Approved by: https://github.com/angelayi
2025-09-30 07:05:13 +00:00
7441a1b9b1 Update ruff to 0.13.1 (#163744)
Update ruff to 0.13.1 so that we can remove `UP038` from `pyproject.toml`, because it has been removed from ruff's supported rules.
There are some fixes; the most notable one is [PYI059](https://docs.astral.sh/ruff/rules/generic-not-last-base-class/#generic-not-last-base-class-pyi059):
```
Checks for classes inheriting from typing.Generic[] where Generic[] is not the last base class in the bases tuple.

```
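
A hedged sketch of what PYI059 flags and the suggested ordering (hypothetical classes):

```py
from collections.abc import Sized
from typing import Generic, TypeVar

T = TypeVar("T")

# Flagged by PYI059: Generic[T] is not the last base class.
class BadContainer(Generic[T], Sized):
    def __len__(self) -> int:
        return 0

# Preferred: Generic[T] comes last in the bases tuple.
class GoodContainer(Sized, Generic[T]):
    def __len__(self) -> int:
        return 0
```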

A BC-breaking change is introduced to the typing of `OrderedSet.storage`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163744
Approved by: https://github.com/Skylion007, https://github.com/jingsh
2025-09-26 10:12:21 +00:00
7d710403b0 Reapply "Make functionalization ViewMeta serializable with pickle. (#143712)" (#163769)
### Summary:
NOTE: This is a re-export of https://github.com/pytorch/pytorch/pull/161994; the changes between these two PRs are exclusively to the buck/build files.

(Summary from #161994 )
Attempted rebase of https://github.com/pytorch/pytorch/pull/143712.

This reverts commit 6c713ccb5e0df227dd5b630057cbccd373cbe7d6.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D81524507

Pulled By: Lucaskabela

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163769
Approved by: https://github.com/dolpm

Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
2025-09-25 10:27:37 +00:00
a635505a99 [Code Clean] Remove deadcodes about Python3.9 [6/N] (#163645)
As the title states.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163645
Approved by: https://github.com/albanD
ghstack dependencies: #163626, #163627, #163629, #163643, #163644
2025-09-24 07:30:50 +00:00
f9fa138a39 [BE] Delete all pre py-3.10 checks (#163653)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163653
Approved by: https://github.com/jansel
ghstack dependencies: #163648, #163649
2025-09-23 23:22:53 +00:00
deb7ebe0a3 Revert "[Reland] Use std::string_view in torchgen (#158625)"
This reverts commit 972e409829343cc2062aeee0994a9c1c735d216a.

Reverted https://github.com/pytorch/pytorch/pull/158625 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break a couple of ExecuTorch tests for Vulkan backend ([comment](https://github.com/pytorch/pytorch/pull/158625#issuecomment-3287754275))
2025-09-13 07:52:50 +00:00
972e409829 [Reland] Use std::string_view in torchgen (#158625)
Reland of #157050, which was incidentally closed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158625
Approved by: https://github.com/albanD
2025-09-12 08:31:54 +00:00
189a054cfb Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)
[relanding again after fixing internal build]
Summary:
This might cause some new DDEs at call sites that do not use is_contiguous_or_false() or sym_is_contiguous(),
but we want to find those call sites so they can be handled properly, by explicitly calling is_contiguous_or_false() instead of is_contiguous() when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning; here is the context.

In https://github.com/pytorch/pytorch/pull/157472 we defined sym_is_contiguous to be the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE.

When callers use is_contiguous(), we do sym_is_contiguous().guard_bool();
when callers use is_contiguous_or_false(), we do sym_is_contiguous().guard_or_false().

One path that was not handled well was this one:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }

  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom but matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format).

That went through load_pyobj_interpreter and ended up calling the Python is_contiguous, which used implicit size-oblivious reasoning.
Once we removed that implicit size-oblivious reasoning, the right thing is to call
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format);
otherwise we would get a DDE even if the caller is using sym_is_contiguous.

So I had to define it for the pyinterpreter, and then override it for nested tensors.

Approved by: https://github.com/ezyang

Test Plan:
contbuild & OSS CI, see e444cd24d4

Rollback Plan:

Differential Revision: D80435179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160869
Approved by: https://github.com/ezyang
2025-09-08 22:59:13 +00:00
63632fc7ee Add new_zeros dtype variant to the shim and as a stable op (#161597)
In case we want this before 2.9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161597
Approved by: https://github.com/mikaylagawarecki
2025-08-28 13:57:24 +00:00
78a8e6a671 Add new_empty (with dtype argument only) to torch::stable (#159508)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159508
Approved by: https://github.com/janeyx99
ghstack dependencies: #160557
2025-08-20 00:50:42 +00:00
0a5ab612dd Port amax to stable ABI (#160214)
To enable porting torchaudio to the stable ABI, we need the `amax` operation to be accessible. This PR ports the op and provides tests that it behaves correctly.
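
For reference, the eager-mode semantics of the op being ported (a quick illustration; the stable-ABI surface itself is C++):

```py
import torch

x = torch.tensor([[1.0, 5.0],
                  [3.0, 2.0]])

# amax reduces to the maximum value along the given dimension(s).
print(torch.amax(x, dim=1))  # tensor([5., 3.])
```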

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160214
Approved by: https://github.com/mikaylagawarecki
2025-08-19 17:24:53 +00:00
b82aa3df20 Revert "Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197)"
This reverts commit e444cd24d48b3a46f067974f2cc157f5ed27709f.

Reverted https://github.com/pytorch/pytorch/pull/159197 on behalf of https://github.com/laithsakka due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/159197#issuecomment-3195436668))
2025-08-18 07:22:13 +00:00
e444cd24d4 Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197)
This might cause some new DDEs at call sites that do not use is_contiguous_or_false() or sym_is_contiguous(),
but we want to find those call sites so they can be handled properly, by explicitly calling is_contiguous_or_false() instead of is_contiguous() when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning; here is the context.

In https://github.com/pytorch/pytorch/pull/157472 we defined sym_is_contiguous to be the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE.

When callers use is_contiguous(), we do sym_is_contiguous().guard_bool();
when callers use is_contiguous_or_false(), we do sym_is_contiguous().guard_or_false().

One path that was not handled well was this one:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }

  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom but matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format).

That went through load_pyobj_interpreter and ended up calling the Python is_contiguous, which used implicit size-oblivious reasoning.
Once we removed that implicit size-oblivious reasoning, the right thing is to call
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format);
otherwise we would get a DDE even if the caller is using sym_is_contiguous.

So I had to define it for the pyinterpreter, and then override it for nested tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159197
Approved by: https://github.com/ezyang
2025-08-16 09:15:58 +00:00
846963fa9b Revert "[Inductor] addmm + activation function fusion (#158137)"
This reverts commit b9d7de3a094598c3dc0dd52e57bce30eb684c9d8.

Reverted https://github.com/pytorch/pytorch/pull/158137 on behalf of https://github.com/malfet due to Broke inductor torchbench, see 663da17b62/1 ([comment](https://github.com/pytorch/pytorch/pull/158137#issuecomment-3191841298))
2025-08-15 15:34:09 +00:00
b9d7de3a09 [Inductor] addmm + activation function fusion (#158137)
This PR implements a post_grad pass that fuses activation(add + mm).

This was previously done similarly in #106912 but was reverted for performance reasons; it was replaced with a pass that unfuses the activation and add from addmm/addmm_activation and lets Inductor handle the fusion.

However, since then the cuBLAS team has made a lot of perf improvements here. I will update this post with more benchmarks, but preliminary benchmarks show good results.

Perf dashboard:
<img width="3371" height="1240" alt="Screenshot from 2025-08-07 13-41-35" src="https://github.com/user-attachments/assets/d44d6205-b33a-4a20-9f0f-d9db176b3738" />

ReLU works with both training and inference, but GELU only works in inference mode due to a fundamental limitation: GELU's derivative depends on its input while ReLU's doesn't. I don't think this is fixable with the current addmm_activation API.

Graph module before and after this pass

Relu(addmm)
```
graph():
    %primals_1 : [num_users=1] = placeholder[target=primals_1]
    %primals_2 : [num_users=2] = placeholder[target=primals_2]
    %primals_3 : [num_users=2] = placeholder[target=primals_3]
    %addmm : [num_users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%primals_1, %primals_3, %primals_2), kwargs = {})
    %relu : [num_users=2] = call_function[target=torch.ops.aten.relu.default](args = (%addmm,), kwargs = {})
    %le : [num_users=1] = call_function[target=torch.ops.aten.le.Scalar](args = (%relu, 0), kwargs = {})
    %permute_1 : [num_users=1] = call_function[target=torch.ops.aten.permute.default](args = (%primals_3, [1, 0]), kwargs = {})
    return (relu, primals_2, le, permute_1)
graph():
    %primals_1 : [num_users=1] = placeholder[target=primals_1]
    %primals_2 : [num_users=2] = placeholder[target=primals_2]
    %primals_3 : [num_users=2] = placeholder[target=primals_3]
    %_addmm_activation_default : [num_users=2] = call_function[target=torch.ops.aten._addmm_activation.default](args = (%primals_1, %primals_3, %primals_2), kwargs = {})
    %le : [num_users=1] = call_function[target=torch.ops.aten.le.Scalar](args = (%_addmm_activation_default, 0), kwargs = {})
    %permute_1 : [num_users=1] = call_function[target=torch.ops.aten.permute.default](args = (%primals_3, [1, 0]), kwargs = {})
    return (_addmm_activation_default, primals_2, le, permute_1)
```
Gelu (addmm)
```
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %arg2_1 : [num_users=1] = placeholder[target=arg2_1]
    %addmm : [num_users=4] = call_function[target=torch.ops.aten.addmm.default](args = (%arg0_1, %arg2_1, %arg1_1), kwargs = {})
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%addmm, %addmm), kwargs = {})
    %mul_1 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul, %addmm), kwargs = {})
    %mul_2 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_1, 0.044715), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%addmm, %mul_2), kwargs = {})
    %mul_3 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%add, 0.7978845608028654), kwargs = {})
    %mul_4 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%addmm, 0.5), kwargs = {})
    %tanh : [num_users=1] = call_function[target=torch.ops.aten.tanh.default](args = (%mul_3,), kwargs = {})
    %add_1 : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%tanh, 1), kwargs = {})
    %mul_5 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_4, %add_1), kwargs = {})
    return (mul_5,)
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %arg2_1 : [num_users=1] = placeholder[target=arg2_1]
    %_addmm_activation_default : [num_users=1] = call_function[target=torch.ops.aten._addmm_activation.default](args = (%arg0_1, %arg2_1, %arg1_1), kwargs = {use_gelu: True})
    return (_addmm_activation_default,)
```

Benchmark setup:
NGC pytorch 25.06 container
cublas version: 12.9.1.4
torch.compile ran with dynamic = False and max_autotune

H100
```
Testing with M=1024, N=1024, K=1024, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.0107 ms
Average Time per Iteration (torch compile):	 0.0296 ms

============================================================
Testing with M=2048, N=2048, K=2048, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.0262 ms
Average Time per Iteration (torch compile):	 0.0327 ms

============================================================
Testing with M=4096, N=4096, K=4096, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.1763 ms
Average Time per Iteration (torch compile):	 0.2457 ms

============================================================
Testing with M=8192, N=8192, K=8192, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 1.5280 ms
Average Time per Iteration (torch compile):	 1.9437 ms
```

A100
```
############################################################
Testing with dtype: float16
############################################################

============================================================
Testing with M=1024, N=1024, K=1024, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.0313 ms
Average Time per Iteration (torch compile):	 0.0643 ms

============================================================
Testing with M=2048, N=2048, K=2048, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.1149 ms
Average Time per Iteration (torch compile):	 0.1255 ms

============================================================
Testing with M=4096, N=4096, K=4096, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.6297 ms
Average Time per Iteration (torch compile):	 0.7547 ms

============================================================
Testing with M=8192, N=8192, K=8192, dtype=float16
============================================================
Average Time per Iteration (cublas):	 4.3821 ms
Average Time per Iteration (torch compile):	 5.0740 ms
```

Script
```py
import torch
torch.manual_seed(0)

warmup, numrun= 10, 100

sizes = [1024, 2048, 4096, 8192]
dtypes = [torch.float16, torch.bfloat16, torch.float32]

device = torch.device("cuda")

for dtype in dtypes:
    dtype_name = str(dtype).split('.')[-1]
    print(f"\n{'#'*60}")
    print(f"Testing with dtype: {dtype_name}")
    print(f"{'#'*60}")

    for size in sizes:
        M, N, K = size, size, size
        print(f"\n{'='*60}")
        print(f"Testing with M={M}, N={N}, K={K}, dtype={dtype_name}")
        print(f"{'='*60}")

        A = torch.randn(M, K, device=device, dtype=dtype)
        B = torch.randn(K, N, device=device, dtype=dtype)
        C = torch.randn(M, device=device, dtype=dtype)

        def func1():
            return torch._addmm_activation(C, A, B, use_gelu=True)

        def func2():
            return torch.nn.functional.gelu(torch.add(C, torch.mm(A, B)), approximate="tanh")

        func2_compiled = torch.compile(
            func2,
            dynamic=False,
            options={
                "force_disable_caches": True,
                "max_autotune": True,
                "max_autotune_gemm": True,
                "max_autotune_gemm_backends": "TRITON",
                "autotune_fallback_to_aten": False,
            }
        )

        for _ in range(warmup): func1()
        torch.cuda.synchronize(device=device)

        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)

        total_time_ms = 0.0
        start_event.record()
        for _ in range(numrun): func1()
        end_event.record()
        torch.cuda.synchronize(device=device)
        total_time_ms += start_event.elapsed_time(end_event)
        avg_time_ms = total_time_ms / numrun

        print(f"Average Time per Iteration (cublas):\t {avg_time_ms:.4f} ms")

        for _ in range(warmup): func2_compiled()
        torch.cuda.synchronize(device=device)

        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)

        total_time_ms = 0.0
        start_event.record()
        for _ in range(numrun): func2_compiled()
        end_event.record()
        torch.cuda.synchronize(device=device)
        total_time_ms += start_event.elapsed_time(end_event)
        avg_time_ms = total_time_ms / numrun

        print(f"Average Time per Iteration (torch compile):\t {avg_time_ms:.4f} ms")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158137
Approved by: https://github.com/eellison
2025-08-14 20:41:38 +00:00
4d419a7461 Add pad and narrow to torch/csrc/stable/ops.h (#159328)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159328
Approved by: https://github.com/janeyx99
ghstack dependencies: #159507
2025-08-12 21:29:49 +00:00
7f4cb4a3e0 [MPS] coalesce for sparse tensors (#159729)
MPS coalesce function for sparse tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159729
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-08-08 13:49:55 +00:00
beb4d7816d [BE]: ruff PLC0207 - use maxsplit kwarg (#160107)
Automatically replaces `split` with `rsplit` when relevant and only performs the split up to the first (or last) value. This lets the split return early and improves efficiency.
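
A hedged example of the pattern this rule rewrites (hypothetical snippet):

```py
path = "torch.ops.aten.add"

# Before (flagged): splits on every dot even though only the last piece is used.
op_name = path.split(".")[-1]

# After: rsplit with maxsplit=1 performs a single split from the right.
op_name = path.rsplit(".", maxsplit=1)[-1]
```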

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160107
Approved by: https://github.com/albanD
2025-08-08 03:14:59 +00:00
d87161c3c8 [Easy] Fix wrong propagation of fallback_ops_dict in gen_aoti_c_shim (#159904)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159904
Approved by: https://github.com/janeyx99
2025-08-06 15:09:18 +00:00
59e261bbd8 Revert "[CI] update flake8 and mypy lint dependencies (#158720)"
This reverts commit f5130bf339f12ccf5c6296130c47685bdc4858e4.

Reverted https://github.com/pytorch/pytorch/pull/158720 on behalf of https://github.com/yangw-dev due to this PR failing internally when building torchgen with error: fail: Unknown PyPI project: pyyaml; it seems this is caused by changing PyYAML into pyyaml, please fix it ([comment](https://github.com/pytorch/pytorch/pull/158720#issuecomment-3129995414))
2025-07-28 22:02:10 +00:00
f5130bf339 [CI] update flake8 and mypy lint dependencies (#158720)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158720
Approved by: https://github.com/Skylion007
2025-07-26 17:12:29 +00:00
e65ab9a868 Enable generating generic c_shim that doesn't bypass dispatcher (#158974)
Adds `c_shim_aten.{h/cpp}` and uses it for `fill_`.

This is the generated `c_shim_aten.cpp`, for reference:

```cpp

// WARNING: THIS FILE IS AUTOGENERATED BY torchgen. DO NOT MODIFY BY HAND.
// See 7e86a7c015/torchgen/gen.py (L2424-L2436) for details

// This file corresponds to the aten_shimified_ops list in torchgen/aoti/fallback_ops.py

#include <torch/csrc/inductor/aoti_torch/generated/c_shim_aten.h>
#include <torch/csrc/inductor/aoti_torch/utils.h>

#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>
#include <ATen/CompositeExplicitAutogradFunctions.h>
#include <ATen/CompositeExplicitAutogradNonFunctionalFunctions.h>
#include <ATen/CompositeImplicitAutogradFunctions.h>
#else
#include <ATen/ops/fill.h>

#endif // AT_PER_OPERATOR_HEADERS

using namespace torch::aot_inductor;

AOTITorchError aoti_torch_aten_fill__Scalar(AtenTensorHandle self, double value) {
    AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
        at::fill_(
            *tensor_handle_to_tensor_pointer(self), value
        );
    });
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158974
Approved by: https://github.com/albanD, https://github.com/janeyx99
2025-07-25 21:59:14 +00:00
393377d215 Revert "[CI] update flake8 and mypy lint dependencies (#158720)"
This reverts commit a527e816935957a164d74dd7c5069310b2857695.

Reverted https://github.com/pytorch/pytorch/pull/158720 on behalf of https://github.com/malfet due to This broke lint, see 8e57cdb746/1 ([comment](https://github.com/pytorch/pytorch/pull/158720#issuecomment-3096893256))
2025-07-21 13:58:50 +00:00
a527e81693 [CI] update flake8 and mypy lint dependencies (#158720)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158720
Approved by: https://github.com/Skylion007
2025-07-21 09:24:29 +00:00
f44a9eee47 [AOTI] Add missing ops to set of C-shim ops which can have nullptr returns (#158073)
Most added ops are backward ops, which had not been well-tested previously (which is why they were missed). The necessary ops were identified by manually examining torch/_meta_registrations.py return values.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158073
Approved by: https://github.com/desertfire
2025-07-11 23:35:26 +00:00
d5e6f42094 Revert "Use std::string_view in torchgen (#157050)"
This reverts commit 064288cbab94c9931ca2296a2b9723e864f9050a.

Reverted https://github.com/pytorch/pytorch/pull/157050 on behalf of https://github.com/jeanschmidt due to Seems to have broken internal builds, more details on D77449943. @ezyang may I count on your help to get those changes merged? ([comment](https://github.com/pytorch/pytorch/pull/157050#issuecomment-3020222668))
2025-06-30 18:08:54 +00:00
a1282b1823 [MPS] Add boilerplate sparse code support (#157238)
This PR makes minimal changes to support sparse tensors on MPS. In follow-up PRs I'll start adding operations incrementally so we can fix
https://github.com/pytorch/pytorch/issues/129842
which is highly requested (I assume because Whisper uses sparse tensors).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157238
Approved by: https://github.com/malfet
2025-06-30 01:53:45 +00:00
cyy
064288cbab Use std::string_view in torchgen (#157050)
Let the generated code use std::string_view.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157050
Approved by: https://github.com/ezyang
2025-06-27 06:36:10 +00:00
aff9c1eec5 [aoti][mps] Add fused_rms and sdpa_mps fallback ops (#156844)
Needed for llama3.1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156844
Approved by: https://github.com/desertfire
ghstack dependencies: #156843
2025-06-26 21:03:05 +00:00
eb331b59fe Add shim fallback for narrow (#156496)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156496
Approved by: https://github.com/albanD
2025-06-20 19:47:00 +00:00
4b6cbf528b Add C shim fallback for fill_ (#156245)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156245
Approved by: https://github.com/desertfire
2025-06-20 18:45:48 +00:00
c37ddcaefb Fix torchgen update-aoti-shim (#156323)
will remove the fill changes before landing and let Jane merge her changes!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156323
Approved by: https://github.com/janeyx99
2025-06-20 05:23:06 +00:00
b020971e78 [BE] fix typos in torchgen/ (#156083)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156083
Approved by: https://github.com/jingsh
ghstack dependencies: #156079, #156082
2025-06-17 19:25:50 +00:00
a2a75be0f8 Rename inductor cache (#156128)
Requested by Simon on a different PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156128
Approved by: https://github.com/xmfan
2025-06-17 03:57:18 +00:00
736a15a81a [torchgen] Fix ruff format for # fmt: skip comment for function signature (#155909)
See also:

- astral-sh/ruff#18658

This fix follows the suggestion from:

- https://github.com/astral-sh/ruff/issues/18658#issuecomment-2970130276

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155909
Approved by: https://github.com/ezyang
2025-06-14 12:28:55 +00:00
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
670dab6c63 [AOTI] Enable OP test__weight_int4pack_mm_with_scales_and_zeros in AOTI. (#155780)
The op test__weight_int4pack_mm_with_scales_and_zeros is for Intel GPU. It is functionally equivalent to the CUDA/CPU op test__weight_int4pack_mm (with the constraint that oneDNN only supports integer zero points, which is why we need this API). Since test__weight_int4pack_mm is already included in AOTI's fallback list, this PR adds support for XPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155780
Approved by: https://github.com/jansel
2025-06-13 11:12:13 +00:00
938515fa75 [aoti] Update cshim for all backends (#155604)
Fixes https://github.com/pytorch/pytorch/issues/155349
`python torchgen/gen.py --update-aoti-c-shim` will now update all cpu/cuda/mps/xpu shims -- I verified this using `aten._print.default`, but didn't commit the changes since I'm not sure if we actually want to add this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155604
Approved by: https://github.com/desertfire, https://github.com/janeyx99
2025-06-12 22:10:58 +00:00
386aa72003 [BE] Cleanup old ExecuTorch codegen and runtime code (#154165)
Summary: These files were added to pytorch/pytorch before ExecuTorch was
open-sourced. Now is a good time to remove them from pytorch/pytorch, since
the code has already moved to pytorch/executorch.

Test Plan: Rely on CI jobs.

Differential Revision: [D75985423](https://our.internmc.facebook.com/intern/diff/D75985423)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154165
Approved by: https://github.com/kimishpatel, https://github.com/Skylion007, https://github.com/cyyever
2025-06-07 06:54:12 +00:00