Commit Graph

94673 Commits

aaac8cb0f5 [1/N] Add strict parameter to Python zip calls (#165531)
Add `strict=True/False` to zip calls in test utils. `strict=True` is passed when possible.
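
For illustration, a minimal sketch of the behavior difference (standard Python 3.10+ semantics, not code from this PR):

```
>>> list(zip([1, 2, 3], ["a", "b"]))
[(1, 'a'), (2, 'b')]
>>> list(zip([1, 2, 3], ["a", "b"], strict=True))
Traceback (most recent call last):
  ...
ValueError: zip() argument 2 is shorter than argument 1
```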

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165531
Approved by: https://github.com/Skylion007
trunk/aaac8cb0f5852bd52be558b59eca35c6e722313c
2025-10-18 05:26:33 +00:00
0f0b4bf029 [1/N] Remove unused header inclusion (#165763)
This PR removes unused header inclusion in C++ files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165763
Approved by: https://github.com/Skylion007
trunk/0f0b4bf0295f988b62283efd72f08a5180d905c4
2025-10-18 05:23:11 +00:00
b8194268a6 Remove unnecessary noqa suppressions (#164106)
This PR removes unused `noqa` suppressions in Python code.
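
An illustrative (hypothetical) example of the kind of suppression this PR removes, namely a `noqa` comment that no longer suppresses anything:

```
import os  # noqa: F401  <- unnecessary: `os` is used below, so F401 never fires

print(os.getcwd())
```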

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164106
Approved by: https://github.com/albanD
trunk/b8194268a6fbc369cce413990826492d36d88bdc
2025-10-18 04:52:41 +00:00
f02e3947f6 Expand type checking to mypy strict files (#165697)
Expands Pyrefly type checking to cover the files outlined in the mypy-strict.ini configuration file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165697
Approved by: https://github.com/ezyang
trunk/f02e3947f65cd3d6509224af8e5efdaaa348ef32
2025-10-18 04:34:45 +00:00
9095a9dfae [CD] Apply the fix from #162455 to aarch64+cu129 build (#165794)
When trying to bring cu129 back in https://github.com/pytorch/pytorch/pull/163029, I mainly looked at https://github.com/pytorch/pytorch/pull/163029 and missed another tweak coming from https://github.com/pytorch/pytorch/pull/162455

I discovered this issue when testing aarch64+cu129 builds in https://github.com/pytorch/test-infra/actions/runs/18603342105/job/53046883322?pr=7373. Surprisingly, there is no test running for the aarch64 CUDA build from what I can see in 79a37055e7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165794
Approved by: https://github.com/malfet
trunk/9095a9dfae39ad3064a999558f2fd393ff78bd3e
2025-10-18 04:16:24 +00:00
d9f94e0d7d [dynamo] Support fx.traceback.annotate as decorator (#165805)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165805
Approved by: https://github.com/Lucaskabela, https://github.com/SherlockNoMad, https://github.com/yushangdi
trunk/d9f94e0d7d96e52a636899a1b104cf610dd1a905
2025-10-18 03:58:11 +00:00
23417ae50f [Submodule] Bump FBGEMM to latest (#165544)
Summary:

* FBGEMM submodule updated to main
* CMake updated to reflect necessary changes
* Notably pulls in NVFP4 grouped gemm kernels

Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165544
Approved by: https://github.com/cyyever, https://github.com/jeffdaily
trunk/23417ae50f5d9bc02e988d916c103ff3a03c5903
2025-10-18 03:58:08 +00:00
e4d6c56ffb Improve dynamo graph capture stack trace for custom ops (#165693)
For a custom op
```
import torch

@torch.library.custom_op("my_lib::foo", mutates_args={})
def foo(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y
```
People could call `torch.ops.my_lib.foo()` or call `foo()` directly in the `forward` of an `nn.Module`.

These two calling conventions will lead to the same node in the output graph, but different stack traces.

When directly calling `foo()`, the displayed stack_trace in the graph will be
```
# File: .../pytorch/torch/_library/custom_ops.py:687 in __call__, code: return self._opoverload(*args, **kwargs)
```
This is not useful, so we filter it out.

```
python test/functorch/test_aot_joint_with_descriptors.py -k test_custom_op_stack_trace
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165693
Approved by: https://github.com/SherlockNoMad, https://github.com/williamwen42
trunk/e4d6c56ffb3d680d3874f0dd01907aee7ed2d3c5
2025-10-18 03:48:18 +00:00
017d2985f3 set unbacked bindings in reinplace pass for newly created nodes during generalize_scatter decomp (#164948)
Two fixes:
1. In the reinplace pass, set unbacked bindings for newly created nodes.
2. In inductor, ComputeBuffer used to miss detecting some used symbols; fixed that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164948
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #164341
trunk/017d2985f3a66955ae4a3fba217f2edca369fca4
2025-10-18 03:20:30 +00:00
c6a8db0b9a Fix issues with generalized_scatter and setitem allocated unbacked symbols. (#164341)
Three fixes:
1. When doing `t[u0] += 1` where `u0` is unbacked, we could allocate a new unbacked symbol during the indexing of `t[u0]` (when we fake-trace setitem), namely because meta_select allocates a new unbacked symbol for the storage offset when we do not know whether u0 >= 0 or u0 < 0. But the output size/stride of setitem() does not depend on that new symbol; it is consumed within setitem, so we shall ignore it (see the sketch below).

2. When we trace through generalized_scatter, the applications of the views could allocate unbacked symints, but those do not affect the final output, so we shall ignore them as well.

3. In lowering, materialize before accessing strides.

Addresses https://github.com/pytorch/pytorch/issues/114293 and https://github.com/pytorch/pytorch/issues/131911
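
A minimal sketch of the `t[u0] += 1` pattern from fix 1, assuming a data-dependent index produced by `.item()` (illustrative only, not a verified repro):

```
import torch

@torch.compile(fullgraph=True)
def f(t, idx):
    u0 = idx.item()  # u0 is an unbacked SymInt
    t[u0] += 1       # fake-tracing setitem could allocate an extra unbacked symbol here
    return t

print(f(torch.zeros(8), torch.tensor(3)))
```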

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164341
Approved by: https://github.com/bobrenjc93
2025-10-18 03:20:30 +00:00
de09bab4b6 [BE]: Update cudnn frontend submodule to 1.15.0 (#165776)
Update cudnn frontend submodule to 1.15.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165776
Approved by: https://github.com/eqy
trunk/de09bab4b66002a8a9a2195f50f96a78868a3d39
2025-10-18 02:23:27 +00:00
c137e222d4 .venv/ in .gitignore (#165418)
`uv venv` creates the virtual environment in the `.venv/` directory, so it's useful to have `.venv/` in `.gitignore`, as more people are using `uv` in their work. As per comment 3592f5f4e5 (diff-bc37d034bad564583790a46f19d807abfe519c5671395fd494d8cce506c42947)

uv docs that confirm it: https://docs.astral.sh/uv/pip/environments/#using-arbitrary-python-environments
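
The change itself is a single ignore entry; a sketch:

```
# .gitignore
.venv/
```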
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165418
Approved by: https://github.com/ezyang
trunk/c137e222d42ee5f36670b3b2138243c1b12eae83
2025-10-18 02:00:52 +00:00
cf3a787bbc [annotate] Annotate bw nodes before eliminate dead code (#165782)
Fixes https://github.com/pytorch/torchtitan/pull/1907

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165782
Approved by: https://github.com/SherlockNoMad
trunk/cf3a787bbcf6dc4ca6d746aea1e9dd4ee0c0fbda
2025-10-18 01:54:31 +00:00
de3da77cf7 Thread deterministic config vars to subproc compilation (#165729)
# Summary

TIL (AFTER WAYYYY TOO MUCH INSANITY) that we do not serialize the full set of configs for the subproc compilation.

I found this while working on Flex-attention determinism: https://github.com/meta-pytorch/attention-gym/pull/168

It might be good to audit whether we need to thread through any more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165729
Approved by: https://github.com/shunting314, https://github.com/eellison
trunk/de3da77cf7f51392be7c8ac9b9a0dab149be938d
2025-10-18 01:25:50 +00:00
543ddbf44c [ONNX] Support renaming in dynamic axes to shapes conversion (#165769)
Discovered in #165748

This PR also deprecates the conversion; the ONNX exporter team does not intend to maintain it long term.
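
For context, a hedged sketch of the two styles involved in the conversion (argument names as in current torch.onnx docs; treat the details as assumptions, not this PR's code):

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

m, x = M(), torch.randn(2, 3)

# Legacy style: free-form dynamic_axes names (this is the conversion path being deprecated)
torch.onnx.export(m, (x,), "m.onnx", input_names=["x"],
                  dynamic_axes={"x": {0: "batch"}})

# Preferred style: dynamic_shapes with torch.export Dims
torch.onnx.export(m, (x,), "m.onnx", dynamo=True,
                  dynamic_shapes={"x": {0: torch.export.Dim("batch")}})
```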
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165769
Approved by: https://github.com/justinchuby
trunk/543ddbf44c06640b424abf72a6469dddc829809f
2025-10-18 01:11:20 +00:00
e9f4999985 [Code Clean] Replace std::runtime_error with TORCH_CHECK (#165305)
Fixes part of #148114

Including:

- torch/csrc/distributed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165305
Approved by: https://github.com/FFFrog, https://github.com/albanD
trunk/e9f4999985c0aa1f3c2c5489cde5ae3614503154
2025-10-18 01:08:44 +00:00
29b029648e Fixed issue with GradTrackingTensor not properly propagating sparse layout (#165765)
Fixes #164286

Fixed issue with GradTrackingTensor not properly propagating sparse layout.

@ezyang @jcaip
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165765
Approved by: https://github.com/ezyang
trunk/29b029648ed3871b83c28d4625bb5f969fe4cb41
2025-10-18 01:00:53 +00:00
a25a649e70 [Mem Snapshot] Add Metadata Field (#165490)
Summary:
The implementation adds the ability to:

- Set custom metadata strings that will be attached to all subsequent allocations
- Clear or change the metadata at any point
- View the metadata in memory snapshots via `_dump_snapshot()`
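
A hedged sketch of the workflow (the metadata setter this PR adds is not named in the message, so that step is left as a placeholder):

```
import torch

torch.cuda.memory._record_memory_history()           # start recording allocations
# <set custom metadata via the new API from this PR (exact name not shown here)>
x = torch.randn(1024, device="cuda")                 # subsequent allocations carry the metadata
torch.cuda.memory._dump_snapshot("snapshot.pickle")  # metadata is visible in the snapshot
```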

Test Plan: Added a test in test_cuda.py and checked manually in the snapshot that the metadata was added.

Differential Revision: D84654933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490
Approved by: https://github.com/yushangdi
trunk/a25a649e705447b55f5c8b91157472c00c0c42cd
2025-10-17 23:46:02 +00:00
69c33898fa Revert "[Inductor][CuTeDSL] Move load_template up two directories (#165347) (#165576)"
This reverts commit febb60323018948b2b9d2cff35b3cc4e0d0c55c8.

Reverted https://github.com/pytorch/pytorch/pull/165576 on behalf of https://github.com/seemethere due to This was actually reverted internally, current PR is linked to a stale diff so diff train tools think that this is landed via co-dev when it was actually reverted ([comment](https://github.com/pytorch/pytorch/pull/165576#issuecomment-3417510146))
trunk/69c33898fa99f7c4552401a630a77675119c7ce7
2025-10-17 23:33:17 +00:00
1b397420f2 Enable more DTensor tests in local tensor mode and fix more integration issues (#165716)
- During op dispatch, local tensor is supposed to collect RNG state from CPU and CUDA
devices so that it can be reset before the op executes for each rank, such that ops
with randomness produce the same result on all ranks (note that we are planning a
separate change to add support for per-rank RNG state). Previously we relied on
op input arguments to deduce which devices to get RNG state from, which doesn't work
for factory functions such as torch.randn. Hence this change switches to unconditionally
collecting RNG state from all devices.

- Fixed per-rank-specific computations in _MaskedPartial and Shard placements discovered
during test enablement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165716
Approved by: https://github.com/ezyang
trunk/1b397420f22b22f90a1093233ecd9167656e50cb
2025-10-17 23:28:22 +00:00
fe80f03726 Add B200 files to labeler and update codeowners (#165767)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165767
Approved by: https://github.com/slayton58
trunk/fe80f03726a7a50439be063327b67c7fba6279b2 viable/strict/1760761532
2025-10-17 23:24:17 +00:00
e50dc40d28 Revert "Update gm.print_readable to include Annotation (#165397)"
This reverts commit 7a657700131f31577544e93587eb339618677e97.

Reverted https://github.com/pytorch/pytorch/pull/165397 on behalf of https://github.com/malfet due to I don't know how/why, but it breaks windows tests, see 2e22b1a61e/1 ([comment](https://github.com/pytorch/pytorch/pull/165397#issuecomment-3417428128))
trunk/e50dc40d28ba409930023c77a031ec0dd20fd73b viable/strict/1760758005
2025-10-17 22:35:50 +00:00
2e22b1a61e [pytorch] Composite backend potential fix for is_backend_available (#165061)
Summary: `is_backend_available` takes in a string and expects it to only be a backend; if it's given a composite (device:backend) string, it fails.

Reviewed By: prashrock

Differential Revision: D81886736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165061
Approved by: https://github.com/H-Huang
trunk/2e22b1a61ea20a54448edf34a5d22fbe8391d626
2025-10-17 22:06:36 +00:00
616c6bdf8f [dynamo][ac] Config flag to allow eager and compile AC divergence for side-effects (#165775)
Eager AC/SAC reapplies mutations (like global dict mutations) in the backward during the recomputation of the forward. torch.compile has no easy way to reapply Python mutations in the backward, but many users may be OK with skipping the reapplication of side effects in the backward. They can set this config flag to accept this divergence between eager and compile (see the sketch below).
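
A minimal sketch of the kind of side effect that eager AC reapplies during recomputation (standard eager checkpoint behavior; the config flag itself is not named in this message):

```
import torch
from torch.utils.checkpoint import checkpoint

counter = {"calls": 0}

def fn(x):
    counter["calls"] += 1  # global mutation inside the checkpointed region
    return x.sin()

y = checkpoint(fn, torch.randn(4, requires_grad=True), use_reentrant=False)
y.sum().backward()
print(counter["calls"])    # 2 in eager: the backward recomputation reapplies the mutation
```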

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165775
Approved by: https://github.com/zou3519
ghstack dependencies: #165734
trunk/616c6bdf8ff5052a03f3bfa4e6258c3a527f93db
2025-10-17 22:04:19 +00:00
c18ddfc572 [dynamo][easy] Support torch.accelerator.current_accelerator (#165734)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165734
Approved by: https://github.com/Skylion007
2025-10-17 22:04:19 +00:00
86ebce1766 [precompile] Pass tensor_to_context to backend. (#165702)
Summary:

Fixing a VLLM issue https://github.com/vllm-project/vllm/issues/27040 where
aot precompile fails on some models using symbolic shapes in inductor.

Test Plan:
pp HF_HUB_DISABLE_XET=1 VLLM_ENABLE_V1_MULTIPROCESSING=0 VLLM_USE_AOT_COMPILE=1 vllm bench latency --model microsoft/DialoGPT-small --input-len 128 --output-len 256 --num-iters 50 --dtype float16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165702
Approved by: https://github.com/tugsbayasgalan
trunk/86ebce1766b6e20b269f35955fbc3e97332aa765
2025-10-17 21:52:04 +00:00
8cb2fb44f2 [Inductor] Support fallback for all gemm like ops (#165755)
Summary: Fill the op_override field for bmm aten ops so they can be converted properly in the wrapper_fxir backend.

Reviewed By: StellarrZ

Differential Revision: D84840948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165755
Approved by: https://github.com/blaine-rister
trunk/8cb2fb44f29f6b19400a04ea970807f651657b0c
2025-10-17 21:08:29 +00:00
ab65498d71 Fix _StridedShard incorrect split (#165533)
https://github.com/pytorch/pytorch/pull/164820 introduced a bug where `_StridedShard` calls the parent class `Shard`'s `split_tensor` method, resulting in incorrect data locality. (I think @ezyang spotted this issue, but we had no test to capture it.)

Meanwhile, I noticed another bug: when we normalize a `_StridedShard`'s placement, it also triggers the parent class `Shard`'s `split_tensor` method, because it creates a Shard class [here](0c14f55de6/torch/distributed/tensor/_api.py (L783)). I think we never tested `distribute_tensor` for `_StridedShard` before, so I added a test here to compare against ordered shard.

I'm using a classmethod because the _split_tensor logic differs between `Shard` and `_StridedShard`; basically I want to shard local tensors without initializing the Shard object:
```
local_tensor = _StridedShard._make_shard_tensor(dim, tensor, mesh, mesh_dim, split_factor=split_factor)
local_tensor = Shard._make_shard_tensor(dim, tensor, mesh, mesh_dim)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165533
Approved by: https://github.com/XilunWu
trunk/ab65498d71bf8626b6480fa3924b52ad93b4a046
2025-10-17 20:54:46 +00:00
06d324365c Revert "Escaped html tags name and target to appear as strings (#165543)"
This reverts commit 080365b7d82a3c99c995cab6dc912b7dfe22aa41.

Reverted https://github.com/pytorch/pytorch/pull/165543 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165543#issuecomment-3417102048))
trunk/06d324365c24395b6d326b2c5e904460bb426dcd
2025-10-17 20:45:48 +00:00
6c9c6e0936 Enable C407 of flake8 (#165046)
This PR enables C407 in flake8. C407 flags `Unnecessary list comprehension - ‘<builtin>’ can take a generator`.
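
An illustrative example of what C407 flags:

```
sum([x * x for x in range(10)])  # C407: unnecessary list comprehension
sum(x * x for x in range(10))    # preferred: pass a generator instead
```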
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165046
Approved by: https://github.com/albanD
trunk/6c9c6e0936751116f6f988d7194eefe16a24e5a1
2025-10-17 20:15:39 +00:00
2bcd892c86 [distributed] Replace assert statements in distributed checkpoint with explicit checks (#165256)
Fixes partially #164878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165256
Approved by: https://github.com/albanD
trunk/2bcd892c86349ad6e91d66760fb3d2257526625d
2025-10-17 20:14:35 +00:00
75e2a9fae3 [annotate] add annotate_fn function decorator (#165703)
Example usage:

```
        @fx_traceback.annotate_fn({"pp_stage": 1})
        def example_function(x):
            return x * x

        class SimpleLinear(nn.Module):
            def __init__(self):
                super().__init__()
                self.linear = nn.Linear(3, 2)

            def forward(self, x):
                with fx_traceback.annotate({"pp_stage": 0}):
                    y = self.linear(x)
                y = example_function(y)
                return y - 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165703
Approved by: https://github.com/SherlockNoMad
trunk/75e2a9fae37f9d07229a6d4e8e4b2e1d910e3dad
2025-10-17 20:10:53 +00:00
a16fd6b488 [NVSHMEM][Triton] Fix NVSHMEM triton test for wacky world sizes (#165704)
The test currently assumes a world size divisible by 4(?).

The new setup code is not as slick as the old, but it is more general.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165704
Approved by: https://github.com/Skylion007, https://github.com/kwen2501
trunk/a16fd6b4885206fc2a29ac94124107f05e23a9c6
2025-10-17 19:33:26 +00:00
382b0150de [docs] Add usage examples to ConvTranspose1d docstring (#165618)
Fixes #165615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165618
Approved by: https://github.com/mikaylagawarecki
trunk/382b0150de1247bf392b424edea71b541cae7d52
2025-10-17 19:11:57 +00:00
a664b299ac Update docs for torch.mode (#165614)
Currently the docs for `torch.mode` include a note:

`This function is not defined for torch.cuda.Tensor yet.`

However, with `torch==2.7.1+cu126`, when I try to get the mode of a Tensor that is in CUDA memory, I do not face any issues:

```
>>> a = torch.tensor([0, 2, 1, 1, 1, 3, 3])
>>> a.mode()
torch.return_types.mode(
values=tensor(1),
indices=tensor(4))
>>> a.cuda().mode()
torch.return_types.mode(
values=tensor(1, device='cuda:0'),
indices=tensor(4, device='cuda:0'))
```

Am I misunderstanding the note? If not, I suggest removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165614
Approved by: https://github.com/mikaylagawarecki
trunk/a664b299ac2840b3399835097813e0d3986bb984
2025-10-17 19:06:33 +00:00
9c12651417 Improve error message for non-positive groups in convolution (#165669)
Prevents a segmentation fault for invalid (non-positive) groups values in convolution.
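
A sketch of the failure mode, assuming the functional entry point (illustrative, not the PR's test):

```
import torch

x = torch.randn(1, 4, 8, 8)
w = torch.randn(4, 4, 3, 3)
# groups=0 previously could crash with a segmentation fault;
# it now raises a descriptive error instead.
torch.nn.functional.conv2d(x, w, groups=0)
```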

Fixes #142835

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165669
Approved by: https://github.com/mikaylagawarecki
trunk/9c12651417bd8a10870702fb368b4d92d70ca667
2025-10-17 19:06:05 +00:00
08c97b4a1f Don't run compile inside kernel invocation (#165687)
When we call torch.compile during fake tensor prop, we shouldn't actually compile, because we can't guarantee that the compiled artifact can be fake-tensor-propagated (for example, with the inductor backend). Instead we should just skip compiling; the inner compile will still be triggered when the function is executed at runtime (see the sketch below).
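
A hedged sketch of the pattern, assuming a hypothetical custom op that calls torch.compile in its body (illustrative of the description above, not the PR's test):

```
import torch

@torch.library.custom_op("mylib::double", mutates_args=())
def double(x: torch.Tensor) -> torch.Tensor:
    # During fake tensor prop this inner compile is skipped;
    # it only actually compiles when the op runs for real at runtime.
    return torch.compile(lambda t: t * 2)(x)
```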

Fixes: https://github.com/pytorch/pytorch/issues/151328

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165687
Approved by: https://github.com/zou3519
trunk/08c97b4a1f22cbd652c35c08b0896c930e9fa2f3
2025-10-17 19:03:57 +00:00
fae74cd52f Revert "shrink_group implementation to expose ncclCommShrink API (#164518)"
This reverts commit a032510db38e8331afa08f7635d146f9cefdd0ab.

Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3416718767))
trunk/fae74cd52f3449ec92fdb519c577c8cd142ab7b1
2025-10-17 18:55:53 +00:00
7a65770013 Update gm.print_readable to include Annotation (#165397)
Sample output
```
[rank0]:        # Annotation: {'compile_with_inductor': 'flex_attention'} File: /data/users/bahuang/pytorch/torch/nn/attention/flex_attention.py:1490 in flex_attention, code: out, lse, max_scores = flex_attention_hop(
[rank0]:        score_mod_2 = self.score_mod_2
[rank0]:        mask_fn_2 = self.mask_fn_2
[rank0]:        flex_attention_1 = torch.ops.higher_order.flex_attention(xq_5, xk_5, xv_3, score_mod_2, (2048, 2048, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_indices, 128, 128, mask_fn_2), 0.25, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True, 'OUTPUT_MAX': False}, (), (g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___mask_mod___closure___0_cell_contents,));  xq_5 = xk_5 = xv_3 = score_mod_2 = mask_fn_2 = None
[rank0]:        out_2: "bf16[8, 4, 2048, 16]" = flex_attention_1[0];  flex_attention_1 = None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165397
Approved by: https://github.com/yushangdi, https://github.com/anijain2305
trunk/7a657700131f31577544e93587eb339618677e97
2025-10-17 18:35:18 +00:00
e4454947e2 Widen ops support to take in IntHOArrayRef vs only std::vec (#165152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165152
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #164991
trunk/e4454947e2c692db1a249591121f8583fefe7df1
2025-10-17 18:32:39 +00:00
3806e9767b Refactor out headeronly ArrayRef (#164991)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164991
Approved by: https://github.com/swolchok
2025-10-17 18:32:39 +00:00
b08d8c2e50 Revert "[DebugMode][2/N] add nn.Module tracking (#165498)"
This reverts commit 45afaf08a14ab760d86ea80dea6d50cec8626513.

Reverted https://github.com/pytorch/pytorch/pull/165498 on behalf of https://github.com/seemethere due to First part of the stack was reverted so will need to revert this too ([comment](https://github.com/pytorch/pytorch/pull/165498#issuecomment-3416618198))
trunk/b08d8c2e506532ed00c4be5c4a7bfa58c131156d
2025-10-17 18:22:48 +00:00
ca5b7f8ded torch.compile: populate compiler_config (#165581)
Summary: This starts writing the compiler_config metadata into the logger.

Test Plan:
Modified an existing test case to make sure this is not null.
(Also eyeballed what we're logging to make sure it's reasonable.)

Reviewed By: masnesral

Differential Revision: D84014636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165581
Approved by: https://github.com/masnesral
trunk/ca5b7f8ded834970c092864647b5914b0e64cd94
2025-10-17 18:21:18 +00:00
9a71d96256 Revert "[DebugMode][1/N] refactor logs into _DebugCalls (#165376)"
This reverts commit 556fc09a9f67f24ca5591ec049c5d0c347c5f62a.

Reverted https://github.com/pytorch/pytorch/pull/165376 on behalf of https://github.com/seemethere due to This is failing for internal tests, see D84877379 for more context ([comment](https://github.com/pytorch/pytorch/pull/165376#issuecomment-3416570407))
trunk/9a71d96256d247109bfb23cdbfce90d8a076115c
2025-10-17 18:08:59 +00:00
0d4c2b71e8 [DeviceMesh] Simplify unflatten method (#165556)
By adding a few small helpers (e.g., a `splice` method on `_MeshLayout`, and making `_init_process_groups` static and thus stateless), we can substantially shorten the definition of the unflatten method and improve readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165556
Approved by: https://github.com/fduwjj
ghstack dependencies: #165554, #165555
trunk/0d4c2b71e85d1a755bf4293d315726e9326cf30f
2025-10-17 17:57:51 +00:00
d659bbde62 [DeviceMesh] Introduce private constructor instead of _create_mesh_from_ranks (#165555)
The refactoring of DeviceMesh is heavily constrained by the signature of its constructor, which is a public API which contains some "legacy" concepts which we'd love to get rid of, such as an explicit/materialized `mesh` Tensor.

In other languages the solution to this would be to add a private overload of the constructor. Python doesn't natively allow this, but in this PR I managed to build something that approximates it.

This new private constructor basically only takes `_layout`, `_global_rank_permutation`, and `mesh_dim_names`.

With such a constructor we can effectively simplify a lot of callsites and get rid of the `_create_mesh_from_ranks` helper method. That's a good thing because it was instantiating many DeviceMeshes in a for loop, which always felt unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165555
Approved by: https://github.com/fduwjj, https://github.com/fegin
ghstack dependencies: #165554
2025-10-17 17:57:51 +00:00
58879bfafa [DeviceMesh] Prefer using _layout over _mesh for all sorts of things (#165554)
The goal of this PR is to avoid storing the explicit `mesh` Tensor inside each DeviceMesh, and instead compute it on-the-fly when the end user needs it, and try to replace all of its internal usages with `_layout` and the newly-introduced `_global_rank_permutation` Tensor. The name of this attribute is up for debate. The advantage of the `_global_rank_permutation` Tensor is that it is _the same_ Tensor for the root mesh and all its children, so it doesn't need to be copied/reallocated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165554
Approved by: https://github.com/fduwjj
2025-10-17 17:57:51 +00:00
a032510db3 shrink_group implementation to expose ncclCommShrink API (#164518)
Closes #164529

This exposes the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API to PyTorch.

This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization.

For more info:  [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator)
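
A heavily hedged sketch of intended usage, going only by the commit title (`shrink_group`); the exact entry point and signature are assumptions, not verified against the PR:

```
import torch.distributed as dist

dist.init_process_group("nccl")
# Hypothetical call shape: build a smaller communicator excluding given ranks
# (name and parameters are assumed from the commit title).
smaller_pg = dist.shrink_group(ranks_to_exclude=[3])
```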

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/Skylion007, https://github.com/syed-ahmed, https://github.com/kwen2501
trunk/a032510db38e8331afa08f7635d146f9cefdd0ab
2025-10-17 17:55:03 +00:00
39e0a832c9 Fix B200 test fails in scaled_mm (#165747)
Summary:

PR #165528 changes some scale/swizzle inference behavior in scaled_mm
tests - mxfp8 tests on Blackwell can get incorrectly classified,
resulting in failures.

Fix the scale/swizzle inference code to prevent this.

Fixes https://github.com/pytorch/pytorch/issues/165743

Test Plan:

```
pytest -svv test/test_scaled_matmul_cuda.py
```

Reviewers:

@jagadish-amd @jeffdaily @drisspg

Subscribers:

@Aidyn-A

Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165747
Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/jeffdaily
trunk/39e0a832c9898b013314ceee189643410ff8ed11 viable/strict/1760737772
2025-10-17 17:52:19 +00:00
dd3b48e85d Fix bug with serialization after AOTAutogradCache hit (#165474)
Fixes #165447

On AOTAutogradCache load, the serialization function we pick is just `lambda: self`, because the object itself is an AOTAutogradCacheEntry. However, this isn't safe, because `wrap_post_compile` will make `self` unserializable, since it needs to load triton kernels and stuff!

So instead, on AOTAutogradCache load, we preserve the bytes that were used to load the object to begin with, and return that object on a call to serialize(). This effectively means we save a copy of the pre-hydrated artifact, without needing to do an eager copy until someone actually calls `serialize`.

Test Plan:

Run

```py
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(2, 4)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(4, 8)
    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))

device = "cuda"
m = M().to(device)
sample_inputs = (torch.randn(2, 2, device=device),)
eager_out = m(*sample_inputs)

with torch._dynamo.config.patch("enable_aot_compile", True):
    compiled_fn_path = "./m.pt"
    compiled_fn = torch.compile(
        m,
        fullgraph=True
    ).forward.aot_compile((sample_inputs, {}))

    compiled_fn.save_compiled_function(compiled_fn_path)
    torch._dynamo.reset()
    with torch.compiler.set_stance("fail_on_recompile"):
        with open(compiled_fn_path, "rb") as f:
            loaded_fn = torch.compiler.load_compiled_function(f)

assert loaded_fn is not None

compiled_out = loaded_fn(m, *sample_inputs)

assert torch.allclose(eager_out, compiled_out)
```

twice, see that it succeeds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165474
Approved by: https://github.com/yiming0416, https://github.com/zhxchen17
trunk/dd3b48e85dd51ccbec8128159947a719902344c6
2025-10-17 17:47:24 +00:00