pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-11 22:34:53 +08:00

Author	SHA1	Message	Date
Animesh Jain	d9f94e0d7d	[dynamo] Support fx.traceback.annotate as decorator (#165805 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165805 Approved by: https://github.com/Lucaskabela, https://github.com/SherlockNoMad, https://github.com/yushangdi	2025-10-18 03:58:11 +00:00
Yiming Zhou	e4d6c56ffb	Improve dynamo graph capture stack trace for custom ops (#165693 ) For a custom op ``` @torch.library.custom_op("my_lib::foo", mutates_args={}) def foo(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor: return x + y ``` ppl could call `torch.ops.my_lib.foo()` or directly call `foo()` in the `forward` of an `nn.Module` These two calling conventions will lead to the same node in the output graph, but different stack traces. When directly calling `foo()`, the displayed stack_trace in the graph will be ``` # File: .../pytorch/torch/_library/custom_ops.py:687 in __call__, code: return self._opoverload(args, *kwargs) ``` This is not useful so we filter it out. ``` python test/functorch/test_aot_joint_with_descriptors.py -k test_custom_op_stack_trace ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165693 Approved by: https://github.com/SherlockNoMad, https://github.com/williamwen42	2025-10-18 03:48:18 +00:00
Laith Sakka	017d2985f3	set unbacked bindings in reinplace pass for newly created nodes during generalize_scatter decomp (#164948 ) Two fixes: 1. in rein_place pass, set unbacked bindings for newly created nodes. 2. In inductor, ComputeBuffer used to miss detecting some used symbols, fixed that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164948 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #164341	2025-10-18 03:20:30 +00:00
Laith Sakka	c6a8db0b9a	Fix issues with generalized_scatter and setitem allocated unbacked symbols. (#164341 ) Three fixes: 1. When doing t[u0] +=1 if u0 is unbacked we could allocate a new unbacked symbol during the the indexing of t[u0] (when we fake trace setitem), namely because meta_select does allocate a new unbacked symbol for the storage offset when we do not know if u0>=0 or u0<0. but the output size/stride of setitem(), does not depend on that new symbol. it's self consumed in setitem so we shall ignore it. 2. Also when we trace through generalized_scatter the applications of the views could allocate unbacked symints but those do not effect final output, we also shall ignore them. 3.Before accessing strides in lowering we shall materialize. Address https://github.com/pytorch/pytorch/issues/114293 and https://github.com/pytorch/pytorch/issues/131911 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164341 Approved by: https://github.com/bobrenjc93	2025-10-18 03:20:30 +00:00
Ti-Tai Wang	543ddbf44c	[ONNX] Support renaming in dynamic axes to shapes conversion (#165769 ) Discovered in ##165748 This PR also deprecates the conversion. ONNX exporter team does not intend to maintain the conversion in long term. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165769 Approved by: https://github.com/justinchuby	2025-10-18 01:11:20 +00:00
Chris Leonard	29b029648e	Fixed issue with GradTrackingTensor not properly propagating sparse layout (#165765 ) Fixes #164286 Fixed issue with GradTrackingTensor not properly propagating sparse layout. @ezyang @jcaip Pull Request resolved: https://github.com/pytorch/pytorch/pull/165765 Approved by: https://github.com/ezyang	2025-10-18 01:00:53 +00:00
Shivam Raikundalia	a25a649e70	[Mem Snapshot] Add Metadata Field (#165490 ) Summary: The implementation adds the ability to: Set custom metadata strings that will be attached to all subsequent allocations Clear or change the metadata at any point View the metadata in memory snapshots via _dump_snapshot() Test Plan: Added test in test_cuda.py and check manually in snapshot to see that metadata was added. Differential Revision: D84654933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490 Approved by: https://github.com/yushangdi	2025-10-17 23:46:02 +00:00
Dzmitry Huba	1b397420f2	Enable more DTensor tests in local tensor mode and fix more integration issues (#165716 ) - During op dispatch local tensor is supposed to collect rng state from CPU and CUDA devices so that it can be reset before execution of the op for each such that ops with randomness produces the same result for all ranks (note that we are planning a separate change to add support of per rank rng state). Previously we relied on op input arguments to deduce which devices to get rng state from. Which doesn't work for factory functions such torch.randn. Hence this changes switches to uncondionally collecting rng state from all devices. - Fixing per rank specific computations in _MaskedPartial and Shard placements discovered during test enablement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165716 Approved by: https://github.com/ezyang	2025-10-17 23:28:22 +00:00
PyTorch MergeBot	e50dc40d28	Revert "Update gm.print_readable to include Annotation (#165397 )" This reverts commit 7a657700131f31577544e93587eb339618677e97. Reverted https://github.com/pytorch/pytorch/pull/165397 on behalf of https://github.com/malfet due to I don't know how/why, but it breaks windows tests, see `2e22b1a61e/1` ([comment](https://github.com/pytorch/pytorch/pull/165397#issuecomment-3417428128))	2025-10-17 22:35:50 +00:00
Animesh Jain	616c6bdf8f	[dynamo][ac] Config flag to allow eager and compile AC divergence for side-effects (#165775 ) Eager AC/SAC reapplies the mutations (like global dict mutations) in the backward during the recomputation of forward. torch.compile has no easy way to reapply python mutations in the backward. But many users might be ok to skip reapplication of side effects in the backward. They can set this config flag to accept this eager and compile divergence. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165775 Approved by: https://github.com/zou3519 ghstack dependencies: #165734	2025-10-17 22:04:19 +00:00
Animesh Jain	c18ddfc572	[dynamo][easy] Support torch.accelerator.current_accelerator (#165734 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165734 Approved by: https://github.com/Skylion007	2025-10-17 22:04:19 +00:00
zpcore	ab65498d71	Fix `_StridedShard` incorrect split (#165533 ) https://github.com/pytorch/pytorch/pull/164820 introduced a bug that `_StridedShard` will call parent class `Shard`'s `split_tensor` method, thus results in incorrect data locality. (I think @ezyang spotted this issue, but we have no test to capture this) Meanwhile, I notice another bug that when we normalize a `_StridedShard`'s placement, it will also trigger parent class `Shard`'s `split_tensor` method because it will create a Shard class [here](`0c14f55de6/torch/distributed/tensor/_api.py (L783)`). I think we never test `distribute_tensor` for `_StridedShard` before. So I added a test here to compare against ordered shard. Using classmethod because the _split_tensor logic is different between `Shard` and `_StridedShard`. Basically I want to shard on local tensors without initializing the Shard object: ``` local_tensor = _StridedShard._make_shard_tensor(dim, tensor, mesh, mesh_dim, split_factor=split_factor) local_tensor = Shard._make_shard_tensor(dim, tensor, mesh, mesh_dim) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165533 Approved by: https://github.com/XilunWu	2025-10-17 20:54:46 +00:00
Shangdi Yu	75e2a9fae3	[annotate] add annotate_fn function decorator (#165703 ) Example usage: ``` @fx_traceback.annotate_fn({"pp_stage": 1}) def example_function(x): return x * x class SimpleLinear(nn.Module): def __init__(self): super().__init__() self.linear = nn.Linear(3, 2) def forward(self, x): with fx_traceback.annotate({"pp_stage": 0}): y = self.linear(x) y = example_function(y) return y - 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165703 Approved by: https://github.com/SherlockNoMad	2025-10-17 20:10:53 +00:00
Eddie Yan	a16fd6b488	[NVSHMEM][Triton] Fix NVSHMEM triton test for wacky world sizes (#165704 ) Currently assumes divisible by 4? world size Not as slick as the old setup code but more general Pull Request resolved: https://github.com/pytorch/pytorch/pull/165704 Approved by: https://github.com/Skylion007, https://github.com/kwen2501	2025-10-17 19:33:26 +00:00
vishalgoyal316	9c12651417	Improve error message for non-positive groups in convolution (#165669 ) Prevents from segmentation fault for invalid groups value in convolution. Fixes #142835 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165669 Approved by: https://github.com/mikaylagawarecki	2025-10-17 19:06:05 +00:00
Tugsbayasgalan Manlaibaatar	08c97b4a1f	Don't run compile inside kernel invocation (#165687 ) When we call torch.compile during fake tensor prop, we shouldn't actually compile because we can't guarantee that the compiled artifact can be fake tensor prop-d. (for example, inductor backend). Instead we should just skip compiling. However, the inner compile will be triggered when being executed in runtime. Fixes: https://github.com/pytorch/pytorch/issues/151328 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165687 Approved by: https://github.com/zou3519	2025-10-17 19:03:57 +00:00
PyTorch MergeBot	fae74cd52f	Revert "shrink_group implementation to expose ncclCommShrink API (#164518 )" This reverts commit a032510db38e8331afa08f7635d146f9cefdd0ab. Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3416718767))	2025-10-17 18:55:53 +00:00
Sherlock Huang	7a65770013	Update gm.print_readable to include Annotation (#165397 ) Sample output ``` [rank0]: # Annotation: {'compile_with_inductor': 'flex_attention'} File: /data/users/bahuang/pytorch/torch/nn/attention/flex_attention.py:1490 in flex_attention, code: out, lse, max_scores = flex_attention_hop( [rank0]: score_mod_2 = self.score_mod_2 [rank0]: mask_fn_2 = self.mask_fn_2 [rank0]: flex_attention_1 = torch.ops.higher_order.flex_attention(xq_5, xk_5, xv_3, score_mod_2, (2048, 2048, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_indices, 128, 128, mask_fn_2), 0.25, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True, 'OUTPUT_MAX': False}, (), (g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___mask_mod___closure___0_cell_contents,)); xq_5 = xk_5 = xv_3 = score_mod_2 = mask_fn_2 = None [rank0]: out_2: "bf16[8, 4, 2048, 16]" = flex_attention_1[0]; flex_attention_1 = None ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165397 Approved by: https://github.com/yushangdi, https://github.com/anijain2305	2025-10-17 18:35:18 +00:00
Jane Xu	e4454947e2	Widen ops support to take in IntHOArrayRef vs only std::vec (#165152 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165152 Approved by: https://github.com/mikaylagawarecki ghstack dependencies: #164991	2025-10-17 18:32:39 +00:00
Jane Xu	3806e9767b	Refactor out headeronly ArrayRef (#164991 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164991 Approved by: https://github.com/swolchok	2025-10-17 18:32:39 +00:00
PyTorch MergeBot	b08d8c2e50	Revert "[DebugMode][2/N] add nn.Module tracking (#165498 )" This reverts commit 45afaf08a14ab760d86ea80dea6d50cec8626513. Reverted https://github.com/pytorch/pytorch/pull/165498 on behalf of https://github.com/seemethere due to First part of the stack was reverted so will need to revert this too ([comment](https://github.com/pytorch/pytorch/pull/165498#issuecomment-3416618198))	2025-10-17 18:22:48 +00:00
Colin L Reliability Rice	ca5b7f8ded	torch.compile: populate compiler_config (#165581 ) Summary: This starts writing the compiler_config metadata into logger Test Plan: Modified existing test case to make sure this is not null. (Also eyeballed what we're logging tomake sure it's reasonable Reviewed By: masnesral Differential Revision: D84014636 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165581 Approved by: https://github.com/masnesral	2025-10-17 18:21:18 +00:00
Luca Wehrstedt	58879bfafa	[DeviceMesh] Prefer using _layout over _mesh for all sorts of things (#165554 ) The goal of this PR is to avoid storing the explicit `mesh` Tensor inside each DeviceMesh, and instead compute it on-the-fly when the end user needs it, and try to replace all of its internal usages with `_layout` and the newly-introduced `_global_rank_permutation` Tensor. The name of this attribute is up for debate. The advantage of the `_global_rank_permutation` Tensor is that it is _the same_ Tensor for the root mesh and all its children, so it doesn't need to be copied/reallocated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165554 Approved by: https://github.com/fduwjj	2025-10-17 17:57:51 +00:00
Bruce Chang	a032510db3	shrink_group implementation to expose ncclCommShrink API (#164518 ) Closes #164529 To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API to PyTorch. This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization. For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518 Approved by: https://github.com/Skylion007, https://github.com/syed-ahmed, https://github.com/kwen2501	2025-10-17 17:55:03 +00:00
Simon Layton	39e0a832c9	Fix B200 test fails in scaled_mm (#165747 ) Summary: PR #165528 changes some scale/swizzle inference behavior in scaled_mm tests - mxfp8 tests on Blackwell can get incorrectly classified, resulting in failures. Fix the scale/swizzle inference code to prevent this. Fixes https://github.com/pytorch/pytorch/issues/165743 Test Plan: ``` pytest -svv test/test_scaled_matmul_cuda.py ``` Reviewers: @jagadish-amd @jeffdaily @drisspg Subscribers: @Aidyn-A Tasks: Tags: Signed-off-by: Simon Layton <simonlaytonmeta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165747 Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/jeffdaily	2025-10-17 17:52:19 +00:00
jmaczan	cff1b20771	Patch the flex_attention._get_mod_type to not use inspect.signature when computing num_positional_args (an alternative fix for flex attention graph break on create_block_mask) (#164923 ) The initial fix for inspect.signature uses not a right approach (https://github.com/pytorch/pytorch/pull/164349#pullrequestreview-3306614010). As @williamwen42 suggests (https://github.com/pytorch/pytorch/pull/164349#issuecomment-3379222885) we can just for now get rid of `inspect.signature` call in flex_attention to resolve this high priority issue (https://github.com/pytorch/pytorch/issues/164247#issuecomment-3378673179). In this PR I did exactly this - limited the scope of fix to just computing `num_positional_args` in `flex_attention._get_mod_type` based on properties returned by `NestedUserFunctionVariable.const_getattr` (some were missing so I added them) Fixes #164247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164923 Approved by: https://github.com/williamwen42	2025-10-17 17:44:45 +00:00
Pian Pawakapan	45afaf08a1	[DebugMode][2/N] add nn.Module tracking (#165498 ) Uses ModTracker to record nn.Module entries, much like CommDebugMode. Can be switched on with `DebugMode(record_nn_module=True)`: ``` [nn.Mod] Bar [nn.Mod] Bar.abc [nn.Mod] Bar.abc.l1 aten::t(t: f32[4, 4]) aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4]) [nn.Mod] Bar.abc.l2 aten::t(t: f32[4, 4]) aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4]) [nn.Mod] Bar.xyz aten::t(t: f32[4, 4]) aten::addmm(t: f32[4], t: f32[4, 4], t: f32[4, 4])""" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165498 Approved by: https://github.com/SherlockNoMad ghstack dependencies: #165376	2025-10-17 17:39:48 +00:00
PyTorch MergeBot	85c5433d38	Revert "Fix `_StridedShard` incorrect split (#165533 )" This reverts commit dfc8a1c5ddc8401197e9ab546e03b0f745edc27b. Reverted https://github.com/pytorch/pytorch/pull/165533 on behalf of https://github.com/seemethere due to Causing a merge conflict internally, see D84829161 ([comment](https://github.com/pytorch/pytorch/pull/165533#issuecomment-3416143176))	2025-10-17 15:57:01 +00:00
inventshah	935ccdbe75	[MPS] Fix internal assertion in torch.linalg.solve for singular matrices (#165254 ) Fixes #163962 by special casing MPS in the negative status code branch in `_linalg_check_errors`. Checks if info is [`MPSMatrixDecompositionStatus.singular`](https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixdecompositionstatus/singular) (which has a raw value of -2). I didn't find an official Apple source with this raw value (besides printing the enum value), so I'm not sure if we can (or should) depend on it? Is there a way to directly get the Objective-C enum value in C++? Pull Request resolved: https://github.com/pytorch/pytorch/pull/165254 Approved by: https://github.com/malfet	2025-10-17 15:35:49 +00:00
PyTorch MergeBot	5d4da26ed0	Revert "[export] preserve_node_meta by default (#165524 )" This reverts commit fdd560afd1d413a9f814cbf7cc2a72e0d39b0117. Reverted https://github.com/pytorch/pytorch/pull/165524 on behalf of https://github.com/lw due to test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_symint_closure [GH job link](https://github.com/pytorch/pytorch/actions/runs/18586312291/job/52991654051) [HUD commit link](`fdd560afd1`) ([comment](https://github.com/pytorch/pytorch/pull/165524#issuecomment-3415352522))	2025-10-17 12:27:17 +00:00
PyTorch MergeBot	80d2ca7566	Revert "[annotate] add annotate_fn function decorator (#165703 )" This reverts commit f1d882212afc3a73ce1e319d80b6406f9dc4a0c8. Reverted https://github.com/pytorch/pytorch/pull/165703 on behalf of https://github.com/lw due to [GH job link](https://github.com/pytorch/pytorch/actions/runs/18585518705/job/52989521797) [HUD commit link](`f1d882212a`) ([comment](https://github.com/pytorch/pytorch/pull/165703#issuecomment-3415073467))	2025-10-17 11:23:13 +00:00
Pian Pawakapan	fdd560afd1	[export] preserve_node_meta by default (#165524 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/165524 Approved by: https://github.com/malaybag	2025-10-17 07:55:28 +00:00
Yuanyuan Chen	e925dfcc6b	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhances code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang, https://github.com/mlazos	2025-10-17 07:27:11 +00:00
Shangdi Yu	f1d882212a	[annotate] add annotate_fn function decorator (#165703 ) Example usage: ``` @fx_traceback.annotate_fn({"pp_stage": 1}) def example_function(x): return x * x class SimpleLinear(nn.Module): def __init__(self): super().__init__() self.linear = nn.Linear(3, 2) def forward(self, x): with fx_traceback.annotate({"pp_stage": 0}): y = self.linear(x) y = example_function(y) return y - 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165703 Approved by: https://github.com/SherlockNoMad	2025-10-17 07:18:47 +00:00
Animesh Jain	24879f0de9	[dynamo] Use Variable Builder to build the property fget object (#165683 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165683 Approved by: https://github.com/ezyang, https://github.com/williamwen42	2025-10-17 06:29:24 +00:00
Justin Chu	fcbde24c1c	[ONNX] Remove common imports from torchlib (#165156 ) The Rank and IsScalar functions are no longer used in the torchlib. Requires onnxscript v0.5.4 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165156 Approved by: https://github.com/Skylion007, https://github.com/cyyever	2025-10-17 03:25:34 +00:00
Eddie Yan	3154482072	[CUDA][cuBLAS] Only `xFail` `addmm` with reduced precision reductions on non-RTX skus (#165379 ) RTX Blackwells don't behave quite like their datacenter counterparts here Pull Request resolved: https://github.com/pytorch/pytorch/pull/165379 Approved by: https://github.com/Skylion007	2025-10-17 02:45:07 +00:00
Mu-Chu Lee	9fccbdd4f0	Fix incorrect function signature in template (#165567 ) Summary: In https://github.com/pytorch/pytorch/pull/148305 we refactored the grid argument out, but it's not reflected in our template. Test Plan: Included in commit. python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_cond_symint_input_disable_one_pass_cuda Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/165567 Approved by: https://github.com/desertfire	2025-10-17 02:40:56 +00:00
PyTorch MergeBot	11e2084308	Revert "[Mem Snapshot] Add Metadata Field (#165490 )" This reverts commit 5b3ea758951558e7d9f681ae784acb57eaa07910. Reverted https://github.com/pytorch/pytorch/pull/165490 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165490#issuecomment-3413491091))	2025-10-17 02:01:53 +00:00
Shangdi Yu	d82527b32a	[Windows] Add AOTI cross-compilation CI (#165573 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165573 Approved by: https://github.com/malfet ghstack dependencies: #165560	2025-10-17 01:05:35 +00:00
PyTorch MergeBot	d2c82bafb7	Revert "158232 Fix autocast cache incorrectly retaining no_grad state (#165068 )" This reverts commit 5daef30b26b794d237fbbc399c1d47ec0380200a. Reverted https://github.com/pytorch/pytorch/pull/165068 on behalf of https://github.com/jeffdaily due to This broke ROCm CI. test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/18572589089/job/52952074008) [HUD commit link](`5daef30b26`) ([comment](https://github.com/pytorch/pytorch/pull/165068#issuecomment-3413184445))	2025-10-16 23:08:27 +00:00
Colin L Reliability Rice	98a488c9aa	Start recording inductor provenance (#162669 ) Summary: This stores information on where fx graphs come from, which makes it significantly easier to debug. One outstanding question 1) I only stored the kernel stack traces, do we also want the node mappings? Test Plan: I wrote a explicit logging test which makes a module, fx traces it, compiles it, and makes sure the logging infomration shows up. ``` clr@devvm17763 ~/fbsource/fbcode/caffe2/test/dynamo % buck2 test @//mode/opt fbcode//caffe2/test/dynamo:test_dynamo -- test_utils File changed: fbsource//xplat/caffe2/test/dynamo/test_utils.py File changed: fbcode//caffe2/test/dynamo/test_utils.py Buck UI: https://www.internalfb.com/buck2/528dea32-2416-4a62-a1ec-39f3c0efdd2e Test UI: https://www.internalfb.com/intern/testinfra/testrun/13229324015574003 Network: Up: 0B Down: 0B Executing actions. Remaining 0/2 Command: test. Time elapsed: 17.3s Tests finished: Pass 16. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Rollback Plan: Differential Revision: D82037582 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162669 Approved by: https://github.com/yushangdi	2025-10-16 23:05:31 +00:00
Shivam Raikundalia	5b3ea75895	[Mem Snapshot] Add Metadata Field (#165490 ) Summary: The implementation adds the ability to: Set custom metadata strings that will be attached to all subsequent allocations Clear or change the metadata at any point View the metadata in memory snapshots via _dump_snapshot() Test Plan: Added test in test_cuda.py and check manually in snapshot to see that metadata was added. Differential Revision: D84654933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165490 Approved by: https://github.com/yushangdi	2025-10-16 22:54:27 +00:00
Wei Wang	d7e275d4b4	[CI][CUDA] Add periodic b200 distributed job (#159323 ) 1. Run distributed job with B200 runner, periodically. 2. discovered generic distributed test issue that certain unit test hard-coded ranks, calling for require_exact_world_size(world_size) API instead of require_world_size(world_size). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159323 Approved by: https://github.com/eqy Co-authored-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>	2025-10-16 21:54:04 +00:00
Yiming Zhou	1a54d3333d	[easy] Fix graph_capture in aot_joint_with_descriptors test (#165660 ) when `with_export=True`, `aot_export_joint_with_descriptors` should take the graph produced by `_dynamo_graph_capture_for_export` ``` python test/functorch/test_aot_joint_with_descriptors.py -k test_preserve_annotate_simple python test/functorch/test_aot_joint_with_descriptors.py -k test_preserve_annotate_flex_attention ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165660 Approved by: https://github.com/yushangdi	2025-10-16 21:10:11 +00:00
Aaron Orenstein	4c1c341fa0	FakeTensorMode shouldn't cache syms when tracing (#164718 ) Improve FakeTensor cache to handle SymNode and tracing properly. For now, when we're proxy tracing just don't bother caching operations that contain SymNodes in the output. The problem is that the proxy tracer relies on SymNode identity and our cache doesn't preserve that. It can be fixed (and I left some notes in _validate_symbolic_output_for_caching() how) but it's not worth it for now. If we aren't proxy tracing then caching is fine. Thus these changes: 1. Our cache key needs to include whether we were actively tracing or not - this way if we create a cache entry when we weren't tracing and then we try to use it when we ARE tracing it gets rerun. 2. If there's a SymNode in the output then bypass tracing. 3. Some general cleanup of the output validation - we were unnecessarily doing it as a two-step process when it could just be a single step (it's still two parts internally but only a single outer try/except). Pull Request resolved: https://github.com/pytorch/pytorch/pull/164718 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #165266, #164717	2025-10-16 20:57:07 +00:00
Aaron Orenstein	5f21cc786a	Teach ProxyTorchDispatchMode how to decompose sympy.Expr into known inputs (#164717 ) In a training library we hit a weird conflict between dtensor, dynamic shapes, and proxy tensor. The problem is occuring because in sharding_prop we use FakeTensors to compute an operation size (so we don't have to use the full "real" data). We turn off proxy tracing while we're doing that because we don't want the FakeTensor ops to end up in the graph. We then use that size when doing later operations. Normally this is no problem - but when those sizes are dynamic shapes then we have a problem - the proxy tracer wants to track the provenance of all shape operations (`s1*s2`) but since tracing is disabled it doesn't see the operation and when we then use the result shape later on the proxy tracer gets all confused (because the SymNode appeared out of nowhere). At first we were thinking to never disable shape tracing - but that caused a slew of other downstream problems (lots of code that actually needs the shape tracing to be disabled) so instead we enable having a "sym tracing override" and surgically when we disable proxy tracing we leave shape tracing enabled. After this change the dtensor embedding is "fixed" but then runs afoul of a FakeTensor cache bug - which is fixed in the next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164717 Approved by: https://github.com/bobrenjc93, https://github.com/ezyang ghstack dependencies: #165266	2025-10-16 20:57:06 +00:00
Dzmitry Huba	2cd5fd1588	Enable local tensor mode on DTensor view ops test (#165596 ) While enabling this test discovered lack of support for sub meshes. Added limited support for sub meshes by properly computing rank coordinates for a given sub mesh. The implementation follows similar approach to collectives. We infer all sub meshes for the given dimensions and compute each rank's coordinates with respect to is sub mesh. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165596 Approved by: https://github.com/ezyang	2025-10-16 20:52:06 +00:00
PyTorch MergeBot	fb06e49ce8	Revert "[inductor] print 0.0 as 0 for triton (#164291 )" This reverts commit 99b32a6750bfd0cfe2bc84a47823e1da34802b7b. Reverted https://github.com/pytorch/pytorch/pull/164291 on behalf of https://github.com/malfet due to Broke slow job, see `aba8c43594/1` ([comment](https://github.com/pytorch/pytorch/pull/164291#issuecomment-3412768915))	2025-10-16 20:44:29 +00:00
PyTorch MergeBot	27a98e6ae9	Revert "[DeviceMesh] Prefer using _layout over _mesh for all sorts of things (#165554 )" This reverts commit d61a9b88cf3be04a29c5a7d6e9622ae5e8d51de3. Reverted https://github.com/pytorch/pytorch/pull/165554 on behalf of https://github.com/malfet due to Looks like it broke serialization test, see `aba8c43594/1` ([comment](https://github.com/pytorch/pytorch/pull/165554#issuecomment-3412765681))	2025-10-16 20:41:37 +00:00

1 2 3 4 5 ...

37001 Commits