pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-11 22:34:53 +08:00

Author	SHA1	Message	Date
FFFrog	bc1690c7e8	[Code Clean] Remove support of python3.9 (#163846 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163846 Approved by: https://github.com/ezyang	2025-10-09 11:54:10 +00:00
Cui, Yifeng	53f5af8c92	Update torch-xpu-ops commit pin (#164237 ) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@f30173](`f301733b03`), includes: - Install xpu internal headers to PyTorch - Fix error handling for BatchLinearAlgebra Ops - Fix unnecessary double data type conversion - Fix overflow when calculating workgroups count - Fix segmentation fault and calculation error in AveragePool2dKernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/164237 Approved by: https://github.com/EikanWang	2025-10-09 10:38:59 +00:00
PyTorch MergeBot	4412026949	Revert "AOTI MPS Shim Implementation (#163865 )" This reverts commit 874efa2d72d83b00894097130f18062ce331a265. Reverted https://github.com/pytorch/pytorch/pull/163865 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/163865#issuecomment-3385196387))	2025-10-09 10:26:01 +00:00
PyTorch MergeBot	06d86e58d0	Revert "Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939 )" This reverts commit d40a9bfb8da0dc1ac1e6e56b33a25979112874de. Reverted https://github.com/pytorch/pytorch/pull/164939 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164939#issuecomment-3385056722))	2025-10-09 09:50:59 +00:00
Manuel Candales	874efa2d72	AOTI MPS Shim Implementation (#163865 ) ## MPS Shim API * Updated MPS shimification API with handles and function declarations: * `AOTIMetalShaderLibraryHandle` and `AOTIMetalKernelFunctionHandle` types * Library management: `aoti_torch_mps_create_shader_library`, `aoti_torch_mps_delete_shader_library`, `aoti_torch_mps_get_kernel_function` * Kernel execution: `aoti_torch_mps_run_command_block`, `aoti_torch_mps_start_encoding`, `aoti_torch_mps_dispatch` variants, etc ## MPS Shader Codegen * Modified to generate source constants instead of direct `DynamicMetalShaderLibrary` instantiation: * Before: `at::native::mps::DynamicMetalShaderLibrary mps_lib_0(R"MTL(...)MTL");` * After: `const char* mps_lib_0_source = R"MTL(...)MTL";` * Updated kernel call generation to use shimified functions: * Generates calls to shimified API instead of direct libtorch calls ## Before vs After Comparison ### Section 1: Shader Library Before (Direct Library Object) ```cpp at::native::mps::DynamicMetalShaderLibrary mps_lib_0(R"MTL( ... )MTL"); ``` After (Source String) ```cpp const char* mps_lib_0_source = (R"MTL( ... 
)MTL"); ``` ### Section 2: Getter Functions & RAII Management Before (Direct Library Access) ```cpp const std::shared_ptr<at::native::mps::MetalKernelFunction> get_mps_lib_0() { static const auto func = mps_lib_0.getKernelFunction("generated_kernel"); return func; } AOTIMetalKernelFunctionHandle get_mps_lib_0_handle() { static const auto handle = AOTIMetalKernelFunctionHandle(get_mps_lib_0().get()); return handle; } ``` After (Shim API + RAII Wrapper) ```cpp AOTIMetalKernelFunctionHandle get_mps_lib_0_handle() { static auto kernel_handle = []() { AOTIMetalShaderLibraryHandle lib_handle = nullptr; AOTIMetalKernelFunctionHandle kern_handle = nullptr; aoti_torch_mps_create_shader_library(mps_lib_0_source, &lib_handle); aoti_torch_mps_get_kernel_function(lib_handle, "generated_kernel", &kern_handle); // RAII wrapper with custom deleter auto lib_deleter = [](AOTIMetalShaderLibraryHandle h) {{ if (h) aoti_torch_mps_delete_shader_library(h); }}; using LibDeleter = decltype(lib_deleter); using LibPtr = std::unique_ptr<AOTIMetalShaderLibraryOpaque, LibDeleter>; // Return pair of kernel handle and library smart pointer for cleanup return std::make_pair(kern_handle, LibPtr(lib_handle, lib_deleter)); }(); return kernel_handle.first; } ``` ### Section 3: Runtime Execution Before (Direct Library Methods) ```cpp void AOTInductorModel::run_impl(...) { ... get_mps_lib_0()->runCommandBlock([&] { get_mps_lib_0()->startEncoding(); aoti_torch_mps_set_arg_tensor(get_mps_lib_0_handle(), 0, buf0); aoti_torch_mps_set_arg_tensor(get_mps_lib_0_handle(), 1, arg0_1); aoti_torch_mps_set_arg_tensor(get_mps_lib_0_handle(), 2, arg1_1); get_mps_lib_0()->dispatch({static_cast<uint64_t>(10LL)}); }); ... } // AOTInductorModel::run_impl ``` After (Shim API with Lambda Pattern) ```cpp void AOTInductorModel::run_impl(...) { ... 
auto mps_lib_0_lambda_0 = [&](AOTIMetalKernelFunctionHandle handle) { aoti_torch_mps_start_encoding(handle); aoti_torch_mps_set_arg_tensor(handle, 0, buf0); aoti_torch_mps_set_arg_tensor(handle, 1, arg0_1); aoti_torch_mps_set_arg_tensor(handle, 2, arg1_1); aoti_torch_mps_dispatch_single(handle, static_cast<uint64_t>(10LL)); }; std::function<void(AOTIMetalKernelFunctionHandle)> mps_lib_0_func_wrapper_0 = mps_lib_0_lambda_0; aoti_torch_mps_run_command_block(get_mps_lib_0_handle(), aoti_torch_mps_shared_callback, &mps_lib_0_func_wrapper_0); ... } // AOTInductorModel::run_impl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163865 Approved by: https://github.com/angelayi, https://github.com/desertfire	2025-10-09 09:28:10 +00:00
PyTorch MergeBot	e09fb44ef1	Revert "Fix truediv numerics between eager and compile (#164144 )" This reverts commit d386325ca9a142419f45b987391f4bb175dd7d0b. Reverted https://github.com/pytorch/pytorch/pull/164144 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164144#issuecomment-3384769092))	2025-10-09 08:40:52 +00:00
PyTorch MergeBot	5b8174bc28	Revert "[vllm hash update] update the pinned vllm hash (#164628 )" This reverts commit 7b691546d2949790ffc8f6bd3c674faa6a46ff7c. Reverted https://github.com/pytorch/pytorch/pull/164628 on behalf of https://github.com/huydhn due to There are some broken vLLM tests ([comment](https://github.com/pytorch/pytorch/pull/164628#issuecomment-3384560957))	2025-10-09 07:43:02 +00:00
PyTorch MergeBot	5209c8ce07	Revert "Fix Avoid DDE in item numel check (#164934 )" This reverts commit a9a9a3438a374f96a308b707a1718036aaec790d. Reverted https://github.com/pytorch/pytorch/pull/164934 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164934#issuecomment-3384390621))	2025-10-09 06:57:03 +00:00
Yuanyuan Chen	f231be25c6	Mark unused parameters in C++ code (#164912 ) This PR adds unused parameter name comments in C++ declarations to improve code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164912 Approved by: https://github.com/Skylion007	2025-10-09 06:23:25 +00:00
PyTorch MergeBot	a753ffa9af	Revert "Use runner with more memory for ASAN builds (#165000 )" This reverts commit f5fd18f7e24378bd9eb91404f697f1c81a8187d5. Reverted https://github.com/pytorch/pytorch/pull/165000 on behalf of https://github.com/izaitsevfb due to not sure how, but this broke lint ([comment](https://github.com/pytorch/pytorch/pull/165000#issuecomment-3384286412))	2025-10-09 06:22:28 +00:00
Laith Sakka	a9a9a3438a	Fix Avoid DDE in item numel check (#164934 ) address https://github.com/pytorch/pytorch/issues/164725 and https://github.com/pytorch/pytorch/issues/164704 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164934 Approved by: https://github.com/ezyang, https://github.com/aorenste, https://github.com/Skylion007	2025-10-09 06:06:25 +00:00
Seonmyeong Bak	263db92563	Add knobs in FR dump by watchdog (stacktrace and only active collectives) and trigger FR even on any exceptions (#164591 ) This PR includes a couple of changes to extend FlightRecorder dump by PyTorch watchdog - New knobs to control FR dump as suggested in the public documentation even for watchdog (TORCH_INCLUDE_STACK_TRACE, TORCH_INCLUDE_ONLY_ACTIVE) - Trigger the flight recorder dump on exceptions which could be triggered by any CUDA / host side error (TORCH_NCCL_EXTRA_DUMP_ON_EXEC) -> Can be used as a snapshot of the workload progress for post-mortem analysis Pull Request resolved: https://github.com/pytorch/pytorch/pull/164591 Approved by: https://github.com/fduwjj	2025-10-09 05:33:35 +00:00
Nicolas Macchioni	ed6156e3ea	non-fb impls + unit tests (#164722 ) Test Plan: ``` buck test fbcode//mode/opt caffe2/test/inductor:caching ``` Differential Revision: D83714692 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164722 Approved by: https://github.com/NikhilAPatel, https://github.com/adamomainz	2025-10-09 05:10:57 +00:00
Edward Z. Yang	d40a9bfb8d	Do not decompose in functionalization/proxy tensor if autograd wouldn't have decomposed (#164939 ) This fixes AOTAutograd rms_norm not being bitwise equivalent to eager, because it avoids a decomposition. You can force the decomposition by having the decomposition in the dispatch table, but if eager mode wouldn't have decomposed (because it went to the fused one), we now default to preserving the fused call by default. This largely reverts https://github.com/pytorch/pytorch/pull/103275/ for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel. Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164939 Approved by: https://github.com/bdhirsh ghstack dependencies: #164573	2025-10-09 04:49:44 +00:00
Sherlock Huang	e532f62e0d	Introduce joint_custom_pass callback (#164981 ) ``` def joint_custom_pass(joint_gm: torch.fx.GraphModule, joint_inputs): # apply your pass for joint graph here return joint_gm class M(torch.nn.Module): def forward(self, x): return x.sin() x = torch.randn(10, requires_grad=False) compiled_fn = torch.compile(M(), backend="aot_eager") with torch._functorch.config.patch("joint_custom_pass", joint_custom_pass): out = compiled_fn(x) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164981 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2025-10-09 04:40:54 +00:00
Pian Pawakapan	1f73b96668	[PGO] log missing sources in allowlist (#164881 ) Summary: - logs missing dynamic sources - emits MLHub insight only on size mismatch recompiles Test Plan: test_pgo Differential Revision: D84098898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164881 Approved by: https://github.com/bobrenjc93	2025-10-09 04:39:09 +00:00
PyTorch UpdateBot	7b691546d2	[vllm hash update] update the pinned vllm hash (#164628 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164628 Approved by: https://github.com/pytorchbot	2025-10-09 04:35:36 +00:00
PaulZhang12	f05e23e1bc	Add less warps config to inner reductions (#162447 ) Add a config with fewer warps to ensure proper vectorization and memory coalescing for inner reductions, preferring more work per thread. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162447 Approved by: https://github.com/v0i0, https://github.com/eellison, https://github.com/shunting314	2025-10-09 04:22:16 +00:00
PaulZhang12	d386325ca9	Fix truediv numerics between eager and compile (#164144 ) Addresses numeric differences between eager and compile in https://github.com/pytorch/pytorch/issues/141753 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164144 Approved by: https://github.com/eellison, https://github.com/jansel, https://github.com/ngimel ghstack dependencies: #164997	2025-10-09 04:22:03 +00:00
Maggie Moss	7457d139c5	Add pyrefly suppressions to torch/distributed (7/n) (#165002 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 One more PR after this one. Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002 Approved by: https://github.com/oulgen	2025-10-09 04:08:25 +00:00
Nikita Vedeneev	ab94a0d544	[CUDA][cuBLAS] addmm -- some refactoring for easier navigation between the Lt and non-Lt paths (#163955 ) As per title. Additionally, some Lt selection conditions are revisited, and some redundancy removed (especially in the ROCm vs non-ROCm paths). Pull Request resolved: https://github.com/pytorch/pytorch/pull/163955 Approved by: https://github.com/ngimel, https://github.com/eqy	2025-10-09 04:07:45 +00:00
Animesh Jain	0e9b3a772a	[export] Turn on install_free_tensors flag (#164691 ) The final step in removing the discrepancy between torch.compile(fullgraph=True) and torch.export(strict=True). Pull Request resolved: https://github.com/pytorch/pytorch/pull/164691 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #164721	2025-10-09 03:25:15 +00:00
Animesh Jain	af7ca55ced	[export][dynamo] Fallback to slowpath for MultiHeadAttention for strict export (#164721 ) In https://github.com/pytorch/pytorch/pull/106824, export decided to slow-path for MultiHeadAttention module (look into the PR description as to why). But that PR eventually caused a divergence between Dynamo and export. Today, strict-export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at this time the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes a fast path, resulting in a different op being called. This divergence is undesirable. There are 2 ways to fix it 1) Make export take the fast path - As explained in the https://github.com/pytorch/pytorch/pull/106824 , this might be difficult. So, we go to (2) 2) Make compile as well take the slow path - This is easy to implement. The con here is that Pytorch eager and compile will use different operators, which can cause numerics issues etc. Since (2) is easy to do, we will follow this path. We are tracking the issue in https://github.com/pytorch/pytorch/issues/164062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164721 Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan	2025-10-09 03:25:15 +00:00
Yuanyuan Chen	a029675f6f	More ruff SIM fixes (#164695 ) This PR applies ruff `SIM` rules to more files. Most changes are about simplifying `dict.get` because `None` is already the default value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164695 Approved by: https://github.com/ezyang	2025-10-09 03:24:50 +00:00
PaulZhang12	54ae61c573	Change test_emulate_precision_casts_mean_ratio_chain from gelu to relu (#164997 ) gelu can be unstable on local builds due to libdevice differences, as we lower to libdevice.erf. That, combined with the semantics in the test, can lead to catastrophic cancellation. We switch this test from gelu to relu to fix this instability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164997 Approved by: https://github.com/eellison, https://github.com/jansel	2025-10-09 03:14:05 +00:00
Jeddie Ji	2fe37b5fde	[RecSys][Combo Kernel] skip combo kernel generation if partition group is empty (#164918 ) Summary: Noticed that the combo kernel partition sometimes contains an empty group. Skip kernel generation in this case to unblock head model launching. The change in this diff is safe, but it would be better to root-cause why an empty group is being created. Test Plan: Lowering passed after applying the diff Differential Revision: D84134471 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164918 Approved by: https://github.com/mlazos	2025-10-09 02:55:23 +00:00
ruisizhang123	96d91da792	[dynamo] allow placement subclass to be traceable (#164985 ) This PR unblocks SimpleFSDP+`gradient_divide_factor` [here](https://github.com/pytorch/torchtitan/pull/1793). We will need to create a subclass for DTensor `Partial` placement. When tracing `SimpleFSDPPartial`, I hit the assertion error that `SimpleFSDPPartial` is not in `ok_types`. I'm updating the code to check placement dtype via `isinstance` instead of `type(val)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164985 Approved by: https://github.com/ezyang, https://github.com/eellison	2025-10-09 01:44:21 +00:00
Ivan Zaitsev	f5fd18f7e2	Use runner with more memory for ASAN builds (#165000 ) An attempt to [address OOM here](`aed5ed1076/1`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/165000 Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/huydhn	2025-10-09 01:09:28 +00:00
fduwjj	8ca986ee60	[fr] Enable reset the FR recording for fault tolerance (#164988 ) We also want to have a python side API for users to reset FR recording for FR entries. We don't need to reset the PGNCCL's member counter since we are creating new PGNCCL anyway. FR is a global ring buffer, so we need to reset it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164988 Approved by: https://github.com/tushar00jain ghstack dependencies: #164752	2025-10-09 01:03:01 +00:00
atalman	81dbeb06f4	CUDA aarch64 12.6 and 12.8 builds fix triton constraints (#165013 ) Since we have introduced CUDA aarch64 builds for all cuda versions we need to remove this constraint. This was missed by https://github.com/pytorch/pytorch/pull/162364 Proper constraint on triton should be: ``` Requires-Dist: triton==3.5.0; platform_system == "Linux" ``` not: ``` Requires-Dist: triton==3.5.0; platform_system == "Linux" and platform_machine == "x86_64" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165013 Approved by: https://github.com/Camyll, https://github.com/nWEIdia, https://github.com/tinglvv viable/strict/1759996180	2025-10-09 00:49:28 +00:00
fduwjj	7a1ead755f	[DeviceMesh] Add a warning for slicing flattened dim from root mesh and types for _get_slice_mesh_layout (#164993 ) As the title says, we want to add a deprecation warning for slicing a flattened dim from the root mesh. Also includes cosmetic changes adding types for `_get_slice_mesh_layout`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164993 Approved by: https://github.com/fegin ghstack dependencies: #164750, #164954 viable/strict/1759987490	2025-10-09 00:47:08 +00:00
Boyuan Feng	90b4e130d6	[Benchmark] cleanup torchbench models (#164816 ) Prune models from TorchInductor dashboard to reduce ci cost. This PR prunes torchbench models according to the [doc](https://docs.google.com/document/d/1nLPNNAU-_M9Clx9FMrJ1ycdPxe-xRA54olPnsFzdpoU/edit?tab=t.0), which removes timm and huggingface models from torchbench. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164816 Approved by: https://github.com/anijain2305, https://github.com/seemethere, https://github.com/huydhn, https://github.com/malfet viable/strict/1759985417	2025-10-09 00:31:25 +00:00
Animesh Jain	4308b8a28f	[dynamo] Support torch.fx.traceback.annotate (#164678 ) Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the follow-up PRs to apply regional inductor compilation. The existing implementation lets Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager, so that every Fx node created in Dynamo Fx graph has the custom meta node. What does not work? * We still have to set the context manager `torch.fx.traceback.preserve_node_meta()` in the user code because CI was unhappy. This can be fixed but with some perseverance. * This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678 Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan viable/strict/1759979241	2025-10-08 22:41:00 +00:00
Nikita Shulga	94b1ec8c7c	[BE] Use torch check the way its intended (#164987 ) Replace `if (!foo) TORCH_CHECK(false, "bar");` with `TORCH_CHECK(foo, "bar");` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164987 Approved by: https://github.com/albanD, https://github.com/Skylion007	2025-10-08 22:28:08 +00:00
eellison	054268c9eb	Consider collective inputs to be deallocated only when wait is completed (#164945 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164945 Approved by: https://github.com/IvanKobzarev ghstack dependencies: #164738, #164783, #164944 viable/strict/1759977763	2025-10-08 22:19:25 +00:00
eellison	af40828bbb	Limit coll bucketing within node idxs (#164944 ) Respect max_coll_distance from overlap scheduler in bucketing, also, add an optimization in path searching. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164944 Approved by: https://github.com/IvanKobzarev ghstack dependencies: #164738, #164783	2025-10-08 22:18:53 +00:00
bobrenjc93	5a1fbf45ad	[ez] remove unnecessary wrapper (#164720 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164720 Approved by: https://github.com/ydwu4	2025-10-08 22:12:29 +00:00
eellison	aed5ed1076	Refactor memory estimator to use node storages, add test (#164783 ) - Update the Memory Estimator to use node storages for analysis, which simplifies book keeping, as opposed to manually looking at operator schema. This will also allow me to reuse this component elsewhere. - Factor out into separate class, so that this same logic can be used in scheduling (node allocations / aliasing / uses) - Adds Tests for correctness - right now only on fwd/bwd by itself, not with both. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164783 Approved by: https://github.com/ruisizhang123 ghstack dependencies: #164738	2025-10-08 22:07:43 +00:00
William Wen	af4c29fea8	[dynamo, nested graph breaks] fix nested step graph break related issues (#162737 ) Turns out codegen'ing a nested step graph break is significantly more complicated than first thought. The optimized function should actually do: - call graph/load values/do side effects etc. - call into the leaf's resume function, but skipped (this essentially step graph break function for just the leaf function) - call into all the other resume functions, traced. This PR also adds `torch._dynamo.step_unsupported()`, which can be used for internal testing purposes to better test step graph break handling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162737 Approved by: https://github.com/Lucaskabela ghstack dependencies: #160601	2025-10-08 22:02:52 +00:00
William Wen	486b4d2414	[dynamo, nested graph breaks] move cell codegen before side effects codegen (#160601 ) This is needed because if we codegen cells for nested frames AFTER side effects, then reconstruction could get messed up. From below: >The added test case demonstrates the reconstruction failure if we kept cell codegen at the original place (only happens with nested graph breaks since we reconstruct nested frame cells from VariableTracker rather than directly using LOAD_CLOSURE). >At a high level, what happened before this change was that side_effects was pruning the cells (I don't recall exactly why this happens), and because cells were codegen'd after the side effects were applied, we were unable to properly reconstruct the cell. The error I was seeing was a list/tuple IndexError. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160601 Approved by: https://github.com/mlazos	2025-10-08 22:02:52 +00:00
Hari Krishna Sai Kodali	8f83b3e71c	add device generalization support for distributed checkpoint tests (#159242 ) ## MOTIVATION To generalize Distributed checkpoint test cases for non-CUDA devices ## CHANGES 18 test files with minimal device abstraction changes updated in test/distributed/checkpoint/ - Use device_type from DTensorTestBase wherever appropriate - Replaced hard-coded device names with torch.accelerator.current_accelerator() - Extend the multi-GPU decorator for other devices test/distributed/checkpoint/test_state_dict_stager.py has a large diff because I changed the name cuda_obj to gpu_obj. The functional change is minimal. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159242 Approved by: https://github.com/guangyey, https://github.com/d4l3k	2025-10-08 21:56:31 +00:00
Howard Huang	f0c9f3bddb	[PP] [BE] Remove runtime tests (#164962 ) BE cleaning up dead code since we migrated the Multi-stage schedules to use schedule execution runtime Pull Request resolved: https://github.com/pytorch/pytorch/pull/164962 Approved by: https://github.com/Skylion007 ghstack dependencies: #162016	2025-10-08 21:42:33 +00:00
Isalia20	1d182dd81c	[MPS] sparse norm (#164961 ) Norms for sparse mps tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/164961 Approved by: https://github.com/malfet	2025-10-08 21:41:42 +00:00
fduwjj	0b15f7ae05	[fr] Enable dynamic path write for FR dump when it comes to torchft (#164752 ) When it comes to FR dump, in the case of fault tolerance, users want to set the dump path to a different one when there is restart, so we just enable this case for users. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164752 Approved by: https://github.com/tushar00jain	2025-10-08 21:36:32 +00:00
Nikita Shulga	f1229b6db9	[BE] Remove manual IP address resolution (#164969 ) As https://github.com/pytorch/pytorch/issues/100400 has been closed a while back Pull Request resolved: https://github.com/pytorch/pytorch/pull/164969 Approved by: https://github.com/seemethere ghstack dependencies: #164968	2025-10-08 21:22:34 +00:00
Anshul Sinha	b1ac252f55	[Replicate][Test] tests that pp model grads are the same as single-device model grads (#164890 ) Summary: Created a test so that we can verify that a model that has been pipelined + replicated has the same gradients as a reference model. To do this, I mapped the layers and their parameters in each partial model to the original full model and then compared the gradients. Test Case 1. pytest test/distributed/_composable/test_composability/test_pp_composability.py -k test_replicate_pp_grads Pull Request resolved: https://github.com/pytorch/pytorch/pull/164890 Approved by: https://github.com/H-Huang	2025-10-08 21:07:05 +00:00
fduwjj	5ba11df4f8	[DeviceMesh] Make all members of DeviceMesh private and add public access API (#164954 ) This is a mostly mechanical change which makes all device mesh members private and uses a public property API instead. This is not a BC-breaking change since the new API still guarantees BC. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164954 Approved by: https://github.com/fegin ghstack dependencies: #164750	2025-10-08 21:04:07 +00:00
Nikita Shulga	15800888b6	[CI] Print GPU info during setup linux (#164968 ) I.e. run `nvidia-smi` if present Helps detecting what driver version this runner is on, which would have helped debugging some of the issues recently Pull Request resolved: https://github.com/pytorch/pytorch/pull/164968 Approved by: https://github.com/ngimel	2025-10-08 20:58:33 +00:00
Catherine Lee	e7ed1a00eb	Run inductor-perf-test-nightly-h100 once per day (#164967 ) To reduce inductor costs, though I'm not sure how much this one matters specifically since h100s are reserved Pull Request resolved: https://github.com/pytorch/pytorch/pull/164967 Approved by: https://github.com/BoyuanFeng	2025-10-08 20:58:19 +00:00
Shunting Zhang	2982406721	[inductor] ban benchmarking by default in deterministic mode (#164532 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164532 Approved by: https://github.com/eellison ghstack dependencies: #164801	2025-10-08 20:55:15 +00:00
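
Two of the commit messages above describe small Python patterns: #164695 simplifies `dict.get` calls because `None` is already the default, and #164985 switches a placement check from `type(val)` to `isinstance` so that subclasses are accepted. The sketch below illustrates both in isolation. Every name in it (`config`, the stand-in `Partial` and `SimpleFSDPPartial` classes, the `ok_types` tuple, and the two helper functions) is hypothetical and is not taken from the PyTorch source.

```python
# Illustrative sketch only; all names here are hypothetical stand-ins.

# 1) dict.get simplification (the pattern described in #164695):
#    passing None explicitly is redundant because None is the default.
config = {"backend": "aot_eager"}
explicit = config.get("mode", None)  # redundant explicit default
implicit = config.get("mode")        # equivalent, simplified form


# 2) Exact type check vs. isinstance (the pattern described in #164985).
class Partial:
    """Stand-in for a placement class (not the real DTensor class)."""


class SimpleFSDPPartial(Partial):
    """Stand-in for a user-defined placement subclass."""


ok_types = (Partial,)


def exact_type_ok(val):
    # Rejects subclasses: type(val) must be exactly one of ok_types.
    return type(val) in ok_types


def isinstance_ok(val):
    # Accepts subclasses as well as the listed types themselves.
    return isinstance(val, ok_types)


print(explicit is implicit)                # both lookups return None
print(exact_type_ok(SimpleFSDPPartial()))  # subclass fails the exact check
print(isinstance_ok(SimpleFSDPPartial()))  # subclass passes isinstance
```

The exact-type check is stricter by design, so replacing it with `isinstance` is only appropriate when subclasses are meant to be treated like their base type, which is the situation the commit message describes.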


94186 Commits