pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Mikayla Gawarecki	f37a6523ef	Move version.h to torch/headeronly (#164381 ) Differential Revision: [D83685392](https://our.internmc.facebook.com/intern/diff/D83685392) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164381 Approved by: https://github.com/janeyx99	2025-10-07 17:47:30 +00:00
Maggie Moss	b13cd141b3	Add pyrefly suppressions (#164748 ) Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Test plan: `dmypy restart && python3 scripts/lintrunner.py -a` and `pyrefly check`. Step 1: delete lines in the pyrefly.toml file from the `project-excludes` field. Step 2: run `pyrefly check`. Step 3: add suppressions and clean up unused suppressions. Before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 After: 0 errors (4,263 ignored). Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748 Approved by: https://github.com/oulgen	2025-10-07 17:31:18 +00:00
Lakshay Garg	5e47b4dd60	Remove device_id param from DeviceCachingAllocator::malloc (#164798 ) The `malloc` call in DeviceCachingAllocator accepts a DeviceIndex param, which can be confusing because the allocator can only allocate memory for the device that it corresponds to. This associated device is fixed at construction time, so the runtime param can be misleading. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164798 Approved by: https://github.com/ngimel, https://github.com/cyyever, https://github.com/eqy	2025-10-07 16:42:04 +00:00
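The design point behind #164798, that each allocator instance serves exactly one device chosen at construction, can be illustrated with a minimal Python sketch. `DeviceBoundAllocator` and its methods are hypothetical and are not the C++ `DeviceCachingAllocator` API; the "caching" here is just a simplified per-size free list.

```python
from collections import defaultdict


class DeviceBoundAllocator:
    """Hypothetical sketch: one allocator per device, fixed at construction."""

    def __init__(self, device_index: int) -> None:
        # The associated device is recorded once and never passed per call.
        self.device_index = device_index
        self._free_blocks = defaultdict(list)  # size -> cached block ids
        self._next_id = 0

    def malloc(self, size: int) -> tuple:
        # No device argument: every block implicitly lives on self.device_index.
        if self._free_blocks[size]:
            block_id = self._free_blocks[size].pop()
        else:
            block_id = self._next_id
            self._next_id += 1
        return (self.device_index, block_id, size)

    def free(self, block: tuple) -> None:
        device_index, block_id, size = block
        if device_index != self.device_index:
            raise ValueError("block belongs to a different device's allocator")
        # Keep the block cached for reuse instead of releasing it.
        self._free_blocks[size].append(block_id)


allocators = [DeviceBoundAllocator(i) for i in range(2)]
a = allocators[1].malloc(256)
allocators[1].free(a)
b = allocators[1].malloc(256)  # served from the cache
```

With the device bound at construction, a caller selects the allocator for a device once (here, by indexing `allocators`), and a per-call device parameter has nothing left to express except a possible mismatch.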
Yuanyuan Chen	ee5389d520	Enable batch samples in sparse tests (#164677 ) The test cases are enabled because the issue was fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164677 Approved by: https://github.com/albanD	2025-10-07 15:58:37 +00:00
eellison	ab01a0d7d3	Add memory estimator (#164738 ) Original work by @ShatianWang, with lints applied. I am going to make a few changes and add tests in subsequent PRs, but I want to preserve the original commit first. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164738 Approved by: https://github.com/IvanKobzarev ghstack dependencies: #164568, #164569, #164581	2025-10-07 15:32:27 +00:00
Animesh Jain	801e282f39	[dynamo] Support torch.fx.traceback.annotate (#164678 ) Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation. The existing implementation lets Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager, so that every FX node created in the Dynamo FX graph carries the custom metadata. This does not work with graph breaks yet, but we can solve that problem, if needed, in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678 Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan viable/strict/1759863143	2025-10-07 14:54:26 +00:00
Aleksei Nikiforov	87c9fbda22	Follow up to PR 163980 for s390x (#164464 ) With the same updates propagated to s390x, it now works on s390x runners too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164464 Approved by: https://github.com/atalman viable/strict/1759857870	2025-10-07 12:02:29 +00:00
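The behavior #164678 aims for (later reverted, see the revert entry below), running `annotate` eagerly so that every FX node created inside the block records the active annotation, can be modeled in pure Python. The `annotate`, `Node`, and `create_node` below are hypothetical stand-ins rather than the `torch.fx.traceback` implementation, and storing the annotation under a `"custom"` meta key is an assumption made for illustration.

```python
import contextlib
from dataclasses import dataclass, field

# Hypothetical global: the annotation active at node-creation time.
_current_annotation: dict = {}


@contextlib.contextmanager
def annotate(annotation: dict):
    """Merge `annotation` into the active metadata for the duration of the block."""
    global _current_annotation
    saved = _current_annotation
    _current_annotation = {**saved, **annotation}  # nested blocks accumulate
    try:
        yield
    finally:
        _current_annotation = saved  # restore on exit, even after an exception


@dataclass
class Node:
    name: str
    meta: dict = field(default_factory=dict)


def create_node(graph: list, name: str) -> Node:
    # Every node created while an annotation is active stores a copy of it.
    node = Node(name)
    if _current_annotation:
        node.meta["custom"] = dict(_current_annotation)
    graph.append(node)
    return node


graph = []
create_node(graph, "x")
with annotate({"region": "attn"}):
    create_node(graph, "matmul")
    with annotate({"stage": 1}):
        create_node(graph, "softmax")
create_node(graph, "out")
```

Because the context manager actually executes (rather than being traced through), node creation sees the annotation as ordinary state, which is the effect the PR describes wanting from Dynamo.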
YyWangCS	3cc8af2d67	torch.topk: refactor global histogram/cumsum into a dedicated kernel to eliminate redundant memory access (#164459 )
# TLDR
This PR removes the regression in torch.topk introduced in torch 2.7.0 and delivers much better performance for large inputs. The table below reports execution times on H20 for various input sizes with float32 data, extracting the top-100 values. Results indicate that this PR restores and improves performance, especially on large inputs.
| Input Shape | torch2.6.0 (ms) | torch2.8.0 (ms) | 2.8.0+this PR (ms) |
| -------------- | --------------- | --------------- | ------------------ |
| (1, 1B) | 36.6 | 1564.1 | 25.6 |
| (1, 100M) | 3.56 | 17.4 | 2.54 |
| (1, 1,000,000) | 0.135 | 0.145 | 0.098 |
| (512, 128000) | 1.33 | 1.33 | 1.32 |
| (8192, 128000) | 19.6 | 19.6 | 19.4 |
# Background
After upgrading PyTorch from 2.6.0 to 2.7.0, we observed a significant GPU performance regression in `torch.topk` on NVIDIA GPUs. For instance, extracting the top-1000 largest values from one billion floats on an NVIDIA H20 increased from 36 ms to 1.6 s. Profiling with Nsight Compute indicates that the slowdown is caused by redundant memory accesses introduced in [PR #145536](https://github.com/pytorch/pytorch/pull/145536).
# Analysis
`torch.topk` relies on RadixSelect to find the target values. Each radix pass requires computing a histogram of the input values. For large inputs, histogram computation is split into two stages:
1. Local histogram: Each CUDA block processes a subset of the input and writes its local histogram to global memory.
2. Global reduction: A single CUDA block reads all local histograms from global memory and reduces them into the final global histogram.
Before [PR #145536](https://github.com/pytorch/pytorch/pull/145536), both stages ran inside a single kernel (`radixFindKthValues`), using a semaphore to ensure that all local histograms were completed before reduction.
In PR #145536, the global histogram computation was merged with subsequent top-k calculations into a single kernel (`computeBlockwiseKthCounts`) to avoid the semaphore. While this simplifies synchronization, it introduces redundant memory reads:
- `computeBlockwiseKthCounts` launches `numInputSlices * blocks_per_slice` blocks.
- For each row (slice), `blocks_per_slice` CUDA blocks redundantly reload the same local histograms from global memory.
# This PR
To address this inefficiency, we introduce the following optimizations:
1. Dedicated kernel: Refactor global histogram and cumsum computation into a separate GPU kernel, `computeDigitCumSum`.
2. Loop unrolling: Apply loop unrolling in `computeDigitCumSum` to speed up local histogram reads.
# Performance
We benchmarked torch.topk on NVIDIA H20 with float32 inputs, extracting the top-100 values across different input sizes. The results in the table below demonstrate that this PR effectively eliminates the performance regression introduced in 2.7.0 and delivers substantial improvements on large inputs.
| Input Shape | torch2.6.0 (ms) | torch2.8.0 (ms) | 2.8.0+this PR (ms) |
| -------------- | --------------- | --------------- | ------------------ |
| (1, 1B) | 36.6 | 1564.1 | 25.6 |
| (1, 100M) | 3.56 | 17.4 | 2.54 |
| (1, 1,000,000) | 0.135 | 0.145 | 0.098 |
| (512, 128000) | 1.33 | 1.33 | 1.32 |
| (8192, 128000) | 19.6 | 19.6 | 19.4 |
I have also verified the correctness of this PR with different inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164459 Approved by: https://github.com/ngimel, https://github.com/Skylion007 viable/strict/1759850721	2025-10-07 11:04:03 +00:00
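The two-stage histogram described in #164459 can be illustrated with a pure-Python sketch of radix select for the k-th largest value. It assumes non-negative integer keys below 2**32, 8-bit digits, and CUDA blocks simulated as list chunks; it is not the CUDA implementation (`radixFindKthValues`, `computeDigitCumSum`), only a model of the per-pass data flow.

```python
def radix_select_kth_largest(values, k, num_blocks=4, radix_bits=8, key_bits=32):
    """Return the k-th largest (1-based) of non-negative ints below 2**key_bits.

    Assumes key_bits is a multiple of radix_bits.
    """
    assert 1 <= k <= len(values), "k must be in [1, len(values)]"
    radix = 1 << radix_bits
    digit_mask = radix - 1
    prefix = 0       # bits of the answer fixed by earlier passes
    prefix_mask = 0  # which bit positions of `prefix` are fixed
    block_size = -(-len(values) // num_blocks)  # ceiling division
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]

    for shift in range(key_bits - radix_bits, -1, -radix_bits):
        # Stage 1: every simulated block builds a local histogram of the current
        # digit, counting only values whose higher digits match `prefix`.
        local_hists = []
        for block in blocks:
            hist = [0] * radix
            for v in block:
                if (v & prefix_mask) == prefix:
                    hist[(v >> shift) & digit_mask] += 1
            local_hists.append(hist)

        # Stage 2, done once per pass: reduce the local histograms into a global
        # histogram and scan it from the largest digit down (a reverse cumsum)
        # to find the bucket that contains the k-th largest value.
        global_hist = [sum(h[d] for h in local_hists) for d in range(radix)]
        seen = 0
        for digit in range(radix - 1, -1, -1):
            if seen + global_hist[digit] >= k:
                k -= seen  # rank of the target inside the chosen bucket
                prefix |= digit << shift
                prefix_mask |= digit_mask << shift
                break
            seen += global_hist[digit]

    return prefix


data = [5, 1, 300, 70000, 42, 300, 9]
third = radix_select_kth_largest(data, 3)  # duplicates count separately
```

In this model, stage 2 reads each local histogram once per pass. The regression described above corresponds to repeating that reduction in each of the `blocks_per_slice` blocks of a slice, which multiplies the histogram reads by `blocks_per_slice`.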
Nicolas Macchioni	1fb072ac2a	exceptions + unit tests (#164550 ) Test Plan: `buck test fbcode//mode/opt caffe2/test/inductor:caching` Reviewed By: aorenste Differential Revision: D83714688 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164550 Approved by: https://github.com/aorenste viable/strict/1759847121	2025-10-07 10:04:58 +00:00
Animesh Jain	cac5e13e13	[dynamo] Inline nn module calls using __call__ methods (#164817 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164817 Approved by: https://github.com/SherlockNoMad, https://github.com/mlazos viable/strict/1759842063	2025-10-07 08:57:20 +00:00
Ivan Zaitsev	68350660ee	Increase timeout for nightly macOS performance tests to 300 minutes (#164793 ) The Test step time recently went up slightly; hopefully this fixes https://github.com/pytorch/alerting-infra/issues/263 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164793 Approved by: https://github.com/seemethere	2025-10-07 08:44:07 +00:00
Laith Sakka	ef7e2ca77e	remove check_is_size from test_misc.py (#164667 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164667 Approved by: https://github.com/angelayi ghstack dependencies: #164664, #164665	2025-10-07 07:33:50 +00:00
Laith Sakka	cdaaf3e4a3	remove size-like based size-oblivious special max simplifications (#164665 ) Since we removed guard_size_oblivious, this simplification is no longer relevant; this is part of the process of deprecating guard_size_oblivious and its dependencies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164665 Approved by: https://github.com/aorenste ghstack dependencies: #164664	2025-10-07 07:33:50 +00:00
Laith Sakka	0ea59c3c55	do not suggest torch._check_is_size() (#164664 ) The size-like concept for data dependency is no longer relevant, as we removed all guard_size_oblivious calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164664 Approved by: https://github.com/angelayi, https://github.com/mlazos	2025-10-07 07:33:50 +00:00
Nicolas Macchioni	8f705d019a	context + unit tests (#164549 ) Summary: the context module provides configurable context selection + isolation key hashing; context selection is broken into runtime and compile context. Runtime context is decided at call time (inductor configs, precision configs, etc.) and compile context is decided at compile time (hardware type, software hashes). Callees will be given access to SelectedRuntimeContext and SelectedCompileContext, which they can use to determine and select what context is necessary with regard to the function being cached. These selected contexts are wrapped in an IsolationSchema, which denotes what context should be taken into consideration when producing an isolation key. The isolation key is essentially a salt of the function signature key: it says that a given function signature key's result is valid under a given context (isolation schema). Test Plan: `buck test fbcode//mode/opt caffe2/test/inductor:caching` Reviewed By: aorenste D83714689 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164549 Approved by: https://github.com/aorenste viable/strict/1759832720	2025-10-07 06:02:10 +00:00
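The idea in #164549, an isolation key that salts a function signature key with only the selected runtime and compile context, can be sketched with `hashlib`. The helpers `signature_key`, `select`, and `isolation_key` are hypothetical and do not reproduce the inductor caching module's classes (`SelectedRuntimeContext`, `SelectedCompileContext`, `IsolationSchema`); SHA-256 over canonical JSON is an assumed hashing scheme.

```python
import hashlib
import json


def _digest(obj) -> str:
    # Canonical JSON (sorted keys) so dict ordering never changes the hash.
    payload = json.dumps(obj, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()


def signature_key(fn_name: str, args: tuple, kwargs: dict) -> str:
    return _digest({"fn": fn_name, "args": list(args), "kwargs": kwargs})


def select(context: dict, keys: list) -> dict:
    # Keep only the context entries the cached function declares it depends on.
    return {k: context[k] for k in keys if k in context}


def isolation_key(sig_key: str, runtime_ctx: dict, compile_ctx: dict) -> str:
    # The selected contexts act as a salt: the same signature key maps to a
    # different cache entry whenever any selected context value changes.
    return _digest({"sig": sig_key, "runtime": runtime_ctx, "compile": compile_ctx})


sig = signature_key("my_kernel", (128, 256), {"dtype": "float16"})
runtime = {"precision": "highest", "debug": True}
k1 = isolation_key(sig, select(runtime, ["precision"]), {"device": "A100"})
# Changing an unselected runtime entry leaves the key unchanged...
k2 = isolation_key(sig, select({**runtime, "debug": False}, ["precision"]), {"device": "A100"})
# ...while changing selected compile context (the hardware) produces a new key.
k3 = isolation_key(sig, select(runtime, ["precision"]), {"device": "H100"})
```

Selecting context before hashing is what keeps irrelevant configuration changes from invalidating cache entries, while still separating results that genuinely depend on the hardware or precision settings.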
bobrenjc93	4bcc05777e	[torchfuzz] synthesize inputs for data dependent ops (#164716 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164716 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693, #164694, #164715 viable/strict/1759830577	2025-10-07 05:40:32 +00:00
bobrenjc93	2a6cdba6e5	[torchfuzz] various edge case fixes (#164715 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164715 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693, #164694	2025-10-07 05:30:46 +00:00
bobrenjc93	53f6cc7529	[torchfuzz] make ops_fuzzer deterministic (#164694 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164694 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693	2025-10-07 05:30:46 +00:00
bobrenjc93	ac901bf79a	[torchfuzz] consolidate on a base implementation of args_codegen (#164693 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164693 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688	2025-10-07 05:20:28 +00:00
bobrenjc93	c965d6dbb2	[torchfuzz] move into experimental dir (#164688 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164688 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687	2025-10-07 05:09:08 +00:00
bobrenjc93	ac08556f67	[torchfuzz] support more unbacked functions (#164687 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164687 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649	2025-10-07 05:00:03 +00:00
bobrenjc93	5fe7f29b9e	[torchfuzz] add support for operator weights (#164649 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164649 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646, #164647	2025-10-07 05:00:03 +00:00
bobrenjc93	ded099ecbf	[torchfuzz] don't use the first gpu in multi process fuzzer (#164647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164647 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514, #164646	2025-10-07 04:59:56 +00:00
bobrenjc93	63fcc3e6c4	[torchfuzz] update README.md (#164646 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164646 Approved by: https://github.com/pianpwk ghstack dependencies: #164432, #164434, #164514	2025-10-07 04:59:50 +00:00
Xiao Fu	fd3e15c14f	Fix typo in class definition of bytecodedispatchtable (#164762 ) ghstack-source-id: 84f0d7bb7e3780ca75473782abfae530010be56e Pull Request resolved: https://github.com/pytorch/pytorch/pull/164761 Fixes the typo in the naming of bytecodedispatchtable Pull Request resolved: https://github.com/pytorch/pytorch/pull/164762 Approved by: https://github.com/StrongerXi, https://github.com/williamwen42 viable/strict/1759827107	2025-10-07 04:36:09 +00:00
Yuanyuan Chen	ff5faa744a	Remove unused THPXXX macros (#164660 ) These macros are not used in OSS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164660 Approved by: https://github.com/albanD	2025-10-07 04:04:21 +00:00
Tugsbayasgalan Manlaibaatar	4725871a81	Return fake mode from export graph capture API (#164730 ) This PR temporarily unblocks various experiments that want to reuse the fake mode created by dynamo. Note that this is still not what we want as the end state. The end state should look something like:
```
out = fullgraph_capture(mod, inputs)
fake_mode = out.backend_inputs.fake_mode
gm = out.module()
```
This doesn't work today because export needs to wrap the original module in a flat module to trace, for easier handling of pytrees. As a result, we would need to carry an export-specific flag in fullgraph_capture, which is not ideal. Regardless, the end state is that we need to give downstream users a graph module and a fake mode in some form, so I think _dynamo_graph_capture_for_export should return a fake mode within the graph module itself via gm.meta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164730 Approved by: https://github.com/avikchaudhuri viable/strict/1759823376	2025-10-07 03:42:46 +00:00
Animesh Jain	bcd96cc6ff	[annotate] Copy fwd to bwd metadata for subgraphs as well (#164795 ) The test is in the next PR. My older PR on dynamo annotate, https://github.com/pytorch/pytorch/pull/164678, is getting reverted for unknown reasons, so it is difficult to add a test right now in this PR. When I reland, I can add a test for this as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164795 Approved by: https://github.com/yushangdi ghstack dependencies: #164772 viable/strict/1759821889	2025-10-07 02:42:47 +00:00
Yuanyuan Chen	50e077beaa	Fix outdated info in requirements-ci.txt (#164441 ) Fixes installation instructions and descriptions for `numba` and `scikit-image` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164441 Approved by: https://github.com/albanD	2025-10-07 02:10:41 +00:00
albanD	56d66ac0d7	Make custom op alias check consistent (#164576 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164576 Approved by: https://github.com/soulitzer	2025-10-07 02:05:09 +00:00
rraminen	49f7d8d19d	[ROCm] Fix test_cuda_synchronize failure on ROCm (#164735 ) This PR skips the hipify step of torch/csrc/jit/ir/ir.h to avoid a build-time error for the JIT cuda namespace. This fixes two skipped tests in test/jit/test_cuda.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164735 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> viable/strict/1759814683	2025-10-07 01:14:24 +00:00
PyTorch MergeBot	afee8062d5	Revert "Fix mesh.get_local_rank when it is > 1d (#164473 )" This reverts commit 83d71dfb2fd993a6242372b8123549acaa85ffdb. Reverted https://github.com/pytorch/pytorch/pull/164473 on behalf of https://github.com/izaitsevfb due to appears to be causing vision_maskrcnn regression ([comment](https://github.com/pytorch/pytorch/pull/164473#issuecomment-3374738997)) viable/strict/1759812581	2025-10-07 00:37:41 +00:00
Chris Leonard	e89d12bf5d	Numpy zerotensor handling (#164487 ) Fixes #89034 Updated the tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True. @ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue, but I had already worked on it, so I figured I would push it anyway and see what you thought. Feel free to close the PR if you think it is not worth merging. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487 Approved by: https://github.com/izaitsevfb	2025-10-07 00:34:14 +00:00
Yedidya Feldblum	d4752bc7f6	[caffe2] tweak Unpickler::readInstruction handling TUPLE (#164764 ) Summary: Creating the vector was a bit awkward. Use the natural iterator-pair constructor with move-iterators. Test Plan: CI. Reviewed By: dolpm Differential Revision: D83995108 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164764 Approved by: https://github.com/drisspg viable/strict/1759811145	2025-10-07 00:18:10 +00:00
Jeff Daily	44a5d41993	[ROCm] add gfx1150 gfx1151 to supported gemm lists (#164744 ) This is one of a few PRs needed to address https://github.com/pytorch/pytorch/pull/164744 fully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164744 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-07 00:02:23 +00:00
Animesh Jain	361c5d362c	[fx][traceback] Actually disable preservation of node metadata when enable=False (#164772 ) This will come in handy when we run graph passes that add new nodes, and create_proxy can add seq_nr meta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164772 Approved by: https://github.com/SherlockNoMad	2025-10-06 23:39:12 +00:00
PyTorch MergeBot	1fc71d1b57	Revert "Numpy zerotensor handling (#164487 )" This reverts commit f7ad6dbad67161333a1473d1e0b478b7475a0ec1. Reverted https://github.com/pytorch/pytorch/pull/164487 on behalf of https://github.com/malfet due to Did it break torchbench?, see `8c728e129d/1` ([comment](https://github.com/pytorch/pytorch/pull/164487#issuecomment-3374635051)) viable/strict/1759809089	2025-10-06 23:32:12 +00:00
Jeff Daily	8f54e27e5d	[ROCm][CI] rebuild magma binary for gfx1150 gfx1151 (#164782 ) After #164763 added gfx1150 gfx1151 to list of targets, this PR will trigger rebuild of magma binary for ROCm 7 with the new targets. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164782 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-06 23:29:21 +00:00
Scott Wolchok	8c0bc879b9	Reapply "C++-accessible Placements via pybind11 (#163030 )" (#164519 ) This makes Placement data representation available in C++ via pybind11. Reapply with fix for internal errors. D83788896 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164519 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2025-10-06 23:19:14 +00:00
Eddie Yan	746fe78ecd	[CUDA] Add experimental green context support for SM carveout (#159104 ) Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here... Built on top of @drisspg 's branch Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104 Approved by: https://github.com/ngimel Co-authored-by: drisspg <drisspguessous@gmail.com> viable/strict/1759807643	2025-10-06 23:11:23 +00:00
Yuanyuan Chen	b63bbe1661	Remove old ROCm version check in tests (#164245 ) This PR removes ROCm<6 version checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164245 Approved by: https://github.com/jeffdaily	2025-10-06 22:42:01 +00:00
PyTorch MergeBot	3912ba3e94	Revert "Fix refine_ranges corner case (#164075 )" This reverts commit 27234792add2ee9bedd84ca02dbf34f8f244bc5c. Reverted https://github.com/pytorch/pytorch/pull/164075 on behalf of https://github.com/izaitsevfb due to fails executorch builds, see [D83938444](https://www.internalfb.com/diff/D83938444) ([comment](https://github.com/pytorch/pytorch/pull/164075#issuecomment-3374430964))	2025-10-06 22:09:39 +00:00
PyTorch MergeBot	cfc5cc17dc	Revert "[dynamo] Support torch.fx.traceback.annotate (#164678 )" This reverts commit 2883b5ab773daf5861d43ff0b65be49a441ab3f9. Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to fails inductor:max_autotune tests internally, see D83948169 ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3374407009)) viable/strict/1759804909	2025-10-06 22:03:42 +00:00
zeshengzong	fdc8ccc5bc	Make `Adam`, `AdamW` work with nonzero-dim Tensor betas (#149939 ) Fixes #147921
## Changes
- Convert tensor `betas` using `_to_scalar`
- Change annotation of `betas` param
- Change param type in docs
## Test Result
```bash
pytest -s test/test_optim.py -k test_tensor_lr -vv
```
![image](https://github.com/user-attachments/assets/312ee045-1e8b-4789-aa6e-ba63e6df7e81) ![image](https://github.com/user-attachments/assets/7e6ec274-645b-46b9-b1a6-2b340a685203)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149939 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2025-10-06 22:03:25 +00:00
Yuanyuan Chen	48b54b45d6	Replace pynvml with nvidia-ml-py in win-test.sh (#164681 ) pynvml was deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164681 Approved by: https://github.com/Aidyn-A, https://github.com/eqy	2025-10-06 21:57:26 +00:00
Eddie Yan	6861fa43e5	[CUDA] Cleanup persistent cuBLASLt workspaces before compile-regions test (#163299 ) Fixes some tests that seemed to start flaking out as reported in #163202, due to cuBLASLt workspaces becoming persistent following that change. It's relatively obvious why the workspaces/allocations corresponding to them should be cleaned up for `test_memory_snapshot_script`, but less obvious for `test_memory_plots_free_segment_stack`: why does not cleaning up the workspace prevent `empty_cache` from showing up? Pull Request resolved: https://github.com/pytorch/pytorch/pull/163299 Approved by: https://github.com/albanD viable/strict/1759799518	2025-10-06 21:13:03 +00:00
atalman	c1f40d33c8	Fix docker build issue after 164575 (#164774 ) It looks like https://github.com/pytorch/pytorch/pull/164575 introduced an issue. The command is wrong: `conda install -c "whl/nightly" -y python=3.11 conda=25.7.0` It should just use the default conda channel: `conda install -y python=3.11 conda=25.7.0` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164774 Approved by: https://github.com/Camyll	2025-10-06 20:28:20 +00:00
Jeff Daily	7e7ac2039d	[ROCm][CI] add gfx1150 gfx1151 to almalinux image (#164763 ) First PR necessary to address missing gfx1151 reported in https://github.com/pytorch/pytorch/issues/164346. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164763 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> viable/strict/1759797408	2025-10-06 20:19:43 +00:00
Zhengxu Chen	23ab6a45e5	[precompile][ez] Add instrumentation for guard loading/building. (#164602 ) Summary: as title. Test Plan: CI Differential Revision: D83868533 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164602 Approved by: https://github.com/dolpm	2025-10-06 20:16:09 +00:00
Rohit Singh Rathaur	b558c986e8	Add regression test for get_root_mesh with multiple independent meshes (#164731 ) Fixes #163330 I tried to reproduce the bug with my 4-GPU setup (the original issue used 8 GPUs). I created several different test scenarios, trying to trigger the bug by: - creating two different device meshes - slicing them in various ways - checking if get_root_mesh() would get confused. But the bug didn't show up! Everything worked correctly in `2.10`. I found that there was a massive refactoring of the `DeviceMesh` code (PR #163213) that landed on October 2nd. That PR completely rewrote how `DeviceMesh` tracks relationships between parent meshes and submeshes. It seems like this refactoring fixed the bug, but I added a regression test to make sure it doesn't come back. The test (`test_get_root_mesh_multiple_independent_meshes`) does exactly what the bug report described: - creates two independent meshes - slices them both - verifies that each submesh correctly points back to its real parent - makes sure submeshes from mesh1 don't incorrectly claim mesh2 as their parent Pull Request resolved: https://github.com/pytorch/pytorch/pull/164731 Approved by: https://github.com/fduwjj viable/strict/1759794583	2025-10-06 18:52:25 +00:00
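The property the regression test in #164731 checks, that every submesh resolves to the mesh it was actually sliced from even when two independent meshes have an identical layout, can be modeled in a few lines. `Mesh`, `slice`, and `get_root_mesh` below are hypothetical stand-ins, not `torch.distributed.device_mesh.DeviceMesh`, and storing the root on each instance is an illustrative choice, not how PR #163213 implements it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare meshes by identity, not by shape or names
class Mesh:
    """Hypothetical model of a device mesh; not torch.distributed's DeviceMesh."""

    shape: tuple
    dim_names: tuple
    root: Optional["Mesh"] = field(default=None, repr=False)

    def __post_init__(self):
        # A mesh constructed directly is its own root.
        if self.root is None:
            self.root = self

    def slice(self, *names: str) -> "Mesh":
        shape = tuple(self.shape[self.dim_names.index(n)] for n in names)
        # The submesh inherits the root of the mesh it was sliced from, so
        # slicing a submesh again still resolves to the original mesh.
        return Mesh(shape, names, root=self.root)


def get_root_mesh(mesh: Mesh) -> Mesh:
    return mesh.root


mesh1 = Mesh((2, 2), ("dp", "tp"))
mesh2 = Mesh((2, 2), ("dp", "tp"))  # identical layout, independent mesh
sub1, sub2 = mesh1.slice("tp"), mesh2.slice("tp")
```

Using identity (`eq=False`) matters here: `mesh1` and `mesh2` have the same shape and dimension names, which is exactly the case where a lookup keyed on those attributes could confuse the two roots.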


94061 Commits