This PR also adds source_get_cache to the AOT compile case. Since the guard manager loader code can be shared between the AOT and caching workflows, we add a new function, load_guard_manager, to avoid duplicating the guard-loading logic between the two.
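A hypothetical sketch of the refactor described above: both the caching and the AOT-compile paths go through one shared helper instead of duplicating the guard-loading logic. Only the name load_guard_manager comes from this PR; the surrounding names and arguments are illustrative.
```
def load_guard_manager(guards_state, source_get_cache):
    """Shared guard-manager loading logic used by both workflows (illustrative)."""
    ...

def load_from_cache(entry):
    # Caching path: delegate to the shared helper.
    return load_guard_manager(entry.guards_state, entry.source_get_cache)

def load_aot_compiled(entry):
    # AOT compile path: now also threads source_get_cache through the same helper.
    return load_guard_manager(entry.guards_state, entry.source_get_cache)
```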
Test Plan: test_guard_serialization.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164773
Approved by: https://github.com/yiming0416, https://github.com/dolpm
# TLDR
This PR removes the regression in `torch.topk` introduced in torch 2.7.0 and delivers much better performance for large inputs.
The table below reports execution times on an NVIDIA H20 for various input sizes with float32 data, extracting the top-100 values. The results indicate that this PR restores and improves performance, especially on large inputs.
| Input Shape    | torch 2.6.0 (ms) | torch 2.8.0 (ms) | 2.8.0 + this PR (ms) |
| -------------- | ---------------- | ---------------- | -------------------- |
| (1, 1B)        | 36.6             | 1564.1           | 25.6                 |
| (1, 100M)      | 3.56             | 17.4             | 2.54                 |
| (1, 1M)        | 0.135            | 0.145            | 0.098                |
| (512, 128000)  | 1.33             | 1.33             | 1.32                 |
| (8192, 128000) | 19.6             | 19.6             | 19.4                 |
# Background
After upgrading PyTorch from 2.6.0 to 2.7.0, we observed a significant GPU performance regression in `torch.topk` on NVIDIA GPUs. For instance, the time to extract the top-1000 largest values from one billion floats on an NVIDIA H20 increased from **36 ms** to **1.6 s**.
Profiling with Nsight Compute indicates that the slowdown is caused by redundant memory accesses introduced in [PR #145536](https://github.com/pytorch/pytorch/pull/145536).
# Analysis
`torch.topk` relies on **RadixSelect** to find the target values. Each radix pass requires computing a histogram of the input values. For large inputs, histogram computation is split into two stages (a sketch follows the list):
1. **Local histogram**: Each CUDA block processes a subset of the input and writes its local histogram to global memory.
2. **Global reduction**: A single CUDA block reads all local histograms from global memory and reduces them into the final global histogram.
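A minimal NumPy sketch of this two-stage scheme, for intuition only: the real implementation is a CUDA kernel operating on the radix digits of the value bit patterns, and `RADIX_BITS` and `num_blocks` below are illustrative choices.
```
import numpy as np

RADIX_BITS = 8
RADIX_SIZE = 1 << RADIX_BITS                     # 256 digit bins per radix pass

def local_histograms(x, num_blocks):
    # Stage 1: each "block" histograms its own chunk of the input and writes
    # the result to a (num_blocks, RADIX_SIZE) buffer in "global memory".
    chunks = np.array_split(x, num_blocks)
    return np.stack([
        np.bincount(c & (RADIX_SIZE - 1), minlength=RADIX_SIZE)
        for c in chunks
    ])

def global_histogram(local_hists):
    # Stage 2: a single "block" reduces all local histograms into the final
    # per-digit counts for this radix pass.
    return local_hists.sum(axis=0)

x = np.random.randint(0, 2**31, size=1_000_000, dtype=np.int64)
digit_counts = global_histogram(local_histograms(x, num_blocks=128))
assert digit_counts.sum() == x.size              # every element counted exactly once
```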
Before [PR #145536](https://github.com/pytorch/pytorch/pull/145536), both stages ran inside a single kernel (`radixFindKthValues`), using a semaphore to ensure that all local histograms were completed before reduction.
In PR #145536, the global histogram computation was merged with subsequent top-k calculations into a single kernel (`computeBlockwiseKthCounts`) to avoid the semaphore. While this simplifies synchronization, it introduces **redundant memory reads**:
- `computeBlockwiseKthCounts` launches `numInputSlices * blocks_per_slice` blocks.
- For each row (slice), `blocks_per_slice` CUDA blocks redundantly reload the same local histograms from global memory, so each slice's local histograms are read `blocks_per_slice` times instead of once.
# This PR
To address this inefficiency, we introduce the following optimizations:
1. **Dedicated kernel**: Refactor the global histogram and cumsum computation into a separate GPU kernel, `computeDigitCumSum` (sketched below).
2. **Loop unrolling**: Apply loop unrolling in `computeDigitCumSum` to speed up local histogram reads.
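A hypothetical NumPy sketch of what the dedicated reduction step computes. The name mirrors the PR's `computeDigitCumSum`, but this is pseudocode: the actual kernel is CUDA, and the loop unrolling mentioned above is a CUDA-level detail with no NumPy equivalent.
```
import numpy as np

def compute_digit_cumsum(local_hists):
    # local_hists: (num_slices, blocks_per_slice, RADIX_SIZE) counts produced
    # by the local-histogram stage.
    # Each slice's local histograms are read once here, instead of being
    # re-read by every one of that slice's blocks_per_slice blocks.
    global_hist = local_hists.sum(axis=1)        # (num_slices, RADIX_SIZE)
    return np.cumsum(global_hist, axis=1)        # per-slice digit cumsum
```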
# Performance
We benchmarked `torch.topk` on an NVIDIA H20 with float32 inputs, extracting the top-100 values across different input sizes. The results in the table below show that this PR eliminates the performance regression introduced in 2.7.0 and delivers substantial improvements on large inputs.
| Input Shape    | torch 2.6.0 (ms) | torch 2.8.0 (ms) | 2.8.0 + this PR (ms) |
| -------------- | ---------------- | ---------------- | -------------------- |
| (1, 1B)        | 36.6             | 1564.1           | 25.6                 |
| (1, 100M)      | 3.56             | 17.4             | 2.54                 |
| (1, 1M)        | 0.135            | 0.145            | 0.098                |
| (512, 128000)  | 1.33             | 1.33             | 1.32                 |
| (8192, 128000) | 19.6             | 19.6             | 19.4                 |
In addition, I have verified the correctness of this PR with a variety of inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164459
Approved by: https://github.com/ngimel, https://github.com/Skylion007
Summary:
The context module provides configurable context selection plus isolation-key hashing.
Context selection is broken into runtime and compile context. Runtime context is decided at call time (inductor configs, precision configs, etc.), and compile context is decided at compile time (hardware type, software hashes).
Callees are given access to SelectedRuntimeContext and SelectedCompileContext, which they can use to determine and select whatever context is relevant to the function being cached.
These selected contexts are wrapped in an IsolationSchema, which denotes what context should be taken into account when producing an isolation key. The isolation key is essentially a salt of the function signature key: it says that a given function signature key's result is valid under a given context (isolation schema).
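A hypothetical sketch of how these pieces could fit together. The class names follow the summary above, but the fields and hashing details are illustrative only, not the actual caching module API.
```
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class SelectedRuntimeContext:
    # Decided at call time, e.g. inductor configs, precision configs.
    values: dict = field(default_factory=dict)

@dataclass
class SelectedCompileContext:
    # Decided at compile time, e.g. hardware type, software hashes.
    values: dict = field(default_factory=dict)

@dataclass
class IsolationSchema:
    # Denotes which selected context participates in the isolation key.
    runtime: SelectedRuntimeContext
    compile: SelectedCompileContext

def isolation_key(function_signature_key: str, schema: IsolationSchema) -> str:
    # Salt the function signature key with the selected context, so a cached
    # result is only considered valid under a matching context.
    salt = json.dumps(
        {"runtime": schema.runtime.values, "compile": schema.compile.values},
        sort_keys=True,
    )
    return hashlib.sha256((function_signature_key + salt).encode()).hexdigest()
```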
Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```
Reviewed By: aorenste
D83714689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164549
Approved by: https://github.com/aorenste
This PR temporarily unblocks various experiments that need to re-use the fake mode dynamo creates. Note that this is still not what we want as the end state. The end state should look something like:
```
out = fullgraph_capture(mod, inputs)
fake_mode = out.backend_inputs.fake_mode
gm = out.module()
```
This doesn't work today because export requires wrapping the original module to set up a flat module to trace, for easier pytree handling. As a result, we would need to carry an export-specific flag in fullgraph_capture, which seems not ideal.
Regardless, the end state is that we need to give downstream users a graph module and a fake mode in some form, so I think it is reasonable for _dynamo_graph_capture_for_export to return the fake mode within the graph module itself via gm.meta.
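Concretely, the interim usage could look something like the following, mirroring the pseudocode above; the exact call signature of _dynamo_graph_capture_for_export and the gm.meta key name are assumptions made for illustration.
```
# Hypothetical interim usage under this PR; signature and meta key are assumptions.
gm = _dynamo_graph_capture_for_export(mod, inputs)   # hypothetical signature
fake_mode = gm.meta.get("fake_mode")                 # fake mode carried on the graph module
```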
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164730
Approved by: https://github.com/avikchaudhuri
Fixes #89034
Updated the tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors: it throws an error if force=False and returns an array full of zeros if force=True.
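A usage sketch of the behavior described above. It assumes the private helper torch._efficientzerotensor produces a ZeroTensor, and that the force=False path raises a RuntimeError (the exact error type is an assumption here).
```
import torch

zt = torch._efficientzerotensor((2, 3))   # assumed way to obtain a ZeroTensor

try:
    zt.numpy()                    # force=False: expected to raise for ZeroTensors
except RuntimeError as e:
    print("expected error:", e)

arr = zt.numpy(force=True)        # force=True: returns an all-zeros ndarray
print(arr.shape, arr.sum())       # (2, 3) 0.0
```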
@ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue, but I had already worked on it, so I figured I would push it anyway and see what you think. Feel free to close the PR if you think it is not worth merging.
@albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487
Approved by: https://github.com/izaitsevfb