pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Xuehai Pan	047ae24e34	Eliminate setup.py install/develop in the codebase (#162329 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162329 Approved by: https://github.com/ezyang	2025-09-29 03:54:28 +00:00
Yuanyuan Chen	3cda34ebde	[2/N] Apply ruff UP035 check in torch files (#164054 ) This is the result of applying the ruff `UP035` check. `Callable` is imported from `collections.abc` instead of `typing`. `TypeAlias` is also imported from `typing`. This PR is the follow-up of #163947. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164054 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-09-29 03:35:32 +00:00
Yuanyuan Chen	352197c508	Remove old ROCm skip conditions in tests (#164058 ) This PR removes skip conditions for ROCM <= 3.5. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164058 Approved by: https://github.com/kwen2501	2025-09-29 03:00:58 +00:00
Animesh Jain	811c693c49	[dynamo] Special path for cloning of torch dispatch tensors (#164081 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164081 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #164084	2025-09-29 01:44:44 +00:00
Animesh Jain	c2768d0f5a	[export] Skip the check instead of disable (#164084 ) It's unclear why we had disable in the first place. With install_free_tensors, we are tracing into this hook. A better way would be to place the tracer without any hook. For now, disable the checking while dynamo is tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164084 Approved by: https://github.com/tugsbayasgalan	2025-09-29 01:44:44 +00:00
Yuanyuan Chen	a8c528c105	[1/N] Apply UP035 rule in tests (#163947 ) Apply UP035 `ruff` rule in tests, but some tests for `fx` and `dynamo` are excluded in case the old typing is the test target. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163947 Approved by: https://github.com/ezyang	2025-09-29 01:42:01 +00:00
Animesh Jain	dc54ce7554	[hops] Support unspecialized nn module for export hops (#164082 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164082 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #164079	2025-09-29 01:34:10 +00:00
Animesh Jain	1981ed4f60	[dynamo][logging] Add to param_count only if metrics_count is active (#164079 ) This is rare but happens with executorch tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164079 Approved by: https://github.com/tugsbayasgalan	2025-09-29 00:59:18 +00:00
jainapurva	54b38f3b46	Add operator benchmarking run to CI nightly (#162530 ) This PR introduces a new "operator microbenchmark" CI workflow and GitHub Actions for operator microbenchmarks, updating test scripts and job matrices to support new parameters, and broadening the operator benchmark tests to include more data types, larger shapes, and gradient tests. The benchmark configurations now focus more on different cuda hardware and multiple dtypes (bf16, fp16, fp32), for both compile and eager mode. Benchmark Configuration and Coverage: * Expanded operator benchmark configurations in `addmm_test.py`, `bmm_test.py`, `matmul_test.py`, and `mm_test.py` to benchmark multiple dtypes on CUDA devices, in eager and compile mode, for forward and backward run. The configs with tag "long" for the above mentioned files are being run in CI. * The CI benchmarking is running on various hardware: H100, A100. * The CI job also uploads the microbenchmarking outputs to a [HUD](https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=PyTorch+operator+microbenchmark) dashboard. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162530 Approved by: https://github.com/huydhn Co-authored-by: Huy Do <huydhn@gmail.com>	2025-09-29 00:46:38 +00:00
RiyaP-QA	bc5a072ebf	fixes import error 'functionalize' from functorch (#163746 ) Fixes #163637 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163746 Approved by: https://github.com/malfet	2025-09-28 23:16:45 +00:00
RajeshvShiyal	d1b3481131	registraion replaced with registration in jit_type.h file comment (#164072 ) Fixes #164071 typo correction done Pull Request resolved: https://github.com/pytorch/pytorch/pull/164072 Approved by: https://github.com/Skylion007	2025-09-28 22:55:24 +00:00
Yuanyuan Chen	3766513d25	Remove C++ workarounds for Python < 3.10 (#164055 ) Remove two unnecessary `PY_VERSION_HEX` branches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164055 Approved by: https://github.com/ezyang	2025-09-28 20:00:02 +00:00
FFFrog	ea6846b231	[CI] Remove the unnecessary workflow related functorch (#162581 ) The [docs](https://docs.pytorch.org/functorch/stable/) about `functorch` have been migrated into [PyTorch Doc](https://docs.pytorch.org/docs/stable/func.html) since PyTorch 2.0, so I think we can remove it right now to reduce compute resource usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162581 Approved by: https://github.com/ezyang	2025-09-28 19:56:20 +00:00
Tugsbayasgalan Manlaibaatar	f6537d9616	Move control flow export tests to new tracer (#163259 ) Differential Revision: [D82732614](https://our.internmc.facebook.com/intern/diff/D82732614) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163259 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #163136, #163137, #163258	2025-09-28 19:56:09 +00:00
Tugsbayasgalan Manlaibaatar	cc0332563e	Use new_tracer_experimental for torchao strict export (#163258 ) The export team is fixing up the old strict export implementation; as a result, it fails a check where we proxy the whole module under given directories. _WrapperModule is a way for torchao to work around the issue of export requiring an nn.Module to trace, so it should never get proxied in the graph. Differential Revision: [D82732613](https://our.internmc.facebook.com/intern/diff/D82732613) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163258 Approved by: https://github.com/anijain2305 ghstack dependencies: #163136, #163137	2025-09-28 19:55:54 +00:00
Tugsbayasgalan Manlaibaatar	8239ba4087	Fix various bugs in subclass input in export (#163770 ) This adds basic support for subclass inputs in export (specifically for non-strict). I had to make fakify a little more complicated, which risks further divergence from dynamo fakification. But the dynamo one is so complex that I feel it is better to do it this way. Also improved fake mode detection logic to recursively look into subclass inner tensors. Differential Revision: [D83156489](https://our.internmc.facebook.com/intern/diff/D83156489) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163770 Approved by: https://github.com/avikchaudhuri	2025-09-28 18:03:32 +00:00
Tugsbayasgalan Manlaibaatar	1fdd99de71	Building guards should be under metrics_context (#163967 ) Differential Revision: [D83354042](https://our.internmc.facebook.com/intern/diff/D83354042) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163967 Approved by: https://github.com/avikchaudhuri	2025-09-28 16:28:34 +00:00
lichuyang	38ed608956	Better error handling in torch/nativert/* (#163308 ) Replace the runtime_error of the vanilla C++ exceptions with TORCH_CHECK in torch/nativert/*. Vanilla C++ exceptions should not exist in the core part of PyTorch because of its cross-language nature. Compared with vanilla C++ exceptions, TORCH_CHECK has richer error context and a unified error handling mechanism. This commit replaces runtime_error with TORCH_CHECK in the files under torch/nativert/*. Fixes part of #148114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163308 Approved by: https://github.com/dolpm	2025-09-28 14:23:44 +00:00
Alessandro Fanfarillo	238dc65368	[ROCm] use hipSolver instead of MAGMA for Cholesky (#163977 ) Currently, the Cholesky factorization and least squares operations default to MAGMA when PyTorch is compiled for ROCm. This shows suboptimal performance. This change allows PyTorch to rely on hipSolver instead of MAGMA. @jeffdaily Pull Request resolved: https://github.com/pytorch/pytorch/pull/163977 Approved by: https://github.com/Skylion007	2025-09-28 06:53:06 +00:00
Laith Sakka	7bbde0c094	Remove unused argument from DEFINE_BINARY macro. (#163868 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163868 Approved by: https://github.com/Skylion007 ghstack dependencies: #163822	2025-09-28 06:32:41 +00:00
Laith Sakka	dfcab0e7e1	Handle DDE in infer_size_impl (#163822 ) hit this while running VLLM with unbacked for model Qwen/Qwen2-1.5B-Instruct Pull Request resolved: https://github.com/pytorch/pytorch/pull/163822 Approved by: https://github.com/bobrenjc93, https://github.com/Skylion007	2025-09-28 06:32:41 +00:00
PyTorch UpdateBot	1cc9263f52	[vllm hash update] update the pinned vllm hash (#164053 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164053 Approved by: https://github.com/pytorchbot	2025-09-28 04:35:17 +00:00
Yuanyuan Chen	c2862c8e66	[distributed] Remove python code older than 3.10 (#163613 ) Because the minimum Python version is now 3.10. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163613 Approved by: https://github.com/XuehaiPan, https://github.com/kwen2501	2025-09-28 04:15:24 +00:00
Laith Sakka	b377c9e365	graph break on tolist if capture_scalar_outputs is false (#163807 ) Addresses https://github.com/pytorch/pytorch/issues/163798. It is problematic not to graph break because: 1. it breaks the current contract. 2. dynamo traces and we then have an .item call; if we ever re-trace later, in autograd for example, we hit a failure (we do not know where to graph break at that point). See the added unit test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163807 Approved by: https://github.com/bobrenjc93	2025-09-28 04:02:52 +00:00
Avik Chaudhuri	3059b08012	[inductor] add subsystem to pattern matcher (#163922 ) Summary: Running a toy example through `torch.compile(fullgraph=True, backend="inductor")` with default inductor config, I tried to see what passes are run in each of pre-grad, joint-graph, and post-grad phases by printing out the subsystem in `GraphTransformObserver`. However the subsystem showed up as None in a bunch of transforms that were run in each of those phases, so this PR adds some additional annotations. Note that these annotations are probably not a complete set, since other transforms may run based on changes to the config that are not covered here. Hopefully this doesn't change behavior. However, I did notice that bisecting relies on disabling various phases, which means that while before some passes would not be disabled (because their subsystem was `None`), now they would. Test Plan: existing tests + manual test described in summary Differential Revision: D83306676 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163922 Approved by: https://github.com/jansel	2025-09-28 03:15:23 +00:00
Aaron Gokaslan	5504a06e01	[BE]: Update NCCL to 2.28.3 (#162351 ) @eqy New NCCL has a bunch of bugfixes for features, including reducing the number of SMs needed by NVLINK collectives, as well as some very useful new APIs for SymmetricMemory. Also allows FP8 support for non-reductive operations on pre-sm90 devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162351 Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/atalman	2025-09-28 01:38:59 +00:00
lichuyang	1ad491dd88	Better error handling in torch/csrc/jit/ir/* (#163757 ) Refactor error handling to use TORCH_CHECK for improved clarity in constants and scope management Fixes some parts of ISSUE #148114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163757 Approved by: https://github.com/albanD	2025-09-28 01:18:24 +00:00
Bob Ren	fd20889d0b	Add type annotations to MPS profiler utilities (#163486 ) ## Summary - drop the local mypy allow-untyped-defs escape hatch in the MPS profiler helpers - annotate the context managers and bool helpers so they type-check cleanly ## Testing - python -m mypy torch/mps/profiler.py --config-file mypy-strict.ini ------ https://chatgpt.com/codex/tasks/task_e_68d0ce4df2e483268d06673b65ef7745 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163486 Approved by: https://github.com/Skylion007	2025-09-27 23:00:53 +00:00
fduwjj	2ce2e48a05	[WIP][symm_mem] Add a wait for signal and put signal for one side API (#159837 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159837 Approved by: https://github.com/kwen2501	2025-09-27 21:20:13 +00:00
Aart J.C. Bik	1d98be6abf	[NFC] fixed typo in sparse semi structured filename (#163904 ) Make sure all semi structured files use "SparseSemiStructured" Pull Request resolved: https://github.com/pytorch/pytorch/pull/163904 Approved by: https://github.com/Skylion007	2025-09-27 21:19:48 +00:00
Chien-Chin Huang	dfda239cce	[DTensor] Raise an RuntimeError when checkpointing APIs are used with Partial placement (#163941 ) A DTensor that contains partial placement shouldn't be checkpointed (DCP.save) -- the result is not correct and DCP doesn't know how to handle it. There are several APIs that are only used by checkpointing, e.g., `__create_write_items__`. These APIs should raise an exception if the DTensor, `self`, has Partial placement. Ideally, we want to add the following test: ``` with self.assertRaisesRegex( RuntimeError, "Any checkpointing related operations are not supported for" ): dcp.save({"dtensor": dtensor}, checkpoint_id=tempfile.gettempdir()) ``` While we do see that the RuntimeError is raised, it is raised in another thread because the DTensor checkpoint APIs are called by DCP in a separate thread, which assertRaisesRegex cannot capture. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163941 Approved by: https://github.com/tianyu-l	2025-09-27 19:50:16 +00:00
Animesh Jain	991e3d0d16	[dynamo][guards] Revert introduction of different types of lambda_guards (#163385 ) With https://fb.workplace.com/groups/260102303573409/permalink/787294574187510/ issue, it might be a better idea to just speedup _realize_dict and keep the changes very local. So reverting this PR as well, to return to clean slate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163385 Approved by: https://github.com/jansel	2025-09-27 18:20:48 +00:00
Yidi Wu	8f6dbc0ba8	[scan] create fw and bw graphs via partitioning (#162754 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162754 Approved by: https://github.com/zou3519 ghstack dependencies: #161557, #161664, #161808, #162025, #161732	2025-09-27 18:13:15 +00:00
Yidi Wu	3413490f53	[scan] materialize combine_fn in forward add more autograd tests (#161732 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161732 Approved by: https://github.com/zou3519 ghstack dependencies: #161557, #161664, #161808, #162025	2025-09-27 18:13:15 +00:00
Yidi Wu	b85bee3bbb	[hop] refactor check input alias and mutation to be a graph pass (#162025 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162025 Approved by: https://github.com/zou3519 ghstack dependencies: #161557, #161664, #161808	2025-09-27 18:13:15 +00:00
Yidi Wu	66dbf2c9f5	[scan][autograd] clone outputs that's aliasing with inputs or outputs in bw (#161808 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161808 Approved by: https://github.com/zou3519 ghstack dependencies: #161557, #161664	2025-09-27 18:13:15 +00:00
Yidi Wu	f5d85874dd	[scan][be] remove unnecessary tensor checks (#161664 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161664 Approved by: https://github.com/Skylion007, https://github.com/zou3519 ghstack dependencies: #161557	2025-09-27 18:13:14 +00:00
Yidi Wu	8f15d6a0c9	[test][scan] refactor inductor test and prepare for adding bw tests (#161557 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161557 Approved by: https://github.com/zou3519	2025-09-27 18:13:14 +00:00
redwrasse	e78792a70d	Update ctc loss docs float32 input required for CuDNN (#162042 ) Discovered while working on https://github.com/pytorch/pytorch/pull/159106 the non-obvious requirement that inputs must be float32 to use CuDNN (https://github.com/pytorch/pytorch/pull/159106#issuecomment-3189981705), otherwise the native CUDA implementation is called. Updates the docs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162042 Approved by: https://github.com/mikaylagawarecki Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>	2025-09-27 18:10:17 +00:00
Ke Wen	d9db838f58	[CI] Re-enable test_all_to_all_vdev_2d_offset (#163985 ) Fixes https://github.com/pytorch/pytorch/issues/163847 Moving allocations upfront and collectives later. The hang goes away. My investigation indicates that the hang is inside the last call `torch.testing.assert_close(out_expected, out[:out_numel])`. Rank 3 calls into it, but never gets out. Don't know why yet. I will investigate more. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163985 Approved by: https://github.com/fegin	2025-09-27 16:56:25 +00:00
FFFrog	6ba83e06a5	[AMP] Add deprecated decorator for torch.xxx.amp.autocast class (#163654 ) As the title states. Changes: - torch.cuda.amp.autocast - torch.cpu.amp.autocast - add explicit `__new__` and `__init_subclass__` for the classes above so that inspect.signature retrieves the correct signature Pull Request resolved: https://github.com/pytorch/pytorch/pull/163654 Approved by: https://github.com/Skylion007	2025-09-27 14:37:12 +00:00
FFFrog	960290d629	[Docs] Add standard-imghdr for PyTorch Doc (#163944 ) As the title states. [PEP 594](https://peps.python.org/pep-0594) removed imghdr from the Python standard library, and older versions of Sphinx do not declare it as an installation dependency, so we need to add it to the requirements as a temporary dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163944 Approved by: https://github.com/albanD, https://github.com/svekars	2025-09-27 08:14:51 +00:00
Min Si	b1a4efc302	[amd] Add cudaHostFn_t to cuda_to_hip_mappings (#164007 ) Summary: See title Test Plan: ``` buck build --flagfile fbcode//mode/opt-amd-gpu fbcode//comms/ctran/algos/common/tests:ctran_algo_gpe_kernel_sync_test ``` After fix: https://www.internalfb.com/buck2/362ff91e-53f2-4b82-9536-cb84c91384a2 Before fix: failed in D83294731 (version 1): https://www.internalfb.com/sandcastle/workflow/1792432651703947243 Differential Revision: D83375414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164007 Approved by: https://github.com/llxxee	2025-09-27 06:09:50 +00:00
Wei Wang	96182faf96	[CI][Distributed][CUDA][Symm-Mem] Enable B200 Symm Mem Test (#162988 ) Inspired by https://github.com/pytorch/pytorch/pull/162981 and motivated by https://github.com/pytorch/pytorch/pull/159323 taking a total of 20 hours to finish (and unlikely to make it in short time due to https://github.com/pytorch/pytorch/issues/162178 ) Creating this subtest to get something distributed on B200. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162988 Approved by: https://github.com/malfet	2025-09-27 05:12:05 +00:00
bobrenjc93	dcb8af7501	[torchfuzz] fix bool propagation (#164003 ) bools can't propagate through the current pointwise ops such as add/mul. once we add more that can, we'll probably want to add an additional subclass that supports pointwise bools, but for now just don't allow it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164003 Approved by: https://github.com/pianpwk ghstack dependencies: #163743, #163812, #163890, #164002	2025-09-27 04:51:29 +00:00
PyTorch UpdateBot	280e712c13	[vllm hash update] update the pinned vllm hash (#164029 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164029 Approved by: https://github.com/pytorchbot	2025-09-27 04:34:57 +00:00
Arsh Zahed	254d2864d6	Add runtime_overhead PR Time Benchmark (#163866 ) This adds a PR time benchmark that checks for runtime overhead on a very small graph. This will help track regressions in runtime overhead. Example Results: ``` runtime_overhead_inductor,instruction_count,222645 runtime_overhead_inductor_inference_mode,instruction_count,234998 runtime_overhead_inductor_requires_grad,instruction_count,293556 runtime_overhead_inductor_requires_grad_backward,instruction_count,78181 runtime_overhead_inductor_dynamic,instruction_count,234870 runtime_overhead_inductor_inference_mode_dynamic,instruction_count,248711 runtime_overhead_inductor_requires_grad_dynamic,instruction_count,309979 runtime_overhead_inductor_requires_grad_backward_dynamic,instruction_count,77599 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163866 Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305	2025-09-27 03:26:59 +00:00
Eli Uriegas	9dac6437da	lint: Filter out /usr/include from results (#164012 ) Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164012 Approved by: https://github.com/ZainRizvi ghstack dependencies: #164008	2025-09-27 00:54:07 +00:00
Eli Uriegas	8a0e8cad5f	lint: Only include files in pytorch (#164008 ) We were seeing instances of stdlib files in clang-tidy output so this just essentially removes them from the things that lintrunner will report up. Longer term fix here would be to just modify the clang-tidy configuration in order to do the correct thing here but that requires a bit more investigation as to why this is only happening in CI and is not reproducible locally. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164008 Approved by: https://github.com/ZainRizvi	2025-09-27 00:54:07 +00:00
bobrenjc93	3a115da3e6	[torchfuzz] ones over zero (#164002 ) reduces likelihood of divide by zero errors. long term we'll probably want to just fuzz these values entirely Pull Request resolved: https://github.com/pytorch/pytorch/pull/164002 Approved by: https://github.com/pianpwk ghstack dependencies: #163743, #163812, #163890	2025-09-27 00:53:02 +00:00


93641 Commits
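
The DTensor commit above (#163941) notes that a `RuntimeError` raised in a separate thread cannot be captured by `assertRaisesRegex` in the calling thread. The sketch below illustrates that general Python behavior only; it is not DCP's code, and `worker`, `run_in_background`, and `captured` are hypothetical names. It assumes Python 3.8 or newer for `threading.excepthook`.

```python
import threading

captured = []  # exceptions observed by the thread-level hook


def worker():
    # Stand-in for an API that raises inside a background thread.
    raise RuntimeError("checkpointing is not supported for Partial placement")


def run_in_background(fn):
    """Run fn in a thread; return True only if the caller sees a RuntimeError."""
    old_hook = threading.excepthook
    # Record uncaught thread exceptions instead of printing a traceback.
    threading.excepthook = lambda args: captured.append(args.exc_value)
    try:
        t = threading.Thread(target=fn)
        t.start()
        t.join()  # join() does not re-raise the worker's exception
        return False
    except RuntimeError:
        return True
    finally:
        threading.excepthook = old_hook


caller_saw_exception = run_in_background(worker)
```

Here the exception ends up in `captured` rather than propagating to the caller, which is why a `with self.assertRaisesRegex(...)` block around the call would not see it.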