Update NVTX3 submodule to 3.2.1.
* Mostly improved compiler support, Python support, and better CMake and C++ support.
* Also has a few new APIs to support fancy new features.
* This is a header-only library, so it should be an easy, non-invasive change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154797
Approved by: https://github.com/jansel
Previously, @Chillee wrote a script (https://github.com/pytorch/pytorch/pull/125811) to remove the Inductor dependency from Inductor-compiled Triton kernels. We'd like to automate the process of obtaining the launch parameters.
Added functionality to torch/utils/_get_clean_triton.py to automatically generate the launch_params file if it does not exist and the auto_generate_params flag is set to True. It does this by running the input file in a subprocess with the appropriate environment variable set. Updated the get_clean_triton function and the main script to support this new feature, allowing users to disable auto-generation via a command-line argument.
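A minimal sketch of the auto-generation step, assuming the dump is triggered by the TORCHINDUCTOR_DUMP_LAUNCH_PARAMS environment variable and that the launch_params file sits next to the input file (the exact variable and file naming in the script may differ):
```
import os
import subprocess

def maybe_generate_launch_params(input_file: str) -> str:
    # Hypothetical helper mirroring the described behavior: if the launch_params
    # file is missing, re-run the compiled output file in a subprocess with the
    # environment variable that makes Inductor dump kernel launch parameters.
    launch_params = input_file + ".launch_params"
    if not os.path.exists(launch_params):
        env = dict(os.environ, TORCHINDUCTOR_DUMP_LAUNCH_PARAMS="1")
        subprocess.run(["python", input_file], env=env, check=True)
    return launch_params
```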
# Test Plan
test embedding op in TritonBench
```
# generate inductor compiled triton kernels
TORCH_COMPILE_DEBUG=1 TORCHINDUCTOR_FX_GRAPH_CACHE=0 python run.py --op embedding --mode fwd --precision fp32 --metrics nsys_rep --only inductor_embedding --num-inputs 1 --input-id 11
# run the script to get rid of inductor dependency. By default, triton_only_repro.py is the output file name.
python ~/pytorch/torch/utils/_get_clean_triton.py ~/tritonbench/torch_compile_debug/run_2025_05_29_14_47_50_497790-pid_849274/torchinductor/model__0_forward_1.0/output_code.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154666
Approved by: https://github.com/davidberard98
Upstream Triton has moved setup.py from python/ to the repository root. This PR keeps both layouts buildable by checking the location of setup.py and choosing the cwd of the build commands based on where it is found.
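A minimal sketch of the idea (the function name and directory handling are illustrative, not the exact build-script code):
```
import os

def triton_build_cwd(triton_src_dir: str) -> str:
    # Newer Triton keeps setup.py at the repo root; older versions keep it
    # under python/. Pick the build cwd based on where setup.py actually lives.
    if os.path.exists(os.path.join(triton_src_dir, "setup.py")):
        return triton_src_dir
    return os.path.join(triton_src_dir, "python")
```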
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154635
Approved by: https://github.com/atalman
**Context**:
AMD triton kernels can be launched with special kwargs, like `waves_per_eu`. Triton configs with these kwargs look like this:
```
triton.Config({
    "BLOCK_SIZE": 64,
    "waves_per_eu": 2,
})
```
In comparison, NVIDIA's special kwargs are explicit parameters on the config, e.g. `num_warps`:
```
triton.Config(
    {"BLOCK_SIZE": 64},
    num_warps=4,
)
```
**Problem**: This causes custom Triton kernels used with PT2 to error out, because there is a kwarg in the triton.Config that doesn't appear in the kernel signature.
**Solution**: When splicing the constexpr values into the arg list, ignore any values in the config kwargs that don't appear in the function signature.
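A minimal sketch of that filtering step, assuming `kernel_fn` is the plain Python function behind the `@triton.jit` kernel (the actual Inductor code path differs):
```
import inspect

def filter_config_kwargs(kernel_fn, config_kwargs):
    # Keep only the config entries that correspond to actual parameters of the
    # kernel, so launch hints like waves_per_eu are dropped for kernels that
    # never declare them in their signature.
    params = inspect.signature(kernel_fn).parameters
    return {k: v for k, v in config_kwargs.items() if k in params}
```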
Differential Revision: [D75599629](https://our.internmc.facebook.com/intern/diff/D75599629/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D75599629/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154605
Approved by: https://github.com/njriasan
Handles GC for non-strict draft export; GPU memory usage shouldn't be much more than eager mode + input tensors now.
While trying to do draft export CPU offloading, I found that GC is feasible, because in non-strict mode there are two places holding references to a `.real_tensor` attribute:
1) The FakeTensors in fake tensor prop. These are held by the actual variables in the model's forward call, so the real tensor gets GC-ed along with the fake one when the variable goes out of scope.
2) A clone of the fake tensor in 1) stored in `proxy.node.meta["val"]`, which was added in https://github.com/pytorch/pytorch/pull/150948. But we didn't actually need to store them on intermediate values; the placeholders are enough for retracing/lowering.
By avoiding storing the intermediate values in 2), the values in 1) are naturally GC-ed, and the real-tensor memory usage for non-strict should be pretty similar to eager computation.
Strict mode still OOMs; Dynamo still holds these in variable tracking, and I'm not sure how to GC those.
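A sketch of the idea in 2), purely illustrative (the PR avoids storing these values in the first place rather than stripping them afterwards):
```
def strip_intermediate_real_tensors(gm):
    # Keep .real_tensor only on placeholder nodes so the real tensors backing
    # intermediate values can be garbage-collected once the corresponding
    # Python variables in forward() go out of scope.
    for node in gm.graph.nodes:
        if node.op != "placeholder":
            val = node.meta.get("val")
            if val is not None and hasattr(val, "real_tensor"):
                val.real_tensor = None
```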
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154630
Approved by: https://github.com/angelayi, https://github.com/yushangdi
Summary: It is possible to have `reinterpret_tensor` in the output of inductor codegen, e.g. `reinterpret_tensor(buf366, (1024, ), (1, ), 0)` in the return tuple. This adds assertions to all return values from inductor codegen to prevent NaNs from slipping through and being hard to trace.
Test Plan:
NaN asserts properly generated in example gemm script:
```
vars = (buf1, primals_2, buf2, primals_1, )
for var in vars:
    if isinstance(var, torch.Tensor):
        assert not var.isnan().any().item()
        assert not var.isinf().any().item()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154455
Approved by: https://github.com/eellison
This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/148981, re-opened for visibility.
Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.
Motivation
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores the cubin/hsaco in its cache, developers/researchers can avoid setting `store_cubin = True`: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
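A minimal sketch of that lookup, where the JSON field name and the default cache location are assumptions for illustration:
```
import json
import os

def find_triton_cache_dir(best_config_path: str) -> str:
    with open(best_config_path) as f:
        cfg = json.load(f)
    cache_key = cfg["triton_cache_hash"]  # hypothetical name of the new field
    # Triton keeps compiled artifacts (cubin/hsaco, IRs) in per-key directories
    # under its cache root (~/.triton/cache unless TRITON_CACHE_DIR is set).
    cache_root = os.environ.get("TRITON_CACHE_DIR", os.path.expanduser("~/.triton/cache"))
    return os.path.join(cache_root, cache_key)
```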
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618
Approved by: https://github.com/jansel
If `TEST_TENSORBOARD == False` then `DataType` is not defined or imported. However, it is used unconditionally when defining the test with `parametrize`, which leads to a NameError that crashes test execution on start.
Provide a dummy to make the module syntactically correct. Tests will still be skipped on start.
```
File "/dev/shm/build/pytorch-v2.2.1/test/test_tensorboard.py", line 885, in <module>
class TestTensorProtoSummary(BaseTestCase):
File "/dev/shm/build/pytorch-v2.2.1/test/test_tensorboard.py", line 889, in TestTensorProtoSummary
(torch.float16, DataType.DT_HALF),
^^^^^^^^
NameError: name 'DataType' is not defined
Got exit code 1, retrying...
test_tensorboard 1/1 failed! [Errno 2] No such file or directory: '/dev/shm/build/pytorch-v2.2.1/.pytest_cache/v/cache/stepcurrent/test_tensorboard_0_0dba8bc00bbe233f'
```
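A minimal sketch of the dummy pattern; the import path and the exact attributes needed are assumptions based on the error above, not the exact patch:
```
if TEST_TENSORBOARD:
    from tensorboard.compat.proto.types_pb2 import DataType
else:
    class DataType:
        # Dummy stand-in so the module-level parametrize() list is syntactically
        # valid; the tests that reference it are skipped when tensorboard is missing.
        DT_HALF = None
        DT_FLOAT = None
        DT_BFLOAT16 = None
```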
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154709
Approved by: https://github.com/Skylion007
Use system NCCL by default. The correct NCCL version is already built into the manylinux Docker image.
Will follow up with a PR on detecting whether the user has NCCL installed and enabling USE_SYSTEM_NCCL by default in that case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152835
Approved by: https://github.com/malfet
Summary: When setting the memory snapshot callback, we register and unregister callbacks for performance reasons. For ease of use, it makes sense to just remove all callbacks on disable, regardless of which flags are enabled. The enable path stays behind a feature flag; this change only makes the disable path ignore the flag.
Test Plan: Ran without any flags and saw all callbacks removed.
Differential Revision: D75636035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154664
Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi