What started as a simple fix for `mps_convolution_backward_input` resulted in a fairly significant refactor plus several fixes:
- Updated `mps_conv_use_channels_last` to return channels-last output if either the input or the weights are channels-last
- Used the same primitive throughout `Convolution.mm` to determine whether the output should be allocated in channels-last format or not
But doing only those two resulted in a crash in `test_memory_format_nn_Conv2d_mps_float32` when the weights are channels-last and a bias is present:
```
% python -c "import torch;print(torch.nn.functional.conv2d(torch.rand(2, 4, 3, 4,device='mps'), torch.rand(5, 4, 3, 3,device='mps').to(memory_format=torch.channels_last), torch.rand(5,device='mps')))"
/AppleInternal/Library/BuildRoots/4~B5E4ugDCh2RsPWAjMEoPu8LC5w1yXEwd7XweDhg/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:3619: failed assertion `Error: MLIR pass manager failed'
zsh: abort python -c
```
This requires a more thorough redesign/cleanup, namely:
- Do not alter the layout based on the MacOS version, but rather do additional copies on MacOS-14 if the input, output, or weight is in channels-last format (done by defining `std::optional<Tensor> output_c;` that holds a contiguous copy of the output tensor)
- Introduce `input_suggested_layout`, which is set to ChannelsLast if and only if the input is channels-last and the code is running on MacOS-15+
- Delete the unused `memory_layout` and `group` arguments from `fill_depthwise_conv_desc`
- Fix the bias broadcasting logic for channels-last (see the sketch below)
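A minimal Python-level sketch of the expected layout propagation: `expected_channels_last` is a hypothetical helper mirroring the rule above (the real decision lives in `mps_conv_use_channels_last` in `Convolution.mm`), and the check assumes an MPS-capable machine.
```
import torch
import torch.nn.functional as F

def expected_channels_last(inp, weight):
    # Hypothetical helper: output should be channels-last if either the
    # input or the weight is channels-last.
    cl = torch.channels_last
    return inp.is_contiguous(memory_format=cl) or weight.is_contiguous(memory_format=cl)

x = torch.rand(2, 4, 3, 4, device="mps")
w = torch.rand(5, 4, 3, 3, device="mps").to(memory_format=torch.channels_last)
b = torch.rand(5, device="mps")

out = F.conv2d(x, w, b)  # the repro above: channels-last weight plus bias
assert out.is_contiguous(memory_format=torch.channels_last) == expected_channels_last(x, w)
```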
As a result, in addition to adding one more regression test, this change removes `expectedFailures` from:
- `TestModule.test_memory_format` for `Conv2d`, `ConvTranspose2d`, `LazyConv1d`, `LazyConvTranspose1d`
- `test_require_stride_expanded_dynamic_shapes`
- `test_mutable_custom_op_fixed_layout2` for MacOS-14
Fixes https://github.com/pytorch/pytorch/issues/161905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162776
Approved by: https://github.com/Skylion007
Fixes #161014
This PR introduces a fix that is consistent with the existing exception handling. As outlined in issue #161014, there is an edge case where negative padding does not make the tensor size negative but still triggers the exception claiming the size is negative. The fix is simply allowing `new_dim >= 0`, i.e. including the zero-sized dim and letting the operator return an empty tensor.
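Below is a hedged repro of the intended behavior after the fix, using the `constant` padding path via `F.pad` (the exact operator entry point from the issue is assumed here):
```
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 4)
# Negative padding that shrinks the last dim exactly to zero: 4 - 2 - 2 == 0.
# With the fix this should return an empty tensor instead of raising the
# "negative size" error described above.
out = F.pad(x, (-2, -2), mode="constant", value=0.0)
print(out.shape)  # expected: torch.Size([1, 3, 0])
```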
In this PR I have also added a test sample for the edge case where negative padding reduces a dimension to zero. The sample is only for the `constant` type of padding; I would appreciate feedback on whether the same sample is needed for the `reduce` type as well.
This is my first PR to contribute to PyTorch and any help/feedback will be welcome! Thank you!
@malfet @manuelcandales @janeyx99 @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161639
Approved by: https://github.com/manuelcandales
It seems `TEST_CUDA` is set to true even for ROCm (MI200) jobs. This change turns the `if TEST_CUDA` check into an else condition to avoid running symmetric memory UTs on MI200. For other non-ROCm archs it should still return true, and those UTs can be skipped using other skip decorators.
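A rough sketch of the described gating (the actual check lives in the symmetric memory test suite; the surrounding structure here is assumed):
```
import torch
from torch.testing._internal.common_utils import TEST_WITH_ROCM

TEST_CUDA = torch.cuda.is_available()  # also True on ROCm builds, hence the issue

if TEST_WITH_ROCM:
    # ROCm: do not run symmetric-memory UTs here (e.g. MI200); rely on the
    # ROCm-specific skip decorators instead.
    run_symm_mem_tests = False
elif TEST_CUDA:
    run_symm_mem_tests = True
else:
    # Other non-ROCm archs: return True and let other skip decorators handle it.
    run_symm_mem_tests = True
```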
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163205
Approved by: https://github.com/ezyang
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
As part of https://github.com/pytorch/pytorch/issues/114850, we are porting distributed tests to Intel GPU. This PR covers some test files under test/distributed. We enable Intel GPU with the following methods, trying our best to keep the original code style (see the sketch after this list):
- Use `instantiate_device_type_tests()`
- Use `torch.accelerator.current_accelerator()` to determine the accelerator backend
- Use `requires_accelerator_dist_backend` to allow both the nccl and xccl backends in tests
- Enable XPU for some test paths
- Change the hardcoded `world_size` according to `device_count()`
- Unify some common code under torch/testing/_internal for multiple backends, for example:
  - Add `xpu` to `Backend.backend_capability` and `dist.Backend.register_backend()`
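Below is a minimal sketch of the device-agnostic pattern described above; the test body is a placeholder, and the distributed process-group setup (including `requires_accelerator_dist_backend`) is omitted for brevity.
```
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests

# Determine the accelerator backend (cuda, xpu, ...) instead of hardcoding "cuda".
device_type = (
    torch.accelerator.current_accelerator().type
    if torch.accelerator.is_available()
    else "cpu"
)
# Derive world_size from the device count rather than hardcoding it.
world_size = torch.accelerator.device_count() if torch.accelerator.is_available() else 1

class ExampleDeviceAgnosticTest(TestCase):
    def test_smoke(self, device):
        t = torch.ones(4, device=device)
        self.assertEqual(t.sum().item(), 4.0)

instantiate_device_type_tests(ExampleDeviceAgnosticTest, globals(), only_for=device_type)

if __name__ == "__main__":
    run_tests()
```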
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159473
Approved by: https://github.com/guangyey, https://github.com/d4l3k
After the UT suite moved to `MultiProcContinuousTest`, the `skipIfRocm` decorator started failing rather than skipping UTs: we now spawn multiple processes before the skip decorator is taken into account, and the decorator raised an exception to exit the process, so the parent process treated the child process exiting as a crash rather than a skip. Additionally, in `MultiProcContinuousTest`, if one UT fails all subsequent ones are also skipped, which makes sense since there is one setup for the entire suite; however, this showed up as many failing/skipped UTs in parity reports.
I added multiprocess versions of the skip decorators for ROCm, namely `skip_if_rocm_arch_multiprocess` and `skip_if_rocm_ver_lessthan_multiprocess`. These are needed because the symmetric memory feature is only supported on MI300 onwards, so we need to skip those UTs on other archs, and because some UTs only work from ROCm 7.0 onwards.
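As a toy demonstration, independent of the PyTorch harness, of why the old behavior showed up as a crash: a skip exception raised inside an already-spawned child process surfaces to the parent as a non-zero exit code rather than a skip.
```
import multiprocessing as mp
import unittest

def child():
    # Skip logic that only runs inside the child: too late to report a skip.
    raise unittest.SkipTest("symmetric memory requires MI300+")

if __name__ == "__main__":
    p = mp.Process(target=child)
    p.start()
    p.join()
    print("child exitcode:", p.exitcode)  # non-zero -> parent sees a crash, not a skip
```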
Fixes #161249 Fixes #161187 Fixes #161078 Fixes #160989 Fixes #160881 Fixes #160768 Fixes #160716 Fixes #160665 Fixes #160621 Fixes #160549 Fixes #160506 Fixes #160445 Fixes #160347 Fixes #160203 Fixes #160177 Fixes #160049 Fixes #159921 Fixes #159764 Fixes #159643 Fixes #159499 Fixes #159397 Fixes #159396 Fixes #159347 Fixes #159067 Fixes #159066 Fixes #158916 Fixes #158760 Fixes #158759 Fixes #158422 Fixes #158138 Fixes #158136 Fixes #158135
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162811
Approved by: https://github.com/jeffdaily
pytest summarizes test failures by printing a truncated first line of the text of the OUTERMOST wrapped exception.
Prior to this PR, it looked like this:
```
FAILED [0.0454s] test/distributed/tensor/test_dtensor_ops.py::TestLocalDTensorOpsCPU::test_dtensor_op_db_H_cpu_float32 - Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(12, 12), device="cpu", dtype=torch.float32], args=(), kwargs={}, ...
```
I argue this is not so useful. If I have a lot of test failures, I look to the test summary to understand what /kind/ of errors I have, so I can assess which ones I should look at first. In other words, this is better:
```
FAILED [0.1387s] test/distributed/tensor/test_dtensor_ops.py::TestLocalDTensorOpsCPU::test_dtensor_op_db__softmax_backward_data_cpu_float32 - Exception: Tensor-likes are not close!
```
Now I know specifically this is a numerics problem!
This PR does it by prepending the old exception text to the wrapped exception. This is slightly redundant, as we are exception chaining, but it does the job. Open to bikeshedding.
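A simplified sketch of the prepending trick described above (not the actual harness code): put the inner exception's message first in the outer exception's message, so the first line that pytest truncates into the summary is the root cause, while still chaining the exceptions for the full traceback.
```
def wrap_with_context(inner: BaseException, context: str) -> Exception:
    # Illustrative helper, not the harness function: prepend the original
    # error text so the first line of the outer exception is the root cause.
    outer = Exception(f"{inner}\nCaused by {context}")
    outer.__cause__ = inner  # keep the exception chain
    return outer

try:
    raise AssertionError("Tensor-likes are not close!")
except AssertionError as e:
    wrapped = wrap_with_context(e, "sample input at index 0: SampleInput(...)")
    print(str(wrapped).splitlines()[0])  # -> Tensor-likes are not close!
```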
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162961
Approved by: https://github.com/malfet
Minor bug fix for the `nll_loss` test.
Before this PR, it ran `torch.randint(high=0)`, which fails because it would have to generate a number x with low <= x < high, i.e. x >= 0 and x < 0, which is impossible.
The test did not fail on CPU because that line is never reached: the test fails earlier due to an unsupported dtype.
However, on the TPUs we support at Google, this line is reached before the dtype check, which triggers the bug.
To my understanding, these OpInfos should be general enough to support different hardware.
Fixing this obvious bug makes them more general across different hardware.
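For reference, a minimal reproduction of the failure mode outside of the OpInfo machinery: `torch.randint` requires `low < high`, so `high=0` with the default `low=0` cannot succeed.
```
import torch

try:
    torch.randint(high=0, size=(3,))
except Exception as e:
    # Raises because no integer x satisfies 0 <= x < 0.
    print("randint failed:", e)
```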
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162763
Approved by: https://github.com/soulitzer
Avoid failures caused by tests exiting via sys.exit instead of `unittest.skip`
In particular, it no longer starts the test (forking into a subprocess) just to stop it again (killing the subprocess), which is what happened in the test setup.
Using `unittest.skip` decorators avoids starting the test in the first place.
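A simplified illustration of the difference (the real suite uses its own multiprocess harness and ROCm-specific decorators; `requires_feature` is a hypothetical stand-in): a decorator-level skip is decided before setUp runs, so no subprocess is ever forked.
```
import unittest

def requires_feature(available: bool, reason: str):
    # Hypothetical stand-in for the real skip decorators.
    return unittest.skipUnless(available, reason)

class Example(unittest.TestCase):
    def setUp(self):
        # In the real suite this is where subprocesses would be forked;
        # for a decorator-skipped test, setUp never runs.
        super().setUp()

    @requires_feature(False, "feature not supported on this platform")
    def test_expensive(self):
        self.fail("never reached")

if __name__ == "__main__":
    unittest.main()
```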
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158846
Approved by: https://github.com/Skylion007
Summary: This PR introduces shape guards to export. Previously only value ranges, equalities, and specializations would be tracked for symbolic expressions, and we had a forward hook to check them. Instead, we now create a function that checks shape guards and call it in the exported program.
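As a hedged illustration of the user-visible behavior (the exact error type raised by the generated guard-checking function is not asserted here):
```
import torch
from torch.export import Dim, export

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

d = Dim("d")
ep = export(
    Add(),
    (torch.randn(4, 8), torch.randn(4, 8)),
    dynamic_shapes={"x": {0: d}, "y": {0: d}},  # guard: x.size(0) == y.size(0)
)

ep.module()(torch.randn(6, 8), torch.randn(6, 8))      # guards satisfied
try:
    ep.module()(torch.randn(6, 8), torch.randn(3, 8))  # violates the equality guard
except Exception as e:
    print(type(e).__name__)
```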
Test Plan:
updated several tests
Differential Revision: D80713603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: https://github.com/pytorch/FBGEMM/pull/4816
- This is needed for the backward pass of mxfp8 MoE training with grouped gemms
- Changes:
- Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
- Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
- Add unit tests exercising torch._scaled_grouped_mm with mxfp8 inputs
- Bump FBGEMM third party submodule to include:
- https://github.com/pytorch/FBGEMM/pull/4816
- https://github.com/pytorch/FBGEMM/pull/4820
- https://github.com/pytorch/FBGEMM/pull/4821
- https://github.com/pytorch/FBGEMM/pull/4823
#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`
## Test plan
#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581)
#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...
test/test_matmul_cuda.py ......... [100%]
============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162209
Approved by: https://github.com/ngimel
# Feature
Currently, `torch._inductor.compile_aot` always uses the `WrapperFxCodegen` class. In contrast, Python and C++ codegen allow users to register custom backends. This PR brings that feature to FX codegen.
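As a heavily hedged sketch of what a custom FX backend might look like; the import path and the overridden method name below are assumptions based on this description rather than verified API:
```
# Path and method name assumed, not verified against the actual registration hook.
from torch._inductor.codegen.wrapper_fxir import WrapperFxCodegen  # path assumed

class MyFxBackend(WrapperFxCodegen):
    def compile_graph(self, gm):
        # Decide how the generated FX graph is compiled/executed for the
        # target device; here we simply run it eagerly.
        return gm.forward
    # Registration with Inductor is assumed to follow the same pattern as the
    # existing Python/C++ wrapper backends.
```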
# Test plan
Added a CI test registering a custom FX backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162317
Approved by: https://github.com/jansel