pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	fdab48a7c1	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 07:36:18 +00:00
PyTorch MergeBot	24520b8386	Revert "Enable all PIE rules on ruff (#165814 )" This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872. Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))	2025-10-18 07:21:08 +00:00
Yuanyuan Chen	c79dfdc655	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 06:40:12 +00:00
Raman Kumar	df26c51478	error message for instantiating CUDA Stream if CUDA not available (#159868 ) Fixes #159744 Summary: ``` import torch # Generate input data input_tensor = torch.randn(3, 3) stream = torch.cuda.Stream() # Call the API input_tensor.record_stream(stream) ``` ⚠️ will now show an error message `torch.cuda.Stream requires CUDA support` Pull Request resolved: https://github.com/pytorch/pytorch/pull/159868 Approved by: https://github.com/malfet, https://github.com/isuruf	2025-10-11 23:21:35 +00:00
Eddie Yan	f7082e92b3	[cuBLAS] update cuBLAS determinism docs, remove workspace requirement checks (#161749 ) Since CUDA 11.x (need to update the docs for this, current PR is saying 12.2 which is incorrect) we've been allocating cuBLAS workspaces explicitly per handle/stream combination https://github.com/pytorch/pytorch/pull/85447 According to the cuBLAS documentation, this appears to be sufficient for determinism without any explicit workspace requirements to e.g., `:4096:8` or `:16:8` as was previously expressed in PyTorch docs https://docs.nvidia.com/cuda/cublas/#results-reproducibility Planning to add an explicit determinism test as well... Pull Request resolved: https://github.com/pytorch/pytorch/pull/161749 Approved by: https://github.com/ngimel	2025-10-03 00:09:47 +00:00
Nikita Shulga	3e3e83418d	[BE] Move indexing tests to test_indexing (#160994 ) Which enables them on MPS device - xfail all `test_index_reduce` on MPS, as op is not implemented - xfail all `test_index_copy` on MPS due to the silent correctness problems, see https://github.com/pytorch/pytorch/issues/160993 - Fixed hard crash in `index_fill` and replaced `skipIfMPS` with `expectedFailueMPS` - Created issue for the lack of deterministic algorithms for MPS backend Pull Request resolved: https://github.com/pytorch/pytorch/pull/160994 Approved by: https://github.com/manuelcandales ghstack dependencies: #160850, #160889, #160926	2025-08-21 00:42:55 +00:00
Xu Han	d6786741a7	[inductor] slow test some Windows UTs. (#160267 ) When we enabled Windows inductor UTs since the PR: https://github.com/pytorch/pytorch/pull/160161/ The main branch CI occurred timeout issue, Let's move some UT to slow test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160267 Approved by: https://github.com/ezyang	2025-08-10 18:35:42 +00:00
Raphael Reme	ee2649219c	Fix max_width computation in _tensor_str._Formatter (#126859 ) Previous version of `torch._tensor_str._Formatter` was not using `PRINT_OPTS.sci_mode` for the `max_width` computation but was using it for the formatting of values leading to a weird discrepancy. Now, the code first checks if it should be in sci_mode, then compute `max_width` Here is an example to test the behavior: ```python A = torch.tensor([10, 1e-1, 1e-2]) B = torch.tensor([10, 1e-1, 1e-1]) print("================= Default =================") print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") print("================= sci_mode=False =================") with torch._tensor_str.printoptions(sci_mode=False): print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") print("================= sci_mode=True =================") with torch._tensor_str.printoptions(sci_mode=True): print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") ``` In the current version this prints: ``` ================= Default ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=False ================= tensor([ 10.0000, 0.1000, 0.0100]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=True ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 7 ``` On can see that in `sci_mode=False`, the values of A are prefixed with unneeded 0 and does not have the same `max_width` as B (It keeps the `max_width` from `sci_mode = None`) Also in `sci_mode = True`, for B, the `max_width` is 7 but each value takes 10 chars... (But it is fine as the code that uses `max_width` do not rely much on it, but still, this is missleading) After this commit, this will print ``` ================= Default ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=False ================= tensor([10.0000, 0.1000, 0.0100]) Formatter max_width: 7 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=True ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 10 ``` This also allows to align A with B for `sci_mode=False`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126859 Approved by: https://github.com/malfet	2025-08-01 15:05:41 +00:00
PyTorch MergeBot	6c6e11c206	Revert "Fix max_width computation in _tensor_str._Formatter (#126859 )" This reverts commit 1465757959dd7e63715b7621650896eca977aefa. Reverted https://github.com/pytorch/pytorch/pull/126859 on behalf of https://github.com/yangw-dev due to broke trunk with test distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single - RuntimeError: Expected to find buf7 = empty but did not find it ([comment](https://github.com/pytorch/pytorch/pull/126859#issuecomment-3137137030))	2025-07-30 16:56:32 +00:00
Raphael Reme	1465757959	Fix max_width computation in _tensor_str._Formatter (#126859 ) Previous version of `torch._tensor_str._Formatter` was not using `PRINT_OPTS.sci_mode` for the `max_width` computation but was using it for the formatting of values leading to a weird discrepancy. Now, the code first checks if it should be in sci_mode, then compute `max_width` Here is an example to test the behavior: ```python A = torch.tensor([10, 1e-1, 1e-2]) B = torch.tensor([10, 1e-1, 1e-1]) print("================= Default =================") print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") print("================= sci_mode=False =================") with torch._tensor_str.printoptions(sci_mode=False): print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") print("================= sci_mode=True =================") with torch._tensor_str.printoptions(sci_mode=True): print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}") print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}") ``` In the current version this prints: ``` ================= Default ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=False ================= tensor([ 10.0000, 0.1000, 0.0100]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=True ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 7 ``` On can see that in `sci_mode=False`, the values of A are prefixed with unneeded 0 and does not have the same `max_width` as B (It keeps the `max_width` from `sci_mode = None`) Also in `sci_mode = True`, for B, the `max_width` is 7 but each value takes 10 chars... (But it is fine as the code that uses `max_width` do not rely much on it, but still, this is missleading) After this commit, this will print ``` ================= Default ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=False ================= tensor([10.0000, 0.1000, 0.0100]) Formatter max_width: 7 tensor([10.0000, 0.1000, 0.1000]) Formatter max_width: 7 ================= sci_mode=True ================= tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10 tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 10 ``` This also allows to align A with B for `sci_mode=False`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126859 Approved by: https://github.com/malfet	2025-07-30 14:01:00 +00:00
gaoyufeng	f5cf05c983	Throw invalid_argument instead of RuntimeError when parameters exceed… (#158267 ) Throw invalid_argument instead of RuntimeError when parameters exceed limits (for torch.int32 dtype) Fixes #157707 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158267 Approved by: https://github.com/albanD	2025-07-25 23:49:46 +00:00
Jiang, Yanbing	f4d8bc46c7	Enable TF32 as fp32 internal precision for matmul/linear/conv (#157520 ) ### Description This PR is to enable TF32 as fp32 internal precision for matmul/linear/conv in `mkldnn backend`. Since we have refined fp32 precision API in https://github.com/pytorch/pytorch/pull/125888, we can easily extend the API to support TF32 for `mkldnn backend`. ``` torch.backends.mkldnn.matmul.fp32_precision = 'tf32' torch.backends.mkldnn.conv.fp32_precision = "tf32" ``` Related kernel update and UTs update are done. And the wrapper `bf32_on_and _off` is updated to `reduced_f32_on_and_off`, and it can run tests 3 times, one is reduced_f32 OFF, the other two are reduced_f32 ON (including `bf32 ON` and `tf32 ON`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/157520 Approved by: https://github.com/mingfeima, https://github.com/jansel	2025-07-17 08:57:34 +00:00
FFFrog	565fd07909	[Easy] Make the error message shown by THPUtils_unpackLong to be clearer (#157886 ) As the title stated. The error message of `THPUtils_unpackLong` is the same as `THPUtils_unpackInt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/157886 Approved by: https://github.com/Skylion007	2025-07-10 06:26:13 +00:00
Xuehai Pan	fc0376e8b1	[BE][2/6] fix typos in test/ (test/test_*.py) (#157636 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636 Approved by: https://github.com/yewentao256, https://github.com/mlazos ghstack dependencies: #156311, #156609	2025-07-09 11:02:23 +00:00
Aleksei Nikiforov	41e8b826d0	S390x update test marks (#157541 ) Update s390x test marks test_logs_out from test/dynamo/test_logging.py is updated and no longer fails on s390x. test_qengine from test/test_torch.py doesn't work on s390x: no QEngine is available. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157541 Approved by: https://github.com/huydhn	2025-07-08 09:08:33 +00:00
Yu, Guangye	1b58e7adab	fix storage use_count (#157694 ) # Motivation https://github.com/pytorch/pytorch/pull/155451 decoupled `torch._C._storage_Use_Count` from CUDA and introduced a corresponding unit test: `815545f2dd/test/test_torch.py (L257-L262)` However, this test fails when PyTorch is built with debug assertions enabled. @clee2000 disabled this UT in https://github.com/pytorch/pytorch/pull/156731. The root cause is that `_cdata` is obtained from an `intrusive_ptr`, not a `weak_intrusive_ptr`. As a result, calling `c10::weak_intrusive_ptr::use_count` on it triggers the internal assertion: `815545f2dd/c10/util/intrusive_ptr.h (L912-L917)` For example: ```python a = torch.randn(10, device=device) # refcount=1, weakcount=1 prev_cf = torch._C._storage_Use_Count(a.untyped_storage()._cdata) # violate the assertation ``` This violates the expected invariant inside `weak_intrusive_ptr::use_count`, which assumes the pointer was originally constructed from a valid `weak_intrusive_ptr`. Actually, `storage_impl` is obtained from an `intrusive_ptr`. `815545f2dd/torch/csrc/Module.cpp (L2105-L2109)` # Solution Use `c10::intrusive_ptr::use_count` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157694 Approved by: https://github.com/albanD	2025-07-08 05:53:12 +00:00
Jason Ansel	b147b6c0e3	Increase tolerance for test_corrcoef_cuda_int32 (#157206 ) Fixes #156988 Pull Request resolved: https://github.com/pytorch/pytorch/pull/157206 Approved by: https://github.com/Skylion007	2025-06-29 16:30:54 +00:00
Lukas Geiger	a92b24cd83	Prevent cudaStreamSync when indexing GPU tensors with boolean CPU mask (#156384 ) `index_put` with a boolean mask (`target[mask] = src`) causes a `cudaStreamSynchronize`. When both `mask` and `target` tensors are on GPU this is expected. However, the sync can be prevented if the `mask` is a CPU tensor. Internally a new index tensor is created with `mask.nonzero()` so we can use a non-blocking copy to transfer it to the GPU since it cannot be accidentally mutated by the user between its creation and the device copy. @ngimel Let me know if I'm missing something. I think this is useful since users can't prevent a sync simply by making sure all tensors are on the same device as with other ops. Instead one would need to do something like this which is much less readable ```python indices = mask.nonzero().squeeze(1).to("cuda", non_blocking=True) target[indices] = src ``` Fixes #12461 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156384 Approved by: https://github.com/ngimel	2025-06-28 05:41:16 +00:00
Isuru Fernando	c808af514d	Support deterministic upsample trilinear backward (#154239 ) Fixes https://github.com/pytorch/pytorch/issues/154183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154239 Approved by: https://github.com/eellison, https://github.com/albanD	2025-06-26 15:02:27 +00:00
Catherine Lee	2ff3280c77	[ez] Disable some failing periodic tests (#156731 ) test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda: Added in https://github.com/pytorch/pytorch/pull/150059 Fails in debug mode [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44706020831) [HUD commit link](`4491326fb0`) inductor/test_inductor_freezing.py::FreezingGpuTests::test_cpp_wrapper_cuda: [GH job link](https://github.com/pytorch/pytorch/actions/runs/15856606665/job/44707119967) [HUD commit link](`4491326fb0`) started failing after moving to new cuda version https://github.com/pytorch/pytorch/pull/155234 I'll ping people if this gets merged Pull Request resolved: https://github.com/pytorch/pytorch/pull/156731 Approved by: https://github.com/huydhn	2025-06-24 23:02:21 +00:00
Randolf Scholz	2e9bd03f60	Implemented `Size.__radd__` (#152554 ) Fixes #144334 Builds on top of #146834 by @khushi-411 The needed trick was to add `PyNumberMethods` because these Number Protocol appears to be responsible for `__radd__` (see https://stackoverflow.com/q/18794169) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152554 Approved by: https://github.com/albanD Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com> Co-authored-by: albanD <desmaison.alban@gmail.com>	2025-06-23 15:38:37 +00:00
Yu, Guangye	d84efde3f0	Move _storage_Use_Count to be gerneric (#155451 ) # Motivation `torch._C._storage_Use_Count` should be a generic API that is not aware of device type. It is also used in `337cd7c53d/torchtune/training/_activation_offloading.py (L323)` to do some memory optimization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155451 Approved by: https://github.com/albanD	2025-06-12 01:39:04 +00:00
Laith Sakka	8b507a9809	convert guard_size_oblivious to runtime check in infer_size_impl (#148872 ) its ok to check the requirement numel == newsize at runtime in case of unbacked instead of at compile time and assume that its true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148872 Approved by: https://github.com/bobrenjc93	2025-05-13 00:32:28 +00:00
Boyuan Feng	ffda46e3be	[Graph Partition] remove weak dep from `partition_input_names` (#152863 ) Graph partition analyzes read_writes to get partition input names. However, weak dep is fake dependency and is not actually read or written. So we should not include weak dep in graph partition input names. The following test failure is fixed by removing weak dependency from partition_input_names: `PYTORCH_TEST_WITH_INDUCTOR=1 python test/test_torch.py TestTorchDeviceTypeCUDA.test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152863 Approved by: https://github.com/eellison	2025-05-09 17:20:04 +00:00
albanD	22d1359bc6	Move warning from item to specific number conversions (#152709 ) Follow up to https://github.com/pytorch/pytorch/pull/143261 to not warn when a plain .item() is done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152709 Approved by: https://github.com/malfet, https://github.com/ngimel	2025-05-05 20:46:05 +00:00
Jagadish Krishnamoorthy	0d99b4e9e2	ROCm: Enable tf32 testing on test_nn (#148945 ) Add tf32 support for ROCm tests. test command: python test/test_nn.py -v Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-04-28 23:01:04 +00:00
Andrew M. James	0413358a77	Non-deterministic alert in histc_cuda for floating types only (#151701 ) The note about atomic add only applies for floating point. The implementation is deterministic for integer data types. fixes: #151610 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151701 Approved by: https://github.com/ngimel, https://github.com/Skylion007	2025-04-24 21:16:46 +00:00
eqy	d78d2af4e3	[CUDA][TF32] Account for TF32 in `test_corrcoef` (#151830 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151830 Approved by: https://github.com/Skylion007	2025-04-24 21:06:07 +00:00
PyTorch MergeBot	ed0d2ebaa0	Revert "Non-deterministic alert in histc_cuda for floating types only (#151701 )" This reverts commit b7a7741411585817daa81780b078fd15816f2d2d. Reverted https://github.com/pytorch/pytorch/pull/151701 on behalf of https://github.com/ZainRizvi due to Sorry but this is causing inductor tests to fail. See here for more info: test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/14586002763/job/40913547718) [HUD commit link](`b7a7741411`) ([comment](https://github.com/pytorch/pytorch/pull/151701#issuecomment-2821800837))	2025-04-22 16:07:25 +00:00
Andrew M. James	b7a7741411	Non-deterministic alert in histc_cuda for floating types only (#151701 ) The note about atomic add only applies for floating point. The implementation is deterministic for integer data types. fixes: #151610 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151701 Approved by: https://github.com/ngimel, https://github.com/Skylion007	2025-04-22 03:24:36 +00:00
Joshua Hamilton	4ce0b959ff	Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261 ) Fixes #143071 Operations performed on tensors with `requires_grad=True` such as ```python import torch x = torch.tensor(2.0, requires_grad=True) y = x ** 3 ``` and ```python x = torch.tensor(2.0, requires_grad=True) y = torch.pow(x,3) ``` are valid operations. While an operation using `numpy` like ```python import numpy as np x = torch.tensor(2.0, requires_grad=True) y = np.pow(x,3) # > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead. ``` leads to an error. However, an operation that uses `math` like ```python import math x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) ``` does not cause an error, and `y` is no longer a tensor with a gradient! This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models. To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a `tensor`, we get a single warning with: ```python x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) # > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior. # Consider using tensor.detach() first. ``` Please let me know if you have any questions 👍 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261 Approved by: https://github.com/malfet Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-04-01 00:42:46 +00:00
PyTorch MergeBot	1526ff955e	Revert "Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261 )" This reverts commit 515b45e5693dbf9dd58d8472806cbe5f49e43074. Reverted https://github.com/pytorch/pytorch/pull/143261 on behalf of https://github.com/clee2000 due to failing internal tests D72135661 ([comment](https://github.com/pytorch/pytorch/pull/143261#issuecomment-2767531682))	2025-03-31 22:19:08 +00:00
Joshua Hamilton	515b45e569	Add a warning when a tensor with requires_grad=True is converted to a scalar (#143261 ) Fixes #143071 Operations performed on tensors with `requires_grad=True` such as ```python import torch x = torch.tensor(2.0, requires_grad=True) y = x ** 3 ``` and ```python x = torch.tensor(2.0, requires_grad=True) y = torch.pow(x,3) ``` are valid operations. While an operation using `numpy` like ```python import numpy as np x = torch.tensor(2.0, requires_grad=True) y = np.pow(x,3) # > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead. ``` leads to an error. However, an operation that uses `math` like ```python import math x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) ``` does not cause an error, and `y` is no longer a tensor with a gradient! This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models. To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, when using `math.pow` on a `tensor`, we get a single warning with: ```python x = torch.tensor(2.0, requires_grad=True) y = math.pow(x,3) # > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior. # Consider using tensor.detach() first. ``` Please let me know if you have any questions 👍 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261 Approved by: https://github.com/albanD Co-authored-by: albanD <desmaison.alban@gmail.com>	2025-03-30 11:19:07 +00:00
PyTorch MergeBot	16560d4e8f	Revert "Refactor `test/test_torch.py` by moving testcase to `test_indexing.py` (#148875 )" This reverts commit 0fa0a740958ffc474843ceb1d19ee43c4bff4c09. Reverted https://github.com/pytorch/pytorch/pull/148875 on behalf of https://github.com/ZainRizvi due to That torch.version failure you got in CI was a legitimate failure and is now breaking trunk. [GH job link](https://github.com/pytorch/pytorch/actions/runs/13778023702/job/38534207536) [HUD commit link](`0fa0a74095`) ([comment](https://github.com/pytorch/pytorch/pull/148875#issuecomment-2714757288))	2025-03-11 15:27:25 +00:00
zeshengzong	0fa0a74095	Refactor `test/test_torch.py` by moving testcase to `test_indexing.py` (#148875 ) Fix `FIXME` in `test_torch.py` by moving test-cases to `test_indexing.py` ```python # FIXME: move to test indexing # FIXME: move to indexing test suite ``` - Move tests in `test/test_torch.py` to `test_indexing.py` - Remove `FIXME` comments ## TestResult ```bash pytest test/test_torch.py -k TestTorchDeviceType -vv pytest test/test_indexing.py -k TestIndexing -vv ``` ![image](https://github.com/user-attachments/assets/49a80985-e74a-4da6-a063-476e87e6aa8a) ![image](https://github.com/user-attachments/assets/77afa936-5dba-480c-b293-eb1f7bc74420) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148875 Approved by: https://github.com/soulitzer	2025-03-11 01:01:59 +00:00
Nikita Shulga	3cde4c3069	[BE] Remove `onlyCPU` decorator from test_local_scalar_dense (#148559 ) Followup from https://github.com/pytorch/pytorch/pull/145717, not sure why author thinks those tests should be limited to one architecture. And fixed similar crashes for CUDA and MPS Pull Request resolved: https://github.com/pytorch/pytorch/pull/148559 Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/seemethere	2025-03-06 17:43:02 +00:00
Sun, Jiayi	19a6cf35f6	add input shape check for _local_scalar_dense (#145717 ) Fix https://github.com/pytorch/pytorch/issues/145066. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145717 Approved by: https://github.com/malfet	2025-03-05 15:24:08 +00:00
cyy	ec2805ada8	Remove outdated CUDA version check (#148142 ) Since Torch requires CUDA>=11, some checks can be removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148142 Approved by: https://github.com/janeyx99, https://github.com/eqy	2025-03-04 03:33:44 +00:00
cyy	b0dfd242fa	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 ) py 3.9 has spawn. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705 Approved by: https://github.com/colesbury	2025-02-28 05:53:19 +00:00
PyTorch MergeBot	926b7b5027	Revert "Remove NO_MULTIPROCESSING_SPAWN checks (#146705 )" This reverts commit 40ad5e01dff05c7d64e070fb01683820e678f788. Reverted https://github.com/pytorch/pytorch/pull/146705 on behalf of https://github.com/cyyever due to Broke lint?, I guess land race with rufff update ([comment](https://github.com/pytorch/pytorch/pull/146705#issuecomment-2689603077))	2025-02-28 03:04:38 +00:00
cyyever	40ad5e01df	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 ) py 3.9 has spawn. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705 Approved by: https://github.com/colesbury	2025-02-28 00:15:32 +00:00
Mikayla Gawarecki	536bce5a04	Make Tensor.set_ validate storage_offset when sizes/strides are unchanged (#147354 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147354 Approved by: https://github.com/albanD ghstack dependencies: #147352	2025-02-27 15:48:58 +00:00
Benjamin Glass	f98cd84b04	cpp_wrapper: use largeTensorTest for test memory checks (#146991 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146991 Approved by: https://github.com/desertfire	2025-02-27 00:30:21 +00:00
cyy	c6ea4425e5	Enable some tests on Windows (#146243 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146243 Approved by: https://github.com/albanD	2025-02-05 03:54:28 +00:00
Danial Javady	bb4964013f	Add determinmistic kernel for reflection2d (#136241 ) Adds feature for #98925 Tests pass for both existing reflectionpad2d and the new one I inserted. Summary of the work: Simple conditional check for deterministic mode that will dispatch to a different kernel. This kernel does not use any atomic operations, and will lead to deterministic results as instead of going from the output to input(1:1) relationship, I am doing the opposite. I am going from input -> all outputs, which is 1 to many. These operations are done in the same order every execution as I simply traverse the data set with a grid stride loop and use simple linearized indexing into the input tensor. So each thread will compute the 4 conditionals, which are then used to see if the input has an output in the 8 regions. These 8 regions are top left, top, top right, left, right, bottom left, bottom, bottom right`. I did not focus on performance for this PR as that would expand the scope heavily. If there are any performance questions though i can answer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136241 Approved by: https://github.com/eqy, https://github.com/albanD	2025-01-29 20:34:03 +00:00
Jack Taylor	082fab0fc7	[64-bit] Int64 casting for UpSampleNearest3D (#144865 ) Fixes #144855 Follows approach in https://github.com/pytorch/pytorch/pull/141923 to use int64 types to increase INT_MAX limits Pull Request resolved: https://github.com/pytorch/pytorch/pull/144865 Approved by: https://github.com/eqy	2025-01-29 19:30:09 +00:00
gasoonjia	501c5972f0	[pytorch] raise exception when calling dim order on sparse tensor (#145888 ) This diff introduces a change to the PyTorch library that raises an exception when calling the `dim_order` method on a sparse tensor. Differential Revision: [D68797044](https://our.internmc.facebook.com/intern/diff/D68797044/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145888 Approved by: https://github.com/Jack-Khuu	2025-01-29 06:15:44 +00:00
Benjamin Glass	5aa5a5763e	[inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684 ) Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by: 1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format. 2. Using that function to explicitly disable TF32 generation when calling Triton, where needed. To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. any NVIDIA consumer GPU) without this fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684 Approved by: https://github.com/eqy	2025-01-28 22:01:08 +00:00
Dmitry Nikolaev	d4871750d9	[ROCm] Enable post-merge trunk workflow on MI300 runners; skip and fix MI300 related failed tests (#143673 ) This PR * makes changes to the workflow files and scripts so we can run CI workflows on the MI300 runners * skips and fixes several tests, failed on MI300, observed in https://github.com/pytorch/pytorch/pull/140989 Skipped due to unsupported Float8_e4m3fn data type on MI300 (need to update test code to use datatypes supported by MI300): - distributed.tensor.parallel.test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_all_gather_scaled_matmul_A_dims_\_gather_dim_\ (24 tests across inductor/distributed configs) - distributed.tensor.parallel.test_micro_pipeline_tp.py::test_fuse_scaled_matmul_reduce_scatter_A_dims_\_scatter_dim_\ (12 tests across inductor/distributed configs)) - inductor.test_loop_ordering::LoopOrderingTest::test_fp8_cast_and_t - inductor.test_loop_ordering::LoopOrderingTest::test_fp8_pattern_2 Skipped due to AssertionError on MI300: - inductor.test_mkldnn_pattern_matcher.py::test_qconv2d_int8_mixed_bf16 - distributed._tools.test_sac_ilp::TestSACILP::test_sac_ilp_case1 Skipped: - test_cuda.py::TestCudaMallocAsync::test_clock_speed - test_cuda.py::TestCudaMallocAsync::test_power_draw - test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda Skipped flaky tests on MI300: - distributed.test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda - inductor.test_cpu_repro::CPUReproTests::test_lstm_packed_unbatched_False* (256 tests) Fixed: - test_matmul_cuda.py::TestFP8MatmulCudaCUDA::test_float8_basics_cuda Features: - inductor/test_fp8.py - declare a new function to convert FP8 datatypes to ROCm supported FP8 datatypes. It keeps test names for CUDA and ROCm and allows to enable Inductor FP8 tests on CPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/143673 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/pruthvistony Co-authored-by: saienduri <saimanas.enduri@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-01-09 05:18:57 +00:00
cyy	df458be4e5	[4/N] Apply py39 ruff and pyupgrade fixes (#143257 ) ```torch/fx/passes/annotate_getitem_nodes.py``` was changed to support the new type hinting annotations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257 Approved by: https://github.com/justinchuby, https://github.com/albanD	2025-01-04 10:47:51 +00:00

1 2 3 4 5 ...

2180 Commits