`index_put` with a boolean mask (`target[mask] = src`) causes a `cudaStreamSynchronize`. When both the `mask` and `target` tensors are on the GPU, this is expected.
However, the sync can be avoided if the `mask` is a CPU tensor.
Internally, a new index tensor is created via `mask.nonzero()`, so we can transfer it to the GPU with a non-blocking copy: it cannot be accidentally mutated by the user between its creation and the device copy. @ngimel Let me know if I'm missing something.
I think this is useful since users can't avoid the sync simply by keeping all tensors on the same device, as they can with other ops. Instead, one would need to do something like the following, which is much less readable:
```python
indices = mask.nonzero().squeeze(1).to("cuda", non_blocking=True)
target[indices] = src
```
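With this change, keeping the mask on the CPU gives the same effect without that boilerplate. A minimal sketch (shapes and values are illustrative):
```python
import torch

target = torch.zeros(1024, device="cuda")
mask = torch.rand(1024) > 0.5                     # boolean mask stays on the CPU
src = torch.ones(int(mask.sum()), device="cuda")  # one value per True entry

# No cudaStreamSynchronize here: the index tensor built internally from
# mask.nonzero() is moved to the GPU with a non-blocking copy.
target[mask] = src
```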
Fixes #12461
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156384
Approved by: https://github.com/ngimel
Graph partition analyzes read_writes to get partition input names. However, a weak dep is a fake dependency: it is not actually read or written, so we should not include weak deps in the graph partition input names.
The following test failure is fixed by removing weak dependencies from partition_input_names:
`PYTORCH_TEST_WITH_INDUCTOR=1 python test/test_torch.py TestTorchDeviceTypeCUDA.test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32`
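Conceptually, the fix filters weak deps out when collecting partition inputs. A rough sketch, assuming Inductor's `WeakDep` class and a `read_writes` object with `reads`/`writes` sets (the real code differs in its details):
```python
from torch._inductor.dependencies import WeakDep

def partition_input_names(read_writes):
    # Weak deps only encode ordering between nodes; they carry no real
    # data, so they must not show up as partition inputs.
    return {
        dep.name
        for dep in set(read_writes.reads) | set(read_writes.writes)
        if not isinstance(dep, WeakDep)
    }
```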
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152863
Approved by: https://github.com/eellison
Fixes #143071
Operations performed on tensors with `requires_grad=True` such as
```python
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
```
and
```python
x = torch.tensor(2.0, requires_grad=True)
y = torch.pow(x,3)
```
are valid operations.
While an operation using `numpy` like
```python
import numpy as np
x = torch.tensor(2.0, requires_grad=True)
y = np.pow(x,3)
# > RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
```
leads to an error.
However, an operation that uses `math` like
```python
import math
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
```
does not cause an error, and `y` is no longer a tensor with a gradient!
This represents a [footgun](https://en.wiktionary.org/wiki/footgun#Noun) for some users, like myself when training small, custom, non-neural network models.
To prevent future undesired behavior, I added a warning when converting tensors with `requires_grad=True` to scalars. Now, using `math.pow` on a tensor produces a single warning:
```python
x = torch.tensor(2.0, requires_grad=True)
y = math.pow(x,3)
# > UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
# Consider using tensor.detach() first.
```
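A minimal example of the fix the warning suggests: detach before handing the value to `math`.
```python
import math

import torch

x = torch.tensor(2.0, requires_grad=True)
# Detaching first makes the tensor-to-scalar conversion explicit and
# avoids the warning; y is a plain float, so no gradient flows back to x.
y = math.pow(x.detach(), 3)
```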
Please let me know if you have any questions 👍
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143261
Approved by: https://github.com/malfet, https://github.com/albanD
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Adds a feature for #98925
Tests pass for both the existing ReflectionPad2d path and the new deterministic one I added.
**Summary of the work:**
A simple conditional check for deterministic mode dispatches to a different kernel. This kernel uses no atomic operations and produces deterministic results: instead of mapping each output back to its single input (1:1), it goes the opposite way, from each input to all of its outputs (1:many). These writes happen in the same order on every execution, since the kernel traverses the data with a grid-stride loop and uses simple linearized indexing into the input tensor.
Each thread evaluates four conditionals, which determine whether its input element contributes an output in any of the eight regions: top left, top, top right, left, right, bottom left, bottom, and bottom right. A one-dimensional Python sketch of this mapping follows below.
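To illustrate the input-to-all-outputs idea in one dimension (a Python sketch only; the actual change is a CUDA kernel, and the helper below is hypothetical):
```python
import torch

def deterministic_reflection_pad1d(x: torch.Tensor, pad: int) -> torch.Tensor:
    # Each input element writes every output it contributes to, in a fixed
    # order, so no atomics are needed and results are deterministic.
    n = x.numel()
    out = torch.empty(n + 2 * pad, dtype=x.dtype)
    for i in range(n):                        # grid-stride loop in the kernel
        out[pad + i] = x[i]                   # center copy (1:1)
        if 0 < i <= pad:
            out[pad - i] = x[i]               # left reflected region
        if n - 1 - pad <= i < n - 1:
            out[2 * (n - 1) + pad - i] = x[i] # right reflected region
    return out

x = torch.arange(1.0, 5.0)
assert torch.equal(
    deterministic_reflection_pad1d(x, 2),
    torch.nn.functional.pad(x[None, None], (2, 2), mode="reflect")[0, 0],
)
```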
I did not focus on performance in this PR, as that would expand the scope considerably, but I am happy to answer any performance questions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136241
Approved by: https://github.com/eqy, https://github.com/albanD
Triton 2.2 and greater have a bug where allowing TF32 generation for a GPU that does not support TF32 will cause code generation errors. Patch around this problem by:
1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.
To demonstrate that this fix works, try running `test/inductor/test_max_autotune.py` on a GPU with CUDA compute capability < 8 (e.g. a pre-Ampere NVIDIA consumer GPU) without this fix.
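A minimal sketch of such a capability check, assuming the standard compute-capability query (the helper name in the actual PR may differ):
```python
import torch

def tf32_is_supported() -> bool:
    # TF32 tensor cores were introduced with Ampere (sm_80); any device
    # below compute capability 8.0 cannot use the format.
    return torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 0)

# When configuring Triton codegen, only allow TF32 on capable hardware:
allow_tf32 = torch.backends.cuda.matmul.allow_tf32 and tf32_is_supported()
```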
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
This PR
* makes changes to the workflow files and scripts so we can run CI workflows on the MI300 runners
* skips and fixes several tests that failed on MI300, as observed in https://github.com/pytorch/pytorch/pull/140989
Skipped due to the unsupported Float8_e4m3fn data type on MI300 (the test code needs to be updated to use data types supported by MI300):
- distributed.tensor.parallel.test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_all_gather_scaled_matmul_A_dims_\*_gather_dim_\* (24 tests across inductor/distributed configs)
- distributed.tensor.parallel.test_micro_pipeline_tp.py::test_fuse_scaled_matmul_reduce_scatter_A_dims_\*_scatter_dim_\* (12 tests across inductor/distributed configs)
- inductor.test_loop_ordering::LoopOrderingTest::test_fp8_cast_and_t
- inductor.test_loop_ordering::LoopOrderingTest::test_fp8_pattern_2
Skipped due to AssertionError on MI300:
- inductor.test_mkldnn_pattern_matcher.py::test_qconv2d_int8_mixed_bf16
- distributed._tools.test_sac_ilp::TestSACILP::test_sac_ilp_case1
Skipped:
- test_cuda.py::TestCudaMallocAsync::test_clock_speed
- test_cuda.py::TestCudaMallocAsync::test_power_draw
- test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda
Skipped flaky tests on MI300:
- distributed.test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda
- inductor.test_cpu_repro::CPUReproTests::test_lstm_packed_unbatched_False* (256 tests)
Fixed:
- test_matmul_cuda.py::TestFP8MatmulCudaCUDA::test_float8_basics_cuda
Features:
- inductor/test_fp8.py - declares a new function that converts FP8 data types to ROCm-supported FP8 data types. It keeps test names identical for CUDA and ROCm and makes it possible to enable Inductor FP8 tests on CPU; a sketch of such a helper follows below.
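A hedged sketch of what such a conversion helper could look like (the names here are illustrative, not the PR's exact API):
```python
import torch

# MI300 (ROCm) supports the fnuz FP8 variants, so tests can translate the
# CUDA dtypes before running; unknown dtypes pass through unchanged.
_CUDA_TO_ROCM_FP8 = {
    torch.float8_e4m3fn: torch.float8_e4m3fnuz,
    torch.float8_e5m2: torch.float8_e5m2fnuz,
}

def fix_fp8_dtype_for_rocm(dtype: torch.dtype) -> torch.dtype:
    if torch.version.hip is not None:
        return _CUDA_TO_ROCM_FP8.get(dtype, dtype)
    return dtype
```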
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143673
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/pruthvistony
Co-authored-by: saienduri <saimanas.enduri@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/137267
While the test checks that the finalizer ran to make sure things are cleared, the objects were not actually collected by the gc due to the faulty tp_clear implementation: the finalizer ran, but the object stayed alive.
Fix this by giving tp_clear the same treatment as tp_traverse and tp_dealloc on Tensor: make it a single function that handles the full subclass hierarchy in one place.
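To illustrate the distinction the test missed (a generic sketch, not the PR's actual test): a finalizer running does not prove the object was collected, so a weak-reference check is also needed.
```python
import gc
import weakref

import torch

t = torch.ones(3)
finalized = []
weakref.finalize(t, finalized.append, True)
ref = weakref.ref(t)

del t
gc.collect()

assert finalized, "finalizer ran"
# The stronger check: with a working tp_clear, the gc actually frees the
# object, so the weak reference must now be dead.
assert ref() is None, "object was collected, not merely finalized"
```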
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143203
Approved by: https://github.com/ezyang, https://github.com/colesbury
ghstack dependencies: #143202
On ROCm, hipification converts std::min to ::min, but ::min does not return the right result. This affects the index_add_ operation on large tensors: we end up picking the large value instead of the maximum supported block size (128), which leads to the GPU accessing memory out of bounds.
While we wait for ::min to be fixed, we can compare with the < operator instead of relying on ::min.
Example code with failure:
```python
import torch

D = 6144
hidden_states = torch.zeros([16384, D], device="cuda:0", dtype=torch.bfloat16)
index = torch.randint(0, 16384, (1, 32, 16384), device="cuda:0", dtype=torch.int64)
output = torch.empty([1, 32, 16384, D], device="cuda:0", dtype=torch.bfloat16)
hidden_states.index_add_(0, index.view(-1), output.view(-1, D))
```
```
Traceback (most recent call last):
RuntimeError: HIP error: invalid configuration argument
```
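The workaround amounts to replacing the min call with an explicit comparison. A Python rendering of the idea (the real change is in the HIP/C++ launch code):
```python
# Instead of block = ::min(computed, MAX_BLOCK), which mis-compares after
# hipification, pick the smaller value with an explicit < comparison:
MAX_BLOCK = 128
computed = 6144  # value derived from the large tensor
block = computed if computed < MAX_BLOCK else MAX_BLOCK
assert block == 128
```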
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139087
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony