Commit Graph

395 Commits

Author SHA1 Message Date
71aefd5595 [reland] Allow setting grad_dtype on leaf tensors (#164751)
ghstack-source-id: e44b3941530be83a630ec93f1478eec741ffca2e
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/162815

Fixes #ISSUE_NUMBER

Relanding due to internal weirdness. Separate PR to codev w/o ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164751
Approved by: https://github.com/albanD
2025-10-08 20:23:13 +00:00
331191ce4b Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit 29cbcbac4215e0d9070a1b7a07ddaec9a36bbd08.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/izaitsevfb due to reverted internally, see [D83214133](https://www.internalfb.com/diff/D83214133) ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3369348172))
2025-10-05 21:39:57 +00:00
3ddf2018d0 Revert "Support setting grad_dtype on leaf tensors (#162815)"
This reverts commit dca73982c53e9f99f96246b5d9ed9bab83c7423f.

Reverted https://github.com/pytorch/pytorch/pull/162815 on behalf of https://github.com/yangw-dev due to break internal test D83850533, see more details below ([comment](https://github.com/pytorch/pytorch/pull/162815#issuecomment-3367498501))
2025-10-03 23:14:28 +00:00
dca73982c5 Support setting grad_dtype on leaf tensors (#162815)
`grad_dtype` is a new attribute on Tensor to control gradient dtype:
- Access/setting is leaf-only.
- grad_dtype is respected when (1) when assigning to .grad, and (2) in the engine after the previous node produces incoming gradients for AccumulateGrad. (See table below for details)
- Not setting grad_dtype preserves the current behavior. Accessing it returns `t.dtype`
- `grad_dtype` cannot be set when there is already a `.grad` present and the dtypes conflict.

| `grad_dtype` setting | Setting `.grad` manually | Incoming gradient from autograd engine |
|-----------------------|--------------------------|-----------------------------------------|
| **Default (tensor’s dtype)** | `.grad` must match tensor’s dtype | Engine casts incoming grad to tensor’s dtype |
| **Set to specific dtype** | `.grad` must match that dtype | Engine casts incoming grad to the specified dtype |
| **Set to `None`** | `.grad` may be any dtype | Engine does not cast; accepts incoming grad dtype as-is |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162815
Approved by: https://github.com/albanD
2025-10-02 23:09:07 +00:00
cc5d74c366 Revert "[BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)"
This reverts commit 94195a37ae4eae9c486a81b0f67725c8970f74d6.

Reverted https://github.com/pytorch/pytorch/pull/163464 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/163464#issuecomment-3353307034))
2025-09-30 18:20:20 +00:00
46ec0664e3 Remove unused PyIntXXX, THPUtils_newReal_BOOL, THPQXXX macros (#164056)
The removed macros are not used in other places of the `pytorch` GitHub org.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164056
Approved by: https://github.com/albanD
2025-09-30 13:48:25 +00:00
94195a37ae [BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)
Removes HermeticPyObjectTLS as we no longer need since torch deploy is no longer supported. PythonOpRegistrationTrampoline is also drastically simplified as and being prepped for removal in a future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163464
Approved by: https://github.com/albanD, https://github.com/Skylion007
2025-09-25 23:30:50 +00:00
29cbcbac42 [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This pr gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process

Gonna ask for review from @huydhn as there are some changes to CI.

Testing: imported internally and the failed android build seems to work now!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-25 08:53:19 +00:00
edafc902d7 Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit d1993c27ae59842c887d549a3f8936fbcd769498.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/wdvr due to reverted internally, please see D82771705 @PaliC ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3317110247))
2025-09-22 06:22:37 +00:00
5599f487ef Fully native DTensor.__new__ (#162508)
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162508
Approved by: https://github.com/ezyang
ghstack dependencies: #161695
2025-09-21 18:36:05 +00:00
76a841fd47 Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++ (#161695)
I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161695
Approved by: https://github.com/ezyang
2025-09-19 04:07:30 +00:00
d1993c27ae [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This pr gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process

Gonna ask for review from @huydhn as there are some changes to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-17 16:40:55 +00:00
a63221a335 Fix TODO in make_tensor_for_subclass_helper (#162336)
The constructor does accept a DataPtr (had to fix the DataPtr variant not accepting a SymInt, though).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162336
Approved by: https://github.com/ezyang
ghstack dependencies: #162298
2025-09-17 06:46:34 +00:00
4db203f875 Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit 05ee8114f818a95745c812c3cd7aa8e784e61a9a.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/jeanschmidt due to seems to have introduced errors in linting see https://github.com/pytorch/pytorch/actions/runs/17750689989/job/50444910643 ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3298626136))
2025-09-16 12:52:57 +00:00
05ee8114f8 [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This pr gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process

Gonna ask for review from @huydhn as there are some changes to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-16 00:37:09 +00:00
1274297e06 Remove __torch_dispatch__ check in THPVariable_make_dtensor (#162337)
We control DTensor, so we can just guarantee there isn't a programming error with __torch_dispatch__. (The guard is already less-than-perfect; see the note that the deleted comment refers to.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162337
Approved by: https://github.com/Skylion007
ghstack dependencies: #161591, #161595, #161633, #161634, #161692, #162219, #162220, #162218, #161596
2025-09-11 06:58:35 +00:00
eab2afeff7 fastpath type Tensor in THPVariable_NewWithVar (#161634)
It is cheap to do an exact check against Tensor and much faster when it works (PyType_IsSubtype does not have this fastpath, I checked [source](9ee0214b5d/Objects/typeobject.c (L2889))). Spot-checked in perf on detach-DTensor-in-a-loop benchmark; small win but clear.

Differential Revision: [D81530101](https://our.internmc.facebook.com/intern/diff/D81530101)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161634
Approved by: https://github.com/Skylion007, https://github.com/albanD
ghstack dependencies: #161591, #161595, #161633
2025-09-09 01:10:06 +00:00
8e076d889c Don't call check_has_torch_dispatch in THPVariable_NewWithVar if we already know (#161591)
We already know when we're called from make_wrapper_subclass or make_dtensor. The check isn't particularly cheap.

Differential Revision: [D81530099](https://our.internmc.facebook.com/intern/diff/D81530099)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161591
Approved by: https://github.com/ezyang
ghstack dependencies: #161466, #161586, #161590
2025-09-08 16:28:08 +00:00
88d94d17e8 Add torch.Tensor._make_dtensor to accelerate DTensor.__new__ further (#161590)
This seems to be a (very very roughly) ~8% improvement on DTensor benchmark very similar to the benchmark from #160580 (120ish usec -> 110ish usec)

Differential Revision: [D81530105](https://our.internmc.facebook.com/intern/diff/D81530105)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161590
Approved by: https://github.com/albanD
ghstack dependencies: #161466, #161586
2025-09-05 18:43:41 +00:00
0ee8a4e281 Fix accidental copy in pushPyOutToStack (#161329)
`auto` forces a copy. Confirmed this did something noticable with perf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161329
Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328
2025-08-30 06:55:43 +00:00
1b99c1859c [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-30 17:29:43 +00:00
15a50dcf1c Revert "[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)"
This reverts commit eb7365072315be2bc4259114e25e269801441748.

Reverted https://github.com/pytorch/pytorch/pull/158427 on behalf of https://github.com/ZainRizvi due to Reverting this as part of reverting the stack for https://github.com/pytorch/pytorch/pull/158288 ([comment](https://github.com/pytorch/pytorch/pull/158427#issuecomment-3099815367))
2025-07-21 23:14:57 +00:00
eb73650723 [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-18 05:23:00 +00:00
07bb097698 Fix clang-tidy bugprone* warnings (#148529)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148529
Approved by: https://github.com/ezyang
2025-06-23 23:09:56 +00:00
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
cyy
8fa81a6066 Enable misc-use-internal-linkage check and apply fixes (#148948)
Enables clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This new check was introduced in Clang-Tidy 18 and is available due to recent update of Clang-Tidy 19.

The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller.

The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by others files, then their declaring headers were included. Still some declarations were wrong and have been fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
2025-03-12 14:22:56 +00:00
cyy
8df99b6a6e Remove unneeded std::make_optional (#143575)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143575
Approved by: https://github.com/Skylion007
2024-12-31 03:08:47 +00:00
70be7900bb Fix Tensor clear to properly clear slots (#143203)
Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/137267

While the test ensures the finalizer did run to make sure things are cleared, the objects are not properly collected by the gc due to the faulty tp_clear implementation. So, while the finalizer did run, the object was still alive.
Fixing this by giving tp_clear the same treatment as tp_traverse and tp_dealloc on Tensor: make it a unique function that handles the full subclass hierarchy in one place.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143203
Approved by: https://github.com/ezyang, https://github.com/colesbury
ghstack dependencies: #143202
2024-12-14 00:17:07 +00:00
8741d72e3c move function before modifying it (#143202)
This is a no-op. Just to make the diff in the next PR easier to read

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143202
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2024-12-14 00:17:07 +00:00
882b6af219 c10::string_view -> std::string_view in autograd (#142354)
Differential Revision: D66939966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142354
Approved by: https://github.com/Skylion007
2024-12-10 15:43:41 +00:00
cyy
45ed7c13fa Remove unneeded std::make_optional (#141567)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141567
Approved by: https://github.com/albanD
2024-11-28 00:05:21 +00:00
cyy
4a2da52137 [1/N] Replace c10::sv with std::sv (#139453)
Picks some safe replacements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139453
Approved by: https://github.com/Skylion007
2024-11-01 05:39:37 +00:00
cyy
1605d4aeb8 Fix object slice (#138880)
To avoid casting Tensor to Tensorbase

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138880
Approved by: https://github.com/Skylion007
2024-10-26 00:13:19 +00:00
69ba89da11 Fix cuda sanitizer and as_subclass calls (#138218)
This fixes 4 main issues:
- The way the cuda sanitizer handle it's state is weird. In particular, because the lifetime of the Mode is linked to the submodule, then this might outlive the python runtime and other modules loaded. On my current version, this even outlives the "sys" module. Given that I'm not sure the impact of changing this lifetime handling, I'm making the exit handler a no-op when python is already dying and thus no point cleaning up.
- Adds a "disable" method to be able to test after the mode is enabled.
- Fix `Tensor.as_sublass()` to properly disable modes when creating the new Tensor object just like we already do in `make_subclass` and `make_wrapper_subclass`. The change here is just to apply the exact same treatment to it.
- ~Fix `Tensor.as_subclass()` not to propagate autograd as there is no valid backward associated here.~ We have test that check that this behavior happen so I guess this is not an obvious bugfix and expected behavior. Reverted that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138218
Approved by: https://github.com/ngimel
2024-10-17 21:18:32 +00:00
c0deec120f Fix resurrection logic to trigger early enough (#137267)
Fixes https://github.com/pytorch/pytorch/issues/136358

The bug here is that the Tensor object is actually 2 classes: `Tensor` from `_tensor.py` and `TensorBase` from c++.

Before this PR, they have the following gc methods:
Tensor:
 - tp_clear subtype_clear
 - tp_traverse THPVariable_subclass_traverse
 - tp_dealloc THPVariable_subclass_dealloc

TensorBase:
- tp_clear THPVariable_clear
- tp_traverse THPFunction_traverse (fake function that is just an error)
- tp_dealloc object_dealloc

The problem is that when clear is called on the Tensor, subtype_clear is going to clear the things owned by the "Tensor" type, in particular, its `__dict__` attribute, before delegating to the TensorBase clear where we detect that resurrection needs to happen and skip it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137267
Approved by: https://github.com/ezyang, https://github.com/kshitij12345
2024-10-04 21:13:54 +00:00
adc48a5b52 Python CAPI cleanup (#137266)
This is unrelated to anything else, but as I was going through the code, fixing bad patterns and a refcount bug (which is unlikely to cause any real issue tbh)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137266
Approved by: https://github.com/Skylion007
2024-10-03 17:55:48 +00:00
8962610247 [BE][clang-format] make macro PyObject_HEAD_INIT(type) and PyVarObject_HEAD_INIT(type, size) have its own line (#136949)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136949
Approved by: https://github.com/albanD, https://github.com/eqy
ghstack dependencies: #136945
2024-10-02 18:39:22 +00:00
cf31724db7 Fix and improvements to toward 3.13t (#136319)
Small part of https://github.com/pytorch/pytorch/pull/130689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136319
Approved by: https://github.com/malfet, https://github.com/Skylion007
2024-09-20 04:22:18 +00:00
cyy
929d2f8253 [3/N] Fix clang-tidy warnings in torch/csrc/autograd (#133389)
Follows #133295
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133389
Approved by: https://github.com/Skylion007
2024-08-16 00:57:54 +00:00
cyy
29861779ce [2/N] Change #include <c10/util/Optional.h> to #include <optional> (#130236)
Follows  #128301. The changes were made by grep and sed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130236
Approved by: https://github.com/ezyang
2024-07-09 03:17:24 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
27a14405d3 enable device index check for all device types (#126767)
enable device index check for all device types for grad setter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126767
Approved by: https://github.com/albanD
2024-06-27 01:09:53 +00:00
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
cd06ae0cb8 Relax use_count constraints for swap_tensors when AccumulateGrad holds a reference (#127313)
### Before this PR:
`torch.utils.swap_tensors(a, b)` required the `use_count` of `a` and `b` to be 1

```python
a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 4)
out = a * 2
out.sum().backward()
# Calling swap_tensors here would fail due to the reference held by AccumulateGrad node, which is not cleaned up after backward
# torch.utils.swap_tensors(a, b)
del out
# Calling swap_tensors here would pass
torch.utils.swap_tensors(a, b)
```
### After this PR:
`torch.utils.swap_tensors(a, b)` requires the `use_count` of `a` and `b` to be 1 or 2 IF the second reference is held by `AccumulateGrad`

A pre-hook will be registered on the `AccumulateGrad` node so that it will fail if it is called (i.e. if user attempts to backward through the graph).

```python
a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 4)
out = a * 2
out.sum().backward()
# Calling swap_tensors here is ok
torch.utils.swap_tensors(a, b)
# If we ever backward to the AccumulateGrad node it will error that it was poisoned by swap_tensors
```

### Application to `nn.Module`

This issue is especially pertinent in context of `nn.Module` where parameters will have `AccumulateGrad` nodes initialized after forward. Specifically, this is intended to address https://github.com/pytorch/pytorch/pull/126814#issuecomment-2127777866. Previously, this would fail at the `m.cpu()` but we want users to be able to do something like the following, and instead raise an error if the user ever attempts to backward through the poisoned `AccumulateGrad` node

```python
import torch
import torch.nn as nn
m = nn.Linear(3, 5)
inp = torch.randn(2, 3)
out = m(inp)
out.sum().backward()
m.cpu()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127313
Approved by: https://github.com/soulitzer
2024-05-30 07:06:55 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
71467abc44 Changes to compile with 3.13 (#126033)
This is mainly:
- Fix refcount access macro
- Hide all the Dynamo code that needs update as usual
- Add _PyWeakref_ClearRef as an extern provided by CPython. Including the pycore header that defines it would require raw c include shenanigans that I don't think are worth it.
This allows to build both with regular and nogil version of cpython. Both

Note that this requires the 3.13 branch at least past [d3094744d40de2deefbda9b1996d5029c9ebf0b0](d3094744d4) which we need for mimalloc include and weakref function being exposed.

debug-only issues in pybind11 with PyMem_MALLOC vs PyObject_MALLOC being should be synced either by updating pybind or cpython. @colesbury I can send a PR to ifdef the proper use in pybind if you think that this is the best solution here?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126033
Approved by: https://github.com/colesbury
2024-05-14 02:14:57 +00:00
82edc8b5d5 [NT] Make NestedTensor register as having symbolic sizes/strides (#124687)
Fixes #123698

This PR makes TensorImpl::has_symbolic_sizes_strides return false for NestedTensors.

1. It passes in the actual sizes when we call `_make_wrapper_subclass` - this is the change that makes the subclass register as `has_symbolic_sizes_strides() == True`
2. It adds a field to `_make_wrapper_subclass` where an explicit `numel` can be provided. This allows us to skip the numel computation for the storage, which previously fails due to arithmetic on NestedInts.
3. Implements `aten::numel` for NJT - this is separate from the overridden numel in `make_wrapper_subclass` for now. Note also that this means that we leave `dispatch_sizes_strides_policy="sizes"`, so that we call into the custom `numel` implementation (as well as `sizes` and `strides`), because `numel` cannot currently be computed from `sizes` for NJT.

Note also that this depends on #121361, because calling TensorImpl::set_sizes_and_strides() tries to clone the sizes into the tensor, which means that we need `clone` to be implemented on NestedInt.

Differential Revision: [D57225736](https://our.internmc.facebook.com/intern/diff/D57225736)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124687
Approved by: https://github.com/albanD
2024-05-13 16:50:25 +00:00
19a9de114a Forbid subclassing _TensorBase directly (#125558)
As per title.
This ensures that all the places where we assume the method defined in _tensor.py do exist.

BC-Breaking: This is bc-breaking as the user cannot subclass this private class anymore.
You should replace any use of _TensorBase to Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125558
Approved by: https://github.com/ezyang
2024-05-08 20:29:29 +00:00
b119e1bcc2 Fix refcount handling for dtype, layout and memory format (#125271)
Finish fixing https://github.com/pytorch/pytorch/issues/124868
re-use our wrap() utils as much as possible and NewRef in other places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125271
Approved by: https://github.com/colesbury
2024-05-02 02:34:34 +00:00