Commit Graph

159 Commits

Author SHA1 Message Date
cc5d74c366 Revert "[BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)"
This reverts commit 94195a37ae4eae9c486a81b0f67725c8970f74d6.

Reverted https://github.com/pytorch/pytorch/pull/163464 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/163464#issuecomment-3353307034))
2025-09-30 18:20:20 +00:00
50c0550f5a Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)
See comment on the macro definition. In short, pybind11 3.x
added `py::native_enum`, and also had to add overhead for that new way
to bind enums on the critical path for calling functions that take
regular old `py::enum_`s as arguments (for example, `__eq__`).

Differential Revision: [D82873169](https://our.internmc.facebook.com/intern/diff/D82873169/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163527
Approved by: https://github.com/ezyang
2025-09-26 17:59:22 +00:00
94195a37ae [BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)
Removes HermeticPyObjectTLS as we no longer need since torch deploy is no longer supported. PythonOpRegistrationTrampoline is also drastically simplified as and being prepped for removal in a future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163464
Approved by: https://github.com/albanD, https://github.com/Skylion007
2025-09-25 23:30:50 +00:00
70d36e047d Making batching rule for F.embedding DTensor-aware (#162117)
`vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's
batching rule generates a new tensor via at::arange, at::arange
generates a regular tensor, and DTensor rightfully errors on mixed
DTensor-regular Tensor operations.

This PR fixes the problem by activating DTensor implicit replication on
just the at::arange and the subsequent add operation.

In order to accomplish this I move the DTensor implicit replication flag
to C++ (most batching rules are in C++).

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162117
Approved by: https://github.com/bdhirsh
2025-09-05 21:40:14 +00:00
d9d6dde0f4 Leak Python filenames so that we can give good dispatcher errors. (#160418)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160418
Approved by: https://github.com/zou3519
2025-08-31 22:31:39 +00:00
1196bb1c2e Add utility to get computed kernel in torch.library (#158393)
Adds `OperatorEntry::getComputedKernelForDispatchKey` which returns the KernelFunction corresponding to `OperatorEntry.dispatchTable_[dispatch_ix]` for a given dispatch key
- Specifically it returns a `SafeKernelFunction` that holds a `KernelToken`. This `KernelToken` is registered to the `KernelFunction` in `OperatorEntry.kernels_` and will be invalidated when the `KernelFunction` is destructed (i.e. when the `AnnotatedKernel` that holds this `KernelFunction` is removed from `kernels_`, which happens when the corresponding impl is deregistered).
- `SafeKernelFunction` can be called via `callBoxed`, the validity of the token will be checked before this happens
- `SafeKernelFunction` is pybinded and `getComputedKernelForDispatchKey` is exposed to the frontend ia `torch.library.get_kernel`

Related to https://github.com/pytorch/pytorch/issues/155330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158393
Approved by: https://github.com/albanD
2025-08-13 21:00:59 +00:00
1b99c1859c [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-30 17:29:43 +00:00
6162e650b0 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-29 17:40:49 +00:00
f8fafdc7a6 Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit ab26d4fbeb5bc4b4e6ef1c37fbec9fab6e5a9edd.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Reverting as per offline discussion to fix internal breaks.  @PaliC will reland this as a codev diff. Instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3119037960))
2025-07-25 16:09:39 +00:00
ab26d4fbeb [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-23 20:27:28 +00:00
ee5a434f8c Revert "[BE] remove torch deploy - conditionals (#158288)"
This reverts commit 1a4268b8113d5160d71225bab980f03c2318a0a4.

Reverted https://github.com/pytorch/pytorch/pull/158288 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, see D78496147 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158288#issuecomment-3099826158))
2025-07-21 23:17:39 +00:00
15a50dcf1c Revert "[BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)"
This reverts commit eb7365072315be2bc4259114e25e269801441748.

Reverted https://github.com/pytorch/pytorch/pull/158427 on behalf of https://github.com/ZainRizvi due to Reverting this as part of reverting the stack for https://github.com/pytorch/pytorch/pull/158288 ([comment](https://github.com/pytorch/pytorch/pull/158427#issuecomment-3099815367))
2025-07-21 23:14:57 +00:00
eb73650723 [BE] Make PyObjectSlot use a global PyInterpreter and remove (#158427)
This PR is a bit more involved but effectively works to drastically simplify PyObjectSlot and PyInterpreter.
1) For PyObjectSlot we now use a global pyinterpreter since there only is one. From here we change all of the call sites to rely on this assumption.
2) We also remove the "tags" of the PyInterpreter by deprecating `PyInterpreterStatus`.

For the reviewer, sadly it seems like `functorch/csrc/dim/dim.cpp` needed to get linted, so there is an unreadable amount of changes there. Fortunately, the only actual change in the file is as follows which just removes `getPyInterpreter()` from  the `check_pyobj` call.

```
 mpy::handle handle_from_tensor(Arena& A, TensorRef t) {
-    // fast case: tensor is live in python
-    std::optional<PyObject*> mb_obj =
-        t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(getPyInterpreter(), /*ignore_hermetic_tls=*/false);
-    if (mb_obj.has_value() && !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
-        return *mb_obj;
-    }
-    return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
-}
-}
+  // fast case: tensor is live in python
+  std::optional<PyObject*> mb_obj =
+      t->unsafeGetTensorImpl()->pyobj_slot()->check_pyobj(
+          /*ignore_hermetic_tls=*/false);
+  if (mb_obj.has_value() &&
+      !t->unsafeGetTensorImpl()->pyobj_slot()->owns_pyobj()) {
+    return *mb_obj;
+  }
+  return A.autorelease(mpy::object::checked_steal(THPVariable_Wrap(*t)));
+}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158427
Approved by: https://github.com/albanD
2025-07-18 05:23:00 +00:00
1a4268b811 [BE] remove torch deploy - conditionals (#158288)
This PR is part of the work to deprecate torch::deploy in OSS. Effectively it does 3 things to get started.
1. Remove test_deploy_interaction as we no longer need to worry about this
2. Remove all torch._running_with_deploy checks and use the False path always (surfaced 1)
3. Remove `USE_DEPLOY` and switch to the default path always

Note: MyPy does fail on a bunch of things here as a bunch of older files are touched. It may be better to fix these things on a separate PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158288
Approved by: https://github.com/albanD
2025-07-17 05:56:07 +00:00
aa2d54148d Add AOTDispatcher config to set backward autocast behavior (#156356)
This PR adds a new config `backward_pass_autocast`, to set the backward autocast
behavior. It does not change the existing behavior.

The reason why we need this is that torch.compile acquires a forward and
backward graph at the time of the forward pass. This means that
implemented naively, if there are any context managers active outside
the call to torch.compile, the backward graph will also get the
behaviors from those context managers. This PR gives users a way to
tweak the autocast behavior of the backward pass.

Please see torch._functorch.config for the options to the
`backward_pass_autocast` config.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156356
Approved by: https://github.com/bdhirsh
ghstack dependencies: #155354
2025-06-27 14:58:58 +00:00
ced90016c1 [BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316
2025-06-23 02:57:41 +00:00
035a68d25a Revert "[BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)"
This reverts commit ee72815f1180fe2d8bcdb23493999256169ac2fa.

Reverted https://github.com/pytorch/pytorch/pull/156317 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:56 +00:00
ee72815f11 [BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316
2025-06-22 08:43:41 +00:00
762844355e Make DispatchKeySet serializable; add __eq__ (#152732)
These seem like reasonable things to add. Also fixes a bug in vLLM for
me.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152732
Approved by: https://github.com/bdhirsh
2025-05-03 14:40:06 +00:00
a34c28e0d2 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an addtional `bytes` object called `guards_state` which should contain all the information needed for deserializing guards later.

When we load back guards state, we will set `guards_serialization_mode` is set to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of a single diff to support every guards.

We kick off the work from TENSOR_MATCH from this diff.

# Testing

For each type of guard we will test it like the following:
1. Use guard_filter_fn to select 1 type of guard each time.
2. Call InstructionTranslator directly on an example function to get OutputGraph and CheckFunctionManager (reference guard manager)
3. Serialize->deserialize the output graph state and re-build the guards with a new CheckFunctionManager (loaded guard manager)
4. Throw a set of example inputs to both reference and loaded guard manager to see if their behavior match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-25 14:16:23 +00:00
b1d055fd6a Revert "[dynamo] Add guard serialization for tensor matches. (#151318)"
This reverts commit 81c4369d813facf39313dfd481adc71704cbc2c1.

Reverted https://github.com/pytorch/pytorch/pull/151318 on behalf of https://github.com/zhxchen17 due to macos test failing ([comment](https://github.com/pytorch/pytorch/pull/151318#issuecomment-2828638168))
2025-04-24 19:22:45 +00:00
81c4369d81 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an addtional `bytes` object called `guards_state` which should contain all the information needed for deserializing guards later.

When we load back guards state, we will set `guards_serialization_mode` is set to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of a single diff to support every guards.

We kick off the work from TENSOR_MATCH from this diff.

# Testing

For each type of guard we will test it like the following:
1. Use guard_filter_fn to select 1 type of guard each time.
2. Call InstructionTranslator directly on an example function to get OutputGraph and CheckFunctionManager (reference guard manager)
3. Serialize->deserialize the output graph state and re-build the guards with a new CheckFunctionManager (loaded guard manager)
4. Throw a set of example inputs to both reference and loaded guard manager to see if their behavior match.

Differential Revision: [D72987485](https://our.internmc.facebook.com/intern/diff/D72987485/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-24 18:07:01 +00:00
cyy
8fa81a6066 Enable misc-use-internal-linkage check and apply fixes (#148948)
Enables clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This new check was introduced in Clang-Tidy 18 and is available due to recent update of Clang-Tidy 19.

The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller.

The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by others files, then their declaring headers were included. Still some declarations were wrong and have been fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
2025-03-12 14:22:56 +00:00
cyy
53e356a1c0 [2/N] Enable cppcoreguidelines-special-member-functions (#138670)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138670
Approved by: https://github.com/sraikund16
2024-10-24 04:35:18 +00:00
a040c4a260 Use std::move on stringstream to prevent unnecessary copy. (#138065)
- Takes advantage of C++20's improved handling of move semantics for std::basic_stringbuf.
- Reduces unnecessary copying and improves memory efficiency, especially for long formatted strings.

Benchmark(proof of concept): https://quick-bench.com/q/qohAu0ARH3vSDyKVsoKEfXOO6BI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138065
Approved by: https://github.com/Skylion007
2024-10-16 21:35:10 +00:00
796c3c3415 Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221)"
This reverts commit 7e13e7dd7e5fc20c0420605aeecb0f902af5326c.

Reverted https://github.com/pytorch/pytorch/pull/137221 on behalf of https://github.com/jovianjaison due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/137221#issuecomment-2397957081))
2024-10-07 21:46:13 +00:00
7e13e7dd7e Disallow FakeTensor.data_ptr access in eager mode (#137221)
Previously we raised a deprecation warning (beginning PyTorch 2.4). Now
that we are on 2.6, we're completing the deprecation and disallowing
this behavior.

Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137221
Approved by: https://github.com/albanD, https://github.com/eellison
2024-10-03 23:47:55 +00:00
144fde4fd2 [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Need to run inductor/test_cpu_select_algorithm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Roy Hvaara <roy@lightyear.no>
2024-09-05 23:23:17 +00:00
cyy
8967d55b01 [18/N] Fix clang-tidy warnings in jit (#132963)
Follows #132753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132963
Approved by: https://github.com/Skylion007
2024-08-09 01:27:32 +00:00
2764bee942 Revert "[MPS] Add support for autocast in MPS (#99272)"
This reverts commit 6919e8baaba391ced7b4acaa553d6ea1f3b30e79.

Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](6919e8baab) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))
2024-08-05 19:59:04 +00:00
6919e8baab [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-08-05 17:02:30 +00:00
466ea8ce54 Add fallback() to torch.library (#131707)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131707
Approved by: https://github.com/zou3519
2024-07-27 18:02:35 +00:00
cyy
a4be5cb50e Simplify some c++ code (#131612)
The simplifications were discovered by static analysis tools.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131612
Approved by: https://github.com/ezyang
2024-07-25 15:07:37 +00:00
f52b2ee90f Modularize aten parameter parser and checker (#125308)
In this PR, we abstracted the different types of aten operation parameters as `ParameterMetadata`. This structure intends to be used to represent and store the metadata of each aten operation parameter. Currently, it only supports `Tensor`, `TensorList`, and `Scalar`.

```C++
using ParameterMetadataValue = std::variant<TensorMetadata, std::vector<TensorMetadata>, c10::Scalar>;
```

With this PR, we can extend other parameter-type support in a more modularize way, like `string`, `int`, `double`.

Differential Revision: [D59399546](https://our.internmc.facebook.com/intern/diff/D59399546)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125308
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/atalman
2024-07-11 13:17:25 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
07450e9713 Revert "[MPS] Add support for autocast in MPS (#99272)"
This reverts commit 6240cfd5c751bea6ca91dc765085e1d871b22345.

Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/jeanschmidt due to introduced breakages in trunk ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2203033719))
2024-07-02 12:29:51 +00:00
6240cfd5c7 [MPS] Add support for autocast in MPS (#99272)
Fixes https://github.com/pytorch/pytorch/issues/88415

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272
Approved by: https://github.com/malfet
2024-07-02 01:49:52 +00:00
e84cf805d2 Revert "Modularize aten parameter parser and checker (#125308)"
This reverts commit 60bbdc0b40656cf70b2b098c7d715e19f031fb0d.

Reverted https://github.com/pytorch/pytorch/pull/125308 on behalf of https://github.com/fbgheith due to test failures when run by meta ([comment](https://github.com/pytorch/pytorch/pull/125308#issuecomment-2181327211))
2024-06-20 18:52:05 +00:00
60bbdc0b40 Modularize aten parameter parser and checker (#125308)
In this PR, we abstracted the different types of aten operation parameters as `ParameterMetadata`. This structure intends to be used to represent and store the metadata of each aten operation parameter. Currently, it only supports `Tensor`, `TensorList`, and `Scalar`.

```C++
using ParameterMetadataValue = std::variant<TensorMetadata, std::vector<TensorMetadata>, c10::Scalar>;
```

With this PR, we can extend other parameter-type support in a more modularize way, like `string`, `int`, `double`, and other different types to be summarized as the following list. The list is collected from all aten operations and ordered by the number of being used.

- `Tensor`
- `bool`
- `int64_t`
- `TensorList`
- `Scalar`
- `c10::SymIntArrayRef`
- `::std::optional<Tensor>`
- `IntArrayRef`
- `double`
- `c10::SymInt`
- `::std::optional<ScalarType>`
- `::std::optional<double>`
- `::std::optional<bool>`
- `::std::optional<Layout>`
- `::std::optional<Device>`
- `::std::optional<int64_t>`
- `Dimname`
- `::std::optional<Generator>`
- `c10::string_view`
- `::std::optional<c10::string_view>`
- `OptionalIntArrayRef`
- `::std::optional<Scalar>`
- `OptionalSymIntArrayRef`
- `::std::optional<MemoryFormat>`
- `::std::optional<c10::SymInt>`
- `ScalarType`
- `ArrayRef<Scalar>`
- `DimnameList`
- `::std::optional<ArrayRef<double>>`
- `::std::array<bool,3>`
- `::std::optional<DimnameList>`
- `c10::List<::std::optional<Tensor>>`
- `::std::array<bool,2>`
- `Storage`
- `::std::array<bool,4>`
- `Device`
- `DeviceIndex`
- `ITensorListRef`
- `Stream`
- `Layout`
- `MemoryFormat`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125308
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-06-15 09:18:44 +00:00
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
cyy
f8c6d43524 Concat namespaces and other fixes in torch/csrc/utils (#127833)
It contains formatting and other minor fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127833
Approved by: https://github.com/ezyang
2024-06-04 15:12:45 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
978b572652 Add registration API for torch.compile-eager (#121387)
This PR is a follow-up of RFC https://github.com/pytorch/pytorch/issues/115545.

In this PR, we intend to provide a registration API dedicated to eager-through-torch.compile. The major workflow of this API will be as follows.

- Load cache
- Check cache according to the input tensors
  - Cache Hit: Run the cached kernel directly
  - Cache Miss: Run the AOTI to produce kernel and run the produced kernel. If AOTI fails to produce the kernel, invoke the python fallback function.

Currently, this PR always fallback to python kernel now and cache mechanism will be implemented in another PR - https://github.com/pytorch/pytorch/pull/116368

Differential Revision: [D57164385](https://our.internmc.facebook.com/intern/diff/D57164385)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121387
Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/zou3519, https://github.com/jgong5
2024-05-10 00:30:27 +00:00
ca0f070065 Revert "Add registration API for torch.compile-eager (#121387)"
This reverts commit 61e937f3d6b904d6706594c1b3cfd7d0e56f9663.

Reverted https://github.com/pytorch/pytorch/pull/121387 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/121387#issuecomment-2087541956))
2024-04-30 22:13:04 +00:00
61e937f3d6 Add registration API for torch.compile-eager (#121387)
This PR is a follow-up of RFC https://github.com/pytorch/pytorch/issues/115545.

In this PR, we intend to provide a registration API dedicated to eager-through-torch.compile. The major workflow of this API will be as follows.

- Load cache
- Check cache according to the input tensors
  - Cache Hit: Run the cached kernel directly
  - Cache Miss: Run the AOTI to produce kernel and run the produced kernel. If AOTI fails to produce the kernel, invoke the python fallback function.

Currently, this PR always fallback to python kernel now and cache mechanism will be implemented in another PR - https://github.com/pytorch/pytorch/pull/116368

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121387
Approved by: https://github.com/desertfire, https://github.com/jansel, https://github.com/zou3519, https://github.com/jgong5
2024-04-27 12:49:58 +00:00
e62169a8fa Support torchbind op dispatch in python (#123367)
We override the `__call__` method and register fake, functional, proxy default dispatch mode implementation in its python_key_mode_table.

The idea is:
1. when inputs contains FakeScriptObject,  we dispatch it through _get_dispatch mechanism. We implement dispatch mode keys automatically in the operator's constructor.
2. when inputs are not fakified, we dispatch through the original c++ dispatcher.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123367
Approved by: https://github.com/zou3519
2024-04-19 17:17:27 +00:00
889e3eeed3 Avoid cuda init to FakeTensorMode (#124413)
Also partially fixes #122109

This PR:
- We add a C++ flag (only_lift_cpu_tensors) to toggle the
  torch.tensor(1, device='cuda') ctor strategy.
  When false (default), it does the current PyTorch behavior
  of unconditionally constructing a concrete CUDA tensor then calling
  lift_fresh on it. When true, we instead construct a concrete CPU
  tensor, call lift_fresh, and then call Tensor.to(device) (under any ambient
  modes).
- FakeTensorMode flips this flag depending on if CUDA is available or
  not. We don't unconditionally set the flag to True because that is
  likely BC-breaking.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124413
Approved by: https://github.com/eellison
2024-04-19 02:39:35 +00:00
648c39c47d Add OpOverload.redispatch; use it in new custom ops API (#124089)
A kernel has "dispatcher convention" if there is an additional keyset
arg at the beginning of the argument list. This PR:
- adds a way to register kernels with dispatcher_convention using
  Library.impl (pass dispatcher_convention = True)
- adds OpOverload.redispatch

We use both of the above in the new custom ops API: we register the
autograd kernel in dispatcher convention so that we can actually call
redispatch like how pytorch built-in ops do it.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124089
Approved by: https://github.com/albanD
ghstack dependencies: #123937, #124064, #124065, #124066, #124071
2024-04-18 12:48:04 +00:00
47dbfecd37 Rename impl_abstract to register_fake, part 1/2 (#123937)
This PR:
- adds a new torch.library.register_fake and deprecates
  torch.library.impl_abstract. The motivation is that we have a lot of
  confusion around the naming so we are going to align the naming with
  the actual subsystem (FakeTensor).
- renames `m.impl_abstract_pystub("fbgemm_gpu.sparse_ops")` to
  `m.has_python_registration("fbgemm_gpu.sparse_ops")`. No deprecation
  here yet; I need to test how this works with static initialization.
- Renames a bunch of internals to match (e.g. abstractimplpystub ->
  pystub)

I'm scared to rename the Python-side internal APIs (e.g.
torch._library.abstract_impl) because of torch.package concerns. I'll do
that in its own isolated PR next just in case it causes problems.

DEPRECATION NOTE: torch.library.impl_abstract was renamed to to
torch.library.register_fake. Please use register_fake. We'll delete
impl_abstract in a future version of PyTorch.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123937
Approved by: https://github.com/albanD
2024-04-17 12:46:01 +00:00