75 Commits

Author SHA1 Message Date
36871622f1 [2/N] Mark unused parameters in C++ code (#165121)
This is follow-up of #164912 to mark unused C++ parameters to improve code readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165121
Approved by: https://github.com/Skylion007
2025-10-15 03:04:39 +00:00
c6329524d8 Revert "Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)"
This reverts commit 50c0550f5a5b1e35885d892081a7d5115d8b4489.

Reverted https://github.com/pytorch/pytorch/pull/163527 on behalf of https://github.com/swolchok due to breaking import torch in debug builds, see #164297 ([comment](https://github.com/pytorch/pytorch/pull/163527#issuecomment-3361919142))
2025-10-02 15:42:42 +00:00
50c0550f5a Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)
See comment on the macro definition. In short, pybind11 3.x
added `py::native_enum`, and also had to add overhead for that new way
to bind enums on the critical path for calling functions that take
regular old `py::enum_`s as arguments (for example, `__eq__`).

Differential Revision: [D82873169](https://our.internmc.facebook.com/intern/diff/D82873169/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163527
Approved by: https://github.com/ezyang
2025-09-26 17:59:22 +00:00
ced90016c1 [BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316
2025-06-23 02:57:41 +00:00
035a68d25a Revert "[BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)"
This reverts commit ee72815f1180fe2d8bcdb23493999256169ac2fa.

Reverted https://github.com/pytorch/pytorch/pull/156317 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:56 +00:00
ee72815f11 [BE][7/16] fix typos in torch/ (torch/csrc/) (#156317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316
2025-06-22 08:43:41 +00:00
cyy
40fb738197 Use Wextra-semi (#140236)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140236
Approved by: https://github.com/ezyang
2024-11-13 02:15:16 +00:00
cf31724db7 Fix and improvements to toward 3.13t (#136319)
Small part of https://github.com/pytorch/pytorch/pull/130689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136319
Approved by: https://github.com/malfet, https://github.com/Skylion007
2024-09-20 04:22:18 +00:00
cyy
f8c6d43524 Concat namespaces and other fixes in torch/csrc/utils (#127833)
It contains formatting and other minor fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127833
Approved by: https://github.com/ezyang
2024-06-04 15:12:45 +00:00
7931eee5c5 Support torch.dtype as parameter in pybind11 cpp extension. (#126865)
Support torch.dtype as parameter in pybind11 cpp extension.
Example:
`
cpp_extension.my_ops(self, other, torch.dtype)
`

@ezyang @bdhirsh
Co-authored-by: Edward Z. Yang <ezyang@mit.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126865
Approved by: https://github.com/ezyang
2024-05-29 23:19:32 +00:00
b119e1bcc2 Fix refcount handling for dtype, layout and memory format (#125271)
Finish fixing https://github.com/pytorch/pytorch/issues/124868
re-use our wrap() utils as much as possible and NewRef in other places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125271
Approved by: https://github.com/colesbury
2024-05-02 02:34:34 +00:00
73744a2c00 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-26 16:17:54 +00:00
e04c7b19f4 Revert "torch.mtia module for MTIA device backend (#123612)"
This reverts commit 381653de63df4b1b31cc95531320caf83b1b60b3.

Reverted https://github.com/pytorch/pytorch/pull/123612 on behalf of https://github.com/jeffdaily due to this PR broke ROCm with message RuntimeError: Cannot have MTIA with other devices ([comment](https://github.com/pytorch/pytorch/pull/123612#issuecomment-2077649762))
2024-04-25 16:06:46 +00:00
381653de63 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------

Differential Revision: [D56443356](https://our.internmc.facebook.com/intern/diff/D56443356)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-24 20:51:20 +00:00
929242a15c Revert "torch.mtia module for MTIA device backend (#123612)"
This reverts commit d7e1bf9ff908d2a9c20d5354426d34c539fcb7a1.

Reverted https://github.com/pytorch/pytorch/pull/123612 on behalf of https://github.com/jeffdaily due to This broke ROCm. see test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))
2024-04-19 22:44:26 +00:00
d7e1bf9ff9 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------
@exported-using-ghexport

Differential Revision: [D52923602](https://our.internmc.facebook.com/intern/diff/D52923602/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-18 17:38:06 +00:00
cyy
6da0e7f84b [Clang-tidy header][17/N] Apply clang-tidy on headers in torch/csrc/cuda (#117829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117829
Approved by: https://github.com/albanD
2024-01-26 13:33:24 +00:00
cyy
20f769544c [12/N] Apply clang-tidy and fix warnings in headers of torch/csrc (#116486)
This PR follows #116751.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116486
Approved by: https://github.com/albanD
2024-01-10 08:48:14 +00:00
0aa50909f3 Revert "[12/N] Apply clang-tidy and fix warnings in headers of torch/csrc (#116486)"
This reverts commit 5aa258eb09d5ecd62aea4d2bd02bbfa5eda0d554.

Reverted https://github.com/pytorch/pytorch/pull/116486 on behalf of https://github.com/izaitsevfb due to Reverting, as it depends on https://github.com/pytorch/pytorch/pull/116353, which has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/116486#issuecomment-1876042948))
2024-01-03 22:18:54 +00:00
cyy
5aa258eb09 [12/N] Apply clang-tidy and fix warnings in headers of torch/csrc (#116486)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116486
Approved by: https://github.com/albanD
2023-12-30 18:38:53 +00:00
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
cyy
dc1a3581e4 Remove c10::variant (#112725)
Maybe it's time to remove.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112725
Approved by: https://github.com/albanD
2023-11-03 18:31:58 +00:00
cyy
646fa36875 Add const reference in opportunities detected by clang-tidy (#105931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931
Approved by: https://github.com/Skylion007
2023-07-26 21:38:10 +00:00
24bf15fe8d Support record_stream in dispatch mode (#99529)
Summary:
Issuing a `t.record_stream(s)` call while a `TorchDispatchMode` is active was throwing because PyTorch was unable to convert a c10::Stream back to a Python object. It's now fixed.

Fixes https://github.com/pytorch/pytorch/issues/94403

Test Plan: Added a unit test

Differential Revision: D45117566

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99529
Approved by: https://github.com/albanD
2023-04-21 07:17:19 +00:00
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
5c6f5439b7 Implement SymBool (#92149)
We have known for a while that we should in principle support SymBool as a separate concept from SymInt and SymFloat ( in particular, every distinct numeric type should get its own API). However, recent work with unbacked SymInts in, e.g., https://github.com/pytorch/pytorch/pull/90985 have made this a priority to implement. The essential problem is that our logic for computing the contiguity of tensors performs branches on the passed in input sizes, and this causes us to require guards when constructing tensors from unbacked SymInts. Morally, this should not be a big deal because, we only really care about the regular (non-channels-last) contiguity of the tensor, which should be guaranteed since most people aren't calling `empty_strided` on the tensor, however, because we store a bool (not a SymBool, prior to this PR it doesn't exist) on TensorImpl, we are forced to *immediately* compute these values, even if the value ends up not being used at all. In particular, even when a user allocates a contiguous tensor, we still must compute channels-last contiguity (as some contiguous tensors are also channels-last contiguous, but others are not.)

This PR implements SymBool, and makes TensorImpl use SymBool to store the contiguity information in ExtraMeta. There are a number of knock on effects, which I now discuss below.

* I introduce a new C++ type SymBool, analogous to SymInt and SymFloat. This type supports logical and, logical or and logical negation. I support the bitwise operations on this class (but not the conventional logic operators) to make it clear that logical operations on SymBool are NOT short-circuiting. I also, for now, do NOT support implicit conversion of SymBool to bool (creating a guard in this case). This does matter too much in practice, as in this PR I did not modify the equality operations (e.g., `==` on SymInt) to return SymBool, so all preexisting implicit guards did not need to be changed. I also introduced symbolic comparison functions `sym_eq`, etc. on SymInt to make it possible to create SymBool. The current implementation of comparison functions makes it unfortunately easy to accidentally introduce guards when you do not mean to (as both `s0 == s1` and `s0.sym_eq(s1)` are valid spellings of equality operation); in the short term, I intend to prevent excess guarding in this situation by unit testing; in the long term making the equality operators return SymBool is probably the correct fix.
* ~~I modify TensorImpl to store SymBool for the `is_contiguous` fields and friends on `ExtraMeta`. In practice, this essentially meant reverting most of the changes from https://github.com/pytorch/pytorch/pull/85936 . In particular, the fields on ExtraMeta are no longer strongly typed; at the time I was particularly concerned about the giant lambda I was using as the setter getting a desynchronized argument order, but now that I have individual setters for each field the only "big list" of boolean arguments is in the constructor of ExtraMeta, which seems like an acceptable risk. The semantics of TensorImpl are now that we guard only when you actually attempt to access the contiguity of the tensor via, e.g., `is_contiguous`. By in large, the contiguity calculation in the implementations now needs to be duplicated (as the boolean version can short circuit, but the SymBool version cannot); you should carefully review the duplicate new implementations. I typically use the `identity` template to disambiguate which version of the function I need, and rely on overloading to allow for implementation sharing. The changes to the `compute_` functions are particularly interesting; for most of the functions, I preserved their original non-symbolic implementation, and then introduce a new symbolic implementation that is branch-less (making use of our new SymBool operations). However, `compute_non_overlapping_and_dense` is special, see next bullet.~~ This appears to cause performance problems, so I am leaving this to an update PR.
* (Update: the Python side pieces for this are still in this PR, but they are not wired up until later PRs.) While the contiguity calculations are relatively easy to write in a branch-free way, `compute_non_overlapping_and_dense` is not: it involves a sort on the strides. While in principle we can still make it go through by using a data oblivious sorting network, this seems like too much complication for a field that is likely never used (because typically, it will be obvious that a tensor is non overlapping and dense, because the tensor is contiguous.) So we take a different approach: instead of trying to trace through the logic computation of non-overlapping and dense, we instead introduce a new opaque operator IsNonOverlappingAndDenseIndicator which represents all of the compute that would have been done here. This function returns an integer 0 if `is_non_overlapping_and_dense` would have returned `False`, and an integer 1 otherwise, for technical reasons (Sympy does not easily allow defining custom functions that return booleans). The function itself only knows how to evaluate itself if all of its arguments are integers; otherwise it is left unevaluated. This means we can always guard on it (as `size_hint` will always be able to evaluate through it), but otherwise its insides are left a black box. We typically do NOT expect this custom function to show up in actual boolean expressions, because we will typically shortcut it due to the tensor being contiguous. It's possible we should apply this treatment to all of the other `compute_` operations, more investigation necessary. As a technical note, because this operator takes a pair of a list of SymInts, we need to support converting `ArrayRef<SymNode>` to Python, and I also unpack the pair of lists into a single list because I don't know if Sympy operations can actually validly take lists of Sympy expressions as inputs. See for example `_make_node_sizes_strides`
* On the Python side, we also introduce a SymBool class, and update SymNode to track bool as a valid pytype. There is some subtlety here: bool is a subclass of int, so one has to be careful about `isinstance` checks (in fact, in most cases I replaced `isinstance(x, int)` with `type(x) is int` for expressly this reason.) Additionally, unlike, C++, I do NOT define bitwise inverse on SymBool, because it does not do the correct thing when run on booleans, e.g., `~True` is `-2`. (For that matter, they don't do the right thing in C++ either, but at least in principle the compiler can warn you about it with `-Wbool-operation`, and so the rule is simple in C++; only use logical operations if the types are statically known to be SymBool). Alas, logical negation is not overrideable, so we have to introduce `sym_not` which must be used in place of `not` whenever a SymBool can turn up. To avoid confusion with `__not__` which may imply that `operators.__not__` might be acceptable to use (it isn't), our magic method is called `__sym_not__`. The other bitwise operators `&` and `|` do the right thing with booleans and are acceptable to use.
* There is some annoyance working with booleans in Sympy. Unlike int and float, booleans live in their own algebra and they support less operations than regular numbers. In particular, `sympy.expand` does not work on them. To get around this, I introduce `safe_expand` which only calls expand on operations which are known to be expandable.

TODO: this PR appears to greatly regress performance of symbolic reasoning. In particular, `python test/functorch/test_aotdispatch.py -k max_pool2d` performs really poorly with these changes. Need to investigate.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92149
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-01-21 02:21:56 +00:00
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.

There a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
adc1a94ef4 Add tests for custom pybind type_casters (#89897)
This is a followup to #89115 which Fixes #88958

This adds tests to verify at runtime that the types returned by custom pybind type_casters are correctly specified in the second argument to `PYBIND11_TYPE_CASTER`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89897
Approved by: https://github.com/ezyang
2022-12-02 07:02:09 +00:00
2c0bd85c75 complex: register c10::complex with py::cast (#89680)
Fixes #77134

TODO:
* [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)

```c++
#include <torch/torch.h>
#include <torch/csrc/utils/pybind.h>
#include <iostream>
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/embed.h>
#include <cassert>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{}; // start the interpreter
    auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
    assert(
        (c10::complex<double>(1.0, 2.0) ==
         py::cast<c10::complex<double>>(casted_cdouble)));

    auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
    assert(
        (c10::complex<double>(1.0, 2.0) ==
         py::cast<c10::complex<double>>(casted_cfloat)));

    auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
    assert(
        (c10::complex<double>(1.0, 2.0) ==
         py::cast<c10::complex<double>>(casted_chalf)));
}

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
Approved by: https://github.com/ezyang
2022-11-25 14:53:57 +00:00
cf4b4b1b06 Fix python types in pybind function signatures (#89115)
Fixes #88958

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89115
Approved by: https://github.com/ezyang
2022-11-16 16:30:56 +00:00
5e0c01330c SymIntArrayRef type caster (#89074)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89074
Approved by: https://github.com/SherlockNoMad
2022-11-16 14:02:39 +00:00
d3c01c722d Fix pybind11 problems with c10::SymInt unregistered (#88011)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88011
Approved by: https://github.com/weiwangmeta, https://github.com/albanD
2022-10-29 07:55:45 +00:00
3b6588ab74 Consistent compute numel/contiguous strategy with SymInts (#85858)
Previously, our handling for contiguity was inconsistent in the following ways:

- is_strides_like 2d/3d and is_non_overlapping_and_dense always were computed
  based on sizes_and_strides_, even if you had symbolic ints
- Furthermore, even if you set custom policy for strides, these quantities were
  not overridable by subclasses
- Furthermore, we didn't even store these fields on ExtraMeta
- We duplicate implementations of compute_contiguous (plain, channels last,
  channels last 3d)
- We inconsistently called refresh_numel()/refresh_contiguous(), versus
  recomputing it ourselves

This factor makes a consistent strategy for all of the boolean fields, and
for numel computation.  After this refactor:

- All layout boolean fields are interposable via strides policy
  and can be overridden from Python; you will never access a garbage field
- All layout boolean fields are on ExtraMeta
- You can always call refresh_numel/contiguous, no matter if your Tensor is
  contiguous or not
- The numel/layout boolean fields are always populated consistently with
  the sizes strides fields (either on Tensor or ExtraMeta), even if you
  have custom policy
- There is only one implementation of the actual computation logic

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D39907696](https://our.internmc.facebook.com/intern/diff/D39907696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85858
Approved by: https://github.com/albanD
2022-09-30 21:26:34 +00:00
8ca1839d32 Python Dispatcher integration with C++ dispatcher (#85050)
#84826 but without ghstack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050
Approved by: https://github.com/malfet
2022-09-15 00:43:36 +00:00
706b990306 Revert "Python Dispatcher integration with C++ dispatcher (#84826)"
This reverts commit 35f6a69191ef762cf22b6cbfe94b8d9406e16674.

Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see 35f6a69191
2022-09-14 14:07:58 +00:00
35f6a69191 Python Dispatcher integration with C++ dispatcher (#84826)
Signed-off-by: Edward Z. Yang <ezyangfb.com>

From @ezyang's original PR:

There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interpose over ordinary C++ dispatch. The ingredients:

We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key, and shunting to a Python implementation
The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.

I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
2022-09-14 06:57:19 +00:00
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00d7eea2636e67621bfcfa34d9c85bcb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b50f3c37bc150ef0016238ba33eca6f4.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
332e43ed1a [Profiler] Expose extra fields to Python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79623

Pybind11 has a really awesome feature where you can tell it how to move a type from C++ to Python just by specializing one template and it has out of the box support for variant types. (You do have to make one change to variant to avoid a bunch of chatty compiler warnings.) This will make it easy to both:
A) Write principled type driven analysis in Python similar to `c10::visit`
B) Expose fields that only make sense for certain events without cluttering up the API of the top level events.

For now I haven't added any fields; this PR is just to handle the foundation.

Differential Revision: [D36988611](https://our.internmc.facebook.com/intern/diff/D36988611/)

Approved by: https://github.com/aaronenyeshi
2022-06-19 15:41:27 +00:00
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
40d1f77384 Codegen: python_torch_functions only include relevant operators (#68693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693

Generation of python bindings for native functions is split over 8
different files. One for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596270

Pulled By: albanD

fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a6835e49b332a8be52bf798fa2726b3)
2022-01-21 15:37:06 +00:00
80b19c4c8c Enable Python bindings for UntypedStorage (#68945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945

This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110

Test Plan: Run the existing unit and integration tests.

Reviewed By: albanD

Differential Revision: D32676505

fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf88b078bd228d43cd2afbabf0773b39)
2022-01-20 02:11:34 +00:00
ddea6980fe [PyTorch][JIT] Don't refcount Type singletons (#69579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69579

This should help us avoid reference counting overhead on singleton Type subclasses without a major rewrite of the Type subsystem.
ghstack-source-id: 146643993

Test Plan:
Ran //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark with arguments `--op empty -niter 40 --stressTestRecordFunction --captureRecordFunctionInputs` on devbig with turbo off.

Before:
```
I1206 13:47:15.037441 1201670 bench.cpp:144] Mean 0.737675
I1206 13:47:15.037463 1201670 bench.cpp:145] Median 0.736725
I1206 13:47:15.037468 1201670 bench.cpp:146] Min 0.722897
I1206 13:47:15.037473 1201670 bench.cpp:147] stddev 0.00508187
I1206 13:47:15.037482 1201670 bench.cpp:148] stddev / mean 0.00688903
```

After:
```
I1206 13:48:16.830123 1205612 bench.cpp:144] Mean 0.66988
I1206 13:48:16.830150 1205612 bench.cpp:145] Median 0.663956
I1206 13:48:16.830157 1205612 bench.cpp:146] Min 0.65986
I1206 13:48:16.830164 1205612 bench.cpp:147] stddev 0.0335928
I1206 13:48:16.830171 1205612 bench.cpp:148] stddev / mean 0.0501475
```

Static runtime startup is also improved; for CMF local_ro, time to initialize a predictor went from 10.01s to 9.59s.

(Note: I wish I had a production workload to demonstrate the advantage of this on. I tried ctr_mobile_feed local_ro net but it was neutral. Anything that manipulates types or List/Dict a lot might be promising.)

Reviewed By: suo

Differential Revision: D32923880

fbshipit-source-id: c82ed6689b3598e61047fbcb2149982173127ff0
2022-01-06 17:39:16 -08:00
eb2f535689 c10::Storage python to cpp converter and typecast (#59734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59734

Adds typecast logic to allow for c10::Storages to cross the Python/C++ barrier with pyBind

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29075279

Pulled By: Lilyjjo

fbshipit-source-id: 3e67b8525d308c5bccc64438ebac82b4d17ba462
2021-06-29 14:16:52 -07:00
b162d95e46 Fix a number of lint perf and safety issues in torch (#59897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29037012

fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb
2021-06-15 13:14:51 -07:00
2ce23136d0 Use irange in torch/csrc utils (#55556)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55556

Test Plan: Sandcastle

Reviewed By: ezyang

Differential Revision: D27625936

fbshipit-source-id: 79065438f582a6f5fe6f1f796b6984767605197e
2021-06-02 15:47:00 -07:00
58bc003487 Add pybind type caster for c10::Device (#57292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57292

In Future (and soon in other places too) we need to receive a list of devices from Python-land. We don't want to just take their indices because we need full devices in order to infer the type from them. torch.device is not defined through pybind, it's defined through a plain `PyModule_AddObject` call with CPython, thus pybind isn't naturally able to understand and convert it. However we can provide a custom type caster which fixes that. We have this already for at::Tensor, at::Generator, ...
ghstack-source-id: 127916268

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28092732

fbshipit-source-id: 1c31d0b85a4d5c9e7bde8161efbb7574d505157c
2021-05-01 16:11:10 -07:00
ec0fa40f0f Release GIL before destructing RPCAgent subclasses. (#57029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57029

Partially addresses https://github.com/pytorch/pytorch/issues/56297

This fixes deadlocks when the threads the RPCAgent are blocking
on try to take the GIL.  This also adds a general utility for
making shared_ptr run destructors without GIL.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28030294

Pulled By: ezyang

fbshipit-source-id: 628c066eebbb70bda5b914645a109dce35d73c8d
2021-04-28 10:25:03 -07:00
6ec71ed4f9 Replace all direct cdata access with THPVariable_Unpack (#55799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799

I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function.  Additionally, many
users are casting manually casting to THPVariable to access
the member so I can remove these unsafe casts in the client code
(the implementation, of course, is still doing an unsafe cast.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27712130

Pulled By: ezyang

fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
2021-04-15 08:57:04 -07:00