Summary:
Add support for torch._check() in TorchScript jit.script frontend.
* It will be special cased to behave like torch._assert, turned into an if + raise exception.
Test Plan:
Unit tests
Rollback Plan:
Differential Revision: D79744604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159988
Approved by: https://github.com/davidberard98
TorchScript only supports indexing into ModuleLists with integer literals. The error message already warns about this; but this PR adds clarifications around what a "literal" is. I'm adding this PR because, in my opinion, it's not obvious what a "literal" is and how strict its definition is. The clarification provided in this PR should make it easier for users to understand the issue and how to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98606
Approved by: https://github.com/eellison, https://github.com/gmagogsfm
**Summary** NamedTuple attributes can be annotated to declare their type:
```python
class MyNamedTuple(NamedTuple):
x: int
y: torch.Tensor
z: MyOtherType
```
Normally in python you can also declare your types as strings, `x: 'int'`. But NamedTuples previously didn't support this, because their annotation evaluation process was slightly different. This PR updates the NamedTuple attribute type annotation evaluation method to support ForwardRef declarations (i.e. declaring as strings).
**Details**
Below I repeat the comment I left in _jit_internal.py:
NamedTuple types are slightly different from normal types.
Normally, annotations are evaluted like this (during jit.script):
1. Load strings of python code into c++ and parse.
2. Get annotations as strings
3. Use the PythonResolver's resolution callback (rcb) to convert the string into a python object
4. We call into annotations.py:ann_to_type to convert python obj from step 3 into a type that torchscript understands.
NamedTuples are more complicated, because they have sub-types. Normally, once we have the NamedTuple type object from #3, we can just look at the annotation literal values and use ann_to_type directly on them.
But sometimes, users will annotate with string literals, e.g.
```
x: 'int'
```
This also happens with PEP563 (from __forward__ import annotations)
These annotations appear in the annotation dict as ForwardRef('int').
Then, we need to convert the string into a python object. This requires having local context for custom objects or imported types. rcb() is what gives us this. So, we plumb rcb through the stack so it can be used in this context for the if block below.
FAQ:
- Why do we need this special handling for NamedTuple but string annotations work fine for normal types? Normally, we parse the string directly and then call rcb() directly from C++.
- Why not use ForwardRef._evaluate? For that, we need globals() and locals() for the local context where the NamedTuple was defined. rcb is what lets us look up into these. So, basically rcb does the hard work for us.
- What is rcb? rcb is a ResolutionCallback - python callable that takes a string and returns a type. It's generated by `createResolutionCallback.*` in _jit_internal.py.
**Why is this only partial support**:
This only plumbs the rcb through some paths. In particular, the `toSugaredValue` path uses a fake rcb.
**Alternatives**:
We could also treat this the way we treat non-nn.Module classes: we evaluate them separately, ahead of time. That solution is probably better, but probably requires a more risky refactor for the way NamedTuples are handled.
Fixes#95858
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96933
Approved by: https://github.com/qihqi
We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models uses generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In sharded model those types are used instead of contained type W, having the initialization function that produces object of type W.
At the moment when the first attribute of W is requested - `LazyAwaitable[W]` will call its initialization function (on the same stack), cache the result inside and work transparently as an object of W. So we can think about it as a delayed object initialization.
To support this behavior in TorchScript - we propose a new type to TorchScript - `Await`.
In eager mode it works the same as `LazyAwaitable[W]` in TorchRec, being dynamically typed - acting as a type `W` while it is `Await[W]`.
Within torchscript it is `Await[W]` and can be only explicitly converted to W, using special function `torch.jit.awaitable_wait(aw)`.
Creation of this `Await[W]` is done via another special function `torch.jit.awaitable(func, *args)`.
The semantic is close to `torch.jit.Future`, fork, wait and uses the same jit mechanics (inline fork Closures) with the difference that it does not start this function in parallel on fork. It only stores as a lambda inside IValue that will be called on the same thread when `torch.jit.awaitable_wait` is called.
For example (more examples in this PR `test/jit/test_await.py`)
```
def delayed(z: Tensor) -> Tensor:
return Tensor * 3
@torch.jit.script
def fn(x: Tensor):
aw: Await[int] = torch.jit._awaitable(delayed, 99)
a = torch.eye(2)
b = torch.jit._awaitable_wait(aw)
return a + b + x
```
Functions semantics:
`_awaitable(func -> Callable[Tuple[...], W], *args, **kwargs) -> Await[W]`
Creates Await object, owns args and kwargs. Once _awaitable_wait calls, executes function func and owns the result of the function. Following _awaitable_wait calls will return this result from the first function call.
`_awaitable_wait(Await[W]) -> W`
Returns either cached result of W if it is not the first _awaitable_wait call to this Await object or calls specified function if the first.
`_awaitable_nowait(W) -> Await[W]`
Creates trivial Await[W] wrapper on specified object To be type complaint for the corner cases.
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
Not only is this change usually shorter and more readable, it also can yield better performance. size() is not always a constant time operation (such as on LinkedLists), but empty() always is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Replace cpp string comparisons with more efficient equality operators. These string comparisons are not just more readable, but they also allow for short-circuiting for faster string equality checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92765
Approved by: https://github.com/ezyang
As we live in C++17 world
This is a functional no-op, just
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
This adds `torch.cuda._DeviceGuard` which is a stripped down version of
`torch.cuda.device` with lower overhead. To do this, it only accepts `int` as
the device so we don't need to call `_get_device_index` and is implemented
with a new C++ helper `torch._C._cuda_exchangeDevice` that allows
`_DeviceGuard.__enter__` to be just a single function call. On my machine,
I see a drop from 3.8us of overhead to 0.94 us with this simple benchmark:
```python
def set_device():
with torch.cuda.device(0):
pass
%timeit set_device()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91045
Approved by: https://github.com/ngimel, https://github.com/anijain2305
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an inplace constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
Fixes minor perf regression I saw in #85688 and replaced throughout the code base. `obj == Py_None` is directly equivalent to is_none(). Constructing a temporary py::none() object needlessly incref/decref the refcount of py::none, this method avoids that and therefore is more efficient.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88051
Approved by: https://github.com/albanD
### Description
Removed some stubbed out code that was necessary for ROCm builds to support JIT compilation of Event and Stream classes. Original motivation for the code to be stubbed out in the ROCm case was likely due to this pull request:
https://github.com/pytorch/pytorch/pull/48020
In this PR, the include statement at the at the top of cuda.h was incorrectly pointed to aten/src/ATen/cuda/CUDAEvent.h when it should have been set to ATen/cuda/CUDAEvent.h. This error caused the hipification process of build_amd.py to not hipify this include statement correctly, causing errors. The include statement in question was subsequently fixed in the following commit:
acd072967a
This PR re-introduces the stubbed out code to the ROCm build and "unskips" the associated unit tests.
### Testing
Note: bullets prepended by ROCm were tested on systems with AMD GPUs while the others were tested with NVIDIA GPUs.
- apply commit
- (ROCm)`python tools/amd_build/build_amd.py`
- `python setup.py develop`
- (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_event_args`
- (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_stream_args`
- `python test/test_jit.py TestCUDA.test_event_args`
- `python test/test_jit.py TestCUDA.test_stream_args`
- Confirm tests pass in all scenarios
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82346
Approved by: https://github.com/malfet
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.
The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features. I'm open to suggestions
for how to structure the features better. The main changes:
- Added an --allowlist-pattern flag, which turns off the grep lint
if some other line exists. This is used to stop the grep
lint from complaining about pybind11 includes if the util
include already exists.
- Added --match-first-only flag, which lets grep only match against
the first matching line. This is because, even if there are multiple
includes that are problematic, I only need to fix one of them.
We don't /really/ need this, but when I was running lintrunner -a
to fixup the preexisting codebase it was annoying without this,
as the lintrunner overall driver fails if there are multiple edits
on the same file.
I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.
Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.
See also https://github.com/pybind/pybind11/issues/4099
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
Adds support for scripting ParameterDicts and getattr() on them. It does
not support iterating on ParameterDicts because torch/nn/container.py
implementation of ParameterDict.items() uses a generator, which is not
supported by torchscript. torch/nn/container.py would need to be updated
so that iter gets correctly registered in python_sugared_value.cpp
Added a test in test_module_containers.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77143
Approved by: https://github.com/eellison
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74785
Fix for https://github.com/facebookresearch/torchdynamo/issues/93
Because the constructor follow a non-standard input schema (variadic integers), they are handled specially in ir_emitter.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D35362762
Pulled By: eellison
fbshipit-source-id: 960badf08ba2ab0818af5fd331aff3542051250f
(cherry picked from commit bd579dead5a5206fc6e5b535ecf4f99ae67ee135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899
Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149204031
Test Plan: Refer to D33282878 (911d527b87). Also check CI
Reviewed By: gmagogsfm
Differential Revision: D34252127
fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388
(cherry picked from commit 1d276baca308110ac40111ccd622400b3bbdc864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471
Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149114933
Test Plan: Refer to D33282878 (911d527b87). Also check CI
Reviewed By: gmagogsfm
Differential Revision: D33342569
fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472
(cherry picked from commit b47cc935ee1fd7aa63aa453a323a637bc2c22f3c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443
cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out.
The convert_fx call takes > 20 mins for local and local_ro sub modules, which used to take ~ 2 mins.
Test Plan:
Fblearn flow run
* the following cmd took 1113 seconds before the diff and 5002 seconds after.
flow-cli clone-locally 320014219 --run-as-secure-group pytorch_at_scale --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0]
Cogwheel test
* Cogwheel test with packages in B3588 (the last good run) took 4694.48s
* Cogwheel test with packages in B3590 (the first timeout) took 13975.83s
* Cogwheel test with the following packages took 4535.04s
* all packages in B3588 except the model publish
* the model publish built with D33469839 (043e84b3d2) reversed (created D33633570)
Reviewed By: albanD, jerryzh168
Differential Revision: D33633570
fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e
(cherry picked from commit 03ab65023a9f4175584ddac1cca7eab51397c84a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
Follow up work:
1. disallow `default` as an overload name for aten operators.
2. Add a method to obtain a list of all overloads (exclude the ones registered by JIT)
3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args etc)
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D33469839
Pulled By: anjali411
fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
TODO: disallow `default` as an overload name for aten operators.
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33262228
Pulled By: anjali411
fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339
When a python program is translated to TorchScript, the python exception type is dropped. This makes users's life hard when they need to categorize errors based more than only exception message.
Here we make the change so when we raise a python exception, we record the fully qualified class name for the exception. Later on when the TorchScript is interpreted, a special exception CustomJITException is thrown. User can get the python class name from CustomJITException::getPythonClassName .
Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.
Code under scripts/shunting are just my own experimental code. I can split them out if requested.
ghstack-source-id: 146221879
Test Plan: buck test mode/opt //caffe2/test:jit
Reviewed By: gmagogsfm
Differential Revision: D33282878
fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d