Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414
Misuse of raw pointer in here where stack is never nullable.
ghstack-source-id: 136938318
Test Plan:
compiles.
Imported from OSS
Reviewed By: ejguan
Differential Revision: D30375410
fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348
This change addresses singlaiiit's comment on D30241792 (61b49c8e41), which makes the JIT interpreter's behavior consistent between `future` is set and not.
Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.
Reviewed By: singlaiiit
Differential Revision: D30347782
fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792
KinetoEvent
This PR adds module hierarchy information to events.
What is module hierarchy information attached to events?
During profiling a TorchScript module, when events are added, we ask JIT
what is the module hierarchy associated with the node being
executed. At the time of execution of that node, there might be multiple
frames in the stack of interpreter. For each frame, we find
corresponding node and the corresponding module hierarchy is queried.
Module hierarchy corresponding to the node is associated with node's
InlinedCallStack. InlinedCallStack of node tracks the path via which the
node is inlined. Thus during the inlining process we annotate
module information corresponding to the CallMethod nodes being inlined.
With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains module instance, type name and the method name in the
callstack.
Test Plan:
test_profiler
Imported from OSS
Reviewed By: raziel, ilia-cher
Differential Revision: D29745442
fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073
It turned out that it's less than ideal to print out verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-significant failure rate due to the truncation of long stacktrace which results in losing the original exception message thrown from native code. It is actually desirable to retain only the message of the original exception directly thrown from native code in such a usecase.
This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.
Reviewed By: Krovatkin
Differential Revision: D30241792
fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397
Introduces two main classes in C++ runtime:
ScriptProfile is the implementation for enalbing and disabling interpreter
profiling in C++. This should be only used from Python, and we will add
corresponding Python API in the next diff.
InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the consturctor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.
Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'
Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D28133579
fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635
Note: this PR looks massive, but it's just one simple change, codemodded many times.
In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D28178783
fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56546
A code move for CodeImpl and Frame to a subdirectory runtime/interpreter, so
that it's easier to reuse them and navigate the interpreter code.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D28133580
fbshipit-source-id: 8de89a4e8e637836625e1ac1db95f0a3353da670
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
import json
with open("build/compile_commands.json") as f:
data = json.load(f)
files = [os.path.relpath(node['file']) for node in data]
for idx, fname in enumerate(files):
if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
return files
def run_clang_tidy(fname):
check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
changes = check_output(["git", "ls-files", "-m"])
if len(changes) == 0:
return
check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])
def main():
git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
compiled_files = get_compiled_files_list()
for idx, fname in enumerate(git_files):
if fname not in compiled_files:
continue
if fname.startswith("caffe2/contrib/aten/"):
continue
print(f"[{idx}/{len(git_files)}] Processing {fname}")
run_clang_tidy(fname)
if __name__ == "__main__":
main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56652
Previous code doesn't drop prim::Constant values even when they are marked as drop.
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27927413
fbshipit-source-id: 67cd52cf292e111be2830ccf93b0e7b089e49001
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54627
This is the simplest little fix to get interpreter to preserve
NotImplementedError, so that the test suite doesn't start choking
on meta tensors not working in interpreter. It is sound and correct
but doesn't work for other c10::Error subclasses with special handling.
A more proper fix is requested at
https://github.com/pytorch/pytorch/issues/54612
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: wenleix, ngimel
Differential Revision: D27328666
Pulled By: ezyang
fbshipit-source-id: 483bef062de5a907d20e2d9e25eafe2d5197cf8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110
dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27101782
fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076
If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around.
ghstack-source-id: 124040891
Test Plan:
existing tests
spot-checked regular interpreter assembly; seems better
Reviewed By: dhruvbird, walterddr
Differential Revision: D27087204
fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54029
I found what appear to be some missed moves and/or extra copies in the JIT interpreter.
ghstack-source-id: 123958682
Test Plan:
Existing CI for correctness
Ran AdIndexer inline_cvr local_ro model benchmark with static_runtime off via
`env bin=/tmp/ptvsc2_predictor_bench.StaticDispatchModeFile static_runtime=0 caffe2=0 scripts/swolchok/static_runtime/inline_cvr/run_local_ro.sh`
before:
```
I0315 14:25:23.916893 3075680 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01635. Iters per second: 983.914
I0315 14:26:05.536207 3080560 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01689. Iters per second: 983.395
I0315 14:26:47.510561 3083335 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.02697. Iters per second: 973.737
I0315 14:27:29.024830 3086767 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01326. Iters per second: 986.918
I0315 14:28:10.849496 3091323 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.023. Iters per second: 977.517
```
after:
```
I0315 14:17:43.280469 3046242 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.997838. Iters per second: 1002.17
I0315 14:18:24.244606 3046861 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00173. Iters per second: 998.269
I0315 14:19:05.208899 3051998 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00187. Iters per second: 998.136
I0315 14:19:46.103854 3055392 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00073. Iters per second: 999.27
I0315 14:20:27.011411 3056062 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.999121. Iters per second: 1000.88
```
(This was just a convenient workload I had handy; the plan of record is to use static runtime for inline_cvr inference AIUI.)
Reviewed By: dhruvbird, walterddr
Differential Revision: D27060762
fbshipit-source-id: 5567206d7c2d9ae99776ce5524caf09ec2035e87
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136
Reviewed By: pbelevich
Differential Revision: D26693978
Pulled By: Krovatkin
fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228
`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it, so we can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837374
fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
Also bailing out on required gradients for the CUDA fuser.
Fixes https://github.com/pytorch/pytorch/issues/49299
I still need to look into a handful of failing tests, but maybe it can be a discussion basis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433
Reviewed By: ngimel
Differential Revision: D25681374
Pulled By: Krovatkin
fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48868
Building on the previous diff, we can make `toTensor()` return a
`const Tensor&`, which should make it easier to avoid reference
counting.
ghstack-source-id: 119327372
Test Plan: internal benchmarks.
Reviewed By: bwasti
Differential Revision: D25325379
fbshipit-source-id: ca699632901691bcee432f595f75b0a4416d55dd
Summary:
Adding a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing (potentially large amount) of warnings.warn.
This is to work around TorchScript's warning behavior mismatch with Python. Python by default triggers a warning once per location but TorchScript doesn't support it. This causes same warning to trigger and print once per inference run, hurting performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49313
Reviewed By: SplitInfinity
Differential Revision: D25534274
Pulled By: gmagogsfm
fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47810
`bindSymbolicShapes` wasn't checking device or dtype at all, so it wasn't correct. It also isn't being used anywhere (num_profiles is always 1 and we don't use symbolic shapes). We shouldn't have it on until we are actually using symoblic shapes.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25286214
Pulled By: eellison
fbshipit-source-id: 10fb175d0c75bd0159fb63aafc3b59cc5fd6c5af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47550
I saw over 5% time spent in RecordFunction's ctor during one
of our framework overhead benchmarks in `perf`. Inspecting assembly,
it looks like we just create a lot of RecordFunctions and the
constructor has to initialize a relatively large number of member
variables.
This diff takes advantage of the observation that RecordFunction does
nothing most of the time by moving its state onto the heap and only
allocating it if needed. It does add the requirement that profiling is
actually active to use RecordFunction accessors, which I hope won't be
a problem.
ghstack-source-id: 117498489
Test Plan: Run framework overhead benchmarks. Savings ranging from 3% (InPlace_ndim_1) to 7.5% (empty_ndim_3) wall time.
Reviewed By: ilia-cher
Differential Revision: D24812213
fbshipit-source-id: 823a1e2ca573d9a8d7c5b7bb3972987faaacd11a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47549
In preparation for moving state onto the heap.
ghstack-source-id: 117027862
Test Plan: CI
Reviewed By: ilia-cher
Differential Revision: D24812214
fbshipit-source-id: 1455c2782b66f6a59c4d45ba58e1c4c92402a323
Summary:
By default, TorchScript execution is single threaded and uses the caller's thread pool. For the use case of distributed inference, we hope there is a way to customize the behavior where the interpreter in torch script can be executed in other places. This diff allows an explicit taskLauncher for torchscript interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865
Test Plan:
unit test is passed.
fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac
Fixes #{issue number}
Reviewed By: houseroad
Differential Revision: D24616102
Pulled By: garroud
fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
Summary:
This diff restores previous behavior of silently allow overflowing when inserting instructions. The behavior was changed recently in https://github.com/pytorch/pytorch/issues/45382. But it started to break some existing use cases that haver overflow problems.
Restoring original behavior but throw a warning to to unblock existing use cases where overflowing happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46369
Reviewed By: kwanmacher, wanchaol, fbhuba
Differential Revision: D24324345
Pulled By: gmagogsfm
fbshipit-source-id: 1c0fac421d4de38f070e21059bbdc1b788575bdf
Summary:
* Add a pass at end of runCleanupPasses to annotate `aten::warn` so that each has its unique id
* Enhanced interpreter so that it tracks which `aten::warn` has been executed before and skip them
* Improved insertInstruction so that it correctly checks for overflow
Fixes https://github.com/pytorch/pytorch/issues/45108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45382
Reviewed By: mrshenli
Differential Revision: D24060677
Pulled By: gmagogsfm
fbshipit-source-id: 9221bc55b9ce36b374bdf614da3fe47496b481c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684
This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.
As part of this change, there is a significant change the Future API where we
now only accept an exception_ptr as part of setError.
For the example in #42560, the exception trace would now look like:
```
> Traceback (most recent call last):
> File "test_autograd.py", line 6914, in test_preserve_backtrace
> Foo.apply(t).sum().backward()
> File "torch/tensor.py", line 214, in backward
> torch.autograd.backward(self, gradient, retain_graph, create_graph)
> File "torch/autograd/__init__.py", line 127, in backward
> allow_unreachable=True) # allow_unreachable flag
> File "torch/autograd/function.py", line 87, in apply
> return self._forward_cls.backward(self, *args)
> File "test_autograd.py", line 6910, in backward
> raise ValueError("something")
> ValueError: something
```
ghstack-source-id: 111109637
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D23365408
fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5