127 Commits

Author SHA1 Message Date
541584d22e [BE][8/16] fix typos in torch/ (torch/csrc/jit/) (#156318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156318
Approved by: https://github.com/albanD
2025-07-02 22:55:29 +00:00
24ca7e91e6 [1/N] Use internal linkage in torch/csrc C++ files. (#150930)
Turn more functions and variables into static if they are not used outside the cpp files. Unused functions are removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150930
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-04-11 02:19:31 +00:00
b963d96bad [Torchscript] Add a flag to use mangled names instead of demangled (#148906)
Summary: Optionally keep mangled names when expanding torchscript stacks

Test Plan:
```
buck2 build mode/opt //scripts/rihams/LearnPyTorch:torch_script_generate --show-full-output

/data/users/rihams/fbsource/buck-out/v2/gen/fbcode/0bd9d136228ad8a7/scripts/rihams/LearnPyTorch/__torch_script_generate__/torch_script_generate.par

buck2 build mode/opt //scripts/rihams/LearnPyTorch:torch_script_execute --show-full-output
```

- With `--torch_jit_expanded_stacks_mangled` Flag:

/data/users/rihams/fbsource/buck-out/v2/gen/fbcode/ef35e45045e8164c/scripts/rihams/LearnPyTorch/__torch_script_execute__/torch_script_execute fbcode/model.pt  --torch_jit_expanded_stacks_mangled --torch_jit_enable_expanded_stacks

https://fburl.com/scuba/strobelight_function_tracer/8die4rvm

{F1975933247}

Without Flag:

/data/users/rihams/fbsource/buck-out/v2/gen/fbcode/ef35e45045e8164c/scripts/rihams/LearnPyTorch/__torch_script_execute__/torch_script_execute ./model.pt   --torch_jit_enable_expanded_stacks

https://fburl.com/scuba/strobelight_function_tracer/x3nladpf

 {F1975933268}

Reviewed By: bbus

Differential Revision: D70905872

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148906
Approved by: https://github.com/zdevito
2025-03-19 07:53:02 +00:00
cyy
40fb738197 Use Wextra-semi (#140236)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140236
Approved by: https://github.com/ezyang
2024-11-13 02:15:16 +00:00
cyy
7624d625c0 [Reland][7/N] Fix Wextra-semi warning (#140342)
Reland of #140225 to fix a change in FBCODE_CAFFE2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140342
Approved by: https://github.com/kit1980
2024-11-12 18:55:31 +00:00
dbb55b448b Revert "[7/N] Fix Wextra-semi warning (#140225)"
This reverts commit ffb979032dc149b4c895526fe5b92d713ed7b1e1.

Reverted https://github.com/pytorch/pytorch/pull/140225 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/140225#issuecomment-2469312229))
2024-11-12 00:02:06 +00:00
cyy
ffb979032d [7/N] Fix Wextra-semi warning (#140225)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140225
Approved by: https://github.com/ezyang
2024-11-10 14:28:10 +00:00
fddabc6e0b C10_UNUSED to [[maybe_unused]] (#6357) (#138364)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-10-19 13:17:43 +00:00
8dd575faf6 [BE] Modernize C10_UNUSED (#138102)
[`[[maybe_unused]]`](https://en.cppreference.com/w/cpp/language/attributes/maybe_unused) is part of C++17 standard

Test Plan: Sandcastle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138102
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet, https://github.com/eqy
2024-10-18 16:33:01 +00:00
95f869c3d7 [pytorch_operator_stats] log if using torchscript runtime (#137986)
Summary: logs if an operator is run with the TorchScript runtime, using a thread_local variable set in `InterpreterState.run()`

Test Plan: buck2 run mode/dev-nosan caffe2/torch/fb/observers:scuba_observer_runner

Reviewed By: zou3519

Differential Revision: D64200781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137986
Approved by: https://github.com/angelayi
2024-10-18 00:55:22 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
cyy
efb73eda51 [2/N] Fix some violations of unused-function and unused-variable checks in torch_cpu (#129878)
Follows #128670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129878
Approved by: https://github.com/ezyang
2024-07-04 00:39:28 +00:00
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
2c9c57c061 Only profiling when it's enabled. (#121404)
Summary:
The profiling, even when disabled, takes up about 1.5% cpu for a model I'm looking into.

This patch just splits into with/without profile runs.

The potential downside is that now the script can't enable profiling in itself. It doesn't seem to be used anywhere. If that's a crusial usecase, we can do something about it but ideally we wouldn't.

Test Plan:
Link with profiles:
https://fburl.com/scuba/strobelight_services/ihxsl7pj

```
buck2 run fbcode//caffe2/test/cpp/jit:jit
```

Reviewed By: zhxchen17

Differential Revision: D54066589

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121404
Approved by: https://github.com/zhxchen17
2024-03-08 19:23:14 +00:00
b0e2ed4d67 removing some macros (#120314)
Summary: Will be making some changes in the surrounding code, they are going to be easier without macros

Differential Revision: D54001770

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120314
Approved by: https://github.com/zhxchen17
2024-03-06 22:06:05 +00:00
386776c49a [torch] Reduce the memory usage by adding flags to clearing intermediate graphs used for optimization during the ineference. (#115657)
Summary: During the inference time the intermediate graphs for optimization are not used so the Executor's graph is the only graph we need to keep around these two flags

Test Plan:
the FLAGS are all off by default

baseline
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true
I1212 10:24:20.407408 401092 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 182863 Kb
```
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true
I1212 10:31:37.663487 464000 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 186127 Kb
```
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true --torch_jit_execution_plan_avoid_extra_graph_copy=true
I1212 10:29:42.848093 447218 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 129451 Kb```

Differential Revision: D52081631

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115657
Approved by: https://github.com/houseroad
2023-12-18 17:56:39 +00:00
cyy
12f97bb2e9 [Reland][3/N] Add -Wdeprecated and related fixes (#110518)
Fixes the string_view errors and reland the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly. They are discarded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
2023-10-07 08:38:40 +00:00
156aefa89b Revert "[3/N] Add -Wdeprecated and related fixes (#109698)"
This reverts commit c31fcdaa4f79e83c82ec4f5ff3cf96e2cb99eecd.

Reverted https://github.com/pytorch/pytorch/pull/109698 on behalf of https://github.com/PaliC due to breaking quantization tests ( quantization/test_quantize_per_channel_sub_byte and  quantization/test_quantize_per_channel_float_qparams) internally ([comment](https://github.com/pytorch/pytorch/pull/109698#issuecomment-1746999806))
2023-10-04 14:33:47 +00:00
cyy
c31fcdaa4f [3/N] Add -Wdeprecated and related fixes (#109698)
This PR follows #108626. Hopefully we can enable the warning in the next PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-03 22:50:53 +00:00
92242f599a [PyTorch] Add Expanded call stack to nodes [Take 2] (#110229)
Summary:
Adding back D46578700 / PR https://github.com/pytorch/pytorch/pull/108426

Note: The changes were originally reverted due to memory regression, these changes are putting the code behind a gflag so it is only used by binaries that require expanded stack for BPF Profiling.

Original Diff comment:
To get a Node's call stack we currently loop on the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change we can optimize this but expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g. BPF) as it simplified the memory traversal process.
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.
Each Node has an extended attribute that has an index to a vector of stack frames expanded_node_stacks_
node_stack_attr_symbol_ is only needed to make accessing the stack vector index attribute easier from BPF.

Test Plan:
- Verified using BPF Program in subsequent diffs
- Perf testing for loading large model: P822455246

Differential Revision: D49565461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110229
Approved by: https://github.com/zdevito
2023-10-02 19:52:41 +00:00
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
750cbb299b [RPC] Check stack for emptiness in interpreter (#103327)
Hi! I found heap-buffer-overflow during PyTorch RPC-module fuzzing.

[crash-9cc26b8da3b688a9c26614481239943b357c5636.zip](https://github.com/pytorch/pytorch/files/11707706/crash-9cc26b8da3b688a9c26614481239943b357c5636.zip)

```
    "==10634==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060001b6a98 at pc 0x000000639a2e bp 0x7fffffff9100 sp 0x7fffffff90f8",
    "READ of size 4 at 0x6060001b6a98 thread T0",
    "    #0 0x639a2d in c10::IValue::isTensor() const /pytorch/aten/src/ATen/core/ivalue.h:432:27",
    "    #1 0x639a2d in c10::IValue::toTensor() && /pytorch/aten/src/ATen/core/ivalue_inl.h:159:7",
    "    #2 0xc5eb105 in at::Tensor c10::IValue::to<at::Tensor>() && /pytorch/aten/src/ATen/core/ivalue_inl.h:1690:1",
    "    #3 0xc5eb105 in void torch::jit::pop<at::Tensor>(std::vector<c10::IValue, std::allocator<c10::IValue> >&, at::Tensor&) /pytorch/aten/src/ATen/core/stack.h:130:55",
    "    #4 0xc5eaedb in torch::jit::dtype(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/mobile/promoted_prim_ops.cpp:105:3",
    "    #5 0xcc79600 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:682:13",
    "    #6 0xcc4158b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:1052:9",
    "    #7 0x60f378 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential.cc:66:38",
    "    #8 0x610bb9 in LLVMFuzzerTestOneInput /jit_differential.cc:107:25",
    "    #9 0x535c91 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    #10 0x51fb9c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    #11 0x5258eb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    #12 0x54eea2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    #13 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "    #14 0x51a4bd in _start (/jit_differential_fuzz+0x51a4bd)",
    "",
    "0x6060001b6a98 is located 8 bytes to the left of 64-byte region [0x6060001b6aa0,0x6060001b6ae0)",
    "allocated by thread T0 here:",
    "    #0 0x60c66d in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3",
    "    #1 0xa5a41b in std::_Vector_base<c10::IValue, std::allocator<c10::IValue> >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20",
    "    #2 0xa5a41b in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33",
    "    #3 0xa5a241 in c10::IValue& std::vector<c10::IValue, std::allocator<c10::IValue> >::emplace_back<c10::IValue&>(c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4",
    "    #4 0xcc8209c in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:345:19",
    "    #5 0xcc4158b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:1052:9",
    "    #6 0x60f378 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential.cc:66:38",
    "    #7 0x610bb9 in LLVMFuzzerTestOneInput /jit_differential.cc:107:25",
    "    #8 0x535c91 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    #9 0x51fb9c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    #10 0x5258eb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    #11 0x54eea2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    #12 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "",
    "SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:432:27 in c10::IValue::isTensor() const",
    "Shadow bytes around the buggy address:",
    "  0x0c0c8002ed00: 00 00 00 00 00 00 00 fa fa fa fa fa fd fd fd fd",
    "  0x0c0c8002ed10: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd",
    "  0x0c0c8002ed20: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa",
    "  0x0c0c8002ed30: fd fd fd fd fd fd fd fd fa fa fa fa 00 00 00 00",
    "  0x0c0c8002ed40: 00 00 00 00 fa fa fa fa fd fd fd fd fd fd fd fd",
    "=>0x0c0c8002ed50: fa fa fa[fa]00 00 00 00 00 00 00 00 fa fa fa fa",
    "  0x0c0c8002ed60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002eda0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "Shadow byte legend (one shadow byte represents 8 application bytes):",
    "  Addressable:           00",
    "  Partially addressable: 01 02 03 04 05 06 07",
    "  Heap left redzone:       fa",
    "  Freed heap region:       fd",
    "  Stack left redzone:      f1",
    "  Stack mid redzone:       f2",
    "  Stack right redzone:     f3",
    "  Stack after return:      f5",
    "  Stack use after scope:   f8",
    "  Global redzone:          f9",
    "  Global init order:       f6",
    "  Poisoned by user:        f7",
    "  Container overflow:      fc",
    "  Array cookie:            ac",
    "  Intra object redzone:    bb",
    "  ASan internal:           fe",
    "  Left alloca redzone:     ca",
    "  Right alloca redzone:    cb",
    "==10634==ABORTING"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103327
Approved by: https://github.com/Skylion007
2023-06-16 20:12:51 +00:00
d70f9c7888 Fix typo under torch/csrc/jit/runtime directory (#97243)
This PR fixes typo in comments and messages under `torch/csrc/jit/runtime` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97243
Approved by: https://github.com/davidberard98
2023-03-29 20:17:10 +00:00
a229e78544 [BE] Enforce sign-compare (#96723)
Number of OSS PR were reverted, because new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
Not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from PyTorch codebase.

The only tricky part in this PR, as making sure that non-ASCII character detection works for both signed and unsigned chars  here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)

Exclude several files from sign-compare if flash attention is used, due to the violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign compare violations in caffe2 codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
2023-03-15 06:04:20 +00:00
a6a433aecd Add stack emptiness checks inside interpreter.cpp (#94298)
Hi!

I've been fuzzing different pytorch modules, and found a few crashes inside one of them.

Specifically, I'm talking about a module for interpreting the JIT code and a function called `InterpreterState::run()`. Running this function with provided crash file results in a crash, which occurs while calling `dim()` on a `stack` with 0 elements ([line-686](abc54f9314/torch/csrc/jit/runtime/interpreter.cpp (L686))). The crash itself occurs later, when std::move is called with incorrect value of type `IValue`.

The second crash is similar and occurs on [line 328](abc54f9314/torch/csrc/jit/runtime/interpreter.cpp (LL328C15-L328C48)), where `reg(inst.X + i - 1) = pop(stack);` is executed. The error here is the same, `Stack stack` might not contain enough elements.

The third crash occurs on [line 681](abc54f9314/torch/csrc/jit/runtime/interpreter.cpp (L681)). The problem here is the same as for previous crashes. There are not enough elements in the stack.

In addition to these places, there are many others (in the same function) where border checking is also missing. I am not sure what is the best way to fix these problems, however I suggest adding a boundary check inside each of these case statement.

All tests were performed on this pytorch version: [abc54f93145830b502400faa92bec86e05422fbd](abc54f9314)

### How to reproduce

1. To reproduce the crash, use provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy these crash files to the current directory:

    - [crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471.zip](https://github.com/pytorch/pytorch/files/10674143/crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471.zip)
    - [crash-55384dd7c9689ed7b94ac6697cc43db4e0dd905a.zip](https://github.com/pytorch/pytorch/files/10674147/crash-55384dd7c9689ed7b94ac6697cc43db4e0dd905a.zip)
    - [crash-06b6125d01c5f91fae112a1aa7dcc76d71b66576.zip](https://github.com/pytorch/pytorch/files/10674152/crash-06b6125d01c5f91fae112a1aa7dcc76d71b66576.zip)

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. And execute the binary: `/jit_differential_fuzz /homedir/crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471`

After execution completes you will see this stacktrace:

```asan
=36==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060001657f8 at pc 0x00000060bc91 bp 0x7fff00b33380 sp 0x7fff00b33378
READ of size 4 at 0x6060001657f8 thread T0
    #0 0x60bc90 in c10::IValue::IValue(c10::IValue&&) /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43
    #1 0xc20e7cd in torch::jit::pop(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/aten/src/ATen/core/stack.h:102:12
    #2 0xc20e7cd in torch::jit::dim(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/mobile/promoted_prim_ops.cpp:119:20
    #3 0xc893060 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:686:13
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #12 0x51152d in _start (/jit_differential_fuzz+0x51152d)

0x6060001657f8 is located 8 bytes to the left of 64-byte region [0x606000165800,0x606000165840)
allocated by thread T0 here:
    #0 0x5fd42d in operator new(unsigned long) /llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0xa16ab5 in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33
    #2 0xa168f1 in c10::IValue& std::vector<c10::IValue, std::allocator<c10::IValue> >::emplace_back<c10::IValue&>(c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4
    #3 0xc89b53c in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:344:19
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43 in c10::IValue::IValue(c10::IValue&&)
Shadow bytes around the buggy address:
  0x0c0c80024aa0: fd fd fd fd fd fd fd fa fa fa fa fa 00 00 00 00
  0x0c0c80024ab0: 00 00 00 fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c80024ac0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c80024ad0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c80024ae0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c0c80024af0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa[fa]
  0x0c0c80024b00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
  0x0c0c80024b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==36==ABORTING
```

6. Executing the remaining crashes gives similar crash reports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94298
Approved by: https://github.com/davidberard98
2023-02-13 21:00:00 +00:00
2fc73622f8 [jit] Support Awaitable type (#90863)
We want to make TorchRec sharded models TorchScriptable.

TorchRec sharded models uses generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In sharded model those types are used instead of contained type W, having the initialization function that produces object of type W.

At the moment when the first attribute of W is requested - `LazyAwaitable[W]` will call its initialization function (on the same stack), cache the result inside and work transparently as an object of W. So we can think about it as a delayed object initialization.

To support this behavior in TorchScript - we propose a new type to TorchScript - `Await`.
In eager mode it works the same as `LazyAwaitable[W]` in TorchRec, being dynamically typed - acting as a type `W` while it is `Await[W]`.

Within torchscript it is `Await[W]` and can be only explicitly converted to W, using special function `torch.jit.awaitable_wait(aw)`.
Creation of this `Await[W]` is done via another special function `torch.jit.awaitable(func, *args)`.

The semantic is close to `torch.jit.Future`, fork, wait and uses the same jit mechanics (inline fork Closures) with the difference that it does not start this function in parallel on fork. It only stores as a lambda inside IValue that will be called on the same thread when `torch.jit.awaitable_wait` is called.

For example (more examples in this PR `test/jit/test_await.py`)
```
      def delayed(z: Tensor) -> Tensor:
          return Tensor * 3

      @torch.jit.script
      def fn(x: Tensor):
          aw: Await[int] = torch.jit._awaitable(delayed, 99)
          a = torch.eye(2)
          b = torch.jit._awaitable_wait(aw)
          return a + b + x
```

Functions semantics:

`_awaitable(func -> Callable[Tuple[...], W], *args, **kwargs) -> Await[W]`

Creates Await object, owns args and kwargs. Once _awaitable_wait calls, executes function func and owns the result of the function. Following _awaitable_wait calls will return this result from the first function call.

`_awaitable_wait(Await[W]) -> W`
Returns either cached result of W if it is not the first _awaitable_wait call to this Await object or calls specified function if the first.

`_awaitable_nowait(W) -> Await[W]`

Creates trivial Await[W] wrapper on specified object To be type complaint for the corner cases.

Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
2023-01-30 17:38:59 +00:00
0247ed27cc Apply Clang-Tidy readability-container-size-empty (#93236)
Not only is this change usually shorter and more readable, it also can yield better performance. size() is not always a constant time operation (such as on LinkedLists), but empty() always is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
2023-01-29 23:28:19 +00:00
b9182cbbd8 Fixup torch jit with some initializers and moves (#92037)
Fixup some minor codequality issues in torch JIT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92037
Approved by: https://github.com/ezyang
2023-01-12 17:29:24 +00:00
3916d7a575 Apply modernize-use-emplace to aten, c10, torch (#91077)
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an inplace constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
2022-12-19 07:49:56 +00:00
1dbc8ad3b7 Add Warning class and refactor C++ warnings to use it (#84101)
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
c083489f46 [kineto] Optimize getStepCallbacks for common case of no active callbacks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77804

IIUC, the result of this function will be empty and unused if there are no sampled callbacks, which is the common case. We can accelerate this case by wrapping the result in an optional to save initializing an empty SmallVector.

Differential Revision: [D36497279](https://our.internmc.facebook.com/intern/diff/D36497279/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36497279/)!

Approved by: https://github.com/robieta
2022-05-24 19:38:01 +00:00
8d34a8325d TorchScript to support capability to rethrow the original python exception (#77093)
Summary:
In order to categorize exceptions/errors, the observability /migration team faced a problem that currently the exception is shown as RuntimeError, and hard to categorize.

The solution to this problem is to be able to get the original python exception's class name and msg, and hopefully to recreate a python exception from that.
TO support this approach, we did the following in this diff:

(1) TorchScript to translate JITException so that it does not show as RuntimeError
(2) record python exception class name, original message during translation.

Then, later, the python exception can be reconstructed.

(3) Added a new decorator to reconstruct the python exception and then rethrow it.

Test Plan:
buck test //caffe2/torch/fb/translate_exception/tests:test_rethrow mode/dev-tsan
```
More details at https://www.internalfb.com/intern/buck/build/1180a788-3767-48e5-a64d-06d284b91a17
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 24ae6c7c-a647-404e-8f12-d12c762bf728
Trace available for this run at /tmp/tpx-20220507-195320.698499-24ae6c7c-a647-404e-8f12-d12c762bf728/trace.log
RemoteExecution session id: reSessionID-24ae6c7c-a647-404e-8f12-d12c762bf728-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962
    ✓ ListingSuccess: caffe2/torch/fb/translate_exception/tests:test_rethrow : 3 tests discovered (27.233)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_one_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.467)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_no_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.495)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_2_parameter_with_torch_script_only (test_rethrow.TestTranslateRethrowPythonException) (28.708)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962

```

Differential Revision: D36166520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77093
Approved by: https://github.com/qihqi
2022-05-13 16:40:25 +00:00
5b4d110f51 fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76193

Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-21 20:21:17 +00:00
272890998e [JIT] pass more exception info through the JIT interpreter
If TORCH_SHOW_CPP_STACKTRACES=1, then dump e.what() into the RuntimeError, which should make it easier to debug exceptions that happen within interpreted sections.

Test:
```patch
diff --git a/test/cpp/jit/test_dce.cpp b/test/cpp/jit/test_dce.cpp
index 6f9161d0d9..7c574787cf 100644
--- a/test/cpp/jit/test_dce.cpp
+++ b/test/cpp/jit/test_dce.cpp
@@ -3,6 +3,10 @@
 #include <torch/csrc/jit/ir/irparser.h>
 #include <torch/csrc/jit/passes/dead_code_elimination.h>
 #include <torch/csrc/jit/testing/file_check.h>
+#include <torch/csrc/jit/runtime/interpreter.h>
+#include <test/cpp/jit/test_utils.h>
+
+#include <ATen/ATen.h>

 namespace torch {
 namespace jit {
@@ -48,5 +52,30 @@ graph():
   // Check that dead code elimin
   testing::FileCheck().run(input, *graph);
 }
+
+TEST(EliminateDeadCodeTest, interpreterfailure) {
+  const std::string input = R"IR(
+graph(%x.1 : Tensor):
+  %2 : int = prim::Constant[value=128]() # /data/users/dberard/scripts/DGB/sz.py:4:38
+  %3 : int = prim::Constant[value=256]() # /data/users/dberard/scripts/DGB/sz.py:4:43
+  %5 : int = prim::Constant[value=1]() # /data/users/dberard/scripts/DGB/sz.py:4:53
+  %4 : int[] = prim::ListConstruct(%2, %3)
+  %6 : Tensor[] = aten::split_with_sizes(%x.1, %4, %5) # /data/users/dberard/scripts/DGB/sz.py:4:11
+  return (%6)
+)IR";
+  auto graph = std::make_shared<Graph>();
+  parseIR(input, graph.get());
+
+  //auto stack = createStack({at::randn({2, 383}, at::kCPU)});
+  auto stack = createStack({at::Tensor{}});
+
+  Code code(graph, "");
+  InterpreterState interpreter{code};
+  interpreter.run(stack);
+ ASSERT_EQ(2, stack.size());
+  ASSERT_FALSE(stack[0].toTensor().defined());
+  ASSERT_FALSE(stack[1].toTensor().defined());
+}
+
 } // namespace jit
 } // namespace torch
```

^ use this to repro the interpreter issue: `TORCH_SHOW_CPP_STACKTRACES=1 ./bin/test_jit --gtest_filter="EliminateDeadCodeTest.interpreterfailure"` and the stack trace is shown.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75682

Approved by: https://github.com/eellison
2022-04-21 18:26:49 +00:00
a5e338a826 [RecordFunction] More effecient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can not safely add and remove global callbacks.

Test Plan:
I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.

The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so i'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
a482aeb0ce [PyTorchEdge] backport v8 to v7 to support promoted ops as instruction (#71662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662

backport v8 to v7 to support promoted ops as instruction

a flag to help export as instruction from v8 and export as operators for v7 and below

Test Plan:
```
buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693)
    ✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
```

```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen

buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated
  Total time: 01:40.2 min
More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581
Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log
RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
    ✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249)
    ✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
```

Reviewed By: iseeyuan

Differential Revision: D33719098

fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26
(cherry picked from commit 81b956c23abc19489b69eee986721252474d00dc)
2022-02-15 03:47:39 +00:00
bf69a61293 (1/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: backend change
Summary: Reland for D33282878 (911d527b87) . Land backend change first to maintain FC. Will wait for 2 weeks after this diff is in. And than land the front-end change in next diff.

Test Plan:
test in next diff

time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training

Reviewed By: gmagogsfm

Differential Revision: D33342547

fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3
(cherry picked from commit ae1935f1af755180e5607e870ff365dc17061e4a)
2022-01-27 03:29:40 +00:00
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a python program is translated to TorchScript, the python exception type is dropped. This makes users's life hard when they need to categorize errors based more than only exception message.

Here we make the change so when we raise a python exception, we record the fully qualified class name for the exception. Later on when the TorchScript is interpreted, a special exception CustomJITException is thrown. User can get the python class name from CustomJITException::getPythonClassName .

Note that, this diff does not customize the mapping from C++ exception to Python exception. It's left to the users to do whatever mapping they want.

Code under scripts/shunting are just my own experimental code. I can split them out if requested.
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
aa9fbb9ae9 [JIT] check stack size after calling operator (#68788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788

In debug mode, this should throw errors for ops where the wrong number ops is returned (i.e. the number of values left on the stack is different from the number shown in the schema)

Test Plan:
Run this in debug mode and verify that it doesn't throw an assert
```
import torch

class Thing(torch.nn.Module):
    torch.jit.export
    def en(self, x: torch.Tensor):
        return torch.add(x, 2.0)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        a = torch.mm(x, y)
        b = torch.nn.functional.gelu(a)
        c = self.en(b)
        return c.std_mean()

if __name__ == '__main__':
    unsc = Thing()
    thing = torch.jit.script(unsc)
    x = torch.randn(4, 4)
    y = torch.randn(4, 4)
    std, mean = thing.forward(x, y)
    print(std, mean)
    print(str(thing.forward.graph))
```

Reviewed By: gchanan

Differential Revision: D32625256

Pulled By: davidberard98

fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5
2021-12-07 11:43:50 -08:00
3e45739543 [PyTorch][JIT] Use stack.pop_back() instead of pop(stack) for DROP (#69326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326

Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local.
ghstack-source-id: 144641680

Test Plan:
{F684148304}

CI

Reviewed By: zhxchen17

Differential Revision: D32812841

fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0
2021-12-03 11:09:05 -08:00
2c84b010e6 [PyTorch] Use toObjectRef in JIT interpreter (#69324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324

This slightly shrinks runImpl.

Before:
- Move pointer out of IValue
- Clear the IValue to none
- Do our thing with the Object
- destroy the intrusive_ptr on the C stack
- destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway)

After:
- Grab the pointer out of IValue
- Do our thing with the Object
- Decref the pointer in the IValue on the JIT stack as we assign over it

We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well.
ghstack-source-id: 144638920

Test Plan:
Inspected assembly to verify shorter runImpl

Tried to microbenchmark (D32809454) but can't show a difference.

Reviewed By: gchanan

Differential Revision: D32812252

fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5
2021-12-03 11:07:16 -08:00
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. is to convert Function -> mobile::Function
2. is to serialize mobile::Function

This also opens opportunity to create mobile::Module without saving/reloading

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
7cd62621fb [PyTorch] Adopt faster Tuple::create (#65381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381

The previous diff adds a way to make Tuples of size 3 or less
more efficiently. This diff makes it easier to hit that path and
updates a bunch of callsites to hit it.
ghstack-source-id: 142065832

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31069538

fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628
2021-11-02 10:10:31 -07:00
5ef62c88a9 [jit] Replace get_executor() with call() in abstract Function interface. (#65969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969

ghstack-source-id: 141759210

Test Plan: no behavior change.

Reviewed By: anjali411

Differential Revision: D31326151

fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4
2021-10-28 13:11:29 -07:00
72803dbcfd [caffe2] Fix invalid vector accesses and polar() call (#66757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757

`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.

Also, `std::polar()` expects its argument to be non-negative, but `c10::polar()` does not, so implement it explicitly (implementation is the same as libstdc++).

Test Plan: JIT tests pass.

Reviewed By: zhxchen17

Differential Revision: D31715587

fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
2021-10-19 00:29:54 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with defaults arguments and out arguments. Flag is off to keep the same behavior as v6, in pr 63651, turn on the flag.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00