Commit Graph

139 Commits

Author SHA1 Message Date
9fff8155c3 [2/N] Fix clang-tidy readability checks (#164652)
This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
2025-10-06 01:06:01 +00:00
2c5ed6e7c0 Revert "[2/N] Fix clang-tidy readability checks (#164652)"
This reverts commit 3c5ca685d6f5b6f3971c0cd20a054aa355610419.

Reverted https://github.com/pytorch/pytorch/pull/164652 on behalf of https://github.com/izaitsevfb due to need to revert due to a conflict with revert of https://github.com/pytorch/pytorch/pull/162659 ([comment](https://github.com/pytorch/pytorch/pull/164652#issuecomment-3369346707))
2025-10-05 21:36:57 +00:00
3c5ca685d6 [2/N] Fix clang-tidy readability checks (#164652)
This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
2025-10-05 07:05:11 +00:00
cyy
7c1f627828 Fix 'dllimport attribute ignored on inline function' (#157670)
There are lots of warnings in builds:
```
 2025-07-05T16:59:46.9208806Z C:\actions-runner\_work\pytorch\pytorch\build\aten\src\ATen\core\TensorBody.h(5043,29): warning: 'at::Tensor::less_' redeclared inline; 'dllimport' attribute ignored [-Wignored-attributes]
2025-07-05T16:59:46.9209030Z  5043 | inline at::Tensor & Tensor::less_(const at::Scalar & other) const {
2025-07-05T16:59:46.9209104Z       |                             ^
2025-07-05T16:59:46.9209671Z C:\actions-runner\_work\pytorch\pytorch\build\aten\src\ATen\core\TensorBody.h(5048,29): warning: 'at::Tensor::less_' redeclared inline; 'dllimport' attribute ignored [-Wignored-attributes]
2025-07-05T16:59:46.9209860Z  5048 | inline at::Tensor & Tensor::less_(const at::Tensor & other) const
```
This PR has fixed them and turned the warning into an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157670
Approved by: https://github.com/albanD
2025-07-07 16:57:48 +00:00
541584d22e [BE][8/16] fix typos in torch/ (torch/csrc/jit/) (#156318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156318
Approved by: https://github.com/albanD
2025-07-02 22:55:29 +00:00
cyy
45efa1aaa8 [3/N] Use internal linkage in C++ files (#151297)
Follows #151070.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151297
Approved by: https://github.com/Skylion007
2025-05-05 17:48:39 +00:00
cyy
e872c38eb3 Remove cppcoreguidelines-pro-type-member-init_fix suppression (#148638)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148638
Approved by: https://github.com/zou3519
2025-04-02 01:33:20 +00:00
fc5913b6bf [StaticRuntime] Fix a bug that memory planner ignores subblocks (#146728) (#146855)
Summary:

When a Static Runtime graph node has sub-blocks, the memory planner does not treat the sub-blocks' inputs as inputs of that node. As a result, the lifetimes of those inputs are computed incorrectly, the corresponding tensor memory is released earlier than required, and errors follow.
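
The lifetime problem can be illustrated with a toy liveness computation. This is only a sketch under stated assumptions: `Node`, `collect_uses`, and `last_use` are hypothetical names, not the memory planner's actual types or API. The idea is that a value's last use at the top level must account for uses buried inside a node's sub-blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR node: its direct inputs plus the nodes of its sub-blocks.
struct Node {
  std::vector<std::string> inputs;
  std::vector<Node> sub_block_nodes;
};

// Gather every value a node reads, including reads inside nested sub-blocks.
void collect_uses(const Node& n, std::vector<std::string>& out) {
  out.insert(out.end(), n.inputs.begin(), n.inputs.end());
  for (const Node& child : n.sub_block_nodes) {
    collect_uses(child, out);
  }
}

// Map each value to the index of the last top-level node that needs it.
std::map<std::string, std::size_t> last_use(const std::vector<Node>& top_level) {
  std::map<std::string, std::size_t> result;
  for (std::size_t i = 0; i < top_level.size(); ++i) {
    std::vector<std::string> uses;
    collect_uses(top_level[i], uses);
    for (const std::string& value : uses) {
      result[value] = i;  // later nodes overwrite earlier ones
    }
  }
  return result;
}
```

If `collect_uses` skipped `sub_block_nodes`, a value read only inside a sub-block would appear dead one node too early, which is the failure mode the summary describes.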

Differential Revision: D69195886

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146855
Approved by: https://github.com/swolchok
2025-02-11 13:59:54 +00:00
542f7c8383 Eliminate C10_NODISCARD (#138336)
Test Plan: Sandcastle

Reviewed By: swolchok

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336
Approved by: https://github.com/Skylion007
2024-10-19 02:54:06 +00:00
cyy
043e41f4f4 [10/N] Use std::nullopt and std::make_optional (#132364)
Follows #130674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132364
Approved by: https://github.com/ezyang
2024-08-01 07:02:35 +00:00
cyy
ddd539ba6c [6/N] Fix clang-tidy warnings in jit (#131986)
Follows #131969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131986
Approved by: https://github.com/ezyang
2024-07-29 00:49:08 +00:00
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.
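
Why a purely textual replacement is safe can be shown with a small sketch. The `compat` namespace and `first_positive` below are hypothetical stand-ins, not the real `c10` declarations: when one name is an alias template for the other, both spellings denote the same type.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

namespace compat {
// Hypothetical alias, standing in for the kind of alias the commit describes.
template <typename T>
using optional = std::optional<T>;
} // namespace compat

// Both spellings name the same type, so swapping one for the other
// changes no overload resolution and no ABI.
static_assert(std::is_same_v<compat::optional<int>, std::optional<int>>);

// Declared with the alias; callers may hold the result as std::optional<int>.
inline compat::optional<int> first_positive(int a, int b) {
  if (a > 0) return a;
  if (b > 0) return b;
  return std::nullopt;
}
```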

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
4c58f2b675 [PyTorch] Use uint32_t for ProcessedNode::num_outputs (#121335)
We already use uint32_t for indexing, and the notion of a single graph node with more than four billion outputs stretches credulity.

Differential Revision: [D54598821](https://our.internmc.facebook.com/intern/diff/D54598821/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121335
Approved by: https://github.com/Skylion007
2024-03-07 21:15:05 +00:00
cyy
47a2e6b6b8 Fix C++20 build (#112333)
Currently the C++20 build fails because of incorrect template initialization order. This PR adjusts the order of these classes and a constructor to address the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112333
Approved by: https://github.com/albanD
2024-02-13 05:10:19 +00:00
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most of c10::variant calls to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00
39ff80125f Add support for an operator level thread local observer (#108822)
Summary: Add support for an operator level thread local observer

Test Plan: Verified the interception as part of a pytorch model evaluation with static runtime.

Differential Revision: D49082250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108822
Approved by: https://github.com/davidberard98
2023-09-08 19:32:03 +00:00
e7874eea7a fix the use of incomplete vector<T> for C++20 compatibilities (#93978)
Avoid referring to std::vector<T> members and constructors/destructors when T is incomplete.

Referring to incomplete members is [not legal](https://timsong-cpp.github.io/cppwp/n4868/vector#overview-4) according to the C++ standard.

Non-noexcept constructors need access to members' destructors. As of C++20, std::vector's destructor is constexpr and so forcefully requires a complete type for the vector's elements.

These issues cause build errors in newer toolchains under c++20 mode.

Fix them by moving code that needs complete types to a different place where the type is already defined.
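
A minimal sketch of that kind of fix, using hypothetical `Graph` and `Node` types rather than the classes touched by the PR: the class only declares its constructor and destructor while `Node` is incomplete, and defines them after `Node` is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node;  // incomplete at this point

class Graph {
 public:
  // Declared only. Defining these inline here would need vector<Node>'s
  // constructor and destructor while Node is still incomplete.
  Graph();
  ~Graph();
  void add(int id);
  std::size_t size() const;

 private:
  // Since C++17, std::vector may be instantiated with an incomplete element
  // type, provided the type is complete before any vector member is used.
  std::vector<Node> nodes_;
};

struct Node {
  int id;
};

// Node is complete here, so everything that touches vector members is legal.
Graph::Graph() = default;
Graph::~Graph() = default;
void Graph::add(int id) { nodes_.push_back(Node{id}); }
std::size_t Graph::size() const { return nodes_.size(); }
```
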
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93978
Approved by: https://github.com/Skylion007
2023-04-04 07:47:43 +00:00
d70f9c7888 Fix typo under torch/csrc/jit/runtime directory (#97243)
This PR fixes typo in comments and messages under `torch/csrc/jit/runtime` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97243
Approved by: https://github.com/davidberard98
2023-03-29 20:17:10 +00:00
57bb5b159d [static-runtime] one more attempt to improve crash log readability (#96903)
Summary:
* add human readable type and ivalue printout
* fix internal linter warnings

Test Plan:
error message now looks like e.g.
```
E0315 16:27:32.409082 422313 ExceptionTracer.cpp:222] exception stack complete
terminate called after throwing an instance of 'c10::Error'
  what():  List[int] is not a subtype of List[int]; schema arg name: 'split_sizes', ivalue: [1, 1]
```

Differential Revision: D44112297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96903
Approved by: https://github.com/davidberard98
2023-03-17 17:56:26 +00:00
cc798f1a4f [PyTorch] add c10/util/FbcodeMaps.h (#96359)
Allow us to use folly maps in fbcode and std maps for compatibility in OSS, extending what static runtime is already doing.

Differential Revision: [D43926670](https://our.internmc.facebook.com/intern/diff/D43926670/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96359
Approved by: https://github.com/ezyang
2023-03-10 02:18:16 +00:00
e57a694d77 Add some missing moves to torch jit passes (#92317)
Add some missing moves in torch/jit/passes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92317
Approved by: https://github.com/ezyang
2023-01-22 16:33:08 +00:00
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
7f8e852dff [Static Runtime] Support Futures in Static Runtime Engine (#80162)
Summary: - Static Runtime now exports a runAsync() API which returns an intrusive_ptr to a c10::Future

Differential Revision: D37385849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80162
Approved by: https://github.com/mikeiovine
2022-06-28 23:57:26 +00:00
3afc802c5a [Static Runtime] Add Metadata to ProcessedNode depending upon the op type (#79961)
Summary:
- The ProcessedNodeMetadata class wraps the possible metadata for a ProcessedNode. Depending on the nature of the op, a ProcessedNode can have one of the following kinds of metadata:
1. prim::If/prim::Loop ops contain block_runners_ as their metadata
2. prim::fork ops contain a TaskLauncher (std::function) responsible for executing the forked subgraph

Differential Revision: D37320704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79961
Approved by: https://github.com/mikeiovine
2022-06-24 06:03:06 +00:00
ca7ab1708b [Static runtime] Pass parent graph metadata to forked subgraphs (#79578)
Summary:
- Remove creation of new StaticModuleOptions for the forked subgraph. Use the parent graph's options when creating the runtime for the forked subgraph
- StaticRuntimeMetadata extends CustomClassHolder, which can be cast to IValue and attached to an IR node's attributes.

Differential Revision: D37159684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79578
Approved by: https://github.com/mikeiovine
2022-06-16 20:35:46 +00:00
4055d1f653 [SR] Fix StaticRuntime move ctor (#74927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74927

The move ctor was broken because `BlockRunner` stores a reference to `values_`. When moving runtime instances, the pointer to the root block would be moved, but the reference inside it would not be updated.

Pass `BlockRunner` a raw pointer to the heap-allocated IValues instead to avoid this issue.
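
The following sketch illustrates why a raw pointer to heap-allocated storage survives a move. `Runner` and `Runtime` are hypothetical simplifications (plain `int` values instead of IValues), not the real `BlockRunner` or `StaticRuntime` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a block runner: it does not own the values,
// it only keeps a raw pointer to storage owned by the runtime.
class Runner {
 public:
  explicit Runner(int* values) : values_(values) {}
  int read(std::size_t i) const { return values_[i]; }

 private:
  int* values_;
};

class Runtime {
 public:
  explicit Runtime(std::size_t n)
      : values_(std::make_unique<int[]>(n)),  // zero-initialized heap array
        runner_(std::make_unique<Runner>(values_.get())) {}

  // Moving transfers ownership of the heap array without relocating it,
  // so the pointer stored inside Runner keeps pointing at live storage.
  Runtime(Runtime&&) noexcept = default;

  void set(std::size_t i, int v) { values_[i] = v; }
  int read(std::size_t i) const { return runner_->read(i); }

 private:
  std::unique_ptr<int[]> values_;   // declared first, so initialized first
  std::unique_ptr<Runner> runner_;
};
```

Had `Runner` instead stored a reference to a container member of `Runtime`, that reference would still name the moved-from object after the move.
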
ghstack-source-id: 153168602

Test Plan: New unit test/CI

Reviewed By: navahgar

Differential Revision: D35228467

fbshipit-source-id: 04e198b39f898b82677a0e41e1cdf00c2b0c09f3
(cherry picked from commit 03e2c591ac3a907d68025eae9500ed7226dec17e)
2022-04-07 02:16:37 +00:00
bbc59ff2bf [Static Runtime] Introduce StaticNodeInfo to store ProcessedNode's data independent from runtime instances (#73536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73536

Currently the `ProcessedNode` class assumes 2 distinct roles that are not too obvious:

1) a "template" that contains the metadata of an actual executable node, owned by `StaticModule`

2) fully instantiated nodes that are owned by `StaticRuntime`.

We currently merge these two use cases into one class, which can be error-prone if illegal copying happens uncontrollably. Currently, we only copy objects of kind (1) into objects of kind (2) when a `StaticRuntime` instance is created.

To address this issue, this change introduces `StaticNodeInfo`, a separate class, to distinguish the aforementioned two use cases in the code more clearly. With this, `StaticNodeInfo` is for (1) and `ProcessedNode` is now for (2).

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D33985600

fbshipit-source-id: 0c79cea2bf982dd956a35f48eaf6027e5b6e390c
(cherry picked from commit 0d8acc4a2b6eeb3e4af3ad2c99f4cd667680f8df)
2022-03-02 22:33:32 +00:00
c62de0ac15 [Static Runtime] [Code Cleanup] Use SROperator for operators' function type (#73450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73450

This change uses `SROperator` for operators' function type

Test Plan: N/A

Reviewed By: mikeiovine

Differential Revision: D34483246

fbshipit-source-id: ed544bb91b676ed08983dc8dc78cedd0f77d499f
(cherry picked from commit eb9de3ad8de043990c02f30ffa48a29c8e5e81f2)
2022-03-01 02:30:48 +00:00
2724e4c039 [Static Runtime] Do not replace with copy variants if TE fuser is enabled (#72946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72946

The passes to replace with copy variants are run after TensorExpr fusion. Due to this the resulting graph does not conform to the assumptions made in the fuser.

So, even if these flags `use_copy_variants`, `use_maybe_copy_variants` are turned on, the corresponding passes will not be executed if TensorExpr fusion is enabled.

ghstack-source-id: 149429753

Test Plan: Tested locally.

Reviewed By: mikeiovine

Differential Revision: D34283842

fbshipit-source-id: 74edea517a00c85dff0319f9c8b3ac8befe09018
(cherry picked from commit 3798af7f1b8c9b3c072862f58ebf16af6294db14)
2022-02-18 18:34:50 +00:00
d1c5f9e439 [JIT][SR] Introduce prim::IfThenElse (#72587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587

This pattern frequently appears in a few graphs:

```
%result = prim::If(%condition)
  block0():
    -> (%a)
  block1():
    -> (%b)
```

This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation.

This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator:

```
%result = prim::IfThenElse(%condition, %a, %b)
```
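
As a rough illustration of the rewrite condition, here is a toy pattern check. All types and the function `fold_trivial_if` are hypothetical, and the assumption that the rewrite applies only to empty blocks with exactly one output each is made for this sketch, not taken from the diff.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR: a block is a list of op names plus its returned values.
struct Block {
  std::vector<std::string> nodes;
  std::vector<std::string> outputs;
};

struct IfNode {
  std::string condition;
  Block then_block;
  Block else_block;
};

// The single-op replacement: behaves like `condition ? if_true : if_false`.
struct IfThenElse {
  std::string condition;
  std::string if_true;
  std::string if_false;
};

// Fold only when neither block runs any op and each returns one value.
std::optional<IfThenElse> fold_trivial_if(const IfNode& n) {
  const bool trivial = n.then_block.nodes.empty() &&
      n.else_block.nodes.empty() &&
      n.then_block.outputs.size() == 1 &&
      n.else_block.outputs.size() == 1;
  if (!trivial) {
    return std::nullopt;
  }
  return IfThenElse{
      n.condition, n.then_block.outputs[0], n.else_block.outputs[0]};
}
```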

Test Plan: New unit tests

Reviewed By: eellison

Differential Revision: D34091789

fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c
(cherry picked from commit 0f1b335e5b83f402bda2dcdd9ecb411e0b67c651)
2022-02-17 18:22:48 +00:00
5ea74b4996 [Static Runtime] Remove ProcessedNode::num_outputs_ (#72592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72592

Only code paths that are not perf-critical read `ProcessedNode::num_outputs_`, and it is a static feature of the op that the `ProcessedNode` instance is executing.

Therefore, it's better to move `ProcessedNode::num_outputs_` into `ProcessedFunction::num_outputs_` and let `ProcessedNode` access it via `ProcessedNode::fn_` for its occasional use. Note that this prevents duplicating num_outputs_ per node & per Static Runtime instance since `ProcessedFunction` instances are shared across all runtime instances.

It's confirmed that this change reduces the `sizeof(ProcessedNode)` by 14% from local instrumentation as follows:

- Before
-- sizeof(ProcessedNode): 56

- After
-- sizeof(ProcessedNode): 48
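
The effect can be reproduced in miniature. The structs below are hypothetical and far smaller than the real `ProcessedNode`, so the byte counts differ from the 56 and 48 reported above; the sketch only shows how moving a per-node field into a shared object shrinks each node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared, per-op object: one instance serves many nodes.
struct SharedFunction {
  void (*run)();
  std::uint32_t num_outputs;
};

// Before: every node carries its own copy of num_outputs.
struct NodeBefore {
  const SharedFunction* fn;
  const void* inputs;
  std::uint32_t num_outputs;
};

// After: the node reads the count through the shared object on demand.
struct NodeAfter {
  const SharedFunction* fn;
  const void* inputs;
  std::uint32_t num_outputs() const { return fn->num_outputs; }
};

// Dropping the field (and the padding that followed it) makes nodes smaller.
static_assert(sizeof(NodeAfter) < sizeof(NodeBefore),
              "per-node footprint should shrink");
```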

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: mikeiovine

Differential Revision: D33984792

fbshipit-source-id: e29ffc97b799e679215f42e1e85cd3fcd7e88983
(cherry picked from commit 0f7003f4dfd6473a70355ca3c6f51498abf1d7be)
2022-02-17 05:09:17 +00:00
d51d2bd608 [SR] Add a flag to disable copy variants (#71102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71102

This graph pass is causing a major perf regression on some models. Ideally we would introduce maybe_copy variants for all these ops. But since those are tricky to write, I've introduced a flag to just turn the pass off for now.
ghstack-source-id: 148541673

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: navahgar

Differential Revision: D33510080

fbshipit-source-id: bb4847f26561197ea5e6bbad0a4d25db4ef468eb
(cherry picked from commit 8f333d3e8138e2a7ba04bea7509ad84dd97844eb)
2022-02-08 02:43:07 +00:00
3dce68fdf4 [SR] Eliminate op_name_ in ProcessedNode (#71986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71986

To address concerns over space increase from control flow.

`op_name_` was only stored as a minor optimization to avoid name lookup during logging, we can safely get rid of it. Thanks to the sampling mechanism, `get_op_name()` is called very infrequently, so this shouldn't cause too much of a regression
ghstack-source-id: 148086244

Test Plan: CI

Reviewed By: d1jang

Differential Revision: D33821005

fbshipit-source-id: 6f74eb30a54a046ca90768aebbcde22e8c435f35
(cherry picked from commit 361ba32e97dbd130938bae10b5159730822c518c)
2022-02-01 21:22:26 +00:00
4b789df68b [SR] Add BlockRunner and handle sub-blocks (#69834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69834

* Modify the `StaticModule` constructor to handle index initialization for sub-blocks.
* Add a new class `StaticRuntimeBlockRunner`. This class is almost exactly like what we've been calling `StaticRuntime` up to this point, except that it does not own a `values_` array. All `StaticRuntimeBlockRunners` hold an unowned reference to a `values_` array owned by `StaticRuntime`. This is a useful abstraction for implementing control flow - it gives us a way for sub-blocks to look up values from surrounding scopes!
ghstack-source-id: 148086245

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: d1jang

Differential Revision: D33028039

fbshipit-source-id: 4f01417bad51a0cf09b1680a518308da647be1f6
(cherry picked from commit 3a9feffd929869120c717d35aa55aad8a382783d)
2022-02-01 17:20:55 +00:00
3a77fb244b [PyTorch][Static Runtime] Delete cleanup_activations option (#71501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71501

This option disabled the memory planner. Supporting it would require us to add multiple versions of ops that borrow their inputs (because they rely on the memory planner to support that), and I'm not aware of a particular need to continue supporting it.
ghstack-source-id: 147385569

Test Plan: CI, rerun broken test from task

Reviewed By: mikeiovine

Differential Revision: D33669290

fbshipit-source-id: ecb01995891aecb5f4d0da2d9c51eed1f8fe489a
(cherry picked from commit 5e4fefb109b6c92d59fc7e24d69f1b6b2780c776)
2022-01-21 18:15:43 +00:00
10b40acbdb [PyTorch][Static Runtime] Fast aliasing in select_tensor by manual borrowing (#68122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68122

See code comments for details; in brief, we repurpose support
for borrowing `Tensor`s in `MaybeOwned` to make the `select_tensor`
output a borrowed IValue that we have to clean up manually.

If we have any other ops that always create a new reference to an
existing Tensor, we can easily apply this same optimization.
ghstack-source-id: 146482212

Test Plan:
See perf measurements on ctr_mobile_feed local_ro net for this stack: P467203421
(local is neutral: P467267554)

--do_profile output for local_ro (updated Dec 10):

```
swolchok@devbig032 /d/u/s/f/fbcode> tail Stable.profile.txt
First iter time: 0.989023 ms
Number of operators: 2037
Total number of managed tensors: 1597
Total number of managed output tensors: 0
Total number of unmanaged values: 2568
Number of unmanaged values requiring cleanup: 2568
Number of unmanaged values not requiring cleanup: 0
Total memory managed: 50368 bytes
Total number of reused tensors: 1010
Total number of 'out' variant nodes/total number of nodes: 2001/2037 (98.2327%)
swolchok@devbig032 /d/u/s/f/fbcode> ttail TMCC^C
swolchok@devbig032 /d/u/s/f/fbcode> tail TMCOFastAliasing.profile.txt
First iter time: 0.994703 ms
Number of operators: 2551
Total number of managed tensors: 1146
Total number of managed output tensors: 0
Total number of unmanaged values: 4047
Number of unmanaged values requiring cleanup: 3533
Number of unmanaged values not requiring cleanup: 514
Total memory managed: 50048 bytes
Total number of reused tensors: 559
Total number of 'out' variant nodes/total number of nodes: 2001/2551 (78.4398%)
```

for local: (also Dec 10):

```
==> Stable.local.profile.txt <==
First iter time: 9.0909 ms
Number of operators: 1766
Total number of managed tensors: 1894
Total number of managed output tensors: 0
Total number of unmanaged values: 2014
Number of unmanaged values requiring cleanup: 2014
Number of unmanaged values not requiring cleanup: 0
Total memory managed: 4541440 bytes
Total number of reused tensors: 847
Total number of 'out' variant nodes/total number of nodes: 1744/1766 (98.7542%)

==> TMCOFastAliasing.local.profile.txt <==
First iter time: 7.5512 ms
Number of operators: 2378
Total number of managed tensors: 1629
Total number of managed output tensors: 0
Total number of unmanaged values: 3503
Number of unmanaged values requiring cleanup: 2891
Number of unmanaged values not requiring cleanup: 612
Total memory managed: 3949312 bytes
Total number of reused tensors: 586
Total number of 'out' variant nodes/total number of nodes: 1744/2378 (73.3389%)
```

Reviewed By: hlu1

Differential Revision: D32318674

fbshipit-source-id: a2d781105936fda2a3436d32ea22a196f82dc783
2022-01-04 22:36:13 -08:00
4d8fc8693c [PyTorch][Static Runtime] Support memory planning for torch.to() w/o requiring copying (#67223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67223

ghstack-source-id: 146482215

Test Plan:
See perf measurements on ctr_mobile_feed local_ro net for this stack: P467203421
(local is neutral: P467267554)

Reviewed By: hlu1

Differential Revision: D31776259

fbshipit-source-id: f84fcaa05029577213f3bf2ae9d4b987b68480b3
2022-01-04 22:36:10 -08:00
682fab19d4 [SR] verify_and_correct_memory_overlap handles tensor lists (#69774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69774

We recently ran into a nasty bug caused by incorrect schema annotations on an `aten::split` overload. `verify_and_correct_memory_overlap` is supposed to prevent crashes in this scenario, but it didn't because it did not handle `Tensor[]` outputs.

This change extends the memory correction mechanism to handle tensor lists.
ghstack-source-id: 146152478

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D33022494

fbshipit-source-id: 8d1d41ca1d4fd5dfb7c8a66028c391ba63551eb0
2021-12-22 17:18:18 -08:00
a6f953156e [StaticRuntime] Add TensorExpr fusion with dynamic shapes in SR (#69475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69475

This diff adds TensorExpr fusion with dynamic shapes in SR. This includes tracing the input graph with sample inputs, and then performing fusion with generalization to get fused graphs with dynamic shapes.
ghstack-source-id: 146059043

Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Reviewed By: d1jang

Differential Revision: D32320088

fbshipit-source-id: 397f498878ddfcee9dad7a839652f79f034fefe3
2021-12-21 12:41:02 -08:00
91da2d5fa1 [StaticRuntime] Refactor StaticModule to pass in sample inputs (#69473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69473

This diff refactors StaticModule and its uses to pass in sample inputs. These inputs need to be passed into the constructor because they are needed to perform TensorExpr fusion before other optimizations are performed on the input graph.
ghstack-source-id: 146059041

Test Plan: buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test

Reviewed By: donaldong

Differential Revision: D32320084

fbshipit-source-id: b8bd46d442be4cc90ca60f521e0416fdb88eea60
2021-12-21 11:20:25 -08:00
181120f7d7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33229251

fbshipit-source-id: 3a69bb459fa0a65888d6f9c8e70b5de032ddad97
2021-12-19 16:38:25 -08:00
ef70174f2e Separate c10::Symbol header from list of interned strings (#69406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406

Most files that include `interned_strings.h` don't actually depend on
anything generated from `FORALL_NS_SYMBOLS` yet because they're in a
single file you need to recompile whenever a new symbol is added. Here
I move the class definition into a separate file so this doesn't
happen.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923637

Pulled By: albanD

fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
2021-12-19 14:52:26 -08:00
873585da2b [SR] Improve set_inputs (#69087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:

1. Eliminate code duplication between rvalue/lvalue overloads
2. Add type checks
3. Make input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs are passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32711705

fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
2021-12-15 09:31:19 -08:00
c6c3b43498 [SR][easy] Accessors for value array offsets (#69755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69755

Per swolchok's suggestion on D32609915 (1c43b1602c). Hide the value offset indices behind accessors to provide more flexibility if we ever decide to change the layout of the values array.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32838145

fbshipit-source-id: cf805c077672de4c2fded9b41da01eca6d84b388
2021-12-13 15:31:39 -08:00
a5996a6857 [SR] Wrap check_for_memory_leak with DCHECK (#69588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69588

Code cleanup

Reviewed By: mikeiovine

Differential Revision: D32938333

fbshipit-source-id: d15dc405b281411c4c3c27a1dabf82f430c3ed08
2021-12-09 22:11:21 -08:00
9aa1b3e396 [Static Runtime] [Code Cleanup] Encapsulate function objects within ProcessedFunction (#69595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69595

This change encapsulates the `function` object in `ProcessedFunction` objects instead of exposing it unnecessarily just for executing it.

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D32908341

fbshipit-source-id: 5ff4951cbe276c5c6292227124d9eec1dd16e364
2021-12-09 15:11:03 -08:00
1c43b1602c [SR] Scope exit guard for memory planner deallocation (#68795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68795

This change improves static runtime exception safety. Added a scope exit guard that invokes `MemoryPlanner::deallocate` in its destructor.

Caveat: we have to be really careful with the exception behavior of `MemoryPlanner::deallocate` and `MemoryPlanner`'s constructor, because they're now both potentially called in the destructor of the scope exit guard. Letting exceptions potentially escape destructors is playing with fire since 1) the destructor of `Deallocator` is (implicitly) `noexcept`, 2) even if it wasn't, `std::terminate` will be called if an exception escapes and the stack is already unwinding. To get around this, we wrap the deallocation stuff in a try/catch. If deallocation throws, then we simply reset all of the memory planner stuff and carry on.
There's a catch: the code path that we take when handling the deallocation exception can't throw. However, this code path is much simpler than memory planner construction/deallocation, so it's much easier to manually audit the correctness here.
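
The guard pattern described above might look roughly like the following. This is a sketch under assumptions: `Planner`, its members, and `run_once` are hypothetical, and only the name `Deallocator` comes from the text. The destructor swallows any exception from deallocation and falls back to a `noexcept` reset.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the memory planner.
struct Planner {
  bool allocated = false;
  bool fail_on_deallocate = false;

  void allocate() { allocated = true; }
  void deallocate() {
    if (fail_on_deallocate) {
      throw std::runtime_error("deallocate failed");
    }
    allocated = false;
  }
  // Fallback path: must not throw, so it is kept trivially simple.
  void reset() noexcept {
    allocated = false;
    fail_on_deallocate = false;
  }
};

// Scope exit guard: runs deallocation when the scope is left, whether
// normally or through an exception.
class Deallocator {
 public:
  explicit Deallocator(Planner& planner) : planner_(planner) {}
  Deallocator(const Deallocator&) = delete;
  Deallocator& operator=(const Deallocator&) = delete;

  ~Deallocator() {
    try {
      planner_.deallocate();
    } catch (...) {
      // Never let an exception escape the destructor.
      planner_.reset();
    }
  }

 private:
  Planner& planner_;
};

// Returns true if the body finished, false if it threw.
bool run_once(Planner& planner, bool throw_in_body) {
  try {
    Deallocator guard(planner);
    planner.allocate();
    if (throw_in_body) {
      throw std::runtime_error("op failed");
    }
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

In every path through `run_once`, the planner ends up deallocated, including the case where `deallocate` itself throws.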

Test Plan:
**New unit tests**

`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32609915

fbshipit-source-id: 71fbe6994fd573ca6b7dd859b2e6fbd7eeabcd9e
2021-12-08 16:41:52 -08:00
008469c5e2 [SR] Simplify memory re-use algorithm (#68302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68302

Implement the new memory re-use algorithm. It’s roughly based on the c2 one, but after going through many iterations it may not be a 1:1 port anymore. Also deleted the old liveness analysis.

Test Plan:
## **Re-use metrics**

`inline_cvr` (294738512_58)
**Before**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4601984 bytes
Total number of reused tensors: 1183
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29696 bytes
Total number of reused tensors: 959
```

**After**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4520000 bytes
Total number of reused tensors: 1198
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29120 bytes
Total number of reused tensors: 963
```

Reviewed By: hlu1

Differential Revision: D32370424

fbshipit-source-id: 06a8e0a295ed7a2b4d14071349c1f1e975f746bf
2021-12-07 13:25:42 -08:00
c97dc9286d Revert D32780415: [Static Runtime] Move implementation details from impl.h into internal.h
Test Plan: revert-hammer

Differential Revision:
D32780415 (999e93e6a8)

Original commit changeset: 119b7aedbf56

fbshipit-source-id: 1aa777e8c1854ab27e86bc625188f7170097fac8
2021-12-04 19:44:07 -08:00