Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387
Added a customized NNC implementation for the signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.
Also added a static runtime (SR) microbenchmark for this kernel, which shows the performance improvement.
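For context, the fused op computes `sign(x) * log1p(|x|)` in one pass over the tensor. A minimal eager-mode reference of the semantics (illustrative only; the actual kernel is written in NNC):
```
#include <ATen/ATen.h>

// Reference semantics of signed log1p. The NNC kernel fuses these four
// pointwise ops (sign, abs, log1p, mul) into a single loop over the input.
at::Tensor signed_log1p_reference(const at::Tensor& input) {
  return at::sign(input) * at::log1p(at::abs(input));
}
```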
Without fusion:
```
--------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16              1953 ns         1953 ns       358746
BM_signed_log1p/64              2049 ns         2049 ns       342145
BM_signed_log1p/512             3291 ns         3291 ns       214342
BM_signed_log1p/4096           15559 ns        15559 ns        44420
BM_signed_log1p/32768         101936 ns       101935 ns         6843
BM_signed_log1p/65536         194792 ns       194789 ns         3615
```
With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16               369 ns          369 ns      1896179
BM_signed_log1p/64               497 ns          497 ns      1406995
BM_signed_log1p/512             1618 ns         1618 ns       430209
BM_signed_log1p/4096           11327 ns        11326 ns        61463
BM_signed_log1p/32768          84099 ns        84086 ns         8325
BM_signed_log1p/65536         166531 ns       166510 ns         4186
```
This shows a clear improvement from NNC fusion: roughly 15% faster at the largest sizes and up to ~5x faster at the smallest.
On the inline_cvr local model, there is a small improvement in the profiled time spent on these ops:
without fusion: `0.9%` (computed by adding the % spent on all 4 ops involved)
with NNC fusion: `0.55%`
Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Also ran the accuracy test with inline_cvr, as described here: https://fb.quip.com/qmdDAJzEmPtf, on the full-size model (285298536_1):
```
get 57220 prediction values
get 57220 prediction values
max_error: 0 total: 0
```
Reviewed By: hlu1
Differential Revision: D30609492
fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118
Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.
Reviewed By: eellison, movefast1990
Differential Revision: D30955053
fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934
Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```
The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.
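For illustration, the core rewrite for one contiguous run of `prim::TupleUnpack` nodes might look roughly like the sketch below (written against the JIT IR API; collecting the runs and the surrounding checks are omitted, and the helper name is made up):
```
#include <torch/csrc/jit/ir/ir.h>

#include <vector>

// Replace one contiguous run of prim::TupleUnpack nodes with a single
// static_runtime::VarTupleUnpack node that takes all tuples as inputs and
// produces all unpacked values as outputs. Illustrative sketch only.
void fuseRun(torch::jit::Graph& graph, const std::vector<torch::jit::Node*>& run) {
  auto* fused = graph.create(
      c10::Symbol::fromQualString("static_runtime::VarTupleUnpack"),
      /*num_outputs=*/0);
  fused->insertBefore(run.front());
  for (auto* unpack : run) {
    fused->addInput(unpack->input());
    for (auto* out : unpack->outputs()) {
      auto* new_out = fused->addOutput()->copyMetadata(out);
      out->replaceAllUsesWith(new_out);
    }
  }
  // All uses have been redirected, so the original nodes are dead.
  for (auto* unpack : run) {
    unpack->destroy();
  }
}
```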
Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`
Reviewed By: d1jang
Differential Revision: D30872109
fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123
This change re-reverts D30883290 (0e11454d19), which broke the OSS build because it implicitly removed the default move constructor of `StaticRuntime`.
```
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 auto sr = getStaticRuntime();
Sep 15 15:39:57 ^ ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 2 errors generated.
```
This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).
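A minimal sketch of the shape of the fix (simplified; the exact member that suppressed the implicit move constructor, e.g. a user-declared destructor, is an assumption here):
```
#include <memory>

class MemoryPlanner {};  // stand-in for the real class

class StaticRuntime {
 public:
  StaticRuntime() = default;
  // A user-declared destructor suppresses the implicitly-generated move
  // constructor, and the unique_ptr member deletes the copy constructor,
  // so returning StaticRuntime by value stops compiling.
  ~StaticRuntime() = default;
  // The fix: bring the move constructor back explicitly. The type stays
  // non-copyable because std::unique_ptr is move-only.
  StaticRuntime(StaticRuntime&&) noexcept = default;

 private:
  std::unique_ptr<MemoryPlanner> planner_;
};
```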
Original Summary:
This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.
`MemoryPlanner` performs an independent sub-task: it statically analyzes the graph, creates the memory plan, and allocates/deallocates the managed Tensors.
This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.
Test Plan: - Confirm that OSS build went well (See External Tests section).
Reviewed By: mikeiovine
Differential Revision: D30983292
fbshipit-source-id: a59f407fa1123527824157268111144a1bf58116
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013
This change enhances the current memory overlap check to include outputs: it enforces the constraint that the outputs of a node must NOT overlap with each other, since the node writes all of its outputs at the same time.
This check will detect a problem like T97393697 immediately in debug mode.
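A rough sketch of the kind of pairwise check this adds (hypothetical helper; the real check hooks into `ProcessedNode` and may be more precise than a storage-level comparison):
```
#include <ATen/ATen.h>

#include <vector>

// Debug-only verification: no two outputs of a node may alias each other.
// Conservatively treats tensors that share a storage as overlapping.
bool outputs_do_not_overlap(const std::vector<at::Tensor>& outputs) {
  for (size_t i = 0; i < outputs.size(); ++i) {
    for (size_t j = i + 1; j < outputs.size(); ++j) {
      if (outputs[i].is_alias_of(outputs[j])) {
        return false;
      }
    }
  }
  return true;
}
```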
Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`
- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.
Reviewed By: hlu1
Differential Revision: D30211705
fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209
Add a new fusion pass that transforms the following pattern:
```
graph(%input):
%0 : Tensor = aten::sign(%input)
%1 : Tensor = aten::abs(%input)
%2 : Tensor = aten::log1p(%1)
%res : Tensor = aten::mul(%0, %2)
return (%res)
```
Into a single op:
```
graph(%input):
%res : Tensor = static_runtime::signed_log1p(%input)
return (%res)
```
The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.
Followup: navahgar will add an NNC kernel which is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.
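One way such a pattern replacement can be expressed is with the JIT's `SubgraphRewriter`; a hedged sketch (the actual pass may be implemented differently, and it also has to guard against cases where the fusion is not valid):
```
#include <torch/csrc/jit/passes/subgraph_rewrite.h>

#include <memory>
#include <string>

void fuseSignedLog1p(std::shared_ptr<torch::jit::Graph>& graph) {
  // Pattern: sign(x) * log1p(abs(x)), spread over four nodes.
  const std::string pattern = R"IR(
    graph(%input):
        %0 : Tensor = aten::sign(%input)
        %1 : Tensor = aten::abs(%input)
        %2 : Tensor = aten::log1p(%1)
        %res : Tensor = aten::mul(%0, %2)
        return (%res))IR";

  // Replacement: a single fused op.
  const std::string fused = R"IR(
    graph(%input):
        %res : Tensor = static_runtime::signed_log1p(%input)
        return (%res))IR";

  torch::jit::SubgraphRewriter rewriter;
  rewriter.RegisterRewritePattern(pattern, fused);
  rewriter.runOnGraph(graph);
}
```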
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Test passed with new graph pass disabled and enabled.
Reviewed By: hlu1
Differential Revision: D30559929
fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159
Test Plan:
Confirm out variant is called for both versions:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: mikeiovine
Differential Revision: D30622819
fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157
The UseVariadicCat optimization is not applied to `aten::cat` if the list input to the op cannot be moved to a position before the op (https://fburl.com/diffusion/l6kweimu). For these cases we will need an out variant for SR.
Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: d1jang
Differential Revision: D30598574
fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070
Test Plan:
Confirm out variant is called for both versions:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: d1jang
Differential Revision: D30595816
fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078
This change converts `aten::layer_norm -> output Tensor` into `static_runtime::layer_norm -> (output Tensor, tmp1 Tensor, tmp2 Tensor)` so that the `tmp1` and `tmp2` Tensors are managed by the static runtime.
Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it:
```
at::Tensor mean = create_empty_from({M}, *X);
at::Tensor rstd = create_empty_from({M}, *X);
```
that the static runtime misses an opportunity to manage.
This change turns them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can manage them, since the static runtime currently chooses to manage only output tensors.
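Conceptually, the placeholder op is just layer_norm with its temporaries surfaced as extra outputs. A hedged sketch, assuming the `at::native_layer_norm` overload that already returns `(output, mean, rstd)`:
```
#include <ATen/ATen.h>

#include <tuple>

// Illustrative only: same math as aten::layer_norm, but the mean/rstd
// temporaries become (otherwise unused) outputs, so the static runtime's
// memory planner (which only manages op outputs) can reuse their buffers.
std::tuple<at::Tensor, at::Tensor, at::Tensor> layer_norm_with_temps(
    const at::Tensor& input,
    at::IntArrayRef normalized_shape,
    const c10::optional<at::Tensor>& weight,
    const c10::optional<at::Tensor>& bias,
    double eps) {
  return at::native_layer_norm(input, normalized_shape, weight, bias, eps);
}
```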
Test Plan:
- Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated.
- Confirmed that the new op gets activated during testing:
```
V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3)
```
Reviewed By: hlu1
Differential Revision: D30486475
fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64024
`aten::expand_as` creates a view of the input tensor. This change adds its native op implementation for the static runtime.
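For reference, `expand_as` is view-only; it is roughly equivalent to this sketch:
```
#include <ATen/ATen.h>

// expand_as returns a view: no data is copied, only sizes/strides change.
at::Tensor expand_as_view(const at::Tensor& self, const at::Tensor& other) {
  return self.expand(other.sizes());
}
```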
Test Plan: - Added `StaticRuntime.IndividualOps_ExpandAs`
Reviewed By: hlu1
Differential Revision: D30546851
fbshipit-source-id: e53483048af890bc41b6192a1ab0c5ba0ee2bdc0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579
Provide a static runtime out variant implementation for the new op introduced in D30426232 (1385f9fb12).
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`
Reviewed By: navahgar
Differential Revision: D30410525
fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398
This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.
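A minimal sketch of the idea (hypothetical helper; the real implementation is registered as a static runtime native op and works on `IValue`s):
```
#include <ATen/core/ivalue.h>
#include <c10/util/Exception.h>

// Index a generic list directly instead of round-tripping through the JIT
// interpreter. Supports Python-style negative indices.
c10::IValue list_getitem(const c10::List<c10::IValue>& list, int64_t idx) {
  const auto size = static_cast<int64_t>(list.size());
  const int64_t i = idx < 0 ? idx + size : idx;
  TORCH_CHECK(i >= 0 && i < size, "list index out of range");
  return list.get(i);
}
```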
Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D30368464
fbshipit-source-id: e0e0971508cd5d9bcf6025606993dc24ecbf6764
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350
Add a native implementation for `aten::append`, the list append op.
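Semantically the op is a one-liner; a sketch (hypothetical helper; note that `c10::List` has reference semantics, so the push is visible through every alias of the list, which is what `aten::append`'s in-place behavior relies on):
```
#include <ATen/core/ivalue.h>

#include <utility>

// aten::append mutates the list in place and returns the same list.
c10::List<c10::IValue> list_append(c10::List<c10::IValue> list, c10::IValue element) {
  list.push_back(std::move(element));
  return list;
}
```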
Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`
Reviewed By: hlu1
Differential Revision: D30326461
fbshipit-source-id: 0dbdf6cc82e78c7c36db39583256f6b87385e3d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62347
This diff includes tests for all `aten` ops that did not already have test coverage.
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D29968280
fbshipit-source-id: 768655ca535f9e37422711673168dce193de45d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335
This change ensures that unittests only use out variants or native ops.
- Our unittests currently assume that a graph fed to the static runtime correctly replaces interpreter ops with their corresponding out variants / native ops, but this was not actually checked by the unittests. This change adds that check.
- We relied on manual inspection of log messages to see whether an out variant was used for a specific workload, even when unittesting. This change frees us from doing that.
- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also some unittests are excluded by using `expect_interpreter_op = true` since they are written to use interpreter ops by design.
Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.
Reviewed By: mikeiovine, hlu1
Differential Revision: D29952381
fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62622
This allows us to catch cases where an out variant is being tested but the test author forgot to call `.clone()` in the test script. Having more than 2 ops does not guarantee that the memory planner is being exercised, but having fewer than 2 guarantees that it is not.
Reviewed By: hlu1
Differential Revision: D30058050
fbshipit-source-id: 5bc053736f1cc6fd1ffcf8254bf38874ac18c34b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62064
`testStaticRuntime` was previously only available in `test_static_runtime.cc`. It has been moved to a common library `test_utils` to facilitate code re-use. This also lets us test dynamic shapes in `test_fb_operators`
Reviewed By: hlu1
Differential Revision: D29858928
fbshipit-source-id: 68a94760166ddb745972b0f1fc24bed594937d1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067
The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d).
Also added a simple test for dynamic shapes, i.e., the input tensors in args2 are larger than those in args1.
Reviewed By: navahgar, mikeiovine
Differential Revision: D29864600
fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62098
The build was broken by D29821533 (1d2ea76afb). The `clamp` overloads used in `deep_wide.h` are no longer available in the `at::native` namespace.
Use `at::cpu::clamp` and the `clip_out` overload (which should be an alias for clamp) instead.
Reviewed By: hlu1
Differential Revision: D29880187
fbshipit-source-id: 210b6d2be8a8142e7af1a0ba07e55a95b1a77d25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61783
Implement two new prim operators for static runtime: `isinstance` and `TypeCheck`. `isinstance` is very straightforward, but there were a few wrinkles with implementing `TypeCheck`:
1. There is no way to directly generate `TypeCheck` nodes from TorchScript; they are generated by the JIT at runtime. This makes testing a little difficult. I had to make some modifications to `testStaticRuntime` to allow for both IR and TorchScript tests.
2. The behavior of `prim::TypeCheck` as implemented here does not match up 1:1 with the version implemented in the interpreter! This is because grad mode is disabled in static runtime. Here's an example.
IR is the same as the one included in this test, but with `requires_grad == 1`
```
graph(%a.1 : Tensor,
%b.1 : Tensor):
%t0 : Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), %t1 : Float(3, 3, strides=[3, 1]), %type_matched : bool = prim::TypeCheck[types=[Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), Float(3, 3, strides=[3, 1])]](%a.1, %b.1)
return (%t0, %t1, %type_matched)
```
And in the test setup:
```
auto a = at::zeros({2, 2}, at::kFloat);
a.to(at::kCPU);
a.set_requires_grad(true);
auto b = at::ones({3, 3}, at::kFloat);
std::vector<IValue> args_correct = {a, b};
// prim::TypeCheck should be true with args_correct,
// but we get false when using static runtime!
```
Reviewed By: hlu1
Differential Revision: D29743862
fbshipit-source-id: db1788f0f5de42bab42602e8cc24eee04cbcc280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595
Add out variant wrapper for `aten::linear` in the static runtime
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D29684236
fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61566
This change uses `at::allclose` to compare results from the sigmoid implementations (ATen CPU vs. NNC) instead of `Tensor::equals`, due to small numerical differences between them.
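For reference, the comparison becomes tolerance-based rather than exact; a sketch (the tolerances shown are just `at::allclose`'s defaults, not necessarily what the test uses):
```
#include <ATen/ATen.h>

// The ATen CPU and NNC sigmoid kernels can differ by a few ULPs, so compare
// with tolerances instead of exact equality.
bool sigmoid_outputs_match(const at::Tensor& ref, const at::Tensor& nnc) {
  return at::allclose(ref, nnc, /*rtol=*/1e-5, /*atol=*/1e-8);
}
```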
Test Plan:
I confirmed that the flakiness of `StaticRuntime.Sigmoid` is gone with this change:
```
[djang@devvm1999.ftw0 ~/fbsource/fbcode] buck-out/gen/caffe2/benchmarks/static_runtime/static_runtime_cpptest -v 3 --gtest_filter=StaticRuntime.Sigmoid --gtest_repeat=100 &> output.txt
[djang@devvm1999.ftw0 ~/fbsource/fbcode] grep PASSED output.txt | wc
100 500 2100
```
Reviewed By: bertmaher
Differential Revision: D29671203
fbshipit-source-id: 99a7b16d18ea047c9aad444f36d8368f9d0b088d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61301
This change adds a `DCHECK` to ensure that outputs do not overlap with immutable inputs.
Test Plan:
Added unittests as follows:
- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithImmutableArguments`
- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithMutableArguments`
Reviewed By: hlu1
Differential Revision: D29564158
fbshipit-source-id: bf14b4978ab544af79010cf724ed28202b4521cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61000
Add unit tests for the bmm and addmm operators in static runtime.
Test Plan:
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
{F628935117}
Reviewed By: hlu1
Differential Revision: D29459679
fbshipit-source-id: 5c7fa5c9b0675c1c84f3ae3110204d663255009c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60669
Test Plan: Added unit test to check for nested outputs.
Reviewed By: ajyu
Differential Revision: D29322025
fbshipit-source-id: a3c8d3c5f0bb7cf7fda4bc5f579adb8fa7bc3724
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503
Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit.
- fix the schema in the `ReplaceWithCopy` pass
- add registration of `static_runtime::to_copy.other`
Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`
Reviewed By: ajyu
Differential Revision: D26838933
fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60579
- Modify testStaticRuntime to take two sets of inputs so that, if the second set of inputs has bigger shapes, it triggers memory allocations in resize_ calls.
- Modify test scripts so that the output of the test op is managed by the memory planner, as explained in comments.
Reviewed By: ajyu
Differential Revision: D29221452
fbshipit-source-id: 09f0f7eb384dc8ca67594f1fa76e1e31392ee6ca