Commit Graph

327 Commits

Author SHA1 Message Date
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes ABA storage identity problem during serialization for `torch.package` by keeping reference of serialized storages through lifetime of `PackageExporter` to prevent reuse of memory address. Achieved by extending logic used in solution to mobile's same issue.
2. Adds determinism to naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme`(introduced 2nd mapping in `StorageContext`, now maps `storage cdata ptr` -> `unique id`, `unique id` -> `c10::Storage`)
3. Additionally uses presence of a storage in the `StorageContext` instance as marker for if a storage has been serialized or not, removing the need to scan the `PythonStreamWriter` for presence of the storage's serialization file

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, logging level could only be changed by changing the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - Can change the level from python now
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
d8cbba3ee2 [JIT] Disable Complete Shape Inlining For Testing Purposes (#56966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966

This PR adds a toggle to shape analysis which won't inline complete tensor shapes as constants into the shape compute graph, which is a good stress test on the partial evaluation pipeline.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444664

Pulled By: eellison

fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
2021-05-27 17:57:48 -07:00
f66fbb1e2e Add unary/binary ops necessary for mobilenet (#56828)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56828

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444660

Pulled By: eellison

fbshipit-source-id: 656673e6139550f2752c0d3ac2fb8731f4bf9bbb
2021-05-27 17:56:30 -07:00
e067675167 [Pytorch] Provide API to preserve source range and callstack information during graph rewrite (#58300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58300

Current state: During graph rewriting that can fuse nodes or add nodes
result in new nodes without debug information that was available in
original node. Thus we lose this information during graph rewrite.

This PR changes graph rewriting API to let user specify how the values
in the replacement pattern map to values in the pattern to be matched.
Then the graph rewriting will copy source range and inlined callstack
from the matched nodes onto the nodes being inserted.

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_jit.py
TestJit.test_pattern_based_rewrite_with_source_range_preserved

Imported from OSS

Reviewed By: malfet

Differential Revision: D28512465

fbshipit-source-id: 863173c29de726be85b3acbd3ddf3257eea36d13
2021-05-25 09:18:59 -07:00
5313bafd31 [JIT] integer value refinement (#56438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56438

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27924239

Pulled By: eellison

fbshipit-source-id: ace54fcb594853f30c242369ea203b0eb5527ac1
2021-05-21 08:51:01 -07:00
5cebf29b4e Add list len refinement (#55926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55926

This is necessary for code like conv2d where we wish to share a generic convolution shape function logic with that of conv2d but for conv2d always infer the output is dimension 4. I'm also hoping the refinement algorithm here could be refactored out and used to support refining tensor types from user annotations. i have a length comment explaining how this works, and the logic outside of data structures is pretty small and contained. Additionally, you might check out https://fb.quip.com/X7EVAdQ99Zzm for a very similar description of how to refine values based on comparison operators.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750997

Pulled By: eellison

fbshipit-source-id: d962415af519ac37ebc9de88f2e1ea60a1374f7c
2021-05-21 08:50:54 -07:00
9fd2306036 Add handling of symbolic shapes (#55925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55925

This sets up the initial handling of symbolic shapes. As in the test, it doesn't work perfectly yet because it needs a couple other optimization passes. The basic description is pretty simple: we resolve tensor dimension indices to the same Value *, and before extracting out the output Tensor shape we substitute in symbolic shapes. We don't substitute during optimization because they are represented as negative numbers so we don't want them inadvertently used in Constant prop or something else.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750996

Pulled By: eellison

fbshipit-source-id: 6984e7276b578f96b00fc2025cef0e13f594b6e6
2021-05-21 08:50:52 -07:00
f39471a171 Initial Symbolic Shape Analysis (#54809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54809

I'm going to post on dev-discuss soon with a more thorough explanation of the design and advantages of this shape analysis, so I'm leaving out that for now.

There is still a ton left to do, I'm posting this initial version so we can get something on master multiple can work on. List of many remaining steps to do:

- [ ] Add symbolic shapes support
- [ ] Bind shape functions for operators in C++
- [ ] Make classes of operators share the same shape function (e.g. pointwise, broadcast two inputs)
- [ ] Refactor APIs
- [ ] Only iteratively optimize shape function while a change has been made
- [ ] Expand coverage of coverage to common ops
- [ ] Add shape analysis pass on Graph that handles Ifs and Loops
- [ ] Allow concurrent reads to the operator map
- [ ] Successive applications of same inputs to same shape function (e.g. series of pointwise ops)

For this review, I am mostly looking for comments related to the implementation of symolic_shape_analysis.cpp, with the caveats listed above. I am not really looking for comments related to api/registration/graph level analysis as those are all planned to be changed. I am fine landing this as is or waiting until necessary components of the TODOs above are finished.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D27750998

Pulled By: eellison

fbshipit-source-id: 4338b99e8651df076291c6b781c0e36a1bcbec03
2021-05-21 08:49:46 -07:00
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
5a238eb96e Fix deadlock in Future due to lock inversion with GIL (#58382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58382

Calling markCompleted on a Future now first acquires the Future's mutex (as usual) but then sometimes tries to acquire the GIL during the DataPtr extraction while still holding the Future's mutex. (This happens when the value passed to markCompleted is a Python object). This can cause a deadlock if someone else calls any of the other methods of Future while holding the GIL.

There are two solutions to this: avoid holding the Future's mutex when extracting DataPtrs, and avoid holding the GIL while invoking the Future's method. In this PR I'm going for the latter, because it's a very simple immediate fix, but I believe this is brittle and that we should probably also consider the former fix.
ghstack-source-id: 129105358

Test Plan: The repro in https://github.com/pytorch/pytorch/issues/58239 now doesn't deadlock.

Reviewed By: mrshenli

Differential Revision: D28472816

fbshipit-source-id: 1bc9bca426dd004f9eb2568db1ffd38f014450e2
2021-05-17 10:53:19 -07:00
9403fe17ce [torch.package/TorchScript] logic to enable sharing of tensors on load (#57573)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57573

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28226975

Pulled By: Lilyjjo

fbshipit-source-id: bc8cb3e8052fa18336c437e0601d8b0028fd1895
2021-05-14 08:21:43 -07:00
38e606d056 [RFC] Add method torch.jit._clone_module_with_class (#56152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56152

Currently, the Bundled Inputs API mutates the module in-place. It adds class methods and not instance methods. This results in a small problem that one can't re-run an already executed cell in Bento if the class has already been subject to bundled inputs.

In addition, there is no way to add bundled inputs to a module that has bundled inputs added already. This API provides a way to solve this problem as well by adding an `ignored_methods` to the call to `clone()` by allowing the implementation of bundled inputs to pass in the methods that it will add as `ignored_methods` so that when it does try to add those methods, it will be able to do so successfully.

We'll have to be careful when ignoring those methods during the call to `torch.jit._clone_module_with_class` since any bundled input that relies on a user-provided method will need to be preserved and not ignored during the clone.

Looking for feedback on whether this is an acceptable direction.
ghstack-source-id: 128908360

Test Plan:
Added unit test and ran it as `buck test //caffe2/test:mobile`

Also see this Bento Notebook: https://www.internalfb.com/intern/anp/view/?id=550829

Reviewed By: gmagogsfm

Differential Revision: D27788394

fbshipit-source-id: 48109cd4583506d4efdb345e4ba31385db23a273
2021-05-13 22:31:05 -07:00
58bc003487 Add pybind type caster for c10::Device (#57292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57292

In Future (and soon in other places too) we need to receive a list of devices from Python-land. We don't want to just take their indices because we need full devices in order to infer the type from them. torch.device is not defined through pybind, it's defined through a plain `PyModule_AddObject` call with CPython, thus pybind isn't naturally able to understand and convert it. However we can provide a custom type caster which fixes that. We have this already for at::Tensor, at::Generator, ...
ghstack-source-id: 127916268

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28092732

fbshipit-source-id: 1c31d0b85a4d5c9e7bde8161efbb7574d505157c
2021-05-01 16:11:10 -07:00
311ad5e3af Merge CUDAFuture into ivalue::Future (#57052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57052

This PR caps a stack whose goal was to merge CUDAFuture into ivalue::Future. CUDAFuture used to be a subclass of ivalue::Future, which was already pretty good, but it meant that in several places we needed `#ifdef`s or registries in order to create the right type of class, which was annoying. We've made CUDAFuture device-agnostic, by using generic helpers, so that it doesn't depend on CUDA. Now all its code can be inserted into ivalue::Future.

This PR does this very naively, by copy-pasting CUDAFuture's code into the (previously empty) virtual methods of ivalue::Future. This helps ensure the correctness of this PR, as it's straightforward to see it behaves exactly like before. However we probably want to polish it a bit later to iron out so wrinkles.
ghstack-source-id: 127713138

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28036829

fbshipit-source-id: 3e5b16402f5dc245c1fcb9d7bf06db64dcb0d2a3
2021-04-29 09:31:52 -07:00
71c2f88b90 Make CUDAFuture handle any kind of device type (#57051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57051

Make CUDAFuture autodetect the devicetype from its arguments (which thus change from DeviceIndices to full Devices). This in fact transforms CUDAFuture into a AnythingFuture, since it's not tied to CUDA in any way anymore. Having made it fully device-agnostic, we'll merge it into ivalue::Future in the next PR.
ghstack-source-id: 127713134

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28032711

fbshipit-source-id: 8ba23b1b0d97f61db8693cd5f3c7bae7989a9bcd
2021-04-29 09:31:50 -07:00
60a5ebfac2 [Pytorch Edge] Remove methods_to_optimize arg (#57045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57045

Went back and adjusted the previous optimizations to just be applied to every function.
Cleaned up api to match.

ghstack-source-id: 127214412
ghstack-source-id: 127536155

Test Plan: unit test

Reviewed By: kimishpatel

Differential Revision: D27950859

fbshipit-source-id: 214e83d5a19b452747fe223615815c10fa4aee58
2021-04-27 14:54:13 -07:00
dc8a8cea79 Move caffe2 signal_handler to c10. (#56717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717

The signal_handler was under the caffe2 namespacee but was being used
by PyTorch as well.

I've fixed this my moving it to the c10 namespace where now both C2 and PyTorch
can use it.

The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatiblity for C2, but most of the commmon code is moved to c10.
ghstack-source-id: 127446929

Test Plan: waitforbuildbot

Reviewed By: ezyang

Differential Revision: D27946738

fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
2021-04-26 23:08:12 -07:00
15ca379bde Add CUDA support to a user-created torch.futures.Future (#56517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56517

Currently a torch.futures.Future could wrap a CUDAFuture, but it could not create one from scratch. This prevented users from using CUDAFutures in some occasions, for example when using `rpc.functions.async_execution`, or in their own code. I don't see any reason for such a limitation, hence here I add support for this.
ghstack-source-id: 127261554

Test Plan: Added a test later in the stack

Reviewed By: mrshenli

Differential Revision: D27887190

fbshipit-source-id: ecbb39c1ad7cd189d478ded9c361448f05a270ad
2021-04-23 08:13:56 -07:00
818ce1d0d2 Add standardOps match more input type in ORT (#53813) (#56172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56172

Enable the standardOps include **Add\Sub\Mul\Div\Gemm\Pow\Mod**  with low precision input in ORT

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D27866136

Pulled By: SplitInfinity

fbshipit-source-id: f2cf5649fffefd68c0cc7b6dce94198751636727
2021-04-21 17:58:08 -07:00
8e82e932f3 Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1" (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc786dc1ccb42e822208ff03c2a2b72c (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyways.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00
20d7916a6a [Pytorch Mobile] Fold Conv BatchNorm for functions besides forward (#54619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54619

Minor refactor to conv batchnorm folding to work on other functions besides forward
ghstack-source-id: 125767010

Test Plan: unit test and {P339453712}

Reviewed By: kimishpatel

Differential Revision: D27301452

fbshipit-source-id: 4e0cc544a171a970583979a496b2908935124497
2021-04-06 13:07:12 -07:00
4626886f21 [JIT] Add CUDNN Conv-Add-Relu fusion for Frozen Model Optimization (#52102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52102

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D26646100

fbshipit-source-id: 7f7a82cc0b42c958b9e0c854b3b5dc6ea7cfff6c
2021-03-18 15:18:52 -07:00
255b103c1b [WIP] Function to retrieve inspect.Signature instances for PyTorch ops (#53830)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53830

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D26982802

Pulled By: jamesr66a

fbshipit-source-id: 18fddc9f3f34b09e173de59f2fe886f8eedd000e
2021-03-17 20:41:27 -07:00
8f61b13e80 [Pytorch Mobile] Optimize Non Forward for Mobile (#53314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53314

Introduction of api for optimizing non forward functions for mobile. As of this diff, all functions that you say to optimize will be preserved, and those functions will be run through canonical optimization. The intention is to stack each further optimization onto separate diffs since they touch multiple files, and it seems like it'd be a nightmare to review.
ghstack-source-id: 123909414

Test Plan:
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward", "foo"]) runs fine

torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize={"foo"}) optimizes just foo if the model doesnt define forward otherwise optimizes foo and forward

torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward"]) runs fine

torch.utils.mobile_optimizer.optimize_for_mobile(net) runs fine if the model defines forward, Throws otherwise

Reviewed By: kimishpatel

Differential Revision: D26618689

fbshipit-source-id: 5bff1fb3f3f6085c4a649a8128af9c10f0fa9400
2021-03-17 14:31:24 -07:00
57d1df071f [ONNX] Support inplace operations on inplace indexing (#52063) (#53306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53306

* [ONNX] Fix for sequence of mutations in blocks (#51577)

Fixes consecutive mutations in a tensor inside blocks.
Also, support append and pop in blocks.

* Support inplace operations + indexing

* Clean up old pass for remove mutations

* Add loop test

* Fixes for set attr in loops

* Removing the new jit API flag

* [ONNX] Redesign onnx pass to enable shape type dependent pattern conversion - cont (#51795)

With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.

This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.

The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.

    The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.

    The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.

This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on Bool type, the code was outdated that back then Bool type doesn't exist.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.

~~PR depends on #51603~~

* Fix after merge

* clang

* Fix clang

* Fix clang

* Fix warning message.

* Fixes for non-model param attributes

* Fix for caffe2

* Additional test

* clang

* Skip test for lower opsets

* fix clang-tidy

* Update init.cpp

* Update remove_inplace_ops_for_onnx.cpp

* Update remove_inplace_ops_for_onnx.cpp

* Update remove_inplace_ops_for_onnx.cpp

* Fix for clang formatting

Test Plan: Imported from OSS

Reviewed By: pbelevich, malfet

Differential Revision: D26922416

Pulled By: SplitInfinity

fbshipit-source-id: e7108620b39b6404c594910786c4d275fee59d84

Co-authored-by: Bowen Bao <bowbao@microsoft.com>
2021-03-12 02:49:11 -08:00
3f9c803fe8 [ONNX] Redesign onnx pass to enable shape type dependent pattern conversion - cont (#51795) (#53304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53304

With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.

This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.

The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.

    The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.

    The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.

This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on Bool type, the code was outdated that back then Bool type doesn't exist.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.

~~PR depends on #51603~~

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26922417

Pulled By: malfet

fbshipit-source-id: 14ed06158d539e2451c2e5e63ba1b32fb0f75095
2021-03-11 10:30:09 -08:00
b4d8f4af82 [package] implement get_resource_reader API (#51674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51674

See
https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceReader

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D26237034

Pulled By: suo

fbshipit-source-id: 4c19f6172d16b710737528d3de48372873b9368d
2021-03-10 12:11:11 -08:00
d3cde6c23c [NNC] Implementation for aten::cat without conditionals. (#53128)
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.

Here is the performance of some micro benchmarks with and without conditionals. There is up to 50% improvement in performance without conditionals for some of the shapes.

aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128

Reviewed By: bertmaher

Differential Revision: D26758613

Pulled By: navahgar

fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
2021-03-07 22:57:02 -08:00
56f8379802 [static runtime] Move all heavy constructor logic into InferenceModule (renamed to StaticModule) (#51564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51564

Constructor logic was spread throughout InferenceModule and StaticRuntime.  This diff unifies the two.  After a lot of discussion on this diff D25961626 it became apparent that `clone` is uglier than a cheap StaticRuntime.

This means StaticRuntime is effectively StaticModule and the only code in the new StaticRuntime is the `run` functions.

```
graph, schema = PrepareForStaticModule(torchscript_module)
sm = StaticModule(graph, schema, options)
sm(inputs)
// or create many cheap runtimes with the module
sr = StaticRuntime(sm)
sr(inputs)
```

Changelist:
- Rename InferenceModule StaticModule
- Move all logic for construction into StaticModule
- Create a new StaticRuntime that only has a unique memory planner (everything else is in StaticModule)
- Update comments with explanation
- Propagate all changes to predictor integration
- Propagate all changes to python integration
- Change semantics to be a bit more PyTorch-standard (no "run" calls, no "get_" getters).

Test Plan:
buck test //caffe2/test:static_runtime
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D25592967

fbshipit-source-id: 8233bed03137ce129137af2d44bce0095033ef0f
2021-03-05 10:15:26 -08:00
bfae3789ba Move conv to mkldnn (#51483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483

This PR moves the conv weights of a frozen model to MKLDNN, and AOT reorders the weights. When the weights are already in MKLDNN, just computing a single conv by converting the input and output from/to mkldnn provides large speedups. I benchmark'd the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), as well as verified that it sped up popular models in torchvision.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696703

Pulled By: eellison

fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
2021-03-01 21:19:27 -08:00
4d94ee566e Ge v1 (#52136)
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136

Reviewed By: pbelevich

Differential Revision: D26693978

Pulled By: Krovatkin

fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
2021-02-28 00:53:13 -08:00
1d6bd15790 [JIT] Add torch._C._jit submodule (#52910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910

**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g
`torch._C._jit`, for example). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #51691.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26698373

Pulled By: SplitInfinity

fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
2021-02-26 16:05:05 -08:00
b72a72a477 torch.Package extend PyTorchStreamWriter to track written records (#52218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52218

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D26429794

Pulled By: Lilyjjo

fbshipit-source-id: 5f68e7991c673ada629d0370c705520243d0637a
2021-02-22 15:02:41 -08:00
60518d10f6 [deploy] torch::deploy API (#51754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51754

This API allows you to manage multiple python interpreters in a single
process to deploy PyTorch models packaged with torch.package.

torch/csrc/deploy/deploy.h contains the API definition
torch/csrc/deploy/test_deploy.cpp has some examples.

Notes:
* mutex is added to PyTorchStreamReader to make it safe to use from multiple threads at once.
* USE_DEPLOY is only true for the special libtorch_deployinterpreter.so library, when enabled
  we use a hash table to maintain PyObject <> at::Tensor mappping rather than the internal pointer
  in Tensor since >1 interpreter may have a reference to the tensor.
* serialization.py has some additional functions for creating pickle objects
  but keeping storages in memory for use transfering tensors between interpreters

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D26329468

Pulled By: zdevito

fbshipit-source-id: d75f4ebb9a27f1d911179d9996041bcb3ca04a07
2021-02-18 02:30:08 -08:00
c357f8b826 [package] make torch.package produce unified format (#51826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51826

Looks like this:
```
resnet.pt
├── .data  # Data folder named so it can't clash with torch.package codemodules.
│   │      # Names/extensions automatically added to avoid namingconflicts.
│   ├── 94286146172688.storage   # tensor data
│   ├── 94286146172784.storage
│   ├── extern_modules           # torch.package metadata
│   ├── version                  # version metadata
│   └── ...
├── model  # package pickled model created w/
│   │      # exporter.save_pickel('model','model.pkl', resnet_model)
│   └── model.pkl
└── torchvision  # all code dependencies for packaged picked
    └── models   # models are captured as source files
            ├── resnet.py
                    └── utils.py
```

Since `version` is hardcoded in our zip reader/writer implementation,
add it as an option that defaults to "version" but accepts other
locations for putting the version metadata.

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D26295649

Pulled By: suo

fbshipit-source-id: 2d75feeb7de0f78196b4d0b6e2b814a7d58bd1dd
2021-02-09 07:45:59 -08:00
c941730b96 [JIT/Futures] support set_exception api (#50983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50983

There is currently no way to handle/propagate errors with the python-based futures API (they are raised correctly if set with an error, but this is only possible from C++).

This diff allows the Future's `unwrap_func` to be set in python optionally, so users can set futures completed with an exception and the error will throw as expected. This is mostly to support the following use case in the next diff:

```
ret_fut = torch.futures.Future(unwrap_func = lambda python_result: {
    # throw exception if needed
    if isinstance(python_result, Exception):
        throw python_result
})

rpc_fut = rpc.rpc_async(...) # RPC future that times out
# Goal is to propagate RPC error to this future
rpc_fut.add_done_callback(
res => {
    # Note that ret_fut.set_result(res.wait()) won't propagate the error
    try:
        ret_fut.set_result(res.wait())
    except Exception as e:
        ret_fut.set_result(e)
}
)
```
ghstack-source-id: 121021434

Test Plan:
unittest
```
buck test mode/dev-nosan mode/no-gpu //caffe2/test:futures -- te
st_unwrap --print-passing-details
```

Reviewed By: mrshenli

Differential Revision: D25950304

fbshipit-source-id: 7ee61e98fcd783b3f515706fa141d538e6d2174d
2021-02-04 20:22:19 -08:00
88baf470d1 [JIT] Provide more info when attribute fails to convert (#50870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50870

**Summary**
Module attributes whose types cannot be determined based on annotations
or inference based on their values at script time are added to the
concrete type of the corresponding module as "failed attributes". Any
attempt to access them in scripted code produces an error with a message
explaining that the attribute could not be contributed to a
corresponding attribute on the TorchScript module. However, this error
is not more specific than that.

This commit modifies `infer_type` in `_recursive.py` so that it returns
`c10::InferredType` instead, which allows more information about typing
failures to be communicated to the caller through the `reason()` method
on this class. This information is appended to the hint added to the
module concrete type for failed attributes.

**Testing**
This commit adds a unit test to `test_module_containers.py` that checks
that extra information is provided about the reason for the failure
when a module attribute consisting of a list of `torch.nn.Module` fails to convert.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26091472

Pulled By: SplitInfinity

fbshipit-source-id: fcad6588b937520f250587f3d9e005662eb9af0d
2021-01-27 20:37:10 -08:00
1c9347c666 [ONNX] Use parameter values in onnx shape inference (#49706) (#50905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50905

Adds an additional run of onnx shape inference after constant folding, since initializer may have changed and affected shape inference.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26050881

Pulled By: SplitInfinity

fbshipit-source-id: 9e5d69c52b647133cd3a0781988e2ad1d1a9c09d
2021-01-27 17:45:32 -08:00
137f2a385a [ONNX] Handle sequence output for models (#50599)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/46542

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50599

Reviewed By: SplitInfinity

Differential Revision: D25928897

Pulled By: bzinodev

fbshipit-source-id: a898cef7b2d15a287aedd9798ce1423cebf378d4
2021-01-21 15:36:41 -08:00
a9db2f8e7a Revert D24924236: [pytorch][PR] [ONNX] Handle sequence output shape and type inference
Test Plan: revert-hammer

Differential Revision:
D24924236 (adc65e7c8d)

Original commit changeset: 506e70a38cfe

fbshipit-source-id: 78069a33fb3df825af1cb482da06a07f7b26ab48
2021-01-15 05:58:35 -08:00
adc65e7c8d [ONNX] Handle sequence output shape and type inference (#46542)
Summary:
Handle sequence output shape and type inference.

This PR fixes value type of sequence outputs. Prior to this, all model sequence type outputs were unfolded for ONNX models.
This PR also enable shape inference for sequence outputs to represent the dynamic shape of these values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46542

Reviewed By: ezyang

Differential Revision: D24924236

Pulled By: bzinodev

fbshipit-source-id: 506e70a38cfe31069191d7f40fc6375239c6aafe
2021-01-14 21:12:35 -08:00
e9dc8fc162 [TensorExpr] Add python bindings. (#49698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49698

Reincarnation of #47620 by jamesr66a.

It's just an initial bunch of things that we're exposing to python, more
is expected to come in future. Some things can probably be done better,
but I'm putting this out anyway, since some other people were interested
in using and/or developing this.

Differential Revision: D25668694

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: fb0fd1b31e851ef9ab724686b9ac2d172fa4905a
2021-01-14 21:02:47 -08:00
aeefe2ce31 [ONNX] ONNX dev branch merge 01-06-2021 (#50163)
Summary:
[ONNX] ONNX dev branch merge 01-06-2021
- [ONNX] Support onnx if/loop sequence output in opset 13 - (https://github.com/pytorch/pytorch/issues/49270)
- Symbolic function for torch.square (https://github.com/pytorch/pytorch/issues/49446)
- [ONNX] Add checks in ONNXSetDynamicInputShape (https://github.com/pytorch/pytorch/issues/49783) …
- [ONNX] Enable export af aten::__derive_index (https://github.com/pytorch/pytorch/issues/49514) …
- [ONNX] Update symbolic for unfold (https://github.com/pytorch/pytorch/issues/49378) …
- [ONNX] Update the sequence of initializers in exported graph so that it is as same as inputs. (https://github.com/pytorch/pytorch/issues/49798)
- [ONNX] Enable opset 13 ops (https://github.com/pytorch/pytorch/issues/49612) …
- [ONNX] Improve error message for supported model input types in ONNX export API. (https://github.com/pytorch/pytorch/issues/50119)
- [ONNX] Add a post-pass for If folding (https://github.com/pytorch/pytorch/issues/49410)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50163

Reviewed By: pbelevich

Differential Revision: D25821059

Pulled By: SplitInfinity

fbshipit-source-id: 9f511a93d9d5812d0ab0a49d61ed0fa5f8066948
2021-01-13 13:51:21 -08:00
a389b30bfc Add Post Freezing Optimizations, turn on by default in torch.jit.freeze (#50222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50222

This PR adds a pass which runs a set of optimizations to be done after freezing. Currently this encompasses Conv-BN folding, Conv->Add/Sub/Mul/Div folding and i'm also planning on adding dropout removal.

I would like some feedback on the API. torch.jit.freeze is technically in \~prototype\~ phase so we have some leeway around making changes. I think in the majority of cases, the user is going to want to freeze their model, and then run in inference. I would prefer if the optimization was opt-out instead of opt-in. All internal/framework use cases of freezing all use `freeze_module`, not the python API, so this shouldn't break anything.

I have separated out the optimization pass as a separate API to make things potentially modular, even though I suspect that is an unlikely case. In a future PR i would like to add a `torch::jit::freeze` which follows the same api as `torch.jit.freeze` intended for C++ use, and runs the optimizations.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856264

Pulled By: eellison

fbshipit-source-id: 56be1f12cfc459b4c4421d4dfdedff8b9ac77112
2021-01-12 11:39:13 -08:00
6971149326 [JIT] Add Frozen Conv-> Add/Sub/Mul/Div fusion (#50075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50075

Adds Conv - Add/Sub/Mul/Div fusion for frozen models. This helps cover models like torchvision maskrcnn, which use a hand-rolled batchnorm implementation: 90645ccd0e/torchvision/ops/misc.py (L45).

I haven't tested results yet but I would expect a somewhat similar speed up as conv-bn fusion (maybe a little less).

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856265

Pulled By: eellison

fbshipit-source-id: 2c36fb831a841936fe4446ed440185f59110bf68
2021-01-12 11:39:02 -08:00
035229c945 [JIT] Frozen Graph Conv-BN fusion (#50074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50074

Adds Conv-BN fusion for models that have been frozen. I haven't explicitly tested perf yet but it should be equivalent to the results from Chillee's PR [here](https://github.com/pytorch/pytorch/pull/476570) and [here](https://github.com/pytorch/pytorch/pull/47657#issuecomment-725752765). Click on the PR for details but it's a good speed up.

 In a later PR in the stack I plan on making this optimization on by default as part of `torch.jit.freeze`. I will also in a later PR add a peephole so that there is not conv->batchnorm2d doesn't generate a conditional checking # dims.

Zino was working on freezing and left the team, so not really sure who should be reviewing this, but I dont care too much so long as I get a review �

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25856261

Pulled By: eellison

fbshipit-source-id: da58c4ad97506a09a5c3a15e41aa92bdd7e9a197
2021-01-12 11:37:32 -08:00