Commit Graph

49 Commits

62af37aa88 dropout symbolic_script should respect the training flag (#20760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20760
ghimport-source-id: eb667c3549a03a2fc01ffa0a2d3bc7e3a29b78e0

Reviewed By: jamesr66a

Differential Revision: D15486511

Pulled By: suo

fbshipit-source-id: 56ae930a01b0f6f4305a2a745135d4529b4a1ca0
2019-05-23 18:17:17 -07:00
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was split into two PRs.

This one:

Introduces a `MemoryFormat` argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions operate only on strides and do not store any tensor state.

(Original RFC #19092)

-----

Expands the functionality of the two tensor functions `.is_contiguous` and `.contiguous` (both the Python and C++ APIs).

Note: We had several complaints about the `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` preserves the existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in NHWC (or the analogous 3d/5d) format.

Note: At the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will recalculate the state of the tensor on every call. This functionality will be updated later. A short usage example follows.
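
A minimal usage sketch of the two extended functions (written against the API as described above; `torch.channels_last` and `torch.contiguous_format` are assumed to be available):

```
import torch

# NCHW tensor; the default allocation already satisfies torch.contiguous_format
x = torch.randn(2, 3, 4, 4)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # False

# Re-allocate with channels-last strides; the semantic layout stays NCHW
y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)                                             # torch.Size([2, 3, 4, 4])
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```
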
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
da3e74b21c define use_cuda in dropout backward to allow peephole optimization to… (#20289)
Summary:
… work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20289

Differential Revision: D15350262

Pulled By: wanchaol

fbshipit-source-id: b457304688524822c1e6f23049e05472130c1ff4
2019-05-15 11:36:06 -07:00
97e1f07ffc Replace AT_CHECK with TORCH_CHECK [shard 10/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20436

Reviewed By: jerryzh168

Differential Revision: D15318926

fbshipit-source-id: 71a43070cc50cc174f703ebc595f1d87c6fc1e91
2019-05-15 07:35:37 -07:00
d2da3ee601 temporarily disable layernorm AD (#20442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20442
ghimport-source-id: c246ade4ee9ee31b2e3413efff3ea6a246e1837e

Differential Revision: D15321524

Pulled By: wanchaol

fbshipit-source-id: 22c77d08c91af2d83dfd2c4a84cafc56e9240033
2019-05-13 16:35:50 -07:00
3afd99680c Remove SourceLocation (respin) (#20333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20333
ghimport-source-id: e64075bb82067224463e9955d10bd13967d1975d

Differential Revision: D15284081

Pulled By: zdevito

fbshipit-source-id: ac26ae48392b9daff08f460529c06af8f4e4722a
2019-05-09 16:17:33 -07:00
e870b11ae6 Revert D15275731: Remote SourceLocation
Differential Revision:
D15275731

Original commit changeset: f4da178c3137

fbshipit-source-id: 830b79735eb2dadc4795b5aae407826bf20ef121
2019-05-09 13:07:11 -07:00
eca91de5d2 Remote SourceLocation (#20300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20300
ghimport-source-id: 06f606c4db3b70b1d2ed9f6ed4542c3f703c4e17

Differential Revision: D15275731

Pulled By: zdevito

fbshipit-source-id: f4da178c31372c2264feb9f99476b9c9aa66c1f2
2019-05-09 11:48:29 -07:00
edb376eceb Cleanup includes in torch/csrc/jit/script/* (#19921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19921
ghimport-source-id: 12a4553a4a081e8a41f4ed432b4ce3dc14e4699f

Differential Revision: D15125017

Pulled By: ZolotukhinM

fbshipit-source-id: f7285bd1e0745dadb9cd353a5fa8a09728012a59
2019-05-06 13:24:22 -07:00
3c81eb3aa7 add max_pool2d to AD, add tests for both autodiff and inference mode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19661

Differential Revision: D15074431

Pulled By: wanchaol

fbshipit-source-id: b31cf2126c2c5d6a12c2ef5dc67b57677652f1fc
2019-04-24 22:19:51 -07:00
1e94a3bc4d Turn resolver into a class (#19236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236
ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd

Differential Revision: D14928292

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258
2019-04-19 13:01:59 -07:00
b9291f55bb pow scalar exponent / base autodiff, fusion (#19324)
Summary:
Fixes: #19253

Fixing pow(Tensor, float) is straightforward.
The breakage for pow(float, Tensor) is a bit more subtle to trigger, and fixing it needs `torch.log` (`math.log` didn't work) from the newly merged #19115 (thanks ngimel for pointing out that this has landed).
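
As a hedged illustration (plain Python, not the actual symbolic_script formula), the scalar-base case needs a logarithm of the base, which is why `torch.log` from #19115 is needed:

```
import torch

# For y = base ** exponent with a float base and Tensor exponent,
# dy/dexponent = base**exponent * log(base); a TorchScript AD formula
# therefore needs torch.log for the scalar base (math.log is not usable there).
def pow_scalar_base_backward(grad_output: torch.Tensor,
                             base: float,
                             exponent: torch.Tensor) -> torch.Tensor:
    result = base ** exponent
    return grad_output * result * torch.log(torch.tensor(base))
```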
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19324

Differential Revision: D15003531

Pulled By: ailzhang

fbshipit-source-id: 8b22138fa27a43806b82886fb3a7b557bbb5a865
2019-04-18 17:58:35 -07:00
ef406ee925 First class modules in the compiler, round 2 (#19167)
Summary:
This PR propagates the use of first-class module objects into the compiler. This creates a transitional state where:

* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for them.

* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound.  Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`.
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions.  Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ...

Details:
* In this transitional state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass, and would make this PR way too large.
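
A small, hedged sketch of what the first-class representation looks like from the user side (using the public `torch.jit.script` API; the exact graph printout is version-dependent):

```
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(3))

    def forward(self, x):
        return x * self.weight

m = torch.jit.script(MyModule())
# In the first-class form described above, `self` is a graph input with the
# module's class type and `self.weight` appears as a prim::GetAttr node,
# rather than being lifted into an extra initial_ivalues input.
print(m.forward.graph)
```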
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167

Differential Revision: D14891966

Pulled By: zdevito

fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
2019-04-11 13:55:48 -07:00
f5165ade5b Revert D14842057: Compiler uses first-class modules**
Differential Revision:
D14842057

Original commit changeset: ca6e7b5a4380

fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3
2019-04-11 06:17:01 -07:00
5e1f0b2a07 Compiler uses first-class modules** (#19043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043
ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf

Differential Revision: D14842057

Pulled By: zdevito

fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0
2019-04-11 00:00:48 -07:00
1abbee0f8e Allow Tensor lists to show up in symbolic differentiable graphs. (#16784)
Summary:
This is done by flattening all tensor lists that are inputs/outputs of the
graph into the flat inputs/outputs list of the autograd graph.

This is less desirable than simply allowing IValues to exist in the
inputs/outputs of autograd::Function but it is substantially less
intrusive.

CaptureList describes, in a single class, the variables captured for backward.
UnpackInstructs describes how the flattened inputs to backward are re-packed into lists.
ailzhang

This PR is also part 2 of covering maskrcnn & bert AD formulas, following #16689.

Ops added in this PR:
```
cat
index
meshgrid
reshape
split
split_with_sizes
stack
unbind
```
I will also add a few perf numbers here.
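
A hedged end-to-end check of the idea (standard `torch.jit.script` usage; whether a DifferentiableGraph actually forms depends on executor settings):

```
import torch

@torch.jit.script
def cat_twice(a, b):
    # aten::cat takes a Tensor[] input; with tensor-list flattening it can
    # live inside a symbolically differentiated graph.
    return torch.cat([a, b], dim=0) * 2.0

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 3, requires_grad=True)
cat_twice(a, b).sum().backward()
print(a.grad.shape, b.grad.shape)  # torch.Size([2, 3]) torch.Size([2, 3])
```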
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16784

Differential Revision: D14104063

Pulled By: ailzhang

fbshipit-source-id: 5ceadadfd67ccaac60c5fd6740786c5354e252b9
2019-04-10 18:16:20 -07:00
dbd9971dd2 move 2ops back to autodiff (#18969)
Summary:
Move these 2 ops back to autodiff to unblock xla CI.
I will leave them for my next PR to cleanup symbolic_variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18969

Differential Revision: D14816811

Pulled By: ailzhang

fbshipit-source-id: dd8a7e133dcad29560d3d1d25691883960117299
2019-04-06 21:41:25 -07:00
cb3a4a3d28 remove symbolic variable part 1 (#17986)
Summary:
As discussed with gchanan we should deduplicate symbolic_variable and symbolic_script to prepare for the future merge with derivatives.yaml.

This PR moves most easy formulas to symbolic_script.

TODO: run benchmarks to make sure no perf regression

cc: apaszke zdevito wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17986

Differential Revision: D14766412

Pulled By: ailzhang

fbshipit-source-id: d95a3f876e256c0f505779a71587c985571d3b8f
2019-04-05 12:06:47 -07:00
843e6234f5 Fix layernorm ad formula on weight and bias (#18233)
Summary:
Fix the layernorm formula when weight and bias passed in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18233

Differential Revision: D14760375

Pulled By: wanchaol

fbshipit-source-id: d6bd3b137bc04c391aa5c24d021d1f811ba2a877
2019-04-03 16:58:33 -07:00
0512e4e323 Unify namespace of script::Module (#18378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18378
ghimport-source-id: 55c29bb436a2153d29ff2f4488d99d8863c187b1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18379 Enforce single parent for script submodules
* **#18378 Unify namespace of script::Module**
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This removes individual OrderedDicts in favor of a single unified
namespace for all things in a script::Module. This removes a whole
class of bugs where both a method and a parameter could get the
same name, for instance.

Since we no longer have to expose OrderedDict::Item objects, a lot of
downstream code can be simplified.

We no longer double-store names (both in the dictionary key and in the object itself).

Differential Revision: D14603723

fbshipit-source-id: b5f7551b3074679623edd6ea70269830353b4d4c
2019-04-03 16:04:17 -07:00
a21e256e8d Fix contiguous AD and Autogradzero inconsistency (#18633)
Summary:
Fixes #17962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18633

Differential Revision: D14700449

Pulled By: wanchaol

fbshipit-source-id: 3d15d67c01b69b28394a0f2f001db90ed9fd31dc
2019-04-03 12:47:28 -07:00
9c87543124 Enforce check ad in test_jit (#18509)
Summary:
If a test triggers autodiff, it must have a `DifferentiableGraph` in its differentiated forward graph, and this subgraph must contain either the original aten node or the corresponding nodes used in the AD formula.

Typically a forward differentiable graph looks like this:
```
graph(%i0 : Float(),
      %i1 : Float()):
  %3 : Float() = prim::DifferentiableGraph_0(%i0, %i1)
  return (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Float(),
      %1 : Float()):
  %2 : Float() = aten::max(%0, %1)
  return (%2)
```
which tells us `aten::max(Tensor self, Tensor other) -> Tensor` is symbolically differentiable.

Update: there are a lot of cases (fusions/ConstantChunk/Python implementations) that break it, so I decided to make the check optionally take node names when they differ from the function name.
~~[OLD]Theoretically I could also check if `aten::max` is in the differentiable block or not to be more precise, but there're also cases like `chunk` where in a differentiable block it's replaced with a prim node (ConstantChunk) and we will have to special case them. Any suggestions here (to be more precise or no) is very welcome!~~

We used to have a list of nn tests that should be run against AD; I moved it to a field set when constructing each test (either torch or nn). I think it's cleaner this way, and it matches the fact that for a given op we may support one schema but not all of them; this way we can just turn on the corresponding test that triggers the supported schema.

cc: apaszke zdevito wanchaol ngimel for a review

[Done] :
- Going through a manual second pass of all tests to check if they should enable AD test or not....
- Add a readme about how to add AD for an op and how to add/enable its test in test_jit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18509

Differential Revision: D14696811

Pulled By: ailzhang

fbshipit-source-id: c5e693277baac585cd3aed5ab2c0e7faa5e6f29f
2019-03-31 08:51:30 -07:00
ba4de667fa change dropout lowering in symbolic_script (#18375)
Summary:
Dropout is now eligible for fusion, and the generated fused kernels are just as fast as dropout in ATen. Change its lowering in symbolic script so that it can actually be fused. It is still special-cased for CUDA, because without fusion this lowering is less efficient than the current one (bernoulli_ * input). Testing is covered by the test case that ailzhang added (test_dropout_cuda).
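
A hedged sketch of the kind of fusable lowering described here (elementwise ops only; not the exact symbolic_script text):

```
import torch

def dropout_lowered(x: torch.Tensor, p: float, train: bool) -> torch.Tensor:
    # Express dropout as elementwise ops (mask, multiply, scale) so a fuser
    # can combine them into one kernel instead of calling bernoulli_ * input.
    if not train or p == 0.0:
        return x
    mask = (torch.rand_like(x) >= p).to(x.dtype)
    return x * mask / (1.0 - p)
```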
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18375

Differential Revision: D14611938

Pulled By: soumith

fbshipit-source-id: 11b18f4784e6c9265e382a8f8deca7add8df3b37
2019-03-25 20:05:11 -07:00
6c9b312fd4 Add addcmul, lerp to fuser, enable scalar->float specialization in symbolic script (#18081)
Summary:
This PR does two things:

1. Enable scalar->float specialization in symbolic script, so AD formulas whose schema contains a Scalar should write `float` instead.
2. Add addcmul and lerp to AD and the fuser (a short usage sketch follows below).
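
A hedged usage check for item 2 (public eager/script API; the fusion itself only kicks in on CUDA):

```
import torch

@torch.jit.script
def f(x, y, z):
    # addcmul(x, y, z, value=v) = x + v * y * z; with an AD formula in
    # symbolic script this is differentiable inside a scripted graph.
    return torch.addcmul(x, y, z, value=2.0)

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)
z = torch.randn(4, requires_grad=True)
f(x, y, z).sum().backward()
print(x.grad, y.grad, z.grad)  # dx = 1, dy = 2*z, dz = 2*y
```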
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18081

Differential Revision: D14490493

Pulled By: wanchaol

fbshipit-source-id: b3b86d960d5f051b30733bc908b19786111cdaa4
2019-03-25 11:05:45 -07:00
a50ba7e238 specialized CUDA impl for dropout in AD (#17756)
Summary:
In ATen we have a `_fused_dropout` implementation for the CUDA case. As ngimel suggested, discarding it in JIT AD hurts performance.

It doesn't seem ideal to include a backend-specific implementation in AD, but this helps prevent a performance regression for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17756

Differential Revision: D14368999

Pulled By: ailzhang

fbshipit-source-id: 9a371c5020f630e8f6e496849ec9772b6f196169
2019-03-19 10:34:15 -07:00
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor
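
A hedged check of the parsed result (assuming the `torch._C.parse_schema` binding and its `name`/`overload_name` accessors; note the native schema syntax `Tensor arg1`):

```
import torch

schema = torch._C.parse_schema("my_func.overload1(Tensor arg1) -> Tensor")
print(schema.name)           # my_func
print(schema.overload_name)  # overload1
```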

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
10d64a1372 add tanh to AD and fix layernorm schema
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17816

Differential Revision: D14453048

Pulled By: wanchaol

fbshipit-source-id: 45815db964a4d9ee85d8933e335b47f215e3c467
2019-03-14 11:20:40 -07:00
f9820e55af initializing class value (#17585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585

Create a sugared value that represents a class during initialization. This is
so that assignments to attributes correctly define attributes in __init__ but
raise an error elsewhere.
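
A hedged illustration of the rule from the user's perspective (using public TorchScript class support; the class name is made up):

```
import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        # Assignments here define the class's attributes.
        self.count = 0

    def bump(self):
        self.count += 1        # fine: the attribute already exists
        # self.other = 1       # would fail to compile: new attribute outside __init__
```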

Reviewed By: shannonzhu

Differential Revision: D14263403

fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a
2019-03-11 19:13:52 -07:00
aa57f17808 add layernorm to AD
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17702

Differential Revision: D14368472

Pulled By: wanchaol

fbshipit-source-id: 8db390e39444078258ad1d34ba74d6ddafa5d02b
2019-03-07 13:36:51 -08:00
fefaebabba fix dropout AD & rename range to rangelist (#17691)
Summary:
fixes #17669
Address apaszke 's comments in #17523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17691

Differential Revision: D14328083

Pulled By: ailzhang

fbshipit-source-id: 9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44
2019-03-05 20:50:10 -08:00
03132c1f56 convolution/matmul/dropout (#17523)
Summary:
* Add AD formula for _convolution & matmul & dropout
* add prim::range, fixes #17483
Example:
```
dim = 3
x = range(dim)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17523

Differential Revision: D14254002

Pulled By: ailzhang

fbshipit-source-id: ba60d77b047db347929b72beca2623fb26aec957
2019-02-27 21:41:59 -08:00
68e90a398e Temporarily disable select/topk/kthvalue AD (#17470)
Summary:
Temporarily disable them for perf considerations. Will figure out a way to do `torch.zeros(sizes, grad.options())` in TorchScript before re-enabling these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17470

Differential Revision: D14210313

Pulled By: ailzhang

fbshipit-source-id: efaf44df1192ae42f4fe75998ff0073234bb4204
2019-02-25 16:29:11 -08:00
9aae82bc2c Improvements for current AD (#17187)
Summary:
This PR removes a few `self` sizes that were passed from the forward pass to the backward pass when `self` itself is already required in the backward pass. This could be the reason for the potential slowdown in #16689. I will attach a few perf numbers I got (still a bit volatile between runs) in the comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17187

Differential Revision: D14179512

Pulled By: ailzhang

fbshipit-source-id: 5f3b1f6f26a3fef6dec15623b940380cc13656fa
2019-02-22 14:34:14 -08:00
4c6da649e5 Partial support for kwarg_only arguments in script (#17339)
Summary:
This provides the minimum necessary to allow derivative formulas for things that have a kwarg-only specifier in their schema. Support for non-parser-frontend default arguments for kwargs is not completed.
Fixes #16921
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17339

Differential Revision: D14160923

Pulled By: zdevito

fbshipit-source-id: 822e964c5a3fe2806509cf24d9f51c6dc01711c3
2019-02-21 15:27:06 -08:00
90950f79c7 fix missing std (#17263)
Summary:
Add the missing `std` introduced by #16689. Investigating why this wasn't caught in CI (or in my local dev environment).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17263

Reviewed By: ezyang

Differential Revision: D14134556

Pulled By: ailzhang

fbshipit-source-id: 6f0753fa858d3997e654924779646228d6d49838
2019-02-20 16:47:35 -08:00
82aa511146 move prim::None to prim::Constant (again) (#17186)
Summary:
Trying to land this again: make prim::None into a case of prim::Constant. The previous landing was reverted because it broke an important ONNX export test.

https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186

Differential Revision: D14115304

Pulled By: eellison

fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7
2019-02-19 11:45:50 -08:00
20fd6dca77 fix missing constant in adaptive_avg_pool2d AD (#17180)
Summary:
Thanks ngimel for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17180

Differential Revision: D14113001

Pulled By: ailzhang

fbshipit-source-id: 78e7d7f2cda3889138e2bf26a54980c2cc665882
2019-02-15 21:14:34 -08:00
91c1d728ac Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant
Differential Revision:
D14109636

Original commit changeset: d26fd3839761

fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533
2019-02-15 16:38:12 -08:00
7caa21f5ca move prim::None to a case in prim::Constant (#16160)
Summary:
This change simplifies analysis done on constants since prim::None does not need to be handled separately now.  To check if a constant node is None, use node->isNone().

Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160

Differential Revision: D14109636

Pulled By: eellison

fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876
2019-02-15 16:27:57 -08:00
7157be8622 Add special ops for BatchNorm symbolic differentiation (#15403)
Summary:
The main problem with differentiating batch norm statically
is that we make a lot of complex run-time decisions about the backend
we choose. Then, the autograd derivatives are implemented for every
backend separately, which makes sense, because they might be saving
buffers containing different values. To resolve the issue, the forward
op returns an index of the chosen backend, and the backward function
takes it as an argument, such that it knows how to interpret the buffers.
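
A hedged, purely illustrative sketch of the scheme (toy functions; not the real ATen ops or their buffer contents):

```
from typing import Tuple

import torch

def bn_forward(x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, int]:
    # Pick a backend at run time and return its index alongside the output
    # and a saved buffer, so backward knows how to interpret the buffer.
    backend_idx = 1 if x.is_cuda else 0   # e.g. 0 = native, 1 = cudnn
    mean = x.mean(dim=0)
    out = x - mean                        # stand-in for the real normalization
    return out, mean, backend_idx

def bn_backward(grad_out: torch.Tensor, saved: torch.Tensor, backend_idx: int) -> torch.Tensor:
    # Dispatch on the recorded index; each branch would read `saved`
    # according to the layout that backend produced.
    if backend_idx == 0:
        return grad_out                   # stand-in for the native backward
    return grad_out                       # stand-in for the cudnn backward
```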
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403

Differential Revision: D14098815

Pulled By: ailzhang

fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad
2019-02-15 15:40:28 -08:00
8d33eb450e Fix avg pool2d api (#17166)
Summary:
Fix xla breakage (partially).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17166

Differential Revision: D14106954

Pulled By: ailzhang

fbshipit-source-id: 35ae6713272d0517b66da2ee9209f49015492b89
2019-02-15 13:58:30 -08:00
f062f5fd4a add std to autodiff, and mean/var/std to operator set (#17137)
Summary:
supersedes #16684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17137

Differential Revision: D14096724

Pulled By: wanchaol

fbshipit-source-id: d801d70029a6a1f5851400ff4094c0299c102b2b
2019-02-14 23:18:53 -08:00
91c50aeec6 Speed-up adaptive average pooling for the common case of size=1 output (#17011)
Summary:
When adaptive pooling has to produce a single-pixel feature map, it is faster to do so by calling .mean(). Backward calls a pretty inefficient CUDA kernel with atomics, which becomes ridiculously slow for half precision. For half, this PR provides an approx 30x speed-up for adaptive average pooling, which results in a 30% end-to-end speed-up on SENet. Improvements are smaller for float, but still significant (approx 5x).
Also, this PR unifies the handling of 3d (no batch dimension) and 4d tensors by using negative dimension indices.
cc ezyang for review.
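
A hedged illustration of the equivalence the speed-up relies on (public functional API):

```
import torch
import torch.nn.functional as F

x = torch.randn(8, 64, 7, 7)
# A 1x1 adaptive average pool is just the mean over the spatial dimensions.
pooled = F.adaptive_avg_pool2d(x, 1)
via_mean = x.mean(dim=(-2, -1), keepdim=True)
print(torch.allclose(pooled, via_mean))  # True
```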
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17011

Reviewed By: ailzhang

Differential Revision: D14078747

Pulled By: soumith

fbshipit-source-id: 0eb9255da2351190a6bcaf68c30e2ae2402a2dd9
2019-02-14 21:15:16 -08:00
b0545aa85f maskrcnn & bert AD coverage part 1 (#16689)
Summary:
- Moved a few functions from the `autograd` namespace to the `aten` namespace so they are visible from the JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Will add proper support for kwarg-only arguments later.
- Simulate function overloads in aten using `_<number>` as a function name suffix.
- Even when `forward` returns multiple outputs, as in `kthvalue`, there is at most one output requiring grad that we currently support.
- Removed the `TensorList` related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. Need to find another proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
2019-02-14 15:36:39 -08:00
20d45c43d7 Get more fusion after autodiff uses SumToSize (#14957)
Summary:
Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize.

- The sum to size operator is now  `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`.
- `_grad_sum_to_size` is now fusable, `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group.
- I push the `_grad_sum_to_size` out of the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following:
  - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication). We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin).
  - For `type_as` the gradient might be giving the type, so just skip SumToSize,
  - `add` (which was inserted as `prim::AutogradAdd`) adding gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments would have the same sizes as inputs - which is good so we don't have to do bookkeeping of the two parts.

Details:
- During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser.
- The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step.
- In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments.
- In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored.

Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957

Differential Revision: D13888173

Pulled By: zou3519

fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea
2019-01-31 12:24:38 -08:00
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
70aafad08a AD support for adaptive_avg_pool2d (#15459)
Summary:
This adds AD support for adaptive_avg_pool2d, which is necessary for resnet50 in pytorch/vision:master. cc: soumith asuhan dlibenzi

apaszke, I saw the autodiff bug you fixed in #15403; since it doesn't prevent this PR from passing, I'll leave it for your PR to fix. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15459

Differential Revision: D13534732

Pulled By: ailzhang

fbshipit-source-id: 4e48b93e35d5ecfe7bd64b6a132a55b07843f206
2018-12-21 15:38:24 -08:00
1a2ec10bd4 Support enough of closures to write autograd functions (#15411)
Summary:
This PR adds enough of the infra for supporting closures (inner script functions) to allow us to express symbolic gradients using them. We do not actually ever run graphs that contain these closures. The symbolic_script infrastructure just extracts them out of the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations necessary to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.

Example:

This code:
```
import torch

r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
    def backward(grad_output):
        grad_self = (grad_output * other).sum_to_size(self.size())
        grad_other = (grad_output * self).sum_to_size(other.size())
        return grad_self, grad_other
    return self * other, backward
''')

print(r.module.code)
```

Will produce this graph (pretty printed for clarity):

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  backward = (self.__lambda, (other, self))
  return (torch.mul(self, other), backward)

def __lambda(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```

symbolic_script will then do some modifications to remove the unsupported prim::Function node, yielding:

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  return (torch.mul(self, other), (other, self))

def backward(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411

Differential Revision: D13523340

Pulled By: zdevito

fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49
2018-12-20 14:39:11 -08:00
6ab2e7442d Autograd using torchscript (#14604)
Summary:
This PR enables autodiff to use the forward/backward graph compiled from Python code, instead of using symbolic gradients (modifying the original graph directly).

We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.

This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.

Differential Revision: D13494635

Pulled By: ailzhang

fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
2018-12-18 19:10:57 -08:00