Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33359
Updated the alias analysis kind to FROM_SCHEMA so that input tensors can be marked as non-mutable
when appropriate, allowing these tensors to be constant folded.
The schemas of the _out variants needed to be updated with annotations marking the `out` input
tensor as aliased and mutable.
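As a rough sketch (the argument names below are illustrative, not the exact registration from this PR), the relevant annotation looks like this: the `out` argument is tagged `(a!)` so alias analysis knows it is aliased and mutated, while the unannotated inputs can be treated as non-mutable and constant folded.
```
import torch

# Illustrative schema only; the real quantized::add_out signature may differ.
schema = torch._C.parse_schema(
    "quantized::add_out(Tensor qa, Tensor qb, Tensor(a!) out) -> Tensor(a!)")
print(schema)
```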
Test Plan:
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()

    def forward(self, x):
        w = torch.tensor([3], dtype=torch.float)
        w = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        y = torch.tensor([3], dtype=torch.float)
        y = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        return torch.ops.quantized.add_out(x, w, y)

m = torch.jit.script(M())
torch._C._jit_pass_constant_propagation(m.graph)
print(m.graph)
```
```
graph(%self : __torch__.___torch_mangle_9.M,
      %x.1 : Tensor):
  %11 : int = prim::Constant[value=12]() # <ipython-input-11-1dd94c30cb58>:9:49
  %9 : float = prim::Constant[value=1.]() # <ipython-input-11-1dd94c30cb58>:9:41
  %10 : int = prim::Constant[value=0]() # <ipython-input-11-1dd94c30cb58>:9:46
  %36 : QInt8(1) = prim::Constant[value={3}]()
  %y.2 : Tensor = aten::quantize_per_tensor(%36, %9, %10, %11) # <ipython-input-11-1dd94c30cb58>:11:12
  %24 : Tensor = quantized::add_out(%x.1, %36, %y.2) # <ipython-input-11-1dd94c30cb58>:12:15
  return (%24)
```
As expected, the aten::quantize_per_tensor() for w is now folded. The aten::quantize_per_tensor()
for y is not folded, since that tensor is aliased/modified.
Differential Revision: D19910667
fbshipit-source-id: 127071909573151dc664500d363399e3643441b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33666
The failure is caused by a revert, so let's skip it.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D20057382
fbshipit-source-id: d71af8efe68b31befcef5dddc372540e8a8ae2ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33572
This reverts commit 687a7e4a2566861c53c8fb53a80b198465168b38.
Original PR #33305
Reland with BC tests whitelisted. See https://github.com/pytorch/pytorch/issues/33580 for reasoning why this change is not actually BC breaking.
Test Plan: Imported from OSS
Differential Revision: D20011011
Pulled By: ezyang
fbshipit-source-id: 116374efc93af12b8ad738a0989d6f0daa9569e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555
A quick fix for the PyText model (in internal production) on the new bytecode format.
Test Plan: Imported from OSS
Differential Revision: D20008266
Pulled By: iseeyuan
fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791
When a registered operator has varargs (its schema ends with ... ),
the interpreter now appends the number of arguments to the top of
the stack before invoking the operator. This allows the removal of more
uses of Node* in the interpreter.
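A minimal Python sketch of this calling convention (not the actual C++ interpreter code): the operation pops the argument count that the interpreter pushed last, and then consumes that many values from the stack.
```
# Hypothetical stand-in for a varargs op such as prim::ListConstruct.
def list_construct(stack):
    n = stack.pop()       # number of arguments, appended by the interpreter
    elems = stack[-n:]    # the actual arguments
    del stack[-n:]
    stack.append(list(elems))

stack = [1, 2, 3, 3]      # three values followed by the argument count
list_construct(stack)
print(stack)              # [[1, 2, 3]]
```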
This PR also cleans up the constructors for Operator to make
it more likely that someone chooses the correct one. After turning these ops:
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
into interpreter primitives, we can remove all but two constructors for Operator:
one that takes (schema_string, operation), and one that takes (symbol, op_creator) for
the remaining weird primitives.
Test Plan: Imported from OSS
Differential Revision: D19673158
Pulled By: zdevito
fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33387
CI is broken. Skip two functions to fix the problem.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19926249
fbshipit-source-id: a46d1465c59de8616d2af5fb0b9cc18532359f88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33190
This enables the initial RRef type to be used inside TorchScript: a user
can pass a Python RRef into a TorchScript function and call to_here
inside it. Specifically, this PR:
- Adds RRef schema type parsing
- Adds Python interop for RRef in Python and into JIT
- Registers the to_here op in register_distributed_ops
More support for RRef in TorchScript will be added in future PRs.
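A rough sketch of the kind of usage this enables, assuming a working RPC setup; the single-worker initialization, worker name, and helper function below are illustrative, and annotation details may differ from the exact version in this PR.
```
import os
import torch
import torch.distributed.rpc as rpc
from torch import Tensor
from torch.distributed.rpc import RRef

@torch.jit.script
def fetch(rref: RRef[Tensor]) -> Tensor:
    # to_here() copies the remotely held value to the local worker
    return rref.to_here()

os.environ["MASTER_ADDR"] = "localhost"   # assumed single-machine setup
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker0", rank=0, world_size=1)
rref = rpc.remote("worker0", torch.add, args=(torch.ones(2), 1))
print(fetch(rref))   # tensor([2., 2.])
rpc.shutdown()
```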
Test Plan: Imported from OSS
Differential Revision: D19871244
Pulled By: wanchaol
fbshipit-source-id: 7eca6c491a84666b261c70806254b705603bd663
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32761
This replaces ImplicitTensorToNum with result-specific operators like
IntImplicit, FloatImplicit, or ScalarImplicit. Note that ScalarImplicit
was not correctly implemented before and this PR fixes the lapse.
This does not change on-disk serialization because these operators are not
serialized directly but are written as, e.g., `annotated(int, foo)`.
Test Plan: Imported from OSS
Differential Revision: D19615385
Pulled By: zdevito
fbshipit-source-id: 48575f408e8219d2ec5b46936fc2aa691f283976
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32843
Fix the CI by skipping aten::join.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19650584
fbshipit-source-id: 4446eef568ded334217ff9205a795daffebe41a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32044
Fix the list of views in the codegen:
- Move `narrow` out of the autograd functions since it is now implemented with slice.
- Add `split_with_sizes`, which was missing from the list.
- Remove the special formulas for both `split` and `split_with_sizes`. Neither used to be considered a view; when they are, all the RNN code breaks because it uses them in an invalid way. The generic formula generates one `narrow` Node for each output, which is always valid (see the view-sharing sketch below).
The diff for the generated code can be found [here](https://github.com/pytorch/pytorch/compare/16eff6e...albanD:06d6e85) (outdated for last commit)
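A short illustration of the view semantics this relies on: the outputs of `split_with_sizes` share storage with the base tensor, so they behave like narrowed views rather than copies.
```
import torch

base = torch.arange(6.)
a, b = base.split_with_sizes([2, 4])
print(a.data_ptr() == base.data_ptr())   # True: `a` starts at the same memory as `base`
base[0] = 100.
print(a[0])                              # tensor(100.): the view observes the write to `base`
```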
Test Plan: Imported from OSS
Differential Revision: D19409648
Pulled By: albanD
fbshipit-source-id: 5ebc4c978af500403f7f008c0231b7db0cabab26
Summary:
Compared to cuDNN bias, PyTorch add has the following advantages:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64-bit indexing automatically
- less code, less maintenance effort
ngimel I'm submitting this PR early so the CI can start building it, but I have not tested it locally yet (still waiting for it to compile).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524
Differential Revision: D19264244
Pulled By: ngimel
fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612
This is the first version of moving prim ops to c10 registration. Once the reviewers are fine with the initial changes, more operators will be moved in the same style.
Test Plan: Imported from OSS
Differential Revision: D19237648
Pulled By: iseeyuan
fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
Summary:
Originally, we printed only one broken schema. With this changeset, all the broken schemas are printed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31628
Reviewed By: hl475
Differential Revision: D19231444
Pulled By: houseroad
fbshipit-source-id: 3dd5b4609a6a9a9046e95f2f30deb9beeb5dcd56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30239
Use unboxed registration per smessmer's request. For some ops with optional args or tensor lists, where unboxed registration is not supported, we still use boxed registration.
Test Plan: Imported from OSS
Differential Revision: D18653846
Pulled By: iseeyuan
fbshipit-source-id: c22ce8111dfff0ba63316a9bcfe2b712b2d31fc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29577
`torch.autograd.grad` can return None if one of the inputs is not in the
autograd graph or does not require grad; this fixes the schema so that it returns a
list of optional tensors instead of a list of tensors.
This might unfortunately be BC-breaking, but such usage is rare both
internally and externally (only training code uses it, and most training code
uses backward instead of autograd.grad), so it is whitelisted.
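A brief example of the behaviour motivating the schema change: when an input does not participate in the graph, torch.autograd.grad returns None for it (with allow_unused=True), so the result is really a list of optional tensors.
```
import torch

x = torch.ones(2, requires_grad=True)
y = torch.ones(2, requires_grad=True)   # not used in the output below
out = (x * 2).sum()

gx, gy = torch.autograd.grad(out, (x, y), allow_unused=True)
print(gx)   # tensor([2., 2.])
print(gy)   # None
```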
Test Plan: Imported from OSS
Differential Revision: D18491642
fbshipit-source-id: d32b2b3446cf9e8b9a98f6d203a21a75643d8991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29970
Add the operators and the JMP instruction used in the PyText model to the lite interpreter.
Test Plan: Imported from OSS
Differential Revision: D18555483
fbshipit-source-id: e5124d908762f78fb548505aecf33be8c8503275
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960
Overload names are required for mobile operators that share the same name but have different schemas. Since the overload name is not used in JIT, it is safe to add overload names to JIT operators.
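For illustration (this queries internal JIT state, so the exact output depends on the build): the same operator name carries several registered schemas that are distinguished only by their overload names.
```
import torch

for schema in torch._C._jit_get_all_schemas():
    if schema.name == "aten::add":
        print(schema)   # e.g. aten::add.Tensor(...), aten::add.Scalar(...), ...
```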
Test Plan: Imported from OSS
Differential Revision: D18555484
fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
Summary:
This reverts 9a9bb448ee49a1493f22bbbeed4af92b1364fce9, which itself reverted the previous commit, and fixes the case that broke it.
Details of the fix:
modified: aten/src/ATen/native/Convolution.cpp
Call contiguous() on the 3D input tensor. This prevents the code path from accidentally
recognizing the input as having channels_last strides due to the unsqueezing of a permuted 3D
tensor.
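A hedged sketch of the ambiguity described above (shapes are illustrative): unsqueezing a permuted 3D tensor can produce 4D strides that satisfy the channels_last check even though the data was never laid out as NHWC.
```
import torch

x = torch.randn(2, 4, 8).permute(0, 2, 1)   # a 3D (N, C, L) tensor with permuted strides
x4d = x.unsqueeze(2)                        # (N, C, 1, L), as when conv1d is routed through the 2D path
print(x4d.is_contiguous(memory_format=torch.channels_last))   # True, despite no NHWC intent
```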
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29361
Differential Revision: D18371964
Pulled By: VitalyFedyunin
fbshipit-source-id: a5985f4687b37e183649fa35b8ccdb50368ebfdf
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward
Also patches suggest_memory_format for convolution.
suggest_memory_format has an ambiguous meaning in two cases:
1. A tensor in NCHW where C == 1:
   we could use the stride of C as a hint to tell the intended memory format.
2. A tensor in NCHW where H == W == 1:
   there is no way to identify the intended memory format from the strides.
Currently we fall back to NCHW whenever we see a contiguous tensor, avoiding the
ambiguity for some of the special cases.
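A small illustration of case (1) above, checked on recent builds: with C == 1 a plain NCHW-contiguous tensor also satisfies the channels_last stride check, so the strides alone are ambiguous.
```
import torch

x = torch.randn(2, 1, 4, 4)   # contiguous NCHW with a single channel
print(x.is_contiguous())                                    # True
print(x.is_contiguous(memory_format=torch.channels_last))   # also True
```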
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861
Differential Revision: D18263434
Pulled By: VitalyFedyunin
fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
Summary:
prim::AutogradAnyNonZero is optimized away under normal circumstances (a graph executor specializes tensor arguments and runs `specializeAutogradZero`), so the change should be backward compatible for as long as we are running the original executor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28852
Differential Revision: D18213118
Pulled By: Krovatkin
fbshipit-source-id: 223f172c59e5f2b05460db7de98edbadc45dd73d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27244
Adds memory_format keyword argument (positional for cpp).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels_last format, the output tensor is going to have channels_last format.
3) The output tensor is going to be contiguous in all other cases.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.
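A brief illustration of the 'preserve' rules above, using empty_like as an example of an op that accepts memory_format (the summary does not name the specific op changed in this PR):
```
import torch

nhwc = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
out = torch.empty_like(nhwc, memory_format=torch.preserve_format)
print(out.is_contiguous(memory_format=torch.channels_last))   # True: rule (2), format preserved

t = torch.randn(3, 4).t()   # dense, non-overlapping, but not contiguous
out2 = torch.empty_like(t, memory_format=torch.preserve_format)
print(out2.stride() == t.stride())   # True: rule (1), strides preserved
```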
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D17980310
Pulled By: VitalyFedyunin
fbshipit-source-id: 00a39b40daa4b8ee63c32e60d920222f8be2d6a1