Commit Graph

18149 Commits

a212a5b97a ir.cpp, module.cpp: clang-format. (#20592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20592
ghimport-source-id: 98dc62a9595c6b94706960274ce9beebacc9ca00

Differential Revision: D15375131

Pulled By: ZolotukhinM

fbshipit-source-id: 7edbb14a337d1646b48756eef4163846648cbd93
2019-05-17 09:21:32 -07:00
b90790ab1b Don't split 256-bit AVX2 load/store intrinsics (#20609)
Summary:
Recent versions of GCC split unaligned load and store intrinsics into
two 128-bit instructions. On old processors (Sandy Bridge) this was a
bit faster for unaligned data, but a bit slower for aligned data. On new
processors (Intel Haswell+, recent AMD) splitting loads is slower on
both aligned and unaligned data.

Clang, MSVC, and ICC do not split unaligned load and store intrinsics.

There's a good explanation here:
https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd#tab-top

Splitting load and store intrinsics makes no sense in our AVX2
configuration because the CPUs that support AVX2 instructions are the
same CPUs where splitting is disadvantageous for all data alignments.

Note that this doesn't change the AVX configuration (used by CPUs that
support AVX but not AVX2). It's possible this would be beneficial for
that configuration too (our data is usually 32-byte aligned), but I'd
prefer the conservative change for now.

torch.add generated assembly (hot loop) (GCC 7.3.0)
before:
https://gist.github.com/colesbury/066376537bccd514daf8fe4ab54d8295

after:
https://gist.github.com/colesbury/8b4b948145001d44b225c51d2428bb91

Timing of `torch.add(x, y, out=z)` for size 10240 (1 thread, Broadwell,
no turbo):
before: 7.35 us after: 6.39 us

(Take the torch.add timings with a grain of salt. The difference in timings
is much larger than I would expect.)
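For illustration, a minimal timing sketch of the `torch.add` call above (a sketch only, assuming a single-thread setup similar to the one described; exact numbers will differ by machine and build):
```python
import timeit
import torch

torch.set_num_threads(1)  # match the single-thread measurement above

x = torch.randn(10240)
y = torch.randn(10240)
z = torch.empty(10240)

# Warm up, then time the out-of-place add into a preallocated output.
for _ in range(100):
    torch.add(x, y, out=z)

t = timeit.timeit(lambda: torch.add(x, y, out=z), number=10000)
print(f"{t / 10000 * 1e6:.2f} us per call")
```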
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20609

Differential Revision: D15385800

Pulled By: colesbury

fbshipit-source-id: 66415b148a3b19360b9de9881af594ab46547b6f
2019-05-17 09:16:17 -07:00
000d73ccde fix WAR race (#20182)
Summary:
was flagged by racecheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20182

Differential Revision: D15393536

Pulled By: ezyang

fbshipit-source-id: ad4849c9fb2c8feb966be1c4ca0dadd7360f58fe
2019-05-17 09:06:52 -07:00
3c69c9a7fe Refine CosineAnnealingWarmRestarts doc for issue #20028 (#20267)
Summary:
Fixes #20028
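For reference, a minimal usage sketch of the scheduler whose docs are being refined (parameter values are illustrative only; the issue concerns how `scheduler.step()` interacts with epochs, so consult the updated docs for the exact calling convention):
```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# T_0: epochs until the first restart; T_mult: factor by which the period grows.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(30):
    optimizer.step()   # training step would go here
    scheduler.step()   # anneal the LR, restarting with a growing period
    print(epoch, optimizer.param_groups[0]["lr"])
```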
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20267

Differential Revision: D15393514

Pulled By: ezyang

fbshipit-source-id: 03f270a577fc3e0414d3f07d97512a409b08f7cd
2019-05-17 09:02:28 -07:00
cfb87c1022 Update documentation for CTCLoss (#20422)
Summary:
Change `Inputs` to `Shape` to unify the format of the CTCLoss `class` docs, and add the type of `Output` under `Shape`.
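As an illustration of the shapes being documented (sizes are arbitrary), a small sketch:
```python
import torch
import torch.nn as nn

T, N, C, S = 50, 16, 20, 30   # input length, batch, classes (incl. blank), max target length
ctc = nn.CTCLoss(blank=0)

log_probs = torch.randn(T, N, C).log_softmax(2)           # shape (T, N, C)
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # shape (N, S)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.shape)  # scalar output: torch.Size([])
```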
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20422

Differential Revision: D15393484

Pulled By: ezyang

fbshipit-source-id: 5b49647f9740de77db49a566fa2de74fcecd9110
2019-05-17 09:02:25 -07:00
35e0015c70 Export sign onnx operator (#20470)
Summary:
A trivial commit that adds support for exporting the sign operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20470

Differential Revision: D15393446

Pulled By: ezyang

fbshipit-source-id: 12fb1c147d016205abf814907d667f7d8b074ae1
2019-05-17 08:57:22 -07:00
4e551a7edb Make C10_NODISCARD macro more portable for nvcc+clang. (#20324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20324
ghimport-source-id: e51181c82f87c946b5ffcb87b0ad71a056cb4659

Differential Revision: D15359317

Pulled By: ezyang

fbshipit-source-id: d88798f13a61c74456641ddec8250c08ce8af240
2019-05-17 08:57:19 -07:00
690efa5220 Remove checks for CUDA 8 in LU-based tests (#20482)
Summary:
CUDA 8 is no longer supported and has been removed from CI, so these checks are irrelevant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482

Differential Revision: D15393438

Pulled By: ezyang

fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
2019-05-17 08:51:56 -07:00
110ed511a4 Make check-doxygen.sh output more interpretable. (#20362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20362
ghimport-source-id: ac791884dc6d3954f69d8fc997b2b561f435e0e7

Differential Revision: D15375139

Pulled By: ezyang

fbshipit-source-id: c8aa0f991430269090e068f828810bae7aa39a07
2019-05-17 08:47:11 -07:00
1136ad59f9 Enable simd and loop vectorizer with MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20530

Differential Revision: D15392676

Pulled By: ezyang

fbshipit-source-id: c8fda0c7835127f81adf55016223bb4dc14ff40a
2019-05-17 08:38:56 -07:00
fa4ca4e70e Emphasize all DDP forward() outputs must participate in computing loss (#20586)
Summary:
CC borguz chenyangyu1988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20586

Reviewed By: ezyang

Differential Revision: D15373674

Pulled By: mrshenli

fbshipit-source-id: b986918b3592616a9bcc88fba1b8fd53016f68d7
2019-05-17 07:35:49 -07:00
c941abbc0a Fix upsample kernel launch / reorder arguments (#20505)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/19630
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20505

Differential Revision: D15392706

Pulled By: ezyang

fbshipit-source-id: 5a8a7aacdbcf740508baf2b6e0c081c4e5a0390f
2019-05-17 07:30:50 -07:00
3bc0bd9534 Fix caffe2 build failure on Windows (#20574)
Summary:
Fixes #20568.
Looks like CMake is passing `/MD` when we call `add_library`. We need to fix these flags for C source files too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20574

Differential Revision: D15392682

Pulled By: ezyang

fbshipit-source-id: c92034d8725fcec48fd7db6cf5322868e956dc6b
2019-05-17 07:21:42 -07:00
4c806a9e8a Allow tuples for scale_factor argument in nn.Upsample (#20581)
Summary:
Fixes #20523.

nn.Upsample was unable to accept tuple inputs for the scale_factor argument due to a direct cast to float, which was introduced in #17732.
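A short sketch of the newly supported usage (assuming a 4d NCHW input, so scale_factor is a 2-tuple; before this fix, passing a tuple raised an error because of the float cast):
```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)

# Per-dimension scale factors as a tuple: height x2, width x3.
up = nn.Upsample(scale_factor=(2.0, 3.0), mode='nearest')
print(up(x).shape)  # torch.Size([1, 3, 16, 24])
```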
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20581

Differential Revision: D15392622

Pulled By: ezyang

fbshipit-source-id: b56ba8197a5bbf8891bc7e1bebf5cad63dcab04d
2019-05-17 07:14:18 -07:00
409200df59 Move inter-op settings into ATen/Parallel (#20050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20050
ghimport-source-id: cc102bab8abf3e56c099245976786317ed63ea14

Differential Revision: D15248576

Pulled By: ilia-cher

fbshipit-source-id: 55ddcb7af387ddfc68a42ac7167de07ea648e249
2019-05-17 03:12:02 -07:00
36d3398aa5 Clang-format ImageInputOp (#20441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20441

This op is fairly complex and the fact that it isn't formatted
correctly makes things that much harder to reason about. Clean it up.

Reviewed By: dreiss

Differential Revision: D15220006

fbshipit-source-id: 30632d8bdbf15f96e73d8b6c96c5f29c052e6e7c
2019-05-16 23:00:09 -07:00
ea9c6e7581 eliminate FE_INVALID in unit test (#20502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20502

Following D15307410, remove more floating-point exceptions in unit tests.

Reviewed By: hx89

Differential Revision: D15340930

fbshipit-source-id: 269fc75e0800bc9d39126767a0f3ca15cd8b0cad
2019-05-16 21:55:28 -07:00
e4c7f59fbc Shallow-copy indices and values in sparse tensor ctor (#20614)
Summary:
(Reopens https://github.com/pytorch/pytorch/pull/20330 and fixes test error.)

After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20614

Differential Revision: D15385811

Pulled By: yf225

fbshipit-source-id: e963fcf5e4097f8c881b56145f408565d97cf5c1
2019-05-16 18:35:05 -07:00
3c86d597c4 update legacy plus one for mpscnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20554

Reviewed By: jerryzh168

Differential Revision: D15362378

fbshipit-source-id: 070cd8314257386036dca89167c738c6602b3f33
2019-05-16 18:17:18 -07:00
8bdbd59d0c handle box plus one for gpu generate_proposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20553

Reviewed By: newstzpz

Differential Revision: D15362108

fbshipit-source-id: 53b1ef132288855f8977748442bfe5e5806c6c6e
2019-05-16 18:17:15 -07:00
373e6a78bf make box plus one a legacy argument in detection ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20550

Reviewed By: newstzpz

Differential Revision: D15348610

fbshipit-source-id: 12b1e119e9bc9191ba9f2aa6d695ef215780c349
2019-05-16 18:17:12 -07:00
220e6894c5 Rename qint8 data type (#19932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932

In preparation for adding an int8_t data type for QTensor.

Reviewed By: zafartahirov

Differential Revision: D15137838

fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
2019-05-16 18:09:28 -07:00
980982ac09 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d0158f7e77a915ffbc28c10de8864d2ae9b24e6f
2019-05-16 16:06:55 -07:00
2ddf126b96 Revert D15373683: [pytorch][PR] [BC-breaking] Shallow-copy indices and values in sparse tensor ctor
Differential Revision: D15373683

Original commit changeset: 32e7275d7121

fbshipit-source-id: ed1786ee9ffa11f7c14c9cd10be6db48285dc57a
2019-05-16 15:22:48 -07:00
4f02321a9a Shallow-copy indices and values in sparse tensor ctor (#20330)
Summary:
After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20330

Differential Revision: D15373683

Pulled By: yf225

fbshipit-source-id: 32e7275d7121e17937c7cc258e8a60bb0848ff25
2019-05-16 15:04:23 -07:00
21ef4cc615 Improve bmm performance on CPU by applying TensorAccessor (#20266)
Summary:
Currently `bmm()` has very heavy performance overhead on CPU due to construction/destruction of `TensorImpl`. Applying `TensorAccessor` when indexing tensor data can greatly improve the performance.

I tested this on the `fairseq` Transformer model. Results on Xeon 6148 (20*2 cores, 2.5GHz) indicate this PR improves Transformer training performance by approximately **10%** (seconds per iteration reduced from **3.60** to **3.21**). Given that `bmm()` takes only **14%** of the total time, a 10% overall improvement indicates that `bmm()` itself improves by roughly **3x**.

Before:
```
| epoch 001:   0%| | 43/25337 [02:34<25:17:11,  3.60s/it, loss=16.179, nll_loss=16.137, ppl=72045.59, wps=1320, ups=0, wpb=4758.767, bsz=136.558, num_updates=43, lr=6.45e-06, gnorm=6.88
```

After:
```
| epoch 001:   0%| | 23/25337 [01:13<22:32:48,  3.21s/it, loss=17.072, nll_loss=17.068, ppl=137419.42, wps=1478, ups=0, wpb=4746.870, bsz=128.348, num_updates=23, lr=3.45e-06, gnorm=10.
```
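A rough way to spot-check the `bmm()` change in isolation (a sketch only; the batch and matrix sizes are illustrative, not those of the fairseq run):
```python
import timeit
import torch

torch.set_num_threads(1)
a = torch.randn(64, 128, 64)
b = torch.randn(64, 64, 128)

# Time batched matrix multiplication; the per-call overhead is what this PR reduces.
t = timeit.timeit(lambda: torch.bmm(a, b), number=1000)
print(f"{t / 1000 * 1e6:.1f} us per bmm call")
```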
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20266

Differential Revision: D15262201

Pulled By: cpuhrsch

fbshipit-source-id: c2e4e406c06714b04cc7534f3da71e986eddca35
2019-05-16 14:01:48 -07:00
fa189641b5 Add export for __and__ & __or__ (#17894)
Summary:
In the ONNX spec, the only supported input/output type for `And` and `Or` is `Bool`.
Thus, during export, casts to/from `Bool` are inserted for the inputs/outputs.
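A hedged sketch of the kind of export this enables (the module, dtypes, and file name are illustrative; whether tracing succeeds depends on the opset and the tensor dtypes used):
```python
import torch
import torch.onnx  # explicit import of the exporter

class AndOr(torch.nn.Module):
    def forward(self, x, y):
        # __and__ / __or__ on non-bool tensors; the exporter inserts casts to/from Bool.
        return x & y, x | y

x = torch.tensor([1, 0, 1], dtype=torch.uint8)
y = torch.tensor([0, 0, 1], dtype=torch.uint8)
torch.onnx.export(AndOr(), (x, y), "and_or.onnx")
```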
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17894

Reviewed By: zrphercule

Differential Revision: D15103148

Pulled By: houseroad

fbshipit-source-id: 3e1068ea236c743260d42882fb11f0e3a21707e6
2019-05-16 13:52:06 -07:00
61012080c8 split and register CollectAndDistributeFpnRpnProposals with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20509

Reviewed By: newstzpz

Differential Revision: D15302181

fbshipit-source-id: 7d3b29b667cd900f2976101f35200e1ee20b0f64
2019-05-16 13:40:46 -07:00
d784636b39 Scope: Move implementations from .h to .cpp file. (#20593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20593
ghimport-source-id: e1e9a7f98158c23adf02d5ed2763ab1e33bb2997

Differential Revision: D15375134

Pulled By: ZolotukhinM

fbshipit-source-id: 8d8e0c1e0ef7697ded59a4b19e2d9de7c5230294
2019-05-16 13:18:04 -07:00
75d04900fe Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 2ee799db589f63e7b9336a02d047afcc768e8b58
2019-05-16 09:48:39 -07:00
5821a76b8e Forcing gcc ABI and safer bash scripts, v2 (#20540)
Summary:
The first time this was merged it broke master and was reverted. This time I do not add ```set -u``` to the .circleci/scripts/setup* scripts. There's still a chance that ```set -u``` breaks the binary builds on master, but at least those can be fixed in parallel and don't completely eliminate signal from all merges.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20540

Differential Revision: D15373444

Pulled By: pjh5

fbshipit-source-id: 0203c20865827366ecd8fa07b2db74d255549ed1
2019-05-16 09:40:01 -07:00
66c6133264 fix empty dropout (#20541)
Summary:
Fix for #20499
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20541

Differential Revision: D15372461

Pulled By: ezyang

fbshipit-source-id: cdc237a98244515a573216a6dac4826261c973f9
2019-05-16 09:33:51 -07:00
a837c00acd Removing unnecessary comments (+fix flake8)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20589

Differential Revision: D15373655

Pulled By: VitalyFedyunin

fbshipit-source-id: 25277648d3e8f8a09cec7569ceda56e74c2ef0b1
2019-05-16 09:19:34 -07:00
5f8e849d84 eliminate FE_INVALID in optimizer related operators and tests (#20501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20501

Fixing unit tests related to optimizer-related operators.

Reviewed By: hx89

Differential Revision: D15307410

fbshipit-source-id: e5400c26e08f26191ee542fe6b02e0a69bc4e1ae
2019-05-16 08:23:46 -07:00
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was split into 2 PRs.

This one:

Introduce a MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands the functionality of two tensor functions, `.is_contiguous` and `.contiguous` (both the Python and C++ APIs).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now supports an optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve the existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now supports an optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or the analogous 3d/5d) format.

Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
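A minimal sketch of the two calls described above (a 4d NCHW input is assumed; the printed strides are the expected channels-last layout for this shape):
```python
import torch

x = torch.randn(2, 3, 4, 5)  # contiguous NCHW tensor
print(x.is_contiguous())                                    # True
print(x.is_contiguous(memory_format=torch.channels_last))  # False

y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)                                              # same semantic layout: torch.Size([2, 3, 4, 5])
print(y.is_contiguous(memory_format=torch.channels_last))  # True
print(y.stride())                                           # channels-last strides: (60, 1, 15, 3)
```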
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
09f22d10a6 Infer schema for experimental ops (#20513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20513

They've been using an old API; switch them to the new one instead.

Reviewed By: li-roy

Differential Revision: D15346349

fbshipit-source-id: 538eb460897ec6addebeebf88b316eb0d6b1dd6f
2019-05-16 01:29:35 -07:00
9bd3305592 Allow nested lists/dicts in legacy operator API (#20379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20379

The legacy custom op API allowed nesting of std::unordered_map and std::vector. While we haven't yet figured out how to do that with the new API,
we at least have to keep backwards compatibility. This diff adds the feature so we can switch to the new API without breaking third-party code.

Reviewed By: li-roy

Differential Revision: D15287693

fbshipit-source-id: bb5b8429fddf6298719cbf567b584ed371f8fc81
2019-05-16 01:29:32 -07:00
456b889353 Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() (#20496)
Summary:
Previously, the caller of `shallow_copy_and_detach()` was responsible for deciding whether the shallow copy should share the source TensorImpl's version counter or have its own new version counter. However, since this decision is crucial for ensuring the correctness of the shallow copy's version counter, we want to require users of `shallow_copy_and_detach()` to pass a version counter to the function call, so that they make the decision at the time of API usage, not as an afterthought.

For similar reasons, we want to require users of `shallow_copy_and_detach()` to pass `allow_tensor_metadata_change` to the function call, so that they decide "whether the TensorImpl shallow copy should allow tensor metadata change" at the time of API usage, not as an afterthought.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20496

Differential Revision: D15363620

Pulled By: yf225

fbshipit-source-id: a65e74738b10452668d6dc644b43aad5b3d8c9e6
2019-05-15 21:02:48 -07:00
3caf4e6985 Remove weak_script in MultiheadAttention function. (#20563)
Summary:
Remove weak_script. After recently splitting the forward() function in the MultiheadAttention module, we noticed a memory leak on GPU. Fix the problem by removing the "weak_script" decorators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20563

Differential Revision: D15368262

Pulled By: zhangguanheng66

fbshipit-source-id: 475db93c9ee0dbaea8fb914c004e7d1e0d419bc2
2019-05-15 20:10:39 -07:00
7db1fb84fa Use slimmer exception raising code when on mobile. (#20543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20543

All of that code for concatenating strings together adds up. Just discard it all for mobile builds.

Reviewed By: ljk53

Differential Revision: D15353447

fbshipit-source-id: a82dd0b884335d662605aabf7dd3d09dfcc1478b
2019-05-15 19:45:18 -07:00
1891614aa5 Add GivenTensorInt16Fill (#20515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20515

Needed by the upcoming quantized version of GenerateProposals

Reviewed By: dzhulgakov

Differential Revision: D14430952

fbshipit-source-id: ea852f04cc4b070f8fbe7a1e6535bba4d5b230fd
2019-05-15 19:45:15 -07:00
5917ec2c52 Print registry warning only when DEBUG is set (#20398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20398

Reduce logging volume from the Registry

Reviewed By: nairbv

Differential Revision: D15312262

fbshipit-source-id: e3546c288d6e1a396b2a4b08204a418aca889437
2019-05-15 19:29:05 -07:00
c129ab06e9 Change onnxifi workflow to support multi-group quantized & Add multi quantization info to caffe2.proto (#20439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439

This is the QTensorProto workflow for multi-group quantization on the C2 side.
Nothing DNNLOWP-Tensor-related is included in this PR, so once we finish the Glow side, we should be able to test this PR using resnet50.

Reviewed By: yinghai

Differential Revision: D15096919

fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
2019-05-15 19:24:08 -07:00
51e40ab832 Add scalar type info to tensor print (#20483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20483
ghimport-source-id: 31bfc51af1060e83492315b96884fc725a1eb84f

Differential Revision: D15334010

Pulled By: li-roy

fbshipit-source-id: 199b575855146a7336d57c165191a16e7e1b5785
2019-05-15 19:03:21 -07:00
abb3698976 Add QInt32 ScalarType and qint32 data type (#19816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816

We need this for quantization of the bias.
Adds a third argument of ScalarType to `quantize_linear`.

Differential Revision: D15094174

fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
2019-05-15 18:50:18 -07:00
1a0f753e6e Fixing typos in schema description for BatchMatMul (#20512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20512

Fixing typos in the schema description for one of the inputs of the BatchMatMul operator.

Reviewed By: jianyuh, BIT-silence

Differential Revision: D15343879

fbshipit-source-id: 06354e8e6b0d79fea937ed2703bb457b2d04f859
2019-05-15 18:06:30 -07:00
b3e510518b Tensor codemod for instance_norm (#20517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20517

fixing a bug in instance_norm

Reviewed By: ezyang

Differential Revision: D15349006

fbshipit-source-id: 2496f7f372118d2713c12a6e9b3357bf6c640b71
2019-05-15 17:51:37 -07:00
ca24e18c7e Add an AssertError check back to MultiheadAttention module (#20492)
Summary:
Fix a typo in the doc.
Add an AssertError check back to the MultiheadAttention module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20492

Differential Revision: D15349008

Pulled By: cpuhrsch

fbshipit-source-id: 2d898345f03787c713e537673613a748ad826b34
2019-05-15 17:28:25 -07:00
161566187c enable CopyVector for type of int on CUDA (#20520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20520

as title

Reviewed By: xianjiec

Differential Revision: D15351010

fbshipit-source-id: 99466de9da0abdffe26d6919768dcb4e52cb2ff1
2019-05-15 16:53:51 -07:00
4c23c34e79 Computing var/stddev and mean at the same time (#18731)
Summary:
The current variance kernels compute the mean at the same time. Often we want both statistics together, so it seems reasonable to have a kwarg/function that returns both values without launching an extra kernel.
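A hedged usage sketch, assuming the combined entry points land as `torch.var_mean` / `torch.std_mean` (the names used in later releases):
```python
import torch

x = torch.randn(4, 5)

# One kernel pass instead of separate .var() and .mean() calls.
var, mean = torch.var_mean(x, dim=1, unbiased=True)
std, mean2 = torch.std_mean(x, dim=1)
print(var.shape, mean.shape)  # torch.Size([4]) torch.Size([4])
```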
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
2019-05-15 16:42:38 -07:00