Commit Graph

18149 Commits

Author SHA1 Message Date
7799ea5eb3 Port adaptive_avg_pool3d to ATen (#19898)
Summary:
Resolves #18065.
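For reference, a minimal usage sketch of the op this ports (the functional API is unchanged by the port; shapes below are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, 8)
# adaptive_avg_pool3d now dispatches to the native ATen kernel
y = F.adaptive_avg_pool3d(x, output_size=(4, 4, 4))
print(y.shape)  # torch.Size([1, 3, 4, 4, 4])
```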
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19898

Differential Revision: D15240607

Pulled By: ezyang

fbshipit-source-id: 00cf23ed20c1757d5eef71fd8c6a2f53d372e341
2019-05-13 11:29:22 -07:00
5268b7dfaf Remove support for CUDA 8 (#20298)
Summary:
PyTorch 1.1.0 dropped support for CUDA 8.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20298

Differential Revision: D15294639

Pulled By: ezyang

fbshipit-source-id: b9411bfe456f93f1529b745dc83b7d6310df684d
2019-05-13 11:24:22 -07:00
62957ab0a1 Tiny spelling mistake fix. (#20425)
Summary:
"then the output would also has k tensors" -> "then the output would also have k tensors"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20425

Differential Revision: D15320152

Pulled By: zou3519

fbshipit-source-id: b04e2ccd29c6a3e33ad1040d0ea975a01a7bd9b5
2019-05-13 11:18:53 -07:00
67414714e5 Move THCTensor_(uniform) to ATen (#20292)
Summary:
As a first step for this plan: https://github.com/pytorch/pytorch/issues/19508#issuecomment-485178192, this PR moves `THCTensor_(uniform)` to ATen. Major changes are:
- The `uniform_` CUDA kernel now utilizes a Philox generator.
- The kernel also utilizes TensorIterator.
- The kernel uses a grid-stride loop to achieve peak effective bandwidth.

- Since the engine has changed from `curandStateMTGP32` to `curandStatePhilox4_32_10`, the randoms generated now will be different (see the reproducibility sketch after this list).
- Here is the diff showing codegen changes: https://gist.github.com/syed-ahmed/4af9ae0d42b6c7dbaa13b9dd0d1dd1e8 (BC breaking change if any)

- Philox4_32_10 is known to pass the standard TestU01 Big Crush test (https://www.thesalmons.org/john/random123/papers/random123sc11.pdf) and hence the quality of random numbers generated isn't an issue when compared to the previously used `curandStateMTGP32`.
- I have added a test case in `aten/src/ATen/test/cuda_distributions_test.cu` which verifies that philox offset is incremented properly
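A reproducibility sketch for the engine-change bullet above (assuming a CUDA build; this illustrates the behavior, not the new test itself): seeding still makes the stream deterministic, even though the values differ from the old MTGP32-based ones.

```python
import torch

torch.cuda.manual_seed(42)
a = torch.empty(10**6, device='cuda').uniform_()
torch.cuda.manual_seed(42)
b = torch.empty(10**6, device='cuda').uniform_()
# Same seed -> same Philox stream, even though values differ from MTGP32
assert torch.equal(a, b)
```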

The benchmark was done on a DGX station with 4 V100s.
I modified the script from jcjohnson's [multinomial benchmark](https://github.com/jcjohnson/pytorch-multinomial-benchmark) to produce this notebook, which shows that there is a general speedup with this PR and that no regression has been introduced: https://gist.github.com/syed-ahmed/9d26d4e96308aed274d0f2c7be5218ef

To reproduce the notebook:
- Run https://gist.github.com/syed-ahmed/4208c22c541f1d30ad6a9b1efc1d728f in a container with the current pytorch top of tree with the command: `python uniform_benchmark.py --stats_json before.json`
- Apply this diff to the current pytorch top of tree and run the same script in a container with the command: `python uniform_benchmark.py --stats_json after.json`
- Run the notebook attached above with the `after.json` and `before.json` in the same directory

The effective bandwidth was calculated using the script (thanks to ngimel): https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
Following are the numbers before (first block) and after (second block).
```
uniform, size, elements 65536 forward 5.168914794921875e-06 bandwidth (GB/s) 50.71548098597786
uniform, size, elements 131072 forward 5.056858062744141e-06 bandwidth (GB/s) 103.67860705101367
uniform, size, elements 262144 forward 7.164478302001953e-06 bandwidth (GB/s) 146.357621001797
uniform, size, elements 524288 forward 1.1217594146728515e-05 bandwidth (GB/s) 186.9520302275877
uniform, size, elements 1048576 forward 1.923084259033203e-05 bandwidth (GB/s) 218.10297600317384
uniform, size, elements 2097152 forward 3.640890121459961e-05 bandwidth (GB/s) 230.39992200138826
uniform, size, elements 4194304 forward 6.778717041015625e-05 bandwidth (GB/s) 247.49839679819922
uniform, size, elements 8388608 forward 0.00012810707092285157 bandwidth (GB/s) 261.92490202361347
uniform, size, elements 16777216 forward 0.00025241613388061524 bandwidth (GB/s) 265.86598474620627
uniform, size, elements 33554432 forward 0.000497891902923584 bandwidth (GB/s) 269.5720239913193
```
```
uniform, size, elements 65536 forward 5.550384521484375e-06 bandwidth (GB/s) 47.22988091821306
uniform, size, elements 131072 forward 5.581378936767578e-06 bandwidth (GB/s) 93.93520954942333
uniform, size, elements 262144 forward 6.165504455566406e-06 bandwidth (GB/s) 170.071404141686
uniform, size, elements 524288 forward 6.3276290893554685e-06 bandwidth (GB/s) 331.4277702414469
uniform, size, elements 1048576 forward 8.509159088134765e-06 bandwidth (GB/s) 492.91639239047356
uniform, size, elements 2097152 forward 1.2989044189453124e-05 bandwidth (GB/s) 645.8218077979443
uniform, size, elements 4194304 forward 2.347707748413086e-05 bandwidth (GB/s) 714.6211452997259
uniform, size, elements 8388608 forward 4.4286251068115234e-05 bandwidth (GB/s) 757.6715389250498
uniform, size, elements 16777216 forward 8.672237396240235e-05 bandwidth (GB/s) 773.8356427961071
uniform, size, elements 33554432 forward 0.00016920566558837892 bandwidth (GB/s) 793.2224227438523
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20292

Differential Revision: D15277761

Pulled By: ezyang

fbshipit-source-id: 8bfe31a01eeed77f0ed6e7ec4d2dda4c6472ecaa
2019-05-13 09:38:28 -07:00
5f7ef09f57 math module support: gcd, copysign, erf, erfc, expm1, fabs, gamma, lgamma (#19707)
Summary:
cc eellison, driazati. Refer to issue #19026; a usage sketch follows.
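A minimal sketch of what this enables, assuming the listed `math` builtins now compile inside TorchScript (names and values are illustrative):

```python
import math

import torch

@torch.jit.script
def gcd_copysign(a: int, b: int, x: float) -> float:
    # math.gcd and math.copysign are now usable in scripted code
    g = math.gcd(a, b)
    return math.copysign(float(g), x)

print(gcd_copysign(12, 18, -1.0))  # -6.0
```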
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19707

Differential Revision: D15302632

Pulled By: eellison

fbshipit-source-id: 68ff13b478b93cc33703ef3276b5fa727c8ff31a
2019-05-13 08:55:23 -07:00
41673d477c Disable incremental_state function in MultiheadAttention module. (#20177)
Summary:
Fully supporting the incremental_state function requires several additional utils available in fairseq. However, we lack a suitable problem setup for the unit test, so the incremental_state function will be disabled for now. If it is needed in the future, a feature request can be created. Fixes #20132

Also adds unit tests covering the arguments of the MultiheadAttention module, including bias, add_bias_kv, add_zero_attn, key_padding_mask, need_weights, and attn_mask (see the sketch below).
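A minimal sketch of these arguments in use (shapes are illustrative; the mask dtype follows current releases):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, bias=True,
                            add_bias_kv=True, add_zero_attn=True)
q = k = v = torch.randn(5, 3, 8)               # (seq_len, batch, embed_dim)
padding = torch.zeros(3, 5, dtype=torch.bool)  # (batch, seq_len)
out, attn = mha(q, k, v, key_padding_mask=padding, need_weights=True)
print(out.shape)  # torch.Size([5, 3, 8])
```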
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20177

Differential Revision: D15304575

Pulled By: cpuhrsch

fbshipit-source-id: ebd8cc0f11a4da0c0998bf0c7e4e341585e5685a
2019-05-13 08:21:15 -07:00
f8aa6a8f44 Make a deep copy of extra_compile_flag dictionary (#20221)
Summary:
See issue #20169
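The gist of the fix, as a hypothetical sketch (the function and flag names here are illustrative, not the actual `cpp_extension` internals):

```python
import copy

def prepare_flags(extra_compile_args):
    # Mutating the caller's dict in place would leak flags between builds;
    # a deep copy keeps each extension's flags independent.
    flags = copy.deepcopy(extra_compile_args)
    flags.setdefault('cxx', []).append('-O2')
    return flags
```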
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20221

Differential Revision: D15317126

Pulled By: ezyang

fbshipit-source-id: 0a12932db4f6ba15ea1d558fa329ce23fe2baef6
2019-05-13 08:11:39 -07:00
30bdb8c0d7 Hotfix for caffe2 windows build (#20417)
Summary:
We don't need to overlay the VC env when not using Ninja; CMake deals with it automatically. Overlaying is a no-op when the env matches the specified generator, but it produces the error "Cannot find CMAKE_CXX_COMPILER" when they differ.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20417

Differential Revision: D15317081

Pulled By: ezyang

fbshipit-source-id: 5d9100321ecd593e810c31158f22c67d3e34973b
2019-05-13 08:03:45 -07:00
f496ea36b2 DataLoader: add error detection for worker_init_fn (#20150)
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
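A minimal sketch of the behavior this covers, assuming a failing `worker_init_fn` now surfaces as an exception in the main process (dataset and names are illustrative):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class Ones(Dataset):
    def __len__(self):
        return 4
    def __getitem__(self, i):
        return torch.ones(1)

def bad_init(worker_id):
    raise RuntimeError("worker %d failed to initialize" % worker_id)

loader = DataLoader(Ones(), num_workers=2, worker_init_fn=bad_init)
for batch in loader:  # raises RuntimeError from the worker
    pass
```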
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150

Differential Revision: D15314891

Pulled By: ezyang

fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
2019-05-12 18:28:56 -07:00
163f0e182c Fix bug in non_blocking copy (#20305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20305
ghimport-source-id: eb3dacb10fd93bbb5a6bbe078ed1ec842163d0e6

Differential Revision: D15276094

Pulled By: li-roy

fbshipit-source-id: 4728f419aa050e6c94a4f62231fa1a86caa556a7
2019-05-11 15:20:19 -07:00
6a8f55796a Add quant-dequant nodes for weights
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20041

Differential Revision: D15178086

fbshipit-source-id: 8cb060d72b68e44bf042338924f203ae62d74f6a
2019-05-11 14:03:10 -07:00
9499c7b7ee Profiling GraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19994

Differential Revision: D15307752

Pulled By: Krovatkin

fbshipit-source-id: 7b35191042199ef16823487e15fe639968cbdc89
2019-05-10 23:05:47 -07:00
f4d9bfaa4d Support Exports to Multiple ONNX Opset (#19294)
Summary:
Supports exporting multiple ONNX opsets (more specifically, opset 10 for now), following the proposal in https://gist.github.com/spandantiwari/99700e60919c43bd167838038d20f353.
It also adds support for custom ops (merged with https://github.com/pytorch/pytorch/pull/18297).

This PR will be followed by another PR containing the changes related to testing the ops for different opsets.
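A minimal sketch of the new export knob (model choice is illustrative; `nn.Upsample` is one op whose ONNX lowering differs between opsets 9 and 10):

```python
import torch
import torch.nn as nn

model = nn.Upsample(scale_factor=2.0)
x = torch.randn(1, 3, 4, 4)
# opset_version selects the target opset; 9 remains the default
torch.onnx.export(model, x, "upsample.onnx", opset_version=10)
```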
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19294

Reviewed By: zrphercule

Differential Revision: D15043951

Pulled By: houseroad

fbshipit-source-id: d336fc35b8827145639137bc348ae07e3c14bb1c
2019-05-10 18:37:12 -07:00
1129b3344a move DistillBatchLRLoss Layer from open source to fb
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20291

Reviewed By: chocjy

Differential Revision: D15272181

fbshipit-source-id: 2e0964fa1b1031607134548bb87c4e103c5b1383
2019-05-10 17:46:04 -07:00
3f3ee5600a make trace's errors more helpful about what it can and can't do when tracing a module's methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20368

Differential Revision: D15305340

Pulled By: Krovatkin

fbshipit-source-id: bafb13002df5c9741160e96205e0846243dde3ec
2019-05-10 17:40:21 -07:00
7d3d5b73f4 Add multiline type annotation support for Python frontend (#14922)
Summary:
This allows multiline type comments in accordance with [PEP 484](https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code)

```python
@torch.jit.script
def foo(x,  # type: Tensor
        y,  # type: Tuple[Tensor, Tensor]
        ):
    # type: (...) -> Tuple[Tensor, Tensor]
    return x, x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14922

Pulled By: driazati

Differential Revision: D15268432

fbshipit-source-id: e9add8d8025e42390a14a643835d15cc67a2f33e
2019-05-10 17:27:41 -07:00
3a39ce0f41 Fix reflection on weak modules, copy attributes (#20190)
Summary:
* Constructs a new type at runtime so that `isinstance` checks work for weak modules assigned to `ScriptModule`s (see the sketch after this list)
* Fix some extraneous names in `__constants__`
* Add `in_features` and `out_features` to `nn.Linear` `__constants__`

Fixes #19363
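A minimal sketch of the `isinstance` behavior this fixes, assuming a weak module (here `nn.Linear`) assigned to a `ScriptModule` (illustrative):

```python
import torch
import torch.nn as nn

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        self.fc = nn.Linear(4, 4)  # weak module, recompiled for scripting

m = M()
# The runtime-constructed type still registers as nn.Linear
assert isinstance(m.fc, nn.Linear)
print(m.fc.in_features, m.fc.out_features)  # now listed in __constants__
```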
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20190

Pulled By: driazati

Differential Revision: D15302350

fbshipit-source-id: 1d4d21ed44ab9578a4bc2a72396a82e9bbcd387c
2019-05-10 17:14:49 -07:00
00d0ddb140 Add all list specializations to pickler (#20191)
Summary:
TensorList, DoubleList, and BoolList were missing from the pickler, so
this adds them.

As a follow-up, a lot of the code for these could be templated and cut down.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20191

Pulled By: driazati

Differential Revision: D15299106

fbshipit-source-id: f10c0c9af9d60a6b7fb8d93cea9f550b1a7e2415
2019-05-10 17:14:42 -07:00
6197eed409 Eliminate a const_cast.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20371

Differential Revision: D15298327

fbshipit-source-id: 566ec842e971ba6f80e333bb874cc5fd2c36b02e
2019-05-10 15:43:12 -07:00
a4ae689636 quantize_rnn_modules in ensemble_export (#20365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20365

Enable quantization of `torch.nn.LSTM` module in the decoder of PyTorch natively exported beam search models.

Reviewed By: jmp84

Differential Revision: D15260631

fbshipit-source-id: bbdd3a30c2c2110986eb7aa7ff11ce1c9090ddf4
2019-05-10 15:33:41 -07:00
a0c2829194 Preserve log_dir arg and member for SummaryWriter (#20382)
Summary:
Given that tensorboardX and our PyTorch 1.1 release had `log_dir` as the argument for SummaryWriter initialization and as a member variable (which some users access), we need to preserve this name. However, we might deprecate it in the future, so I've added a `get_logdir` method that can be used going forward (see the sketch below).
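A minimal sketch of the preserved name and the new accessor (assuming the 1.1-era `torch.utils.tensorboard` module):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/exp1")  # keyword kept for compatibility
print(writer.log_dir)       # member preserved for users who read it
print(writer.get_logdir())  # preferred accessor going forward
```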

cc natalialunova, lanpa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20382

Reviewed By: NarineK

Differential Revision: D15300941

Pulled By: orionr

fbshipit-source-id: a29a70fcbc614a32ebfa6c655962fdff081af1af
2019-05-10 14:59:47 -07:00
3aa414c8f2 Add documentation to Dispatch.h (#20339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20339
ghimport-source-id: 99b6cf9f03be8d55674a43c8a7e19a42b75143f1

Differential Revision: D15295652

Pulled By: ezyang

fbshipit-source-id: f3b127ed3f53f13931596c06e181056940443290
2019-05-10 14:31:03 -07:00
10a9ef833c Avoid unnecessary refcount bump in unary operators. (#20331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20331
ghimport-source-id: 26588acca171555182714c6b6f89610564e228a1

Differential Revision: D15295193

Pulled By: ezyang

fbshipit-source-id: 44b7a8b4c9d41003a6559be2af0bd4f0ada54b31
2019-05-10 14:23:22 -07:00
75c6d37bac profile on uses
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20049

Differential Revision: D15242492

Pulled By: Krovatkin

fbshipit-source-id: f6eab3ec9d7a2f59f19905d757cf7c6f9ad2fdf6
2019-05-10 14:18:03 -07:00
c6255a57e4 Remove CPU_tensor_parallel_kernel_apply2 (#20207)
Summary:
This code is unused and has been superseded by TensorIterator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20207

Differential Revision: D15240832

Pulled By: cpuhrsch

fbshipit-source-id: 4f600bb8645f9b28a137e2cefb099978f5152d05
2019-05-10 14:04:32 -07:00
6dc70aa513 add test coverage for make_np (#20317)
Summary:
addresses https://github.com/pytorch/pytorch/pull/16196#discussion_r276381946

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20317

Differential Revision: D15289400

Pulled By: orionr

fbshipit-source-id: 914416a8c1369d95656f556c6e05348957789466
2019-05-10 13:59:48 -07:00
ce033485eb Convenience APIs for script objects (#20226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20226
ghimport-source-id: 22937d72e35ec4eba38019284a368453089fe3eb

Differential Revision: D15243625

Pulled By: suo

fbshipit-source-id: 5e9fb773da244f9ef201dba524155c3b19b2b4e0
2019-05-10 13:03:58 -07:00
50149fb66b Adds quantized addition and renames sum to add (#20233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20233

Adds quantized addition (without ReLU).

Reviewed By: jianyuh

Differential Revision: D15245791

fbshipit-source-id: 34ede5d805d9ab0d31e8ae87cefb110504bd3c87
2019-05-10 12:53:33 -07:00
35fed93b1e Adding Poisson NLL loss to libtorch (#19316)
Summary:
This PR adds Poisson NLL loss to ATen and substitutes the Python implementation with a call to the C++ one.

Fixes #19186.
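A minimal sketch of the functional entry point this now backs natively (values are illustrative):

```python
import torch
import torch.nn.functional as F

log_rate = torch.randn(3, requires_grad=True)  # log of predicted rates
target = torch.poisson(torch.rand(3) * 5)      # observed counts
loss = F.poisson_nll_loss(log_rate, target)    # dispatches to ATen now
loss.backward()
print(loss.item())
```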
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19316

Differential Revision: D15012957

Pulled By: ezyang

fbshipit-source-id: 0a3f56e8307969c2f9cc321b5357a496c3d1784e
2019-05-10 11:57:49 -07:00
ed25b8a667 Add a FAQ entry to explain Cannot insert a Tensor that requires grad as a constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20181
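For context, a minimal snippet that triggers the error the new FAQ entry explains (illustrative):

```python
import torch

w = torch.randn(3, 3, requires_grad=True)

@torch.jit.script
def f(x):
    # w is captured as a constant; because it requires grad, scripting
    # fails with "Cannot insert a Tensor that requires grad as a constant"
    return x @ w
```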

Differential Revision: D15296816

Pulled By: Krovatkin

fbshipit-source-id: 382e3a0aa982774771e98050cbc65d144cbd959e
2019-05-10 10:55:33 -07:00
ea5c9c9267 Update installing.rst (#20354)
Summary:
Delete useless `cd`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20354

Differential Revision: D15296154

Pulled By: soumith

fbshipit-source-id: 2042b56c91b33e302b0ed9c77f29b9b64079fa98
2019-05-10 10:04:06 -07:00
4ba28deb6e Unify libtorch and libcaffe2 (#17783)
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch".  This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so".  This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.

The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.

The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783

Differential Revision: D15284178

Pulled By: kostmo

fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
2019-05-10 09:50:53 -07:00
872bab22c6 Some essential changes needed before updating the Windows AMI (#20353)
Summary:
1. Add CUDA 10.1 build
2. Turn on OpenMP loop support for VS 2019
3. Remove legacy code about selective builds

Tested through CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20353

Differential Revision: D15294806

Pulled By: ezyang

fbshipit-source-id: 0acf5c3fbbc398fd9ebdf9f97653499d39638432
2019-05-10 09:08:51 -07:00
d68802ba47 Sparse half embeddings on cuda (#19695)
Summary:
```
import torch
a = torch.nn.Embedding(3, 4, sparse=True).half().cuda()
a(torch.LongTensor([1, 0]).cuda()).sum().backward()

```
gave: `RuntimeError: torch.cuda.sparse.HalfTensor is not enabled`

This PR enables sparse.HalfTensor on CUDA; it still won't work on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19695

Differential Revision: D15281162

Pulled By: nairbv

fbshipit-source-id: 0d83d946a059393bd53d8b8102e2daa9b4c02588
2019-05-10 08:00:55 -07:00
148e90ba2a Give clear error message when attempting to merge struct which can't be merged.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19804

Differential Revision: D15098833

fbshipit-source-id: 2950e247c74e125e033cd9cfbf5631eee5298ea0
2019-05-10 07:01:01 -07:00
c2c0a32155 Remove setting logger level in caffe2.python.checkpoint (#19803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19803

There is no reason to set a specific logging level for this module. Removing it to just use the default logging level.

Differential Revision: D15098834

fbshipit-source-id: 1654c04500c19690ddde03343f2e84b04bb0f1ef
2019-05-10 07:00:58 -07:00
2a875fc126 Fix THD->c10 dependency to gflags.h (#20319)
Summary:
Fixed #20250

Not sure if there's any specific design reason to use `add_dependencies()` and manually add a few include dirs instead of linking the target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20319

Differential Revision: D15294584

Pulled By: ezyang

fbshipit-source-id: 97f813a6b1829dad49958e0f880b33eb95747607
2019-05-10 06:50:58 -07:00
8726b27333 Fix overlay_vc_env when called by legacy python (#20304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20155.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20304

Differential Revision: D15292369

Pulled By: zdevito

fbshipit-source-id: 7da2e0cb85c98d0fcd4461d39e2a8c57391db60e
2019-05-10 06:44:58 -07:00
c397134d6b Revert D15156384: Dict
Differential Revision: D15156384

Original commit changeset: b9313ec4dd9a

fbshipit-source-id: 3b44f49ec4eaba692cfb2cfe46e5f98102e337d9
2019-05-10 06:11:25 -07:00
85d56852d3 Revert D15227620: Allow Dict type in c10 operators
Differential Revision: D15227620

Original commit changeset: c1ea6c12165e

fbshipit-source-id: b2cecfffdd38b7c97e20d0ee81915ea10daf8460
2019-05-10 06:11:22 -07:00
c744468e36 Revert D15227621: Extend testAvailableArgTypes
Differential Revision: D15227621

Original commit changeset: 83db7536e906

fbshipit-source-id: 8ecc443bd18787de52572fde06df408e1c52c50d
2019-05-10 06:11:18 -07:00
99874f87cb Use registry for BoundShapeInferencer (#20341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20341

As titled.

Reviewed By: ipiszy

Differential Revision: D15288131

fbshipit-source-id: 956ced99cc5c5b8199f81f7baa844fe8a0505456
2019-05-10 01:16:02 -07:00
a0e5240afc Fix DistributedDataParallelTest.test_accumulate_gradients (#20351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20351

This was broken because of a merge race between #20282 and the stack in #20236.

Cleaned up the test and comments a bit as well.

Differential Revision: D15292786

fbshipit-source-id: a4379ea700cad959d3a6921fc5ddf9384fb8f228
2019-05-09 23:27:18 -07:00
02df1ccd9c Remove const_cast's from subgraph matcher. (#20303)
Summary:
The trick here is that creating a mapping from const values to
const values means that downstream clients that want to mutate
the output of the mapping are stuck.  However, a mapping from
const values to non-const values is just fine and doesn't put
constraints on downstream clients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20303

Differential Revision: D15284076

fbshipit-source-id: 16206fd910dd5f83218525ca301b1889df0586cb
2019-05-09 18:07:14 -07:00
e47b210075 Adding setup job as prereq to html update jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20325

Differential Revision: D15287710

Pulled By: pjh5

fbshipit-source-id: 2bbed3a46c4affb5ae4e6dd4feb1dda59aeb5d04
2019-05-09 16:25:22 -07:00
3afd99680c Remove SourceLocation (respin) (#20333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20333
ghimport-source-id: e64075bb82067224463e9955d10bd13967d1975d

Differential Revision: D15284081

Pulled By: zdevito

fbshipit-source-id: ac26ae48392b9daff08f460529c06af8f4e4722a
2019-05-09 16:17:33 -07:00
558c6c4d8a Make DistributedDataParallel usable with CPU models (#20236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20236

Use the new version of broadcast_coalesced that deals with both CPU
and CUDA models. Add tests that evaluate correctness of
DistributedDataParallel for CPU models.

Closes #17757.
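A minimal sketch of the now-supported CPU path (assumes a process group is already initialized in each worker, e.g. with the gloo backend; names are illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# torch.distributed.init_process_group("gloo", ...) must have run already
model = nn.Linear(10, 10)             # a plain CPU module
ddp = DistributedDataParallel(model)  # no device_ids needed on CPU
out = ddp(torch.randn(4, 10))
out.sum().backward()                  # grads are all-reduced across ranks
```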

Reviewed By: mrshenli

Differential Revision: D15245428

fbshipit-source-id: d2fa09f68593b3cd1b72efeb13f5af23ebd5c80a
2019-05-09 14:11:17 -07:00
f32c9bd5e9 Refactor core DistributedDataParallel tests (#20235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20235

The tests expected to only run for CUDA models. In a future commit we
need to update this to work for CPU models as well. Therefore, we can
no longer rely on only integers being passed for device identifiers.
With this change we pass both the materialized list of devices to use
(as `torch.Device` objects), as well as an optional list of integers.
The latter is specified to exercise the code in the
DistributedDataParallel constructor that turns a list of integers into
CUDA devices, IFF it is used to wrap a single-device CUDA module.

This commit also groups together the 'str' and non-'str' tests. These
used to test passing the list of devices as integers or as
`torch.Device` instances. These are now executed from the same test.

Reviewed By: mrshenli

Differential Revision: D15245429

fbshipit-source-id: 5797ba9db33d2c26db8e7493c91bb52f694285ac
2019-05-09 14:11:14 -07:00
caa0d0c50a Add c10d::broadcast_coalesced and tests (#20234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20234

The differences with the existing function _dist_broadcast_coalesced are that this one works for both CPU and CUDA tensors and that it has a maximum number of in-flight operations.

This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.

See #17757 for more information.

Reviewed By: mrshenli

Differential Revision: D15228099

fbshipit-source-id: a2113ba6b09b68cb5328f49f4c1960031eb43c93
2019-05-09 14:11:08 -07:00
c31fccd678 Fix crash issue in conv+sum fusion for MKLDNN on caffe2 (#20139)
Summary:
isConvFusion(...) only applies to Conv ops; calling it on a non-Conv op causes a crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20139

Differential Revision: D15280604

Pulled By: yinghai

fbshipit-source-id: eb45be11990b3bf7c5b45f02ebb6018444ab5357
2019-05-09 13:53:46 -07:00