35 Commits

SHA1 Message Date
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
c8166d4b58 Add torch.cuda.comm to typechecking CI (#45350)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45350

Reviewed By: walterddr

Differential Revision: D23935750

Pulled By: malfet

fbshipit-source-id: 5a7d2d4fbc976699d80bb5caf4727c19fa2c5bc8
2020-09-25 12:13:43 -07:00
8d570bc708 Decouple DataParallel/DistributedDataParallel from CUDA (#38454)
Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move torch/cuda/comm.py to torch/nn/parallel/comm.py with minor changes to support common devices. torch.cuda.comm is kept as is for backward compatibility.
- Provide common APIs for arbitrary device types without changing the existing CUDA APIs in the torch.cuda namespace.
- Replace the torch.cuda calls in DataParallel/DistributedDataParallel with the new APIs.

Related RFC: [https://github.com/pytorch/pytorch/issues/36160](https://github.com/pytorch/pytorch/issues/36160)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454

Differential Revision: D22051557

Pulled By: mrshenli

fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
2020-07-07 12:48:16 -07:00
de7ac60cf4 Add out= variants for cuda.comm.broadcast/gather/scatter (#39681)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/38911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39681

Differential Revision: D22161342

Pulled By: mrshenli

fbshipit-source-id: 60295077159b02087823e93bb6ebac9d70adea0a
2020-06-24 12:58:19 -07:00
d5236f8517 Avoid initializing unnecessary tensors in nccl.reduce (#39688)
Summary:
While working on https://github.com/pytorch/pytorch/issues/38911, I realized that `nccl.reduce` only needs a single output tensor, while our current implementation requires a list of output tensors. This, along with a TODO I fixed in reduce_add, should have some speed up for data parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39688

Differential Revision: D22034547

Pulled By: mrshenli

fbshipit-source-id: e74d54d673ebbb062474b1bb5cc93a095a3a5f6c
2020-06-14 10:11:32 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
2448a83d30 Give broadcast_coalesced tensors different version counters (#13594)
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems, e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, autograd engine will complain.

Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370

edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594

Differential Revision: D12967311

Pulled By: SsnL

fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
2018-11-07 21:49:35 -08:00
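
The version-counter problem described in 2448a83d30 can be illustrated with a toy model in plain Python. This is only a sketch: `VersionCounter`, `ToyTensor`, and `looks_modified` are hypothetical names invented for the illustration and are not PyTorch APIs; the real check lives inside the autograd engine.

```python
class VersionCounter:
    """Counts in-place modifications; may be shared by several tensors."""

    def __init__(self):
        self.value = 0


class ToyTensor:
    """Hypothetical stand-in for a tensor; it only tracks a version counter."""

    def __init__(self, counter):
        self.counter = counter

    def modify_in_place(self):
        # Any in-place write bumps whichever counter this tensor points at.
        self.counter.value += 1


def looks_modified(tensor, saved_version):
    """Mimic a backward-time check: did the tensor change since it was saved?"""
    return tensor.counter.value != saved_version


# Shared counter: two unrelated buffers carved out of one flat tensor.
shared = VersionCounter()
a, b = ToyTensor(shared), ToyTensor(shared)
saved = b.counter.value                   # b is saved for the backward pass
a.modify_in_place()                       # only a is written during forward
shared_case = looks_modified(b, saved)    # b is flagged although untouched

# Separate counters: the same sequence no longer flags the second buffer.
c, d = ToyTensor(VersionCounter()), ToyTensor(VersionCounter())
saved = d.counter.value
c.modify_in_place()
separate_case = looks_modified(d, saved)
```

In this model the shared counter produces a false positive for `b`, which corresponds to the autograd complaint the commit message describes; giving each tensor its own counter removes it.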
f1ce15b50c Move nccl scatter and gather to C++ (#9117)
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.

I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.

I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
2018-07-06 11:10:33 -07:00
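
The `destination_index` suggestion in f1ce15b50c can be made concrete with a small sketch. Both functions below are hypothetical helpers written for this illustration, not code from `torch.cuda.comm`; the device strings and the `default_device` parameter are assumptions, and the handling of `None` under the existing convention is left out because the message does not describe it.

```python
CPU = "cpu"


def resolve_existing(destination_index):
    """Convention the message describes as existing: -1 means CPU."""
    # Assumption: any other integer names a CUDA device.
    if destination_index == -1:
        return CPU
    return f"cuda:{destination_index}"


def resolve_proposed(destination_index, default_device=0):
    """Convention the message proposes: None means CPU, -1 the default GPU."""
    if destination_index is None:
        return CPU
    if destination_index == -1:
        # default_device stands in for whatever the default CUDA device is.
        return f"cuda:{default_device}"
    return f"cuda:{destination_index}"
```

Under the proposed convention the sentinel `-1` no longer overloads "not a GPU at all", which is the clarity argument the author makes.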
30ec06c140 Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
895aebac08 Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00
86fd5fd524 Replace async with non_blocking for Python 3.7 (#4999)
* Replace async with non_blocking for Python 3.7 upgrade

* Remove trailing whitespace

* Give _cuda and _type kwargs and accept async for compatibility

* Rename async to non_blocking in all C++ code

* Add entries for async in python_variable_methods

* Friendlier backward compatibility for cuda and type
2018-02-02 09:23:51 -05:00
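
The backward-compatibility step in 86fd5fd524 (accepting `async` while preferring `non_blocking`) can be sketched as follows. This is a minimal illustration under assumptions, not the code from the commit: `resolve_non_blocking` is a hypothetical helper, and the warning text is invented.

```python
import warnings


def resolve_non_blocking(non_blocking=False, **kwargs):
    """Hypothetical helper: treat the legacy ``async`` keyword as an alias."""
    if "async" in kwargs:
        warnings.warn(
            "'async' is deprecated; use 'non_blocking' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        non_blocking = kwargs.pop("async")
    if kwargs:
        # Anything else is a genuine mistake by the caller.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return bool(non_blocking)


# ``async`` is a reserved word from Python 3.7 on, so it cannot be written as
# a literal keyword argument; legacy callers can only pass it by unpacking.
legacy_kwargs = {"async": True}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy_result = resolve_non_blocking(**legacy_kwargs)

modern_result = resolve_non_blocking(non_blocking=True)
```

Taking the old name through `**kwargs` keeps the function definition itself valid on Python 3.7, where `def f(async=False)` would be a syntax error.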
f1c616418d Fix Python docs for broadcast and broadcast_coalesced (#4727) 2018-01-19 10:57:20 -05:00
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
fa5efab669 comments and case where not all sparse (#3370) 2017-11-01 06:05:17 -04:00
01be4d6b20 sparse broadcast_coalesce and reduce_add_coalesced 2017-10-28 18:52:35 -04:00
3d7459ff6c fix indices for data_parallel and add parameter gradient tests (#2632) 2017-09-05 17:29:27 -04:00
2c07f88ea3 Fix typos. 2017-08-25 14:27:07 -04:00
50c208a50b Revert "Fix typos."
This reverts commit 4622b3395276b37e10141fab43ffea33941ca0c2.
2017-08-10 13:57:00 -04:00
4622b33952 Fix typos. 2017-08-08 11:05:38 -04:00
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
8db8716c7c Support non-default streams in NCCL reduce 2017-06-12 21:58:38 -04:00
69287250d1 Add a broadcast parameter to copy_, use it in the library in cases where there are non-broadcasting calls exposed by the tests. 2017-06-11 05:37:59 -04:00
01a35dcace Fix coalesced CUDA collectives for nonhomogeneous lists 2017-04-11 14:48:54 -07:00
e50a1f19b3 Use streams in scatter to overlap copy with compute 2017-03-14 22:46:07 +01:00
c7c4778af6 modify docs of broadcast to fix issue #940 (#970) 2017-03-10 09:54:43 -05:00
15a9fbdedb Merge pull request #881 from colesbury/parallelize_backwards
Parallelize autograd backwards
2017-03-06 16:57:19 -05:00
65b66264d4 Improve broadcast/reduce performance by coalescing tensors 2017-03-06 12:47:53 -08:00
b1ae7f90d5 Added functionality for data parallel table (#843) 2017-03-05 02:35:46 +01:00
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
15c1dad340 Minor fixes and torch.cuda docs 2017-01-16 20:38:14 -05:00
f2d7e94948 Use torch.Size for Tensor sizes and tuple for strides
See issue #20

The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples so that torch.Tensor(size) is interpreted as size instead
of data.
2016-10-28 19:37:09 +02:00
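
The idea in f2d7e94948, a tuple subclass that lets a constructor tell a size apart from data, can be sketched in plain Python. `Size` and `interpret_constructor_arg` below are hypothetical stand-ins written for this illustration; they are not the `torch.Size` implementation.

```python
class Size(tuple):
    """Stand-in size type: an ordinary tuple with a distinguishable class."""


def interpret_constructor_arg(arg):
    """Decide how a tensor-like constructor might read a single argument."""
    # Check the subclass first: a Size is also a tuple.
    if isinstance(arg, Size):
        return "size"
    if isinstance(arg, (tuple, list)):
        return "data"
    raise TypeError(f"unsupported argument type: {type(arg).__name__}")


as_size = interpret_constructor_arg(Size((2, 3)))   # a 2x3 shape
as_data = interpret_constructor_arg((2, 3))         # two elements: 2 and 3
```

Because `Size` subclasses `tuple`, it still indexes, unpacks, and compares like a plain tuple, so existing code that treats sizes as tuples keeps working.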
f30081a313 Use NCCL bcast and reduce functions in comm 2016-10-14 14:16:32 -07:00
11b38a6895 Add more functions to autograd 2016-09-30 16:37:07 -04:00
3eac7164f4 Add data parallel functions to nn 2016-09-27 15:45:45 -07:00