Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move torch/cuda/comm.py to torch/nn/parallel/comm.py with minor changes for common device support. torch.cuda.comm is kept as is for backward compatibility.
- Provide common APIs for arbitrary device types without changing the existing CUDA APIs in the torch.cuda namespace.
- Replace the torch.cuda calls in DataParallel/DistributedDataParallel with the new APIs.
Related RFC: https://github.com/pytorch/pytorch/issues/36160
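For illustration, a minimal sketch of using the relocated primitives (assuming at least two visible GPUs; not part of this change):

```python
import torch
# The primitives now live in torch.nn.parallel.comm; torch.cuda.comm
# is kept for backward compatibility, per the summary above.
import torch.nn.parallel.comm as comm

if torch.cuda.device_count() >= 2:
    t = torch.arange(4.0, device="cuda:0")
    copies = comm.broadcast(t, devices=[0, 1])      # one copy per device
    total = comm.reduce_add(copies, destination=0)  # sum the copies back on cuda:0
```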
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454
Differential Revision: D22051557
Pulled By: mrshenli
fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
Summary:
While working on https://github.com/pytorch/pytorch/issues/38911, I realized that `nccl.reduce` only needs a single output tensor, while our current implementation requires a list of output tensors. This, along with a TODO I fixed in `reduce_add`, should provide some speedup for data parallel.
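To illustrate the semantics, here is a plain-PyTorch sketch of a reduce (not the NCCL binding itself): only the root device needs an output buffer, which is why a single output tensor suffices.

```python
import torch

def reduce_sum_sketch(inputs, root=0):
    # Conceptual reduce: sum tensors living on several devices into a
    # single output on the root device. Only the root needs an output
    # buffer, so one output tensor (not a list) is enough.
    output = torch.zeros_like(inputs[root])
    for t in inputs:
        output += t.to(output.device)
    return output
```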
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39688
Differential Revision: D22034547
Pulled By: mrshenli
fbshipit-source-id: e74d54d673ebbb062474b1bb5cc93a095a3a5f6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in `__init__.py`. Hypothetically
we're supposed to use `__all__` in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for `# type:` comments: flake8-3 will
report an import as unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
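For context, the two options mentioned above look roughly like this in a hypothetical package's `__init__.py` (illustrative names only):

```python
# __init__.py of a hypothetical package

# Option taken here: silence F401 for an intentional re-export.
from .engine import Engine  # noqa: F401

# Option left for future work: list the public API in __all__,
# which pyflakes also treats as "using" the import.
__all__ = ["Engine"]
```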
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, the flattened base tensor is never exposed and the views don't share any memory, so sharing a version counter is unnecessary. Furthermore, it can cause problems: e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` while the other is needed in backward, the autograd engine will complain.
Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370
edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
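A small illustration of the underlying mechanism, using the internal `_version` attribute: views of one base tensor share a version counter, so an in-place update through one view is visible to autograd through the other.

```python
import torch

flat = torch.zeros(6)
a, b = flat[:3], flat[3:]       # two views of the same base tensor

print(a._version, b._version)   # 0 0 -- views share the base's version counter
a.add_(1)                       # in-place update through one view...
print(a._version, b._version)   # 1 1 -- ...bumps the counter seen by the other
```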
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594
Differential Revision: D12967311
Pulled By: SsnL
fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
Summary:
As I work on replicating DataParallel in C++, I need to move some functions from Python into C++. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had already rewritten broadcast in C++.
I'm not very familiar with this code, so let me know if I'm doing something wrong; I largely did a literal translation.
I don't know how "public" `torch.cuda.comm` is, but I feel the `destination_index` parameter of `gather` should be changed so that `None` indicates the CPU and `-1` indicates the default CUDA device, instead of `-1` indicating the CPU. That would make the code clearer IMO.
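For reference, a sketch of the Python-level primitives being ported (assuming two visible GPUs; `-1` follows the CPU convention discussed above):

```python
import torch
import torch.cuda.comm as comm

if torch.cuda.device_count() >= 2:
    x = torch.randn(8, 4)
    chunks = comm.scatter(x, devices=[0, 1], dim=0)  # split along dim 0, one chunk per GPU
    y = comm.gather(chunks, dim=0, destination=-1)   # -1: gather the chunks onto the CPU
```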
apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117
Differential Revision: D8721729
Pulled By: goldsborough
fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code in place. Subsequent PRs will remove the dead code, clean up calls
to torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
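A small illustration of the post-merge behavior described above (not part of the original patch): factory functions return autograd-aware tensors directly.

```python
import torch

# torch.randn now returns a Variable/Tensor, so autograd can be used on
# its result directly, without wrapping it in torch.autograd.Variable.
x = torch.randn(3, requires_grad=True)
y = (2 * x).sum()
y.backward()
print(x.grad)   # tensor([2., 2., 2.])
```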
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
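A sketch of the new `forward` behavior, written in the current static-method style (an assumption; the code of that era used the older instance-method form):

```python
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # forward is called with grad recording disabled (as if inside
        # torch.no_grad()), so the ops here do not build an autograd graph.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output

x = torch.randn(4, requires_grad=True)
Square.apply(x).sum().backward()   # uses the custom backward above
print(x.grad)
```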
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
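A minimal before/after illustration (assuming a CUDA build):

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(4).pin_memory()
    # `async` is a reserved keyword as of Python 3.7, so the flag is now
    # spelled `non_blocking`:
    y = x.cuda(non_blocking=True)   # previously: x.cuda(async=True)
```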
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything it can handle safely, and to disable any rules
that are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
See issue #20
The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples, so that torch.Tensor(size) is interpreted as a size instead
of as data.
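A minimal illustration of the distinction:

```python
import torch

size = torch.Size([2, 3])
print(isinstance(size, tuple))        # True -- torch.Size subclasses tuple

a = torch.Tensor(torch.Size([2, 3]))  # interpreted as a size: an uninitialized 2x3 tensor
b = torch.Tensor([2, 3])              # interpreted as data: a 1-D tensor holding 2.0 and 3.0
print(a.shape, b.shape)               # torch.Size([2, 3]) torch.Size([2])
```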