Fixes #102441
Improves the type hinting of the `module` attribute, since its type can easily be bound in `DataParallel.__init__`:
```python
from torch.nn import DataParallel, Module

class MyModule(Module):
    ...

my_data_parallel = DataParallel(MyModule(), device_ids=[0, 1, 2])
reveal_type(my_data_parallel) # Type of "my_data_parallel" is "DataParallel[MyModule]"
reveal_type(my_data_parallel.module) # Type of "my_data_parallel.module" is "MyModule"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102455
Approved by: https://github.com/Skylion007
Fixes #91648
As explained in the tracking issue, the incomplete type stubs in `torch/nn/parallel` mask `DataParallel` methods that are relevant for subclassing, and they also hide type issues present in the code itself.
One notable change here is the addition of [`allow_redefinition = True`](https://mypy.readthedocs.io/en/stable/config_file.html#confval-allow_redefinition) in `mypy.ini`, which allows for a common pattern:
> Allows variables to be redefined with an arbitrary type, as long as the redefinition is in the same block and nesting level as the original definition.
This is added specifically to allow for the type narrowing of `device_ids` in `torch.nn.parallel.data_parallel.data_parallel` from `Sequence[Union[int, torch.device]]` to `Sequence[int]`.
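For illustration, a minimal sketch of the pattern this enables (hypothetical helper name and body, not the actual code in `data_parallel.py`):
```python
from typing import Sequence, Union

import torch

def _to_indices(device_ids: Sequence[Union[int, torch.device]]) -> Sequence[int]:
    # With allow_redefinition, mypy accepts re-binding `device_ids` to the
    # narrower Sequence[int] type in the same block and nesting level.
    device_ids = [d.index if isinstance(d, torch.device) else d for d in device_ids]
    return device_ids
```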
Other than this, there are various renamings and `type: ignore` comments added to bypass errors that arose from merging the stubs into the source files.
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101528
Approved by: https://github.com/ezyang
Summary:
These unused variables were identified by [pyflakes](https://pypi.org/project/pyflakes/). They can be safely removed to simplify the code and possibly improve performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50100
Reviewed By: ezyang
Differential Revision: D25797764
Pulled By: smessmer
fbshipit-source-id: ced341aee692f429d2dcc3a4ef5c46c8ee99cabb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40902
See the bottom of this stack for context.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22360210
Pulled By: suo
fbshipit-source-id: 4275127173a36982ce9ad357aa344435b98e1faf
Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move `torch/cuda/comm.py` to `torch/nn/parallel/comm.py` with minor changes to support common device types. `torch.cuda.comm` is kept as is for backward compatibility.
- Provide common APIs for arbitrary device types without changing the existing CUDA APIs in the `torch.cuda` namespace.
- Replace the `torch.cuda` calls in DataParallel/DistributedDataParallel with the new APIs.
Related RFC: [https://github.com/pytorch/pytorch/issues/36160](https://github.com/pytorch/pytorch/issues/36160)
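A rough usage sketch of the relocated helpers (assuming at least two CUDA devices are available):
```python
import torch
from torch.nn.parallel import comm

# Broadcast a tensor from cuda:0 to devices 0 and 1; one copy is returned per
# destination device. torch.cuda.comm.broadcast keeps working as before.
t = torch.randn(4, 4, device="cuda:0")
copies = comm.broadcast(t, devices=[0, 1])
print([c.device for c in copies])  # [cuda:0, cuda:1]
```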
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454
Differential Revision: D22051557
Pulled By: mrshenli
fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
Summary:
In DataParallel, replica parameters are not leaves (because they are computed via broadcast from master parameters), and should be treated as such. Fixes https://github.com/pytorch/pytorch/issues/33552
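A minimal sketch of the distinction described above (assuming at least two CUDA devices; `replicate` is the helper `DataParallel` uses internally to build replicas):
```python
import torch
from torch.nn.parallel import replicate

master = torch.nn.Linear(4, 4).cuda()
replicas = replicate(master, devices=[0, 1])

print(next(master.parameters()).is_leaf)       # True: the master parameter is a leaf
print(next(replicas[1].parameters()).is_leaf)  # False: computed via broadcast from the master
```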
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33907
Differential Revision: D20150199
Pulled By: ngimel
fbshipit-source-id: 5965d4115b6b3a8433063126ff6269567872fbeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29499
This changes how DataParallel and traced module creation work so that
we no longer need to mutate the Module class after it has been created.
The only remaining usages of the `register_*` functions are now inside C++
tests.
Test Plan: Imported from OSS
Differential Revision: D18413652
Pulled By: zdevito
fbshipit-source-id: f039e5400cd016632768be4547892f6a69645c20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28828
This updates torch::script::Module to more closely match the behavior
of nn.Module. In particular, it implements the (optionally recursive)
iterators that retrieve submodules, parameters, and buffers and makes
their names match the python versions.
This also removes the individual accessors for Parameter, Module, Buffer, etc.
and replaces them with a single `attr` function which is equivalent to
writing `a.foo` in Python (`setattr` emulates `a.foo = v`).
As we build out the user-facing API for TorchScript values this will end
up matching how an attribute is accessed on general objects.
This PR preserves the Python bindings for script::Module by emulating the
old API at the binding level. A followup will clean up the usage to more
directly match the C++ API.
Test Plan: Imported from OSS
Differential Revision: D18197611
Pulled By: zdevito
fbshipit-source-id: 7ee4dcbb258605d1c988314b05d938423f1ccee5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26666
Changes:
- Introduce a `ConcreteModuleType` concept. This acts both as the key into the type
cache, and as the source of truth for `ModuleValue::attr` queries. It needs
to do both jobs because that's how we ensure correctness (if the types are
different, it's because `ModuleValue::attr` would return different things).
- Now `recursive_script` will first construct a `ConcreteModuleType` and search for a
pre-existing type before starting compilation.
- All previous paths to creating a `ScriptModule` (including inheriting from
`ScriptModule`) are now rewritten to go through `create_script_module`, so
that we have only a single place where construction happens.
Behavioral changes:
- Big change to `torch.jit.ScriptModule` inheritance: all attributes are now
recursively scripted if possible, matching recursive scripting semantics.
This makes it hard to keep something from being scripted (for example, a
Python submodule). Possibly we'll need an `ignore()` type thing for
attributes. In particular, this adds `self.training` to *every* ScriptModule, since
it's present on every `nn.Module`.
- I believe this change to be transparent to existing users of the inheritance API: if you had an unscriptable attribute that you never used, there is no error. In some cases we will create new attributes (even if they are unused), which will increase serialized model size compared to before.
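A minimal sketch of the new inheritance behavior (hypothetical module):
```python
import torch

class MyScriptedModule(torch.jit.ScriptModule):
    def __init__(self):
        super().__init__()
        # Submodules and attributes are now recursively scripted where possible.
        self.linear = torch.nn.Linear(4, 4)

    @torch.jit.script_method
    def forward(self, x):
        return self.linear(x)

m = MyScriptedModule()
print(m.training)  # True: `training` now exists on every ScriptModule
```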
Test Plan: Imported from OSS
Differential Revision: D17551196
Pulled By: suo
fbshipit-source-id: b476d1c9feb3ddfd63406d90989aaf9dfe890591
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27399
This was devised in a time when we didn't have module attributes. They
are essentially just tensor lists, so represent them that way. This has
the additional benefit of making the RNN forward pass faster because we
effectively cache the flattened weights.
The only complicated part is that someone may come along and do:
```python
my_rnn_mod.w_ih_l0 = torch.nn.Parameter(...)
```
This means we need to override setattr to keep the flattened weights
cache up to date.
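A rough sketch of that pattern (hypothetical module, not the actual RNN code):
```python
import torch

class CachedWeights(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w_ih_l0 = torch.nn.Parameter(torch.randn(4, 4))

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Rebuild the flattened-weights cache whenever the tracked parameter is
        # reassigned, so the cached list never goes stale.
        if name == "w_ih_l0":
            object.__setattr__(self, "_flat_weights", [value])

mod = CachedWeights()
mod.w_ih_l0 = torch.nn.Parameter(torch.zeros(4, 4))
print(mod._flat_weights[0] is mod.w_ih_l0)  # True: cache stays in sync
```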
Test Plan: Imported from OSS
Differential Revision: D17785658
Pulled By: suo
fbshipit-source-id: 7789cd1d0d4922bfd5eba1716976442fbf150766
Summary:
* Deletes all weak script decorators / associated data structures / methods
* In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
* Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`; only `rnn.py` needed manual editing to use `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand
This should also fix https://github.com/pytorch/pytorch/issues/22212
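A minimal sketch of the on-demand compilation this enables (hypothetical module):
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # No __constants__ entry is needed: Sequential is compiled on demand.
        self.layers = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.layers(x)

scripted = torch.jit.script(Net())
print(scripted(torch.randn(1, 4)).shape)  # torch.Size([1, 2])
```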
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212
Differential Revision: D15988346
Pulled By: driazati
fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19587 [jit] Make ScriptModule.training an attribute instead of a parameter**
Remove the hack we had previously where `training` was a buffer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19587
Differential Revision: D15502768
Pulled By: driazati
fbshipit-source-id: 3022f2d57ec6849868f9225d9bc2bfb7828cb318
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See #18591
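A hedged sketch of the 2D form (assuming this refers to `torch.nn.parallel.replicate` as of this change, and that four CUDA devices are available; the two-device module below is purely illustrative):
```python
import torch
from torch.nn.parallel import replicate

class TwoDeviceNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Linear(4, 4).to("cuda:0")
        self.second = torch.nn.Linear(4, 4).to("cuda:1")

    def forward(self, x):
        return self.second(self.first(x).to("cuda:1"))

net = TwoDeviceNet()
# devices[0] matches the network's own devices; each following row gives the
# devices to place one replica on.
replicas = replicate(net, devices=[[0, 1], [2, 3]])
```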
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687
Differential Revision: D14706162
Pulled By: mrshenli
fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42
Summary:
Similar to `nn.Parameter`s, this PR lets you store any `IValue` as an attribute on a `ScriptModule` (only from the Python front-end currently). To mark something as an attribute, it should be wrapped in `jit.Attribute(value, type)` (ex. `self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])`).
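A minimal runnable sketch along the lines of the example above (hypothetical module and table contents):
```python
from typing import Dict

import torch

class Lookup(torch.jit.ScriptModule):
    def __init__(self):
        super().__init__()
        table = {"a": torch.ones(2), "b": torch.zeros(2)}
        # Wrap the value in jit.Attribute so it is stored as a typed attribute.
        self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])

    @torch.jit.script_method
    def forward(self, key: str):
        return self.table[key]

print(Lookup()("a"))  # tensor([1., 1.])
```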
Followup Work:
* (de)serializing for use in C++
* change `self.training` to be a `bool` attribute instead of a buffer
* mutable attributes
* string frontend support
* documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17309
Differential Revision: D14354316
Pulled By: driazati
fbshipit-source-id: 67e08ab5229366b67fbc837e67b58831a4fb3318
Summary:
This was causing a problem with spectral norm, although spectral norm won't use that anymore after #13350.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13352
Differential Revision: D14209562
Pulled By: ezyang
fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
Summary:
Support data parallel for ScriptModule.
See the unit tests for the testing done in this PR. I also tried a traced version of resnet18 from torchvision.
I have yet to try complete end-to-end data parallel training; that will be a next step.
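A rough sketch of the kind of usage this enables (assuming two CUDA devices and that `torchvision` is installed):
```python
import torch
import torchvision

model = torchvision.models.resnet18().cuda().eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224, device="cuda"))

inputs = torch.randn(8, 3, 224, 224, device="cuda")
out = torch.nn.parallel.data_parallel(traced, inputs, device_ids=[0, 1])
print(out.shape)  # torch.Size([8, 1000])
```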
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16891
Differential Revision: D14002222
Pulled By: gqchen
fbshipit-source-id: fce3598169113215599815c6978e66d3c3a8c282
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
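A minimal sketch of the new accessors on a module that registers buffers:
```python
import torch

bn = torch.nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))  # running_mean, running_var, num_batches_tracked
print(sum(1 for _ in bn.buffers()))  # 3
```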
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554
Reviewed By: SsnL
Differential Revision: D9367762
Pulled By: jma127
fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
- `modules()`: returns an iterator over all modules in the network
- `children()`: returns an iterator over immediate children
Also fix `__getitem__` in `Sequential`.
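A minimal sketch of the difference between the two iterators:
```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 2), nn.Sequential(nn.ReLU(), nn.Linear(2, 1)))
print(len(list(net.children())))  # 2: the Linear and the inner Sequential
print(len(list(net.modules())))   # 5: net itself plus every nested module
```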