24 Commits

Author SHA1 Message Date
d795fb225a [RFC] Add pyrefly to lintrunner (#165179)
This will add pyrefly to lintrunner as a warning only, allowing us to collect feedback about the tool before switching to pyrefly as the main type checker.

References the steps outlined here: https://github.com/pytorch/pytorch/issues/163283

test plan:
`lintrunner init`
`lintrunner`
Confirm that, when pyrefly errors are present, the results look like: https://gist.github.com/maggiemoss/e6cb2d015dd1ded560ae1329098cf33f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165179
Approved by: https://github.com/ezyang
2025-10-16 20:07:09 +00:00
9944cac6e6 Add suppressions to torch/_inductor (#165062)
Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Split this directory into two PRs to keep them from being too large.

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165062
Approved by: https://github.com/oulgen, https://github.com/mlazos
2025-10-09 20:34:20 +00:00
c855f8632e Pyrefly suppressions 7/n (#164913)
Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
 INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913
Approved by: https://github.com/oulgen
2025-10-08 07:27:17 +00:00
5880996b4c Expose torch.nn.utils.parametrize (#163835)
`torch.nn.utils.parametrize` is not imported in `torch/nn/utils/__init__.py`, so it is not exposed, which makes it hard for code editors to statically analyze the code and provide auto-completion based on the function signature.
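
As a rough illustration of why the import in `__init__.py` matters, the sketch below builds two throwaway packages (the names `utils_before` and `utils_after` and the stub function body are hypothetical, not taken from PyTorch). It only shows the general Python behaviour: a submodule becomes an attribute of its package once something imports it.

```python
import importlib
import os
import sys
import tempfile


def build_package(root, name, init_source):
    """Create a throwaway package containing one submodule called `parametrize`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, "parametrize.py"), "w") as f:
        # Stub standing in for the real module contents (an assumption).
        f.write("def register_parametrization():\n    return 'ok'\n")


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Package whose __init__.py does not import the submodule.
build_package(root, "utils_before", "")
# Package whose __init__.py imports the submodule explicitly.
build_package(root, "utils_after", "from . import parametrize\n")
importlib.invalidate_caches()

before = importlib.import_module("utils_before")
after = importlib.import_module("utils_after")

hidden = hasattr(before, "parametrize")   # submodule never imported
exposed = hasattr(after, "parametrize")   # submodule bound by __init__.py
print(hidden, exposed)
```

Static analyzers generally rely on the same explicit import to resolve `package.submodule`, which is presumably why the editor screenshots differ before and after the fix.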

<img width="615" height="292" alt="Screenshot 2025-09-25 at 12 01 52 PM" src="https://github.com/user-attachments/assets/a276f6f0-87f3-4732-943d-2a92ea871974" />

after the fix:

<img width="964" height="393" alt="Screenshot 2025-09-25 at 12 02 16 PM" src="https://github.com/user-attachments/assets/ca47f09e-dc4e-4420-a2d2-11669e07471a" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163835
Approved by: https://github.com/albanD
2025-09-26 16:38:18 +00:00
2ee91db03d Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662)
Fixes https://github.com/pytorch/pytorch/issues/139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` followed by `nn.utils.clip_grads_with_norm_`. `clip_grad_norm_` now calls into these two new ops.

The name `get_total_norm` is deliberately general (rather than `get_grad_norm`), following the discussion on the issue with @awgu.
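
The split can be sketched in plain Python on lists of floats. The function names mirror the new APIs, but the bodies are simplified stand-ins rather than the PyTorch implementation; in particular, the `1e-6` epsilon and the cap of the coefficient at 1 are assumptions about how the scaling step behaves.

```python
import math


def get_total_norm(grads, norm_type=2.0):
    """Return the p-norm taken over every value in a list of gradient lists."""
    flat = [abs(g) for grad in grads for g in grad]
    if math.isinf(norm_type):
        return max(flat, default=0.0)
    return sum(g ** norm_type for g in flat) ** (1.0 / norm_type)


def clip_grads_with_norm_(grads, max_norm, total_norm):
    """Scale gradients in place using a precomputed total norm."""
    # Epsilon avoids division by zero; capping at 1 leaves small gradients untouched.
    clip_coef = min(max_norm / (total_norm + 1e-6), 1.0)
    for grad in grads:
        for i in range(len(grad)):
            grad[i] *= clip_coef


def clip_grad_norm_(grads, max_norm, norm_type=2.0):
    """The combined helper simply chains the two steps."""
    total_norm = get_total_norm(grads, norm_type)
    clip_grads_with_norm_(grads, max_norm, total_norm)
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]
total = clip_grad_norm_(grads, max_norm=1.0)
print(total, get_total_norm(grads))
```

Separating the steps lets a caller compute the norm once (for logging, or over tensors that are not gradients) and decide independently whether to scale.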

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139662
Approved by: https://github.com/H-Huang
2024-11-07 23:13:23 +00:00
5a80d2df84 [BE] enable UFMT for torch/nn/utils (#128595)
Part of #123062

- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128595
Approved by: https://github.com/Skylion007
2024-06-13 18:34:57 +00:00
2d2016fdf8 WIP Add compatibility with channels_last_3d for conv3d (#114790)
Part of a multi-PR work to fix #59168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114790
Approved by: https://github.com/albanD
2023-12-20 19:28:25 +00:00
ad4472833c define public API for torch.nn.utils (#111026)
Adding modules imported here and the following functions to the `__all__`:
* [clip_grad_norm_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html)
* [clip_grad_value_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_value_.html)
* [remove_weight_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_weight_norm.html)
* [parameters_to_vector](https://pytorch.org/docs/stable/generated/torch.nn.utils.parameters_to_vector.html)
* [vector_to_parameters](https://pytorch.org/docs/stable/generated/torch.nn.utils.vector_to_parameters.html)
* [remove_spectral_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_spectral_norm.html)
* [skip_init](https://pytorch.org/docs/stable/generated/torch.nn.utils.skip_init.html)
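
For readers unfamiliar with `__all__`, here is a minimal sketch of its effect on `from module import *`. The module name `demo_utils` and the `helper` function are hypothetical; only the mechanism is the point.

```python
import sys
import types

source = '''
__all__ = ["clip_grad_norm_"]

def clip_grad_norm_():
    return "public"

def helper():
    return "not exported"
'''

# Build an in-memory module and register it so that it can be imported.
demo = types.ModuleType("demo_utils")
exec(source, demo.__dict__)
sys.modules["demo_utils"] = demo

# A star import copies only the names listed in __all__.
namespace = {}
exec("from demo_utils import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Without `__all__`, the star import would also pull in `helper`, so listing names explicitly is what defines the public surface.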
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111026
Approved by: https://github.com/mikaylagawarecki
2023-10-12 23:05:23 +00:00
a6a5e6cecf move the stateless util to public API!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75834
Approved by: https://github.com/zou3519, https://github.com/jbschlosser
2022-04-21 13:42:24 +00:00
c58709b7bb Helper function for skipping module parameter / buffer initialization (#57555)
Summary:
This PR introduces a helper function named `torch.nn.utils.skip_init()` that accepts a module class object + `args` / `kwargs` and instantiates the module while skipping initialization of parameter / buffer values. See discussion at https://github.com/pytorch/pytorch/issues/29523 for more context. Example usage:

```python
import torch

m = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1)
print(m.weight)

m2 = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1, device='cuda')
print(m2.weight)

m3 = torch.nn.utils.skip_init(torch.nn.Linear, in_features=5, out_features=1)
print(m3.weight)
```
```
Parameter containing:
tensor([[-3.3011e+28,  4.5915e-41, -3.3009e+28,  4.5915e-41,  0.0000e+00]],
       requires_grad=True)
Parameter containing:
tensor([[-2.5339e+27,  4.5915e-41, -2.5367e+27,  4.5915e-41,  0.0000e+00]],
       device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]],
       requires_grad=True)
```

Bikeshedding on the name / namespace is welcome, as well as comments on the design itself - just wanted to get something out there for discussion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57555

Reviewed By: zou3519

Differential Revision: D28640613

Pulled By: jbschlosser

fbshipit-source-id: 5654f2e5af5530425ab7a9e357b6ba0d807e967f
2021-05-24 11:28:32 -07:00
e1bb9d2d99 Reimplement spectral_norm using new parametrization functionality (#57784)
Summary:
Adds a new file under `torch/nn/utils/parametrizations.py` which should contain all the parametrization implementations

For spectral_norm we add the `SpectralNorm` module which can be registered using `torch.nn.utils.parametrize.register_parametrization` or using a wrapper: `spectral_norm`, the same API the old implementation provided.

Most of the logic is borrowed from the old implementation:
 - Just like the old implementation, there should be cases when retrieving the weight should perform another power iteration (thus updating the weight) and cases where it shouldn't. For example, in eval mode (`self.training=False`), we do not perform power iteration.

There are also some differences/difficulties with the new implementation:
 - Using the new parametrization functionality as-is, there doesn't seem to be a good way to tell whether a 'forward' call was triggered because parametrizations were being unregistered (with leave_parametrizations=True) or because the injected property's getter was invoked. The issue is that we want to perform power iteration in the latter case but not the former, but we don't have this control as-is. So, in this PR I modified the parametrization functionality to change the module to eval mode before triggering their forward call
 - Updates the vectors based on the weight on initialization to fix https://github.com/pytorch/pytorch/issues/51800 (this avoids silently updating weights in eval mode). This also means that we perform twice as many power iterations by the first forward.
 - right_inverse is just the identity for now, but maybe it should assert that the passed value already satisfies the constraints
 - So far, all the old spectral_norm tests have been cloned, but maybe we don't need so much testing now that the core functionality is already well tested
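
The training/eval behaviour described in the list above can be sketched with a pure-Python power iteration. `SpectralNormSketch` is a hypothetical stand-in, not the `SpectralNorm` module from this PR, and the warm-up count of 15 is an arbitrary choice for the example.

```python
import math


def normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def matvec(matrix, vec):
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


class SpectralNormSketch:
    def __init__(self, weight, n_power_iterations=1):
        self.training = True
        self.n_power_iterations = n_power_iterations
        self.u = normalize([1.0] * len(weight))
        self.v = normalize([1.0] * len(weight[0]))
        # Warm up u and v at construction so eval mode never has to update them.
        self._power_iterate(weight, 15)

    def _power_iterate(self, weight, steps):
        weight_t = transpose(weight)
        for _ in range(steps):
            self.v = normalize(matvec(weight_t, self.u))
            self.u = normalize(matvec(weight, self.v))

    def forward(self, weight):
        # Only training mode refines the singular-vector estimates.
        if self.training:
            self._power_iterate(weight, self.n_power_iterations)
        # sigma = u^T W v approximates the largest singular value.
        sigma = sum(u * wv for u, wv in zip(self.u, matvec(weight, self.v)))
        return [[w / sigma for w in row] for row in weight]


weight = [[3.0, 0.0], [0.0, 1.0]]
sn = SpectralNormSketch(weight)
sn.training = False
u_before = list(sn.u)
normalized = sn.forward(weight)
print(normalized[0][0])
```

In eval mode the stored vectors are read but never written, which is the property the fix for issue 51800 is after.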

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57784

Reviewed By: ejguan

Differential Revision: D28413201

Pulled By: soulitzer

fbshipit-source-id: e8f1140f7924ca43ae4244c98b152c3c554668f2
2021-05-13 14:16:13 -07:00
Jie
9e7c47644f [NHWC CUDNN CONV]Update cudnn convolution memory_format behavior (#32482)
Summary:
1. Allows both the memory_format of weight & input to dictate the output
memory_format.
2. Provides a utility function to recursively convert the memory_format of Conv2d
and ConvTranspose2d layers. This allows easy model conversion and ensures that a
memory_format lost through incompatible layers can be restored at convolution-like
layers, where a significant performance boost is expected on later-generation CUDA
devices.
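
As background on what a memory_format changes, the sketch below compares element strides for the default NCHW layout and the channels-last (NHWC) layout of a 4-D tensor. The helper names are hypothetical and nothing here calls into PyTorch; it only shows that the logical shape stays `(N, C, H, W)` while the storage order differs.

```python
def contiguous_strides(n, c, h, w):
    """Strides (in elements) for the default NCHW storage order."""
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n, c, h, w):
    """Strides for NHWC storage with the logical shape still (N, C, H, W)."""
    return (h * w * c, 1, w * c, c)


def offset(index, strides):
    """Position in flat storage of the element at a logical index."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4, 5)
nchw = contiguous_strides(*shape)
nhwc = channels_last_strides(*shape)

# Moving one step along the channel dimension:
step_nchw = offset((0, 1, 0, 0), nchw)  # jumps over a whole H*W plane
step_nhwc = offset((0, 1, 0, 0), nhwc)  # lands on the adjacent element
print(nchw, nhwc, step_nchw, step_nhwc)
```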
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32482

Differential Revision: D19647903

Pulled By: VitalyFedyunin

fbshipit-source-id: 62c96ff6208ff5e84fae1f55b63af9a010ad199a
2020-02-04 09:50:57 -08:00
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` without putting `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
761ae8e9b6 Add intrinsic module mappings (#23753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753

Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules
for both QAT and post-training quantization (PTQ)

Differential Revision: D16820749

fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
2019-08-15 09:37:24 -07:00
058645acb1 Fusion and _intrinsic modules (#23003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003

torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu

Fusion function to combine specific modules: (conv, bn) and (conv, bn, relu).
In all cases, replace modules in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Support both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
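
For the eval case, conv + bn fusion is commonly done by folding the batch-norm statistics into the convolution's weight and bias. The scalar sketch below shows that standard arithmetic; it is an assumption that this commit follows the same formula, and the function names are hypothetical.

```python
import math


def batch_norm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch norm applied to a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn_into_conv(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Return a (weight, bias) pair whose affine map equals bn(conv(x))."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# A 1x1 "convolution" on one channel is just weight * x + bias.
weight, bias = 2.0, 0.5
mean, var, gamma, beta = 0.3, 4.0, 1.5, -0.2
fused_w, fused_b = fold_bn_into_conv(weight, bias, mean, var, gamma, beta)

x = 1.25
separate = batch_norm(weight * x + bias, mean, var, gamma, beta)
fused = fused_w * x + fused_b
print(separate, fused)
```

In training mode the statistics still change every step, which is consistent with the commit keeping the modules in a sequential container there instead of folding them.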

TODO: Add tests for _intrinsic modules.

Conv BN fusion code is based on DsKhudia's implementation

Differential Revision: D16199720

fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
2019-07-23 14:54:19 -07:00
79533ef097 convert_sync_batch_norm to SyncBatchNorm (#18787)
Summary:
Closes #18382

Please let me know if any changes are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787

Differential Revision: D14821147

Pulled By: soumith

fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d
2019-04-07 00:13:02 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
39669316a6 (#14267)
Summary:
- Summary:

Added synchronized batch normalization, which allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)
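
To make "synchronization of stats" concrete, the sketch below combines per-process batch statistics into the statistics of the joint batch. It shows only the arithmetic in plain Python; the actual implementation uses ATen native functions and c10d as stated above, and the helper names are hypothetical.

```python
def local_stats(values):
    """Count, mean and biased variance of one process's mini-batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return n, mean, var


def combine_stats(per_process):
    """Merge (count, mean, var) triples into statistics of the joint batch."""
    total = sum(n for n, _, _ in per_process)
    mean = sum(n * m for n, m, _ in per_process) / total
    # Each process contributes E[x^2] = var + mean^2, weighted by its count.
    second_moment = sum(n * (v + m * m) for n, m, v in per_process) / total
    return total, mean, second_moment - mean * mean


proc0 = [1.0, 2.0, 3.0]
proc1 = [4.0, 5.0, 6.0, 7.0]
synced = combine_stats([local_stats(proc0), local_stats(proc1)])
reference = local_stats(proc0 + proc1)
print(synced, reference)
```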

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     similar API as with torch.nn.BatchNormXd(...)
     with added argument `process_group` which is used to limit the scope of
     synchronization within each process group. Default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
     preserving values of parameters/buffers.
     the utility function also allows user to specify process_group value to all
     converted layers.

b. User wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point on, the user
   should follow the general DDP usage guidelines

- Error checking

For use cases not supported, we error out:

1. Application launched without ddp:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp) --> Error!
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multi-GPU per-process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
2019-03-06 13:39:11 -08:00
ba046331e8 add spectral normalization [pytorch] (#6929)
* initial commit for spectral norm

* fix comment

* edit rst

* fix doc

* remove redundant empty line

* fix nit mistakes in doc

* replace l2normalize with F.normalize

* fix chained `by`

* fix docs

fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific

* fix typo
2018-05-01 17:00:30 +08:00
7fcaf3b49e Update torch.nn.init and torch.nn.utils.clip_grad (#6173)
Introducing two updates.

1. Add param to He initialization scheme in torch.nn.init
Problem solved:
The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.

2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved:
DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, the utils clip_grad.py only had a function to clip the gradient norm.
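
Clipping by value, as opposed to clipping by norm, can be sketched in a few lines of plain Python. The function below operates on lists of floats and is a simplified stand-in for the new utility, not its actual code.

```python
def clip_grad_value_(grads, clip_value):
    """Clamp every gradient entry in place to [-clip_value, clip_value]."""
    for grad in grads:
        for i, g in enumerate(grad):
            grad[i] = max(-clip_value, min(clip_value, g))


grads = [[0.5, -3.0], [10.0, -0.1]]
clip_grad_value_(grads, clip_value=1.0)
print(grads)
```

Unlike norm clipping, each entry is limited independently, so the direction of the overall gradient can change.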

* add param to He initialization scheme in torch.nn.init

* add util to clip gradient value in torch/nn/utils/clip_grad.py

* update doc in torch.nn.utils.clip_grad

* update and add test for torch.nn.utils.clip_grad

* update function signature in torch.nn.utils.clip_grad to match suffix_ convention

* ensure backward compatibility in torch.nn.utils.clip_grad

* remove DeprecationWarning in torch.nn.utils.clip_grad

* extend test and implementation of torch.nn.utils.clip_grad

* update test and implementation torch.nn.utils.clip_grad
2018-04-17 11:32:32 -04:00
eff5b8b09c parameters to vector and vector to parameters (#2795)
2017-09-23 13:06:40 -04:00
2c038f2074 Add weight normalization implementation (#1945)
* Add weight normalization implementation

This adds forward "pre-hooks" which get called before the module's
forward() method. Weight norm is implemented as a hook which calculates
the weight variable from the weight_g and weight_v every iteration.

Based on @rtqichen implementation.

* Specify return type
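
The pre-hook mechanism described in this commit can be sketched without PyTorch. The `Layer` class and its hook list are hypothetical stand-ins for a module; the hook recomputes `weight = weight_g * weight_v / ||weight_v||` before every forward call.

```python
import math


class Layer:
    """Toy module: a dot product whose weight is rebuilt by pre-hooks."""

    def __init__(self, weight_v, weight_g):
        self.weight_v = weight_v   # direction parameter
        self.weight_g = weight_g   # magnitude parameter
        self.weight = None         # filled in by the hook
        self._forward_pre_hooks = []

    def __call__(self, x):
        # Pre-hooks run before forward(), as described in the commit message.
        for hook in self._forward_pre_hooks:
            hook(self, x)
        return self.forward(x)

    def forward(self, x):
        return sum(w * xi for w, xi in zip(self.weight, x))


def weight_norm_hook(module, inputs):
    """Recompute the weight from its magnitude and direction."""
    norm = math.sqrt(sum(v * v for v in module.weight_v))
    module.weight = [module.weight_g * v / norm for v in module.weight_v]


layer = Layer(weight_v=[3.0, 4.0], weight_g=2.0)
layer._forward_pre_hooks.append(weight_norm_hook)
out = layer([1.0, 1.0])
weight_norm = math.sqrt(sum(w * w for w in layer.weight))
print(out, weight_norm)
```

Because the weight is rebuilt on every call, updates to `weight_g` or `weight_v` made by an optimizer take effect on the next forward pass.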
2017-06-30 15:41:40 -04:00
da725830c2 Add support for variable length sequences in RNNs (#873)
2017-03-01 17:36:32 +01:00
c2c1710047 Add clip_grad_norm
2017-02-20 23:28:31 -08:00