Make variables in dict lazy and remove DICT_KEYS guard.
We build the keys of a dict depth-first and rely on the guards installed for each element of the dict to produce the correct guarding behavior. This allows us to remove the rather buggy DICT_KEYS guard and makes the guards lazy.
The guards are not completely lazy yet, as we instantiate them in
`_HashableTracker._eq_impl`, but it should be possible to make them
truly lazy.
Also, adding new types to the set of supported key types should now be less
error-prone.
This is marginally less efficient when we graph break, but in turn we
should graph break much less. It also makes the dict handling code easier to maintain
(it removes `is_hashable_python_var`).
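Conceptually (a very loose, standalone illustration, not Dynamo's actual classes or guard machinery), the idea is that each dict key wrapper records its own guard when it is hashed or compared, rather than one monolithic DICT_KEYS guard covering the whole dict:

```python
class HashableTracker:
    """Loose stand-in for Dynamo's _HashableTracker: wraps one dict key so that
    hashing and equality go through the underlying value, and each key records
    its own guard instead of relying on a single DICT_KEYS guard for the dict."""

    def __init__(self, value, install_guard):
        self.value = value
        self._install_guard = install_guard  # callable that records a guard for this key

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        # Guards are currently created at comparison time (mirroring _eq_impl),
        # which is why they are not yet fully lazy.
        self._install_guard(self.value)
        return isinstance(other, HashableTracker) and self.value == other.value
```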
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117625
Approved by: https://github.com/jansel, https://github.com/peterbell10, https://github.com/anijain2305
ghstack dependencies: #117982, #118098, #117983
Co-authored-by: Ashkan <yousefpour@fb.com>
Adds "same" and "valid" padding support, as Opacus (well @ashkan-software) did https://github.com/pytorch/opacus/pull/451
Basics of it are this:
- during the forward pass, if the padding is "same", we manually pad the input (NB: this causes a small perf hit; we haven't benchmarked it yet)
- during the backward pass, the gradient wrt the input needs to be cut down to the original size if the original padding was "same" (conv_transpose doesn't accept string padding). Because conv_transpose gives us a gradient wrt the padded shape, we trim the gradient using the known left and right padding amounts (see the sketch after this list)
- then, for the per-sample gradients wrt the weights, the input is already padded, so neither the unfold nor the group convolution uses any padding
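For reference, here is a minimal 1D sketch (illustrative names only, not the PR's actual code) of computing "same" padding manually, applying it before the convolution, and how the corresponding input gradient would be trimmed back down:

```python
import torch
import torch.nn.functional as F

def _same_pad_amounts(in_size, kernel_size, stride=1, dilation=1):
    # Padding needed so the output length is ceil(in_size / stride) (TF-style "same").
    effective_k = dilation * (kernel_size - 1) + 1
    out_size = -(-in_size // stride)  # ceil division
    total = max(0, (out_size - 1) * stride + effective_k - in_size)
    left = total // 2
    return left, total - left

def conv1d_same(x, weight, bias=None, stride=1, dilation=1):
    # Forward: pad manually, then run the convolution with padding=0.
    left, right = _same_pad_amounts(x.shape[-1], weight.shape[-1], stride, dilation)
    x_padded = F.pad(x, (left, right))
    out = F.conv1d(x_padded, weight, bias, stride=stride, padding=0, dilation=dilation)
    # Backward: a gradient computed wrt x_padded has the padded length; slicing off
    # `left` and `right` recovers the gradient wrt the original input, e.g.
    #   grad_input = grad_input_padded[..., left:grad_input_padded.shape[-1] - right]
    return out
```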
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83345
Approved by: https://github.com/zou3519
Opacus has been asking for the ability to not specify a batch size. Previously a user had to do
`call_for_per_sample_grads(module, batch_size)(*args, **kwargs)`
They rightfully pointed out that when you're passing a single argument to a module's forward function, specifying the batch_size feels repetitive. The original argument for requiring it was that, when a user passes more than one argument, we might not know what the batch size is if the arguments don't match. So this lets a user omit the batch size (or pass it as None), meaning that
`call_for_per_sample_grads(linear_module)(torch.randn(5, 4))`
now works and infers a batch size of 5.
If there are multiple tensor arguments with different batch sizes, we fail, even if one of the inputs wouldn't have been used by the module, because we can't tell which batch size we should be using.
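As a rough illustration of that inference rule (a hedged sketch; the helper name and error message here are made up, not the PR's code):

```python
import torch

def _infer_batch_size(*args, **kwargs):
    # Take dim 0 of every tensor argument and require all of them to agree.
    sizes = {a.shape[0] for a in (*args, *kwargs.values()) if isinstance(a, torch.Tensor)}
    if len(sizes) != 1:
        raise RuntimeError(
            "When batch_size is not given, all tensor arguments must have the same "
            f"size in dimension 0; got sizes {sorted(sizes)}"
        )
    return sizes.pop()

# _infer_batch_size(torch.randn(5, 4))                    -> 5
# _infer_batch_size(torch.randn(5, 4), torch.randn(3, 4)) -> error
```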
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80944
Approved by: https://github.com/zou3519
Conv3d has an ordering typo in its unfold (it needs a custom unfold since unfold3d doesn't exist in torch). This was caught by Opacus but not by us, because dilation, padding, and stride always matched in our test cases. Conv3d has very few test cases since it doesn't have an OpInfo, and we were skipping some of the common nn module tests through the test filtering. This updates the filtering to add more of the common nn tests (one of which did fail without the fix).
Closes #80953
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80943
Approved by: https://github.com/zou3519
Opacus found that Layer Norm can fail due to a wrong ordering in the ExpandedWeights code. What was happening is that all of our tests had the input requiring grad, so a layer norm check was always short-circuiting in the tests and hiding the wrong ordering. This adds a test where the input does not require gradients and fixes the issue in Layer Norm.
Closes #80952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80895
Approved by: https://github.com/zou3519
Two changes in here:
(1) Changes `call_for_per_sample_grads` to be curried. Old call looks like:
`call_for_per_sample_grads(module, batch_size, args, kwargs)`
New call looks like:
`call_for_per_sample_grads(module, batch_size, loss_reduction=loss_reduction)(args, kwargs)`
(2) Adds the ability to specify a loss reduction, to match what is done in Opacus. Opacus has a more complete explanation, but essentially they want the per-sample gradient behavior to match what happens in a for loop over single examples. This gets thrown off if you use a mean reduction at the end, since within a batch that ends up scaling all the grad_outputs by 1/batch_size, so we offset it by scaling all the grad_samples by batch_size when loss_reduction is mean.
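In other words (a minimal sketch of the correction with an illustrative helper name, not the PR's actual code):

```python
def _maybe_rescale_grad_sample(grad_sample, batch_size, loss_reduction):
    # With loss.mean(), autograd scales every grad_output by 1/batch_size, so the
    # per-sample gradients come out batch_size times smaller than they would in a
    # per-example loop; multiplying by batch_size undoes that scaling.
    if loss_reduction == "mean":
        return grad_sample * batch_size
    return grad_sample  # a "sum" reduction needs no correction
```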
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80892
Approved by: https://github.com/zou3519
Opacus found an issue with the input (batched) gradients produced by instance norm. What was surprising is that we were already testing that the input gradients match, but the input gradients for instance norm are so close to 0 (typically around 1e-10) that they all look the same. The difference only shows up if you put another layer in front of instance norm, so the small errors get magnified. This fixes the bug and makes sure that each layer we support is used in a test with a model at least once.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79800
Approved by: https://github.com/zou3519, https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70140
[Design Doc for Expanded Weights](https://gist.github.com/samdow/fa0a164fec7963f93ff45284989cfc55) <-- gives an overview of the design for Expanded Weights
Introduces the ExpandedWeights mechanism and user-facing API, without any custom-implemented faster rules.
- User-facing API is in `_stateless.py` (with documentation)
- Testing is in test_expanded_weights
- The rest is the implementation of the erroring fallback plus the mechanism for registering faster per-sample-grad rules. Only linear is implemented here; the rest are implemented in #70141. (A hedged usage sketch follows this list.)
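For orientation, a hedged usage sketch of the per-sample-grad API (using the curried calling convention from the later PRs above; the import path is assumed from newer PyTorch versions and may differ from the `_stateless.py` location in this PR):

```python
import torch
import torch.nn as nn
# Import path assumed from later PyTorch versions; this PR places the API in _stateless.py.
from torch.nn.utils._per_sample_grad import call_for_per_sample_grads

module = nn.Linear(4, 3)
x = torch.randn(5, 4)  # batch size 5

out = call_for_per_sample_grads(module, batch_size=5)(x)
out.sum().backward()

# Each parameter now holds per-sample gradients with a leading batch dimension.
print(module.weight.grad_sample.shape)  # expected: torch.Size([5, 3, 4])
```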
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D34350950
Pulled By: samdow
fbshipit-source-id: 69c664b0bc3dff6951358d79d7e5d94882f7aef2
(cherry picked from commit ae1620d3b6507b27c3bc08ecfb2b1418aa8ce7d7)