1592 Commits

4bf22fcfe2 add mixed data type support for GroupNorm (#81852)
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output are in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.
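
A minimal sketch of both cases (assuming the CPU path; layer sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

# Case 1: mixed dtype - bfloat16 input/output with float32 gamma/beta,
# which is what autocast produces when parameters stay in the acc dtype.
gn = nn.GroupNorm(num_groups=4, num_channels=32)      # weight/bias stay float32
x = torch.randn(8, 32, 16, 16, dtype=torch.bfloat16)
y = gn(x)                                             # y is bfloat16

# Case 2: explicitly cast the module as well, so everything is bfloat16.
y2 = gn.bfloat16()(x)
```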

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81852
Approved by: https://github.com/jgong5, https://github.com/malfet
2022-12-19 07:59:40 +00:00
670efb974a [CUDA] Use accumulate type to improve accuracy of grid_sample on half precision inputs (#90427)
Fixes https://github.com/pytorch/pytorch/issues/89836

This PR changes the CUDA kernels of grid_sample 2d and 3d (forward) to use the accumulate type, which improves accuracy on half-precision inputs.

Also, the backward error on the grad with half inputs is on the order of 1e-4, unlike 1e2 in the forward pass. The backward kernels are thus left unchanged.
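
An illustrative check (assumes a CUDA device; shapes are arbitrary examples) comparing half-precision grid_sample against a float32 reference:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32, device="cuda")
grid = torch.rand(1, 16, 16, 2, device="cuda") * 2 - 1   # normalized sampling coords in [-1, 1]
ref = F.grid_sample(x, grid, align_corners=False)
out = F.grid_sample(x.half(), grid.half(), align_corners=False)
print((out.float() - ref).abs().max())                   # smaller with float32 accumulation
```
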
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90427
Approved by: https://github.com/ngimel
2022-12-15 03:41:35 +00:00
9c80f13692 [Resubmit] state_dict_pre_hook (#90435)
Resubmit of https://github.com/pytorch/pytorch/pull/88541 which got stale.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90435
Approved by: https://github.com/fegin
2022-12-08 07:54:14 +00:00
f09e7b5ce7 Replace assertEqualIgnoreType in test_nn.py (#90242)
See https://github.com/pytorch/pytorch/issues/38095.

Also removed some redundant separate `dtype` checks when `dtype` is already checked by the next line's `assertEqual`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90242
Approved by: https://github.com/malfet
2022-12-06 22:34:01 +00:00
cba96366a2 Revert "remove torch.equal usages (#89527)"
This reverts commit 4095ef8b809f922f2e0e09011afd00037d20a771.

Reverted https://github.com/pytorch/pytorch/pull/89527 on behalf of https://github.com/clee2000 due to broke periodic multigpu tests 4095ef8b80 https://github.com/pytorch/pytorch/actions/runs/3592806602/jobs/6049368502
2022-12-02 21:36:13 +00:00
4095ef8b80 remove torch.equal usages (#89527)
Preparation for the next PR in this stack: #89559.

I replaced

- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).

There were a few instances where the result of `torch.equal` is used directly. In those cases I've replaced it with `(... == ...).all().item()`, sometimes dropping the `.item()` depending on the context.
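
A small sketch of the replacement patterns described above:

```python
import torch
from torch.testing import assert_close

a = torch.arange(3)
b = torch.arange(3)

# Replacement for `assert torch.equal(a, b)`: bitwise-exact comparison.
assert_close(a, b, rtol=0, atol=0)

# Where the boolean result of `torch.equal` was used directly:
same = (a == b).all().item()
```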

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
2022-12-01 11:22:52 +00:00
f1978b18f9 add mixed data type support for LayerNorm (#81851)
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output are in bfloat16.

2. If the user explicitly casts the model to bfloat16, for example:
```
  x = torch.randn(n, t, c).bfloat16()
  ln = nn.LayerNorm(c).bfloat16()
  y = ln(x)
```
The input/output and gamma/beta will all be in bfloat16.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81851
Approved by: https://github.com/ezyang
2022-12-01 04:48:34 +00:00
8314d403a6 [test_nn] split multihead_attention from test_nn (#89748)
Ref: https://github.com/pytorch/pytorch/issues/63085
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89748
Approved by: https://github.com/albanD
2022-11-29 18:15:18 +00:00
620994cd7a Guard the boundary of index computed in compute_source_index_and_lambda (#89252)
Improve the fix in https://github.com/pytorch/pytorch/pull/89210
See discussion in https://github.com/pytorch/pytorch/issues/89212#issuecomment-1318911969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89252
Approved by: https://github.com/mingfeima, https://github.com/weiwangmeta
2022-11-29 13:55:22 +00:00
56e40fe054 Let SyncBatchNorm fallback to BN if not using distributed training (#89706)
Fixes #63662
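
A hedged sketch of the behavior this enables (assumes a CUDA device and no initialized process group):

```python
import torch
import torch.nn as nn

# Without torch.distributed initialized, SyncBatchNorm now falls back to
# regular batch norm instead of raising.
sbn = nn.SyncBatchNorm(8).cuda()
out = sbn(torch.randn(4, 8, 16, 16, device="cuda"))
```
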
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
Approved by: https://github.com/soumith
2022-11-27 05:55:24 +00:00
d3c012f409 [test_nn] split pruning tests from test_nn (#89590)
Ref: https://github.com/pytorch/pytorch/issues/63085

Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
Approved by: https://github.com/albanD
2022-11-24 21:41:22 +00:00
0a1a53083e [primTorch] Enable regex error testing for some refs (#87765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
Approved by: https://github.com/mruberry
2022-11-23 23:36:27 +00:00
1333fdcff1 [test_nn] split parametrization test from test_nn (#89552)
Ref: https://github.com/pytorch/pytorch/issues/63085

Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89552
Approved by: https://github.com/albanD
2022-11-23 17:27:40 +00:00
c651944f92 [test_nn] split hooks test from test_nn (#89201)
Ref: https://github.com/pytorch/pytorch/issues/63085

Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89201
Approved by: https://github.com/albanD
2022-11-23 08:39:45 +00:00
dd140fc351 [test_nn] move init tests from test_nn (#89202)
Ref: https://github.com/pytorch/pytorch/issues/63085

Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89202
Approved by: https://github.com/albanD
2022-11-23 08:30:51 +00:00
3beccbc299 Add BFloat16 support and optimization for mish, hardtanh backward, and silu on CPU (#82460)
### Description
* add BFloat16 support for mish and hardtanh backward on CPU.
* optimize the performance of silu.
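
A small usage sketch of the newly supported CPU bfloat16 paths (shapes mirror the benchmarks below):

```python
import torch
import torch.nn.functional as F

x = torch.randn(10, 128, 10, 10, dtype=torch.bfloat16, requires_grad=True)
F.mish(x).sum().backward()        # bfloat16 mish forward + backward

x2 = torch.randn(10, 128, 10, 10, dtype=torch.bfloat16, requires_grad=True)
F.hardtanh(x2).sum().backward()   # bfloat16 hardtanh backward
```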

### Testing

- silu performance optimization (bfloat16):

single socket (28 cores):
```
before: 1x128x1024  forward 0.090 s  backward  0.218 s
        10x128x1024 forward 0.146 s  backward  0.314 s

after:  1x128x1024   forward  0.064 s backward  0.100 s
        10x128x1024  forward  0.085 s backward  0.133 s
```
single core:
```
before: 1x128x1024   forward 0.300 s  backward  0.606 s
        10x128x1024  forward 2.825 s  backward  5.834 s

after:  1x128x1024   forward 0.156 s backward   0.239 s
        10x128x1024  forward 1.447 s backward   2.165 s
```

- Add BFloat16 support for mish and backward of hardtanh on CPU.

single socket (20 cores):
op | shape | fp32 forward / s | fp32 backward / s | bf16 forward / s | bf16 backward / s
-- | -- | -- | -- | -- | --
silu | [10, 128, 10, 10] | 4.41E-05 | 7.67E-05 | 5.32E-05 | 9.38E-05
  | [10, 128, 80, 80] | 0.0008 | 0.001788 | 0.00067 | 0.001031
mish | [10, 128, 10, 10] | 0.000356 | 0.000427 | 0.000367 | 0.000436
  | [10, 128, 80, 80] | 0.004527 | 0.005807 | 0.004757 | 0.005393
hardtanh | [10, 128, 10, 10] | / | 3.97E-05 | / | 4.45E-05
  | [10, 128, 80, 80] | / | 0.001748 | / | 0.000645

single core:
op | shape | fp32 forward / s | fp32 backward / s | bf16 forward / s | bf16 backward / s
-- | -- | -- | -- | -- | --
silu | [10, 128, 10, 10] | 1.17E-04 | 1.91E-04 | 1.35E-04 | 2.23E-04
  | [10, 128, 80, 80] | 0.007434 | 0.013141 | 0.008464 | 0.013044
mish | [10, 128, 10, 10] | 0.00103 | 0.00122 | 0.00106 | 0.001227
  | [10, 128, 80, 80] | 0.065629 | 0.078418 | 0.067779 | 0.077214
hardtanh | [10, 128, 10, 10] | / | 1.18E-04 | / | 9.30E-05
  | [10, 128, 80, 80] | / | 0.010773 | / | 0.005834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82460
Approved by: https://github.com/mingfeima, https://github.com/malfet
2022-11-17 08:15:52 +00:00
44c9185f91 Fix empty input issue of convolution for channels last memory format (#86521)
Fixes an empty-input convolution issue: when the input is empty, e.g. of shape (0, 3, 3, 4), and the weight is in channels-last format, `at::_unsafe_view` raises "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."
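
A sketch of the previously failing case (layer sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

# Empty input with the weight in channels-last memory format.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(memory_format=torch.channels_last)
x = torch.empty(0, 3, 3, 4)
out = conv(x)          # after the fix: returns an empty result instead of raising
print(out.shape)       # torch.Size([0, 8, 3, 4])
```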

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86521
Approved by: https://github.com/jgong5, https://github.com/malfet
2022-11-17 04:47:45 +00:00
1adb7b9b84 [nn][utils] Preserve requires_grad from original weight and bias in fuse conv/linear bn weights (#89100)
Summary:
As titled. Previously we just called `nn.Parameter`, which has `requires_grad=True` by default; after
this PR we preserve the original `requires_grad`.
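
A hedged sketch using `torch.nn.utils.fusion.fuse_conv_bn_eval`, which goes through the fusion helpers covered by the test plan below:

```python
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

# requires_grad of the original weights now carries over to the fused module.
conv = nn.Conv2d(3, 8, 3).eval()
conv.weight.requires_grad_(False)
bn = nn.BatchNorm2d(8).eval()
fused = fuse_conv_bn_eval(conv, bn)
print(fused.weight.requires_grad)   # False after this PR (was True before)
```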

Test Plan:
python test/test_nn.py TestFusionUtils

Differential Revision: [D41343694](https://our.internmc.facebook.com/intern/diff/D41343694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89100
Approved by: https://github.com/ngimel
2022-11-17 03:58:16 +00:00
f5df685090 Enable channels_last_3d on SyncBatchNorm (#88401)
This PR enables the use of the fast channels_last kernels for SyncBatchNorm with the channels_last_3d memory format.

With the small benchmark script at https://github.com/pytorch/pytorch/issues/88021#issuecomment-1299059859, on a V100 I got:

master:
```
DDP channels_last=False, run_forward_backward, time: 0.8945400714874268 sec
DDP channels_last=True, run_forward_backward, time: 1.4736433029174805 sec
```

This PR:
```
DDP channels_last=False, run_forward_backward, time: 0.8927242755889893 sec
DDP channels_last=True, run_forward_backward, time: 0.48697471618652344 sec
```

This PR is a follow-up of https://github.com/pytorch/pytorch/pull/46906

Close https://github.com/pytorch/pytorch/issues/88021
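
A hedged sketch of the setup exercised here (assumes CUDA and an already-initialized process group, e.g. under DDP):

```python
import torch
import torch.nn as nn

# Convert 3D BatchNorm layers and switch the model and input to channels_last_3d.
net = nn.Sequential(nn.Conv3d(8, 8, 3, padding=1), nn.BatchNorm3d(8)).cuda()
net = nn.SyncBatchNorm.convert_sync_batchnorm(net).to(memory_format=torch.channels_last_3d)
x = torch.randn(4, 8, 8, 16, 16, device="cuda").to(memory_format=torch.channels_last_3d)
out = net(x)
```
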
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88401
Approved by: https://github.com/ngimel
2022-11-15 19:25:53 +00:00
7ad87f63e2 Support src_mask and src_key_padding_mask for Better Transformer (#88488)
Fixes T135842750 (follow-up for #87377)

## Description

At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fastpath in Transformer and Multi-Head Attention.

This PR enables using both masks on the fastpath on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python and change mask type to 2 before passing downstream.

Downstream processing in native code is unchanged, as it already supports 4D masks. It is handled per device:
- on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When the mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases a 4D mask is supported.
- on CPU, by `SoftMax.cpp::masked_softmax_cpp`. It calls `hosted_softmax`, which supports a 4D mask.
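
A hedged user-facing sketch of passing both masks together (sizes are arbitrary examples; eval mode and no_grad so the fast path can be taken):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True).eval()
B, L = 2, 5
src = torch.randn(B, L, 16)
src_mask = torch.zeros(L, L, dtype=torch.bool)               # attention mask (L, L)
src_key_padding_mask = torch.zeros(B, L, dtype=torch.bool)   # padding mask (B, L)
with torch.no_grad():
    out = layer(src, src_mask=src_mask, src_key_padding_mask=src_key_padding_mask)
```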

## Tests
- Extended `test_mask_check_fastpath` to check that fast path is indeed taken in Transformer when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that fast path is taken in MHA when two masks are passed
- Added `test_multihead_self_attn_two_masks_fast_path` to check that fast and slow paths give the same result when two masks are passed in MHA
- `test_masked_softmax_mask_types` now covers mask type 2
- `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case of both masks provided simultaneously
- `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by CPU and CUDA paths

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88488
Approved by: https://github.com/mikekgfb
2022-11-10 08:12:56 +00:00
87238e6491 [nn] add remove_duplicate flag to named_parameters (#759) (#88090)
Summary:
X-link: https://github.com/pytorch/torchrec/pull/759

Since the remove_duplicate flag was added to named_buffers in D39493161 (c12f829cce), this adds the same flag to named_parameters
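
A small sketch of the new flag (module names here are arbitrary examples):

```python
import torch.nn as nn

# A module registered twice shares its parameters; with remove_duplicate=False
# each shared parameter is reported under every name that refers to it.
lin = nn.Linear(2, 2)
model = nn.ModuleDict({"a": lin, "b": lin})
print(len(list(model.named_parameters())))                        # 2 (duplicates removed)
print(len(list(model.named_parameters(remove_duplicate=False))))  # 4
```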

Test Plan:
python test/test_nn.py -k test_buffers_and_named_buffers

OSS Tests

Differential Revision: D40801899

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88090
Approved by: https://github.com/albanD
2022-11-09 00:09:20 +00:00
bbaa0637df Add error inputs to gaussian_nll_loss OpInfo (#88486)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88486
Approved by: https://github.com/lezcano
2022-11-05 20:10:54 +00:00
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac33be0c8ad9956e3fc15e78ddb6cb95, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
4c78c7c82a Enable src_mask in fast path of TransformerEncoderLayer (#87377)
## Issues
Fixes https://github.com/pytorch/pytorch/issues/81129#issuecomment-1179435674

## Description

Passing a 2D attention mask `src_mask` into the fast path of `TransformerEncoderLayer` on CPU was causing an error, so it was disabled in https://github.com/pytorch/pytorch/pull/81277. This PR rolls back that restriction, enabling `src_mask` on the fast path:

- Either an attention mask `src_mask` of shape `(L, L)` or a padding mask `src_key_padding_mask` of shape `(B, L)` is now allowed on the CPU fast path. If softmax is applied along the last dimension (as in multi-head attention), these masks are processed without expanding them to 4D. Instead, when iterating through the input, `Softmax.cpp::host_softmax` converts the index to match the mask dimensions, depending on the type.
- If softmax is applied along a dimension other than the last, `Softmax.cpp::masked_softmax_cpu` expands the masks to 4D, converting them to `mask_type=2`. Theoretically one could also add special optimized cases for `dim=0, 1, 2` and process them without mask expansion, but I don't know how often that is used.

## Tests:
- `test_transformerencoderlayer_fast_path` is extended to cover both attention mask and padding mask
- `test_masked_softmax_mask_types_0_1` is added to ensure results from CPU softmax with attention and padding masks match the explicit slow calculation
- `test_masked_softmax_devices_parity` is added to ensure results from masked softmax on CPU and CUDA match

## Note
I had to replace `float` with `torch.get_default_dtype()` in a couple of tests for the following reason:
- `test_nn.py` [sets the default type to `torch.double`](https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L24-L26)
- If I execute `test_nn.py` and `test_transformers.py` in one `pytest` run, this default still holds for transformer tests
- Some tests in `test_transformers.py` which previously followed the slow path now take the fast path, and the hard-coded `float` started clashing with the default `double`

Let me know if there is a better way around it - or maybe I'm not supposed to run tests with `pytest` like this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87377
Approved by: https://github.com/mikekgfb, https://github.com/weiwangmeta, https://github.com/malfet
2022-10-31 19:59:36 +00:00
6735bf21c7 [test_nn] split convolution tests from test_nn (#87474)
Ref #63085

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87474
Approved by: https://github.com/albanD
2022-10-31 04:42:45 +00:00
c5cb6ec066 Allow 64bit indexing for channels-last upsample2d on CUDA (#87901)
#81665

CC @ngimel @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87901
Approved by: https://github.com/ngimel
2022-10-28 19:33:42 +00:00
4c8e1a9829 Fix 64bit indexing in vol2col (#87527)
Surfaced from #87354

CC @ngimel @ptrblck @maybeLee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87527
Approved by: https://github.com/ngimel
2022-10-23 21:17:12 +00:00
6b59d9b566 Fix registration hooks (#87369)
There is a bug in the implementation of the registration hooks introduced in https://github.com/pytorch/pytorch/pull/86148 whereby if the hook returns a tensor, then the short circuiting logic:
```
value = hook(self, name, value) or value
```
Raises an exception
```
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
Fix the logic so that it only checks whether the value is `None` before overriding it.
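
A runnable sketch of the corrected pattern (hook and names here are hypothetical):

```python
import torch

def hook(module, name, value):
    # e.g. return a modified tensor, or None to keep the original
    return value * 2 if isinstance(value, torch.Tensor) else None

value = torch.ones(3)
out = hook(None, "weight", value)   # `hook(...) or value` would raise here
if out is not None:                 # only fall back when the hook returns None,
    value = out                     # so a Tensor is never evaluated as a boolean
print(value)                        # tensor([2., 2., 2.])
```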

Fixes #85837

CC: @albanD @jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87369
Approved by: https://github.com/albanD
2022-10-21 05:12:25 +00:00
4b757f4633 Assert if padding mask type is unexpected (#86353) (#87106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86353

Fix the issue described in
https://github.com/pytorch/pytorch/issues/86120

Test Plan: buck test mode/opt caffe2/test:test_transformers -- test_train_with_long_type_pad

Differential Revision: D40129968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87106
Approved by: https://github.com/malfet
2022-10-20 16:01:54 +00:00
54ee95c8ec [nn] module: full_backward_pre_hook (#86700)
Fixes https://github.com/pytorch/pytorch/issues/42824

* [x] Test
* [x] Doc
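
A hedged usage sketch of the new hook:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4)

def bw_pre_hook(module, grad_output):
    # Runs before the module's backward; may return a modified grad_output.
    print("grad_output norm:", grad_output[0].norm().item())
    return None  # keep grad_output unchanged

handle = lin.register_full_backward_pre_hook(bw_pre_hook)
lin(torch.randn(2, 4)).sum().backward()
handle.remove()
```
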
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86700
Approved by: https://github.com/soulitzer
2022-10-13 17:36:39 +00:00
b79bac0e4d Make the data types of output and input consistent for batchnorm (#84410)
The TTS model will crash due to the following issue: when the input of BN is not contiguous and the data type of the input differs from that of the parameters, BN raises the error `RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "xxx/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch`.

Make the data types of output and input consistent for batchnorm to fix the issue.
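
A hedged sketch of the failing pattern described above (assumes the CPU mixed-dtype path; sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)                                        # float32 parameters
x = torch.randn(2, 16, 8, 8).bfloat16().permute(0, 1, 3, 2)    # non-contiguous, bfloat16
out = bn(x)   # previously hit the INTERNAL ASSERT; output dtype now matches the input
```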

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84410
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-10-13 00:42:46 +00:00
09a676f639 Add hooks for register_buffer/module/parameter (#86148)
As described in the issue, this PR adds hooks to be run when `register_parameter`, `register_buffer` and `register_module` are called.
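
A hedged sketch of one of the new global hooks, as exposed in `torch.nn.modules.module`:

```python
import torch
import torch.nn as nn

def param_hook(module, name, param):
    # Fires whenever a parameter is registered; may return a replacement value.
    print(f"registering parameter '{name}' on {type(module).__name__}")
    return None  # keep the original parameter

handle = nn.modules.module.register_module_parameter_registration_hook(param_hook)
lin = nn.Linear(2, 2)   # triggers the hook for 'weight' and 'bias'
handle.remove()
```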

Fixes #85837

cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86148
Approved by: https://github.com/albanD
2022-10-12 20:57:22 +00:00
d56017a14f [primTorch] Add ref for triplet_margin_loss, improve triplet_margin_with_distance_loss (#85614)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85614
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-10-12 18:37:58 +00:00
9eb4f9dd17 Tweak test tolerances to be compatible with A10G (#86538)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86538
Approved by: https://github.com/ngimel
2022-10-11 23:31:48 +00:00
c12f829cce [nn] Add remove_duplicate flag to named_buffers (#674) (#85903)
Summary:
X-link: https://github.com/pytorch/torchrec/pull/674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84984

This is to allow `named_buffers` to return the same buffer object under different names multiple times, which is needed by internal use cases.
ghstack-source-id: 168589597

Test Plan:
python test/test_nn.py -k test_buffers_and_named_buffers

Imported from OSS

Reviewed By: albanD

Differential Revision: D39493161

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85903
Approved by: https://github.com/albanD
2022-10-11 18:49:09 +00:00
e18d466f35 [test_nn] split lazy_modules from test_nn (#86526)
Ref: #63085

NOTE: We don't need an accompanying XLA PR as these tests run only on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86526
Approved by: https://github.com/albanD
2022-10-10 16:29:56 +00:00
6b295cd046 Enable autograd on Linear with sparse COO weight (#86302)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86302
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:31 +00:00
f104490d63 Support autograd on Linear with sparse compressed weight. (#86137)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86137
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:25 +00:00
6a5550fca4 [test_nn] split embedding tests from test_nn (#85892)
Ref https://github.com/pytorch/pytorch/issues/63085
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85892
Approved by: https://github.com/albanD
2022-09-30 21:45:40 +00:00
787028cadb Implement col2im decomposition and fix im2col and add a few preconditions (#85541)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541
Approved by: https://github.com/jansel
2022-09-30 09:31:53 +00:00
85258ec17e Add mask_type=2 to masked_softmax for when mask.size() == input.size() (#85915)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85915
Approved by: https://github.com/cpuhrsch
2022-09-29 23:13:37 +00:00
ef0baba23f Use int64_t for nll_loss with cuda inputs (#85395)
Related #85005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85395
Approved by: https://github.com/t-vi, https://github.com/lezcano
2022-09-29 17:02:04 +00:00
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor`, which has erroneous behavior wrt gradients (the result could be either a leaf or not a leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in the nested `__init__.py` for now, but this can move to pybind in the future (when we want to load from numpy/nested lists).

Discussed offline with @cpuhrsch; the pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.
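
A minimal sketch of the new constructors (shapes are arbitrary examples):

```python
import torch

a, b = torch.randn(2, 5), torch.randn(3, 5)
nt = torch.nested.nested_tensor([a, b])        # copies the inputs; always a leaf
nt2 = torch.nested.as_nested_tensor([a, b])    # analogous to torch.as_tensor
print(nt.is_leaf)
```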

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
b2311192e6 [NN module] speed up _load_from_state_dict (#85743)
Fixes #61398

The original implementation is very slow when `state_dict.keys()` is long. This PR only passes the relevant keys to the child module.

Existing tests pass: `pytest test/test_nn.py -k state_dict`
I couldn't figure out a good way to write a new test for this behavior. I had a snippet, but it would be flaky if integrated into the main CI because it is a timing-based check.
However, I can verify that the test below took 30 s to run before this PR and only 0.5 s after it.

```python
    def test_load_state_dict_large(self):
        # construct a module with 4 levels of nesting, 10 modules per level: 10**4 Linear layers, i.e. 20k entries in the state dict
        import copy
        import time
        base_module = nn.Linear(1,1)
        model = base_module
        for level in range(4):
            model = nn.Sequential(*[copy.deepcopy(model) for _ in range(10)])
        state_dict = model.state_dict()
        self.assertEqual(len(state_dict), 20000)
        st = time.time()
        model.load_state_dict(state_dict, strict=True)
        strict_load_time = time.time() - st
        # loading took about 0.5 seconds after this PR (vs ~30 seconds before); allow generous headroom
        self.assertLess(strict_load_time, 10)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85743
Approved by: https://github.com/albanD
2022-09-28 15:26:03 +00:00
2bc82163eb Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (re-open of #85037) (#85373)
CC @ngimel @xwang233 @ptrblck

Adds fix for `get_tolerances`, tested locally on a dgx Volta.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85373
Approved by: https://github.com/ngimel
2022-09-22 07:34:47 +00:00
77f1f98479 Re-introduce torch.Tensor.to_padded_tensor (#85293)
Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293
Approved by: https://github.com/cpuhrsch
2022-09-21 18:45:56 +00:00
53fdd60635 Revert "Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (#85037)"
This reverts commit 66a9cba221ac32658ea837e88b68b859a08378d0.

Reverted https://github.com/pytorch/pytorch/pull/85037 on behalf of https://github.com/clee2000 due to broke test_warp_softmax_64bit_indexing_cuda_float32 and test_warp_softmax_64bit_indexing_cuda_float16 on rocm https://github.com/pytorch/pytorch/actions/runs/3085764744/jobs/4989643817
2022-09-20 00:13:41 +00:00
66a9cba221 Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (#85037)
For reference: #84944

CC @xwang233 @ngimel @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85037
Approved by: https://github.com/ngimel, https://github.com/pmeier
2022-09-19 21:31:08 +00:00
f37069aac7 Re-enable fixed dynamo tests (#84969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2022-09-16 15:36:52 +00:00
b6d6a78c12 [ROCM] test_batchnorm_cudnn_nhwc (#84603)
This PR enables test_batchnorm_cudnn_nhwc. It is a follow-up to https://github.com/pytorch/pytorch/pull/82512
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84603
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2022-09-14 15:50:14 +00:00