pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
mingfeima	4bf22fcfe2	add mixed data type support for GroupNorm (#81852 ) 1. If the user runs bfloat16 models with AMP, `torch.autocast` keeps module parameters in the accumulation dtype, which leaves `gamma` and `beta` in float while input/output are in bfloat16. 2. If the user explicitly casts the model to bfloat16, the input/output and gamma/beta are all in bfloat16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81852 Approved by: https://github.com/jgong5, https://github.com/malfet	2022-12-19 07:59:40 +00:00
Xiao Wang	670efb974a	[CUDA] Use accumulate type to improve accuracy of grid_sample on half precision inputs (#90427 ) Fixes https://github.com/pytorch/pytorch/issues/89836 This PR changes the CUDA kernels of grid_sample 2d and 3d, forward, to use accumulate type to improve accuracy on half precision inputs. Also, the backward error on grad with half input is in the order of 1e-4, unlike 1e2 in forward process. The backward kernels are thus unchanged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90427 Approved by: https://github.com/ngimel	2022-12-15 03:41:35 +00:00
Rohan Varma	9c80f13692	[Resubmit] state_dict_pre_hook (#90435 ) Resubmit of https://github.com/pytorch/pytorch/pull/88541 which got stale. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90435 Approved by: https://github.com/fegin	2022-12-08 07:54:14 +00:00
Sergii Dymchenko	f09e7b5ce7	Replace assertEqualIgnoreType in test_nn.py (#90242 ) See https://github.com/pytorch/pytorch/issues/38095. Also removed some redundant separate `dtype` checks when `dtype` is already checked by the next line's `assertEqual`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90242 Approved by: https://github.com/malfet	2022-12-06 22:34:01 +00:00
PyTorch MergeBot	cba96366a2	Revert "remove torch.equal usages (#89527 )" This reverts commit 4095ef8b809f922f2e0e09011afd00037d20a771. Reverted https://github.com/pytorch/pytorch/pull/89527 on behalf of https://github.com/clee2000 due to broke periodic multigpu tests `4095ef8b80` https://github.com/pytorch/pytorch/actions/runs/3592806602/jobs/6049368502	2022-12-02 21:36:13 +00:00
Philip Meier	4095ef8b80	remove torch.equal usages (#89527 ) Preparation for the next PR in this stack: #89559. I replaced - `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`, - the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and - `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default). There were a few instances where the result of `torch.equal` is used directly. In those cases I've replaced it with `(... == ...).all().item()`, sometimes also dropping the `.item()` depending on the context. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527 Approved by: https://github.com/mruberry	2022-12-01 11:22:52 +00:00
mingfeima	f1978b18f9	add mixed data type support for LayerNorm (#81851 ) 1. If the user runs bfloat16 models with AMP, `torch.autocast` keeps module parameters in the accumulation dtype, which leaves `gamma` and `beta` in float while input/output are in bfloat16. 2. If the user explicitly casts the model to bfloat16, such as: ``` x = torch.randn(n, t, c).bfloat16() ln = nn.LayerNorm(c).bfloat16() y = ln(x) ``` the input/output and gamma/beta are all in bfloat16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81851 Approved by: https://github.com/ezyang	2022-12-01 04:48:34 +00:00
kshitij12345	8314d403a6	[test_nn] split multihead_attention from test_nn (#89748 ) Ref: https://github.com/pytorch/pytorch/issues/63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89748 Approved by: https://github.com/albanD	2022-11-29 18:15:18 +00:00
Jiong Gong	620994cd7a	Guard the boundary of index computed in compute_source_index_and_lambda (#89252 ) Improve the fix in https://github.com/pytorch/pytorch/pull/89210 See discussion in https://github.com/pytorch/pytorch/issues/89212#issuecomment-1318911969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89252 Approved by: https://github.com/mingfeima, https://github.com/weiwangmeta	2022-11-29 13:55:22 +00:00
Yuxin Wu	56e40fe054	Let SyncBatchNorm fallback to BN if not using distributed training (#89706 ) Fixes #63662 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706 Approved by: https://github.com/soumith	2022-11-27 05:55:24 +00:00
kshitij12345	d3c012f409	[test_nn] split pruning tests from test_nn (#89590 ) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590 Approved by: https://github.com/albanD	2022-11-24 21:41:22 +00:00
Nikita Karetnikov	0a1a53083e	[primTorch] Enable regex error testing for some refs (#87765 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765 Approved by: https://github.com/mruberry	2022-11-23 23:36:27 +00:00
kshitij12345	1333fdcff1	[test_nn] split parametrization test from test_nn (#89552 ) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89552 Approved by: https://github.com/albanD	2022-11-23 17:27:40 +00:00
Kshiteej K	c651944f92	[test_nn] split hooks test from test_nn (#89201 ) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89201 Approved by: https://github.com/albanD	2022-11-23 08:39:45 +00:00
Kshiteej K	dd140fc351	[test_nn] move init tests from test_nn (#89202 ) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89202 Approved by: https://github.com/albanD	2022-11-23 08:30:51 +00:00
ecao	3beccbc299	Add BFloat16 support and optimization for mish, hardtanh backward, and silu on CPU (#82460 ) ### Description * add BFloat16 support for mish and hardtanh backward on CPU. * optimize the performance for silu ### Testing - optimize the performance for silu: bfloat16 single socket (28 cores): ``` before: 1x128x1024 forward 0.090 s backward 0.218 s 10x128x1024 forward 0.146 s backward 0.314 s after: 1x128x1024 forward 0.064 s backward 0.100 s 10x128x1024 forward 0.085 s backward 0.133 s ``` single core: ``` before: 1x128x1024 forward 0.300 s backward 0.606 s 10x128x1024 forward 2.825 s backward 5.834 s after: 1x128x1024 forward 0.156 s backward 0.239 s 10x128x1024 forward 1.447 s backward 2.165 s ``` - Add BFloat16 support for mish and backward of hardtanh on CPU. single socket (20 cores): op \| shape \| fp32 / s \| fp32 / s \| bf16 / s \| bf16 / s -- \| -- \| -- \| -- \| -- \| -- \| \| forward \| backward \| forward \| backward silu \| [10, 128, 10, 10] \| 4.41E-05 \| 7.67E-05 \| 5.32E-05 \| 9.38E-05 \| [10, 128, 80, 80] \| 0.0008 \| 0.001788 \| 0.00067 \| 0.001031 mish \| [10, 128, 10, 10] \| 0.000356 \| 0.000427 \| 0.000367 \| 0.000436 \| [10, 128, 80, 80] \| 0.004527 \| 0.005807 \| 0.004757 \| 0.005393 hardtanh \| [10, 128, 10, 10] \| / \| 3.97E-05 \| / \| 4.45E-05 \| [10, 128, 80, 80] \| / \| 0.001748 \| / \| 0.000645 single core: op \| shape \| fp32 / s \| fp32 / s \| bf16 / s \| bf16 / s -- \| -- \| -- \| -- \| -- \| -- \| \| forward \| backward \| forward \| backward silu \| [10, 128, 10, 10] \| 1.17E-04 \| 1.91E-04 \| 1.35E-04 \| 2.23E-04 \| [10, 128, 80, 80] \| 0.007434 \| 0.013141 \| 0.008464 \| 0.013044 mish \| [10, 128, 10, 10] \| 0.00103 \| 0.00122 \| 0.00106 \| 0.001227 \| [10, 128, 80, 80] \| 0.065629 \| 0.078418 \| 0.067779 \| 0.077214 hardtanh \| [10, 128, 10, 10] \| / \| 1.18E-04 \| / \| 9.30E-05 \| [10, 128, 80, 80] \| / \| 0.010773 \| / \| 0.005834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82460 Approved by: https://github.com/mingfeima, https://github.com/malfet	2022-11-17 08:15:52 +00:00
ecao	44c9185f91	Fix empty input issue of convolution for channels last memory format (#86521 ) Fixes empty input convolution issue : when input is empty e.g. shape of (0, 3, 3, 4) and weight is channels last format, at::_unsafe_view will raise "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead." Pull Request resolved: https://github.com/pytorch/pytorch/pull/86521 Approved by: https://github.com/jgong5, https://github.com/malfet	2022-11-17 04:47:45 +00:00
Jerry Zhang	1adb7b9b84	[nn][utils] Preserve requires_grad from original weight and bias in fuse conv/linear bn weights (#89100 ) Summary: As the title says. Previously we just called nn.Parameter, which has requires_grad=True by default; after this PR we preserve the original requires_grad. Test Plan: python test/test_nn.py TestFusionUtils Differential Revision: [D41343694](https://our.internmc.facebook.com/intern/diff/D41343694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89100 Approved by: https://github.com/ngimel	2022-11-17 03:58:16 +00:00
Xiao Wang	f5df685090	Enable channels_last_3d on SyncBatchNorm (#88401 ) This PR enabled the use of fast channels_last kernels on SyncBatchNorm with channels_last_3d memory format. With a small benchmark script here https://github.com/pytorch/pytorch/issues/88021#issuecomment-1299059859, on V100, I got master: ``` DDP channels_last=False, run_forward_backward, time: 0.8945400714874268 sec DDP channels_last=True, run_forward_backward, time: 1.4736433029174805 sec ``` This PR: ``` DDP channels_last=False, run_forward_backward, time: 0.8927242755889893 sec DDP channels_last=True, run_forward_backward, time: 0.48697471618652344 sec ``` This PR is a follow-up of https://github.com/pytorch/pytorch/pull/46906 Close https://github.com/pytorch/pytorch/issues/88021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88401 Approved by: https://github.com/ngimel	2022-11-15 19:25:53 +00:00
Grigory Sizov	7ad87f63e2	Support src_mask and src_key_padding_mask for Better Transformer (#88488 ) Fixes T135842750 (follow-up for #87377) ## Description At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fastpath in Transformer and Multi-Head Attention. This PR enables using both masks on the fastpath on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python and change mask type to 2 before passing downstream. Downstream processing in native code is not changed, as it already supports 4D mask. Indeed, it is done depending on the device: - on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases 4D mask is supported. - on CPU, by `SoftMax.cpp::masked_softmax_cpp`. It calls `hosted_softmax` which supports 4D mask. ## Tests - Extended `test_mask_check_fastpath` to check that fast path is indeed taken in Transformer when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that fast path is taken in MHA when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path` to check that fast and slow paths give the same result when two masks are passed in MHA - `test_masked_softmax_mask_types` now covers mask type 2 - `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case of both masks provided simultaneously - `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by CPU and CUDA paths Pull Request resolved: https://github.com/pytorch/pytorch/pull/88488 Approved by: https://github.com/mikekgfb	2022-11-10 08:12:56 +00:00
Samantha Andow	87238e6491	[nn] add remove_duplicate flag to named_parameters (#759 ) (#88090 ) Summary: X-link: https://github.com/pytorch/torchrec/pull/759 Since the remove_duplicate flag was added to named_buffers in D39493161 (`c12f829cce`), this adds the same flag to named_parameters Test Plan: python test/test_nn.py -k test_buffers_and_named_buffers OSS Tests Differential Revision: D40801899 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88090 Approved by: https://github.com/albanD	2022-11-09 00:09:20 +00:00
Nikita Karetnikov	bbaa0637df	Add error inputs to `gaussian_nll_loss` `OpInfo` (#88486 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88486 Approved by: https://github.com/lezcano	2022-11-05 20:10:54 +00:00
Philip Meier	bc73affdad	prepare removal of deprecated functionality in torch.testing (#87969 ) _Redo of #86586 with all BC breaking changes granularly placed into separate commits._ --- Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac33be0c8ad9956e3fc15e78ddb6cb95, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969 Approved by: https://github.com/mruberry	2022-11-02 14:04:48 +00:00
Grigory Sizov	4c78c7c82a	Enable `src_mask` in fast path of `TransformerEncoderLayer` (#87377 ) ## Issues Fixes https://github.com/pytorch/pytorch/issues/81129#issuecomment-1179435674 ## Description Passing a 2D attention mask `src_mask` into the fast path of `TransformerEncoderLayer` on CPU was causing an error and so was disabled in https://github.com/pytorch/pytorch/pull/81277. This PR unrolls this fix, enabling `src_mask` on the fast path: - Either attention mask `src_mask` of shape `(L, L)` or padding mask `src_key_padding_mask` of shape `(B, L)` is now allowed on the CPU fast path. If softmax is applied along the last dimension (as in multi-head attention), these masks are processed without expanding them to 4D. Instead, when iterating through the input, `Softmax.cpp::host_softmax` converts the index to match the mask dimensions, depending on the type. - If softmax is applied along a dimension other than the last, `Softmax.cpp::masked_softmax_cpu` expands masks to 4D, converting them to `mask_type=2`. Theoretically one could also add special optimized cases for `dim=0, 1, 2` and process them without mask expansion, but I don't know how often that is used ## Tests: - `test_transformerencoderlayer_fast_path` is extended to cover both attention mask and padding mask - `test_masked_softmax_mask_types_0_1` is added to ensure results from CPU softmax with attention and padding masks match the explicit slow calculation - `test_masked_softmax_devices_parity` is added to ensure results from masked softmax on CPU and CUDA match ## Note I had to replace `float` with `torch.get_default_dtype()` in a couple of tests for the following reason: - `test_nn.py` [sets the default type to `torch.double`](https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L24-L26) - If I execute `test_nn.py` and `test_transformers.py` in one `pytest` run, this default still holds for transformer tests - Some tests in `test_transformers.py` which were previously following the slow path now switched to fast path, and hard-coded `float` started clashing with default `double` Let me know if there is a better way around it - or maybe I'm not supposed to run tests with `pytest` like this Pull Request resolved: https://github.com/pytorch/pytorch/pull/87377 Approved by: https://github.com/mikekgfb, https://github.com/weiwangmeta, https://github.com/malfet	2022-10-31 19:59:36 +00:00
Kshiteej K	6735bf21c7	[test_nn] split convolution tests from test_nn (#87474 ) Ref #63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87474 Approved by: https://github.com/albanD	2022-10-31 04:42:45 +00:00
Eddie Yan	c5cb6ec066	Allow 64bit indexing for channels-last upsample2d on CUDA (#87901 ) #81665 CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/87901 Approved by: https://github.com/ngimel	2022-10-28 19:33:42 +00:00
eqy	4c8e1a9829	Fix 64bit indexing in `vol2col` (#87527 ) Surfaced from #87354 CC @ngimel @ptrblck @maybeLee Pull Request resolved: https://github.com/pytorch/pytorch/pull/87527 Approved by: https://github.com/ngimel	2022-10-23 21:17:12 +00:00
Antonio Kim	6b59d9b566	Fix registration hooks (#87369 ) There is a bug in the implementation of the registration hooks introduced in https://github.com/pytorch/pytorch/pull/86148 whereby if the hook returns a tensor, then the short circuiting logic: ``` value = hook(self, name, value) or value ``` Raises an exception ``` RuntimeError: Boolean value of Tensor with more than one value is ambiguous ``` Fixing the logic so that it only checks to see if the value is `None` before overriding Fixes #85837 CC: @albanD @jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/87369 Approved by: https://github.com/albanD	2022-10-21 05:12:25 +00:00
Rui Zhu	4b757f4633	Assert if padding mask type is unexpected (#86353 ) (#87106 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86353 Fix the issue described in https://github.com/pytorch/pytorch/issues/86120 Test Plan: buck test mode/opt caffe2/test:test_transformers -- test_train_with_long_type_pad Differential Revision: D40129968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87106 Approved by: https://github.com/malfet	2022-10-20 16:01:54 +00:00
Kshiteej K	54ee95c8ec	[nn] module: full_backward_pre_hook (#86700 ) Fixes https://github.com/pytorch/pytorch/issues/42824 * [x] Test * [x] Doc Pull Request resolved: https://github.com/pytorch/pytorch/pull/86700 Approved by: https://github.com/soulitzer	2022-10-13 17:36:39 +00:00
CaoE	b79bac0e4d	Make the data types of output and input consistent for batchnorm (#84410 ) The model TTS will crash due to the following issue: when the input of BN is not contiguous and the data type of the input is different from that of the parameters, BN will raise the error `RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "xxx/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch`. Make the data types of output and input consistent for batchnorm to fix the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84410 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet	2022-10-13 00:42:46 +00:00
Antonio Kim	09a676f639	Add hooks for register_buffer/module/parameter (#86148 ) As described in the issue, this PR adds hooks to be run when `register_parameter`, `register_buffer` and `register_module` are called. Fixes #85837 cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are Pull Request resolved: https://github.com/pytorch/pytorch/pull/86148 Approved by: https://github.com/albanD	2022-10-12 20:57:22 +00:00
Nikita Karetnikov	d56017a14f	[primTorch] Add ref for `triplet_margin_loss`, improve `triplet_margin_with_distance_loss` (#85614 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85614 Approved by: https://github.com/lezcano, https://github.com/mruberry	2022-10-12 18:37:58 +00:00
Nikita Shulga	9eb4f9dd17	Tweak test tolerances to be compatible with A10G (#86538 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86538 Approved by: https://github.com/ngimel	2022-10-11 23:31:48 +00:00
Jerry Zhang	c12f829cce	[nn] Add remove_duplicate flag to named_buffers (#674 ) (#85903 ) Summary: X-link: https://github.com/pytorch/torchrec/pull/674 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84984 this is to allow named_buffers to return the same buffer objects with different names multiple times, needed by internal use cases ghstack-source-id: 168589597 Test Plan: python test/test_nn.py -k test_buffers_and_named_buffers Imported from OSS Reviewed By: albanD Differential Revision: D39493161 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85903 Approved by: https://github.com/albanD	2022-10-11 18:49:09 +00:00
Kshiteej K	e18d466f35	[test_nn] split lazy_modules from test_nn (#86526 ) Ref: #63085 NOTE: We don't need an accompanying XLA PR as these tests run only on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86526 Approved by: https://github.com/albanD	2022-10-10 16:29:56 +00:00
Pearu Peterson	6b295cd046	Enable autograd on Linear with sparse COO weight (#86302 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86302 Approved by: https://github.com/cpuhrsch	2022-10-06 18:39:31 +00:00
Pearu Peterson	f104490d63	Support autograd on Linear with sparse compressed weight. (#86137 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86137 Approved by: https://github.com/cpuhrsch	2022-10-06 18:39:25 +00:00
Kshiteej K	6a5550fca4	[test_nn] split embedding tests from test_nn (#85892 ) Ref https://github.com/pytorch/pytorch/issues/63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85892 Approved by: https://github.com/albanD	2022-09-30 21:45:40 +00:00
lezcano	787028cadb	Implement col2im decomposition and fix im2col and add a few preconditions (#85541 ) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541 Approved by: https://github.com/jansel	2022-09-30 09:31:53 +00:00
George Qi	85258ec17e	Add mask_type=2 to masked_softmax for when mask.size() == input.size() (#85915 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85915 Approved by: https://github.com/cpuhrsch	2022-09-29 23:13:37 +00:00
Masaki Kozuki	ef0baba23f	Use `int64_t` for nll_loss with cuda inputs (#85395 ) Related #85005 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85395 Approved by: https://github.com/t-vi, https://github.com/lezcano	2022-09-29 17:02:04 +00:00
Mikayla Gawarecki	afaee00fec	Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593 ) Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ). Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc. Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593 Approved by: https://github.com/drisspg, https://github.com/cpuhrsch	2022-09-28 20:15:02 +00:00
Weiyi Zheng	b2311192e6	[NN module] speed up _load_from_state_dict (#85743 ) Fixes #61398 The original implementation is very slow when the state_dict.keys() is long. This PR only passes relevant keys to the child module. existing test passes: `pytest test/test_nn.py -k state_dict` I couldn't figure out a good way to write a new test for this new behavior. Had a new snippet, but it will be flaky if integrated into the main CI because it's a timing based check. But I can verify that the test took 30s to run, after this PR it only takes 0.5s. ```python def test_load_state_dict_large(self): # construct a module with 4 levels of module, 10 linear each, leads to 10k items in the dictionary import copy import time base_module = nn.Linear(1,1) model = base_module for level in range(4): model = nn.Sequential(*[copy.deepcopy(model) for _ in range(10)]) state_dict = model.state_dict() self.assertEqual(len(state_dict), 20000) st = time.time() model.load_state_dict(state_dict, strict=True) strict_load_time = time.time() - st # it took 0.5 seconds to self.assertLess(strict_load_time, 10) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85743 Approved by: https://github.com/albanD	2022-09-28 15:26:03 +00:00
Eddie Yan	2bc82163eb	Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (re-open of #85037 ) (#85373 ) CC @ngimel @xwang233 @ptrblck Adds fix for `get_tolerances`, tested locally on a dgx Volta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85373 Approved by: https://github.com/ngimel	2022-09-22 07:34:47 +00:00
Mikayla Gawarecki	77f1f98479	Re-introduce `torch.Tensor.to_padded_tensor` (#85293 ) Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293 Approved by: https://github.com/cpuhrsch	2022-09-21 18:45:56 +00:00
PyTorch MergeBot	53fdd60635	Revert "Reduce memory usage requirement of `test_warp_softmax_64bit_indexing` in `test_nn.py` (#85037 )" This reverts commit 66a9cba221ac32658ea837e88b68b859a08378d0. Reverted https://github.com/pytorch/pytorch/pull/85037 on behalf of https://github.com/clee2000 due to broke test_warp_softmax_64bit_indexing_cuda_float32 and test_warp_softmax_64bit_indexing_cuda_float16 on rocm https://github.com/pytorch/pytorch/actions/runs/3085764744/jobs/4989643817	2022-09-20 00:13:41 +00:00
eqy	66a9cba221	Reduce memory usage requirement of `test_warp_softmax_64bit_indexing` in `test_nn.py` (#85037 ) For reference: #84944 CC @xwang233 @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85037 Approved by: https://github.com/ngimel, https://github.com/pmeier	2022-09-19 21:31:08 +00:00
Elias Ellison	f37069aac7	Re-enable fixed dynamo tests (#84969 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969 Approved by: https://github.com/bdhirsh, https://github.com/ezyang	2022-09-16 15:36:52 +00:00
Michael Melesse	b6d6a78c12	[ROCM] test_batchnorm_cudnn_nhwc (#84603 ) This pr enables test_batchnorm_cudnn_nhwc. This is a follow up to https://github.com/pytorch/pytorch/pull/82512 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84603 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2022-09-14 15:50:14 +00:00
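
The bug described in commit 6b59d9b566 ("Fix registration hooks") can be reproduced without PyTorch. The sketch below is an illustration only, not the code from that PR: `FakeTensor`, `apply_hook_buggy`, `apply_hook_fixed` and the two hooks are hypothetical names, and `FakeTensor` merely imitates the way a multi-element tensor refuses to be converted to a boolean.

```python
class FakeTensor:
    """Hypothetical stand-in for a multi-element tensor (not a PyTorch class)."""

    def __init__(self, data):
        self.data = list(data)

    def __bool__(self):
        # Mimics the error quoted in the commit message.
        raise RuntimeError(
            "Boolean value of Tensor with more than one value is ambiguous"
        )


def apply_hook_buggy(hook, module, name, value):
    # `or` evaluates the truthiness of the hook's return value,
    # which raises for a tensor-like object.
    return hook(module, name, value) or value


def apply_hook_fixed(hook, module, name, value):
    # Only an explicit None means "keep the original value".
    output = hook(module, name, value)
    return value if output is None else output


def replacing_hook(module, name, value):
    # Hypothetical hook that swaps in a new value.
    return FakeTensor([0.0, 0.0])


def passthrough_hook(module, name, value):
    # Hypothetical hook that leaves the value alone.
    return None


original = FakeTensor([1.0, 2.0])

try:
    apply_hook_buggy(replacing_hook, None, "weight", original)
    buggy_raised = False
except RuntimeError:
    buggy_raised = True

replaced = apply_hook_fixed(replacing_hook, None, "weight", original)
kept = apply_hook_fixed(passthrough_hook, None, "weight", original)
```

Comparing against `None` explicitly avoids ever asking the returned object for its truth value, which is the behavior the commit message describes.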


1592 Commits
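
Commit b2311192e6 ("speed up _load_from_state_dict") says the PR passes only the relevant keys to each child module. A minimal sketch of that idea in plain Python follows. It is an assumption-laden illustration, not the PyTorch implementation: the nested-dict module tree, `load_tree`, and the key names are all hypothetical.

```python
def load_tree(tree, state_dict, prefix=""):
    """Collect {full_key: value} for every parameter of `tree` present in state_dict.

    `tree` is a hypothetical module description:
    {"params": [names], "children": {child_name: subtree}}.
    """
    loaded = {}
    for name in tree.get("params", []):
        key = prefix + name
        if key in state_dict:
            loaded[key] = state_dict[key]
    for child_name, child in tree.get("children", {}).items():
        child_prefix = prefix + child_name + "."
        # Hand the child only the keys that can belong to it, so deeper
        # levels scan progressively smaller dictionaries.
        child_state = {
            k: v for k, v in state_dict.items() if k.startswith(child_prefix)
        }
        loaded.update(load_tree(child, child_state, child_prefix))
    return loaded


tree = {
    "children": {
        "0": {"params": ["weight", "bias"]},
        "1": {"params": ["weight", "bias"]},
    }
}
state = {"0.weight": 1.0, "0.bias": 2.0, "1.weight": 3.0, "1.bias": 4.0}
loaded = load_tree(tree, state)
```

In this sketch each subtree sees only the entries under its own prefix instead of the full dictionary at every level of the hierarchy.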