pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Zachary DeVito	6a50b83b73	Expandable blocks in allocator (#96995 ) Common advice we give for handling memory fragmentation issues is to allocate a big block upfront to reserve memory which will get split up later. For programs with changing tensor sizes this can be especially helpful to avoid OOMs that happen the first time we see a new largest input and would otherwise have to allocate new segments. However, the issue with allocating a block upfront is that it is nearly impossible to correctly estimate the size of that block. If too small, space in the block will run out and the allocator will allocate separate blocks anyway. Too large, and other non-PyTorch libraries might stop working because they cannot allocate any memory. This patch provides the same benefits as pre-allocating a block but without having to choose its size upfront. Using the cuMemMap-style APIs, it adds the ability to expand the last block in a segment when more memory is needed. Compared to universally using cudaMallocAsync to avoid fragmentation, this patch can fix this common fragmentation issue while preserving most of the existing allocator behavior. This behavior can be enabled and disabled dynamically. This should allow users to, for instance, allocate long-lived parameters and state in individual buffers, and put temporary state into the large expandable blocks, further reducing fragmentation. See inline comments for information about the implementation and its limitations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995 Approved by: https://github.com/eellison	2023-04-14 09:49:11 +00:00
Huy Do	60631aefe5	Disable test_variable_sharing on ASAN due to non-deterministically hang (#97742 ) See https://github.com/pytorch/pytorch/issues/94024. I disabled this test on ASAN a while ago for this exact issue. The issue, unfortunately, was hard to reproduce and flaky bot closed it 3 weeks ago. ASAN job has been hanging flakily since then, i.e. `8313becefa`. I don't want to reopen the issue and forget about it after 2 weeks, so let's disable the test for ASAN and be at peace (for now). Interesting, there are other tests here also hanging on ASAN, i.e. `test_leaf_variable_sharing`: ``` # See https://github.com/pytorch/pytorch/issues/14997 @unittest.skipIf(TEST_WITH_ASAN, "non-deterministically hangs with ASAN") def test_leaf_variable_sharing(self): ``` I suspect that they have the same root cause. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97742 Approved by: https://github.com/clee2000	2023-03-29 01:18:44 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
Huy Do	aac9e5288f	Increase test multiprocessing waiting time (#93183 ) Fixes https://github.com/pytorch/pytorch/issues/67002 This is a follow-up from https://github.com/pytorch/pytorch/pull/91459 which fixed the flaky test everywhere except ROCm and macOS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93183 Approved by: https://github.com/clee2000	2023-01-28 07:59:59 +00:00
Jeff Daily	04689ae209	[CI][ROCm] skip multiprocessing tests that trigger hangs (#92101 ) Skip tests affected by #90940. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92101 Approved by: https://github.com/huydhn	2023-01-13 22:39:00 +00:00
Huy Do	dbd0d76515	Disable test_fs family for dynamo (#91459 ) This should help address https://github.com/pytorch/pytorch/issues/67002. At the end of these tests, any temp files matching `/dev/shm/torch_*` are cleaned up, but the cleanup might take longer than 0.5s to finish, causing the test to fail. So, the PR increases this max waiting time to 5s while polling for the result every 0.5s as before. ### Testing `pytest test_multiprocessing.py -k test_fs --verbose --flake-finder` to run `test_fs`, `test_fs_is_shared`, `test_fs_pool`, `test_fs_preserve_sharing`, and `test_fs_sharing` 50 times on a dynamo shard. All pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91459 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/atalman	2022-12-29 00:26:57 +00:00
Sergii Dymchenko	0ac0af02d5	Reland Fix issue 38095 TODO in test_multiprocessing.py (#90741 ) Fix TODO related to https://github.com/pytorch/pytorch/issues/38095 Reland of https://github.com/pytorch/pytorch/pull/90335 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90741 Approved by: https://github.com/clee2000	2022-12-15 05:32:27 +00:00
PyTorch MergeBot	465005c1e0	Revert "Fix issue 38095 TODO in test_multiprocessing.py (#90335 )" This reverts commit cbb2d5af81dcfaf181db7e9083b9c41b29fdb4eb. Reverted https://github.com/pytorch/pytorch/pull/90335 on behalf of https://github.com/clee2000 due to somehow caused test_multiprocessing to timeout `cbb2d5af81` https://github.com/pytorch/pytorch/actions/runs/3645873711/jobs/6159998523	2022-12-08 17:12:10 +00:00
Sergii Dymchenko	cbb2d5af81	Fix issue 38095 TODO in test_multiprocessing.py (#90335 ) Fix TODO related to https://github.com/pytorch/pytorch/issues/38095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90335 Approved by: https://github.com/clee2000	2022-12-08 06:27:08 +00:00
Jagadish Krishnamoorthy	f5bfa4d088	[ROCm] Enable test_multiprocessing tests (#82356 ) Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com> Issue fixed in ROCm 5.2 user space. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82356 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/huydhn	2022-08-24 20:49:20 +00:00
Alban Desmaison	090eddf1c7	Fix MPS interaction with autograd engine Pull Request resolved: https://github.com/pytorch/pytorch/pull/77644 Approved by: https://github.com/kulinseth, https://github.com/soulitzer, https://github.com/seemethere	2022-05-17 21:26:16 +00:00
Pruthvi Madugundu	2469525c4c	[ROCm] Skipping few multiprocess test - Found it is failing on ROCm 5.1.1, will be enabled back as soon as it is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76402 Approved by: https://github.com/jeffdaily, https://github.com/seemethere	2022-04-27 17:56:36 +00:00
Jeff Daily	956a028b55	[ROCm] enable HIP IPC Enables code paths that use hipIpc* functions. Also enables test_multiprocessing.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74383 Approved by: https://github.com/osalpekar	2022-03-21 19:32:01 +00:00
Alban Desmaison	4d04ef62a1	Allow forking until a worker thread is created in autograd engine (#72689 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72689 Fix https://github.com/pytorch/pytorch/issues/69839 Should we add a private python binding to check if the bad fork guard has been set and add test in CI to make sure that it is never set on our CPU-only CI build? Not sure how flaky that will be out of CI for people that run CPU build on a machine that cuda installed... EDIT: turns out, we already had such tests in test_multiprocessing. So should be tested and enforced now! Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D34180243 Pulled By: albanD fbshipit-source-id: 3284db52dcf4568362244b60e3c5657153e64fa4 (cherry picked from commit 6e23f7a33a065c2ab6a267b2c7f0ca97c24532ea)	2022-02-12 01:52:57 +00:00
Jane Xu	00a871c5c9	[skip ci] Set test owner for multiprocessing tests (#66848 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc VitalyFedyunin Pull Request resolved: https://github.com/pytorch/pytorch/pull/66848 Reviewed By: VitalyFedyunin Differential Revision: D31828908 Pulled By: janeyx99 fbshipit-source-id: 45d6901648f5564c1bf07ad8d01d69ef486ae104	2021-10-21 13:13:53 -07:00
Kurt Mohler	5883523c1d	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030 Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible Fixes https://github.com/pytorch/pytorch/issues/47442 * THE SERIALIZATION FORMAT IS FULLY FC/BC. We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today. * There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate. * As we no longer know what dtype of a storage is, we've removed the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes. * `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments. * It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor. * It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling. * The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall. To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage or your serialization code will degrade to standard file-based serialization. Original pull request: https://github.com/pytorch/pytorch/pull/59671 Reviewed By: soulitzer, ngimel Differential Revision: D29466819 Pulled By: ezyang fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e	2021-10-05 13:50:34 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
Vasiliy Alekseev	bac4cfd54d	Fix mp serialization for integer nn.Parameter on CUDA (#56529 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56342 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56529 Reviewed By: albanD Differential Revision: D27896094 Pulled By: ngimel fbshipit-source-id: fe817781eb7139ea57c78acfd56e7c11b61eb4ed	2021-04-22 16:21:04 -07:00
Yukio Siraichi	93bf0ae6fc	Remove legacy constructor calls from pytorch codebase. (#54142 ) Summary: Follow up from https://github.com/pytorch/pytorch/issues/53889 Related to https://github.com/pytorch/pytorch/issues/47112 Removing every occurrence of the legacy constructor call present in PyTorch at: - _docs_ - _benchmarks_ - _test_ - _caffe2_ - _CONTRIBUTING.md_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142 Reviewed By: ngimel Differential Revision: D27699450 Pulled By: mruberry fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546	2021-04-11 15:45:17 -07:00
Rong Rong (AI Infra)	71766d89ea	[BE] unified run_process_no_exception code (#49774 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49774 Reviewed By: janeyx99 Differential Revision: D25756811 Pulled By: walterddr fbshipit-source-id: 4d2b3bd772572764ff96e5aad70323b58393e332	2021-01-04 13:43:09 -08:00
Alexander Golynski	33b7970d9e	fix slow windows test (#49258 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49258 Tested by adding `time.sleep(3) ` in SubProcess.run and see test print "test_inherit_tensor: SubProcess too slow" Sample failure: https://app.circleci.com/pipelines/github/pytorch/pytorch/249756/workflows/3605479e-1020-4325-9a4c-8bde5ae38262/jobs/9550663 Test Plan: Imported from OSS Reviewed By: supriyar Differential Revision: D25507209 Pulled By: agolynski fbshipit-source-id: ec808f0f658d0fb4c8447f68ec5ceba2aa66b1b5	2020-12-12 06:48:38 -08:00
Mike Ruberry	cb26661fe4	Throws runtime error when torch.full would infer a float dtype from a bool or integral fill value (#40364 ) Summary: BC-breaking NOTE: In PyTorch 1.6 bool and integral fill values given to torch.full must set the dtype or out keyword arguments. In prior versions of PyTorch these fill values would return float tensors by default, but in PyTorch 1.7 they will return a bool or long tensor, respectively. The documentation for torch.full has been updated to reflect this. PR NOTE: This PR causes torch.full to throw a runtime error when it would have inferred a float dtype by being given a boolean or integer value. A versioned symbol for torch.full is added to preserve the behavior of already serialized Torchscript programs. Existing tests for this behavior being deprecated have been updated to reflect it now being unsupported, and a couple new tests have been added to validate the versioned symbol behavior. The documentation of torch.full has also been updated to reflect this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40364 Differential Revision: D22176640 Pulled By: mruberry fbshipit-source-id: b20158ebbcb4f6bf269d05a688bcf4f6c853a965	2020-06-23 23:27:22 -07:00
Vitaly Fedyunin	7bf1dd582a	Fix Cuda IPC deadlock (#40347 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40347 Fixes: #39541 Fixes: #25301 Differential Revision: D22152662 Test Plan: Imported from OSS Pulled By: VitalyFedyunin fbshipit-source-id: 82548aa4c937e0260932244e78cb132bcb3209b3	2020-06-22 20:50:25 -07:00
Mike Ruberry	13120bf677	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21740237 Pulled By: mruberry fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042	2020-05-27 06:31:07 -07:00
Rohan Varma	63e545e0fe	Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol Test Plan: revert-hammer Differential Revision: D21717199 Original commit changeset: 9feb856f94ee fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259	2020-05-26 18:23:59 -07:00
Mike Ruberry	6ddca30b2d	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21717199 Pulled By: mruberry fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a	2020-05-26 08:30:23 -07:00
Vitaly Fedyunin	57d01be92b	Replacing assertEqual with assertEqualIgnoreType wherever types missmatch (#38102 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102 Test Plan: Imported from OSS Differential Revision: D21477060 Pulled By: VitalyFedyunin fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4	2020-05-09 14:48:55 -07:00
Ailing Zhang	9232356e5f	remove uses of type() and type_as() part 1. (#38029 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38029 Differential Revision: D21468523 Pulled By: ailzhang fbshipit-source-id: 14b7185d43eb03f630cfaa2d70e02d637ff8551b	2020-05-08 08:16:24 -07:00
David Reiss	e75fb4356b	Remove (most) Python 2 support from Python code (#35615 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well (though using side-by-side view and ignoring whitespace change might be helpful). Test Plan: CI Differential Revision: D20842886 Pulled By: dreiss fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed	2020-04-22 09:23:14 -07:00
Vitaly Fedyunin	877ab3afe3	Better handing of Autograd+Fork errors. (#33885 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33885 Fixes: #32835 Fixes: #5834 Can not combine with CUDA's implementation as each of them requires individual `std::once_flag` as well as different `forked_autograd_child` functions. CUDA version relays to python module, autograd uses TORCH_CHECK to report error to python and cpp. Test Plan: Imported from OSS Differential Revision: D20144024 Pulled By: VitalyFedyunin fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f	2020-02-27 16:07:29 -08:00
Pritam Damania	f050b16dd9	Move pytorch distributed tests to separate folder for contbuild. (#30445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445 Create distributed and rpc directories under caffe/test for better management of unit tests. Differential Revision: D18702786 fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606	2020-01-22 21:16:59 -08:00
Pritam Damania	1250acef90	Disable tsan for test_multiprocessing. (#27410 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27410 Similar to https://github.com/pytorch/pytorch/pull/25005, TSAN is not safe to use in a multi-threaded program with fork and can cause deadlocks. As a result, disabling this test for TSAN. ghstack-source-id: 91393545 Test Plan: buildbot Differential Revision: D17775141 fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a	2019-10-07 11:29:04 -07:00
Guanheng Zhang	b22adeb007	Fix error message for a wrong fork CUDA (#23322 ) Summary: Re-land https://github.com/pytorch/pytorch/pull/23030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23322 Differential Revision: D16469442 Pulled By: zhangguanheng66 fbshipit-source-id: 70b63ab6265efa3f289111ef0ce46bb3c0d353bc	2019-07-25 12:58:14 -07:00
Edward Yang	1f608d09cf	Revert D16440000: [pytorch][PR] Re-land "Fix error message for a wrong fork CUDA" Differential Revision: D16440000 Original commit changeset: e05683275522 fbshipit-source-id: b688f24c1e6d3d8f63c2d415262a3f0ab1b85914	2019-07-24 12:05:36 -07:00
Guanheng Zhang	aa660b8eb7	Re-land "Fix error message for a wrong fork CUDA" (#23209 ) Summary: Re-land https://github.com/pytorch/pytorch/pull/23030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23209 Differential Revision: D16440000 Pulled By: zhangguanheng66 fbshipit-source-id: e05683275522835a33d5a7e6d76b7e94774e4d98	2019-07-24 07:01:04 -07:00
Jesse Hellemn	06d11f0434	Revert D16368004: [pytorch][PR] Fix error message for a wrong fork CUDA Differential Revision: D16368004 Original commit changeset: 44b6977790ce fbshipit-source-id: c81a232bd52219e56a19c64650c4b6dedeb167cb	2019-07-22 18:46:48 -07:00
Guanheng Zhang	a6e45a69a8	Fix error message for a wrong fork CUDA (#23030 ) Summary: Fix https://github.com/pytorch/pytorch/issues/17357 Unblock 1.2 release. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23030 Differential Revision: D16368004 Pulled By: zhangguanheng66 fbshipit-source-id: 44b6977790ce768efa4777bae41d4b26dae5f288	2019-07-22 15:04:32 -07:00
Soumith Chintala	2e029db2f9	fixes multiprocessing serialization for integer nn.Parameter (#18639 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/17345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639 Differential Revision: D14711565 Pulled By: soumith fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19	2019-04-01 17:15:42 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vitaly Fedyunin	5653a914f7	Implement reference counting for shared IPC CUDA tensors (#16854 ) Summary: This is to fix #16141 and similar issues. The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage. ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854 Differential Revision: D13994490 Pulled By: VitalyFedyunin fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1	2019-03-25 10:24:38 -07:00
Edward Yang	ba81074c40	Fix B902 lint error: invalid first argument. (#18181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181 ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18184 Fix B903 lint: save memory for data classes with slots/namedtuple * #18181 Fix B902 lint error: invalid first argument. * #18178 Fix B006 lint errors: using mutable structure in default argument. * #18177 Fix lstrip bug revealed by B005 lint A variety of sins were committed: - Some code was dead - Some code was actually a staticmethod - Some code just named it the wrong way - Some code was purposely testing the omitted case Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14530876 fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e	2019-03-21 09:10:28 -07:00
Edward Yang	84c30398c7	Fix lint in test_multiprocessing. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18016 Reviewed By: eellison Differential Revision: D14458177 fbshipit-source-id: f17b3e06223ab399e9ce24be6988effe04dad636	2019-03-14 09:58:13 -07:00
Stefan Krah	5ea6344c54	Skip test_event_handle_multi_gpu() on a single GPU system (#17402 ) Summary: This fixes the following test failure: ``` ====================================================================== ERROR: test_event_handle_multi_gpu (__main__.TestMultiprocessing) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_multiprocessing.py", line 445, in test_event_handle_multi_gpu with torch.cuda.device(d1): File "/home/stefan/rel/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in __enter__ torch._C._cuda_setDevice(self.idx) RuntimeError: cuda runtime error (10) : invalid device ordinal at /home/stefan/pytorch/torch/csrc/cuda/Module.cpp:33 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/17402 Differential Revision: D14195190 Pulled By: soumith fbshipit-source-id: e911f3782875856de3cfbbd770b6d0411d750279	2019-02-23 08:29:36 -08:00
Shen Li	898329c3f9	Unify device() return type in Stream, Event, and Tensor (#16150 ) Summary: Addresses one future work item in #15937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150 Differential Revision: D13732299 Pulled By: mrshenli fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18	2019-01-19 23:01:31 -08:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation. 2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++ 3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
Edward Yang	1989157eb6	Disable test_leaf_variable_sharing on ASAN runs Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15001 Reviewed By: orionr Differential Revision: D13399119 fbshipit-source-id: 6b1d098e55a67b1f5bc6d08a8ee3c1be8234a654	2018-12-10 10:43:05 -08:00
Ailing Zhang	be47470c91	Fix cuda multiprocessing cached memory (#14736 ) Summary: This PR fixes #11422 In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA memory allocation in which T's storage sits, and we cast it to the same storage type as T's. This causes a problem when storages of two different types are allocated in the same CUDA memory block: when we try to reconstruct the second tensor, it complains about the wrong storage type. In this PR we reconstruct the storage only (not the entire memory block). However, because CUDA only allows a memHandle to be opened once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come. Thanks a ton to ezyang who helped design the solution and debugged the issue! Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736 Differential Revision: D13335899 Pulled By: ailzhang fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371	2018-12-05 10:55:43 -08:00
Tongzhou Wang	8ad69a80e3	Test scripts only run cases defined in the running script (#13250 ) Summary: 1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it. 2. Adds an assertion in `load_tests` that each script only runs cases defined in itself. cc yf225 ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250 Differential Revision: D12823734 Pulled By: SsnL fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b	2018-10-29 13:57:40 -07:00
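
Note on commit dbd0d76515 (#91459): the message describes raising the maximum wait for `/dev/shm/torch_*` cleanup to 5s while still polling every 0.5s. A minimal sketch of that polling pattern follows. It is not the code from the PR; the helper names `wait_until` and `no_leftover_shm_files` are hypothetical and chosen only for illustration.

```python
import glob
import time


def wait_until(predicate, timeout=5.0, interval=0.5):
    """Poll predicate() every `interval` seconds until it returns True
    or `timeout` seconds have elapsed. Returns the final outcome.

    Hypothetical helper: the 5.0 / 0.5 defaults mirror the numbers quoted
    in the commit message, not the actual test code.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def no_leftover_shm_files(pattern="/dev/shm/torch_*"):
    """True when no file matches the shared-memory pattern (assumed path)."""
    return not glob.glob(pattern)
```

A test could then assert `wait_until(no_leftover_shm_files)` instead of checking once after a fixed 0.5s sleep, so slow cleanup only fails the test after the full timeout.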

97 Commits