pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-05 16:44:58 +08:00

Author	SHA1	Message	Date
Bugra Akyildiz	27c7158166	Remove __future__ imports for legacy Python2 supports (#45033) Summary: There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports: ```2to3 -f future -w caffe2``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033 Reviewed By: seemethere Differential Revision: D23808648 Pulled By: bugra fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38	2020-09-23 17:57:02 -07:00
Mike Ruberry	2f2a0d1607	Disables test_atomic_ops and testInputOrder (#29145) Summary: These tests have been flaky for some time, see: - https://github.com/pytorch/pytorch/issues/28179 - https://github.com/pytorch/pytorch/issues/9064 This PR disables them. The actual tests were added/updated 2+ years ago. It's unclear who, if anyone, would own them now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29145 Differential Revision: D18327937 Pulled By: mruberry fbshipit-source-id: d02731d662aff3545b581272e5ae8db4e3097d87	2019-11-05 16:53:53 -08:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Kevin Wilfong	d072701547	Caffe2: Refactor the core logic from data_workers.py into parallel_workers.py Summary: data_workers.py provides a really nice, easy way to run background threads for data input. Unfortunately, it's restrictive, the output of the fetcher function has to be a numpy array. I pulled out that core nice thread management into parallel_workers, and updated the classes data_workers to extend those classes. The main change was refactoring out most of the queue handling logic into QueueManager. This way parallel_workers can be used to manage background threads without having to use the queue for output. Reviewed By: akyrola Differential Revision: D5538626 fbshipit-source-id: f382cc43f800ff90840582a378dc9b86ac05b613	2017-08-07 10:14:08 -07:00
Aapo Kyrola	87275817a4	fix a rare race condition by initializing scratch blobs beforehand Summary: Data workers test times out randomly (very seldom), and it looks like the reason is that we call FeedBlob in a thread (enqueue-thread), and the first time that is called, it will call workspace.CreateBlob() -- which is not thread safe. Fix this by initializing the scratch blobs explicitly. Reviewed By: panshen1 Differential Revision: D5292426 fbshipit-source-id: d7dad68f3ccc636c60bd82b2527f00f20da298b5	2017-06-26 10:18:18 -07:00
Zheng Yan	8f1e641d5f	Deprecate CNNModelHelper in python/data_workers_test.py Summary: Deprecate CNNModelHelper in python/data_workers_test.py Reviewed By: harouwu Differential Revision: D5312089 fbshipit-source-id: 37b72ac2031acf14a7e6a6ea0a298b71b00b10dd	2017-06-23 14:46:58 -07:00
Aapo Kyrola	aa603a9083	add test for input order Summary: Based on jay-mahadeokar's code, add a test for input order consistency to data workers. Reviewed By: jay-mahadeokar Differential Revision: D5096887 fbshipit-source-id: efd226343f81e9a0157ec89d4588f1eee8a78549	2017-05-19 23:46:38 -07:00
Aapo Kyrola	1a831ce8f2	Add direct enqueuing to enable RNN input, allow specify batch columns Summary: Add a parameter dont_rebatch to data_workers. This disables batching of input from fetcher to equal-batch size chunks. This is not desired with RNNs where with longer sequence length we might want to have smaller batches etc. For some reason the graceful-shutdown test interfered with other tests, so I removed it. Reviewed By: jay-mahadeokar Differential Revision: D4988549 fbshipit-source-id: cbab46d77c948f2e293e79e6eb538dde17d800ee	2017-05-03 14:49:44 -07:00
Aapo Kyrola	9215afef7d	Allow stopping of specific data workers + specify c2 queue size Summary: Now you can call coordinator.stop_coordinator("train") to stop the train model's data input and release its memory. Reviewed By: rpenggithub Differential Revision: D4955014 fbshipit-source-id: c1bc3ec67337b94aff8ea9b306c3b4158eeef42c	2017-04-26 11:18:40 -07:00
Luke Yeager	f768233a1c	Fix a data_workers test Summary: This is a global variable which can be incremented by other tests. Before: ``` $ pytest -v caffe2/python/data_workers_test.py ... caffe2/python/data_workers_test.py::DataWorkersTest::testGracefulShutdown PASSED caffe2/python/data_workers_test.py::DataWorkersTest::testNonParallelModel FAILED ============================================= FAILURES ============================================== _______________________________ DataWorkersTest.testNonParallelModel ________________________________ self = <data_workers_test.DataWorkersTest testMethod=testNonParallelModel> def testNonParallelModel(self): model = cnn.CNNModelHelper(name="test") coordinator = data_workers.init_data_input_workers( model, ["data", "label"], dummy_fetcher, 32, 2, ) > self.assertEqual(coordinator._fetcher_id_seq, 2) E AssertionError: 4 != 2 caffe2/python/data_workers_test.py:38: AssertionError ----------------- Closes https://github.com/caffe2/caffe2/pull/211 Differential Revision: D4916591 Pulled By: Yangqing fbshipit-source-id: 281f12d7f02dbd0ce0932024cf1f16cd12130112	2017-04-20 11:38:11 -07:00
Aapo Kyrola	449f8997ab	close blobs queues when stopping + test Summary: Mysterious deadlocks after epoch has finished have occurred randomly but quite frequently recently for myself, vigneshr and others. Looking at a stack trace of vigneshr's job (P57129798), I noticed a couple of threads were calling BlobsQueue.blockingWrite (or something like that). That call gets stuck when the caffe2/c++ side queue is at capacity (we use capacity of 4 with data workers). So in cases when this call was just being made while the script was to be terminated, the thread did not close and the whole process did not close either (not completely sure why that is since thread is a daemon thread, but this might be a flow-related issue since we run inside a flow container). This is quite easy to fix: just call CloseBlobsQueue() when terminating the process. I modified coordinator.stop() and wait_for_finish() to return a status code based on whether threads that were joined actually closed within the 1.0sec timeout. This allowed creating a unit test to test for this issue. Before my change, the unit test failed. Reviewed By: pietern Differential Revision: D4619638 fbshipit-source-id: d96314ca783977517274fc7aadf8db4ee5636bdf	2017-02-27 10:07:57 -08:00
Aapo Kyrola	35fa9e9c5f	a couple small reliability improvements Summary: A couple of more misc changes: - allow starting the coordinator multiple times -- this makes data parallel programming easier - make the fetcher id a global sequence, before each gpu had same ids for workers - my flow jobs got stuck when joining the fetcher threads. I think there is actually a memory fencing problem with the is_active boolean. But I am too tired to add proper condition variables there. Instead just add timeout to join(). It is needed anyway since some i/o thread could get blocked. Differential Revision: D4333381 fbshipit-source-id: 88226c8a9c9a5e05d771360a502a2ba21a6b9d76	2016-12-15 21:29:29 -08:00
Aapo Kyrola	0b52b3c79d	Generalize threaded data input via queues + Everstore input Summary: Xray sampler (originally by ajtulloch) and prigoyal's resnet trainer use variants of the threaded data input where worker threads put stuff into a python queue that is drained by an enqueuer thread that dumps those batches to a Caffe2 queue, that is then drained by the net's DequeueBlobs operator. There is a lot of boilerplate, which is also quite complicated. This diff is an attempt to generalize that general stuff under a new module "data_workers" (name could be improved). Basically you pass it a function that is able to return chunks of data (usually data + labels). I also created a module 'everstore_data_input' which generalizes everstore-origin data input with preprocessing function (image augmentation, for example). See how I refactored sampler.py for the usage. Next we could create fetcher function for Laser data. Differential Revision: D4297667 fbshipit-source-id: 8d8a863b177784ae13940730a27dc76cd1dd3dac	2016-12-15 12:01:30 -08:00

14 Commits