110 Commits

24d92b5d9f Concatenate directly into shared memory when constructing batches (#1323)
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).

As part of this change:

 * torch.cat now accepts an out keyword argument (see the sketch below)
 * specifying out=None is treated the same as not specifying out
2017-04-22 03:40:30 -04:00
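A minimal sketch of the pattern this commit enables, written against today's torch.empty/share_memory_ API (the sample shapes and names are illustrative, not from the commit): pre-allocate the batch tensor in shared memory and let torch.cat write into it directly.

```python
import torch

# Illustrative sketch: concatenate worker samples straight into a
# pre-allocated tensor whose storage lives in shared memory, so the
# batch needs no second copy before crossing process boundaries.
samples = [torch.randn(3, 224, 224) for _ in range(8)]

out = torch.empty(8 * 3, 224, 224)
out.share_memory_()                  # move the backing storage to shared memory

torch.cat(samples, dim=0, out=out)   # writes directly into shared memory

# out=None behaves exactly like not passing out at all:
reference = torch.cat(samples, out=None)
assert reference.equal(out)
```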
f17cfe4293 Sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
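For context, a minimal sketch of sparse tensor construction using the modern torch.sparse_coo_tensor constructor; the commit predates this exact API, so treat the call as illustrative rather than what #735 shipped.

```python
import torch

# Sparse COO tensor: indices holds one column per nonzero (row, col) pair.
indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(2, 3))

print(sparse.to_dense())
# tensor([[0., 0., 3.],
#         [4., 0., 5.]])
```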
6464e69e21 Docs for torch.Storage (#475) 2017-01-18 03:22:30 -05:00
24af02154c Use ForkingPickler for sharing tensor/storages across processes (#344)
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events, instead of using our queue
from joblib. This makes it easier to use the standard multiprocessing
classes in later versions of Python.

This also exposes:

 - Tensor/Storage.share_memory_()
 - Module.share_memory()

These methods move CPU tensors and storages to shared memory. If you're
using the "fork" start method of multiprocessing, these objects can be
inherited directly instead of being serialized through a queue (see the
sketch below).
2016-12-28 20:34:23 -05:00
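A minimal sketch of the inheritance pattern described above, assuming the current torch.multiprocessing API (the worker function and tensor contents are illustrative):

```python
import torch
import torch.multiprocessing as mp

def worker(t):
    # The child maps the same shared storage, so this write is
    # visible to the parent without any serialization.
    t.add_(1)

if __name__ == "__main__":
    tensor = torch.zeros(4)
    tensor.share_memory_()   # move the CPU storage into shared memory

    # With the "fork" start method the tensor is inherited directly;
    # nothing goes through a queue.
    ctx = mp.get_context("fork")
    p = ctx.Process(target=worker, args=(tensor,))
    p.start()
    p.join()
    print(tensor)            # tensor([1., 1., 1., 1.])
```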
1af9a9637f Refactor copy and release GIL during copy (#286) 2016-12-11 21:54:58 +01:00
2fd78112ab Add half copy/conversions 2016-11-17 14:34:33 -08:00
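A small hedged example of what half-precision conversion and cross-dtype copy look like in current PyTorch (names illustrative):

```python
import torch

x = torch.randn(4)   # float32
h = x.half()         # convert to float16

y = torch.empty(4)   # float32 destination
y.copy_(h)           # copy_ converts element-wise back to float32
```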
ee14cf9438 Add support for pinned memory: (#127)
torch.Storage/Tensor.pin_memory()
 torch.Storage/Tensor.is_pinned()
2016-10-15 18:38:26 -04:00
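A minimal sketch of how pinned memory pairs with the async CUDA copy added in the commit below, using today's API (note: the keyword is spelled non_blocking in current PyTorch; at the time of these commits it was async):

```python
import torch

if torch.cuda.is_available():
    cpu_tensor = torch.randn(1024, 1024).pin_memory()
    assert cpu_tensor.is_pinned()

    # Page-locked host memory lets the host-to-device copy overlap
    # with host computation instead of blocking on it.
    gpu_tensor = cpu_tensor.to("cuda", non_blocking=True)
```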
cb5d4e836f Lazy load CUDA and THNN modules (#64) 2016-09-28 19:29:53 -04:00
1828e7c42f Add async CUDA copy 2016-09-27 15:12:48 -07:00
8fdec15a55 Codemod to remove camel case method naming 2016-09-20 08:40:28 -07:00