Compare commits


1616 Commits

Author SHA1 Message Date
0b92e5c9ed fix static linkage and make THD statically linked 2017-08-28 10:41:55 -04:00
df44c571c6 increase test subprocess timeout 2017-08-25 09:00:36 -07:00
ed03f74043 fix leaking symbols in THNN 2017-08-25 09:00:35 -07:00
c8d8803b90 Remove unnecessary moves in convolution autograd. 2017-08-25 09:00:35 -07:00
750245f990 Remove unnecessary moves, avoid IncRef/DecRef of PyBools. 2017-08-25 09:00:35 -07:00
fbf573a6b8 Properly pass saved_for in BatchNorm/Conv as the relevant Backward function.
Previously, these Functions passed themselves, i.e. the saved_for from
ConvForward would be ConvForward.
2017-08-25 09:00:35 -07:00
9bda6dee8f Add AutoGPU guard and properly reference Python args from BatchNormBackwardBackward. 2017-08-25 09:00:35 -07:00
e02f7bf8a3 Update autograd notes (#2295) 2017-08-04 20:28:04 -04:00
7ea48eaf7a cuda 7.5 fix for gloo 2017-08-04 02:25:16 -04:00
d278a14141 Fix ZeroPad2d backwards with negative pads. 2017-08-03 21:16:37 -04:00
6b1ca4b4b6 variable shape error of LSTMCell, GRUCell (#2289) 2017-08-03 21:16:06 -04:00
65ddaf13a9 Improve cuDNN weight layout test 2017-08-03 02:06:27 -04:00
96156013c3 Make sure deserialized RNN modules have _data_ptrs too 2017-08-03 02:06:21 -04:00
a997cdbb25 Fix BatchNorm double backwards when training=False.
Changes for v.0.2.0 around using shared_ptrs rather than at::Tensors.
2017-08-03 10:47:34 +05:30
8db9df94b6 Merge commit '74e5328b03634e163df65d6c6877c6f03387b536' 2017-08-02 22:51:17 -04:00
6c9e3334b1 Merge commit '70c95dbe52102d70facf7fc5d31cb8bd9ae860d9' 2017-08-02 22:50:52 -04:00
b33f232678 disable cudnn when output_padding >= stride or dilation 2017-08-02 22:48:03 -04:00
058f50aa50 fix shape and correctness bugs in autograd/convolution BackwardBackward 2017-08-02 22:48:03 -04:00
8b06efea7a remove dead code for python ConvNd (moved to C already) 2017-08-02 22:48:03 -04:00
52b7a49b37 enable cudnn transposed dilated 2017-08-02 22:48:03 -04:00
47f4d549e0 refactoring the THNN calls in autograd/convolution.cpp to be more compact 2017-08-02 22:48:03 -04:00
5b6d1837c7 enable dilated transpose and gradgrad tests 2017-08-02 22:48:02 -04:00
69642d4423 add THNN bindings for DilatedConvTranspose in autograd/convolution 2017-08-02 22:48:02 -04:00
70c95dbe52 fix Conv3d non-contiguous weight bug 2017-08-02 22:47:09 -04:00
74e5328b03 remove limitations on output_padding in Conv* routines 2017-08-02 22:46:24 -04:00
814b65df4f remove limitations on output_padding in Conv* routines 2017-08-02 22:46:04 -04:00
a565b77791 add 2d and 3d dilated full Convolution 2017-08-02 22:44:59 -04:00
6e6dca001c add 2d and 3d dilated full Convolution 2017-08-02 22:44:44 -04:00
daf5b20cd7 Add tests that gradcheck grad sizes match input size and fix advanced indexing
case that fails check.
2017-08-02 07:13:01 +05:30
515efdab5d add reentrancy checking for gradcheck. 2017-08-02 07:13:01 +05:30
f9f98daf11 Remove save_mean/save_var from BatchNorm double backwards, as they're not needed.
These could cause a problem with double backwards because they were std::move'd in
Backward.
2017-08-02 07:13:01 +05:30
2ac1003228 Implement LogSoftmax (v.0.2.0) (#2265) 2017-08-01 14:32:05 +05:30
141224ad7c Implement SoftMax and NLLLoss double backwards. (#2233)
* Implement SoftMax and NLLLoss double backwards.

* Update legacy ClassNLLCriterion to add ignore_index.

* Fix serialization of legacy ClassNLLCriterion with ignore_index.
2017-07-30 09:02:04 +05:30
ac76ab5fca Increase tol. for float tensor qr big test.
test_FloatTensor_qr_big test is still a bit flaky on K80. Increasing tolerance to improve reliability as tests are moved around and results change for this test.
2017-07-27 14:23:06 -04:00
04f31aa034 Improve Variable.retain_grad 2017-07-27 20:36:14 +05:30
ae59e008cd add retain_grad method to Variable, so the gradient gets stored during backprop on non-user variables 2017-07-27 20:36:14 +05:30
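The two retain_grad commits above add a way to keep gradients on intermediate (non-leaf) Variables during backprop. A minimal usage sketch against the pre-0.4 Variable API (illustrative only):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x * 2              # intermediate ("non-user") Variable
y.retain_grad()        # ask autograd to keep y.grad after backward
y.sum().backward()
print(y.grad)          # without retain_grad() this would be None
```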
e25b3d7bc5 replace long long types with size_t (#1267)
Work around a bug in the MSVC compiler in win32 mode
2017-07-27 19:13:56 +05:30
925208af72 Implement BatchNorm double backwards (#2207)
* Implement BatchNorm double backwards as a python function called directly from C++.

This will be converted to C++ code once ATen is integrated with autograd.

* Some performance improvements via inplace ops and reusing calculations.
2017-07-27 06:00:31 +05:30
643f8d12ff [bugfix] in bce_with_logits logsumexp calculation (#2221)
* fix bug in bce_with_logits logsumexp calculation

* flake8 fix
2017-07-27 05:58:56 +05:30
fb8f9de498 fix for ATen API Change 2017-07-26 18:55:56 -04:00
cb9ad7a892 Opt into Trusty builds. (#2214)
* Opt into Trusty builds.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bump to 2.7.9.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-27 04:04:57 +05:30
f7de7bab6e Merge commit 'fd97d92479e32e550866adfd1f0465e4cfa5e581' 2017-07-26 18:11:16 -04:00
fd97d92479 allow retain to be specified for unsafeTensorFromTH 2017-07-26 14:58:32 -07:00
f3aa97f169 Deduplicate THPUtils_checkLong/THPUtils_unpackLong (#2218)
There were two implementations of THPUtils_checkLong/THPUtils_unpackLong; one
that was a macro and one that was not, which is hella bad if you accidentally
include the macro before the real definition.  Now we always use the inline
function.

A reasonable follow-up task would be to un-macro-ify the rest of these functions.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-27 03:12:12 +05:30
b0648fc3fc Merge commit 'be9ef9283f297997afd3bf8e21147ec6bf09ebbf' 2017-07-26 17:25:39 -04:00
be9ef9283f Merge pull request #35 from ezyang/pr/undefined-dim-doc
Note [Undefined-dim versus 0-dim]
2017-07-26 12:42:33 -07:00
9c0d52a32f fix osx build errors related to long/int64_t 2017-07-26 12:36:25 -07:00
54545c2154 Note [Undefined-dim versus 0-dim]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-26 12:34:13 -07:00
9ec7051442 Remove __func__ hack in auto nn. 2017-07-26 15:28:25 -04:00
2676c6357f Enable Conv groups gradgradchecks. (#2216) 2017-07-27 00:24:12 +05:30
ef3b09fb5f fix a bug where some scalars were getting truncated to integers incorrectly. 2017-07-25 14:27:16 -07:00
f194ac1e09 Merge pull request #477 from wickedfoo/feature_lp_pooling
GPU implementation of L_p feature pooling
2017-07-26 02:31:59 +05:30
26a0b9aa43 Merge pull request #1259 from wickedfoo/feature_lp_pooling
CPU implementation of L_p feature pooling
2017-07-26 02:31:50 +05:30
e548580f31 Add missing models to torch vision documentation (#2204) 2017-07-26 01:58:18 +05:30
421607a935 DataParallel device_ids slicing fixes (#2200) 2017-07-26 01:54:38 +05:30
7be545292d Update cudnn.py 2017-07-25 09:35:44 -04:00
a0e83280ef Update cudnn.py 2017-07-25 09:35:44 -04:00
aa35be2032 search for cudnn in conda 2017-07-25 09:35:44 -04:00
626840aef3 C function wrapper uniqueness (#1912)
* add SharedFunctionMaker to create Function shared in the graph

* Clean shared_ptr usage for only function that will be used in the graph

* make Function binding match Variable one

* remove unnecessary changes

* fix comments

* proper weakref implementation

* add call to clear in dealloc
2017-07-25 13:12:54 +05:30
bcea678e7b Update rebased functions to call apply. 2017-07-25 07:37:25 +05:30
1a52ca02ef Always return indices from MaxPool autograd functions to simplify implementation;
The callers (in functional.py) will filter out the return instead.
2017-07-25 07:37:25 +05:30
84314859af Implement double backwards for MaxPool2d. 2017-07-25 07:37:25 +05:30
9c2beb33c5 Implement double backwards for MaxPool1d. 2017-07-25 07:37:25 +05:30
7deba74969 Implement MaxPool{1d,2d,3d}Backwards (non-differentiable) functions. 2017-07-25 07:37:25 +05:30
48bb07a4db Implement double backwards for AvgPool3d. 2017-07-25 07:37:25 +05:30
bb86ed7b97 Implement double backward for AvgPool1d, AvgPool2d, LPPool2d. 2017-07-25 07:37:25 +05:30
291369ff1b Convert pooling functions to new-style, once_differentiable functions. 2017-07-25 07:37:25 +05:30
2118400e18 Fix lint. 2017-07-25 07:37:25 +05:30
39934da8b3 Address review comments. 2017-07-25 07:37:25 +05:30
c12b494329 Implement double backwards for ELU. 2017-07-25 07:37:25 +05:30
506d52dc33 Add check_gradgrad=False for new NLLLoss2d test. 2017-07-25 07:37:25 +05:30
7687c2677a Fix double backwards advanced indexing derivative wrt grad_output.
Also small legacy nn test issue and unrelated syntax issue.
2017-07-25 07:37:25 +05:30
97d21e243b Implement L1Cost double backwards. 2017-07-25 07:37:25 +05:30
0bda56956e Implement double backwards for auto-generated HardTanh. 2017-07-25 07:37:25 +05:30
40af93bb57 Optimize PReLU double backwards via a PReLUBackwards autograd function. 2017-07-25 07:37:25 +05:30
9608e37969 Implement double backwards for PReLU. 2017-07-25 07:37:25 +05:30
ec7c510557 Implement Softsign double backwards. 2017-07-25 07:37:25 +05:30
8636be3880 Ensure gradients wrt grad_outputs are checked in gradgradcheck. 2017-07-25 07:37:25 +05:30
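The gradgradcheck commits above verify second derivatives, including gradients with respect to grad_outputs. A hypothetical usage sketch (argument order and module path assumed from the era's torch.autograd.gradcheck module):

```python
import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck, gradgradcheck

x = Variable(torch.randn(3, 4).double(), requires_grad=True)
# grad_outputs must themselves require grad so gradients wrt them are checked too
go = Variable(torch.randn(3, 4).double(), requires_grad=True)
assert gradcheck(lambda t: t * t, (x,))
assert gradgradcheck(lambda t: t * t, (x,), (go,))
```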
fb2284f3a0 Add gradgrad checks for NN module and criterion tests. 2017-07-25 07:37:25 +05:30
9ec9dee27d Implement NN Criterion functions as potentially double backwards functions. 2017-07-25 07:37:25 +05:30
7b6aab9079 Unify implementation of _Loss and _WeightedLoss autograd functions. 2017-07-25 07:37:25 +05:30
852dd5f011 Convert _WeightedLoss functions to new style autograd functions. 2017-07-25 07:37:25 +05:30
085abee444 Rebase kl_div changes. 2017-07-25 07:37:25 +05:30
48b85fe012 Implement THNN non-criterion Functions as new style with backward/backward. 2017-07-25 07:37:25 +05:30
45ce4df74c Convert auto nn Functions (non-criterion) to new style. 2017-07-25 07:37:25 +05:30
5695cbf986 Add comments in loss.py and distance.py (#2189)
* Add examples in CrossEntropyLoss

1. Added examples in CrossEntropyLoss
2. Make consistent style of example for PyTorch docs
3. Delete unnecessary character '

* Change comments in distance.py

1. Delete x1, x2 from arguments and add eps in PairwiseDistance
2. For the shape, added input1 and input2 for readability (PairwiseDistance and CosineSimilarity).

* Add examples

Added the word 'examples' for PyTorch docs
2017-07-25 07:36:28 +05:30
03df5debe3 Gloo fixes for Linux + old cmake (2.8.0) + old glibc (CentOS6) 2017-07-24 21:59:58 -04:00
2ebdef0154 Add 'torch/lib/gloo/' from commit '1978bba3e421eceab6181bcbc838553091cedecc'
git-subtree-dir: torch/lib/gloo
git-subtree-mainline: ceb4f84d12304d03a6a46693e54390869c0c208e
git-subtree-split: 1978bba3e421eceab6181bcbc838553091cedecc
2017-07-24 21:59:49 -04:00
ceb4f84d12 Improve memory usage of cuDNN RNN modules (#2179) 2017-07-25 04:00:17 +05:30
112728cbe9 reformulate bce_with_logits to not use abs (#2195)
* reformulate bce_with_logits to not use abs

* flake8 fixes
2017-07-25 03:46:27 +05:30
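The reformulation above is about numerical stability of BCE-with-logits. A hedged sketch of one stable formulation that avoids abs() (not necessarily the exact expression used in the fix):

```python
import torch

def bce_with_logits(x, z):
    # -[z*log(sigmoid(x)) + (1-z)*log(1-sigmoid(x))] rewritten so that every exp()
    # sees a non-positive argument: shift by max(-x, 0) instead of using abs().
    max_val = (-x).clamp(min=0)
    return ((1 - z) * x + max_val + ((-max_val).exp() + (-x - max_val).exp()).log()).mean()
```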
dc17fb68e4 Fix minor bug in parallel_apply (#2193) 2017-07-25 03:45:00 +05:30
4a4d8841e6 Delete unused import 2017-07-23 12:48:11 -04:00
3c275fe7a0 Increase flaky test tolerance (#2185) 2017-07-22 11:37:34 -04:00
1978bba3e4 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:12 -07:00
35757af6f7 Add broadcasting of weights to bce/bce_with_logits (#2161)
* added tests + removed explicit expand of weight in bce with logits

* add auto broadcasting of weight to BCELoss

* remove the need for _BCELoss

* formatting of warning

* remove TODO

* move across assert from _functions/thnn/loss.py

* flake8 fixes
2017-07-21 16:02:07 -04:00
8ab3d214d5 Fixes for DistributedDataParallel (#2168) 2017-07-21 16:00:46 -04:00
ec2def803b Merge commit '2efac3ed83a29f57f914e9044fdddd2ce7ecd6b7' 2017-07-21 15:58:23 -04:00
71ce3448d9 Fix torch.inverse when magma is not available
Fixes #2156
2017-07-21 15:57:43 -04:00
2efac3ed83 Fix torch.inverse when magma is not available
Fixes #2156
2017-07-21 15:57:25 -04:00
66bbe5d75a .creator -> .grad_fn in the code example (#2171) 2017-07-21 14:43:16 -04:00
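For context on the rename above: the attribute that records which Function produced a Variable is now spelled grad_fn. A tiny illustrative snippet:

```python
import torch
from torch.autograd import Variable

y = Variable(torch.ones(2), requires_grad=True) * 2
print(y.grad_fn)   # previously accessed as y.creator
```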
ea607afd06 Add comments in nn.Upsample (#2175) 2017-07-21 14:34:58 -04:00
4f035f14de Add a support matrix for distributed backends 2017-07-21 14:19:46 -04:00
72e9e7abf7 Warning squash.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-21 14:13:11 -04:00
4d45ce7d11 Added UpSampling module and associated tests. 2017-07-21 12:25:50 +01:00
eed323c344 avoid warning 2017-07-20 10:59:56 -07:00
ea6f9a26b8 fix version number 2017-07-20 13:30:53 -04:00
3719b4247a return a sentinel value when THTensor has undefined dimensions. 2017-07-20 10:25:30 -07:00
bf1fc250d1 get conda root dir automatically, trick from Dockerfile 2017-07-20 11:02:30 -04:00
47942307b5 Comment that data of THStorage may be NULL.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 10:55:35 -04:00
6b69723d4f Document how Numpy memory management works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 10:55:35 -04:00
5254846bb2 fix typo of error msg of cmul in THSTensorMath (#2158) 2017-07-20 02:58:54 -04:00
f3f478960e Convert Embedding to new style. (#1916)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-20 02:35:21 -04:00
e537023147 add functional embedding (#1987) 2017-07-20 01:53:37 -04:00
09abaa2189 make keepdim backcompat warnings emit in autograd as well (#2157) 2017-07-20 01:48:05 -04:00
575a4a98e0 Remove assertions with side effects 2017-07-20 01:45:57 -04:00
02e23f4f6b Unify argument names in tensor and Variable methods 2017-07-20 01:45:57 -04:00
8946502348 Accept all kinds of arguments in Variable.expand 2017-07-20 01:45:57 -04:00
e708de37cc Allow keyword args in long_arg options 2017-07-20 01:45:57 -04:00
4af40e3471 Let parallel_apply accept arbitrary inputs 2017-07-20 01:45:57 -04:00
f417cb062b Fix repeat backward to handle unsqueezed dims 2017-07-20 01:45:57 -04:00
11f3ccf98f Add missing Modules to nn.functional (#1801)
* add dropout2d and dropout3d to functional

added some loss functions to functional

added tests

using dropout from backend

added docs

fixes

* edited loss modules to call functional
2017-07-19 15:55:21 -04:00
31894cafdd add support for advanced indexing with less than ndim indexers, ellipsis (#2144) 2017-07-19 15:51:03 -04:00
95ccbf8b0b better error message in load_state_dict when there are inconsistent tensor sizes (#2151) 2017-07-19 15:50:29 -04:00
a5422d14c8 Merge commit 'bd6263c338c717de880cddfed660b5aa06ee108b' 2017-07-19 15:48:54 -04:00
82143487b3 Add CUDA support for arange
Also enables CUDA for range
2017-07-19 15:48:20 -04:00
bd6263c338 Add CUDA support for arange
Also enables CUDA for range
2017-07-19 15:43:00 -04:00
f4a565ded9 Merge commit '1c6a08c1c2a50a7048ae9e6e11290740d24a8374' 2017-07-19 15:42:20 -04:00
1c6a08c1c2 fix lint 2017-07-19 12:41:17 -07:00
a5c2546c0f version bump 2017-07-19 12:34:43 -07:00
13e84e460b Use unaligned store intrinsic to enable vectorized reductions on unaligned buffers
Summary: When performing reductions on fp16 buffers, gloo assumed that both buffers were either aligned to 32 bytes or misaligned by the same offset. This may not hold in intermediate steps of halving-doubling allreduce, when the reduction is performed on some offset within the receive buffer. The fix is to use intrinsic instructions that work with unaligned pointers.

Reviewed By: akyrola

Differential Revision: D5450103

fbshipit-source-id: 9a1c8f8c34d2e62223f6d5c21573ea1cfad6537f
2017-07-19 11:06:32 -07:00
4d5d9de541 Merge commit '768b7c0dee34b614ab1cd8f89c69ec7d86c19c88' 2017-07-19 12:22:36 -04:00
9da882e396 Merge commit 'ae3a8d5d2eaa1b15d825b86ce706b046e68733b8' 2017-07-19 12:21:52 -04:00
15bece50d1 Merge commit 'cfcf2af95f91a88ec61cbcac8b30a718e7332aa5' 2017-07-19 12:20:54 -04:00
8144f7c95d Merge commit '58334a0c4b3c386931293f7fbee3d2cf066221a5' 2017-07-19 12:20:20 -04:00
b660303a16 Static linking against libstdc++ in Binary Build mode 2017-07-19 12:19:36 -04:00
768b7c0dee Static linking against libstdc++ in Binary Build mode 2017-07-19 11:23:31 -04:00
ae3a8d5d2e Static linking against libstdc++ in Binary Build mode 2017-07-19 11:23:21 -04:00
58334a0c4b static MKL detection and linkage fixes 2017-07-19 11:22:46 -04:00
cfcf2af95f add explicit BLAS linkage to THC when linked against magma (in binary build) 2017-07-19 11:22:23 -04:00
f3df24269d Merge commit '975550512200cfa1ae18e21400e7efa3924a3d46' 2017-07-19 11:05:51 -04:00
c4120f34bf move to model with cuda indexing tensors for cuda tensor adv indexing 2017-07-19 11:05:10 -04:00
9755505122 move to model with cuda indexing tensors for cuda tensor adv indexing 2017-07-19 11:04:49 -04:00
8b42308f71 Bug in line 381 (sparse) (#2130)
The function iterates over columns and sets "sparsity" fraction of entries in each column to 0. The number of zeros in a column (num_zeros) is then ceil(rows*sparsity)
2017-07-18 22:55:06 -04:00
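The sparse-init fix above is about how many entries per column get zeroed. A minimal sketch of the described behaviour (illustrative helper, not the library's actual code):

```python
import math
import torch

def sparse_init(tensor, sparsity, std=0.01):
    # Zero out ceil(rows * sparsity) randomly chosen entries in each column,
    # leaving the remaining entries drawn from N(0, std).
    rows, cols = tensor.size()
    num_zeros = int(math.ceil(rows * sparsity))
    tensor.normal_(0, std)
    for col in range(cols):
        for row in torch.randperm(rows)[:num_zeros]:
            tensor[row, col] = 0
    return tensor
```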
685ae4813e Squash "macro expansion producing 'defined' has undefined behavior" warnings.
Fixes #2141.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-18 22:24:55 -04:00
a0fef9dd22 Merge commit '703429d49eb397102ba20e6d4c0dd7714be001a5' 2017-07-18 20:17:26 -04:00
703429d49e Make clang shut up about class/struct mismatch.
Makes us -Werror clean again, I think.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-07-18 20:16:20 -04:00
567d95fa09 Merge pull request #25 from killeent/nullable-tensors
add support for Null Tensors to functions
2017-07-18 17:35:02 -04:00
7914d67ce3 Merge pull request #20 from killeent/type-equality
operator== for type
2017-07-18 14:32:45 -07:00
8451468d8b still generate multiple versions 2017-07-18 14:31:35 -07:00
138b216686 add support for Null Tensors to functions 2017-07-18 07:51:51 -07:00
6f6d70ffed Merge commit 'dc5854477951765f5edbac34b0c228449de1b56b' 2017-07-18 01:34:54 -04:00
dc58544779 fix baddbmm for expanded tensors 2017-07-18 01:33:59 -04:00
e13704c467 fix shadowed variable name
Summary: When compiled with -Werror=shadow-compatible-local, a variable name cannot be reused. This passed our tests, but some people use stricter settings to compile.

Differential Revision: D5440805

fbshipit-source-id: a246af748717fb7e0e7a321e1ac4ddfef68ae524
2017-07-17 19:10:30 -07:00
e9dd8e0e3b Use one key for all pairs per node
Summary: To reduce round trips with store handlers, it is better to store all addresses in one key instead of one address per pair. This is what this implements.

Reviewed By: andrewwdye

Differential Revision: D5435893

fbshipit-source-id: 2d3ea3a2822c3b934ff2578d44a262e7bfbde6d0
2017-07-17 17:35:19 -07:00
a3c9054245 Add comments in loss.py (#2128) 2017-07-17 13:56:19 -04:00
c7b624651e CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:24:13 -07:00
ba544aa0ad Add comments in nn.ELU (#2111) 2017-07-16 23:04:11 -04:00
849fb1f7e3 Fix when running with python -O (#2120) 2017-07-16 13:51:14 -04:00
16dd997239 Spelling tweaks for documentation (#2114) 2017-07-15 13:16:32 -07:00
1c0135b6f2 CreateCommonWorld: pass timeout for storehandler
Summary: Use the CreateCommonWorld timeout for the storehandler as well, not just the device connect.

Reviewed By: andrewwdye

Differential Revision: D5425923

fbshipit-source-id: 936d2129e2db3bfed8759ca097b75843d3931d5f
2017-07-14 19:20:11 -07:00
a7d82b935f Merge commit '9851ef4979bad0c8618e586e711c1bfd8648fd52' 2017-07-14 17:31:21 -04:00
af7aea9f17 Merge commit 'f805a8388be8dc55af0e3aa165b13cd0fce484d3' 2017-07-14 17:29:50 -04:00
366299f9f3 Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:29:06 -04:00
9851ef4979 Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:28:14 -04:00
f805a8388b Wrap unbiased flag in var, std, varall, stdall 2017-07-14 17:25:25 -04:00
2f7b6db429 Merge commit 'd2874c560ebd197297ef737a084b6f7ee3f03dc6' 2017-07-14 17:21:16 -04:00
16203f3325 fix test 2017-07-14 17:04:21 -04:00
80d067e70f retain_variables -> retain_graph (#2107)
Closes #1928
2017-07-14 16:45:25 -04:00
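The rename above affects the keyword used when backpropagating through the same graph more than once; a small self-contained sketch:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
loss = (x * x).sum()
loss.backward(retain_graph=True)   # formerly spelled retain_variables=True
loss.backward()                    # second pass works because the graph was retained
```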
d2874c560e lint fixes 2017-07-14 16:32:15 -04:00
83596bdcb1 produce a Declarations.yaml file that describes Functions/Type/Tensor methods that framework produced. 2017-07-14 12:34:03 -07:00
f3f8ce44bd Merge pull request #18 from soumith/master
Fix handling of if_true/if_false in ATen
2017-07-14 15:16:07 -04:00
33ac9cdc10 add ATen tensor support to pytorch tuple_parser (#2102) 2017-07-14 13:56:02 -04:00
38ba935547 operator== for type 2017-07-14 10:39:40 -07:00
128e02d792 allow type inference to work on TensorList 2017-07-14 10:27:05 -07:00
7ee7542fc8 Fix handling of if_true/if_false in ATen 2017-07-14 11:58:03 -04:00
52a9367fa7 Fix minor typo (#2100)
Fixed minor typo in Autograd mechanics docs.
2017-07-14 10:20:13 -04:00
08bb3b7cc8 Merge commit '7e498d2219c8dbeb801fc4cefa36b147bbf76ff4' 2017-07-14 02:55:55 -04:00
43eaa28b9f fix empty Tensor mmap 2017-07-14 02:55:05 -04:00
7e498d2219 fix empty Tensor mmap 2017-07-14 02:54:39 -04:00
d6bc2642e7 Add ignore_index to NLLLoss2d 2017-07-13 23:22:48 -04:00
7d3511f5f2 Half fixes for ATen and CUDA 9.0 2017-07-13 22:52:39 -04:00
a5a8ab10b0 fix Hardtanh argument names to be consistent between functional and Module 2017-07-13 22:46:51 -04:00
25b591eb05 lint fixes 2017-07-13 22:41:01 -04:00
06f94a7d59 better error message when thread_local is not supported (#2092) 2017-07-13 22:32:10 -04:00
027264cd64 Merge commit '9e720f15477d2d7a388c5b5ec7d397fa5706d64f' 2017-07-13 19:59:07 -04:00
7c14c377df Merge commit 'd8fee1ebe675b9d31894ac79145f2b2629e322e4' 2017-07-13 19:25:56 -04:00
c674923bcc Merge commit 'ed6f5d7038f0e3873c2ed6add2ede7c9ab38e1ea' 2017-07-13 19:24:22 -04:00
d8fee1ebe6 add launch_bounds to greedy kernels 2017-07-13 19:23:29 -04:00
ed6f5d7038 add launch_bounds to greedy kernels 2017-07-13 19:23:24 -04:00
9e720f1547 fix bug in method declarations 2017-07-13 16:22:52 -07:00
ab26fa01e6 install vision in devel dockerfile, minor fixes to dockerfile (#2090) 2017-07-13 19:06:41 -04:00
f4ae64a6c7 add isCUDA() on Type 2017-07-13 15:13:20 -07:00
07fcd977bb add cudnn data type processing for ATen tensor (#2087) 2017-07-13 16:37:53 -04:00
54cabb8bf3 Correct negative dim behavior in torch.stack (#2084)
Fixes #1950
2017-07-13 16:29:31 -04:00
42485d87c2 Set the current device in each engine's thread (#2081)
Fixes #2017
2017-07-13 16:24:38 -04:00
007d6ad816 write generated_cpp. to a file rather than as output to make error reporting clearer. 2017-07-13 11:04:52 -07:00
abd433fa07 Merge commit '6db960fbcff7ae194c6827c73113c222391f2c3e' 2017-07-13 13:49:26 -04:00
6db960fbcf dont clobber gen.py error, fix for old versions of python 2017-07-13 10:45:14 -07:00
384f03f1be Merge commit '48b797a785c1fc6ea34398985c49b2c7c55d28ae' 2017-07-13 10:40:58 -04:00
c011d4f3d6 resolves #1991 (#2073) 2017-07-13 09:57:33 -04:00
f98c384973 Raise error when call from_numpy on 0-dim array (#2075)
* Raise error when call from_numpy on 0-dim array

Fixes: #2055

* reword error message
2017-07-13 09:56:12 -04:00
48b797a785 fix lint 2017-07-13 03:22:31 -04:00
8983bf13f4 fix max and min docs 2017-07-13 03:03:27 -04:00
20ce45b0c3 fix EmbeddingSum offsets initialization 2017-07-13 02:57:25 -04:00
1e98155711 long ->size_t 2017-07-13 02:40:44 -04:00
1c14178c65 fix osx compilation 2017-07-13 02:38:56 -04:00
37183e91de add normalize docs to sphinx 2017-07-13 02:31:57 -04:00
14337693d0 Merge commit 'b900a49308cb0363d00add7e123b824fda3eab37' 2017-07-13 01:01:38 -04:00
58e4caf80f add missing docs 2017-07-13 01:01:04 -04:00
b900a49308 Merge pull request #11 from soumith/master
Fix ATen build for debug python
2017-07-12 21:51:36 -07:00
c888857461 Conv double backward groups (#1993)
* add support for groups in double backward

* add tests for group in double backward

* fix lint

* separate some tests to reduce number of test cases

* remove redundant testing for different number of output channels
2017-07-13 00:41:14 -04:00
7053b84c0e Merge commit '41abcd4b41308b3453cce6731d896d094b23c62a' 2017-07-13 00:39:35 -04:00
8304dc4d68 Merge commit '703ccbb8cbe1c4ce3eeb62548ce51f71181883d6' 2017-07-13 00:39:03 -04:00
c48d50a2e2 Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:38:23 -04:00
41abcd4b41 Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:37:20 -04:00
703ccbb8cb Advanced Indexing: Calculate linear offsets directly on the GPU when working with CUDA Tensors 2017-07-13 00:37:13 -04:00
27da4eafc2 Remove more advanced indexing duplicate tests (#2071) 2017-07-13 00:30:52 -04:00
459cb697b5 Merge commit 'ce96b84ccbdfbbee7f744942b1bb9fdc5924e442' 2017-07-13 00:26:06 -04:00
ce96b84ccb Check for shared_mem size in multinomial single-sample implementation
Handle limited shared memory on function torch.multinomial

Update THCTensorRandom.cu
2017-07-13 00:25:13 -04:00
feddb03d58 LP pooling kernels 2017-07-12 19:31:06 -07:00
fe3802d724 match PyTorch syntax 2017-07-12 16:58:57 -07:00
b8d0c7fc0d checked cast does it all 2017-07-12 14:41:04 -07:00
ea563c1df1 Make weight norm pickleable (#2066) 2017-07-12 17:21:22 -04:00
2520459617 cpu lp pooling 2017-07-12 14:21:17 -07:00
841173c530 Use NamedTemporaryFile to avoid filename collisions (#2069) 2017-07-12 17:14:42 -04:00
f4c502e8a8 basic cat implementation in ATen 2017-07-12 12:04:24 -07:00
593c5e12e1 Merge commit 'be18499e852d8b292491e27d87dadebe68931fc3' 2017-07-12 14:55:21 -04:00
dc2ed7fd33 Fix ATen build for debug python 2017-07-12 14:52:03 -04:00
81fd2bf2d0 fix some language / typos 2017-07-12 14:47:36 -04:00
8915e2710c Refactor scatter/gather and add distributed docs 2017-07-12 14:47:36 -04:00
ebd5c085dc Fix a memory leak in DataChannelTCP 2017-07-12 14:47:36 -04:00
a9759ef401 Fix undefined symbol errors in THD 2017-07-12 14:47:36 -04:00
f899eafe85 Merge commit '5894864a1c5c9596da0ae88b477ee421e3a5065b' 2017-07-12 14:33:47 -04:00
169ca67a4e Adding Spatial Transformers w/CuDNN support 2017-07-12 14:32:06 -04:00
5894864a1c Adding Spatial Transformers w/CuDNN support 2017-07-12 14:31:14 -04:00
41c8fee3e7 Merge commit '7c10f1b932fbebdf0e9105f2848229ea22109747' 2017-07-12 12:57:52 -04:00
bb891758bf Merge commit 'a20729244b43f7072797cc5e93898df795455e5b' 2017-07-12 12:57:12 -04:00
7c10f1b932 Avoid two unnecessary copies in addmm backward
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-12 12:56:17 -04:00
a20729244b Avoid two unnecessary copies in addmm backward
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-12 12:56:08 -04:00
a74fb22b9a fix inplace division for python3 (#2063) 2017-07-12 11:37:55 -04:00
0d91048639 add dummy tensor.data property, to provide interpretable error message to users (#2058) 2017-07-12 10:22:08 -04:00
10e23943b3 Fix missing _forward_pre_hooks in serialized modules (#2057) 2017-07-11 18:23:35 -04:00
be18499e85 Fix a few C++ warnings
1) Type needs a virtual dtor
2) Tensor move ctor should be noexcept
3) Make constructors from Context* and Type* explicit
2017-07-11 15:18:15 -07:00
1037f30e41 add some documentation to Tensor 2017-07-11 11:00:45 -07:00
78ecc2d3b1 Alias multinomial sampling in Cuda (#784)
* Support Multinomial Alias sampling in cuda

Moving benchmark file

* Review changes
2017-07-11 13:23:35 -04:00
f483679425 Implementation of Alias Multinomial for faster Multinomial sampling (#1046) 2017-07-11 13:22:36 -04:00
dfd5d8d0fe Avoid two unnecessary copies in addmm backward (#1971)
The `r_` and `t` tensors become different objects, even though they
point to the same data. Avoid the copy whenever beta=0.
2017-07-11 11:55:22 -04:00
158c7e86dd add basic gitignore, thpp -> at doc fix 2017-07-11 08:32:58 -07:00
73128f7b08 fix minor typos (#2051)
* Update extending.rst

fix typo

* Update cuda.rst

fix typo
2017-07-11 11:01:41 -04:00
f536c662bf fix op in docs (#2048) 2017-07-11 10:36:19 -04:00
2ecb18881c add DynamicType variants for ATen functions. 2017-07-11 10:35:03 -04:00
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
ab3d85c410 add build commands for ATen 2017-07-11 10:35:03 -04:00
e58e27cf16 Add 'torch/lib/ATen/' from commit '9d0c674cb7bcfae989d69f988363c1688c22fa89'
git-subtree-dir: torch/lib/ATen
git-subtree-mainline: 3314d51dcc1535dc2d00d357be889807d1bb8c57
git-subtree-split: 9d0c674cb7bcfae989d69f988363c1688c22fa89
2017-07-11 10:33:24 -04:00
3314d51dcc Add __repr__ to Avgpool and maxunpool layers (#2047) 2017-07-11 10:13:22 -04:00
1ef1dd9cad Add comments for readability (#2005) 2017-07-10 23:02:56 -07:00
98206c326e Fix ref counting in wrapped tuple functions (#2042)
Fixes #1963
2017-07-10 18:46:06 -04:00
9d0c674cb7 always use a custom default float 2017-07-10 15:37:18 -07:00
bff762c3ff python style fixes 2017-07-10 15:37:07 -07:00
10a8ccf27f only test gets for advanced indexing with duplicates (#2041) 2017-07-10 16:05:55 -04:00
0a9e8a23ef add atan2 function to autograd (#2040) 2017-07-10 16:04:35 -04:00
8b003565ec remove inaccessible median variant (#2015)
With the addition of medianall() this variant can no longer be accessed, because both it and medianall take no arguments.
2017-07-10 10:42:45 -04:00
53ac2d46c6 Fix typos in docstrings. (#2034) 2017-07-10 10:35:46 -04:00
318ea29a86 Merge commit 'ab3a9e177ee5eb7d39de2d385ba1e141858e8329' 2017-07-10 10:30:24 -04:00
ab3a9e177e Fix sdot_ bug for runtime F2C symbol conflicts by using cblas where available 2017-07-10 10:29:26 -04:00
46a868dab7 [Ready] Limit docs line length (#1900)
* some docs are ready

* docs

* docs

* fix some more

* fix some more
2017-07-10 10:24:54 -04:00
581921f696 support unsafe functions for getting/constructor tensors from TH objects for backward compat. 2017-07-09 21:25:38 -07:00
0025e1c776 Fix typos in the docstrings of Conv3d, AvgPool3d and MaxPool3d (#2030)
* Fix a typo of the docstring of Conv3d

* Fix typos in docstrings of 3D operations.
2017-07-09 23:20:07 -04:00
9cba97a833 Pairwise-exchange benchmark with bandwidth measurement
Summary: A simple benchmark to determine network bandwidth for pairwise communication.

Reviewed By: plapukhov

Differential Revision: D5159607

fbshipit-source-id: d16c3ed3a0c2ae182138df91bdae821f5508c6ac
2017-07-09 15:55:20 -07:00
c6d7e1e6bf added input size checks to batchnorm (#2020) 2017-07-09 15:31:24 -04:00
49f679d0e9 Acknowledge the existence of cpu HalfTensor (#2018) 2017-07-08 10:03:36 -04:00
f0788afb0c lazily initialize cuda so that we behave similar to PyTorch 2017-07-07 22:21:31 -07:00
a4dc7dcd04 osx build issues and clang warnings 2017-07-07 11:50:02 -07:00
5dd05ed8ee remove Sparse from dispatch for now, will add dispatch variants later 2017-07-07 11:40:08 -07:00
0a34f05d5b Always include THNN in the build, don't check for CUDA twice
As a result, the project builds on MacOS with gcc-6 (without CUDA).
2017-07-07 14:14:02 -04:00
4fda678a85 fix build issue when cuda does not exist 2017-07-07 10:54:17 -07:00
ebdec9a837 Skip distributed tests if not supported (#2004) 2017-07-07 11:06:56 -04:00
c3c7845572 added asserts that grad_output + input are contiguous (#2000) 2017-07-07 09:14:02 -04:00
90d0762d14 Use torch.arange instead of torch.range in test_torch.py (#1996) 2017-07-07 00:06:31 -04:00
73fead9f8f add shape alias (#1983) 2017-07-05 19:12:37 -04:00
3748b6d3eb Data parallel fix for https://github.com/pytorch/pytorch/issues/1857 (#1880)
* Data parallel fix for https://github.com/pytorch/pytorch/issues/1857
searches recursively for variable in input

* parallel_apply.py lint
2017-07-05 11:46:00 -04:00
b3589b04fd Fix exceptions not being caught (#1948)
Adding -fexceptions to both torch and pytorch C/C++ builds fixes tests
not passing.

Closes #1297
2017-07-05 00:25:39 -04:00
5964394a4c return empty iter when tensor is empty 2017-07-04 17:29:27 -04:00
1aaa24d99b add medianall prototype to docs 2017-07-04 16:52:36 -04:00
295ed7e264 Merge commit 'ab7d4e2bcea5cae8f05873fb0bbb31985cc58d47' 2017-07-04 16:47:48 -04:00
ab7d4e2bce add missing definition 2017-07-04 16:46:04 -04:00
ae65236490 Fix typo 2017-07-04 15:19:05 -04:00
c2069a15e0 Merge commit '56df97ce939985a30dcfefb1136bf45faf64413c' 2017-07-04 15:18:14 -04:00
56df97ce93 remove unnecessary contiguous assertion 2017-07-04 15:17:15 -04:00
89c682dfb9 Merge commit '0dbf871d9ec424f1a7897af77bf93219d3be23bf' 2017-07-04 14:56:53 -04:00
ae839f4b2e Merge commit 'f425c5216b7fe35dd03e0161a3440ec968c63636' 2017-07-04 14:56:22 -04:00
05c2bafc9d Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:37 -04:00
0dbf871d9e Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:30 -04:00
f425c5216b Have median reduce over all dims and return just the value when dim is not provided 2017-07-04 14:55:19 -04:00
635bb5ec9d corrects typo 2017-07-04 11:09:40 -04:00
a7f6b0ab4f Merge commit 'e5bac2dd2d69772938482c1431db1fc1efb64c6f' 2017-07-03 20:41:28 -04:00
e5bac2dd2d Add critical section to BLAS gemm.
This is needed because of possible races in SpatialConvolutionMM (and others that use gemm)
if the BLAS library is not thread-safe.

In terms of performance, there's not much benefit to run two gemms in parallel, because the
BLAS libraries have their own all-occupying gemms anyways.
2017-07-03 20:40:21 -04:00
ec8da55a7d bind THS THCS, leaving all operators unimplemented. This is required because THPP can represent Sparse tensors even though the wrapper doesn't implement any operators. 2017-07-03 16:52:41 -07:00
b4414c0dc3 Handle None in modules list.
It's often useful to add None to an nn.ModuleList to keep the indexing
of the module list to match some other property.
2017-07-03 18:53:21 -04:00
39edc378fb Fix lint. 2017-07-03 18:51:22 -04:00
f6578c1b24 Implement double backwards for Dropout and FeatureDropout. 2017-07-03 18:51:22 -04:00
daa84e7663 Implement bilinear double backward. 2017-07-03 18:51:22 -04:00
1aa145dbac Implement ConstantPad2d double backwards. 2017-07-03 18:51:22 -04:00
d4b8834131 Improve non-contiguous testing in TestAutograd: (#1933)
* Improve non-contiguous testing in TestAutograd:
1) Test gradcheck and gradgradcheck with non-contiguous inputs
2) Test gradgradcheck with non-contiguous gradoutputs (gradcheck would take more work)
3) Fix discovered issue in Prod backwards.

* Simplify non-contiguous setting wrt View.
2017-07-03 18:49:52 -04:00
699d1ec7fb Address flaky Norm test issues:
1) Add a correction for 1.5 norms to ensure input can't be zero.
2) Increase test tolerance.
2017-07-03 18:48:22 -04:00
05062a1439 Better handle random seeds in tests.
Previously, there were 2 issues with test_autograd randomness:
1) Many random operations (e.g. random selection in prod_zeros) happened
   before the torch random seed was set (because it was set in run_tests
   at the end of the file).
2) The random seed was not set consistently: run_tests would set it to the
   proper value, but each call to setUp would set it to 0 (because SEED wasn't
   global in run_tests), which made setting the seed mostly worthless.
2017-07-03 18:48:22 -04:00
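The seeding fix above boils down to seeding once per test, before any random data is built. A hedged sketch of the pattern (the SEED value is hypothetical):

```python
import random
import unittest
import torch

SEED = 1234  # hypothetical value; the point is one seed, applied consistently

class TestAutograd(unittest.TestCase):
    def setUp(self):
        # Seed before any random tensor construction so test data does not
        # depend on import order or on a later call to run_tests.
        torch.manual_seed(SEED)
        random.seed(SEED)
```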
e187ba7a9f Decrease likelihood that Fmod/Remainder tests fail due to numerical jacobian check.
Previously, these tests added 5e-2 to the denominator tensor (the same as the div
tests), which only avoids divide by 0, but not issues with computing the numerical
jacobian due to non-linearity of fmod/remainder, when input / divisor is close to an
integer.  These tests now add 1.5 to the denominator, which is the same as the non-tensor
version of the tests; Note that we can still hit the above condition but it will be much
less likely.
2017-07-03 18:48:22 -04:00
35ed224d04 Merge commit '8a24f2b4d8646de10b497c2eca2f1edc525a1e09' 2017-07-03 00:49:59 -04:00
72b292d45c Merge commit '733a7c6d9a22dfc9be1b11d47384991208658bfb' 2017-07-03 00:49:52 -04:00
5b4cd9bb49 Merge commit 'c691fc6dc711814a06107d4a9b763f34bff5afca' 2017-07-03 00:49:34 -04:00
c691fc6dc7 Add a nonContigDim reduction kernel to improve latency for small tensors. (#768) 2017-07-03 00:39:40 -04:00
42cf68b402 Make reduction functors accept only constant arguments (#753)
(similar to MaxValuePair and MinValuePair above).
2017-07-03 00:35:39 -04:00
8a65ef1098 cc 2.0 -> 3.0 in docs. 2017-07-02 22:08:42 -04:00
406040f6a9 fix torch.is_tensor not recognizing HalfTensor (#1934) 2017-07-02 10:13:44 -04:00
e26139b7f7 fixed shapes in GRU and LSTM docs. 2017-07-01 23:15:10 -04:00
457587088a Fix broadcasting issues in binary_cross_entropy_with_logits (#1944)
* done re-seed cuda device if in bad fork

* avoid broadcasting in binary_cross_entropy_with_logits

* assert input sizes for BCEWithLogitLoss

* added check that BCEWithLogitsLoss == Sigmoid + BCELoss

* fix flake8 issues

* rename test_bce_with_logits_gives_same_result_as_bce_and_sigmoid -> test_bce_with_logits_gives_same_result_as_sigmooid_and_bce_loss

* add warning in BCELoss about input shapes

* fix lint
2017-07-01 23:06:36 -04:00
da0fad8a7a Use torch.matmul in nn.Linear (#1935)
This takes advantage of the broadcasting behavior of torch.matmul to
support inputs with more than two dimensions. The extra dimensions are
treated like part of the batch dimension, much like nn.Bottle in Lua
Torch.

There are a few related small performance changes:

 * Addmm computes the gradient in column-major for inputs in
   column-major format
 * Variable.mm calls Addmm in-place with the desired output buffer
2017-06-30 16:53:26 -04:00
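The nn.Linear change above relies on torch.matmul treating leading dimensions as batch. A sketch of the resulting behaviour (hypothetical helper, not the library's exact code):

```python
import torch
from torch.autograd import Variable

def linear(input, weight, bias=None):
    # torch.matmul broadcasts: all leading dimensions of `input` act as batch,
    # so inputs with more than two dimensions work without reshaping by hand.
    output = torch.matmul(input, weight.t())
    if bias is not None:
        output = output + bias
    return output

x = Variable(torch.randn(4, 5, 10))   # (batch, seq, in_features)
w = Variable(torch.randn(3, 10))      # (out_features, in_features)
print(linear(x, w).size())            # (4, 5, 3)
```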
2c038f2074 Add weight normalization implementation (#1945)
* Add weight normalization implementation

This adds forward "pre-hooks" which get called before the module's
forward() method. Weight norm is implemented as a hook which calculates
the weight variable from the weight_g and weight_v every iteration.

Based on @rtqichen implementation.

* Specify return type
2017-06-30 15:41:40 -04:00
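A short usage sketch of the weight-norm hook described above (shapes are illustrative):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.utils import weight_norm

# weight_norm registers a forward pre-hook that recomputes
#   weight = g * v / ||v||
# from the weight_g / weight_v parameters before every forward() call.
m = weight_norm(nn.Linear(20, 40), name='weight')
print(hasattr(m, 'weight_g'), hasattr(m, 'weight_v'))   # True True
y = m(Variable(torch.randn(8, 20)))
```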
b3e500c522 fix docs generation warnings 2017-06-30 14:39:21 -04:00
b3f6ff1b3d Fix unused linker argument warnings. (#1958)
* Fix unused linker argument warnings.

This patch began when I noticed the following clang warning:

clang: warning: -Wl,-rpath,RIGIN: 'linker' input unused
clang: warning: argument unused during compilation:
'-L/home/ezyang/local/pytorch/torch/lib/tmp_install/lib'

The warning is minor, but I was a bit worried our rpath wasn't
set up correctly.  Actually, it was, and there wasn't a problem,
but I had to spend some time figuring out exactly what was going
on, and by the end of it, I might as well fix the warning.  In the end, I ended
up filing two upstream tickets for ccache and cmake:

- https://github.com/ccache/ccache/issues/189
- https://gitlab.kitware.com/cmake/cmake/issues/17025

We can remove the warning by using CMAKE_EXE_LINKER_FLAGS and
CMAKE_SHARED_LINKER_FLAGS, which have sane macro expansion rules
(although still slightly insane: the first level of escaping gets removed.)
To ensure that the rpath was being set correctly, I ran
objdump -x torch/lib/build/TH/libTH.so | grep RPATH and verified that ORIGIN
was setup correctly.

I also considered using CMAKE_INSTALL_RPATH, but the rpath here doesn't
seem to get set until you actually install, which is a change in behavior,
and I wasn't sure if anyone was relying on rpaths being setup in the build
directory.

There is a SLIGHT behavior change, in that if we happened to need these
LDFLAGS passed to the static linker, they won't get passed. I don't
think we ever build static libraries today so this shouldn't be a problem.

P.S. Because of the ccache bug, you may continue to see these warnings
after this patch.  If you apply https://github.com/ccache/ccache/pull/190
and clear your cache, it will solve the problem.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove unnecessary -Qunused-arguments

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-30 14:15:31 -04:00
6df23b418d mark tools as excluded in find_packages (#1915) 2017-06-29 13:49:56 -04:00
e5b5154768 Make cudnn warnings clean. (#1940)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-29 10:58:04 -04:00
bfaddc0a19 Warp intrinsic fixes (#785) 2017-06-29 00:14:07 -04:00
4d5075add2 Add ignore_index to nnl_loss and cross_entropy (#1937) 2017-06-29 00:10:13 -04:00
0a95613cef Improve error message when accessing attributes that don't exist (#1936)
New:
   >>> torch.autograd.Variable(torch.randn(3, 3)).foobar
   AttributeError: 'Variable' object has no attribute 'foobar'

Old:
   >>> torch.autograd.Variable(torch.randn(3, 3)).foobar
   AttributeError: foobar
2017-06-28 20:13:15 -04:00
8a4eb50ed1 Speed up torch.matmul for 3D+ x 2D/1D tensors (#1931)
If the left tensor is 3D+ and the right tensor is at most 2D, we can
fold the batch into the matrix dimension and use torch.mm instead of
torch.bmm. In practice, this is faster especially if the right tensor is
column major.
2017-06-28 17:43:21 -04:00
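The speedup above comes from folding the batch dimension into the matrix dimension. A minimal sketch of the idea:

```python
import torch
from torch.autograd import Variable

a = Variable(torch.randn(10, 32, 64))   # 3D left operand
b = Variable(torch.randn(64, 16))       # 2D right operand

# Fold (10, 32) into one dimension so a single mm suffices instead of a batched bmm.
out = a.contiguous().view(-1, 64).mm(b).view(10, 32, 16)
print(out.size())   # (10, 32, 16)
```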
b5e1df046e fixed typo in formula of GRU in doc (#1921) 2017-06-28 11:02:06 -04:00
08648061f7 Advanced Indexing 2A - Colons + Adjacent Adv Indexers (#1890) 2017-06-28 10:01:45 -04:00
4c35c630ec Enable norm gradgradchecks by lowering precision requirements. 2017-06-27 18:44:14 -04:00
3744efeaf8 Fix double backwards for prod. 2017-06-27 18:44:14 -04:00
bc032be13e Implement negative dimensions and double backwards cumprod. 2017-06-27 18:44:14 -04:00
f814a892cf done re-seed cuda device if in bad fork (#1923) 2017-06-27 13:24:52 -04:00
d592e188f7 port of ConcatDataset (#1902) 2017-06-27 12:31:56 -04:00
ae61f3ff42 adds poisson NLL loss (#1779) 2017-06-27 10:04:54 -04:00
1f391a42f7 fix warnings for docs generation 2017-06-27 00:18:32 -04:00
b933423495 support more than 8 gpus (#774) 2017-06-26 16:49:14 -04:00
ee1b7b50b3 fix docs for broadcast warning 2017-06-26 14:50:57 -04:00
7cdd018db4 Fix assertEquals for lists and tuples (#1913)
zip finishes once the first iterator is exhausted, so we were erroneously allowing things like assertEquals([1, 2], [1]) to pass.
2017-06-26 14:13:21 -04:00
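A minimal sketch of the idea behind the assertEquals fix (illustrative helper, not the actual test harness code):

```python
# zip() stops at the shorter iterable, so an element-wise comparison alone would
# let a check like assertEqual([1, 2], [1]) pass; comparing lengths first closes that hole.
def assert_sequences_equal(a, b):
    assert len(a) == len(b), "length mismatch: %d vs %d" % (len(a), len(b))
    for x, y in zip(a, b):
        assert x == y, "%r != %r" % (x, y)

assert_sequences_equal([1, 2], [1, 2])
```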
7806a09f03 Fp16 fixes for CUDA 9 (#783) 2017-06-26 11:38:18 -04:00
7523c49f03 add missing INCREF 2017-06-26 11:33:16 -04:00
733a7c6d9a Fix segfault in SpatialDepthWiseConvolution w/o bias 2017-06-26 16:33:45 +02:00
32e666551a Fix lint. 2017-06-24 09:45:21 -04:00
ab0c321f80 Fix index_copy gradgrad test by ensuring indices cannot be repeated. 2017-06-24 09:45:21 -04:00
9db14936eb Ensure masked_select tests don't have masks of all zeros which yields
0-dimensional tensors.
2017-06-24 09:45:21 -04:00
e5857c5f1c Implement Gather double backwards. 2017-06-24 09:45:21 -04:00
7da77c4255 Add ScatterAdd autograd function. 2017-06-24 09:45:21 -04:00
656cb1c31a Implement and test double backwards for IndexCopy. 2017-06-24 09:45:21 -04:00
4ab4938cf0 Fix and test single backwards IndexCopy. 2017-06-24 09:45:21 -04:00
1324c4b081 Implement double backwards for masked_scatter. 2017-06-24 09:45:21 -04:00
bb3779efe8 Add broadcasting to masked_select. 2017-06-24 09:45:21 -04:00
7c24a3d5cf fix arguments for cudnnFindEx for transposed wgrad 2017-06-23 23:18:32 -04:00
194bc404b5 CUDA 9
Summary:
Adds basic CUDA 9 support, including adding Volta arch, and making appropriate modifications for half precision datatype changes
Closes https://github.com/facebookincubator/gloo/pull/49

Differential Revision: D5315336

Pulled By: pietern

fbshipit-source-id: 6468b0f357206d604bdcfec69ba82509a2c91407
2017-06-23 16:41:27 -07:00
a9ea975977 enable warnings in build and fix warnings 2017-06-23 11:49:09 -07:00
b1a84e3c70 update readme and add assign_(Scalar) variant 2017-06-23 11:27:55 -07:00
8a24f2b4d8 Fix segfault in SpatialDepthWiseConvolution w/o bias 2017-06-23 11:14:00 +02:00
66d93b60b3 fix a bug with scalar handling by simplifiying the maybeScalar check. 2017-06-22 23:07:56 -07:00
2af6ba3b2a handle select and operator[] style operations 2017-06-22 22:57:43 -07:00
b59b44fac7 add checks for scalars on output 2017-06-22 21:46:04 -07:00
a10a1c92b1 start adding rules to propagate scalar to results 2017-06-22 20:51:02 -07:00
bb6908e163 Scalar objects can now be backed by 0-dim Tensors. 2017-06-22 18:57:09 -07:00
c555cd8253 missing fixed allocator files 2017-06-22 18:32:10 -07:00
5e078bb7cc scalar flags added, and used to dispatch when there is a scalar variant of a function. broadcast annotations are used to figure out when a scalar s + A should also be converted. 2017-06-22 17:22:16 -07:00
ee10e7457f Corrected erroneous docstring for MultiLabelSoftMarginLoss 2017-06-22 17:42:18 -04:00
7cd6cc17af Merge commit '93e05eb458ad4c939e905668c1792692315880b0' 2017-06-22 17:23:02 -04:00
8bfef60b07 Merge commit '32fd4a3d6081a13c18ce4f8dcb37260a830a911f' 2017-06-22 17:22:31 -04:00
a45ad7cfba Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:50 -04:00
93e05eb458 Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:30 -04:00
32fd4a3d60 Advanced Indexing Part 1 -- Purely Integer Array Indexing 2017-06-22 17:21:19 -04:00
f09027bc29 Add batch sampler to DataLoader (#1867) 2017-06-22 20:18:31 +02:00
9a196829e2 Merge commit '43dec0a210103c4421bc73c7e742f0f746b7e39e' 2017-06-22 13:55:54 -04:00
43dec0a210 Remove THCTensor_(expand2) and THCTensor_(expand3).
They are no longer needed and the corresponding TH versions have been removed.
2017-06-22 13:55:08 -04:00
064ef8b81b Merge commit '104234a6a8937f09208061975ce90190a7be4159' 2017-06-22 13:21:59 -04:00
662faf7c41 Merge commit 'a940d4ff8bf5debc76d909a778e2e47d24148ee1' 2017-06-22 13:21:38 -04:00
104234a6a8 add asserts to BCECriterion 2017-06-22 13:20:25 -04:00
a940d4ff8b add asserts to BCECriterion 2017-06-22 13:20:07 -04:00
c16a268f47 Merge commit 'fb32164a72004e63ebfe1f9ca8366ff12f8fbec2' 2017-06-22 12:56:36 -04:00
cb4eaa9c5d TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:55 -04:00
fb32164a72 TensorLib/Aten --> changes required in pytorch 2017-06-22 12:55:17 -04:00
b5854a11c4 Merge commit 'eccc759c36a4023357c87fde79732e4c916676d2' 2017-06-22 12:49:50 -04:00
ddbd4ef4ac Support out-of-place broadcast type definitions. 2017-06-22 12:49:06 -04:00
eccc759c36 Support out-of-place broadcast type definitions. 2017-06-22 12:48:43 -04:00
fecd05ba2f Merge commit '81e14ad2dee356b2c2274eb302bc2438c9a6161a' 2017-06-22 12:46:37 -04:00
a7d1cd75ec Merge commit '93a7c9de29900f166486373744a0e90c7046a56a' 2017-06-22 12:46:02 -04:00
497db732fc btrifact: Make pivoting optional. 2017-06-22 12:45:14 -04:00
81e14ad2de btrifact: Make pivoting optional. 2017-06-22 12:45:01 -04:00
93a7c9de29 btrifact: Make pivoting optional. 2017-06-22 12:44:51 -04:00
96febbb762 Merge commit '62cfc94f445bfaeaccc3dcc1fc69ea5b75039823' 2017-06-22 12:40:40 -04:00
62cfc94f44 improving TH error messages in Apply macros 2017-06-22 12:38:10 -04:00
3f6cda8696 fix bug of threshold activation 2017-06-22 12:23:35 -04:00
a836f8f56f Use and document saved_variables for double backwards. 2017-06-22 11:46:24 -04:00
278cbbae49 set TH_INDEX_BASE to 0 2017-06-21 16:43:16 -07:00
68cbb857f2 allow tensors to be constructed from views of external data. Support creating new tensors that already have a size/stride 2017-06-21 15:35:08 -07:00
a1c557bc45 improve error reporting for undefined tensors passed as arguments. 2017-06-21 12:24:59 -07:00
4c5b7d41ba tensor.data<> also as toLongData() variants. Scalar now also has .to<T>() variants 2017-06-21 11:57:37 -07:00
13e7648fd1 document accessors 2017-06-21 11:23:03 -07:00
1572173ca7 Implement double backwards for Sort, Topk. 2017-06-21 00:24:13 -04:00
e16ceef76a Implement Scatter double backwards. 2017-06-21 00:24:13 -04:00
b79ff11aca Implement IndexAdd, IndexFill, IndexSelect, MaskedSelect double backwards. 2017-06-21 00:24:13 -04:00
50c0912a75 Implemented masked_fill double backwards. 2017-06-21 00:24:13 -04:00
c3ad55f746 add readme and generated files for Type/Tensor/Functions to a doc folder to make it possible to view headers without building the library 2017-06-20 20:33:26 -07:00
4b93f32234 rename TensorLib -> ATen 2017-06-20 16:49:13 -07:00
03f41c8120 fix capitalization of Python, make it consistent 2017-06-21 00:09:37 +02:00
e0b70d0f64 Fix Fmod/Remainder gradgradcheck by ensuring inputs requires_grad. 2017-06-20 11:59:21 -04:00
0b2b7d0594 Kth value function passes gradgradcheck. 2017-06-20 11:59:21 -04:00
6d97ac0c0f Missing includes in cuda_collective_device.h
Summary: Closes https://github.com/facebookincubator/gloo/pull/47

Differential Revision: D5283752

Pulled By: pietern

fbshipit-source-id: 8ad3353b3455c5416e31e75b46755e2f7fcaad52
2017-06-20 08:54:16 -07:00
a405efa756 CUDA collectives as alternative to NCCL
Summary:
Adds a separate set of CUDA collectives that run on device as an
alternative to NCCL. Use these collectives as default on-device
collectives instead of NCCL.

Whenever multiple processes on the same machine use Gloo with NCCL and
end up doing concurrent CUDA memory allocations and algorithm
execution, we risk deadlock. A follow up change will enable opt-in
usage of NCCL (e.g. through environment variable).

Benchmark output below with varying number of elements. It shows a
minor improvement over using NCCL for local reduction and broadcast.

Number of elements equal to on-device threshold (256K):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2685       2907       3035       3215        562
(after)   262144       2682       2874       3013       3395        577

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2045       2133       2325       2643        725
(after)   262144       1533       1673       1834       2048        800

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       1580       1640       1718       2069        893
(after)   262144       1371       1446       1539       1748       1125
```

Larger number of elements (4M):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      55543      58058      60103      62659         32
(after)  4194304      54490      57923      60893      66058         33

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18049      22820      24997      26634        105
(after)  4194304      18356      20463      21695      22589         99

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18584      24345      27809      29722         95
(after)  4194304      19541      22718      25408      26688         88
```

Reviewed By: akyrola

Differential Revision: D5278192

fbshipit-source-id: 53f09e404663ddc8bb46d06ac87afd8ee3ffc3a2
2017-06-20 00:23:43 -07:00
67968cb60b Add numerically stable BCELoss which takes logits as input (#1792) 2017-06-19 22:05:51 -04:00
a6c5e3f2e2 Fix case where interface doesn't have an address
Summary:
Code in tcp/transport tries to find the network interface a socket was
bound to when create a TCP device context. Per getifaddrs(3), it is
possible for the ifa_addr field to be NULL (supposedly when an
interface doesn't have an address). Ignore such entries.

Thanks to slayton58 for reporting this.

Reviewed By: wesolwsk

Differential Revision: D5279376

fbshipit-source-id: 039380b95ba4d6d94942c30581e0b230a060870c
2017-06-19 18:05:32 -07:00
6ee6b4980b multiple docs 2017-06-19 20:06:27 -04:00
ceb13c8cc3 Don't propagate -mavx flag to dependents
Summary:
Previously, `gloo/math.h` inlined methods which use AVX builtins,
which required propagating the `-mavx` flag.
This diff moves these definitions out of the header and into a source
file to avoid this.

Reviewed By: pixelb

Differential Revision: D5271043

fbshipit-source-id: dde4dc560dfb557b46d1a582a8b38e7cb8eb0c37
2017-06-19 16:46:43 -07:00
82ef292f00 Add gradgradchecks for various autograd Functions and support Unfold double backwards. 2017-06-19 18:19:16 -04:00
76ee014d10 Add documentation to SELU and AlphaDropout 2017-06-19 18:18:01 -04:00
f619ac6ac9 Quickfix for AlphaDropout on CUDA 2017-06-19 18:18:01 -04:00
32e6372538 Split cuda_collectives.h into two files
Summary:
This changes prepares for having a separate set of collectives that
use native CUDA calls instead of NCCL. This is needed to workaround
the issue where NCCL deadlocks when it is interleaved with CUDA memory
management operations in other processes on the same machine.

Includes a modification to the host reduction functions to bring them
up to parity with the NCCL reduction functions (they now incorporate
offset/counter arguments).

Reviewed By: wesolwsk

Differential Revision: D5276291

fbshipit-source-id: 8844731760d2c48577d207c026ce0cd641f2fc6d
2017-06-19 12:57:53 -07:00
172a356668 forgotten import in variables.py
Fixing error on line 661: 
warnings.warn("masked_copy_ is deprecated and renamed to masked_scatter_, and will be removed in v0.3")
NameError: name 'warnings' is not defined
2017-06-19 14:23:48 +02:00
329a2f7d27 Prevent divide by zero in dropout with p=1 2017-06-17 11:38:02 -04:00
69e38ee821 clean test code, no functional change 2017-06-17 11:11:48 -04:00
38e6b9c7e7 fix bug in wrap_outputs miscounting the number of inputs 2017-06-17 11:11:48 -04:00
7775e9e777 add newNarrow to thpp THCTensor 2017-06-17 11:11:48 -04:00
293262b8f1 fix cuda tests 2017-06-17 11:11:48 -04:00
e66e01a2a0 remove extra computations for input usage check 2017-06-17 11:11:48 -04:00
0a93903e8e move tests to test_nn 2017-06-17 11:11:48 -04:00
bcac55dd2f force 1 stride for 1-sized dim for cudnn, fix lint, remove extra unpacking 2017-06-17 11:11:48 -04:00
6cdcd9c603 Add Narrow function
clean error message and support non perfectly sized inputs
2017-06-17 11:11:48 -04:00
075030d974 add cuda tests that use only cunn for finite difference computations 2017-06-17 11:11:48 -04:00
23dec70614 comment on working values for epsilon 2017-06-17 11:11:48 -04:00
fc0ab229ad remove extra cloning and add contiguous calls 2017-06-17 11:11:48 -04:00
ce3bc5a4a5 force cloning of weights 2017-06-17 11:11:48 -04:00
3dbece7eb5 clean tests 2017-06-17 11:11:48 -04:00
bd94718c87 cleaner AccumulateGrad 2017-06-17 11:11:48 -04:00
2f8d21a7f2 add contiguous function 2017-06-17 11:11:48 -04:00
4f4fc9091a add support for newTranspose in thpp::THCTensor 2017-06-17 11:11:48 -04:00
7ee095cf7f add newExpand and newView to thpp::Tensor 2017-06-17 11:11:48 -04:00
462ab8a644 add Transpose View Expand C functions 2017-06-17 11:11:48 -04:00
dd5c7c473f Add ConvBackwardBackward class 2017-06-17 11:11:48 -04:00
6dca309017 make AccumulateGrad support no input gradient 2017-06-17 11:11:48 -04:00
f945fbc3dd add gradgradcheck and conv double backward tests 2017-06-17 11:11:48 -04:00
db70d4d223 1) Simplify CompareOp autograd backward
2) Use better approach for avoiding divide-by-0 in autograd tests.
2017-06-17 09:38:28 -04:00
7714b5a088 Fix autograd shape tracking for 1-d reduction ops. 2017-06-17 09:38:28 -04:00
860f51e67f Avoid nans in fmod/remainder tensor tests.
Also clean up CompareOp autograd backwards impl.
2017-06-17 09:38:28 -04:00
2c04ce63a5 Fix masked_scatter autograd broadcasting. 2017-06-17 09:38:28 -04:00
83bfa5e1ab Fix masked_scatter pointwise autograd backward behavior. 2017-06-17 09:38:28 -04:00
618f20fb38 Fix autograd broadcasting for masked_fill. 2017-06-17 09:38:28 -04:00
9711223c12 Add broadcast autograd tests for dist. 2017-06-17 09:38:28 -04:00
7d0f1c51bb Fix autograd broadcast for min, max. 2017-06-17 09:38:28 -04:00
7560474fbb Fix autograd pointwise fallback for max,min. 2017-06-17 09:38:28 -04:00
e69fe5bdb0 Automatically detect when to skip inplace tests and fix lint. 2017-06-17 09:38:28 -04:00
f3ae90e329 Fix broadcast and pointwise compare ops with autograd. 2017-06-17 09:38:28 -04:00
bfdd1f2199 Fix fmod/remainder autograd broadcasting. 2017-06-17 09:38:28 -04:00
b164efb8b0 Fix lerp broadcast autograd. 2017-06-17 09:38:28 -04:00
94c7260087 Fix pointwise fallback for lerp. 2017-06-17 09:38:28 -04:00
aac459431b Fix pow autograd broadcast. 2017-06-17 09:38:28 -04:00
a04d1af0a4 Fix addr, addmm, baddbmm, addmv, addbmm broadcasting with autograd.
Fix autograd broadcast for addmm, baddbmm, others.
2017-06-17 09:38:28 -04:00
a54a7c1312 Fix addcmul, addcdiv autograd broadcasting. 2017-06-17 09:38:28 -04:00
9ba799c26b Fix pointwise fallback for addcdiv, addcmul. 2017-06-17 09:38:28 -04:00
5cfb1329b5 Make implementation of Variable.mul_ and Variable.div_ consistent. 2017-06-17 09:38:28 -04:00
af2dd0d3e9 Fix autograd for broadcasting with add, sub, mul, div. 2017-06-17 09:38:28 -04:00
79a343bbd4 Remove unnecessary squeezing in Expand backwards.
Also add size checks to test_autograd to try to catch such issues.
2017-06-17 09:38:28 -04:00
88e4bec8fa resize bug fix 2017-06-17 11:07:22 +02:00
faa7c2cc2c fix cuda breakage 2017-06-16 20:13:46 -04:00
3cecdf84f1 Storage from_file method (#1821) 2017-06-17 00:34:20 +02:00
49586d9556 Add basic API support for NCCL 2.0
Summary:
\cc pietern
Minimal changes to allow gloo to compile and run with NCCL 2.0
Closes https://github.com/facebookincubator/gloo/pull/46

Differential Revision: D5268074

Pulled By: pietern

fbshipit-source-id: 58d625d57b31cfc932f3dbbdd7a4b83d9a2e60a8
2017-06-16 15:22:14 -07:00
8d33603901 make t() of Variable consistent with Tensor (#1823) 2017-06-16 16:08:53 +02:00
a64560c22e Remove flattening for torch.dot (#1781) 2017-06-16 02:15:33 +02:00
97f50edf46 Add documentation for Cholesky lapack functions (#1816) 2017-06-16 02:10:56 +02:00
86a96cd759 Merge commit 'd605afe8b51bf1522d3caf4efef4b3c85def499b' 2017-06-15 12:33:45 -04:00
f61ec2495e nn.EmbeddingBag to compute a bag of word embeddings (Embedding + Sum/Mean) 2017-06-15 12:32:47 -04:00
d605afe8b5 nn.EmbeddingBag to compute a bag of word embeddings (Embedding + Sum/Mean) 2017-06-15 12:32:28 -04:00
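For readers unfamiliar with the new module, a minimal usage sketch (the constructor arguments and offsets layout below are illustrative assumptions, not taken from the commit):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

bag = nn.EmbeddingBag(10, 3, mode='mean')                      # 10 embeddings of dim 3, mean-pooled per bag
words = Variable(torch.LongTensor([1, 2, 4, 5, 4, 3, 2, 9]))   # flat list of word indices
offsets = Variable(torch.LongTensor([0, 4]))                   # bag 0 = words[0:4], bag 1 = words[4:]
out = bag(words, offsets)                                      # one pooled embedding per bag -> size (2, 3)
```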
909f31764f Add nn.padding to docs fixes #1127 (#1808)
* exposed nn.padding modules

* using functional
2017-06-15 07:41:38 -04:00
ea5819045e a few comments in build_all.sh (#1807) 2017-06-14 17:58:56 -04:00
9c53c6dcb9 Fix errors and warnings when building docs (#1806) 2017-06-14 13:50:14 -04:00
9d916e561c batch norm docfix (#1804)
fixes the formula for batch normalization (moves the epsilon inside
the square root)
2017-06-14 11:57:46 -04:00
4e356528b4 Add torch.matmul function. (#1780)
* Add torch.matmul function.

Includes test_torch, test_autograd and docs changes.

* Add __all__ to functional so names aren't accidentally imported.

* Include unbind in __all__.

* Add matmul case for when one argument is 1-dimensional and the other
at least 3-dimensional.

* Add squeeze_ to Variable.

* Use squeeze_ instead of squeeze for matmul.
2017-06-14 08:14:53 -04:00
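A short sketch of the matmul cases mentioned above (shapes chosen purely for illustration):

```python
import torch

v1 = torch.randn(3)
v2 = torch.randn(3)
dot = torch.matmul(v1, v2)    # 1-D x 1-D behaves like a dot product

m = torch.randn(10, 3, 4)
v = torch.randn(4)
out = torch.matmul(m, v)      # batched matrix x 1-D vector -> size (10, 3)
print(out.size())
```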
9fd354e643 More accurate build instructions based on @apaszke's comments. (#1800) 2017-06-14 12:04:45 +02:00
c8e9bc493b Merge commit '244af06adc77674e7e1134d67d4a56ae7641f7b9' 2017-06-13 20:49:37 -04:00
6de5ce6bac Merge commit '1cf105d517c4308912eee85eff8f50f31c9e31f1' 2017-06-13 20:49:13 -04:00
38b9598685 Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:48:19 -04:00
244af06adc Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:48:03 -04:00
1cf105d517 Added GLU (gated linear unit)
From https://arxiv.org/abs/1612.08083
2017-06-13 20:47:55 -04:00
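For context, a minimal sketch of the gated linear unit computation from the referenced paper (this hand-written version is only for illustration; the commits above add the native bindings):

```python
import torch

def glu(x, dim=-1):
    # Split the input in half along `dim`; one half gates the other via a sigmoid.
    a, b = x.chunk(2, dim=dim)
    return a * torch.sigmoid(b)

x = torch.randn(4, 6)
print(glu(x, dim=1).size())   # torch.Size([4, 3])
```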
3ada9da808 Make csrc -Werror clean. (#1795)
Primary things I had to fix:

- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
  first, because it always unconditionally defines this macro.

- Turn off strict aliasing, because Python 2 doesn't work with strict
  aliasing.

- Work around a setuptools bug, where it's incorrectly passing
  -Wstrict-prototypes to C++ compilers (where this doesn't make
  any sense)

To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 20:18:09 -04:00
5a63a6d47f Better document how to rebuild only parts of the project. (#1796)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 17:23:39 -04:00
38a48729f0 Merge commit '1a6995b28ca42df41270d4fd914adfb9c8c59674' 2017-06-13 16:31:48 -04:00
deb0aef30c Merge commit '122dd9e8ec4627ccdd895a7dc88a1ec6f13ad6d2' 2017-06-13 16:31:13 -04:00
3977ee3520 Support device on sparse tensor constructor, assert values/indices on same device.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
c0e7bda3f1 Enforce storage is not NULL invariant for sparse tensors.
Fixes #1783.

There is an undocumented invariant in PyTorch that we should
try to avoid having storage == NULL as much as possible (even
though Torch supports it.)  This commit properly documents the
invariant, and fixes a bug in sparse where the invariant was
not respected.  This now means that sparse tensors now correctly
remember what GPU they are associated with.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
df412051fd Add comment stating nDenseTensors != nTensors in checkGPU.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:35 -04:00
7bee03fe1e Do NOT clone indices/values passed to sparse tensor by default.
Fixes #1782.

The default operation should be cheap: user can always choose to
explicitly make a copy on the way in.  Note that this is a
BACKWARDS COMPATIBILITY BREAKING change.  However, we DO create
a new tensor wrapper (so we are not affected by subsequent
size changes, etc.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:34 -04:00
865beada0e Add comment about new implementation being CPU-only.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:34 -04:00
6a46863c83 Abort on known bug (#1521) for spcadd on non-coalesced.
It's better to error than to silently give wrong results.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
d763db59a9 More efficient nnz test in spcadd.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
5d6e593c67 Test clone preserves uncoalescedness if it wasn't coalesced.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
bac408b693 Add some docs about storage->Size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
2f967a204c Sparse tensor clone() preserves coalescedness.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:30:19 -04:00
1a6995b28c Short-circuit copy if src and dest are equal.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:20:04 -04:00
122dd9e8ec Short-circuit copy if src and dest are equal.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-13 16:19:35 -04:00
7c024e93c6 Implement Cumprod function for autograd (#1439) 2017-06-13 17:48:15 +02:00
b4698d6d1d add init to __init__.py of torch.nn (#1789) 2017-06-13 09:02:30 -04:00
d9d50f80c7 Rename arguments to distributed collectives 2017-06-12 22:02:11 -04:00
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
6f51b4ce2d Fix deadlock in GlooCache 2017-06-12 22:00:22 -04:00
12813b88f6 Add DistributedDataParallel 2017-06-12 22:00:22 -04:00
23ab9d481a Add Module._all_buffers 2017-06-12 21:58:38 -04:00
8db8716c7c Support non-default streams in NCCL reduce 2017-06-12 21:58:38 -04:00
b37f18be53 Free GIL when entering THD functions 2017-06-12 21:58:38 -04:00
5a0d5ec058 Add more checks in torch.distributed 2017-06-12 21:58:38 -04:00
095ddc7d08 THD updates and bug fixes
* Add keepdim
* Fix DataChannel signature
* Fix incorrect locking
* Use current stream in DataChannelGloo
2017-06-12 21:58:38 -04:00
86a065e45b Add end callbacks to the engine 2017-06-12 21:58:38 -04:00
59d438de2e change function to remove dependence on CUDA 8.0
Summary: Replace call to function that is only supported in CUDA 8.0 with one that has been supported in previous releases.

Reviewed By: pietern

Differential Revision: D5231755

fbshipit-source-id: d72aec2a4a1c511064a65142887f8a05b51dad55
2017-06-12 15:53:59 -07:00
6626881e7a Add Alpha Dropout (#1775) 2017-06-13 00:39:49 +02:00
49ec984c40 Ensure warnings are repeated in python2 for tests. 2017-06-11 05:37:59 -04:00
afaad94fed Rename autograd keepdim tests that now default to True. 2017-06-11 05:37:59 -04:00
4f602a52b5 Use THPUtils_assert rather than THError in torch/csrc/Module. 2017-06-11 05:37:59 -04:00
3abc8be42c Clarify use of warn vs raise in expand_utils and don't catch exception in Broadcast plugin when fallback = false. 2017-06-11 05:37:59 -04:00
f4ce99fd87 Add dist, atan2, lerp to fallback functions.
They weren't documented as having those semantics, but tests on
master show they do.
2017-06-11 05:37:59 -04:00
d5a0f97ea7 Renamed masked_copy to masked_scatter in test, fix use of break/continue. 2017-06-11 05:37:59 -04:00
e8ec4110f6 Fix Prod backward for broadcasting. 2017-06-11 05:37:59 -04:00
ffd808768e Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 05:37:59 -04:00
5b81746767 Simplify python warning settings and cleanup tests. 2017-06-11 05:37:59 -04:00
d49b73bbe6 Rename check_fallback to check_backincompat_expand_warn for clarity. 2017-06-11 05:37:59 -04:00
7040b82ede Change async/broadcast copy arguments to be parsed as ints. 2017-06-11 05:37:59 -04:00
723819014e Move expand_utils-inl.h to generic/ and generate via macros. 2017-06-11 05:37:59 -04:00
1ef4cc1591 Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 05:37:59 -04:00
deec86cc05 Clarify a number of comments. 2017-06-11 05:37:59 -04:00
7da46097fe Fix lint errors. 2017-06-11 05:37:59 -04:00
21d9b0c9dd Ensure warnings are repeated in test, necessary in python2. 2017-06-11 05:37:59 -04:00
69287250d1 Add a broadcast parameter to copy_, use it in the library in cases where there is non-broadcasting calls exposed by the tests. 2017-06-11 05:37:59 -04:00
74a23c5aba Fix test_broadcast for cuda tensors, since map_, map2_ not implemented. 2017-06-11 05:37:59 -04:00
177785eecf explicit Ptr constructors, fast transposed copy. 2017-06-11 05:37:59 -04:00
ad9604f45a Add documentation for copy_. 2017-06-11 05:37:59 -04:00
65b23f146e Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 05:37:59 -04:00
c54e532954 Add broadcasting support for map_, map2_. 2017-06-11 05:37:59 -04:00
ec120fac0c Add broadcasting support for masked_copy, masked_fill. 2017-06-11 05:37:59 -04:00
e06523482a Use THSize_isSameSizeAs, instead of THTensor_(isSameSizeAs) in order to compare sizes of tensors with different data types. 2017-06-11 05:37:59 -04:00
d6fb92fec9 Improve in-place broadcasting back compat warning message and fix an issue where the deprecated warning would not be printed. 2017-06-11 05:37:59 -04:00
5e1a714386 Add backwards incompatibility docs. 2017-06-11 05:37:59 -04:00
be65f46c76 Add optional warning for backwards incompatible keepdim. Setting torch.utils.backcompat.keepdim.warning.enabled=True will cause Python warnings in the case where the default value of keepdim is used for 1-d reductions.
Also specify keepdim via kwargs in library so these warnings have less
noise.
2017-06-11 05:37:59 -04:00
3556d1b8a3 Add optional warning for backwards incompatible broadcast.
Setting torch.utils.backcompat.broadcast.warning.enabled=True
will cause Python warnings in the case where broadcast occurs
but previously 1-d view-style pointwise ops occurred.
2017-06-11 05:37:59 -04:00
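The two commits above name the opt-in flags directly; the sketch below illustrates the kind of shape change the broadcast warning targets. The flag spellings are copied from the commit messages and may differ slightly in the released API, so they are left commented out:

```python
import torch

# Opt-in back-compat warnings, spelled as in the commit messages above:
# torch.utils.backcompat.broadcast.warning.enabled = True
# torch.utils.backcompat.keepdim.warning.enabled = True

x = torch.ones(4, 1)
y = torch.ones(1, 4)
# Both operands have 4 elements. Previously this was treated as an equal-nElem
# pointwise op; with broadcasting it now yields a (4, 4) result, which is the
# silent behavior change the optional warning is meant to flag.
print((x + y).size())
```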
5af46cb352 Add broadcasting support for matmul. 2017-06-11 05:37:59 -04:00
a36f95fe26 Add broadcast support for fused-matmul broadcasting. Functions are: addmm, addbmm, addr, addmv, baddbmm. 2017-06-11 05:37:59 -04:00
cd35091d9b Include simple broadcasting example and demonstrate lining up trailing dimensions. 2017-06-11 05:37:59 -04:00
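As a quick illustration of the "line up trailing dimensions" rule documented here: sizes are compared from the last dimension backwards, and a dimension is compatible when the sizes match or one of them is 1.

```python
import torch

x = torch.ones(5, 3, 4, 1)
y = torch.ones(   3, 1, 1)
print((x + y).size())   # torch.Size([5, 3, 4, 1])
```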
3c586d196a Document Broadcast Plugin. 2017-06-11 05:37:59 -04:00
8e2f347951 Proof that broadcasting 3 args (expand3) is equivalent to breaking up operation. 2017-06-11 05:37:59 -04:00
d279c6e099 Docs for addcdiv, addcmul 2017-06-11 05:37:59 -04:00
014372e707 Support "fused" ops: addcmul/addcdiv. 2017-06-11 05:37:59 -04:00
92fde6cf06 Breakup in place broadcast to better handle multiple arguments. 2017-06-11 05:37:59 -04:00
b44ea57ba8 Change order of Broadcast specification.
Since fused ops require broadcasting self over multiple other arguments,
it is simpler to specify broadcast on self rather than the other
way around.
2017-06-11 05:37:59 -04:00
e96f854ce2 Implement/test broadcasting semantics for comparison ops. 2017-06-11 05:37:59 -04:00
edf2969bd8 Backwards compatible Spatial Normalizations / CrossMapLRN. 2017-06-11 05:37:59 -04:00
e653fe2857 Test fixes for keepdim=False, suppress warnings on backwards-compatible behavior. 2017-06-11 05:37:59 -04:00
70c33777a6 pow, fmod, remainder also should fallback.
This behavior isn't listed in the docs, but the tests depend on it.
2017-06-11 05:37:59 -04:00
471dfe9791 Add documentation including links to numpy broadcasting semantics. 2017-06-11 05:37:59 -04:00
85d838a028 Testing over the following:
1) CPU tensor out-of-place functions
2) CPU tensor in-place functions
3) GPU tensor out-of-place functions
4) GPU tensor in-place functions
5) torch. functions
6) Fallback semantics (use pointwise nElem matching rather than broadcasting)
2017-06-11 05:37:59 -04:00
6a40acb4f0 Add Broadcast plugin. 2017-06-11 05:37:59 -04:00
9087624634 Revert "Restore examples with keepdim=True default."
This reverts commit 6fab62173e842bbf550de1c68cfae507ca35b800.
2017-06-11 05:37:58 -04:00
e772a440cb Revert "Change keepdim default to False."
This reverts commit e124790cb2b6675a4b6edf64620a7eb7f7228b29.

Note the original commit message is incorrect; this changes keepdim
back to false.
2017-06-11 05:37:58 -04:00
efd8b54be2 Merge commit 'e45c1046feba46aef2ffac1b1d978a3e76936bab' 2017-06-11 05:37:51 -04:00
54c3441e9c Merge commit '7d1b042cb2198d2bdb5871b08c6c0fb2ccc8e6b1' 2017-06-11 05:37:18 -04:00
7d1b042cb2 fix type 2017-06-11 04:42:34 -04:00
e45c1046fe Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 04:33:54 -04:00
a563ce1105 Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 04:33:54 -04:00
92d52bf395 Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 04:33:54 -04:00
0463ddf16b Support "fused" ops: addcmul/addcdiv. 2017-06-11 04:33:54 -04:00
9060e6be7f Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 04:32:08 -04:00
f0b8c4821b Incorporate review comments:
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
2017-06-11 04:32:08 -04:00
0f79bf1a69 Clarify a number of comments. 2017-06-11 04:32:08 -04:00
503002eda7 Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils. 2017-06-11 04:32:08 -04:00
cf55e1e48a Add broadcasting support for masked_copy, masked_fill. 2017-06-11 04:32:08 -04:00
8d35d4215b Use THSize_isSameSizeAs, instead of THTensor_(isSameSizeAs) in order to compare sizes of tensors with different data types. 2017-06-11 04:32:08 -04:00
9356640453 Properly clean up expand error cases. 2017-06-11 04:32:08 -04:00
ae6b8d0112 Include simple broadcasting example and demonstrate lining up trailing dimensions. 2017-06-11 04:32:08 -04:00
ec2f6a81fd Support "fused" ops: addcmul/addcdiv. 2017-06-11 04:32:08 -04:00
1f9a365fdc Add Infer Size N, for expansion of fused operations. 2017-06-11 04:32:08 -04:00
d38a87217f Expand improvements
1) Rename calculateExpandGeometry to inferExpandGeometry for consistency
2) Simplify inferExpandGeometry implementation by using a single pass
   through dimensions
3) Implement a two operand expansion, expand2.
4) Implement versions that return error code to use for fallback to
equal nElem support.
2017-06-11 04:20:04 -04:00
baa4ba973b Expand improvements
1) Rename calculateExpandGeometry to inferExpandGeometry for consistency
2) Simplify inferExpandGeometry implementation by using a single pass
   through dimensions
3) Implement a two operand expansion, expand2.
4) Implement versions that return error code to use for fallback to
equal nElem support.
2017-06-11 04:19:37 -04:00
a24db91a38 Add SELU activation function (#1769)
* Add SELU activation function

* Remove unnecessary case

* Add Function for SELU + tests and fix RReLU inplace

* Fix extra line in doc

* Fix tests

Remove in-place tests for RReLU. For some reason they fail on legacy nn, but pass on nn

* SELU in new-style Function

It also supports double backprop, verified with gradgradcheck

* Fix flake8
2017-06-11 10:07:48 +03:00
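For reference, a minimal sketch of the SELU computation (alpha and scale are the standard constants from the SELU paper; the in-tree implementation may differ in detail):

```python
import torch

def selu(x):
    alpha = 1.6732632423543772
    scale = 1.0507009873554805
    # scale * (max(0, x) + alpha * (exp(min(0, x)) - 1))
    return scale * (torch.clamp(x, min=0) + alpha * (torch.exp(torch.clamp(x, max=0)) - 1))

print(selu(torch.randn(5)))
```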
e3d5826b92 Add Cumsum double backwards support. (#1758) 2017-06-10 18:27:44 +02:00
ba690d5607 Add support for NVTX functions. (#1748) 2017-06-10 18:26:58 +02:00
5f1a16a018 Torch manual seed to seed cuda devices (#1762) 2017-06-10 12:37:21 +02:00
dcf07a2d7f Fix typo in ParameterList documentation 2017-06-10 02:16:52 +02:00
fab5bef9f6 Merge pull request #45 from slayton58/nccl_cmake_fix
Fix NCCL directory typo
2017-06-08 11:28:25 -07:00
21a5c8ea5e Fix use of nccl_INCLUDE_DIRS in nccl.cmake 2017-06-07 20:13:11 -04:00
5300aafc1f Fix NCCL directory typo 2017-06-07 17:01:13 -04:00
a9bd1de9e9 fixed README to reflect docker image name (#1751) 2017-06-07 15:49:39 -04:00
e57eef4bcb Merge commit '62835fc3f5346968b4dca392c77efdeb75a6b172' 2017-06-07 14:54:47 -04:00
d81da41650 Make sure the number of MKL and OpenMP threads match
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
2017-06-07 14:53:29 -04:00
62835fc3f5 Make sure the number of MKL and OpenMP threads match
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
2017-06-07 14:53:14 -04:00
da7957c660 Fix masked_copy call to masked_scatter. (#1749) 2017-06-07 12:58:47 -04:00
2a49353d5e minor fix for docs of Upsample 2017-06-07 11:42:52 -04:00
b05c23de44 Merge commit 'da45b4c6b3b0b7cd8f0dc612b9afa6a3a07b8305' 2017-06-07 11:31:38 -04:00
019e967113 Merge commit '47bf87b9220c10edaafec98c6bd20bdb1436c8e4' 2017-06-07 11:30:35 -04:00
b9ab26765e Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:29:27 -04:00
da45b4c6b3 Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:24:41 -04:00
47bf87b922 Add 3D upsampling (nearest and trilinear) with tests 2017-06-07 11:24:05 -04:00
edd41d8d80 BatchNorm fallback to THNN when eps < CUDNN_BN_MIN_EPSILON (#1742) 2017-06-07 09:56:28 -04:00
352f8b2fa6 Merge commit 'ced01f6c919c4b7109512ce797a2a0185c8f8112' 2017-06-07 09:22:14 -04:00
ced01f6c91 fix GRUFused signature 2017-06-07 09:21:20 -04:00
d351239c10 fix legacy ClassNLLCriterion for upstream change 2017-06-07 00:38:00 -04:00
1b1579c89d Merge commit 'b96f76e470b25454b6b14c7ace888686295405e9' 2017-06-07 00:19:42 -04:00
df7c47142d fix for THNN NLLLoss signature change 2017-06-07 00:18:11 -04:00
b96f76e470 standalone macros 2017-06-07 00:17:05 -04:00
7e62971c86 Merge commit '71ccedbc6c4e460d38c794737bba780e7673e888' 2017-06-06 23:38:52 -04:00
a7d987544d Merge commit '4e49aed5eaa5a4abaf0a51bb87a49b44394ea3c3' 2017-06-06 23:35:42 -04:00
4e49aed5ea fix outputHeight <-> outputWidth 2017-06-06 23:33:51 -04:00
71ccedbc6c Merge pull request #470 from qqning/master
Fix the mix-up of height and width on depth-wise convolution
2017-06-06 23:31:54 -04:00
c3cda260b6 Merge commit '64faf120acb97866dfd90bf428b385deee4ee912' 2017-06-06 23:27:45 -04:00
22949350b6 More performant fix for fused rnn kernels (#1532) and bugfix (#1721) 2017-06-06 23:25:31 -04:00
3f7b48ccda Remove clone in fused rnn 2017-06-06 23:20:14 -04:00
db620304b2 More performant fix for fused rnn kernels (#1532) and bugfix for #1721 2017-06-06 23:13:07 -04:00
d7db75c10f added CosineSimilarity to nn.distance and updated docs (#1672)
* added CosineSimilarity to nn.distance and updated docs
2017-06-06 22:53:21 -04:00
e50d599240 Fix header inclusion in math.h
Summary:
While debugging #43 I found common/common.h missing some headers as well.

Fixes #43.
Closes https://github.com/facebookincubator/gloo/pull/44

Differential Revision: D5194970

Pulled By: pietern

fbshipit-source-id: 4861cd04c56931d4759f5bc050816788252003ee
2017-06-06 15:21:08 -07:00
c6a6391c38 added checks to cudnn Convolution for stride, dilation, kernel size and num input planes (#1723)
* added checks to cudnn Convolution for stride, dilation, kernel size and num input planes
2017-06-06 15:42:00 -04:00
d50ad408fa fix incorrect grad_weight in Bilinear 2017-06-06 15:07:09 -04:00
73ccdb3920 Fixing the issue with incorrect normalized values in IndexLinear 2017-06-06 11:44:11 -07:00
b6c75c43c8 add tests for checking the type of .data and .grad.data is the same 2017-06-06 01:06:14 -04:00
a53cde09b5 Rename masked_copy_ to masked_scatter_ 2017-06-06 01:06:14 -04:00
98afdcf409 Accept None values returned from grad hooks 2017-06-06 01:06:14 -04:00
ef32e96447 Fix grad type of compare functions 2017-06-06 01:06:14 -04:00
b032b88f34 Fix Prod backward and autograd tests 2017-06-06 01:06:14 -04:00
a76098ac15 fix optimizer when given single parameters (instead of an iterable)
When using named_parameters to modify the lr and weight decay, a bug occurs because the values returned by named_parameters are torch.nn.parameter.Parameter objects, not an iterable of Parameters.
2017-06-05 23:47:56 -04:00
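A sketch of the usage this fix enables: per-parameter options built from named_parameters(), where each group's 'params' entry is a single Parameter rather than a list (the module and hyperparameter values below are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
params = dict(model.named_parameters())

optimizer = optim.SGD(
    [
        {'params': params['weight'], 'lr': 0.01, 'weight_decay': 1e-4},
        {'params': params['bias'],   'lr': 0.1},
    ],
    lr=0.01,
)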
2ce5875a4d Modify the sample code of extending autograd (#1720)
The original input cannot be used as input to Linear(), because forward() takes at least 3 arguments (2 given)
2017-06-05 23:36:58 -04:00
511cb20e7d Add Gesv to autograd (#1733)
* Add Gesv to autograd

* Add TODO for backprop through LU
2017-06-05 21:38:49 -04:00
e3305eb9dc Runtime dockerfile (#1732)
* reduce the size of Docker image

* add runtime dockerfile
2017-06-05 17:40:06 -04:00
e9bf702c5e LSTM bias_hh, fix docs
Rename W_hi ... to b_hi ...
2017-06-05 22:55:09 +02:00
9a2d11dd36 Use a longer timeout when establishing initial tcp connection
Summary: Machines may not create their Gloo pairs at the same time, due to earlier variable time work. Increase the timeout used to establish the initial tcp connection to accommodate without sacrificing the shorter default timeout for outstanding reads/writes. No related change required for ibverbs as there is no communication on init.

Reviewed By: akyrola

Differential Revision: D5184518

fbshipit-source-id: 0e6c9704a2d2f1406b3927f75887f0a42199450b
2017-06-05 13:40:22 -07:00
3716286e6b reduce the size of Docker image (#1729) 2017-06-05 14:03:11 -04:00
c357ebd590 Merge commit '6422ea3d9f065683bb899b88ae0baec79e6d73ca' 2017-06-05 13:01:25 -04:00
85a95d8a23 Fix sharing of CUDA tensors on non-current devices
The correct device must be set when getting the base allocation and when
calling cudaIpcCloseMemHandle. Store the device in the allocators
context, which was previously always NULL.

Fixes #1707
2017-06-05 13:01:19 -04:00
6422ea3d9f Fix sharing of CUDA tensors on non-current devices 2017-06-05 12:58:34 -04:00
ddf6328990 Document type function returns type with no args (#1719) 2017-06-05 11:54:55 -04:00
174c3cc399 Add support for double backward of LeakyReLU (#1714) 2017-06-05 11:53:27 -04:00
24aecaa2c8 Cleanup torch vision docs (#1699)
* Modify torchvision documentation following https://github.com/pytorch/vision/pull/179

* Add new datasets to docs

* Fix wording in torch.datasets

* Small clarification
2017-06-05 11:52:41 -04:00
4853cc0194 convert linalg.py to new-style functions (#1638) 2017-06-04 09:27:01 -04:00
ac1c674723 Fix a couple of selection reduce function autograd bugs (#1702)
* Fix Median/Mode autograd functions.

* Fix kthvalue autograd function.

* Double backward for selection reduce functions.
2017-06-03 02:12:15 -04:00
eba3dc8561 Fix gc_refs assertion failure (#1705)
* Fix gc_refs assertion failure

Ensure that each THPVariable -> THPFunction reference contributes one
ref count to the THPFunction by creating a new shared_ptr for each ref.

Because multiple shared_ptrs can again manage a single THPFunction, it's
not safe to use std::weak_ptr where it may point to a PyFunction. It's
still safe to use weak_ptr for grad_accumulator since these are never
PyFunctions.

Fixes #1626

* Remove stale comment
2017-06-02 21:08:50 -04:00
ee9d4d58e2 Fix connect bug
Before the change, processes were not waiting for the master even when they got
'connection refused' (the master is not listening yet, so we should wait).
This was because we were closing the socket twice: first by
the resource guard, then manually in the exception handler.
That left errno set to a different value (9, bad file descriptor),
so the `if` that checked whether the connection was refused failed.
2017-06-02 23:42:11 +02:00
b7c4900d19 Fix minor bug in InitMethodFile 2017-06-02 23:42:11 +02:00
e22f9036de Add tcp init method for non-multicast addresses 2017-06-02 23:42:11 +02:00
c01ff1f3dc Make world_size mandatory for Master and Worker; Minor refactor 2017-06-02 23:42:11 +02:00
eeb8e5c31b Linux fixes 2017-06-02 23:42:11 +02:00
c6c9e61169 Implement THD tensor copies 2017-06-02 23:42:11 +02:00
34804e9600 Refactor file and tcp init methods
* Add sanity checks
 * Refactor InitMethodFile and TCPInitMethod to more logical functions
 * Update few error messages
 * Add passing parameters by **kwargs, so now order of parameters is not relevant
 * Review comments
2017-06-02 23:42:11 +02:00
c41555fb0a Add rank parameter; Fix MW mode initialization 2017-06-02 23:42:11 +02:00
96cc1e1ac7 Review comments 2017-06-02 23:42:11 +02:00
cfdd49f76a Simplify and refactor init code 2017-06-02 23:42:11 +02:00
447d9287bf Refactor multicast and change env init method 2017-06-02 23:42:11 +02:00
832eaf900b Fix bugs and improve init methods 2017-06-02 23:42:11 +02:00
e685277299 Add address discovery; Bug fixes; 2017-06-02 23:42:11 +02:00
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
09c0d9c51c Add multiple initalization methods for DataChannels 2017-06-02 23:42:11 +02:00
240384605c Make copy functions thread safe (#82) 2017-06-02 23:42:11 +02:00
9f9a3d596f Use lock_guard and don't use unique_ptr 2017-06-02 23:42:11 +02:00
a8c26c1040 Add mutexes to MasterCommandChannel::sendMessage 2017-06-02 23:42:11 +02:00
6cdfe0d7b9 Remove MASTER_ADDR and _PORT from MPI benchmarking 2017-06-02 23:42:11 +02:00
1b66b50064 Benchmarks: Don't export WORLD_SIZE when using MPI
I just realized we don't need it (any longer?).
2017-06-02 23:42:11 +02:00
cf42c1a044 Improve error messages of DataChannel::newChannel 2017-06-02 23:42:11 +02:00
f717f29d7e Change function names; Change thpp::Tensor to THDTensorDescriptor 2017-06-02 23:42:11 +02:00
181d2f41bd Add initial Python wrappers for THDTensors 2017-06-02 23:42:11 +02:00
2059ece284 Exit workers gracefully in master-worker mode 2017-06-02 23:42:11 +02:00
b3e100b40e Add copy (TH <-> THD) functions to MW mode 2017-06-02 23:42:11 +02:00
ec2de16776 Improve README copyediting 2017-06-02 21:02:14 +02:00
ea05d6aec3 Fix compilation with cuDNN 5 (#1703) 2017-06-02 14:03:02 -04:00
5a93d6b903 Fix CUDA_HOME detection (#1675) 2017-06-02 19:26:00 +02:00
75e0df271a Add Inverse to autograd (#1670)
* Add Inverse to autograd

* Add SkipTest to autograd tests
2017-06-02 12:00:13 -04:00
565bf7116b A pile of misc doc fixes. (#1682)
* A pile of misc doc fixes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Handle @apaszke  review comments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Initial csrc documentation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-06-02 11:59:03 -04:00
f1c57ace1b added input dim checks to convxD and conv_transposedxd (#1695)
* add input dim check for conv2d

* add None check to conv2d

* added input dim checks to convxD and conv_transposedxd

* flake8 fixes
2017-06-02 11:58:19 -04:00
460b8715a8 display version number in docs 2017-06-02 11:56:48 -04:00
6da111c53d Merge commit '00843c57c936720b3d17f4c0afaab08dcb52a7cc' 2017-06-02 11:52:19 -04:00
568c5c91ee substitute cudnnFind* functions with cudnnFind*Ex 2017-06-02 11:52:12 -04:00
00843c57c9 substitute cudnnFind* functions with cudnnFind*Ex 2017-06-02 11:50:50 -04:00
501467db17 added param name to tuple_parser for better error messages 2017-06-02 16:16:21 +02:00
d51cd61e2e add checks for input, weight and bias types when using cudnn conv2d (#1689) 2017-06-01 10:06:30 -04:00
447fe953e5 Modify the sample code of volatile (#1694)
The original two inputs (torch.randn(5,5)) cannot be used as input to resnet, which expects (batch, channels, width, height)
2017-06-01 09:46:04 -04:00
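A hedged sketch of the corrected sample: the inference-mode ("volatile") input must be a 4-d (batch, channels, height, width) tensor. The torchvision model below is illustrative and assumes torchvision is installed:

```python
import torch
from torch.autograd import Variable
import torchvision.models as models

model = models.resnet18()
x = Variable(torch.randn(1, 3, 224, 224), volatile=True)  # 4-d: (batch, channels, height, width)
out = model(x)
```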
7b5af7d1b7 Expand ibverbs read timeout messages
Summary: TSIA

Reviewed By: romain-intel

Differential Revision: D5158642

fbshipit-source-id: 6e55a69a140c1f5f6e4ce6262afaf5014c412414
2017-05-31 19:50:21 -07:00
afc26ac675 Added time-out to ibverbs transport
Summary: Extended the time-out option from just working on TCP to also working with ibverbs

Reviewed By: pietern

Differential Revision: D5090258

fbshipit-source-id: fee685850d761d0c2130852f513c64ceb19f4e9e
2017-05-31 11:20:40 -07:00
6f791e74f1 Add a minimum iteration count of 1 for benchmarks
Summary:
For some long running benchmarks, the iteration count could be 0
which would lead to a segfault when printing results

Reviewed By: pietern

Differential Revision: D5149034

fbshipit-source-id: 7b56e8961c302d1ff11ffcd74ca8e909ea046231
2017-05-30 18:12:39 -07:00
3106423713 Synchronize with H2D copyAsync before signalling the broadcast sender
Summary: Closes https://github.com/facebookincubator/gloo/pull/41

Differential Revision: D5149996

Pulled By: pietern

fbshipit-source-id: 15d61fab9babfeb1e4178b84ecf5f6e32ad3bfb3
2017-05-30 14:20:29 -07:00
4eb448a051 Fix simple typo
Dimension a bit wrong
2017-05-28 18:53:04 +02:00
065c59860a Fix docs: masked_fill_ takes a value, not a tensor. (#1663) 2017-05-26 14:41:03 -04:00
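A tiny sketch of the corrected signature: masked_fill_ takes a scalar fill value, not a tensor.

```python
import torch

x = torch.randn(3, 4)
mask = x.lt(0)              # mask of negative entries
x.masked_fill_(mask, 0.0)   # fill the masked positions with the scalar 0.0
```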
45f665d05c Fix decodeUInt64BE
Fixes #1658
2017-05-26 11:21:31 -07:00
64faf120ac Adding support for ADD_TORCH_LIBRARY macro 2017-05-25 15:41:52 -07:00
0b74f0d796 lua 5.3 changes and gcc constants 2017-05-25 15:41:52 -07:00
8074180081 Faulty error message for InstanceNorm1d (#1609) 2017-05-25 17:13:01 -04:00
5ce4a4adbf Merge commit '3f1f3f97343d2ab7eb522cac7330f6b7478bd4da' 2017-05-25 16:51:57 -04:00
3e9caed731 Merge commit 'bd705d38ce11a0ca1547f709f29f80a02b3dd894' 2017-05-25 16:51:09 -04:00
7b578dd68e Add scatterAdd 2017-05-25 16:49:48 -04:00
3f1f3f9734 Add scatterAdd 2017-05-25 16:49:32 -04:00
bd705d38ce Add scatterAdd 2017-05-25 16:49:22 -04:00
630af4d7d8 add learning rate schedulers (#1370) 2017-05-25 16:21:43 -04:00
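A hedged sketch of the scheduler usage introduced by the referenced PR (StepLR and its arguments are assumptions about that API, shown only for illustration):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)   # decay lr by 10x every 30 epochs

for epoch in range(100):
    scheduler.step()
    # ... run one training epoch with `optimizer` ...
```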
0409b42a02 Merge commit '3abe5c80d2073f0e72f79b88f11b2a9d320fb116' 2017-05-25 15:40:27 -04:00
c39d48ea7d Fast transposed copy 2017-05-25 15:39:21 -04:00
3abe5c80d2 Fast transposed copy 2017-05-25 15:39:07 -04:00
05bc877a05 make THPPointer have explicit constructors (#1636) 2017-05-25 15:35:54 -04:00
7ea9d9af4e Fix build when included by another project; take 2
Summary:
Only adding `include_directories` doesn't propagate to the including
targets. Also use `target_include_directories` to do so.
Closes https://github.com/facebookincubator/gloo/pull/39

Differential Revision: D5131001

Pulled By: pietern

fbshipit-source-id: 6c58c4b76ae7fa008e4fb26d1bca7900165884d0
2017-05-25 11:50:23 -07:00
6a7c56499c How to manage multiple build trees of PyTorch. (#1654)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-25 11:21:52 -04:00
46ee1e4687 Clarify definition of gather function in docs. (#1652) 2017-05-25 11:06:28 -04:00
e63b49d9ab Fix build when included by another project
Summary:
The CMake variable CMAKE_BINARY_DIR points to the top level build
directory. For standalone Gloo builds this path lets files include the
generated file "gloo/config.h". When Gloo is included as project, this
variable points to a different path and "gloo/config.h" cannot be
resolved. Fix is to build a path from CMAKE_CURRENT_BINARY_DIR.
Closes https://github.com/facebookincubator/gloo/pull/38

Differential Revision: D5129385

Pulled By: pietern

fbshipit-source-id: 722cebf4892b34f869fe43320153efbb181555b6
2017-05-25 07:50:53 -07:00
036c3f93af Check for released variables in SavedVariable::unpack() (#1648)
Fixes #1288
2017-05-25 00:35:19 -04:00
4f261f5730 Add support for fast float16 reductions using AVX
Summary: Using Misha's vectorized AVX code to greatly improve performance of reductions on float16 values. Float16 reductions are now 2x faster than float.

Reviewed By: pietern

Differential Revision: D5123331

fbshipit-source-id: 03d4e76886d538b7e24eedaf32a92231a80b1e43
2017-05-24 21:20:06 -07:00
98581b9f7e Fix conv1d segfault when weight doesn't require grad (#1646)
Fixes #1600
2017-05-24 20:46:32 -04:00
9a497f824b Add size/dimensionality documentation for torch.gather. (#1645) 2017-05-24 20:42:18 -04:00
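The size relationship documented here, in a tiny sketch: the index tensor has as many dimensions as the input, and the output takes the index's shape.

```python
import torch

src = torch.Tensor([[1, 2], [3, 4]])
idx = torch.LongTensor([[0, 0], [1, 0]])
print(torch.gather(src, 1, idx))   # out[i][j] = src[i][idx[i][j]] -> [[1, 1], [4, 3]]
```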
1e63a04a18 Use clear-to-send notification for broadcast algorithms
Summary:
The broadcast algorithms use the buffers they were given directly.
There is no inbox/outbox pattern. This means that we can race if the
algorithm is run repeatedly within a short time frame. This hasn't
been an issue so far since we've only used it in combination with
other process wide barriers.

Since this adds a round trip the latency of these ops from the root
rank perspective increases. The variance between the before and after
runs is pretty high since there is no back and forth interaction on
the root. It simply waits for recipients to be ready and then sends
its data.

Before:

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   broadcast_one_to_all
Options:     processes=4, inputs=1

   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
        100          1         16         29         50     426075
        200          2         17         32         50     179953
        500          2         11         31         59     140291
       1000          2         12         29         59     177619
       2000          3         12         29         62     117882
       5000          5         16         31         64     127113
      10000          9         21         38         88      60328
      20000         19         36         65        130      30427
      50000         48         68        221        556      11180
     100000         92        136        426        871       7314
     200000        193        251        829       2965       4092
     500000        492        638       2098       4133       1677
    1000000       1195       2024       3513      11646        628
    2000000       3446       4216       5007      17100        282
    5000000      12956      13919      14941      37751         71

```

After:

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   broadcast_one_to_all
Options:     processes=4, inputs=1

   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
        100         15         37         52        107      27332
        200         14         40         63        199      28620
        500         17         37         52        118      18299
       1000          9         39         57        120      33375
       2000         20         57         78        180      24779
       5000         31         61         84        190      18039
      10000         39         70         90        225       8908
      20000         57        108        130        940       8313
      50000         94        163        217       1933       5326
     100000        132        231        331       3501       3681
     200000        256        426        560       6509       2272
     500000        774       1092       1698      10039        985
    1000000       1132       2106       3878      18218        484
    2000000       3509       4252       6832      20228        226
    5000000      11326      15447      27129      52694         77
```

Reviewed By: wesolwsk

Differential Revision: D5123341

fbshipit-source-id: f3bab4f75ef7c38817f74f00b382f18fe43d85d5
2017-05-24 15:36:36 -07:00
e54112758c Fix potential vector out of range issue in ContextFactory::makeContext
Summary: Vector out-of-range error was being triggered in some tests due to trying to get the address of an element past the end of vector.

Reviewed By: pietern

Differential Revision: D5123044

fbshipit-source-id: 004f72ebaa27c609290959c12a3d99b16289bfa8
2017-05-24 14:50:09 -07:00
e1d257bc6d Fix segfault in autograd: (#1644)
* Fix segfault in autograd:

1) Every "output" variable must have a grad_fn or grad_accumulator
2) compute_partial_exec_callbacks uses Python errors

* assertRaisesRegexp was renamed assertRaisesRegex in 3.2

* Use HANDLE_TH_ERRORS macro
2017-05-24 17:13:08 -04:00
3d38e4f126 Acquire GIL before THPVariable_wrap (#1625)
* Acquire GIL before THPVariable_wrap.

* mutex not required when GIL is held.

* Remove unused mutex.
2017-05-24 15:19:34 -04:00
fa93653d09 Improve handling of graph roots in autograd engine (#1635) 2017-05-24 14:50:07 -04:00
ff047fdeef Fix the mix-up of height and width on depth-wise convolution 2017-05-24 21:05:08 +08:00
2486a6bbd0 Add missing header file types.h in CMakeLists.txt
Summary: A recently added header file was missing in CMakeLists.txt

Reviewed By: pietern

Differential Revision: D5116962

fbshipit-source-id: 6c3fbd4b49c913f20308c1b057a7e09806e0c2b0
2017-05-23 16:50:41 -07:00
640846b864 Fix race in ibverbs transport
Summary:
In a previous commit where the slot numbering was expanded, I changed
the memory region send/recv path to use a map for the outgoing memory
regions (since they may complete out of order). Before, this was a
fixed size array, which was mutated by both the user thread and device
thread without holding a lock. The map, however, can't be mutated
without a lock. This change adds that lock and a few assertions to
check for this type of problem.

Reviewed By: andrewwdye

Differential Revision: D5108194

fbshipit-source-id: 1908c988112469ecdec6cb6eb9849068d896c409
2017-05-23 15:38:48 -07:00
ba56de1150 add coding UTF-8 declaration 2017-05-23 16:02:34 -04:00
6e3e453ad2 Tidy up convs docs (#1602) 2017-05-23 18:32:33 +02:00
f5d919a685 Generate config.h file with compilation options
Summary:
This file can then be used by downstream code to figure out what Gloo
features it can support (e.g. ibverbs transport or not).
Closes https://github.com/facebookincubator/gloo/pull/36

Differential Revision: D5110769

Pulled By: pietern

fbshipit-source-id: 2c0c07537258048737ae764a4978f2f7fdbd992d
2017-05-23 09:26:03 -07:00
02e4ca9cab fix wrapper 2017-05-23 08:43:13 -07:00
70a774898e Remove superfluous forward declaration
Summary: ContextFactory is no longer mentioned in gloo/context.h.

Reviewed By: romain-intel

Differential Revision: D5110328

fbshipit-source-id: 48dd020dc39d71d0d5f72deebfa5d80122b70c0d
2017-05-23 08:20:55 -07:00
49befe3fcd Remove commPairs_ member variable from halving/doubling
Summary: TSIA

Reviewed By: wesolwsk

Differential Revision: D5110348

fbshipit-source-id: d3346e2af1a9f13410dc93336c53040a29e22e66
2017-05-22 21:21:42 -07:00
7eac2073b8 Add notification mechanism to ContextFactory
Summary:
This is another example where our unsolicited writes may interfere
across calls to the collective function. In this case, it was possible
for a second call to overwrite a pair's address before it had been
used to connect the pair in the previous iteration.

Thinking out loud, we could avoid this from happening by supporting
this pattern natively in the Buffer classes. For example, we can add a
notification mechanism (opt in) to the Buffer class such that the
receiver may call `ackRecv()` to acknowledge receipt and handling of
the data in the buffer. Then the sender will block on new sends until
acknowledgement from the previous send has been received. Until then,
we have to keep an extra eye out.

Reviewed By: wesolwsk, romain-intel

Differential Revision: D5095430

fbshipit-source-id: 4c100433108fccea7457bba4dc00f651f722e6c9
2017-05-22 19:50:18 -07:00
45524ec33c Fix indices bug in MM.py (#1613) (#1617) 2017-05-22 16:47:51 -04:00
f072c74dfd make it effective to transfer a tensor from other devices to device 0 (#1610) 2017-05-22 11:06:57 -04:00
107a0fe9ac Revert "Revert "ClassNLLCriterion supports missing targets"" 2017-05-21 13:48:19 -04:00
2acfb2376a fixes eval mode in InstanceNorm (#1604)
fixes https://github.com/pytorch/pytorch/issues/1541
2017-05-21 13:27:48 -04:00
0c5598c668 Update build status matrix 2017-05-21 12:20:50 +02:00
feaee29bfe Add argmax and argmin to docs 2017-05-20 18:56:20 +02:00
7f6cd7c7ea Fix error message in CUDA forked subprocess (#1585)
We need to re-call _lazy_init in _CudaBase.__new__ in the subprocess.
2017-05-19 12:36:08 -04:00
625850c2c2 Check cuDNN version at runtime (#1586)
* Check cuDNN version at runtime

This checks that the version from cudnn.h matches the version from
libcudnn.so.

Fixes #1476

* Only check major and minor version numbers
2017-05-19 01:55:09 -04:00
9b3447761a Check for required non-None arguments in C++ autograd functions (#1589) 2017-05-19 01:47:35 -04:00
ed679fc43c disabling fd leakchecker test (#1593) 2017-05-19 01:20:50 -04:00
e6c9509a41 Fix call to Tensor.set_ in rnn.py (#1592) 2017-05-18 20:28:49 -04:00
c57f0530e7 set long_args to False for param "size" of set_ (#1568)
* fix #1524, set long_args to False for param "size" of set_
2017-05-18 19:31:36 -04:00
8021bb938c Remove slot number limitation from ibverbs transport
Summary:
The pair was still hardcoding limits on the slot numbers. In this
change those limits are lifted.

This also adds back assertions on work completion status in
handleCompletion.

Reviewed By: wesolwsk

Differential Revision: D5090457

fbshipit-source-id: 7bf884e1f31e48e8f1cdfb179a225999e28171b2
2017-05-18 16:20:40 -07:00
1f4317be3f Add support for half-precision floating point operations
Summary: Add support for collectives over vectors of half-precision floating point values.

Reviewed By: pietern

Differential Revision: D5062938

fbshipit-source-id: 0b39fa53370393fec1edf2d852ff7f1d862b9022
2017-05-18 15:09:06 -07:00
cba46a4869 Assert that we don't do out of bound writes on recv
Summary:
The halving/doubling algorithm had two instances where a receive
buffer was registered with a number of elements instead of a number of
bytes. This change adds the assertion that should have caught this in
the first place.

Reviewed By: wesolwsk

Differential Revision: D5089483

fbshipit-source-id: fd0f0724ef04300236c9297ee88b27e61fb1e5a0
2017-05-18 14:34:39 -07:00
b391f53681 Cache send/recv buffers in ContextFactory
Summary:
The original implementation created temporary buffers on the backing
context. This also meant an ordering problem when using the ibverbs
transport, as a call to send will block until the remote side has
created its receive side buffer. Since all buffers are now created
prior to using them, this is no longer an issue.

Reviewed By: romain-intel

Differential Revision: D5082352

fbshipit-source-id: 4c260f06e8f461c0336e7eec7ca891e07ff41cd3
2017-05-18 10:20:42 -07:00
85732b52ec fix cuda multiple algorithm test
Summary: Fixing a bug in the multiple algorithm test where threads were spawned repeatedly, causing collisions during rendezvous.

Reviewed By: pietern

Differential Revision: D5082945

fbshipit-source-id: 4adbbc963b1ff652f73a44cd9fd75dcd3325f182
2017-05-17 16:35:25 -07:00
156fe28666 dataloader can now handle growing datasets (#1575) 2017-05-17 19:23:15 -04:00
2f4bf4ab39 Rewrite 'How autograd encodes the history' to accurately describe current setup. (#1580)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-17 19:21:20 -04:00
1f3ff5ced2 Miscellaneous documentation around autograd. (#1577)
* Miscellaneous documentation around autograd.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-17 19:19:24 -04:00
b8b7f879c2 .gitignore updated with editor temporaries (#1574) 2017-05-17 19:16:02 -04:00
7b10b16496 Move ibverbs buffer send logic to pair.cc
Summary:
TSIA

This matches the approach in the TCP transport where all send/recv
logic is contained in the pair code.

Reviewed By: wesolwsk

Differential Revision: D5082503

fbshipit-source-id: b70886ed9aaeb381cdb45fba00704118cff62a23
2017-05-17 15:54:34 -07:00
da86633c7c Additional synchronization in halving/doubling
Summary:
This is necessary to avoid the next iteration of the algorithm
overwriting data in recvBuf_ before it has been consumed by the
receiver of that data. If this does happen, the result of the previous
iteration for the receiving end is corrupted. This can only happen in
async mode on the TCP transport (so all incoming data is unsolicited)
when spinning on the run function.

Reviewed By: wesolwsk

Differential Revision: D5074789

fbshipit-source-id: 66668fbd885888f26266d812e78d61c6d65c2461
2017-05-17 15:21:09 -07:00
c573d53939 Bug fixes (#1573)
* Fix clang warnings
* Raise errors when unsupported ConvNd configurations are used
* Properly handle Variable indexing with LongTensors
* Support both tensors and variables in Variable.type_as
2017-05-17 15:28:16 -04:00
cb79c24d0b Added powerpc64le support (#1572) 2017-05-16 08:30:06 -06:00
caa1cdf0ce ClassNLLCriterion ignoreIndex 2017-05-15 22:27:00 -04:00
368ecb47f9 Fix flaky test_sparse_adagrad (#1562) 2017-05-16 01:03:08 +02:00
6107d15d14 Twice differentiability of pointwise functions (#1531) 2017-05-15 12:00:59 -06:00
ba885a1a51 expose bitwise operators from C/CUDA (#1556)
* fix issue #1549, expose bitwise and

* expose C bitwise or of Tensor

* expose C bitwise xor of Tensor

* use built-in method for inplace and, or, xor

* expose C bitwise lshift(ilshift) and rshift(irshift) of Tensor
2017-05-15 11:36:15 -06:00
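A short sketch of the exposed bitwise operators on integer tensors (the shift operations are also added as lshift/rshift per the bullets above, not shown here):

```python
import torch

a = torch.LongTensor([0b1100, 0b1010])
b = torch.LongTensor([0b1010, 0b0110])
print(a & b)   # bitwise and
print(a | b)   # bitwise or
print(a ^ b)   # bitwise xor
```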
ce1a0eb6c9 Merge commit '7afd78d77ffad503357c35f495ae6d4d2b008862' 2017-05-15 11:20:27 -06:00
7afd78d77f Cuda reduce in a consistent direction 2017-05-15 11:18:20 -06:00
6b84dc26f0 Add F.cosine_similarity (#1502) 2017-05-15 11:12:54 -06:00
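Minimal sketch of the new functional (the dim and eps defaults are shown explicitly for clarity):

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable

x1 = Variable(torch.randn(5, 10))
x2 = Variable(torch.randn(5, 10))
sim = F.cosine_similarity(x1, x2, dim=1, eps=1e-8)   # one similarity per row -> size (5,)
```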
0f458ee3c4 Fix memory leak in THCSTensor_spcadd. (#1519)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-15 11:11:03 -06:00
8aa011f52a minor typo and style changes to _torch_docs.py (#1559) 2017-05-15 15:32:56 +02:00
2a610c9d13 Revert "Update to ignore zero targets" 2017-05-14 18:15:30 -07:00
ac8b2c0fa3 Revert "ClassNLLCriterion supports missing targets" 2017-05-14 18:14:36 -07:00
0ba20435ce Add high order grad support for Some operator (#1507) 2017-05-14 23:02:04 +02:00
6fc9130052 Adapt documentation to reflect new supported argument (#1548)
Reflect the changes of #1323
2017-05-13 21:09:34 -06:00
28f4f6db2c typo error for torch.addr (#1547)
fix the typo in the example for torch.addr
2017-05-13 08:53:05 -07:00
9b2de027be SpatialDepthWiseConvolution.cu added 2017-05-12 16:02:14 -04:00
bf4345e2ef ClassNLLCriterion supports missing targets 2017-05-12 15:15:39 -04:00
029290c5b1 SpatialDepthWiseConvolution 2017-05-12 11:34:27 -04:00
78abf0134d Merge pull request #458 from jnhwkim/master
Update to ignore zero targets
2017-05-12 10:38:18 -04:00
9db7787316 Updating __getitem__ and __len__ for containers (#1544) 2017-05-12 16:17:06 +02:00
efa913b1c2 fix uninitialized variable in cmake FindSSE (#1023) 2017-05-11 18:57:34 -07:00
d1a4467682 fix a bug when calling modules
A module that returns a non-standard data structure currently breaks
due to checks for backward hooks. This refactors the code slightly so
it only breaks when backward hooks are actually registered.
2017-05-11 23:00:45 +02:00
507ddc4cde Temporary fix for multiple backwards with fused pointwise RNN (#1540) 2017-05-11 11:18:56 -07:00
aba05ce9db Ensuring float tensors call float versions of math functions 2017-05-11 10:39:35 -07:00
be843eb26b Add unfold to autograd (#1523) 2017-05-11 17:53:16 +02:00
5bb13485b8 Fix Linear function 2017-05-10 16:43:14 +02:00
a86adf43a1 Fix comparison functions 2017-05-10 16:43:14 +02:00
1c304a9ef6 Expose variable attribute of AccumulateGrad 2017-05-10 16:43:14 +02:00
feef54ec34 Don't modify non-volatile grads in zero_grad 2017-05-10 16:43:14 +02:00
5026209d0c Minor fix in Prod backward 2017-05-10 16:43:14 +02:00
e7220380bc Add new flags to Variable.backward 2017-05-10 16:43:14 +02:00
9fa0e403d6 Replace retain_variables with retain_graph 2017-05-10 16:43:14 +02:00
35cf380ed1 Improve output wrapping logic in autograd 2017-05-10 16:43:14 +02:00
3a7e068439 Remove spurious memo argument in Module.parameters() (#1527) 2017-05-10 13:55:15 +02:00
862105ec8b Merge commit 'd5e821044aa20d67122f4570a3f1cb7e6e9c2617' 2017-05-09 17:06:25 -07:00
d5e821044a Make torch.cat not synchronize the host and device 2017-05-09 17:05:23 -07:00
bfc8a3ebba Reference counting documentation. (#1520)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-09 17:02:28 -07:00
6fab62173e Restore examples with keepdim=True default. 2017-05-09 14:49:55 -07:00
c4742fd128 Explicitly pass keepdim=False for tests that require it.
If we change the default to False, reverting this commit is optional.
2017-05-09 14:49:44 -07:00
e124790cb2 Change keepdim default to False. 2017-05-09 14:49:21 -07:00
171638a451 Fix test_normalize NN test. 2017-05-09 14:25:06 -07:00
d95f711501 Add a keepdim test to torch_test. 2017-05-09 14:25:01 -07:00
b9e00dfbb8 Make (non-legacy) nn backwards compatible.
The keepdim change only seems to leak in one place:
when the grad_bias is returned in linear.py.
2017-05-09 14:24:53 -07:00
f6a00fac13 Add autograd tests for keepdim 2017-05-09 14:24:45 -07:00
be5191a00b Add documentation for keepdim. 2017-05-09 14:16:42 -07:00
c9d8e0a43a Change all legacy/nn modules to use keepdim=True (even if tests don't fail).
We shouldn't be introducing changes in legacy modules if we can avoid it.
2017-05-09 14:16:31 -07:00
ae2b2cbbec Make keepdim work with autograd. 2017-05-09 14:15:59 -07:00
f4cf1d6d18 Merge commit 'af790f86f329364dacef1301fc9b5b292629075c' 2017-05-09 14:04:08 -07:00
c34cff7035 Merge commit '906c550e1079e9762194db59440a202ffca90dca' 2017-05-09 14:03:28 -07:00
194d7408bb Merge commit '5f308b50fb558a620253443ef45f7cf3a91be410' 2017-05-09 14:02:25 -07:00
0d538246fb Merge commit '98dbdc464b0f53ecc89af58cc994c7e8d7617e4e' 2017-05-09 14:01:13 -07:00
7c3cb24485 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 14:01:03 -07:00
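The normalization pattern from the message above, as a runnable sketch with the new keepdim argument:

```python
import torch

probs = torch.rand(4, 10)
row_sums = probs.sum(1, keepdim=True)            # (4, 1): the kept dim can be expanded
normalized = probs / row_sums.expand_as(probs)

print(probs.sum(1).size())                        # keepdim=False (the new default) -> (4,)
```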
af790f86f3 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:42 -07:00
906c550e10 Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:29 -07:00
5f308b50fb Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:55:20 -07:00
98dbdc464b Add a keepdim parameter for reduction functions over a single dimension.
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
2017-05-09 11:54:58 -07:00
e70164316c Merge commit '91a118c116d15d280a99a39666d298be15c6d592' 2017-05-08 16:58:56 -07:00
33b3968660 add larger tests for qr 2017-05-08 16:58:54 -07:00
91a118c116 Fix bug in magma qr decomposition and add tests for larger matrices 2017-05-08 16:44:15 -07:00
0764589ed1 Merge commit '008a8c9720183d7bf8b00bf64d8d21c62270089f' 2017-05-08 16:24:14 -07:00
27671c800d Merge commit '105df5844dca21f964d180a918c808489862941f' 2017-05-08 16:23:12 -07:00
d0504aa41d Implement lgamma function. 2017-05-08 16:21:26 -07:00
008a8c9720 Implement lgamma function. 2017-05-08 16:20:52 -07:00
105df5844d Implement lgamma function. 2017-05-08 16:20:39 -07:00
50bf7d5cbc Merge commit '066fbcd014fa4092152b2cd04ad1d92fc8d7bd59' 2017-05-08 16:13:57 -07:00
066fbcd014 use current stream in cat array kernel launch 2017-05-08 16:12:10 -07:00
ecf29f10ad Merge commit '22bbd7ac33ba51469cc913cb01fcd3b70a42e528' 2017-05-08 16:10:00 -07:00
22bbd7ac33 s/IndexType/long 2017-05-08 16:09:02 -07:00
2075abbe30 Gloo: Added a way to create connected contexts from another context
Summary:
Added a context factory that allows you to use an existing context to
create other fully connected contexts much more cheaply (without having
to rely on a store).

Limitations:
  - The backing context needs to be fully connected

Reviewed By: andrewwdye, pietern

Differential Revision: D4985121

fbshipit-source-id: 31ceabccbb679cedb18ec9927b6c166bef5989bb
2017-05-08 16:02:04 -07:00
e694db0eeb Raise error when Variable is converted to bool. Fixes #1482. (#1491) 2017-05-08 23:14:11 +02:00
c5ae79fe4e Make clamp twice differentiable (#1514) 2017-05-08 23:12:42 +02:00
4ad2e155bc Make nn.Sequential more pythonic (#1510)
A minor fix which uses `enumerate` during iteration.
2017-05-08 07:32:07 -07:00
6d693fe413 Add F.normalize (#1467) 2017-05-07 13:54:16 +02:00
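A usage sketch of F.normalize; the parameter names follow the functional API as it exists today and are an assumption for this particular revision:

```
import torch
import torch.nn.functional as F

x = torch.randn(8, 128)
# L2-normalize every row to unit norm along dim=1.
x_unit = F.normalize(x, p=2, dim=1)
```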
23b556ef77 Expose custom attributes from C++ functions (#1430) 2017-05-07 13:49:55 +02:00
e3f41a4962 Add high order gradient support for Sigmoid (#1496) 2017-05-07 13:00:20 +02:00
90e9f8a476 Avoid segfault when calling join_with with self as arg (#1493) 2017-05-07 00:35:11 +02:00
5f15a9e0cb Add a note about THPFunction_asFunction 2017-05-06 14:28:32 -07:00
ff0ff33a11 Fix docs for InstanceNorm (#1477) 2017-05-04 18:11:15 -04:00
eb2c6ea874 set deviceId_ to -1 when CudaDevicePointer and CudaStream do not have valid data
Summary: Set deviceId_ to -1 when CudaDevicePointer and CudaStream do not have valid data

Reviewed By: andrewwdye

Differential Revision: D4881374

fbshipit-source-id: e973a70e2e6e4519f5fdc2ad4e76f232d9593751
2017-05-04 15:05:27 -07:00
e64b2e1cd7 add documentation for cwrap plugins (#1474) 2017-05-04 17:50:58 -04:00
7d40140bfb Document squeeze behavior on 1-dimensional tensors of size 1. (#1470) 2017-05-04 16:54:22 +02:00
e50c7daaf9 Use Qr factorization to get orthogonal matrix in orthogonal init (#1453) 2017-05-04 07:11:59 -04:00
600f366a13 Merge commit 'a6876a4783ce3d1bb3c6ba69f54c31983097ed17' 2017-05-04 06:51:10 -04:00
a6876a4783 fix corner-case in MaxPooling 2017-05-04 06:50:15 -04:00
4e18d89791 added twice differentiation for a bunch of ops (#1426) 2017-05-04 06:47:14 -04:00
de9845588d Merge commit 'c061ed5bda238e1276601593343c10428d01eaae' 2017-05-03 23:14:26 -04:00
c061ed5bda handle beta=0 for gemv with transpose 2017-05-03 23:05:41 -04:00
e9d648c5e7 Fix memory leak introduced by 72e8190 (#1464) 2017-05-03 18:38:56 -04:00
80c0a8776b Fix #1447: sparse_mask doesn't make sense with uncoalesced tensors (#1458)
* Make sparseMask error if mask is uncoalesced.

Fixes #1447.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add test for sparse adagrad.

Previously, the sparse codepath was not exercised at all; this commit
adds a very simple test case "sparse Rosenbrock"; the idea is to do
Rosenbrock but then knock out one of the dimensions so that the
tensor is sparse.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 17:53:45 -04:00
4ec0435b39 Report overall size of sparse tensors. (#1461)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 17:51:56 -04:00
f8be3a20d3 Fix scatter_ documentation typo. (#1463) 2017-05-03 17:31:04 -04:00
7b21b0b6d7 Retry on write EINTR in sync mode
Summary:
We weren't handling an edge case where write(2) would return EINTR
when in sync mode. The Pair::write function would return false
indicating it didn't complete the write whereas the send function
expects it to complete when in sync mode. With this change we now
advance the cursor and retry the write when fewer than expected bytes
were written.

Also see https://github.com/facebookincubator/gloo/issues/34

Reviewed By: andrewwdye

Differential Revision: D4996949

fbshipit-source-id: 3bad4fa3d0a01517f20b64904aa71410641fa60f
2017-05-03 14:26:26 -07:00
0910e0ac90 Fix memory leak in coalesce. (#1460)
Fixes #1449.

For future reference, we should have a doc explaining our ref-counting
conventions; it looks like this bug slipped by because we assumed that
newTensor was taking ownership of the pointers it was passed in.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 13:29:39 -04:00
93094294ba function backward attempted to multiply tuple by variables (#1459)
One-line fix: changed it to multiply the grad_variables by
len(variables) when grad_variables is None.
2017-05-03 13:12:21 -04:00
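For context, the code path being fixed is the one where backward is called on several variables without explicit grad_variables; a rough sketch:

```
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = (x * 2).sum()
z = (x * 3).sum()
# grad_variables defaults to None; this multi-output call exercised the bug.
torch.autograd.backward([y, z])
print(x.grad)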
743e4894d2 Prefix values/indices/sparse_mask/nnz with underscore (#1457)
As discussed in #1441.

I also added some docs giving clear guidance about how to coalescing
in sparse tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 11:14:10 -04:00
f273377d19 add device asserts in scatter/gather kernels 2017-05-03 11:12:26 -04:00
836332e0a1 Merge commit 'f1591fade5c8df5272b79ab1bd8b0b261bb5606a' 2017-05-03 11:11:43 -04:00
f1591fade5 add device asserts in scatter/gather kernels 2017-05-03 11:10:31 -04:00
2e7635b929 Add flexible bilinear upsampling aspect ratio redux (#1317) 2017-05-03 08:46:28 -04:00
e9953c4595 A number of post-merge fixes for test_sparse (#1444)
* Simplify _gen_sparse

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Randomly generate an uncoalesced tensor and test with it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Simpler implementation of cpu_only suggested by @apaszke

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Better implementation of randn, suggested by @soumith

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lint fix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix CUDA type error.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 08:43:03 -04:00
72e8190994 Use at most one shared_ptr block at a time to manage THPFunctions (#1454)
* Fix failing ln in build_all.sh

* Use at most one shared_ptr block at a time to manage THPFunctions
2017-05-03 08:15:36 -04:00
e1278d4ee2 Fix typo in autograd docs 2017-05-03 03:11:55 -07:00
66bd200de0 bug fix - add previous slot offset to calculated slot value in halving-doubling algorithms
Summary: Previous slot offset was not added to the calculated value for the slot to be used in halving-doubling algorithms. If multiple instances were running, slot values could collide.

Reviewed By: pietern

Differential Revision: D4986618

fbshipit-source-id: 56b9220c91f31cc016d37e82907221460de70657
2017-05-02 16:19:55 -07:00
574cfe3cf3 Improve kthvalue documentation. (#1448)
1) Fix "kth" attr specification -- I can't get sphinx to generate `k`th,
but `k` th works with a space, unlike now where the highlighting continues
until the next attr.
2) Specify the size of the return tensors.
3) Add an example of the return tensor sizes with more than 1 dimension.
2017-05-02 17:22:02 -04:00
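A short example of the return values being documented; output shapes depend on the keepdim default of the installed version, so treat them as illustrative:

```
import torch

x = torch.randn(3, 5)
# kthvalue returns the k-th smallest value along dim, together with its index.
values, indices = torch.kthvalue(x, 2, dim=1)
print(values.size(), indices.size())
```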
699755e04f Convert contiguous() call in adagrad to out-of-place coalesce. (#1446)
We missed this one in f2903332c7dce1fbb7d7d9f18dcfba8e853581df!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 16:51:54 -04:00
fb07914c0c Recommendations for workflow when modifying C files. (#1443)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 15:46:45 -04:00
aa2ee86375 pytorch/thpp ~= facebook/thpp (#1445) 2017-05-02 15:46:10 -04:00
ecd51f8510 docs fixes 2017-05-02 15:42:33 -04:00
5aa1f769d3 Fix torch.dist documentation: function returns a float. (#1440) 2017-05-02 14:38:48 -04:00
eecc807a75 Keep track of number of in-flight send operations
Summary:
This helps guard against programming errors where waitSend is called
before send is called. It uses a std::atomic to keep overhead low.

Reviewed By: andrewwdye

Differential Revision: D4984604

fbshipit-source-id: 04a63b1ba088e3bcba0abff40771af666deb15e5
2017-05-02 09:35:46 -07:00
5386012164 Check return value of ibv_reg_mr for error
Summary:
This returns EFAULT when passing a GPU memory pointer (for GPUDirect)
and the ibverbs driver can't map the GPUs memory. Since the error is
pretty cryptic, crash with a more useful message.

```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what(): [enforce fail at gloo/transport/ibverbs/buffer.cc:46] mr_ !=
  nullptr. ibv_reg_mr: Bad address (kernel module 'nv_peer_mem' not
  loaded; did you specify a GPU pointer?)
```

Reviewed By: andrewwdye

Differential Revision: D4982966

fbshipit-source-id: 72c220fe22a3bc59396cfff992ad5f0f9c5bf83a
2017-05-02 09:11:15 -07:00
4bf813e068 Document cdata non-NULL invariant, and consequence Python side. (#1435)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 11:17:20 -04:00
3b4bc721ef fix osx build and suppress clang warnings (#1432) 2017-05-02 09:33:24 -04:00
dca208b525 Refactor test_sparse to reduce boilerplate. (#1421)
* Refactor test_sparse to reduce boilerplate.

Instead of manually creating a helper function, threading an is_cuda
parameter around, and creating a test method for CUDA and non-CUDA
variants, we take a different approach:

- There are now some new member variables initialized in setUp which
  control how we carry out the test; at the moment, it's just
  whether or not we are using CUDA.  This means you don't have to
  pass is_cuda around, or do a conditional to get the triplet of
  constructors you need.

  I'll note that I am not a big fan of member variables in test
  objects, but these are (intended to be) immutable so I think
  it should be OK.

- Instead of manually defining test_foo and test_foo_cuda, we now
  have a new TestCudaSparse class which overrides setUp (from above)
  to swap in the CUDA implementation.  Way less boilerplate, and NO
  metaprogramming needed.

  If you need to opt out of CUDA testing, there is a new cpu_only
  decorator you can use.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 21:52:58 -04:00
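A minimal sketch of the setUp/subclass pattern described in this commit; the class and decorator names below are illustrative rather than the exact ones used in test_sparse:

```
import unittest

def cpu_only(fn):
    # Illustrative decorator: skip the test when the suite runs in CUDA mode.
    def wrapper(self, *args, **kwargs):
        if getattr(self, 'is_cuda', False):
            raise unittest.SkipTest('CPU-only test')
        return fn(self, *args, **kwargs)
    return wrapper

class TestSparse(unittest.TestCase):
    def setUp(self):
        self.is_cuda = False  # immutable test configuration

    @cpu_only
    def test_cpu_specific(self):
        self.assertFalse(self.is_cuda)

class TestCudaSparse(TestSparse):
    def setUp(self):
        self.is_cuda = True   # same tests, CUDA constructors swapped in
```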
181cb15c72 Fix formatting error in docs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 21:47:22 -04:00
7df8fbb64f Generalize halving-doubling to support non-power-of-two cases using binary blocks algorithm
Summary: A generalized version of halving-doubling that supports non-power-of-two number of processes by breaking up execution into blocks that are powers of two and communicating interblock after the intrablock reduce-scatter. Non-power-of-two cases will have some degree of load imbalance compared to power-of-two, but cases with few large blocks (e.g. 8 + 4 or 16 + 8) should still perform relatively well.

Reviewed By: pietern

Differential Revision: D4955947

fbshipit-source-id: af4f218fedb6adf475530c38386978b81f4f2b74
2017-05-01 16:05:22 -07:00
5c7453447f Fix bugs, rename differentiate to grad, make it more flexible 2017-05-01 16:44:56 -04:00
87164f554d Bug fixes 2017-05-01 16:44:56 -04:00
267e7c0431 Fix memory issues with Conv and BatchNorm 2017-05-01 16:44:56 -04:00
e5db8f98be Add torch.autograd.differentiate 2017-05-01 16:44:56 -04:00
20aa5b066f Convert some of the functions to new format
Also, fix a lot of issues that appeared after the previous commits.
2017-05-01 16:44:56 -04:00
de9998e198 Add support for the new Function format 2017-05-01 16:44:56 -04:00
702a2e3bc5 Make Variables not subclass Function anymore
Because of this Variables can no longer appear in the graph.
Every usage of a leaf Variable will leave an AccumulateGrad
function that has no outputs, but modifies var.grad as a side
effect.
2017-05-01 16:44:56 -04:00
2ca787fcf4 Refactor attribute names in autograd 2017-05-01 16:44:56 -04:00
2ec629bef9 Set SO_REUSEADDR to try and prevent bind errors
Summary:
After running the test suite many times we end up with a zillion
connections in TIME_WAIT state. Setting SO_REUSEADDR seems like it
should help binding to ports regardless of the TIME_WAIT state.

Reviewed By: andrewwdye

Differential Revision: D4979606

fbshipit-source-id: b611f9c9e11aba858dc192f6bca3d64e10100b52
2017-05-01 13:36:14 -07:00
2197e4c766 version bump 2017-05-01 15:54:52 -04:00
2a28283680 Fix pair destructor if in CONNECTING state
Summary:
It can happen that a pair is destructed while in CONNECTING
state when some unrelated code throws an exception after the connect
function has been called. The most likely place for this to happen is
when connecting pair A is in progress while connecting pair B throws
an exception. The exception will force destruction of all references
to pair A, even if it is in the CONNECTING state.

Also see https://github.com/facebookincubator/gloo/issues/33

Reviewed By: andrewwdye

Differential Revision: D4979557

fbshipit-source-id: 0cddddd3f478106f1694603fe7f2efe15a2d9aa1
2017-05-01 12:41:07 -07:00
4624278b1d Make sparse documentation title consistent with others. (#1420)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-01 11:48:00 -04:00
79d4ac670c Add map_location to load_url (#1418) 2017-05-01 10:21:30 -04:00
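A usage sketch of the new map_location argument; the URL below is a placeholder:

```
import torch.utils.model_zoo as model_zoo

# Remap all storages to CPU when loading a checkpoint that was saved on GPU.
state_dict = model_zoo.load_url(
    'https://example.com/checkpoints/model.pth',   # placeholder URL
    map_location=lambda storage, loc: storage)
```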
4ebf3ff46d Add base for CUDA allReduce and broadcast in DataChannelGloo 2017-05-01 01:49:10 -07:00
ac3ba9a2ad Rebase fixes 2017-05-01 01:49:10 -07:00
14e1bfddbc Change warning message in MPI 2017-05-01 01:49:10 -07:00
c19fbd3364 Update comments; Add inline accessors for value_type tuple in GlooCache 2017-05-01 01:49:10 -07:00
a17d96d571 Add multiple thread support for DataChannels
Previously, when using the same data channel in a multi-threaded
environment, there was no guarantee against deadlocks or even errors.
2017-05-01 01:49:10 -07:00
b7dcc29430 Forward declare GlooCache key_type 2017-05-01 01:49:10 -07:00
18b4dcd28b Remove unused variable in macro 2017-05-01 01:49:10 -07:00
be81304d27 Moved GlooCache to new file; Functions renames; Minor fixes 2017-05-01 01:49:10 -07:00
f07f13c6e9 Change Store exception handling 2017-05-01 01:49:10 -07:00
310d08c37b Fix store and all operations 2017-05-01 01:49:10 -07:00
234df2138a Fix compilation errors 2017-05-01 01:49:10 -07:00
2b340e7d50 Add python tests; Remove broken prefix store creation 2017-05-01 01:49:09 -07:00
6888c61fa8 Fix DataChannelGloo compilation 2017-05-01 01:49:09 -07:00
ba3328b365 Add DataChannelGloo tests 2017-05-01 01:49:09 -07:00
3b4fe5dfc4 Add isend/irecv; Add all types generator for template functions; Minor refactor 2017-05-01 01:49:09 -07:00
ce42761628 Add groups 2017-05-01 01:49:09 -07:00
df4791d6c0 Implement DataChannelGloo 2017-05-01 01:49:09 -07:00
7e8830c3d5 Initial gloo bindings 2017-05-01 01:49:09 -07:00
b91cec7f66 Fix THD library build for CUDA 2017-05-01 01:49:09 -07:00
765aeb1a08 Fix nonzero bug 2017-05-01 01:49:09 -07:00
280e2a94e5 Worker init clarification; Inform on error thread notification failure 2017-05-01 01:49:09 -07:00
e7f453b5de Add barrier to test; Minor changes; 2017-05-01 01:49:09 -07:00
8030aa0f1b Refactor error thread 2017-05-01 01:49:09 -07:00
40ad2cde62 Remove unnecessary nonzeroElems function 2017-05-01 01:49:09 -07:00
af4a978c44 Move error thread to CommandChannel; Minor fixes; 2017-05-01 01:49:09 -07:00
fe5fc6723f Remove unnecessary code 2017-05-01 01:49:09 -07:00
6e6179633b Minor fixes in THDMasterWorkerInit 2017-05-01 01:49:09 -07:00
c97e60c45d Add actual error reporting in Master 2017-05-01 01:49:09 -07:00
2cdb368f97 Add error handling in MasterWorker mode 2017-05-01 01:49:09 -07:00
a5b2f3461a Review fixes 2017-05-01 01:49:09 -07:00
d3e60599d2 Add benchmark scripts (#66) 2017-05-01 01:49:09 -07:00
98d8e0b040 Lapack functions implementation #2 + fixes after review 2017-05-01 01:49:09 -07:00
fe2c360eda Lapack function implementation #1 2017-05-01 01:49:08 -07:00
59ae109bbb Implement functions from set 1 (except Lapack) 2017-05-01 01:49:08 -07:00
8623076654 Add convertToRank to do bound checking 2017-05-01 01:49:08 -07:00
a362b4f367 Add support for unsigned char aka byte to MPI 2017-05-01 01:49:08 -07:00
ef724e355c Change rank type: int -> std::uint32_t; Minor fixes 2017-05-01 01:49:08 -07:00
e863d27393 Tweaks, fixes, cleanup in DataChannelTCP 2017-05-01 01:49:08 -07:00
4c388f9398 Revert structure changes; Minor fixes 2017-05-01 01:49:08 -07:00
6740d1d904 Rewrite CommandChannel 2017-05-01 01:49:08 -07:00
f891d9b1bf Don't build tests by default 2017-05-01 01:49:08 -07:00
a81f330854 Rename construct -> new; Minor fixes 2017-05-01 01:49:08 -07:00
c02241edbd Minor code refactor 2017-05-01 01:49:08 -07:00
f30a92fa17 Fix invalid socket initialization 2017-05-01 01:49:08 -07:00
1391ff99f4 Use TCP_NODELAY for data sockets 2017-05-01 01:49:08 -07:00
43019bd88a Always loop over all possible addresses in worker 2017-05-01 01:49:08 -07:00
d6380910f5 Removed unnecessary code; Minor fixes 2017-05-01 01:49:08 -07:00
04491e84e4 Fix build with CUDA 2017-05-01 01:49:08 -07:00
e247249a5f Implement TH_API functions from the set 4 2017-05-01 01:49:08 -07:00
0160438eb9 added logical not operator for ByteTensor (#1403) 2017-04-30 08:47:24 -04:00
7dd8571bc6 fix avg_pool docs in nn.functional 2017-04-30 08:44:43 -04:00
48a7869b23 Doc fixes (#1409) 2017-04-30 08:28:19 -04:00
582fd3db7d fix osx build 2017-04-29 09:29:57 -04:00
9169f60a84 Parallelize TensorMethods.cpp builds (#1400) 2017-04-29 09:07:21 -04:00
457d78a7d9 Use THCUNN backward kernels for Tanh and Sigmoid in Autograd (#1399) 2017-04-29 09:07:03 -04:00
a071ccbea6 fix NCCL makefile for CUDA 7.5 (#1401) 2017-04-29 09:04:01 -04:00
db1eb66456 corrected docstring for Dropout (#1404) 2017-04-29 13:40:47 +02:00
45020a74cd remove inplace pow and fix contiguous -> coalesce (#1398) 2017-04-28 18:26:29 -04:00
9c01f5d6b2 Document hybrid sparse tensors.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-28 23:53:01 +02:00
cbb9f08b71 Add new init methods gain, eye and dirac (#1172) 2017-04-28 17:16:40 -04:00
f75ab857b8 Add safeCoalesce() to tests 2017-04-28 17:11:05 -04:00
f2903332c7 Make coalesce() out of place 2017-04-28 17:11:05 -04:00
9643be76f9 speed up accumulation 2017-04-28 17:11:05 -04:00
4f09461d24 Rename sparse tensor contiguous() to coalesce() 2017-04-28 17:11:05 -04:00
bafb2e5cc2 Implement sparse pow. (#1387) 2017-04-28 23:06:09 +02:00
28a7fbbdf5 Documentation fix for torch.gather 2017-04-28 22:45:14 +02:00
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
775481ed56 re-enable dilated convolutions on Kepler (#1394) 2017-04-28 14:42:19 -04:00
5b2aac7c73 Merge commit '224f5eabf5cfb3a19abc1819f7dac230500b6bdb' 2017-04-28 13:48:06 -04:00
224f5eabf5 half<->float conversion cleanup (#680) 2017-04-28 19:46:42 +02:00
fd490c6490 Merge commit 'd6a31c68a0f39656257322a55c9e04dd579de828' 2017-04-28 13:42:23 -04:00
d6a31c68a0 Add option to disable ppc64le's VSX support
Set environment variable TH_NO_VSX=1 to disable VSX.
2017-04-28 13:41:03 -04:00
96a281dfab Add one more missing self.dilation parameter. (#1392)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-28 19:16:32 +02:00
94b147fd41 Allows dicts batches in dataloader. (#1354)
* Allow dicts in Dataloader

* use collections.Sequence instead of collections.Iterable in dataloader
2017-04-28 19:14:52 +02:00
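A sketch of what this enables: the default collate function can now stack samples that are dicts of tensors (the dataset below is purely illustrative):

```
import torch
from torch.utils.data import Dataset, DataLoader

class DictDataset(Dataset):
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        # Each sample is a dict; the loader batches every field separately.
        return {'input': torch.randn(3), 'target': torch.LongTensor([idx % 2])}

loader = DataLoader(DictDataset(), batch_size=4)
batch = next(iter(loader))
print(batch['input'].size(), batch['target'].size())
```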
c26f6877a0 guard topk for half (#759) 2017-04-28 11:57:15 -04:00
8908000262 function -> lambda in test 2017-04-28 10:31:40 -04:00
8b1d5727d8 fix minor docs 2017-04-28 10:13:52 -04:00
75f1989bec Add nn.Bilinear and tests 2017-04-28 10:11:30 -04:00
e221536ad8 Merge commit 'a44317fea88adddded91e068088415de1e66fd4b' 2017-04-28 08:04:39 -04:00
a44317fea8 Change magma_sgesvd to magma_sgesdd which is significantly faster 2017-04-28 08:03:39 -04:00
24e5a9057e Revert "Parallelize TensorMethods.cpp builds (#1364)" (#1390)
This reverts commit 060048bcd808893ba3113d09273a42642904078a.
2017-04-28 07:59:40 -04:00
060048bcd8 Parallelize TensorMethods.cpp builds (#1364) 2017-04-28 07:45:21 -04:00
77035d151e make topk test unique 2017-04-28 07:30:25 -04:00
50c9c23525 enable topk for all cuda 2017-04-28 07:14:21 -04:00
3f81803b09 Merge commit '69574a6dc4036b0113c512a1b2d74e23682c8a3b' 2017-04-28 07:08:43 -04:00
d421c473a9 Merge commit '928f6516c16ff91c0a789d0a653551041d1bafd0' 2017-04-28 07:07:24 -04:00
48f9e526ea implement expand/expandAs in CPU/GPU code 2017-04-28 07:06:25 -04:00
69574a6dc4 implement expand/expandAs in CPU/GPU code 2017-04-28 07:04:08 -04:00
928f6516c1 implement expand/expandAs in CPU/GPU code 2017-04-28 07:03:51 -04:00
b93b525a1c Enable specifying of margin in HingeEmbeddingLoss (#1378)
Previously it was not possible to set a value for the margin of HingeEmbeddingLoss via its constructor. This patch fixes the issue and makes the loss behave as described in the docs.

A discussion of this issue can be viewed here:
https://discuss.pytorch.org/t/issue-with-setting-margin-for-hingeembeddingloss/2088
2017-04-28 06:58:48 -04:00
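A usage sketch with an explicit margin; the values are illustrative:

```
import torch
import torch.nn as nn

criterion = nn.HingeEmbeddingLoss(margin=0.5)  # margin is now honored
x = torch.randn(8).abs()
y = -torch.ones(8)        # targets must be +1 or -1
loss = criterion(x, y)
```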
8db2cf6182 temp fix for transposed dilated convolution (#1388) 2017-04-28 02:53:37 +02:00
7e8ef0e22a Actually pass dilation to the underlying operators. (#1386)
No tests for now; we'll need some sort of shape DSL to concisely
represent them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-27 23:38:01 +02:00
27990fee54 Use fully qualified name as tp_name for tensors and storages (#1379) 2017-04-27 16:26:44 -04:00
2ef7331007 Update sparse.py 2017-04-27 02:25:00 +02:00
c2cfa4cf5b Add THGenerate*Type.h for all types (#1014) 2017-04-27 01:11:56 +02:00
c915f8ddbf Signal error on connection error instead of asserting
Summary: No need to assert on connection errors.

Reviewed By: andrewwdye

Differential Revision: D4957698

fbshipit-source-id: b47f6f0f098dbf7d212701c5cb68e34b2c1c9522
2017-04-26 16:07:13 -07:00
b39a2f2cbb Documentation for sparse tensors. (#1366) 2017-04-26 21:43:05 +02:00
d9f01397b3 s/NOCUDA/NO_CUDA/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-26 21:42:09 +02:00
8ca7bf2ab3 Check argument types in 'checkTypes' (#1363)
Fixes #1357
2017-04-26 15:00:41 -04:00
8950f41da3 Install CUDA headers.
Summary:
This PR makes cmake installs the gloo CUDA headers if USE_CUDA is enabled.
Closes https://github.com/facebookincubator/gloo/pull/29

Differential Revision: D4946856

Pulled By: pietern

fbshipit-source-id: a688c3794c4a5e34b664e7bdeb4e1148f6504419
2017-04-25 22:42:12 -07:00
afd01164f8 Install missing headers.
Summary:
This PR installs missing include headers.
Closes https://github.com/facebookincubator/gloo/pull/30

Differential Revision: D4946478

Pulled By: pietern

fbshipit-source-id: da2d532afc43cf9e5e7fc764dc7821e2dfca6b37
2017-04-25 09:42:21 -07:00
a123247240 Move SIGPIPE initializer to test main
Summary:
It should be up to the program including Gloo to ignore SIGPIPE.
We have seen a case where the EPIPE errno is not properly handled in
an unrelated piece of code. Having SIGPIPE fire means we can get a
core and debug this further.

Reviewed By: andrewwdye

Differential Revision: D4896727

fbshipit-source-id: f6fe2d3f8dc68a9e6c2c457639b45f8aee2d7b20
2017-04-25 09:08:27 -07:00
41705ce7d5 Add zero padding module (#1326) 2017-04-25 16:58:51 +02:00
88fc1d39ff Generic TopK implementation (#744)
* move TopK to generic

* partial genericization of kernel code

* introduce TopKTypeConfig, specialize radix type and conversion for floats

* implement topk for byte tensor

* implement for char tensor

* implement for int tensor, extend test to check indices as well

* works for longs too

* make bitfield set/get a struct, add support for 64-bit types

* extend to double tensor

* implement for half tensor

* asserts; test fix
2017-04-25 16:39:20 +02:00
9899512401 Remove common.h from root
Summary: This file was left over after a recent refactoring but is not used.

Reviewed By: andrewwdye

Differential Revision: D4940265

fbshipit-source-id: 01f8c5fbc73dd0ca0a92306dbfef22ff28133750
2017-04-24 13:51:15 -07:00
d95feb3feb Only build on 64-bit systems
Summary:
While it is theoretically possible to make Gloo work on 32-bit systems, it's unlikely anybody would ever use it on 32-bit systems. This removes the expectation that it should work...

Fixes #28
Closes https://github.com/facebookincubator/gloo/pull/31

Differential Revision: D4939073

Pulled By: pietern

fbshipit-source-id: 8c60804f7ae5cf835332871a424aefa2c498e8a4
2017-04-24 10:38:45 -07:00
3ab074b3c5 Fix torch.stack() with Variable inputs (#1345) 2017-04-24 12:20:51 -04:00
6a69f7007b Revert "add keyword out for autograd function Concat to match torch.cat (#1336)" (#1340)
This reverts commit 71b9dea6ecc2278511ba6c2531437d27d9a2b8c8.
2017-04-23 19:19:27 +02:00
71b9dea6ec add keyword out for autograd function Concat to match torch.cat (#1336) 2017-04-23 15:36:24 +02:00
fa4f363b93 Instance norm (#1283)
* instance norm

* fix whitespaces

* whitespaces

* docs

* "C" letter was cyrillic in docs, fixed

* remove force_eval, fix non contiguous case
2017-04-23 14:49:15 +02:00
aab30d4ea2 Fix errors when no CUDA devices are available (#1334)
Fixes #1267

This fixes a number of issues when PyTorch was compiled with CUDA
support but run on a machine without any GPUs. Now, we treat all errors
from cudaGetDeviceCount() as if the machine has no devices.
2017-04-23 14:45:27 +02:00
2b56711c24 Indexing fix for fused GRU/LSTM kernels when all tensors are not contiguous. (#1325) 2017-04-22 04:22:32 -04:00
2fa3365f94 Merge commit '5224fc56b03b6468cb85ccf39034b8ab0d76d04e' 2017-04-22 01:14:34 -07:00
5224fc56b0 fix typo 2017-04-22 10:14:09 +02:00
4373580e6b Merge commit 'e80a3a7f7b8d0e179c1481e0744f08e9385b31f3' 2017-04-22 01:11:10 -07:00
d9406a8a1a Merge commit '10387a3f35573462e18219c321ff550757ce9b09' 2017-04-22 01:10:53 -07:00
e80a3a7f7b Indexing fix for fused GRU/LSTM kernels when all tensors are not contiguous. 2017-04-22 01:09:46 -07:00
5b83fe6781 add contiguous checks 2017-04-22 09:57:36 +02:00
24d92b5d9f Concatenate directly into shared memory when constructing batches (#1323)
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).

As part of this change:

 * torch.cat accepts keyword argument out
 * specifying out=None is treated like not specifying out
2017-04-22 03:40:30 -04:00
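The keyword form mentioned above, in a sketch; the preallocated buffer here stands in for the shared-memory tensor used by the data loader:

```
import torch

chunks = [torch.randn(2, 3) for _ in range(4)]
out = torch.Tensor()            # stands in for a shared-memory buffer
torch.cat(chunks, 0, out=out)   # concatenates directly into `out`
print(out.size())               # (8, 3)
```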
1375694853 Document torchvision members 2017-04-21 12:50:36 -07:00
be5e399d46 Add a simple README for torch/lib. (#1322)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-21 15:06:12 -04:00
10387a3f35 fix gradBias checks 2017-04-20 19:21:50 -04:00
a782a6231f Merge commit 'e788ea40de0f7ef393f1b602098a6775a95d8976' 2017-04-20 19:00:45 -04:00
e788ea40de fix typo in TH_APPLY for _dimOffset 2017-04-20 18:59:12 -04:00
6089900011 grammar/typo: "There's 3" -> "There are three"
Summary: Closes https://github.com/facebookincubator/gloo/pull/27

Differential Revision: D4919746

Pulled By: pietern

fbshipit-source-id: 35733b75fc169d2ccff8b10df013eed8c279dfd5
2017-04-20 15:19:56 -07:00
81345306c8 Merge commit '8236d38e81396ac48697ac289c0476cff18a8e08' 2017-04-20 15:03:48 -07:00
f0a19e2617 Merge commit '331219c5506b26bf0906b7acdafb4823e07a924e' 2017-04-20 15:01:22 -07:00
8236d38e81 add cusparse link dependency 2017-04-20 14:31:30 -07:00
8adf8fe2ed create and expose handles for cusparse 2017-04-20 14:30:14 -07:00
d2472d1ab5 Disable cudnn dilated convolutions for kepler. (#1308) 2017-04-20 15:31:45 -04:00
331219c550 define abs for short too 2017-04-20 09:55:17 -07:00
7805ac9098 Base Store::wait() should ignore timeout for back compat
Summary: PrefixStore::wait() uses a default timeout if unspecified. This is incompatible when using PrefixStore to wrap a Store implementation that does not support timeout. Instead the base Store::wait(keys, timeout) implementation is called, throwing an exception. This change modifies the base implementation to ignore the timeout.

Differential Revision: D4916517

fbshipit-source-id: 3cdd83bd209bf938b58442d82f3fc245e68019ad
2017-04-19 16:49:44 -07:00
5f65ee9ca0 Add more newContiguous calls and checks 2017-04-19 14:01:31 -07:00
f9149b1f2e Fix halving-doubling corner cases
Summary: Fixes for corner cases with small element counts. Fixed problems include (1) calling range on out of bounds pointers, (2) failing to allocate send or receive buffers in cases where they correspond to out of bounds indices for reduce-scatter, but are needed in the allgather, (3) not allocating enough receive buffer space (more than count_ bytes may be needed in some cases)

Reviewed By: pietern

Differential Revision: D4912656

fbshipit-source-id: 0409d01894ff9c93ef1a1fdf8021c9ecf62f9b57
2017-04-19 12:20:28 -07:00
a8e6610e3d Fix argument typo in pad_packed_sequence docstring (#1300) 2017-04-19 13:50:59 -04:00
56cc1e219b Fix include in mpi/context.cc
Summary:
memcpy comes from cstring

See https://github.com/caffe2/caffe2/issues/286

Reviewed By: Yangqing

Differential Revision: D4914228

fbshipit-source-id: de60c2a98feb4228546a8f1fe237a090101f50e4
2017-04-19 10:19:55 -07:00
1607042bf4 Add timeout parameter and default to rendezvous Store::wait()
Summary: TSIA. Defaulting to 30s.

Reviewed By: pietern

Differential Revision: D4909202

fbshipit-source-id: 7f86f390077a19e559c90a1aa3aa768e273325d1
2017-04-19 10:11:56 -07:00
7d023cda6c Add timeout to RedisStore::wait()
Summary: Add a default 60s timeout to RedisStore::wait() to avoid blocking indefinitely when peer machines are unavailable.

Reviewed By: pietern

Differential Revision: D4908699

fbshipit-source-id: 39de9066633e8b0c8d1ee198b6bf3f70d3961196
2017-04-19 09:58:05 -07:00
9e8b4ef075 Include THCNumerics.cuh in THCAtomics.cuh. (#752) 2017-04-19 12:08:22 -04:00
a35f507532 Update functional.py (#1298) 2017-04-19 11:07:12 -04:00
6aa22beb86 Fix loss.py docs (#1296) 2017-04-19 11:03:15 -04:00
71bf8fb55b Clean up fd from destructor when in listening state
Summary:
It's possible the pair is in the listening state when it is
destructed. The fd will not have been cleaned up in that case, so we
shouldn't assert that being the case.

Reviewed By: andrewwdye

Differential Revision: D4909964

fbshipit-source-id: 7103d74910e3bcf5de9f4658d8f1f682b6c8a70c
2017-04-18 17:51:49 -07:00
c7d83a16f6 Update README.md 2017-04-18 19:05:18 -04:00
934816c01c Change the default algo for cuDNN conv forward to PRECOMP_GEMM (#1290) 2017-04-18 19:01:47 -04:00
5a0510934f Merge commit 'fcf4deac7d215f134ea25cd3def8b564b58b033c' 2017-04-18 15:21:20 -07:00
fc19473501 Corrections in legacy modules. (#1286) 2017-04-18 17:13:53 -04:00
34546f022a Expose dilated convolutions.
Fixes #1225.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-18 17:13:02 -04:00
ab77742f6e Add some missing documentation for arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-04-18 17:13:02 -04:00
701e63107f speed improvements, fix tests 2017-04-18 12:46:54 -07:00
655c22569e CPU hspmm + more efficient reorder 2017-04-18 12:46:54 -07:00
cd3bbc9dfd more operations and optimizations (hspmm, reorder, ...) 2017-04-18 12:46:54 -07:00
1018b238ac make gradients contiguous in adagrad 2017-04-18 12:46:54 -07:00
e27bd4ce7a faster cadd 2017-04-18 12:46:54 -07:00
b2acc33c73 contiguousValues method 2017-04-18 12:46:54 -07:00
40804830b8 mark_contiguous operation 2017-04-18 12:46:54 -07:00
01d84c5f9d revert sparse cuda index type change 2017-04-18 12:46:54 -07:00
88b42324e7 spcadd, sparseMask, cadd, csub, cmul + tests 2017-04-18 12:46:54 -07:00
ec260fe8e9 add test for dsmm 2017-04-18 12:46:54 -07:00
328b416068 THCS contiguous + to_dense 2017-04-18 12:46:54 -07:00
4bde9efbd7 Update CONTRIBUTING.md 2017-04-18 15:39:58 -04:00
ff781ed059 Update CONTRIBUTING.md 2017-04-18 15:39:26 -04:00
8f9a1af253 Merge commit 'fcf4deac7d215f134ea25cd3def8b564b58b033c' 2017-04-18 12:22:44 -07:00
31900b6bae Merge commit '1feb120d938d47c01900f656322f16bc41d08af3' 2017-04-18 12:22:27 -07:00
46cf6ff5fb fix batchnorm docs (#1284) 2017-04-18 15:12:38 -04:00
fcf4deac7d Fused RNN kernel remove explicit instantiation, isn't needed. 2017-04-18 11:07:58 -07:00
1feb120d93 Mark input as optional for gradInput in Tanh and Sigmoid 2017-04-18 10:33:33 -07:00
2ca071d730 Remove double precision math from LogSigmoid too 2017-04-18 10:28:13 -07:00
8a901c510d Update ops for Sigmoid and Tanh 2017-04-18 09:55:11 -07:00
ed60fe0ed6 Gloo benchmarking and script updates
Summary: Add AllgatherRing and CudaBroadcastOneToAll to benchmark. Add host info and algorithm sweep to chronos script.

Reviewed By: pietern

Differential Revision: D4901111

fbshipit-source-id: 1421025d39b914b14e857f21c43eac30c9c9dd2f
2017-04-18 09:06:34 -07:00
f67ab32d34 Output peer address on network failures
Summary: Output peer address on network failures. This change will help in root causing network failures.

Differential Revision: D4899129

fbshipit-source-id: 60a762c6551a726081d5335ab478da8dd7f6dad7
2017-04-17 13:50:24 -07:00
9150e33765 Add support for creating docsets. (#1276)
Docsets are an offline documentation format introduced by Dash.app and
supported by Zeal and some other open-source clones.
2017-04-17 16:35:02 -04:00
e4478804ce Fix patched_make_field for newer Sphinx versions. (#1275)
Not sure since which version that change is needed, but using v1.5.5 here.
2017-04-17 16:17:58 -04:00
a220f2c3aa Fix group-convolution w/o biases on CPU. (#1273)
* Fix group-convolution w/o biases on CPU.

Not having this guard will cause a crash further down in the `cat`
function when it uses the first element in the passed list to create a
new tensor. (And even after that, cat doesn't handle nulls well.)

* Added test for groupconv w/o bias on CPU.
2017-04-17 14:53:28 -04:00
15267ac009 fix typo 2017-04-15 13:08:58 -04:00
0cb60e7d5a Retrieve ethernet interface link speed
Summary: Retrieve ethernet interface link speed

Reviewed By: pietern

Differential Revision: D4880290

fbshipit-source-id: 91f1555d9bb35ff41dc731e082365a9002bb1661
2017-04-14 14:41:01 -07:00
b61174047f Add threshold to switch between host/device reduce and bcast depending on buffer size
Summary: Device reduce is more efficient for large buffer sizes. For smaller buffers, host reduce may be more efficient in some cases and frees up the GPU for other work.

Reviewed By: andrewwdye

Differential Revision: D4885855

fbshipit-source-id: 7dc522e8c93e1a94427730aca6af03b7e93e660d
2017-04-13 15:05:47 -07:00
8d93fcf13f Don't allow overwriting keys in HashStore
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4885102

fbshipit-source-id: c46c180fa8e6dd354921d562830b3515ba91c964
2017-04-13 12:35:32 -07:00
a559893c9f Instantiate nccl type templates for gloo (minus half)
Summary:
Instantiate nccl type templates for gloo (minus half).
half requires at a minimum ifdef'ing CUDA_HAS_HALF and likely requires
more work given that operators aren't defined on it, so skipping it
for now.

Reviewed By: pietern

Differential Revision: D4876217

fbshipit-source-id: 833d2aec12789cbaf9e0a201b979a420fbe6732f
2017-04-13 10:52:38 -07:00
50c2759afe Expose missing headers
Summary: Closes https://github.com/facebookincubator/gloo/pull/25

Differential Revision: D4883908

Pulled By: pietern

fbshipit-source-id: 662a8fdf83ad099295b11043194de25c747e8286
2017-04-13 10:08:06 -07:00
cb66e9cf78 torch.diag bug fix (#1251) 2017-04-12 20:59:12 -07:00
735f5af87e Add new variant of halving/doubling algorithm that pipelines local reduce/broadcast with communication steps
Summary: Added a pipelined version of cuda halving/doubling algorithm. Half the buffer is reduced prior to first send and the other half prior to reducing the result from first receive. Broadcasts are started asynchronously as soon as each new message is received. New code was added as a new algorithm, as pipelining makes performance worse for small buffer sizes.

Reviewed By: pietern

Differential Revision: D4847109

fbshipit-source-id: 5aa55de95f8c94069380af7396f2b5b6297dcbea
2017-04-12 18:01:22 -07:00
c852883086 add named_parameters that yield name and value of parameters (#1242) 2017-04-12 16:32:36 -07:00
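Usage sketch of the new iterator:

```
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for name, param in model.named_parameters():
    print(name, tuple(param.size()))
# e.g. '0.weight' (8, 4), '0.bias' (8,), '2.weight' (2, 8), '2.bias' (2,)
```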
ab77e4c3d7 Merge commit '62c584ba7972dbba404766aa06d1a558282b4169' 2017-04-12 15:06:58 -07:00
2444278b8b Merge commit '4336e9ea6641b8ac2814eaef2adef64e4106459c' 2017-04-12 15:06:10 -07:00
62c584ba79 Fix abs with char and short cuda types. (#747) 2017-04-12 15:04:59 -07:00
fbd53d87bf block wide reduction with multiple values to reduce at once (#745) 2017-04-12 15:04:43 -07:00
71303b8af4 Autograd deadlock for recent glibc fix (#1243) 2017-04-12 22:24:31 +02:00
4336e9ea66 Revert "make it compile on Windows + use ilp64 MKL" (#1002) 2017-04-12 12:07:16 -07:00
d48afd41f9 Add print string for MaxPool3d, change for MaxPool2d (#1115) 2017-04-12 15:58:28 +02:00
e21e4bf3e8 add pyyaml to conda note here as well 2017-04-11 21:21:18 -07:00
8e36339911 Merge commit '0925c91e80cc1b3a86fcbc54570f5bb204c9cb77' 2017-04-11 18:00:44 -07:00
5391fe8953 addr zeroes output buffer when beta=0 2017-04-11 18:00:11 -07:00
0925c91e80 addr zeroes output buffer when beta=0 2017-04-11 17:59:42 -07:00
253c854da5 update Dockerfile not to use requirements.txt 2017-04-11 15:42:05 -07:00
7c59754d24 update source build instructions 2017-04-11 15:24:31 -07:00
2bf7dc643f Merge commit 'aec658f8708a6f4448329da006d14ff2e13dc821' 2017-04-11 15:02:36 -07:00
ce30c76823 Merge commit '2b37ecfccf810a8e21c2c9ac9a943ce2f7c01015' 2017-04-11 15:02:16 -07:00
a8d60ad3ac fix THNN headers 2017-04-11 15:00:30 -07:00
aec658f870 fix THNN headers 2017-04-11 14:57:11 -07:00
2b37ecfccf fix THNN headers 2017-04-11 14:56:53 -07:00
01a35dcace Fix coalesced CUDA collectives for nonhomogeneous lists 2017-04-11 14:48:54 -07:00
afeeb81e79 Add support for keyword arguments in torch.cat 2017-04-11 14:48:54 -07:00
6002f94232 Fix is_tensor and is_storage for old-style classes 2017-04-11 14:48:54 -07:00
a5c7d98611 Import TripletMarginLoss 2017-04-11 14:48:54 -07:00
605b3c86ce Retain the type of numpy scalars in collate_fn 2017-04-11 14:48:54 -07:00
2087b1157a Improve serialization error messages 2017-04-11 14:48:54 -07:00
81e972031d Handle all errors if Module's sources can't be retrieved 2017-04-11 14:48:54 -07:00
e9ff57176b Fused pointwise kernels for GRU/LSTM 2017-04-11 13:42:06 -07:00
a739960515 Merge commit 'cfa504691c2ce5e10010ffb6cd43001c59109aea' 2017-04-11 13:41:54 -07:00
f43320dbf2 Merge commit '0dc52abe9a673547caf79ac64c73e8e16fb37b33' 2017-04-11 13:41:42 -07:00
cfa504691c Fused pointwise kernels for GRU/LSTM 2017-04-11 13:36:38 -07:00
0dc52abe9a Fused pointwise kernels for GRU/LSTM 2017-04-11 13:36:02 -07:00
0b50f794e9 Use thnn version of Tanh/Sigmoid instead of autograd. (#1234) 2017-04-11 12:49:57 -07:00
2abbb5133c Fixing function signatures: long -> ptrdiff_t (#1232) 2017-04-11 11:37:21 -07:00
fcf8387779 Fix ibv_devices wrapper if device list is empty
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4866469

fbshipit-source-id: 6bbde8ec9d71ea89ccdab379d48d122b90237460
2017-04-11 11:04:54 -07:00
ade105fb7c update README to install pyyaml from conda (#1231) 2017-04-11 10:23:45 -07:00
4e693d12ab Merge commit '79c4cb96b16dac603247ffd88c473e84565915a9' 2017-04-10 14:35:54 -07:00
79c4cb96b1 fix memory leak in btrisolve and getri 2017-04-10 14:35:07 -07:00
97bd6aae37 Throw error if Redis replies with error
Summary:
The code already asserted, but only on the reply type, so it didn't
include the actual error message. This makes debugging problems much
easier when people have problems running the benchmark suite.

Differential Revision: D4860022

fbshipit-source-id: 659bc461a724603375bff18eac90eca658492b05
2017-04-10 10:49:59 -07:00
f618ea9f31 Update README.md
Summary:
Mention GPUDirect in README
Closes https://github.com/facebookincubator/gloo/pull/24

Differential Revision: D4860167

Pulled By: pietern

fbshipit-source-id: 80804c778cdc6a9bcd8febe7e05142145cc6c61b
2017-04-10 10:49:59 -07:00
f6fef3718e fix typo in autograd.rst (#1219) 2017-04-10 01:16:59 -04:00
3fcdd6a42b Reuse sockaddr information from device
Summary: This is cheaper than doing getaddrinfo for every pair.

Reviewed By: andrewwdye

Differential Revision: D4850102

fbshipit-source-id: e77f468f099f63860b52fdd0dcc57a8a7a91a448
2017-04-09 16:37:41 -07:00
707c1ca4cc Function to retrieve PCI bus ID from device
Summary:
Part of this change is to perform a getaddrinfo in the TCP device
class so we can figure out the interface and subsequently PCI bus ID
of the NIC used for its traffic. This information can be used in a
later diff to avoid doing getaddrinfo calls in the TCP pairs and have
them reuse the information that is resolved by the device.

The PCI bus ID can be used to compute distance between NICs and GPUs
and make informed decisions on where to allocate scratch buffers.

Reviewed By: andrewwdye

Differential Revision: D4850035

fbshipit-source-id: 575e401a9273300bc720c814fef8971846ec748c
2017-04-09 16:37:41 -07:00
bc0ed9298d remove incorrect version in readme 2017-04-09 14:44:44 -04:00
040cf42643 Merge pull request #455 from twitter-forks/indexlinear
Adding Indexlinear
2017-04-09 13:52:56 -04:00
6d9ad1d66a Adding IndexLinear (#1181)
* Add IndexLinear

* Fixes to IndexLinear

- Fix IndexLinear test
- make it better for multithreaded case
- fix a glitch in the C code
- improve the reset() method
- fix the weight allocation.
- remove "fakeBatch" possibility as it's not used
- clamp normalized values at evaluation time instead of just dividing by max.
- add assert on the keys/values dimensions in IndexLinear.
- invert order of weightDecay in the case of output dim > 1.

* Changes required to support IndexLinear in CUDA

* Adding support for flattened inputs for IndexLinear

* Doc for IndexLinear + fix for when the input format changes from one batch to another.

* Cleaning up IndexLinear documentation

* Changes required to build with latest torch

* Adding benchmark script for IndexLinear

* Bugfixes and cleanup of IndexLinear.lua

- Fixed bug that occurs when performing multiple accGradParams +
  updateParams

- All the data required for the updates is put in a single table

- Added :parameters method
2017-04-09 13:51:45 -04:00
64ee4056d7 updated docker image inside the docs (#1216) 2017-04-08 10:29:03 -04:00
55d69b5ade Merge commit '88bcfc15316e3c878237a8f95aeb6e72402c90ff' 2017-04-07 17:20:52 -07:00
0d7d6e1f0d Merge commit '662163bef68a9d64f3cb13a903638c870c0b4aa6' 2017-04-07 17:20:15 -07:00
b16a352a3b Fix remainder and cremainder for integer types 2017-04-07 17:17:44 -07:00
88bcfc1531 Fix remainder and cremainder for integer types 2017-04-07 17:16:59 -07:00
662163bef6 Fix remainder and cremainder for integer types 2017-04-07 17:16:31 -07:00
4026593240 check for beta=0 and avoid multiply in sparse mm (#1211)
* check for beta=0 and avoid multiply in sparse mm
2017-04-07 20:14:32 -04:00
a931064a52 Merge commit '441d75ce569f89bad3e2f1f2a2075e68ae3bc76b' 2017-04-07 16:57:05 -07:00
441d75ce56 Adapts basic operations to new THXVector interface 2017-04-07 16:56:12 -07:00
3de56785fa fix conv1d test and add for padding 2017-04-07 13:56:02 -07:00
5ee8536a02 Merge commit 'a89317a9d407241c97fe4486b3c88de8578445d7' 2017-04-07 13:49:18 -07:00
f00a5d2f54 Merge commit '66a20e5c328836c1eb720cf4e2eb916366aae487' 2017-04-07 13:47:25 -07:00
a89317a9d4 fix types in unfold.c 2017-04-07 13:32:04 -07:00
e48db02e10 remove unused python-level BatchNorm.py 2017-04-07 16:27:16 -04:00
7f2553bc6f dont use cudnn batchnorm for cudnn < 5.1.10 2017-04-07 16:27:16 -04:00
66a20e5c32 Support TORCH_NVCC_FLAGS environment variable
This is already supported in cutorch since august 2016, and is used in
pytorch integration (to reduce the binary size).
2017-04-07 18:23:22 +02:00
37d95687c4 Merge commit 'ae1c365dbdbf667ae24c57eec9f2e6b9debf16bd' 2017-04-06 16:37:31 -07:00
f0c7124420 Allow support for negative dimension argument for all functions 2017-04-06 16:37:00 -07:00
ae1c365dbd Add TH_INDEX_BASE to nDimension and stride functions 2017-04-06 16:30:11 -07:00
6fd9b53d93 Include common/linux.{h,cc} in CMake build
Summary:
Forgot to include these in a previous commit.
Closes https://github.com/facebookincubator/gloo/pull/23

Differential Revision: D4847072

Pulled By: pietern

fbshipit-source-id: 08aa9e8fa47377eb8c7747bd577eec7e615789f1
2017-04-06 15:20:59 -07:00
e692c38fcf Compute distance metric between PCI devices
Summary:
With this we can compute the best GPU device to reduce on. It is not
always the one CUDA indicates as GPU 0.

Reviewed By: andrewwdye

Differential Revision: D4845581

fbshipit-source-id: 13e0500f54fd507899646f781a97c09abcd3b056
2017-04-06 13:50:07 -07:00
5dfa73702f Display runtime information in benchmark output
Summary:
This makes it easier to capture, compare, and contrast results with
different parameters.

Reviewed By: andrewwdye

Differential Revision: D4843715

fbshipit-source-id: ba6916dcd5f8bcc615d6edce1a54657241357c31
2017-04-06 11:06:23 -07:00
95140094cb Use CudaStream as first class object
Summary:
Instead of having every CudaDevicePointer "own" a stream, this change
moves to using CudaStream as first class object. It was pretty clunky
to use the copy{To,From}* functions on the CUDA pointer classes to
copy stuff around. For example it was not clear whether the stream
belonging to the source or destination was used to execute the copy
on. There is no longer such ambiguity after this change.

To make this work the CudaBroadcastOneToAll algorithm was changed to
include the workspace template argument, but only has the
CudaHostWorkspace implementation. The CudaDeviceWorkspace
implementation is left to be done for another change (that's not the
purpose of this change).

Reviewed By: andrewwdye

Differential Revision: D4841615

fbshipit-source-id: d0c1b9ba948ff6167832515afa7bdd2b32b48064
2017-04-06 11:06:23 -07:00
ef95926103 Move setTimeout to Device and set default tcp timeout to 30 sec
Summary: Make timeout a device attribute. Now the pair will configure its timeout when connecting based on the device timeout settings, instead of the timeout needing to be set explicitly on each pair. Set default tcp timeout to 30 sec.

Reviewed By: pietern

Differential Revision: D4838918

fbshipit-source-id: e6e6ee36c662eb5e7ba5354c904e50f9dcac258f
2017-04-06 08:50:21 -07:00
e7f5220dfa device_ids can be None again in data_parallel (#1187) 2017-04-06 10:30:53 -04:00
a7ae04a657 fix precedence problem when building with debug python (#1201) 2017-04-06 10:30:16 -04:00
7f03182bfa sizeAverage -> size_average in docs 2017-04-06 01:31:02 -04:00
9f2a5d804d Add a flag to fix when dataset size is not divisible by batch size. (#1133) 2017-04-06 00:18:43 -04:00
aa506fa4d7 fix docs typo 2017-04-05 23:42:02 -04:00
955869a09a fix cuda_allreduce_halving_doubling to correctly copy between and reduce on GPU buffers
Summary: cuda_allreduce_halving_doubling was not properly handling the case where buffers are allocated in GPU memory, trying to reduce and copy from them as if they were in system memory.

Reviewed By: pietern

Differential Revision: D4840259

fbshipit-source-id: 2615360cd2f1d9c7a37fb0bcdf33ff35528b2c75
2017-04-05 19:56:20 -07:00
d82cad3019 implement nn.Module.__dir__ (#1142) 2017-04-05 22:18:34 -04:00
9504246c32 add triplet margin loss (#1165) 2017-04-05 22:17:58 -04:00
81cf3dbf79 Merge commit '6bd4ecd15390517c68d598d236ffb0929ade277c' 2017-04-05 19:07:01 -07:00
12f1b4f76c Merge commit '84bdbe5ab4b602b021ff494487c8ad57457052d3' 2017-04-05 19:06:14 -07:00
84bdbe5ab4 btrisolve: Add sz checks, correct B's ordering, support nrhs>1. 2017-04-05 19:05:20 -07:00
85954032d9 fix doc formatting 2017-04-05 22:02:29 -04:00
1a04b92226 add note regarding SGD momentum 2017-04-05 20:45:41 -04:00
8a822d48f5 Update README.md
Summary:
Clarify that Redis Cluster is not supported. Also see #21.
Closes https://github.com/facebookincubator/gloo/pull/22

Differential Revision: D4837375

Pulled By: pietern

fbshipit-source-id: 6e3575b3b8dae6ca62beb765da15d8506da4abdb
2017-04-05 13:06:48 -07:00
5511ad258b cuda version of recursive halving/doubling allreduce
Summary: Basic port of the CPU halving/doubling algorithm. No pipelining is done between reduce/broadcast and communication.

Reviewed By: pietern

Differential Revision: D4823693

fbshipit-source-id: b18045d64edf90361bf7713f4ccb2e074757780f
2017-04-05 12:39:16 -07:00
75a635630d Update to ignore zero targets
If the target is zero, loss and gradient of input are set to zero. It
is useful for variable-length natural language generation models.
2017-04-05 11:51:54 -07:00
8e6524938b Undo D4832492 for Gloo
Summary: No folly dependency in Gloo.

Reviewed By: andrewwdye

Differential Revision: D4835050

fbshipit-source-id: 97d0c14fb770fdde68206ca5a20a974bef156392
2017-04-05 09:51:05 -07:00
4e4cfd8b2b Fix main()s to call folly::init/initFacebook/registrationComplete (part 14)
Summary:
Required for D4821763
Based on targets from https://fb.facebook.com/groups/fbcode/permalink/1304073246296178/ (I also excluded those targets which do not depend on folly:singleton).

Reviewed By: meyering

Differential Revision: D4832492

fbshipit-source-id: fcb4ce42e9e5359d4752769f77d7271e550201fe
2017-04-04 20:50:47 -07:00
6bd4ecd153 Use thrust::inclusive_scan for 1D cumsum/cumprod (#742)
For large 1D tensors thrust::inclusive_scan is much faster than our
current implementation.
2017-04-04 21:05:10 -04:00
5c802c5ba9 Refactor AllgatherRing to use remote buffer offset
Summary: Refactor AllgatherRing algorithm to remove all memcpy in the communication rounds by using outPtrs as send/receive buffer + remote buffer offset.

Reviewed By: pietern

Differential Revision: D4793186

fbshipit-source-id: 645d0758d246fd0b493e3fe312a8441d86f6d169
2017-04-04 17:08:26 -07:00
04f5b5ea83 Merge commit '5b40e4245d573ae0a6c2da70a0b712528aab2bce' 2017-04-04 15:39:35 -07:00
5b40e4245d Fix typo and make btrisolve work for doubles on the CPU. 2017-04-04 18:29:30 -04:00
ae5865082c Move common algorithm stuff into algorithm.h
Summary:
Combines the top level common.h with algorithm.h. With algorithm.h in
the common package, CUDA algorithms only need a dependency on that
package. CudaBroadcastOneToAll still depended on broadcast.h so this
change also removes that dependency and has it subclass the Algorithm
class.

Reviewed By: andrewwdye

Differential Revision: D4826885

fbshipit-source-id: 930037e39f7a2c941868e53f0bbc54e3f2e0b184
2017-04-04 13:05:50 -07:00
f86beccc5b Use workspace pattern with CudaAllreduceRingChunked
Summary:
GPUDirect support for CudaAllreduceRingChunked by adding a workspace
template parameter and adding workspace specific init functions.

To support this change the CUDA LocalOp classes had to be changed a
bit to take an extra destination/source pointer. This allows reduction
of 1-N pointers into a target pointer, where the target may live on
device or live on host. If it lives on the host, the NCCL operation
that executes the reduction is followed by a D-to-H memory copy. If
there is only a single input pointer, no reduction needs to happen and
the class just executes the D-to-H memory copy. The net result is that
we can interchangeably use device or host pointers as a target for
reduction or a source for broadcast, and these LocalOp classes do what
you would expect them to do.

Reviewed By: andrewwdye

Differential Revision: D4825236

fbshipit-source-id: 048ec6cbc5a0500bafbe1b3f6abe1e2e5f3a2675
2017-04-04 13:05:50 -07:00
d122b4e4ec Update btrisolve docs to the newest interface. 2017-04-04 15:21:16 -04:00
ccfc4567dc Merge pull request #78 from ilya-biryukov/master
Fix compilation error when compiling with 'clang -x cuda'.
2017-04-04 09:47:52 -07:00
81008aa111 Handle errors in sync IO path.
Summary: Fixes for handling errors and timeouts in blocking and polling sync paths. Add test coverage for errors and timeouts.

Reviewed By: pietern

Differential Revision: D4823498

fbshipit-source-id: 93721947a6404ca9cea6a4869f4156f8d270a981
2017-04-04 09:37:33 -07:00
0cdf10478d Start benchmark element sweep at 100
Summary:
Anything number of elements below this always fits in a single packet
and will yield ~identical results.

Differential Revision: D4825190

fbshipit-source-id: 71ac77456049e991da5059d5a029c5e9d2a67ed7
2017-04-03 23:50:38 -07:00
4de82cfa0f Use CudaAllreduceRing<CudaDeviceWorkspace> for GPUDirect
Summary:
The existing CudaAllreduceRing with a CudaDeviceWorkspace
template parameter now has the same effect.

Reviewed By: andrewwdye

Differential Revision: D4823393

fbshipit-source-id: 88fe497a983b26a281a3a74fe3bdc02c0c87c523
2017-04-03 20:05:25 -07:00
1ac8251373 Use gloo::make_unique to fix build for C++11
Summary: Closes https://github.com/facebookincubator/gloo/pull/20

Differential Revision: D4820325

Pulled By: pietern

fbshipit-source-id: 00a870f71e8e98ce6d06da261dcaed83b81ec81c
2017-04-03 17:07:04 -07:00
511ca3ea1b Add tests for tcp transport failures
Summary:
Implement a file store for multi-process transport failure testing. Add test cases to spawn multi-process tcp communication, and verify that all processes throw the expected IoException.

A future diff will add coverage for connectivity failures, sync modes, and ibverbs.

Reviewed By: pietern

Differential Revision: D4807794

fbshipit-source-id: 35212719d46e6d875eacb341fae25681f39053bc
2017-04-03 16:08:39 -07:00
8ce1382e99 make it compile on Windows + use ilp64 MKL (#981) 2017-04-03 18:02:15 -04:00
22cdef3ddc recursive halving/doubling allreduce
Summary:
Allreduce using recursive halving and doubling algorithm. Algorithm is described in http://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf (see top diagram on page 12). Algorithm consists of 2 lg P stages, the first log P performing a reduce-scatter and the second log P the allgather. Message size is variable across steps. The early stages of the reduce-scatter and the late stages of allgather send the largest messages. The communication is structured such that the largest messages are sent between nearby ranks, which could be useful if elements are ranked in locality-aware fashion.

So far this supports only a power-of-two number of processing elements.

I have attempted to minimize the amount of synchronization/hand-shaking. Messages are received at different offsets of the output buffer for each communication step. Send offsets in the reduce-scatter steps become receive offsets in the allgather and vice versa. The reuse of buffers across reduce-scatter and allgather steps requires synchronization. Right now the algorithm is inefficient in terms of memory use, currently requiring 3x the memory. This can be reduced, but would require additional synchronization.

Reviewed By: pietern

Differential Revision: D4795878

fbshipit-source-id: fcc6597ef6a99cd102fce2b8e4562d93088d39dc
2017-04-03 14:05:44 -07:00
148b11847b Remove useless base class in allreduce.h
Summary:
Didn't provide enough value now that ReductionFunction and
CudaReductionFunction are no longer related.

Reviewed By: andrewwdye

Differential Revision: D4819295

fbshipit-source-id: e6479769af7f78d486bee7d9c31f049430cdc775
2017-04-03 11:09:50 -07:00
b3a2f30715 Extra workspace template parameter for CUDA algorithm
Summary:
To bring the GPUDirect and non-GPUDirect implementations of CUDA aware
algorithms closer together this change introduces CUDA workspaces.
There's an implementation for a host side workspace and a device side
workspace. The former is used for transports that don't support
GPUDirect and the latter for ones that do. CUDA algorithms will take
an extra template parameter for this workspace and this will determine
whether they can be used for GPUDirect or not.

The workspaces only define their respective pointer types right now
but may contain local operation construction functions at a later
point in time.

Reviewed By: andrewwdye

Differential Revision: D4802826

fbshipit-source-id: cb1d71a224ce0165afd07fb9092ad54d3e07c8cf
2017-04-03 11:09:50 -07:00
91c4ba7980 Add torch.arange and deprecate torch.range 2017-04-03 10:38:58 -04:00
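The difference in one sketch, on builds where torch.range is still available: arange follows Python range semantics and excludes the end point, while the deprecated range includes it:

```
import torch

print(torch.arange(0, 5))   # 0, 1, 2, 3, 4
print(torch.range(0, 5))    # 0, 1, 2, 3, 4, 5 (deprecated in favor of arange)
```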
03f1cab801 Unify argument names in norm and renorm 2017-04-03 10:38:58 -04:00
fa2c566353 Add Variable.type_as 2017-04-03 10:38:58 -04:00
2d1122739c Raise AttributeError in Module.__getattr__ 2017-04-03 10:38:58 -04:00
7861f585fe Reshape grad in dot 2017-04-03 10:38:58 -04:00
3abf2ef225 Merge pull request #991 from BTNC/win
add /arch:AVX /arch:AVX2 explicitly for msvc so it compiles on windows
2017-04-02 13:32:57 -04:00
70c4b82eba add /arch:AVX /arch:AVX2 explicitly for msvc 2017-04-02 20:47:29 +08:00
274b5c9003 Allow unhashable inputs to parallel_apply 2017-04-01 20:11:20 +02:00
dfa2d26830 * make random_ range correct when both lower and upper are specified 2017-03-31 15:37:24 -04:00
559ae078b8 Fix Option constructor in invalid argument error printing code (#1160) 2017-03-31 15:35:35 -04:00
030ff4928a Merge commit 'a216e377b3844ac9c7882bd391a00f4e0ae718e7' 2017-03-31 11:45:37 -07:00
0829bffdec Merge commit '403cad46dc91a2bc2f6889754055decd6f3d53c7' 2017-03-31 11:45:24 -07:00
ffc7911bec Merge commit 'd8ae7893e056ebf4e7a5e96bab2c3b69f196ddfd' 2017-03-31 11:45:06 -07:00
ff1fde6151 Merge commit 'a3bfb9f376a57fb63e89ddf70f57353f19ed9d69' 2017-03-31 11:44:48 -07:00
a216e377b3 Merge pull request #456 from twitter-forks/addmm-fixes
Using temporary variables when performing transpose + addmm
2017-03-31 14:44:07 -04:00
b13b7010b9 check for nvidia driver's sufficiency before checking for number of CUDA devices (#1156) 2017-03-31 12:19:59 -04:00
a3bfb9f376 THVector_(add),(mul) -> (adds),(mul) for VSX.
This was previously completed for other architectures.
2017-03-31 08:50:23 -07:00
5c79046d39 Use persistent tensor to store exp_inf (part of optimizer's state) (#1152) 2017-03-31 10:30:31 -04:00
30fd222b80 implement autograd function cross (#1138) 2017-03-31 01:45:51 -04:00
3b7b23df66 Move CUDA collectives to cuda_collectives.h
Summary:
The CUDA algorithms all had their own version of local reduction and
broadcast. This commit consolidates them and allows all CUDA
algorithms to work with CudaDevicePointer instances.

Reviewed By: andrewwdye

Differential Revision: D4797968

fbshipit-source-id: cccef39fce01905a2cd757ccbcffd29803411409
2017-03-30 15:06:03 -07:00
d933287114 Add a barrier after verification iteration in benchmarks to prevent a race with regular iterations
Summary: Verification was sometimes failing for allreduce halving-doubling. Pieter noticed that it is due to verification step racing with the regular iterations.

Reviewed By: pietern

Differential Revision: D4804558

fbshipit-source-id: f645cb2e332e449a993a634c5bdb42c2dcb8613b
2017-03-30 14:14:32 -07:00
761eef1f19 Minor typo fix in backward function in torch/autograd/variable.py (#1143) 2017-03-30 11:23:28 -04:00
d8ae7893e0 Get rid of warp-synchronous code (#739)
Time to get rid of warp-synchronous code. It will break!
2017-03-30 01:20:43 -04:00
90b872c670 Add GPUDirect capable version of CudaAllreduceRing
Summary:
This is a copy of CudaAllreduceRing that doesn't stage the locally
reduced buffer in host memory but uses the GPU side buffers directly.

Eventually I would like this to be absorbed back into
CudaAllreduceRing, but for now it's a good place to compare the two
implementations and abstract the parts that make sense, until they are
identical again.

Reviewed By: andrewwdye

Differential Revision: D4791629

fbshipit-source-id: 5ad065cb94adb968aeee2379327be313638f2161
2017-03-29 18:50:11 -07:00
a95ce9e98f Using temporary variables when performing transpose + addmm 2017-03-29 16:56:39 -07:00
403cad46dc Using temporary variables when performing transpose + addmm 2017-03-29 16:14:13 -07:00
b8ccf42c74 Constify algorithm constructors
Summary: TSIA

Reviewed By: gchanan

Differential Revision: D4795492

fbshipit-source-id: aaad7afd373e40fa4669129cf2c98594c4091153
2017-03-29 14:21:03 -07:00
8aa1cefed8 Fix deadlock in autograd (#1140) 2017-03-29 16:19:40 -04:00
4b147e2079 Settable timeout for tcp read/write
Summary: Add a setTimeout() API to the Pair interface. Implement in the tcp transport for connect, read, and write, and across blocking, polling, and async configurations. Ibverbs implementation to come later.

Reviewed By: pietern

Differential Revision: D4787932

fbshipit-source-id: 6072dc0c0add1700f84a72b83e4388b29b044ec1
2017-03-29 09:07:04 -07:00
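A minimal sketch of how such a timeout could be threaded through a blocking write; the type, member names, and the millisecond argument are assumptions, not the actual gloo signatures.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a transport pair; only the timeout plumbing
// from the commit above is sketched.
class PairSketch {
 public:
  void setTimeout(std::chrono::milliseconds timeout) { timeout_ = timeout; }

  void write(const void* /*buf*/, size_t /*len*/) {
    // Instead of blocking forever, give up once the timeout elapses and
    // surface the failure to the caller.
    if (!waitForWritable(timeout_)) {
      throw std::runtime_error("tcp write timed out");
    }
    // ... perform the actual writev here ...
  }

 private:
  bool waitForWritable(std::chrono::milliseconds) { return true; }  // stub
  std::chrono::milliseconds timeout_{0};
};
```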
0d908d813b Implements Cumsum function for autograd (#1122) 2017-03-29 17:45:57 +02:00
1c391f6f93 bump version 2017-03-29 10:08:34 -04:00
be146fd721 Add btriunpack and update the btrifact test. 2017-03-29 13:42:13 +02:00
2979f4b989 add more functions to docs 2017-03-29 01:29:17 -04:00
22b3600f19 add samplers to documentation 2017-03-29 00:33:07 -04:00
215813d7ac Change dockerfile to support for cudnn v6 (#1135) 2017-03-28 20:05:04 -04:00
80e88a88ed Fix ibverbs completion queue capacity
Summary:
The header already contained an analysis of required completion queue
depth but the queue pair was still initialized with a maximum queue
depth of kMaxBuffers. This change fixes that and updates the analysis
to talk separately about receive and send completion queues.

Reviewed By: andrewwdye

Differential Revision: D4785786

fbshipit-source-id: 4dc302d523a3b7162dc261d14cfcc755681febf8
2017-03-28 10:06:50 -07:00
dc7695a47a Update links for tutorials in README (#1123) 2017-03-28 14:21:40 +02:00
032a65edff modify pip uninstall command in CONTRIBUTING.md 2017-03-28 14:20:49 +02:00
55546359b6 Retry on EINTR for writev in tcp/pair.cc
Summary: TSIA

Differential Revision: D4783319

fbshipit-source-id: 610d1a65a54048e7c56610632ccfe271eac85b6c
2017-03-27 17:35:45 -07:00
fe3d5a63f2 Support multiple predefined reduction functions
Summary:
Predefining the reduction functions makes it easy to provide a set of
fast implementations. Eigen is used to implement them if it is found.

Reviewed By: andrewwdye

Differential Revision: D4780868

fbshipit-source-id: e825cf2e5cfe8ec27d587c5aff4002534b1c670d
2017-03-27 14:35:02 -07:00
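A rough sketch of what a set of predefined reduction functions can look like; the names are illustrative, and a plain loop stands in for the Eigen-backed body mentioned above so the snippet stays self-contained.

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative only: predefined element-wise reductions an allreduce can
// accept. In the commit above these bodies would use Eigen when available.
template <typename T>
struct ReductionFunctions {
  using Fn = void (*)(T* dst, const T* src, size_t n);

  static void sum(T* dst, const T* src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] += src[i];
  }
  static void min(T* dst, const T* src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = std::min(dst[i], src[i]);
  }
  static void max(T* dst, const T* src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = std::max(dst[i], src[i]);
  }
};

// Usage: an allreduce constructor could default to ReductionFunctions<float>::sum.
```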
e4b4e515cd add mode to cwrap 2017-03-27 13:29:14 -07:00
4b1f5f4bd6 Merge commit 'afd576ec0e389db3e47efe44652c488b1706f168' 2017-03-27 13:26:50 -07:00
37718e207d Add remote offset argument to buffer send
Summary: This makes it possible to write to any offset in a remote buffer.

Reviewed By: andrewwdye

Differential Revision: D4779776

fbshipit-source-id: f5a44cc705df5141bd720ff4e3fec8697f707a70
2017-03-27 13:07:17 -07:00
afd576ec0e Add mode kernel 2017-03-27 15:58:47 -04:00
95aa2af377 btrisolve: Make a Tensor method and update argument order
Also update docs for btrifact and btrisolve to the newest interface.
2017-03-27 15:46:49 -04:00
6774d39c96 Merge commit '5d274cd4991022d63b014cc8917e00c15441d3f4' 2017-03-27 11:54:08 -07:00
567faedc59 Merge commit '8051dec608368fed3569c7513292785083adc53c' 2017-03-27 11:53:41 -07:00
7c2c7e8e31 Move NCCL code to subdirectory and backfill ops
Summary:
All operations supported by NCCL are now available through the Gloo
wrappers. Algorithm wrappers for them are forthcoming so that they
can be used interchangeably with other implementations.

Since not all of them require same-sized source and destination
pointers, I moved assertions on number of elements to the op
constructors.

Reviewed By: andrewwdye

Differential Revision: D4771292

fbshipit-source-id: 2f34629507b5e1cb9ae8d6d2f02de0a7f641a341
2017-03-27 09:50:40 -07:00
3eab8a71e2 Added docstring to add_module (#1116) 2017-03-27 11:09:24 -04:00
2fd4d088ff add Adaptive pooling methods to docs 2017-03-26 22:43:46 -04:00
5d274cd499 Update btrisolve argument order. 2017-03-26 13:07:24 -04:00
8051dec608 Update btrisolve argument order. 2017-03-26 13:06:34 -04:00
f2c1071c33 Adaptive max and average pooling (1D & 2D) (#1084) 2017-03-26 17:09:28 +02:00
bb71117ecc Cwrap arg assign (#1102) 2017-03-26 13:53:28 +02:00
d25433a099 Fix docker build commands (#1103) 2017-03-25 16:18:33 -04:00
7dd45490f8 don't use inplace backward, remove unnecessary zero for grad_input (#1079) 2017-03-25 20:04:48 +01:00
bf632544e6 Pass NULL rinfo_ to btrifact by default (#1089) 2017-03-24 19:49:40 -04:00
282402d4f3 Revert "Add back zero fill for ger" (#1093)
This reverts commit 5a761dbe65d2221e9c200b3f8ea0590b5d9b923f.
2017-03-24 19:49:31 -04:00
1461709ea0 Improving the performance of IndexLinear:updateOutput
- Removes separate kernel for updateOutputTrain
2017-03-24 16:34:31 -07:00
cce03074f5 Merge commit '3acbbb30f2bdc6ccf4ffb6f7d568e7916d4e384d' 2017-03-24 16:19:44 -07:00
f2f63773d8 Merge commit '52911f9e47f679045a238eb9dfdc5db55bf98cc9' 2017-03-24 16:19:19 -07:00
84aa41824c Merge commit 'b4fe5ad641181f30bdcc4749c949206a3ebb04b4' 2017-03-24 16:19:05 -07:00
25c8a117af Merge commit 'e8196f990db4ba368010f0d950bebf1fb13c2888' 2017-03-24 16:18:52 -07:00
ae122707b5 Don't do extra resize in linear bias 2017-03-24 23:41:15 +01:00
b4fe5ad641 Use zero instead of mul when beta == 0 in addr 2017-03-24 13:09:00 -07:00
5a761dbe65 Add back zero fill for ger
Ger does not have beta argument, so has to be zero-filled.
2017-03-24 21:03:02 +01:00
dd893391d5 Add argument to children to yield the name of the modules (#941) 2017-03-24 20:02:05 +01:00
649f04d077 Added Pascal nvcc flags, bumped version 2017-03-24 11:58:14 -07:00
f45ef5fdb8 AllGather algorithm [CPU]
Summary: Allgather ring CPU implementation. It does |buffers| x |contextSize| passes.

Reviewed By: pietern

Differential Revision: D4723809

fbshipit-source-id: ffd8366ac7e1746555474e173143d33cee497822
2017-03-24 11:06:57 -07:00
e8196f990d Make rinfo_ argument optional in btrifact 2017-03-24 09:01:36 -07:00
269b77a1b2 Make rinfo_ optional in btrifact 2017-03-24 09:00:39 -07:00
476d85dd3f DataLoader: Fix batch data type for numpy array (#1074) 2017-03-24 11:34:24 -04:00
63f6c0d692 add Pairwise distance (#835) 2017-03-24 11:29:40 -04:00
b546fa3fcd add assertTrue to padding tests 2017-03-24 15:27:51 +01:00
1d656b6769 Ensure displayed progress in ProgressMonitor is between 0 and 100%.
Fixes #1086
2017-03-24 15:21:52 +01:00
3acbbb30f2 Fix inconsistent in-place and out-of-place for HardTanh
The in-place and out-of-place updateGradOutput results differ where input=min_val or input=max_val.
2017-03-23 17:27:29 -07:00
52911f9e47 Fix inconsistent in-place and out-of-place implementations
Currently, in-place and out-of-place updateGradOutput produce different results for input=max_val or input=min_val: in-place won't backprop the gradient where input=max_val or input=min_val, while out-of-place will.
2017-03-23 17:22:55 -07:00
a65e0f488c Remove zero fill where not needed (#1077) 2017-03-23 19:44:00 -04:00
8dc5d2a22e export current_blas_handle 2017-03-23 23:32:45 +01:00
ed97f3f854 Adding support for flattened inputs for IndexLinear
- Adding relevant tests
2017-03-23 14:18:41 -07:00
a231fe8fc5 IndexLinear support for cunn 2017-03-23 14:18:01 -07:00
bb353ccc17 Add batch triangular factorization and solves, add IntegerTensor to cwrap (#903) 2017-03-23 15:06:00 -04:00
ced0054a9e Fix formula for stddevs grad in Normal function (#1076) 2017-03-23 14:32:34 -04:00
68ee5ede29 make inplace tests compare input grads 2017-03-23 18:54:00 +01:00
2966e3295d Make static/shared configurable and install optional
Summary:
This makes it possible to embed Gloo in a project without CMake
installing Gloo headers and/or libraries, or having a runtime
dependency (and statically link to it).

Also:
* Install benchmark tools
* Statically link to NCCL if the bundled version is used
Closes https://github.com/facebookincubator/gloo/pull/19

Differential Revision: D4762432

Pulled By: pietern

fbshipit-source-id: cf38903e6c51f2480fba4ff18cbdc0c9080df0c4
2017-03-23 09:06:37 -07:00
4df98e2927 Merge commit '3865606299b1fbcd0a94cef4a66c1bc007246da8' 2017-03-23 08:39:43 -07:00
6ccac5ce28 Merge commit 'd3334db6274d7a3cd07f20d583056e453dc8134d' 2017-03-23 08:39:30 -07:00
3865606299 adding batch triangular factorization and solves, add IntegerTensor to cwrap 2017-03-23 11:37:00 -04:00
d3334db627 adding batch triangular factorization and solves, add IntegerTensor to cwrap 2017-03-23 11:35:35 -04:00
50f5a4dd18 fix BCE loss formula visualization (#1072) 2017-03-23 11:27:21 -04:00
b60936b9ae fix NLLLoss2d documentation 2017-03-23 10:06:40 -04:00
2d750b9da5 fix typo 2017-03-23 09:40:06 -04:00
ca376d4584 implement autograd function trace 2017-03-23 10:37:52 +01:00
ef183a1d23 Merge commit '5cd313ed23a3b11ddd739bcfedaee6e310e4e438' 2017-03-22 19:25:46 -07:00
f4d8944973 fix OSX fread bug (#1068) 2017-03-22 22:06:14 -04:00
6b7aef63ac Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 20:36:52 -04:00
b3ab4b1094 Check torch.backends.cudnn.enabled, padding, and output_padding (#996)
* Check torch.backends.cudnn.enabled
* Don't allow negative padding and output_padding values
2017-03-22 19:42:11 -04:00
1e8cb82a2d Break only after the update in L-BFGS 2017-03-22 18:58:42 -04:00
dd399a8d68 Return total param norm from clip_grad_norm 2017-03-22 18:58:42 -04:00
faac0f5c25 Fix torch.cat bugs
Always use the PySequence API and disallow catting along nonexistent
dimensions.
2017-03-22 18:58:42 -04:00
c36f47bd1e Make random_ exclusive and make generator kwarg only in all random
functions
2017-03-22 18:58:42 -04:00
3d1888cd95 Fix size mismatch in CosineEmbeddingLoss backward 2017-03-22 18:58:42 -04:00
97a82a3018 fix formatting in upsampling docs (#1067) 2017-03-22 18:06:31 -04:00
5cd313ed23 Fix TH_TENSOR_APPLYX_D in the case where the dimension of interest is the inner dimension 2017-03-22 13:15:01 -07:00
b414494035 Merge commit '714b2b8bf657afe41cc8503998b6d919339b8075' 2017-03-22 12:49:29 -07:00
c10efc646e Merge commit 'e17d84d38edf6094175deead555abbc96321b69f' 2017-03-22 12:49:11 -07:00
348531ad8d Merge commit '0056b0883426e38ffbd646c040b6c281d12673f2' 2017-03-22 12:48:57 -07:00
9d83121ef5 Don't add options to CUDA_NVCC_FLAGS if already set
Summary:
This may be the case when the Gloo CMake files are sourced from a
parent project that has already imported CMake CUDA support. If these
checks are not performed then CUDA_NVCC_FLAGS might contain
conflicting options.

Verified this works while working on Gloo for Caffe2.
Closes https://github.com/facebookincubator/gloo/pull/18

Differential Revision: D4756179

Pulled By: pietern

fbshipit-source-id: 32fc39ec2322cce5899a2398ebbf8395d3917502
2017-03-22 12:35:04 -07:00
6d7cb31e53 MPI: Duplicate MPI_Comm and allreduce maxLength as MPI_UNSIGNED_LONG.
Summary:
Some small MPI-related changes:
1) Instead of making an object copy of the MPI_Comm, call MPI_Comm_dup.
Because the (passed-in) communicator is used later via the call to
connectFullMesh, this guarantees that the communicator will not have been
freed by the user before connectFullMesh is called.

2) Allreduce for maxLength is done on an unsigned long type; use the
corresponding MPI type.
Closes https://github.com/facebookincubator/gloo/pull/17

Differential Revision: D4754195

Pulled By: pietern

fbshipit-source-id: 863fd33c726f88120f8f5ee61964c3525babbf97
2017-03-22 09:26:00 -07:00
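A small sketch of the two MPI changes above; the function and variable names are made up, but MPI_Comm_dup, MPI_Allreduce, and MPI_UNSIGNED_LONG are standard MPI.

```cpp
#include <mpi.h>

// Sketch: keep a private duplicate of the user's communicator so it cannot
// be freed underneath us, and allreduce an unsigned long with the matching
// MPI datatype.
void setupFromMpiSketch(MPI_Comm userComm, unsigned long localMaxLength) {
  MPI_Comm comm;
  MPI_Comm_dup(userComm, &comm);  // released later with MPI_Comm_free

  unsigned long maxLength = 0;
  MPI_Allreduce(&localMaxLength, &maxLength, 1,
                MPI_UNSIGNED_LONG, MPI_MAX, comm);

  // ... connectFullMesh(...) would use `comm` from here on ...
  MPI_Comm_free(&comm);
}
```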
30a9cf7a46 Mark transport pair after IO error and propagate to calling threads
Summary:
This change solidifies IO error handling between threads and successive transport API calls. When an IO exception occurs, signal all buffers of the error, propagating the exception from the device thread or single user thread onto all user threads. Store the exception in the pair and check on future API calls or device events. Swallow all IO exceptions in the device loop.

Right now IO exceptions during portions of the listen/connect phase will result in an indefinite wait in the peer. I will address this with a configurable timeout (t16205269).

Reviewed By: pietern

Differential Revision: D4749248

fbshipit-source-id: c75ee3b20875d561bf84631e5384e28015dabad3
2017-03-22 09:06:24 -07:00
714b2b8bf6 Merge pull request #453 from apaszke/lookup_renorm
Cast accumulator in LookupTable renorm to accreal
2017-03-22 11:53:41 -04:00
fe4bd5066b Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 11:45:02 -04:00
e17d84d38e Added support for multidimensional tensors in PReLU; Channel number now in second dimension 2017-03-22 11:44:28 -04:00
b9aef6bc03 Fixing default values for LR and Epsilon (#895)
It seems that the default values for LR and Epsilon (previously, 1E-2 and 1E-38 respectively) were different from the ones recommended by the authors (2E-3 and 1E-8, respectively). Other packages such as Keras (https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L474) and Lasagne (https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py#L612) use the suggested values as well.
2017-03-22 11:34:39 -04:00
0056b08834 Narrow V when returning only some right singular vectors 2017-03-22 08:33:03 -07:00
bd0df61bb5 Cast accumulator in LookupTable renorm to accreal 2017-03-22 08:29:39 -07:00
d9678c2e34 Correct typo in batchnorm documentation 2017-03-22 13:55:45 +01:00
b3c0aa3b7d fix a typo in ffi doc (#1055) 2017-03-21 15:37:48 -05:00
8fc9c79287 Add nccl submodule 2017-03-21 17:53:58 +00:00
4fce1a389f Include CUDA support in CMake build
Summary:
* Pull in NCCL submodule
* Include (heavily modified) CUDA/NCCL build files from [Caffe2](https://github.com/caffe2/caffe2)
* Build CUDA enabled benchmark/test
* Enable CUDA build in Travis configuration
Closes https://github.com/facebookincubator/gloo/pull/16

Differential Revision: D4746784

Pulled By: pietern

fbshipit-source-id: b5c6cbcd8ac8b30c071851cdc7ae88c69c0ab4d6
2017-03-21 10:51:57 -07:00
8ce56c30d4 Convert runtime errors to gloo exceptions
Summary:
Bubble up gloo configuration and network errors as exceptions. The caller may be able to recover. Other unexpected failures continue to be handled as fatal with GLOO_ENFORCE

Modify ibverb API validation to check for != 0 instead of -1 to conform with API definition.

Still need to convert some errors in the rendezvous code and add documentation.

Will pass device loop errors onto the calling thread in a future diff

Reviewed By: pietern

Differential Revision: D4730362

fbshipit-source-id: c801adb353013e7f541ab01ac16a0cc71c1c36b2
2017-03-20 13:50:29 -07:00
4667f936e3 Add explicit dependency on pthreads
Summary:
Got linker errors on Ubuntu 16.04 (not on 14.04).
Adding the pthreads dependency explicitly fixes it.
Closes https://github.com/facebookincubator/gloo/pull/15

Differential Revision: D4739081

Pulled By: pietern

fbshipit-source-id: 6bae7d361d934e93560d28a76c3dca4a4236f113
2017-03-20 11:52:41 -07:00
4eaa30b634 Build tweaks
Summary:
* Mention submodules in README
* Remove fetch.sh from third-party directory
* Rename benchmark/test build targets
Closes https://github.com/facebookincubator/gloo/pull/14

Differential Revision: D4739077

Pulled By: pietern

fbshipit-source-id: 859c1cac0c0163870eae8f18e4e2f177a6bc8890
2017-03-20 11:35:19 -07:00
77fbc12f23 Fix some deadlocks when torch_shm_manager is not found (#1030)
- Add additional timeouts to test_multiprocessing to reduce chances of
   hanging indefinitely on failure
 - Add missing header guards
 - Fix typo
 - Check that torch_shm_manager exists in torch/__init__.py
2017-03-17 18:28:39 -04:00
7e46eb1613 Fixes for Prod and Expand functions (#1026)
Thanks to @ChangYong-Oh for the original implementation.
2017-03-17 18:24:44 -04:00
821656d2d8 add CONTRIBUTING document 2017-03-17 07:59:37 -04:00
86e40ed875 Fix a typo in docs about pinned memory buffers (#1023)
* remove misleading guide for BCELoss

* fix docs about pinned memory buffers
2017-03-17 05:08:03 -04:00
1d0699e147 Define exception hierarchy
Summary: Define an exception hierarchy for gloo runtime errors. Keep GLOO_ENFORCE macros for assertions.

Reviewed By: pietern

Differential Revision: D4724124

fbshipit-source-id: 22f0581b06524579e86fe335770bdb620d20e258
2017-03-16 15:08:01 -07:00
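A minimal sketch of such a hierarchy; the class names here are assumptions rather than the exact gloo types.

```cpp
#include <stdexcept>
#include <string>

// Recoverable runtime failures become exceptions the caller can catch;
// programmer errors stay behind the GLOO_ENFORCE-style assertions.
namespace sketch {

struct Exception : std::runtime_error {
  explicit Exception(const std::string& msg) : std::runtime_error(msg) {}
};

// Bad configuration or misuse of the API.
struct InvalidOperationException : Exception {
  using Exception::Exception;
};

// Network/transport failures (connect, read, write).
struct IoException : Exception {
  using Exception::Exception;
};

}  // namespace sketch
```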
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the searching for the correct library from
run-time to compile-time.
2017-03-16 16:10:17 -04:00
f0b75c4aa4 Merge pull request #729 from shenxiul/cuda_linspace
linspace and logspace for CUDA Tensors
2017-03-16 14:03:00 -04:00
7654b3f49e Add function to compute cross_entropy for 2D image (#802) 2017-03-16 17:34:04 +01:00
37ebbc2809 the length of any item in padded_sequence should be greater than 0 (#1013) 2017-03-16 17:32:43 +01:00
8241cd7b6e Fix compilation error when compiling with 'clang -x cuda'.
Functions vFetch and vStore are not found by ADL with clang,
so they need to be declared before usage in ReduceCopy.
2017-03-16 12:01:11 +01:00
a7781fdebc Use default Redis port in RedisStore constructor
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4718573

fbshipit-source-id: c0b9aa78cf1f4db910526841c0172537b9243f7e
2017-03-15 22:19:51 -07:00
29ddbc3e37 implement linspace, logspace and range in CUDA 2017-03-15 20:50:30 -07:00
16a133ed9a Fixes for testing on FB infra (#1009)
- make each test in test_autograd have a unique name ignoring case
 - assemble all tests when test_legacy_nn is imported
 - import Python.h in PtrWrapper.h
2017-03-15 18:37:11 -04:00
1aa665f6a8 Documentation
Summary:
* Add separate file for rendezvous docs
* Mention using MPI for rendezvous
* Fix algorithm docs formatting
Closes https://github.com/facebookincubator/gloo/pull/13

Differential Revision: D4715442

Pulled By: pietern

fbshipit-source-id: 0469ab8d16fd489a38c399ec2b25860d1225ce72
2017-03-15 14:58:51 -07:00
c4d1318662 Fix map_location in torch.load (#1006) 2017-03-15 16:54:19 -04:00
379ae6d865 Refactor out dispatchStateless (#1007)
Some of the error messages were incorrect due to erroneous
'tensor == THPDefaultTensorClass' checks
2017-03-15 16:24:55 -04:00
24376ff9d3 Merge pull request #723 from killeent/scan-primitive
add implementation of inclusive scan via upsweep-downsweep
2017-03-15 14:37:21 -04:00
6ac793dcbe Reuse ncclComm_t across algorithm instances
Summary: Initializing ncclComm_t is expensive. Allocate a set of ncclComm_t for each unique device set and cache for reuse. With this change, the CudaAllreduceChunked test runtime improved from ~170 sec to ~10 sec on my machine. There is no improvement in the benchmark numbers because the algorithm instance is only allocated once.

Reviewed By: pietern

Differential Revision: D4708943

fbshipit-source-id: 85b85070586d6683a762b8282df593ca831e7bc7
2017-03-15 09:51:43 -07:00
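A sketch of caching communicators per unique device set, with hypothetical names; a real version would also guard the cache with a mutex and canonicalize (sort) the device list.

```cpp
#include <map>
#include <memory>
#include <vector>

// Stand-in for a set of initialized NCCL communicators (one per device).
struct CommSet {};

// Return a cached CommSet for this device set, creating it only once.
// Assumes `devices` is already sorted so equal sets map to the same key.
std::shared_ptr<CommSet> getCommSet(const std::vector<int>& devices) {
  static std::map<std::vector<int>, std::shared_ptr<CommSet>> cache;
  auto it = cache.find(devices);
  if (it != cache.end()) {
    return it->second;  // reuse: skips the expensive communicator init
  }
  auto comms = std::make_shared<CommSet>();  // expensive init happens here
  cache.emplace(devices, comms);
  return comms;
}
```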
e00d9c1fd8 Execute benchmark through mpirun
Summary:
This change includes CMake changes to compile the MPI assets when the USE_MPI flag is enabled. If so, the benchmark tool can now be launched through mpirun.

Includes the changes done in #11.
Closes https://github.com/facebookincubator/gloo/pull/12

Reviewed By: Yangqing

Differential Revision: D4712060

Pulled By: pietern

fbshipit-source-id: 0d0e93882f5822583f59304d4256dbdf5dea7483
2017-03-15 08:21:12 -07:00
be6322e4b5 Update nn.init docstrings to correctly reference the module (#1001) 2017-03-15 11:17:59 -04:00
62063b2f62 Fix docs for pointwise ops (#845) (#985)
* add torch.nn.init docs to the source folder
2017-03-15 11:08:05 -04:00
13b1580613 add F.pad to docs 2017-03-15 00:09:14 -04:00
fe788f5003 Use correct event to synchronize destination buffer in NCCLElement
Summary: NCCLOp::runNCCL is mistakenly recording an event in the source pointer after the NCCL op. This results in NCCLOp::wait() returning without synchronizing with the output buffer. The synchronous tests using NCCL fail.

Reviewed By: pietern

Differential Revision: D4708860

fbshipit-source-id: 0c36511e260b587d410e5c9604552ceedd06d988
2017-03-14 19:20:59 -07:00
e50a1f19b3 Use streams in scatter to overlap copy with compute 2017-03-14 22:46:07 +01:00
e86db387ba Fix conv1d backward segfault (#999) 2017-03-14 16:15:53 -04:00
1bf61b8adc Add googletest submodule 2017-03-14 03:39:54 +00:00
704ee3ca68 Use cudart symbols from the main program.
Our extension library links against cudart and pulls in the symbols. Use
LoadLibrary(None) to use the same symbols as the _C extension.

This fixes the PyTorch wheel when you don't have system CUDA installed.
2017-03-13 19:45:34 -04:00
9004652c7b updated the documentation to remove the unnecessary copy grads when using multiprocessing 2017-03-13 19:04:17 -04:00
aca6ce984c change lookup table sort 2017-03-13 13:55:16 -07:00
ed8773f7bd add legacy_serialized.pt to gitignore 2017-03-13 16:37:35 -04:00
0f7b7b27b1 Fix build for CMake 2.8.12
Summary:
This is the minimum required CMake version (also the version that is available on Ubuntu Trusty (14.04)).
Closes https://github.com/facebookincubator/gloo/pull/9

Reviewed By: Yangqing

Differential Revision: D4698659

Pulled By: pietern

fbshipit-source-id: bf01541fe485c03e7c665f175c2887feaf9516a3
2017-03-13 13:06:15 -07:00
48f48b6ff2 fix more flaky VolumetricMaxPooling tests 2017-03-13 14:38:27 -04:00
615b27eadf fix corner case in SetItem of Variable 2017-03-13 14:38:27 -04:00
86ede33035 CMake improvements for Gloo
Summary: Install headers and add .. to include directories

Reviewed By: pietern

Differential Revision: D4695500

fbshipit-source-id: f48a49f03e575408829793cb63bfdb16d8e3a309
2017-03-13 11:06:05 -07:00
bd09055207 Synchronize all NCCL ops with shared per-device streams
Summary:
Allocate a set of per-device streams used to serialize NCCL op scheduling. These ensure concurrent NCCL ops are not interleaved across devices (i.e., through priority scheduling), which could otherwise result in deadlock.

Synchronize source and destination streams with NCCL streams.

Reviewed By: pietern

Differential Revision: D4685360

fbshipit-source-id: 3c228b195b0a0d9d7cccc720163898d344a5ed4c
2017-03-13 09:20:05 -07:00
4bd220d91a Travis contbuild scripts and cmake fix.
Summary:
TSIA. Redoing #7 to kick travis.
Closes https://github.com/facebookincubator/gloo/pull/8

Reviewed By: Yangqing

Differential Revision: D4697132

Pulled By: pietern

fbshipit-source-id: d03148aeddb2cf927b4ef3689c97d9ba4f4cdc9d
2017-03-13 08:36:10 -07:00
170d790b66 fix doc of conv3d in conv.py (#989)
the second dimension should be height.
2017-03-13 11:30:13 -04:00
e216f557fd Fixes issue returning strings from a Dataloader with pin_memory=True (#908) 2017-03-13 10:11:07 +01:00
997312c233 Add WeightedRandomSampler (#980)
Samples elements from `[0,..,len(weights)-1]` with the given probabilities (weights). So far there is no way to introduce sample weights either in loss functions or while sampling from a dataset. This is an attempt to add the functionality for the latter.
2017-03-13 00:27:05 -04:00
d602b3a834 Allow submodules and parameters to shadow attrs on assignment 2017-03-12 13:31:32 -04:00
f531d98341 Fix memory leak in torch.from_numpy 2017-03-12 13:31:32 -04:00
6bdd5ecaf5 Remove some unnecessary AutoGPU calls 2017-03-12 13:31:32 -04:00
bfbde9d6eb Fix Embedding bug when max_norm was used 2017-03-12 13:31:32 -04:00
b9c816a796 Fix run_test.sh --coverage option. (#983) 2017-03-11 19:26:02 -05:00
2f5c215d34 Update setup.py (#981)
Adding `description` to `setup.py`
2017-03-11 12:14:07 -05:00
01650ac9de add torch.nn.init docs to the source folder (#979) 2017-03-11 10:11:30 -05:00
ce536aa355 fix example in docs for NLLLoss 2017-03-10 16:48:08 -05:00
fc0af33a18 key only block-wide bitonic sort 2017-03-10 11:50:43 -08:00
c7c4778af6 modify docs of broadcast to fix issuse #940 (#970) 2017-03-10 09:54:43 -05:00
d873077349 Create context from existing MPI communicator
Summary:
This makes it easy to use Gloo transports and algorithms in existing
MPI environments.

Reviewed By: andrewwdye

Differential Revision: D4685999

fbshipit-source-id: cfc7d0e445893512b4e4ed2abe1bb280d83b9c70
2017-03-09 23:06:18 -08:00
0c38827318 Split out rendezvous specifics from context
Summary:
How pairs are setup and connected to one another is specific to
whatever underlying rendezvous mechanism is used. This change moves
the `connectFullMesh` function into a subclass in the `rendezvous`
directory. This prepares for a separate MPI context that can setup
pairs between processes using an existing MPI communicator.

Reviewed By: andrewwdye

Differential Revision: D4684755

fbshipit-source-id: 9eb643b8ba545b3e6f9a36b65642b3b04a5f0077
2017-03-09 23:06:18 -08:00
fb766c00b3 Align async\wait pattern to use wait() naming
Summary: TSIA

Reviewed By: pietern

Differential Revision: D4686783

fbshipit-source-id: ccbdace0d53219bd4b881ea27f7f972b206215b6
2017-03-09 21:20:45 -08:00
e600c9830a Fix up NCCLElement construction in CudaBroadcastOneToAll
Summary: TSIA

Reviewed By: pietern

Differential Revision: D4686520

fbshipit-source-id: 657ca90aa1971be152b037563105a9f490137a69
2017-03-09 20:37:03 -08:00
73a65cd29f simple ordering fix to avoid gcc warning 2017-03-09 17:10:59 -08:00
b785ed0ac0 Fix Embedding and CosineEmbeddingLoss on non-float CUDA (#965) 2017-03-09 18:04:40 -05:00
b2d077d81d Update _tensor_docs.py (#966) 2017-03-09 18:04:19 -05:00
4814b0bc09 Recompose NCCLElement of src/dst CudaDevicePointers
Summary: CudaDevicePointer has the information we need for a NCCL op. Refactor NCCLElement as a composition of src and dst CudaDevicePointers. This allows for separate streams for src and dst, and will simplify a future change to use a static set of streams for all NCCL ops.

Reviewed By: pietern

Differential Revision: D4679483

fbshipit-source-id: 75656cc2fa5b5e2a6c096d914d2111769a47291b
2017-03-09 12:26:55 -08:00
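Roughly, the recomposition above amounts to something like the following; the field and type names are illustrative, not the actual gloo declarations.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// A device pointer bundles the buffer, its length, and the stream the
// buffer is produced/consumed on.
template <typename T>
struct CudaDevicePointerSketch {
  T* ptr;
  size_t count;
  cudaStream_t stream;
};

// An NCCL element is then just a source/destination pair, each with its
// own stream, which is what later allows a static set of NCCL streams.
template <typename T>
struct NCCLElementSketch {
  CudaDevicePointerSketch<T> src;
  CudaDevicePointerSketch<T> dst;
};
```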
b1c2714ad5 Add momentum and centered options to RMSProp (#810)
* add momentum and centered options

Add two options:
- Momentum (like SGD's momentum)
- Centered RMSprop, as in Graves 2013 ( https://arxiv.org/abs/1308.0850 ): the gradient is normalized by a running estimate of its variance

* some PEP8

* bug in default

* bug2

* sign mistake

* alloc of momentum & centered only if needed

* add link to docstring

* some pep8 on docstring

* implement __setstate__() for backward compatibility

* correct grammar mistake

* multiply by lr when adding delta to params

* rename momentum variables

* change __init__ params order
2017-03-09 10:04:32 +01:00
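For reference, the centered variant described above follows Graves (2013) roughly as below; this is a sketch of the update rule, not a transcription of the optimizer code, and the exact buffer handling may differ.

```latex
v_t      = \alpha\, v_{t-1} + (1-\alpha)\, g_t^2        % running uncentered second moment
\bar g_t = \alpha\, \bar g_{t-1} + (1-\alpha)\, g_t     % running mean (centered variant only)
b_t      = \mu\, b_{t-1} + g_t / \sqrt{v_t - \bar g_t^2 + \epsilon}   % momentum buffer
\theta_t = \theta_{t-1} - \eta\, b_t
```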
a462edd0f6 Docs(RNN|GRU|LSTM): Note dropout applies to all layers *except* the last layer (#961)
This is an important clarification to make: otherwise users are misled as to where they may need to add dropout, and to clarify the situation they would need to delve into the backend implementation.
4647f753bc/torch/nn/_functions/rnn.py (L73)
2017-03-08 18:09:11 -05:00
c2425fc9a1 Fix build warning for C file 2017-03-08 21:28:57 +01:00
fbcedf2da2 Merge commit '3d95e13b332e1b31d706b59c3b67f886958ece79' 2017-03-08 09:09:46 -08:00
3d95e13b33 Check event_count before merging blocks 2017-03-08 08:49:04 -08:00
228e1a8696 Add CUDA caching allocator accessor 2017-03-08 08:29:50 -08:00
be0e8c0009 Use sequential slot numbers from context
Summary:
Add a nextSlot() function to the context that increments and
returns a slot number. This enables multiple algorithms sharing the
pairs part of a context. The slot numbers were hardcoded before this
change, which prevented reuse.

After this change, some of the tests can be changed to run multiple
times (or do a parameter sweep) without respawning a new threadpool or
allocating new fixtures.

Also change some internally used variable names for more consistency.

Reviewed By: andrewwdye

Differential Revision: D4668268

fbshipit-source-id: 65cbc8f2666f0b7d2f1c72574b86d913f5855d62
2017-03-08 08:23:03 -08:00
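A minimal sketch of the slot counter described above, with hypothetical member names:

```cpp
#include <atomic>

// Each algorithm built on the same context asks for fresh slot numbers
// instead of hardcoding them, so several algorithms can share the pairs.
class ContextSketch {
 public:
  // Returns the first slot of a contiguous range of `count` slots.
  int nextSlot(int count = 1) { return slot_.fetch_add(count); }

 private:
  std::atomic<int> slot_{0};
};
```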
3fa8a3ff46 add implementation of inclusive scan via upsweep-downsweep 2017-03-08 07:34:14 -08:00
4647f753bc Merge commit '0f872ed02fbaf5b326f235b3f18724171b061416' 2017-03-07 14:45:01 -08:00
7ba5e7cea1 fix VolumetricMaxPooling test instability (#952) 2017-03-07 10:55:46 -05:00
9b626a8047 Fix documentation - replace 'matrix' with 'vector' (#951) 2017-03-07 10:40:18 -05:00
bd0e9a73c7 Fix some simple build error on MacOS (#949)
Issue #948

Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
2017-03-07 09:47:49 -05:00
7bddd586f7 Change PrefixStore to take a Store reference
Summary:
Taking ownership of a std::unique_ptr is a bit awkward. It's actually
useful to reuse the underlying store and create multiple prefix stores
against it.

Reviewed By: andrewwdye

Differential Revision: D4662354

fbshipit-source-id: eaf62f7d5a97d6ee848252ff3124c28da349f6f2
2017-03-06 22:19:49 -08:00
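The interface change amounts to borrowing the underlying store instead of owning it; a simplified sketch with the types reduced to the essentials:

```cpp
#include <string>
#include <vector>

struct Store {
  virtual ~Store() = default;
  virtual void set(const std::string& key, const std::vector<char>& data) = 0;
  virtual std::vector<char> get(const std::string& key) = 0;
};

// Holds a reference to the underlying store, so many prefix stores can be
// layered on one connection (e.g. one per algorithm or per run).
class PrefixStoreSketch : public Store {
 public:
  PrefixStoreSketch(const std::string& prefix, Store& store)
      : prefix_(prefix), store_(store) {}

  void set(const std::string& key, const std::vector<char>& data) override {
    store_.set(prefix_ + "/" + key, data);
  }
  std::vector<char> get(const std::string& key) override {
    return store_.get(prefix_ + "/" + key);
  }

 private:
  std::string prefix_;
  Store& store_;
};
```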
da10450535 Allow multiple input pointers to broadcast algorithms
Summary:
This changes the constructor prototype of the broadcast algorithms.
They now take the rank of the root process and the rank of the root
pointer. The root process now also broadcasts locally, among the
specified pointers, in addition to broadcasting to its peer processes.

The broadcast tests are made more robust to use a different value at
every index for every buffer, like the allreduce tests. To accommodate
multiple input buffers for CPU side algorithms, I added a Fixture
helper, and renamed the existing Fixture class to CudaFixture.

The broadcast tests contain a few TODOs since they don't vary the root
process or root pointer yet. I anecdotally verified this does work,
but didn't want to include the necessary changes to do so in this
commit (it requires some changes in rendezvous and NCCL code). A fix
for this is forthcoming.

Reviewed By: andrewwdye

Differential Revision: D4661635

fbshipit-source-id: c069e0d4e8f676a63efd74b15ea1156adcc09477
2017-03-06 22:19:49 -08:00
2b1cd919ce Update extending.rst (#933) 2017-03-06 23:23:14 -05:00
8e46a15605 add docs for set_printoptions to sphinx (#945) 2017-03-06 21:52:37 -05:00
15a9fbdedb Merge pull request #881 from colesbury/parallelize_backwards
Parallelize autograd backwards
2017-03-06 16:57:19 -05:00
6336300880 Fix bug where adding a hook could replace an existing hook.
We were keying hooks by RemovableHandle id. However, we don't hold onto
handles and ids of dead objects can be reused. This replaces id(handle)
with a global counter.
2017-03-06 12:47:53 -08:00
5073132837 Implement 'pre' and 'post' hooks at the C++ autograd level 2017-03-06 12:47:53 -08:00
65b66264d4 Improve broadcast/reduce performance by coalescing tensors 2017-03-06 12:47:53 -08:00
0f872ed02f Add THCCachingAllocator_recordStream()
This is similar to THCCachingHostAllocator_recordEvent() but on CUDA
allocations. It's useful for overlapping copies with computation. The
workflow is approximately:

  0. allocate dst tensor on copy stream
  1. copy from CPU to GPU on copy stream
  2. synchronize the main stream with the copy stream via
     cudaStreamWaitEvent
  3. THCCachingAllocator_recordStream(dst, main_stream)

The recordStream() call is necessary to prevent the dst tensor from
being reused on the copy stream before the main stream finishes work.

Previously, you would need to insert a second cudaStreamWaitEvent before
dst is freed to force the copy stream to wait on the main stream.
2017-03-06 10:50:19 -08:00
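The workflow above, spelled out as a sketch; recordStreamSketch stands in for the new allocator hook, since its exact signature is not shown in the message.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Stand-in for the new allocator hook; the real one tells the caching
// allocator that `ptr` is also used on `stream`.
void recordStreamSketch(void* ptr, cudaStream_t stream) { (void)ptr; (void)stream; }

void overlappedHostToDevice(void* dst, const void* src, size_t bytes,
                            cudaStream_t copyStream, cudaStream_t mainStream) {
  // 1. copy from CPU to GPU on the copy stream
  cudaMemcpyAsync(dst, src, bytes, cudaMemcpyHostToDevice, copyStream);

  // 2. make the main stream wait for the copy via an event
  cudaEvent_t copyDone;
  cudaEventCreateWithFlags(&copyDone, cudaEventDisableTiming);
  cudaEventRecord(copyDone, copyStream);
  cudaStreamWaitEvent(mainStream, copyDone, 0);
  cudaEventDestroy(copyDone);

  // 3. record the stream so `dst` is not handed back to the copy stream
  //    before the main stream's work on it has finished
  recordStreamSketch(dst, mainStream);
}
```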
761d6799be code syntax error in document (serialization.rst) (#937) 2017-03-06 10:06:04 -05:00
0d179aa8db Updated datasets.rst, combined all commits (#931)
Added MNIST in the docs

Updated incomplete cifar doc

Updated the datasets.rst to include all datasets
2017-03-05 17:38:28 -05:00
5b171ad7c2 remove misleading guide for BCELoss (#924) 2017-03-05 14:31:01 -05:00
ac9245aeb3 import numpy before setting dlopen flags (#928) 2017-03-05 14:30:13 -05:00
60736bdf99 fix corner case in kwargs for DataParallel (#930) 2017-03-05 14:27:52 -05:00
7d58765cee docs: Fixed example code bug in extending module doc. 2017-03-05 12:09:08 -05:00
76f7d749e4 bump version 2017-03-05 08:49:52 -08:00
0b7374eb44 add THCS to build_all flags 2017-03-05 11:32:43 -05:00
6fff764155 replace old select_compute_arch.cmake with new 2017-03-05 11:32:43 -05:00
8ced72ccb8 link THPP to THCS when CUDA available 2017-03-05 11:32:43 -05:00
b1ae7f90d5 Added functionality for data parallel table (#843) 2017-03-05 02:35:46 +01:00
8b61ee522e Merge commit 'aec182ae72d51dad0f46cdfe7ff9a41380d7da35' 2017-03-04 08:58:21 -08:00
76ca3eb191 Merge commit 'fea50a51ee2d9af15c42f785ab2232469357b557' 2017-03-04 08:58:02 -08:00
fea50a51ee reintroduce USE_AVX* for files which dont have -mavx* set 2017-03-04 08:55:43 -08:00
51e589ed73 fix critical bug in adds SSE implementation 2017-03-04 08:39:19 -08:00
2e87643761 remove fastmath for everything except simd/convolve 2017-03-04 08:16:47 -08:00
ba9a85f271 fix bug introduced in #952 2017-03-03 21:00:05 -08:00
a22fd7194e More assertions for state change in TCP transport
Summary:
I have seen a stress run crash with unexpected state. Adding these
assertions will give more information when it happens again.

```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what():  [enforce fail at gloo/transport/tcp/pair.cc:407] false. Unexpected state: 5
```

Reviewed By: andrewwdye

Differential Revision: D4652216

fbshipit-source-id: e787f4097f5ab32367dd9fa5a336d0389b97e955
2017-03-03 14:20:07 -08:00
0714d7a3ca set AVX/AVX2 flags only for specific files 2017-03-03 12:17:14 -08:00
fb7bafdd0f Update README.md
Summary:
Fix styling in README
Closes https://github.com/facebookincubator/gloo/pull/4

Differential Revision: D4651501

Pulled By: pietern

fbshipit-source-id: e2d4384ac94972f6c4fc03467564460ea4ce5c85
2017-03-03 11:40:02 -08:00
34ce58c909 Parallelize backwards 2017-03-03 11:26:00 -08:00
c238ee3681 Fix issues with lazy grad initialization (#912) 2017-03-03 14:23:51 -05:00
e1d7eaf7d8 Latency optimization tips
Summary: Closes https://github.com/facebookincubator/gloo/pull/3

Differential Revision: D4651203

Pulled By: pietern

fbshipit-source-id: 202afcbe26ec77ea93e48e72fea0d36f18b1b026
2017-03-03 11:05:17 -08:00
f5338a1fb8 compile AVX and AVX2 intrinsic code in separate files. Cleanup use of USE_AVX and USE_AVX2 macros in favor of __AVX__ and __AVX2__ 2017-03-03 10:30:18 -08:00
d96ad41191 cleanup TH CMakeLists and THGeneral.h of unused flags 2017-03-03 09:48:26 -08:00
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
aec182ae72 Support half precision in baddbmm 2017-03-03 16:15:39 +01:00
c93c884ee2 Add negative dimension to transpose and tests (#792) 2017-03-03 09:31:22 -05:00
c42a2d4d24 Fix dimension check for cat (#959)
* Use TH_INDEX_BASE when verifying dimension for cat

* Adding tests for cat when no dimension is specified.

- Also renamed ldimension to cat_dimension to be more specific.
2017-03-03 09:05:06 -05:00
f89252c336 Merge pull request #719 from twitter-forks/cat-fix
Fixes to cat
2017-03-03 09:04:06 -05:00
490c15fae9 Fix slicing with step (#905) 2017-03-03 09:00:14 -05:00
7e3b572ca7 Document algorithm semantics
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4647587

fbshipit-source-id: a804e7479e6e2f511bfa59712b4b4a88bdf657e3
2017-03-02 21:35:28 -08:00
5fbcd88102 Rename public member fields on gloo::Context
Summary:
The fields are public so their names should not end with an
underscore.

Reviewed By: andrewwdye

Differential Revision: D4645038

fbshipit-source-id: c12b47affbe511383a4722717a06abb61918473b
2017-03-02 19:49:45 -08:00
f2d72ba10f Revert "make handles to be thread-local"
This reverts commit 0720ba53b344809ce3d0bdfb1ea561afa5fe0646.
2017-03-02 17:48:24 -08:00
2108b42b92 Fix bug in cat when dimension is not specified.
- Code was using the specified dimension, which was negative
- Changed the cat_dimension variable to be more explicit
- Fixed code to use the cat_dimension variable
2017-03-02 16:14:09 -08:00
bae8df62d3 Add missing THCudaCheck around cudaMemcpy 2017-03-02 16:13:39 -08:00
a2b2880cc2 Remove underscores from public fields in NCCLContext
Summary: Remove underscores from public fields in NCCLContext

Reviewed By: pietern

Differential Revision: D4645857

fbshipit-source-id: 2c28a1c23d31097d685c0768dad9b99bbef7b171
2017-03-02 16:05:15 -08:00
70fc15c05c More documentation
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4644734

fbshipit-source-id: 50f5fadd2c5cd04e06a025f5538187ed852e669a
2017-03-02 15:50:37 -08:00
98775b6bb4 Merge pull request #718 from killeent/templatize-scan
genericize PrefixSum --> PrefixScan via binary operator template parameter
2017-03-02 17:50:56 -05:00
b7cc2a501f genericize PrefixSum --> prefixScan 2017-03-02 14:31:27 -08:00
0720ba53b3 make handles to be thread-local 2017-03-02 11:10:49 -08:00
ff5fa11129 make mkl link to threaded version with GCC (#958) 2017-03-02 13:37:25 -05:00
837023bb4f Change benchmarks to support multiple input buffers
Summary:
The NCCL code used in CUDA-aware allreduce does local reduction of N
buffers prior to putting anything on the wire. Support this in the
benchmark tool to measure the impact under various configurations.

Other minor tweaks in this change:
* Specify sub-second iteration time
* Templatize allreduce benchmarks (the algorithms share a constructor
  prototype)

Reviewed By: andrewwdye

Differential Revision: D4639517

fbshipit-source-id: f7417d3e9f79278a3b1eca48d779f48b77e5260c
2017-03-02 10:16:39 -08:00
e88d241757 Cuda algorithms should return asynchronously if device streams are passed in
Summary: Cuda algorithms take an optional set of device streams to sequence operations. If streams are provided, the algorithms should enqueue final output buffer operations on the associated stream and return asynchronously. Destructors that allocate streams/events should synchronize before tearing down.

Reviewed By: pietern

Differential Revision: D4636447

fbshipit-source-id: 32ec2adc214c83b0b4bc0fff8993ab196459117b
2017-03-02 10:16:38 -08:00
ecb37e4439 Update tests to cover potential reordering problems
Summary:
With this change, every buffer gets assigned a different
value at every index. This means reordering of segments (e.g. in the
chunked algorithm) would surface as test errors.

Reviewed By: andrewwdye

Differential Revision: D4636368

fbshipit-source-id: 464eb1515d1590e12481961d427a92e2ebb3be82
2017-03-02 10:16:38 -08:00
0c88194807 CUDA documentation
Summary: CUDA documentation detailing high-level support for CUDA in gloo algorithms, usage of streams, and synchronizing memory management.

Reviewed By: pietern

Differential Revision: D4633120

fbshipit-source-id: d88e230c8dc82fe48cda0f401b61758fa4f07f2e
2017-03-02 10:16:38 -08:00
50e73a8313 Support synchronous mode in ibverbs transport
Summary:
Synchronous mode means using the calling thread instead of the device
thread for completion handling. Since this saves a context switch in
the critical path, this is very beneficial for low latency algorithms.

For example: the p99 of a 4-way barrier drops from 17us to 4us.

Reviewed By: andrewwdye

Differential Revision: D4626948

fbshipit-source-id: 013b1680497589fe5ad0bca38600bce6a410200b
2017-03-02 10:16:38 -08:00
fc7f026980 Refactor ibverbs transport to prepare for sync mode
Summary:
All pairs created by a device would use the same completion queue.
Supporting sync mode that way is difficult, as there is no way to
filter completions for a particular pair. This change refactors this
to use a single completion queue per pair so that this is no longer an
issue. This change is a preparation for supporting synchronous mode
(where the calling thread itself will poll the ibv library for
completions instead of the device thread).

This change also includes a refactoring of the way transient memory
regions are handled so that they are properly deregistered and
deallocated when no longer needed.

Reviewed By: andrewwdye

Differential Revision: D4625146

fbshipit-source-id: 21bf5ab321534fbd5c03f12049c10fc67da68944
2017-03-02 10:16:38 -08:00
9f18f83375 Downcase setMutex
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4626965

fbshipit-source-id: 2d32b07182202f65e673795aefacc6cc991d3c7c
2017-03-02 10:16:38 -08:00
9c114e6f1c Fix compile error
Summary: std::atomic was not defined for cuda.cu.

Reviewed By: andrewwdye

Differential Revision: D4624611

fbshipit-source-id: 973bba10026e065667d6a576055d00505ee02d62
2017-03-02 10:16:38 -08:00
0e78a59610 add mutex getter/setter to synchronize CUDA and NCCL ops
Summary: Allow gloo consumers to assign a mutex to synchronize CUDA malloc/free and NCCL operations.

Reviewed By: pietern

Differential Revision: D4622135

fbshipit-source-id: 60acd7c01a677a0df5415fe38e6ef5a2e7c8606a
2017-03-02 10:16:38 -08:00
5e7f5db332 add subset samplers (#888) 2017-03-02 09:26:10 -05:00
b5f7592140 boolean mode in module.train 2017-03-02 09:18:05 -05:00
f366e5fc81 Support int16 numpy conversions
issue #891
2017-03-02 09:15:57 -05:00
48f087f6ce C99 cleanup broke MSVC (#952)
* __pragma for MSVC.
2017-03-02 08:57:28 -05:00
7fef264bfa Bumping version to 1.3.3 2017-03-01 16:44:27 -08:00
8996811936 Only enable peer access for ring neighbors.
This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-01 16:42:38 -08:00
c219a183d0 Fix copy/paste typo in error message 2017-03-01 16:42:38 -08:00
8e1d6f9b60 Fix crash in Reduce when non-root ranks have invalid recvbuff 2017-03-01 16:42:38 -08:00
7ad948ffa9 fix tests to not sys.exit(), also fix fatal error on THC initialization 2017-03-01 17:37:04 -05:00
3277d83648 Add Nesterov Momentum (#887) 2017-03-01 20:49:59 +01:00
1487278fdf Allow backprop through cuDNN RNN in eval mode
Handling of dropout descriptors has been improved too.
2017-03-01 19:42:39 +01:00
977630bc15 Handle duplicate backward roots in autograd 2017-03-01 19:42:39 +01:00
12efd53dba ConstantPad2d and F.pad (#856) 2017-03-01 19:39:44 +01:00
37e05485d9 added initialization schemes in torch.nn.init (#833) 2017-03-01 19:34:13 +01:00
c76770f40e Merge commit 'dfca8dfdc5988813ed5673589ffa4fdd1c4f3d2d' 2017-03-01 09:29:51 -08:00
da725830c2 Add support for variable length sequences in RNNs (#873) 2017-03-01 17:36:32 +01:00
fc6fcf23f7 Lock the cudaFree mutex. (#880)
Prevents NCCL calls from overlapping with cudaFree() which can lead to
deadlocks.
2017-03-01 11:29:25 -05:00
b190f1b5bc Add another pinned memory test.
Checks that pinned memory freed on a different GPU from which it was
allocated isn't re-used too soon.
2017-03-01 12:22:31 +01:00
dfca8dfdc5 ensure valid index in multinomial 2017-02-28 14:48:48 -08:00
b46d5e0b04 Fix NN bindings 2017-02-28 14:35:38 -08:00
f19a11a306 Merge commit '8e8022b7351401911e10b94aeb5ae35d32907705' 2017-02-28 14:35:20 -08:00
cfcf69703f Merge commit '80429ad9f7c4775f7f88344a2cf037e499f060b8' 2017-02-28 14:35:00 -08:00
e22b8e0d17 Merge commit '3cc89afde68a831434f3abe9e3af2ac0b134215e' 2017-02-28 14:34:44 -08:00
fbfba6bdca Merge commit '6ff77503645da59eeca5be473a1902e523c4adb3' 2017-02-28 14:34:29 -08:00
3cc89afde6 Merge pull request #713 from killeent/multinomial-indexing-fix
fix indexing bug in sampleMultinomialOnce
2017-02-28 17:13:44 -05:00
1e4aee057c Merge pull request #712 from killeent/multinomial-fixes
Fix sampleMultinomialOnce to better handle large distribution values
2017-02-28 17:12:48 -05:00
8dfcf7e35a Merge pull request #709 from colesbury/pinned_memory
Fix bug where pinned memory event could be recorded on incorrect device
2017-02-28 16:56:21 -05:00
76de151ddd Fix bug where pinned memory event could be recorded on incorrect device 2017-02-28 13:48:56 -08:00
2676cc46c2 fix indexing bug in sampleMultinomialOnce 2017-02-28 13:40:15 -08:00
1bf7bc9768 refactor sampleMultinomialOnce to use <real, accreal>, assertion for sum overflow 2017-02-28 12:46:12 -08:00
3c41c9fe46 Add AutoGPU RAII that doesn't depend on Python API (#875)
Separates out the non-Python part of AutoGPU. This also compiles without
CUDA, which is useful for generic tensor code.

Also fixes a bug where THCPAutoGPU may not always switch the device:

  THCPAutoGPU guard(-1);
  guard.setDevice(0);
  guard.setDevice(1);
  guard.setDevice(0);  // would not switch back to 0
2017-02-28 14:39:20 -05:00
6ff7750364 add TH_TENSOR_APPLY variants for optimized redux (+refactor) 2017-02-28 10:30:31 -08:00
4d25c3d048 address comments and add tests 2017-02-28 10:23:36 -08:00
267b7ade50 Speed up reductions on non-contiguous dimensions 2017-02-28 10:23:36 -08:00
80429ad9f7 THVector_(add) -> THVector_(adds) 2017-02-28 12:20:44 -05:00
5ca6516ecb THVector_(add),(mul),(div) -> (adds),(muls),(divs) 2017-02-28 12:10:47 -05:00
67f94557ff Expose torch.HalfTensor 2017-02-27 19:35:47 -05:00
61bd5a0643 [Lint] Address F811 2017-02-27 19:33:00 -05:00
748d011c8b [Lint] Address F812 2017-02-27 19:33:00 -05:00
5d5cfe2e57 [Lint] Address E731 2017-02-27 19:33:00 -05:00
7cbe255296 [Lint] Use flake8 instead of pep8 2017-02-27 19:33:00 -05:00
4ef303698c Merge pull request #711 from gchanan/getDeviceAllocator
Add getter for cuda device allocator.
2017-02-27 19:29:39 -05:00
83e8b3f6c3 Add getter for cuda device allocator. 2017-02-27 15:44:44 -08:00
502ebed796 Fix one more reference cycle and ensure correct flag propagation (#868) 2017-02-27 18:38:29 -05:00
68ff58d771 Expose a mutex that is held around cudaFree() calls.
NCCL can deadlock if cudaFree() is called while it's launching kernels.
This exposes a mutex that can be held to prevent cudaFree() calls in the
caching allocator.
2017-02-27 15:08:30 -08:00
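A sketch of the locking discipline this enables (names are hypothetical): the allocator and the NCCL call sites share one mutex, so cudaFree() can never run concurrently with a collective launch.

```cpp
#include <mutex>
#include <cuda_runtime.h>

std::mutex& cudaFreeMutex() {
  static std::mutex m;
  return m;
}

// Caching allocator side: free while holding the mutex.
void allocatorFree(void* ptr) {
  std::lock_guard<std::mutex> guard(cudaFreeMutex());
  cudaFree(ptr);
}

// NCCL caller side: hold the same mutex for the duration of the launch.
void launchCollective() {
  std::lock_guard<std::mutex> guard(cudaFreeMutex());
  // ncclAllReduce(...) would be issued here
}
```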
969c1602e6 Add Tensor::copy() to THPP
For now, this only supports copying from the same type. We can add
polymorphic copying in the future.
2017-02-27 21:33:40 +01:00
2d4d3b18dd Use NCCL operations in AllreduceChunked
Summary: The AllReduceChunked algorithm currently performs the local reduce/broadcast of local device buffers in host memory. This diff updates the algorithm to execute the local reduce/broadcast steps using NCCL operations before copying a single device buffer to/from host memory.

Reviewed By: pietern

Differential Revision: D4587441

fbshipit-source-id: 4de689f59a6cf898b8eecd3c3b9f57f77124c0e3
2017-02-27 09:59:29 -08:00
5e1d6a3691 Update functional.py (#862)
Fixed documentation error in conv3d
2017-02-27 10:42:02 -05:00
533cfc0381 Minor fix of docs of ModuleList and ParameterList (#861) 2017-02-27 10:09:54 +01:00
2b23712dc3 Improve autograd memory usage (#859) 2017-02-26 22:37:26 -05:00
88275da5e8 CUDA documentation tweaks (#858) 2017-02-26 20:37:43 +01:00
bd7a5ad6f0 Make Optimizer.load_state_dict use __setstate__ 2017-02-26 20:02:42 +01:00
1f6f82dbcf Fall back to indexing compatible with numpy 2017-02-26 20:02:42 +01:00
1f8939937a Allow using expand to broadcast tensors 2017-02-26 20:02:42 +01:00
b3d41a5f96 Add docs for ModuleList and ParameterList 2017-02-26 20:02:42 +01:00
fec2d493a9 Reshape grad_output in basic ops 2017-02-26 20:02:42 +01:00
86ee75f63f Fix for Long and Byte tensor indexing of Variables 2017-02-26 20:02:42 +01:00
31941918cf Prevent creation of reference cycles with leaf Variables that don't require grad
Also, raise an error immediately if a leaf that requires_grad is
modified in-place. Some comments were updated too.
2017-02-26 20:02:42 +01:00
19a65d2bea Expose stateless methods for torch.cuda.HalfTensor 2017-02-26 20:02:42 +01:00
819d4b2b83 Add finite differences gradcheck (#851) 2017-02-26 08:35:24 -05:00
b87c113cf4 CUDA documentation enhancement and docs versioning (#848)
* Add more detail to CUDA documentation

Also adds better cross-linking to the pages that discuss relevant topics.

* Adds recommendation to torch.save docs

* Make the version numbers for the docs dynamic

Might need tweaks for beta, 1.0, etc.
2017-02-26 08:33:26 -05:00
b25182971f readme change for getting clarity on binaries 2017-02-26 07:52:13 -05:00
1ee2c47e37 Correcting the description of LSTM attributes (#854) 2017-02-26 13:30:55 +01:00
2dc563f1f1 Fix indexing when passing only an Ellipsis 2017-02-25 23:34:09 +01:00
15ba71a275 Rebase fixes 2017-02-25 17:14:52 +01:00
e5b3fc49d6 Implementation of the 3rd set of tensor functions 2017-02-25 17:14:52 +01:00
ae1766951d Link TH and THPP to THD (#57)
* Fix THD library build

* THPP dependency added

* Minor cleanup; Fix build on OSX
2017-02-25 17:14:52 +01:00
02d08dafd9 Add support for IPv6 in Data Channel TCP (#53) 2017-02-25 17:14:52 +01:00
13a5090695 Added a size change in MaxPool1d module and improved tests (#771) (#832)
The backend is SpatialDilatedMaxPooling, so the 3D input (N*C*L) is changed
to 4D size (N*C*1*L). Output indices will then range from 0 to L, so this
range will not cause an UnMaxPool1D error.

Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
2017-02-25 08:53:30 -05:00
8e32e4c04c make wrap_generic_function importable 2017-02-24 14:27:54 -08:00
cf991310c3 c++ virtual function fix 2017-02-24 13:22:44 -08:00
938706099e adding environment flags to disable SIMD codepaths 2017-02-24 07:35:11 -05:00
3330287dc7 Update dataloader.py (#837) 2017-02-23 14:38:41 -05:00
38c8520adf adding unsqueeze to docs 2017-02-23 12:13:25 -05:00
492e1746af Fix THFree in THTensorApply 2017-02-23 06:01:13 -05:00
91a8109cfd Use C99 for openmp cleanup 2017-02-23 06:01:13 -05:00
161490d34a Add memcpy copy 2017-02-23 06:01:13 -05:00
9c302852eb comments fix 2017-02-23 06:01:13 -05:00
8654fcfd60 THVectorDefault style fix 2017-02-23 06:01:13 -05:00
b3d527d9a0 Tab style fix 2017-02-23 06:01:13 -05:00
4d495218c9 THTensorApply3 contiguous optimizations 2017-02-23 06:01:13 -05:00
13a041284c THTensorApply2 copy optimization 2017-02-23 06:01:13 -05:00
c60c1a003d TH_TENSOR_APPLY2 contiguous optimization 2017-02-23 06:01:13 -05:00
97add1a5ea comment fix 2017-02-23 06:01:13 -05:00
ca02930e47 Fill bug fix 2017-02-23 06:01:13 -05:00
20d5e95077 THTensorApply3 compress counter 2017-02-23 06:01:13 -05:00
eb4a7dc11d THTensorApply change dims to sizes 2017-02-23 06:01:13 -05:00
f722498b72 THTensorApply2 counter compress 2017-02-23 06:01:13 -05:00
aadfb6fe83 THTensorApply reduce memory overhead 2017-02-23 06:01:13 -05:00
6c273594c9 THTensorApply Counter compress 2017-02-23 06:01:13 -05:00
e475c82fa1 Add isTransposed judge and enable multithread of fill functions 2017-02-23 06:01:09 -05:00
0c2e6665df Add AVX copy 2017-02-23 05:50:34 -05:00
6295e6e94b Rebase master 2017-02-23 05:50:34 -05:00
670a4aa708 Fix AVX2 bugs 2017-02-23 05:50:34 -05:00
1bdc2e64ed Add fma cadd 2017-02-23 05:50:34 -05:00
c587be1e50 Add THVector Fill 2017-02-23 05:50:34 -05:00
bd481596f5 optimize THVector add mul div 2017-02-23 05:50:34 -05:00
a504d56b43 Fix THVector cmul AVX bug 2017-02-23 05:50:30 -05:00
91c4dfccea Use THVector cadd AVX 2017-02-23 05:46:44 -05:00
27f618c44d Add THVector Fill AVX 2017-02-23 05:46:44 -05:00
a14482a1df Add THVector cadd AVX 2017-02-23 05:46:40 -05:00
aa50c5734b Add THVector AVX cmul 2017-02-23 05:46:07 -05:00
293001a4fe Add THVector SSE div cdiv 2017-02-23 05:46:07 -05:00
638cfdf150 Add SSE add 2017-02-23 05:46:07 -05:00
5f80a14525 Separate SSE and AVX 2017-02-23 05:46:07 -05:00
1342fd3975 Remove THTensorMathSIMD THTensorMathDispatch 2017-02-23 05:46:07 -05:00
8d4af38489 Add THVector div cdiv 2017-02-23 05:46:07 -05:00
575a064e66 Remove THVector diff 2017-02-23 05:46:07 -05:00
3ab21a3c4f Merge THVector mul AVX 2017-02-23 05:46:07 -05:00
2f592e6c7d Remove THVector scale 2017-02-23 05:46:07 -05:00
5661ffb766 Merge THVector mul 2017-02-23 05:46:03 -05:00
9b74503daa Merge THVector cmul 2017-02-23 05:40:33 -05:00
24848f1cd8 Change THVector mul to cmul 2017-02-23 05:40:33 -05:00
a31a07ede9 Merge THVector add 2017-02-23 05:40:33 -05:00
c8c4c9b23d Change THVector add to cadd and fix NEON 2017-02-23 05:40:33 -05:00
e1ed9303f0 Add multi-thread add 2017-02-23 05:40:33 -05:00
a43aab13c2 Fix THTensorMath.c style 2017-02-23 05:40:33 -05:00
c698b4a45e Add Dispatches for div and mul 2017-02-23 05:40:29 -05:00
c6a0ffab50 Add AVX single float and double float add 2017-02-23 05:40:24 -05:00
8ba7cc30d1 Add THTensorMathSIMD.c 2017-02-23 05:32:34 -05:00
61bf08ca24 Fix compilation for simd tensor add 2017-02-23 05:32:28 -05:00
6ada3c0c16 Fast floating point add kernel in intrinsics (11x speedup over default for 10k elements) 2017-02-23 05:11:44 -05:00
60061fbe79 Fixed up CPU dispatch and tested. Can begin implementing kernels 2017-02-23 05:11:44 -05:00
46e7042add SIMD helper header, modified add in THTensorMath to check dispatch 2017-02-23 05:11:44 -05:00
d0c182773b First commit for dynamic CPU dispatch: general framework in place (need to create dispatch tables and stubs for all functions and make impls have hidden linkage) 2017-02-23 05:11:44 -05:00
b6f60585b5 fix AVX2 detection bugs 2017-02-23 05:00:55 -05:00
4b0e3ee219 Merge pull request #699 from twitter-forks/bitops
Bitwise operations
2017-02-23 04:15:35 -05:00
838842d4b2 fix documentation error. [issue #790](https://github.com/pytorch/pytorch/issues/790) (#831) 2017-02-23 08:59:29 +01:00
e71cf20192 improved serialization (no tar copy) (#713) 2017-02-22 22:24:20 +01:00
adb4cb2b5b contiguous view backward (#816) 2017-02-21 19:09:36 -05:00
478d7446ef CMake fixes
Summary: Adds script to populate third-party directory.

Differential Revision: D4591509

fbshipit-source-id: 28934feb536a9f3a066d8c40988337f3dddffaed
2017-02-21 15:06:45 -08:00
df68230351 README and docs skeleton
Summary: TSIA

Differential Revision: D4591755

fbshipit-source-id: fa435f4ad6b97453c3c9516b4bfc9f8f0fb2e4f1
2017-02-21 10:52:04 -08:00
6073f9b46c update table in README.md
it removes the empty top row
2017-02-21 12:58:04 -05:00
8e8022b735 Merge pull request #418 from ruotianluo/adaptiveAverage
Add SpatialAdaptiveAveragePooling.
2017-02-21 09:15:12 -05:00
da82d2dd70 Merge pull request #434 from bottler/master
VolumetricFractionalMaxPooling like spatial
2017-02-21 09:13:59 -05:00
82176473a5 Merge pull request #442 from twitter-forks/half-fixes
Convert real to accreal in libTHCUNN
2017-02-21 09:12:56 -05:00
2d269a9a72 Merge pull request #1137 from twitter-forks/half-fixes
Using accreal instead of real in the API
2017-02-21 09:12:32 -05:00
240372a991 Fixed topk documentation for largest=True 2017-02-21 04:38:24 -05:00
5b10411c8c Fixed some mistakes in examples
Fixed mistakes in LSTMCell and GRUCell examples.
2017-02-21 04:17:28 -05:00
4c474a9939 Improve prodall CUDA test 2017-02-20 23:28:31 -08:00
7ea6ae57c8 Support numpy arrays in default_collate 2017-02-20 23:28:31 -08:00
42633f8986 Fix misspelling and add support for weights in NLLLoss2d 2017-02-20 23:28:31 -08:00
84248690a9 Add support for indexing with None and slices with positive steps 2017-02-20 23:28:31 -08:00
53409ca0fb Fix a warning in THPP 2017-02-20 23:28:31 -08:00
c2c1710047 Add clip_grad_norm 2017-02-20 23:28:31 -08:00
876202503f Support multiple inputs in data parallel 2017-02-20 23:28:31 -08:00
946a7d9bc3 Make input contiguous only once in backward of cuDNN RNN 2017-02-20 23:28:31 -08:00
608bcd3b15 Return correct number of gradients from cuDNN RNN 2017-02-20 23:28:31 -08:00
632b02a477 Add checks for reward type and size in StochasticFunction 2017-02-20 23:28:31 -08:00
0db9c63300 Use library_dirs in setup.py 2017-02-20 23:28:31 -08:00
873ed4e6b6 Add better error message for conversion of CUDA tensors to numpy 2017-02-20 23:28:31 -08:00
01bd43037d add docs to torch/cuda/random 2017-02-20 20:43:47 -05:00
68c9e3f232 Fixed typo in GRUCell example 2017-02-21 01:37:04 +01:00
a25c8555eb Fixed paper references 2017-02-21 00:27:18 +01:00
d6ca3820aa Optionally specify stream for pointers in CUDA algorithms
Summary:
Work may be queued on CUDA streams for asynchronous execution. The
memory backed by pointers passed to any algorithm can therefore be
mutated after constructing an algorithm instance. By also passing in
the streams these mutations happen on, the algorithms can synchronize
with these mutations to ensure no invalid data is used.

By passing in these streams, any work done by these algorithms will
*also* be queued, which effectively removes a single synchronization
step from any algorithm run.

Differential Revision: D4589394

fbshipit-source-id: 0c8cd6ba9c9018f33d6f4c55a037083fc4164acb
2017-02-20 14:15:53 -08:00
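
Not part of this changeset, but as a rough Python sketch of the same idea using PyTorch's public stream API (assuming a CUDA build): work queued on a caller-provided stream is only consumed after the default stream has waited on that stream, which is what passing streams into the Gloo algorithms buys.

```python
import torch

# Rough sketch: produce a result on a caller-provided stream, then make the
# default stream wait on that stream before the result is consumed.
side_stream = torch.cuda.Stream()
x = torch.cuda.FloatTensor(1 << 20).normal_()

side_stream.wait_stream(torch.cuda.current_stream())  # x must be ready first
with torch.cuda.stream(side_stream):
    y = x * 2.0                                        # queued on side_stream

torch.cuda.current_stream().wait_stream(side_stream)  # consumers sync here
print(y.sum())
```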
dfd1dff383 Merge commit '4ca26fbc1b7be4e369f84e95df16431bb2f1dcb7' 2017-02-20 08:05:19 -08:00
8f391d4d51 Merge commit 'ee43cd7adca3b24a2071ce6c55dcd3a95a2b6ff6' 2017-02-20 07:55:46 -08:00
2a6b7685ae Merge commit 'f6c1bbfa483ad19c500dc94838baaa69f02d240b' 2017-02-20 07:55:19 -08:00
eb9573107d Merge commit '34b7fed802db1fda6322a70b648dcc4947858719' 2017-02-20 07:54:51 -08:00
ee43cd7adc Do SpatialClassNLLCriterion sizeAverage in a separate kernel 2017-02-20 06:54:23 -08:00
4ca26fbc1b Remove averaging from prodall 2017-02-20 11:37:53 +01:00
c165226325 Print a readable error message when arguments are on different GPUs 2017-02-20 11:35:50 +01:00
0722775ca3 AllreduceRingChunked/CudaAllReduceTest should use the chunked algorithm
Summary: I was mistakenly calling the non-chunked algorithm for the chunked test.

Reviewed By: pietern

Differential Revision: D4580160

fbshipit-source-id: 9d62a68e9e86cc6e596d90ff8854c585a0e8855c
2017-02-17 19:17:44 -08:00
49295ebe54 Add sequential to documentation 2017-02-18 08:42:43 +05:30
455038e470 Use a more stable formula for spatial LogSoftMax 2017-02-17 13:05:45 -08:00
ca7f02ea0c Add shape checks for SpatialClassNLLCriterion 2017-02-17 13:01:56 -08:00
04aba1caec Fix cuDNN dropout desc for multi-gpu (#772) 2017-02-17 19:16:12 +01:00
420488349f Implement CUDA-aware allreduce chunked
Summary:
First pass at a CUDA-aware allreduce chunked implementation. For now the algorithm runs on the CPU and is mostly copy/paste from allreduce_ring.h. A subsequent pass will offload to the GPU.

Serialize cuda test to avoid intermittent failures due to memory contention.

Reviewed By: pietern

Differential Revision: D4576959

fbshipit-source-id: e1f292a05b88ff24c33e549d4a52e770a21f85d2
2017-02-17 09:06:05 -08:00
f6c1bbfa48 Merge pull request #1105 from ruotianluo/adaptiveAvg
Add SpatialAdaptiveAveragePooling
2017-02-17 10:52:33 -05:00
4e2c8c6db5 Merge pull request #1123 from bottler/master
VolumetricFractionalMaxPooling like Spatial...
2017-02-17 10:42:21 -05:00
1a5cae7340 Add busy-poll option in TCP transport
Summary: Ideally we would want the driver to busy-poll for us. In absence of driver support, spinning with MSG_DONTWAIT flag seems to be helping a lot too. Of course, we pay the price of burning one core for polling. Sigh.

Reviewed By: pietern

Differential Revision: D4576242

fbshipit-source-id: 85d9e1b786fbb6053864fba80f3e5ecc80fe221d
2017-02-17 07:31:32 -08:00
c26b9c0a5e Update rnn.py
Based on line 302 of torch/backends/cudnn/rnn.py (https://github.com/pytorch/pytorch/blob/master/torch/backends/cudnn/rnn.py#L302), the output is returned with dimensions (0, 1) transposed when the batch_first argument is set to True.
2017-02-17 14:37:14 +01:00
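
A small usage sketch of what batch_first means at the Python level (sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# With batch_first=True the input and output are laid out as
# (batch, seq, feature); internally cuDNN uses (seq, batch, feature),
# hence the (0, 1) transpose referred to above.
rnn = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = Variable(torch.randn(3, 5, 10))    # batch=3, seq=5, feature=10
out, (h, c) = rnn(x)
print(out.size())                      # torch.Size([3, 5, 20])
```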
4dd19988c3 Add benchmark option to display nanoseconds
Summary:
Latency optimization is going well and I've seen the odd case of <10us
measurements. This option makes the benchmark tool display nanos
instead.

Differential Revision: D4575925

fbshipit-source-id: 98dbd3b39e31cbcdd4c146613f6630e721187e1e
2017-02-16 21:16:26 -08:00
15ef008877 Using accreal instead of real in the API
- This reverts commit 7a07afe545b4deae5919d9dc268bfac3d37398c7.
- Includes fixes for TemporalRowConvolution
2017-02-16 17:34:11 -08:00
b14d6318f8 Convert real to accreal in libTHCUNN
- This reverts commit 0d85922d116879448485ef88ae21e83a9255a0b0.
- Includes fixes for TemporalRowConvolution
2017-02-16 17:33:03 -08:00
93002720eb Extract CudaDevicePointer for reuse across CUDA-aware algorithms
Summary:
The CudaDevicePointer optionally takes an existing stream on
which it runs any operation associated with the pointer (for now just
memcpy's, but this likely will includes kernel execution in the
future).

Differential Revision: D4574035

fbshipit-source-id: ddd7972a3874012059f1fde1b341fd6edd69102d
2017-02-16 14:05:52 -08:00
cb91078e01 Support synchronous mode for TCP transport
Summary:
In synchronous mode, it is not the device thread that is responsible
for handling I/O, but the user thread itself. Calling waitRecv on a
buffer will trigger the read function on the pair to be called. This
eliminates the context switch necessary if the device thread is
handling all I/O. For benchmarks with small numbers of elements this
reduces latency by as much as 20%.

Reviewed By: plapukhov

Differential Revision: D4549998

fbshipit-source-id: ab718ba090c06d7c7aa4065cc9f92bd96b9e4a35
2017-02-15 17:31:06 -08:00
34b7fed802 Fix gcc 4.4.7 build. 2017-02-15 09:06:25 -08:00
ee52f89772 Implement CUDA BroadcastOneToAll algorithm
Summary:
Implement CUDA BroadcastOneToAll algorithm for GPU addresses. Refactor cuda.h into cuda_private.h to allow inclusion of <cuda.h> in public headers without polluting the namespace.

Port broadcast tests to GPU variants.

* this revision is based on Peter's revision D4546932

Differential Revision: D4547382

fbshipit-source-id: 3d294ad8862b04fb783ba22e5c925b8d7cbc8a8d
2017-02-14 18:46:56 -08:00
6aa8c932fc Benchmark for CUDA-aware algorithms
Summary:
Separate benchmark build target for CUDA-aware algorithms.

This is needed to keep CUDA an optional dependency.

Differential Revision: D4546932

fbshipit-source-id: b73176ae9067233f883d51ba3ab4efbb13a6f86f
2017-02-13 21:32:58 -08:00
8821f4aba6 Fix race in benchmark tool
Summary: TSIA

Reviewed By: plapukhov

Differential Revision: D4549105

fbshipit-source-id: 61c8966e429e0701677f441aeaaf27fdc5e669e7
2017-02-13 21:32:58 -08:00
5e06634f7e Implement initial CUDA-aware allreduce
Summary:
This CUDA-aware ring allreduce is based on the regular ring allreduce.
It runs the reduction algorithm on the CPU and is therefore most
suited for smaller buffers.

Both the device-to-host memcpy's at the start of the algorithm and the
host-to-device memcpy's at the end of the algorithm are kicked off
asynchronously in an attempt to parallelize as much as possible.

Reviewed By: Yangqing

Differential Revision: D4542816

fbshipit-source-id: 101dfad276ca79703e37ff93fb1b6d467295f66b
2017-02-13 21:32:58 -08:00
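
A single-process NumPy simulation of the chunked ring-allreduce data movement described above; this is only a toy model of the reduce-scatter and all-gather phases, not Gloo's implementation, and it ignores the overlapped device/host copies.

```python
import numpy as np

def ring_allreduce(buffers):
    """Simulate a ring allreduce over `buffers` (one array per fake rank)."""
    n = len(buffers)
    chunks = [np.array_split(b.astype(np.float64), n) for b in buffers]

    # Reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            dst = (r + 1) % n
            chunks[dst][c] = chunks[dst][c] + chunks[r][c]

    # All-gather: circulate the reduced chunks so every rank ends up with every sum.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            dst = (r + 1) % n
            chunks[dst][c] = chunks[r][c]

    return [np.concatenate(c) for c in chunks]

bufs = [np.full(8, float(r)) for r in range(4)]
print(ring_allreduce(bufs)[0])   # every element equals 0 + 1 + 2 + 3 = 6.0
```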
b82c4b3d38 Split benchmark code into multiple files
Summary:
The CUDA benchmark suite will be a separate build target, so the
runner should be reused.

Reviewed By: Yangqing

Differential Revision: D4545092

fbshipit-source-id: 6ccf2d30f5d35c74fc59851b25416bfe6863d62c
2017-02-13 21:32:58 -08:00
72fd605b01 Fix std::accumulate
Summary:
Testing pull request again.
Closes https://github.com/facebookincubator/gloo/pull/2

Reviewed By: pietern

Differential Revision: D4542327

Pulled By: Yangqing

fbshipit-source-id: 5bd66c32c7249f1327225117815bef64b8708722
2017-02-10 10:12:37 -08:00
750fb5cc73 Fixes to support short and char tensors for bitwise operations 2017-02-09 18:52:59 -08:00
0f4749907a Adding bitwise operations
- lshift, rshift, bitand, bitor, bitxor
2017-02-09 18:11:58 -08:00
bd2dc63ef6 Adding bitand, bitor and bitxor 2017-02-09 17:06:04 -08:00
19a8795450 Changes to shift operations
- renaming lsh -> lshift, rsh -> rshift
- adding componentwise functions
2017-02-09 15:41:07 -08:00
7547a06c4f Avoiding duplicated unsigned as it causes error on gcc. 2017-02-09 13:29:05 -08:00
8929b75795 Added shift operations. 2017-02-09 13:28:36 -08:00
efd8998690 Import gloo
Summary:
In the GitHub repository this directory will be mirrored similar to
folly, such that the repository has a single top level directory
called "gloo". This allows for versioning or renaming of the
project root, without having to mangle the include paths; they will
always use the "gloo" prefix.

fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
2017-02-09 12:33:54 -08:00
024d1e2678 Merge pull request #69 from cwhipkey/master
Qualify nullptr_t with std::
2017-02-08 09:17:50 -08:00
5eab428294 Qualify nullptr_t with std::. 2017-02-08 07:06:31 -08:00
8aa259b52b review comments from gchanan 2017-02-06 11:08:23 +00:00
41ddc2a786 VolumetricFractionalMaxPooling like Spatial... 2017-02-01 12:01:09 +00:00
e4886f6589 VolumetricFractionalMaxPooling like spatial 2017-02-01 11:52:49 +00:00
2b948c42cd Add SpatialAdaptiveAveragePooling. 2017-01-14 19:44:07 -06:00
b2ae054410 Add SpatialAdaptiveAveragePooling. 2017-01-14 15:27:52 -06:00
2a974f5ca2 Fix 1.3.2 compilation 2016-12-08 09:11:43 -08:00
648e9fbb58 Adding missing file 2016-12-05 18:06:24 -08:00
34d27771c6 1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
2016-12-01 15:17:50 -08:00
1093821c33 Replace min BW by average BW in tests 2016-12-01 15:16:35 -08:00
ddddfba1c0 Merge pull request #54 from peterhj/peterhj-staticlib
Add a static library target "staticlib" to the Makefile.
2016-11-28 09:15:39 -08:00
5765d608cc Add a static library target "staticlib" to the Makefile.
Rename the static library "libnccl_static.a" to disambiguate from the
dynamic libraries.
2016-11-24 11:31:03 -08:00
c2c515516b Remove irrelevant output from ncclReduce Fortran tests 2016-11-21 10:18:04 -08:00
9c18468fe2 Add Copyright header to Fortran bindings source files 2016-11-21 10:17:58 -08:00
5f2b32e45b Add Fortran bindings 2016-11-17 15:33:34 -08:00
534b9a1697 Bump to 1.3.1 2016-10-13 10:33:05 -07:00
b2781d0501 Fix primitives function prototype 2016-10-13 10:32:42 -07:00
bf7d1514f7 NVML (libwrap) : import the needed definitions 2016-10-13 10:28:59 -07:00
8bb06c94be Improved allreduce segmentation for small sizes 2016-10-07 12:42:23 -07:00
977 changed files with 92495 additions and 19857 deletions

.gitignore (vendored): 18 lines changed

@ -5,6 +5,7 @@ torch.egg-info/
torch/version.py
torch/csrc/generic/TensorMethods.cpp
torch/lib/*.so*
torch/lib/*.a*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/build
@ -19,8 +20,10 @@ torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.h
torch/csrc/generated
docs/src/**/*
test/data/legacy_modules.t7
test/data/gpu_tensors.pt
test/htmlcov
test/.coverage
*/*.pyc
@ -31,3 +34,18 @@ test/.coverage
*/*.so*
*/**/*.so*
*/**/*.dylib*
test/data/legacy_serialized.pt
test/data/linear.pt
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swn
*.swo
*.swp
*~
# OSX dir files
.DS_Store


@ -1,7 +1,8 @@
# https://travis-ci.org/pytorch/pytorch
language: python
dist: trusty
python:
- 2.7.8
- 2.7.9
- 2.7
- 3.5
- 3.6
@ -18,7 +19,8 @@ install:
- export CC="ccache gcc-4.8"
- export CXX="ccache g++-4.8"
- ccache --show-stats
- travis_retry pip install -r requirements.txt
- travis_retry pip install --upgrade pip setuptools wheel
- travis_retry pip install -r requirements.txt --only-binary=scipy
- python setup.py install
script:
@ -43,5 +45,5 @@ matrix:
env: LINT_CHECK
python: "2.7"
addons: true
install: pip install pep8
script: pep8
install: pip install flake8
script: flake8

CONTRIBUTING.md (new file, 185 lines)

@ -0,0 +1,185 @@
## Contributing to PyTorch
If you are interested in contributing to PyTorch, your contributions will fall
into two categories:
1. You want to propose a new Feature and implement it
- post about your intended feature, and we shall discuss the design and
implementation. Once we agree that the plan looks good, go ahead and implement it.
2. You want to implement a feature or bug-fix for an outstanding issue
- Look at the outstanding issues here: https://github.com/pytorch/pytorch/issues
- Especially look at the Low Priority and Medium Priority issues
- Pick an issue and comment on the task that you want to work on this feature
- If you need more context on a particular issue, please ask and we shall provide.
Once you finish implementing a feature or bugfix, please send a Pull Request to
https://github.com/pytorch/pytorch
If you are not familiar with creating a Pull Request, here are some guides:
- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
- https://help.github.com/articles/creating-a-pull-request/
## Developing locally with PyTorch
To locally develop with PyTorch, here are some tips:
1. Uninstall all existing pytorch installs
```
conda uninstall pytorch
pip uninstall torch
pip uninstall torch # run this command twice
```
2. Locally clone a copy of PyTorch from source:
```
git clone https://github.com/pytorch/pytorch
cd pytorch
```
3. Install PyTorch in `build develop` mode:
A full set of instructions on installing PyTorch from Source are here:
https://github.com/pytorch/pytorch#from-source
The change you have to make is to replace
```
python setup.py install
```
with
```
python setup.py build develop
```
This is especially useful if you are only changing Python files.
This mode will symlink the python files from the current local source tree into the
python install.
Hence, if you modify a python file, you do not need to reinstall pytorch again and again.
For example:
- Install local pytorch in `build develop` mode
- modify your python file `torch/__init__.py` (for example)
- test functionality
- modify your python file `torch/__init__.py`
- test functionality
- modify your python file `torch/__init__.py`
- test functionality
You do not need to repeatedly install after modifying python files.
## Writing documentation
PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Length of line inside docstrings block must be limited to 80 characters to
fit into Jupyter documentation popups.
## Managing multiple build trees
One downside to using `python setup.py develop` is that your development
version of pytorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used.
If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:
```
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py build develop
```
## C++ Development tips
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
1. How to rebuild only the code you are working on, and
2. How to make rebuilds in the absence of changes go faster.
### Build only what you need.
`python setup.py build` will build everything, but since our build system is
not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on `torch/csrc`? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/lib/TH`, did not make any cmake changes, and just want to
see if it compiles? Run `(cd torch/lib/build/TH && make install -j$(getconf _NPROCESSORS_ONLN))`. This
applies for any other subdirectory of `torch/lib`. **Warning: Changes you
make here will not be visible from Python.** See below.
- Working on `torch/lib` and want to run your changes / rerun cmake? Run
`python setup.py build_deps`. Note that this will rerun cmake for
every subdirectory in TH; if you are only working on one project,
consider editing `torch/lib/build_all.sh` and commenting out the
`build` lines of libraries you are not working on.
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
- `DEBUG=1` will enable debug builds (-g -O0)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
For example:
```
NO_CUDA=1 DEBUG=1 python setup.py build develop
```
Make sure you continue to pass these flags on subsequent builds.
### Make no-op build fast.
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. Using ccache in a situation like this is a real time-saver. However, by
default, ccache does not properly support CUDA stuff, so here are the
instructions for installing a custom `ccache` fork that has CUDA support:
```
# install and export ccache
if ! ls ~/ccache/bin/ccache
then
sudo apt-get update
sudo apt-get install -y automake autoconf
sudo apt-get install -y asciidoc
mkdir -p ~/ccache
pushd /tmp
rm -rf ccache
git clone https://github.com/colesbury/ccache -b ccbin
pushd ccache
./autogen.sh
./configure
make install prefix=~/ccache
popd
popd
mkdir -p ~/ccache/lib
mkdir -p ~/ccache/cuda
ln -s ~/ccache/bin/ccache ~/ccache/lib/cc
ln -s ~/ccache/bin/ccache ~/ccache/lib/c++
ln -s ~/ccache/bin/ccache ~/ccache/lib/gcc
ln -s ~/ccache/bin/ccache ~/ccache/lib/g++
ln -s ~/ccache/bin/ccache ~/ccache/cuda/nvcc
~/ccache/bin/ccache -M 25Gi
fi
export PATH=~/ccache/lib:$PATH
export CUDA_NVCC_EXECUTABLE=~/ccache/cuda/nvcc
```
Hope this helps, and thanks for considering to contribute.


@ -1,10 +1,13 @@
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
vim \
ca-certificates \
libjpeg-dev \
libpng-dev &&\
@ -15,7 +18,7 @@ RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install conda-build && \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy scipy ipython mkl&& \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy pyyaml scipy ipython mkl&& \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80
@ -23,11 +26,11 @@ RUN conda install --name pytorch-py35 -c soumith magma-cuda80
WORKDIR /opt/pytorch
COPY . .
RUN cat requirements.txt | xargs -n1 pip install --no-cache-dir && \
TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_LIBRARY_PATH=/opt/conda/envs/pytorch-py35/lib \
CMAKE_INCLUDE_PATH=/opt/conda/envs/pytorch-py35/include \
RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
pip install -v .
RUN git clone https://github.com/pytorch/vision.git && cd vision && pip install -v .
WORKDIR /workspace
RUN chmod -R a+w /workspace

README.md: 159 lines changed

@ -2,54 +2,72 @@
--------------------------------------------------------------------------------
PyTorch is a python package that provides two high-level features:
- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autograd system
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
We are in an early-release Beta. Expect some adventures and rough edges.
We are in an early-release beta. Expect some adventures and rough edges.
- [More About PyTorch](#more-about-pytorch)
- [More about PyTorch](#more-about-pytorch)
- [Installation](#installation)
- [Binaries](#binaries)
- [From source](#from-source)
- [Docker image](#docker-image)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [Getting Started](#getting-started)
- [Communication](#communication)
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | Python | Status |
| System | 2.7 | 3.5 |
| --- | --- | --- |
| Linux CPU | 2.7.8, 2.7, 3.5, nightly | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | 2.7 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2)](https://build.pytorch.org/job/pytorch-master-py2) |
| Linux GPU | 3.5 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3)](https://build.pytorch.org/job/pytorch-master-py3) |
| Linux CPU | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2-linux)](https://build.pytorch.org/job/pytorch-master-py2-linux) | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3-linux)](https://build.pytorch.org/job/pytorch-master-py3-linux) |
| macOS CPU | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2-osx-cpu)](https://build.pytorch.org/job/pytorch-master-py2-osx-cpu) | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3-osx-cpu)](https://build.pytorch.org/job/pytorch-master-py3-osx-cpu) |
## More about PyTorch
At a granular level, PyTorch is a library that consists of the following components:
| \_ | \_ |
| ------------------------ | --- |
| torch | a Tensor library like NumPy, with strong GPU support |
| torch.autograd | a tape based automatic differentiation library that supports all differentiable Tensor operations in torch |
| torch.nn | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| torch.optim | an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc. |
| torch.multiprocessing | python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and hogwild training. |
| torch.utils | DataLoader, Trainer and other utility functions for convenience |
| torch.legacy(.nn/.optim) | legacy code that has been ported over from torch for backward compatibility reasons |
<table>
<tr>
<td><b> torch </b></td>
<td> a Tensor library like NumPy, with strong GPU support </td>
</tr>
<tr>
<td><b> torch.autograd </b></td>
<td> a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch </td>
</tr>
<tr>
<td><b> torch.nn </b></td>
<td> a neural networks library deeply integrated with autograd designed for maximum flexibility </td>
</tr>
<tr>
<td><b> torch.multiprocessing </b></td>
<td> Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training. </td>
</tr>
<tr>
<td><b> torch.utils </b></td>
<td> DataLoader, Trainer and other utility functions for convenience </td>
</tr>
<tr>
<td><b> torch.legacy(.nn/.optim) </b></td>
<td> legacy code that has been ported over from torch for backward compatibility reasons </td>
</tr>
</table>
Usually one uses PyTorch either as:
- A replacement for numpy to use the power of GPUs.
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed
Elaborating further:
### A GPU-ready Tensor library
### A GPU-Ready Tensor Library
If you use numpy, then you have used Tensors (a.k.a ndarray).
If you use NumPy, then you have used Tensors (a.k.a ndarray).
<p align=center><img width="30%" src="docs/source/_static/img/tensor_illustration.png" /></p>
@ -60,15 +78,15 @@ We provide a wide variety of tensor routines to accelerate and fit your scientif
such as slicing, indexing, math operations, linear algebra, reductions.
And they are fast!
### Dynamic Neural Networks: Tape based Autograd
### Dynamic Neural Networks: Tape-Based Autograd
PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
Most frameworks such as `TensorFlow`, `Theano`, `Caffe` and `CNTK` have a static view of the world.
Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world.
One has to build a neural network, and reuse the same structure again and again.
Changing the way the network behaves means that one has to start from scratch.
With PyTorch, we use a technique called Reverse-mode auto-differentiation, which allows you to
With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to
change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes
from several research papers on this topic, as well as current and past work such as
[autograd](https://github.com/twitter/torch-autograd),
@ -80,69 +98,68 @@ You get the best of speed and flexibility for your crazy research.
<p align=center><img width="80%" src="docs/source/_static/img/dynamic_graph.gif" /></p>
### Python first
### Python First
PyTorch is not a Python binding into a monolothic C++ framework.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use numpy / scipy / scikit-learn etc.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
### Imperative experiences
### Imperative Experiences
PyTorch is designed to be intuitive, linear in thought and easy to use.
When you execute a line of code, it gets executed. There isn't an asynchronous view of the world.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straight-forward.
The stack-trace points to exactly where your code was defined.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straightforward.
The stack trace points to exactly where your code was defined.
We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (CuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and Neural Network backends
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.
Hence, PyTorch is quite fast -- whether you run small or large neural networks.
Hence, PyTorch is quite fast whether you run small or large neural networks.
The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives.
We've written custom memory allocators for the GPU to make sure that
your deep learning models are maximally memory efficient.
This enables you to train bigger deep learning models than before.
### Extensions without pain
### Extensions without Pain
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straight-forward
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite numpy based libraries such as SciPy](https://github.com/pytorch/tutorials/blob/master/Creating%20extensions%20using%20numpy%20and%20scipy.ipynb).
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide an extension API based on
[cffi](http://cffi.readthedocs.io/en/latest/) that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. [You can see an example here](https://github.com/pytorch/extension-ffi).
[cffi](http://cffi.readthedocs.io/en/latest/) that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/c_extension.html) and [an example here](https://github.com/pytorch/extension-ffi).
## Installation
### Binaries
- Anaconda
```bash
conda install pytorch torchvision -c soumith
```
Commands to install from binaries via Conda or pip wheels are on our website:
### From source
[http://pytorch.org](http://pytorch.org)
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.continuum.io/downloads) environment.
You will get a high-quality BLAS library (MKL) and you get a controlled compiler version regardless of your Linux distro.
Once you have [anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
Once you have [Anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 7.5 or above
- [NVIDIA CuDNN](https://developer.nvidia.com/cudnn) v5.x
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v5.x or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
@ -150,64 +167,70 @@ If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
On Linux
```bash
export CMAKE_PREFIX_PATH=[anaconda root directory]
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
# Install basic dependencies
conda install numpy mkl setuptools cmake gcc cffi
conda install numpy pyyaml mkl setuptools cmake gcc cffi
# Add LAPACK support for the GPU
conda install -c soumith magma-cuda75 # or magma-cuda80 if CUDA 8.0
conda install -c soumith magma-cuda80 # or magma-cuda75 if CUDA 7.5
```
On OSX
```bash
export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy setuptools cmake cffi
conda install numpy pyyaml setuptools cmake cffi
```
#### Install PyTorch
On Linux
```bash
export MACOSX_DEPLOYMENT_TARGET=10.9 # if OSX
pip install -r requirements.txt
python setup.py install
```
On OSX
```bash
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```
### Docker image
Dockerfiles are supplied to build images with cuda support and cudnn v5 and cudnn v6 RC. Build them as usual
Dockerfile is supplied to build images with cuda support and cudnn v6. Build as usual
```
docker build . -t pytorch-cudnnv5
docker build -t pytorch .
```
or
Alternatively, if you want a runtime image, build with
```
docker build . -t pytorch-cudnnv6 -f tools/docker/Dockerfile-v6
docker build -t pytorch . -f tools/docker/Dockerfile_runtime
```
and run them with nvidia-docker:
and run with nvidia-docker:
```
nvidia-docker run --rm -ti --ipc=host pytorch-cudnnv5
nvidia-docker run --rm -ti --ipc=host pytorch
```
Please note that pytorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you
should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
should increase shared memory size either with `--ipc=host` or `--shm-size` command line options to `nvidia-docker run`.
## Getting Started
Three pointers to get you started:
- [Tutorials: notebooks to get you started with understanding and using PyTorch](https://github.com/pytorch/tutorials)
- [Tutorials: get you started with understanding and using PyTorch](http://pytorch.org/tutorials/)
- [Examples: easy to understand pytorch code across all domains](https://github.com/pytorch/examples)
- The API Reference: [http://pytorch.org/docs/](http://pytorch.org/docs/)
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* github issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . If you need a slack invite, ping us at soumith@pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . If you need a slack invite, ping us at soumith@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
## Releases and Contributing
PyTorch has a 90 day release cycle (major releases).
It's current state is Beta (v0.1.6), we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
PyTorch has a 90 day release cycle (major releases).
It's current state is Beta, we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.
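
As an aside (not part of the README diff), a toy sketch of the define-by-run behaviour described above, written against the Variable API of this release; the module and sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

class Dynamic(nn.Module):
    """Ordinary Python control flow changes the recorded graph on every call."""
    def __init__(self):
        super(Dynamic, self).__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x, steps):
        for _ in range(steps):       # a different number of operations each call
            x = self.fc(x).tanh()
        return x.sum()

model = Dynamic()
for steps in (1, 3, 2):
    model.zero_grad()
    loss = model(Variable(torch.randn(2, 8)), steps)
    loss.backward()                  # a fresh graph was recorded for this call
```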


@ -63,11 +63,16 @@ function(CUDA_DETECT_INSTALLED_GPUS OUT_VARIABLE)
"}\n")
execute_process(COMMAND "${CUDA_NVCC_EXECUTABLE}" "--run" "${cufile}"
"-ccbin" ${CMAKE_CXX_COMPILER}
WORKING_DIRECTORY "${PROJECT_BINARY_DIR}/CMakeFiles/"
RESULT_VARIABLE nvcc_res OUTPUT_VARIABLE nvcc_out
ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
if(nvcc_res EQUAL 0)
# only keep the last line of nvcc_out
STRING(REGEX REPLACE ";" "\\\\;" nvcc_out "${nvcc_out}")
STRING(REGEX REPLACE "\n" ";" nvcc_out "${nvcc_out}")
list(GET nvcc_out -1 nvcc_out)
string(REPLACE "2.1" "2.1(2.0)" nvcc_out "${nvcc_out}")
set(CUDA_GPU_DETECT_OUTPUT ${nvcc_out} CACHE INTERNAL "Returned GPU architetures from detect_gpus tool" FORCE)
endif()
@ -116,13 +121,13 @@ function(CUDA_SELECT_NVCC_ARCH_FLAGS out_variable)
set(add_ptx TRUE)
set(arch_name ${CMAKE_MATCH_1})
endif()
if(arch_name MATCHES "([0-9]\\.[0-9])$")
if(arch_name MATCHES "(^[0-9]\\.[0-9](\\([0-9]\\.[0-9]\\))?)$")
set(arch_bin ${CMAKE_MATCH_1})
set(arch_ptx ${arch_bin})
else()
# Look for it in our list of known architectures
if(${arch_name} STREQUAL "Fermi")
set(arch_bin 2.0 "2.1(2.0)")
set(arch_bin "2.0 2.1(2.0)")
elseif(${arch_name} STREQUAL "Kepler+Tegra")
set(arch_bin 3.2)
elseif(${arch_name} STREQUAL "Kepler+Tesla")
@ -173,11 +178,11 @@ function(CUDA_SELECT_NVCC_ARCH_FLAGS out_variable)
# Tell NVCC to add binaries for the specified GPUs
foreach(arch ${cuda_arch_bin})
if(arch MATCHES "([0-9]+)\\(([0-9]+)\\)")
# User explicitly specified PTX for the concrete BIN
# User explicitly specified ARCH for the concrete CODE
list(APPEND nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=sm_${CMAKE_MATCH_1})
list(APPEND nvcc_archs_readable sm_${CMAKE_MATCH_1})
else()
# User didn't explicitly specify PTX for the concrete BIN, we assume PTX=BIN
# User didn't explicitly specify ARCH for the concrete CODE, we assume ARCH=CODE
list(APPEND nvcc_flags -gencode arch=compute_${arch},code=sm_${arch})
list(APPEND nvcc_archs_readable sm_${arch})
endif()


@ -12,7 +12,14 @@ BUILDDIR = build
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
docset: html
doc2dash --name $(SPHINXPROJ) --icon $(SOURCEDIR)/_static/img/pytorch-logo-flame.png --enable-js --online-redirect-url http://pytorch.org/docs/ --force $(BUILDDIR)/html/
# Manually fix because Zeal doesn't deal well with `icon.png`-only at 2x resolution.
cp $(SPHINXPROJ).docset/icon.png $(SPHINXPROJ).docset/icon@2x.png
convert $(SPHINXPROJ).docset/icon@2x.png -resize 16x16 $(SPHINXPROJ).docset/icon.png
.PHONY: help Makefile docset
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).


@ -112,3 +112,7 @@ footer p {
nav .hidden-section {
display: inherit;
}
.wy-side-nav-search>div.version {
color: #000;
}

(Binary image file not shown; size after change: 1010 B.)

@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
height="40.200001"
width="40.200001"
xml:space="preserve"
viewBox="0 0 40.200002 40.2"
y="0px"
x="0px"
id="Layer_1"
version="1.1"><metadata
id="metadata4717"><rdf:RDF><cc:Work
rdf:about=""><dc:format>image/svg+xml</dc:format><dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" /><dc:title></dc:title></cc:Work></rdf:RDF></metadata><defs
id="defs4715" /><style
id="style4694"
type="text/css">
.st0{fill:#F05732;}
.st1{fill:#9E529F;}
.st2{fill:#333333;}
</style><path
style="fill:#f05732"
id="path4696"
d="m 26.975479,12.199999 c -1.3,-1 -1.8,3.9 -4.4,3.9 -3,0 -4,-12.9999998 -6.3,-12.9999998 -0.7,0 -0.8,-0.4 -7.9000003,21.2999998 -2.9000001,9 4.4000003,15.8 11.8000003,15.8 4.6,0 12.3,-3 12.3,-12.6 0,-7.1 -3.5,-13.9 -5.5,-15.4 z m -6.9,23.1 c -3.7,0 -6.7,-3.1 -6.7,-7 0,-3.9 3,-7 6.7,-7 3.7,0 6.7,3.1 6.7,7 0,3.8 -3,7 -6.7,7 z"
class="st0" /><path
style="fill:#9e529f"
id="path4698"
d="m 24.075479,-7.6293945e-7 c -0.5,0 -1.8,2.49999996293945 -1.8,3.59999996293945 0,1.5 1,2 1.8,2 0.8,0 1.8,-0.5 1.8,-2 -0.1,-1.1 -1.4,-3.59999996293945 -1.8,-3.59999996293945 z"
class="st1" /></svg>


@ -9,6 +9,8 @@ Automatic differentiation package - torch.autograd
.. autofunction:: backward
.. autofunction:: grad
Variable
--------
@ -20,7 +22,7 @@ of a couple in-place methods, that would overwrite inputs required for
gradient computation). In most cases Tensors can be safely replaced with
Variables and the code will remain to work just fine. Because of this,
we're not documenting all the operations on variables, and you should
refere to :class:`torch.Tensor` docs for this purpose.
refer to :class:`torch.Tensor` docs for this purpose.
In-place operations on Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -38,8 +40,8 @@ All :class:`Variable` s keep track of in-place operations applied to them, and
if the implementation detects that a variable was saved for backward in one of
the functions, but it was modified in-place afterwards, an error will be raised
once backward pass is started. This ensures that if you're using in-place
functions and not seing any errors, you can be sure that the computed gradients
are correct.
functions and not seeing any errors, you can be sure that the computed
gradients are correct.
.. autoclass:: Variable


@ -74,9 +74,11 @@ author = 'Torch Contributors'
# built documents.
#
# The short X.Y version.
version = '0.1.6'
# TODO: change to [:2] at v1.0
version = 'master (' + torch.__version__ + ' )'
# The full version, including alpha/beta/rc tags.
release = '0.1.6'
# TODO: verify this works as expected
release = 'master'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@ -111,7 +113,7 @@ html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
#
html_theme_options = {
'collapse_navigation': False,
'display_version': False,
'display_version': True,
'logo_only': True,
}
@ -202,7 +204,10 @@ from sphinx.util.docfields import TypedField
from sphinx import addnodes
def patched_make_field(self, types, domain, items):
def patched_make_field(self, types, domain, items, **kw):
# `kw` catches `env=None` needed for newer sphinx while maintaining
# backwards compatibility when passed along further down!
# type: (List, unicode, Tuple) -> nodes.field
def handle_item(fieldarg, content):
par = nodes.paragraph()
@ -222,7 +227,7 @@ def patched_make_field(self, types, domain, items):
typename = typename.replace('float', 'python:float')
typename = typename.replace('type', 'python:type')
par.extend(self.make_xrefs(self.typerolename, domain, typename,
addnodes.literal_emphasis))
addnodes.literal_emphasis, **kw))
else:
par += fieldtype
par += nodes.Text(')')


@ -25,3 +25,10 @@ Streams and events
.. autoclass:: Event
:members:
NVIDIA Tools Extension (NVTX)
-----------------------------
.. autofunction:: torch.cuda.nvtx.mark
.. autofunction:: torch.cuda.nvtx.range_push
.. autofunction:: torch.cuda.nvtx.range_pop


@ -5,3 +5,9 @@ torch.utils.data
.. autoclass:: Dataset
.. autoclass:: TensorDataset
.. autoclass:: DataLoader
.. autoclass:: torch.utils.data.sampler.Sampler
.. autoclass:: torch.utils.data.sampler.SequentialSampler
.. autoclass:: torch.utils.data.sampler.RandomSampler
.. autoclass:: torch.utils.data.sampler.SubsetRandomSampler
.. autoclass:: torch.utils.data.sampler.WeightedRandomSampler
.. autoclass:: torch.utils.data.distributed.DistributedSampler

docs/source/distributed.rst (new file, 165 lines)

@ -0,0 +1,165 @@
.. role:: hidden
:class: hidden-section
Distributed communication package - torch.distributed
=====================================================
.. automodule:: torch.distributed
.. currentmodule:: torch.distributed
Currently torch.distributed supports three backends, each with
different capabilities. The table below shows which functions are available
for use with CPU / CUDA tensors.
MPI supports cuda only iff the implementation used to build PyTorch supports it.
+------------+-----------+-----------+-----------+
| Backend | ``tcp`` | ``gloo`` | ``mpi`` |
+------------+-----+-----+-----+-----+-----+-----+
| Device | CPU | GPU | CPU | GPU | CPU | GPU |
+============+=====+=====+=====+=====+=====+=====+
| send | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| recv | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| broadcast | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| all_reduce | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| reduce | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| all_gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| scatter | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| barrier | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
Initialization
--------------
The package needs to be initialized using the :func:`torch.distributed.init_process_group`
function before calling any other methods.
.. autofunction:: init_process_group
.. autofunction:: get_rank
.. autofunction:: get_world_size
--------------------------------------------------------------------------------
Currently three initialization methods are supported:
TCP initialization
^^^^^^^^^^^^^^^^^^
Initialization will utilize a network address reachable from all processes.
If the address belongs to one of the machines, initialization requires that all processes
have manually specified ranks.
Alternatively, the address has to be a valid IP multicast address, in which case,
ranks can be assigned automatically. Multicast initialization also supports
a ``group_name`` argument, which allows you to use the same address for multiple jobs,
as long as they use different group names.
::
import torch.distributed as dist
# Use address of one of the machines
dist.init_process_group(init_method='tcp://10.1.1.20:23456', rank=args.rank, world_size=4)
# or a multicast address - rank will be assigned automatically if unspecified
dist.init_process_group(init_method='tcp://[ff15:1e18:5d4c:4cf0:d02d:b659:53ba:b0a7]:23456',
world_size=4)
Shared file-system initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Another initialization method makes use of a file system shared and visible from
all machines in a group. The URL should start with ``file://`` and contain a path
to a non-existent file (in an existing directory) on a shared file system.
This initialization method also supports a ``group_name`` argument, which allows you to
use the same shared file path for multiple jobs, as long as they use different
group names.
.. warning::
This method assumes that the file system supports locking using ``fcntl`` - most
local systems and NFS support it.
::
import torch.distributed as dist
# Rank will be assigned automatically if unspecified
dist.init_process_group(init_method='file:///mnt/nfs/sharedfile', world_size=4,
group_name=args.group)
Environment variable initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method will read the configuration from environment variables, allowing
one to fully customize how the information is obtained. The variables to be set
are:
* ``MASTER_PORT`` - required; has to be a free port on machine with rank 0
* ``MASTER_ADDR`` - required (except for rank 0); address of rank 0 node
* ``WORLD_SIZE`` - required; can be set either here, or in a call to init function
* ``RANK`` - required; can be set either here, or in a call to init function
The machine with rank 0 will be used to set up all connections.
This is the default method, meaning that ``init_method`` does not have to be specified (or
can be ``env://``).
Groups
------
By default collectives operate on the default group (also called the world) and
require all processes to enter the distributed function call. However, some workloads can benefit
from more fine-grained communication. This is where distributed groups come
into play. :func:`~torch.distributed.new_group` function can be
used to create new groups, with arbitrary subsets of all processes. It returns
an opaque group handle that can be given as a ``group`` argument to all collectives
(collectives are distributed functions to exchange information in certain well-known programming patterns).
.. autofunction:: new_group
Point-to-point communication
----------------------------
.. autofunction:: send
.. autofunction:: recv
:func:`~torch.distributed.isend` and :func:`~torch.distributed.irecv`
return distributed request objects when used. In general, the type of this object is unspecified
as they should never be created manually, but they are guaranteed to support two methods:
* ``is_completed()`` - returns True if the operation has finished
* ``wait()`` - will block the process until the operation is finished.
``is_completed()`` is guaranteed to return True once it returns.
.. autofunction:: isend
.. autofunction:: irecv
Collective functions
--------------------
.. autofunction:: broadcast
.. autofunction:: all_reduce
.. autofunction:: reduce
.. autofunction:: all_gather
.. autofunction:: gather
.. autofunction:: scatter
.. autofunction:: barrier
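
As an aside (not part of the documentation diff), a minimal sketch of the collective API documented above, assuming two processes launched with RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT exported (the environment-variable initialization method):

```python
import torch
import torch.distributed as dist

# Run one copy of this script per process (e.g. RANK=0 and RANK=1).
dist.init_process_group(backend='tcp', init_method='env://')

t = torch.ones(4) * (dist.get_rank() + 1)
dist.all_reduce(t)                   # default reduction op is SUM
print(dist.get_rank(), t)            # with 2 ranks, every rank now holds 3s
```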


@ -24,11 +24,13 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
torch
tensors
sparse
storage
nn
optim
torch.autograd <autograd>
torch.multiprocessing <multiprocessing>
torch.distributed <distributed>
torch.legacy <legacy>
cuda
ffi


@ -83,6 +83,6 @@ the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it prooved to be robust to various
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.


@ -22,6 +22,24 @@ Containers
.. autoclass:: Module
:members:
:hidden:`Sequential`
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: Sequential
:members:
:hidden:`ModuleList`
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ModuleList
:members:
:hidden:`ParameterList`
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ParameterList
:members:
Convolution Layers
----------------------------------
@ -132,6 +150,65 @@ Pooling Layers
.. autoclass:: LPPool2d
:members:
:hidden:`AdaptiveMaxPool1d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AdaptiveMaxPool1d
:members:
:hidden:`AdaptiveMaxPool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AdaptiveMaxPool2d
:members:
:hidden:`AdaptiveAvgPool1d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AdaptiveAvgPool1d
:members:
:hidden:`AdaptiveAvgPool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AdaptiveAvgPool2d
:members:
Padding Layers
--------------
:hidden:`ReflectionPad2d`
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReflectionPad2d
:members:
:hidden:`ReplicationPad2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReplicationPad2d
:members:
:hidden:`ReplicationPad3d`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReplicationPad3d
:members:
:hidden:`ZeroPad2d`
~~~~~~~~~~~~~~~~~~~
.. autoclass:: ZeroPad2d
:members:
:hidden:`ConstantPad2d`
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ConstantPad2d
:members:
Non-linear Activations
----------------------------------
@ -153,6 +230,12 @@ Non-linear Activations
.. autoclass:: ELU
:members:
:hidden:`SELU`
~~~~~~~~~~~~~~
.. autoclass:: SELU
:members:
:hidden:`PReLU`
~~~~~~~~~~~~~~~
@ -259,6 +342,23 @@ Normalization layers
.. autoclass:: BatchNorm3d
:members:
:hidden:`InstanceNorm1d`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm1d
:members:
:hidden:`InstanceNorm2d`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm2d
:members:
:hidden:`InstanceNorm3d`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm3d
:members:
Recurrent layers
----------------------------------
@ -330,6 +430,12 @@ Dropout layers
.. autoclass:: Dropout3d
:members:
:hidden:`AlphaDropout`
~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AlphaDropout
:members:
Sparse layers
----------------------------------
@ -340,6 +446,27 @@ Sparse layers
.. autoclass:: Embedding
:members:
:hidden:`EmbeddingBag`
~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: EmbeddingBag
:members:
Distance functions
----------------------------------
:hidden:`CosineSimilarity`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: CosineSimilarity
:members:
:hidden:`PairwiseDistance`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: PairwiseDistance
:members:
Loss functions
----------------------------------
@ -368,6 +495,12 @@ Loss functions
.. autoclass:: NLLLoss
:members:
:hidden:`PoissonNLLLoss`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: PoissonNLLLoss
:members:
:hidden:`NLLLoss2d`
~~~~~~~~~~~~~~~~~~~
@ -386,6 +519,12 @@ Loss functions
.. autoclass:: BCELoss
:members:
:hidden:`BCEWithLogitsLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: BCEWithLogitsLoss
:members:
:hidden:`MarginRankingLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -434,6 +573,12 @@ Loss functions
.. autoclass:: MultiMarginLoss
:members:
:hidden:`TripletMarginLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: TripletMarginLoss
:members:
Vision layers
----------------
@ -444,28 +589,80 @@ Vision layers
.. autoclass:: PixelShuffle
:members:
:hidden:`Upsample`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: Upsample
:members:
:hidden:`UpsamplingNearest2d`
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: UpsamplingNearest2d
:members:
:hidden:`UpsamplingBilinear2d`
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: UpsamplingBilinear2d
:members:
Multi-GPU layers
----------------
DataParallel layers (multi-GPU, distributed)
--------------------------------------------
:hidden:`DataParallel`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: DataParallel
:members:
:hidden:`DistributedDataParallel`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: torch.nn.parallel.DataParallel
:members:
Utilities
---------
:hidden:`clip_grad_norm`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.clip_grad_norm
:hidden:`weight_norm`
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.weight_norm
:hidden:`remove_weight_norm`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.remove_weight_norm
.. currentmodule:: torch.nn.utils.rnn
:hidden:`PackedSequence`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.rnn.PackedSequence
:hidden:`pack_padded_sequence`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.rnn.pack_padded_sequence
:hidden:`pad_packed_sequence`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.rnn.pad_packed_sequence
torch.nn.functional
===================
@ -557,6 +754,27 @@ Pooling functions
.. autofunction:: lp_pool2d
:hidden:`adaptive_max_pool1d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: adaptive_max_pool1d
:hidden:`adaptive_max_pool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: adaptive_max_pool2d
:hidden:`adaptive_avg_pool1d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: adaptive_avg_pool1d
:hidden:`adaptive_avg_pool2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: adaptive_avg_pool2d
Non-linear activation functions
-------------------------------
@ -586,6 +804,11 @@ Non-linear activation functions
.. autofunction:: elu
:hidden:`selu`
~~~~~~~~~~~~~~
.. autofunction:: selu
:hidden:`leaky_relu`
~~~~~~~~~~~~~~~~~~~~
@ -664,6 +887,11 @@ Normalization functions
.. autofunction:: batch_norm
:hidden:`normalize`
~~~~~~~~~~~~~~~~~~~~
.. autofunction:: normalize
Linear functions
----------------
@ -680,35 +908,123 @@ Dropout functions
.. autofunction:: dropout
Loss functions
--------------
:hidden:`nll_loss`
~~~~~~~~~~~~~~~~~~
.. autofunction:: nll_loss
:hidden:`kl_div`
~~~~~~~~~~~~~~~~
.. autofunction:: kl_div
:hidden:`cross_entropy`
:hidden:`alpha_dropout`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cross_entropy
.. autofunction:: alpha_dropout
:hidden:`dropout2d`
~~~~~~~~~~~~~~~~~~~
.. autofunction:: dropout2d
:hidden:`dropout3d`
~~~~~~~~~~~~~~~~~~~
.. autofunction:: dropout3d
Distance functions
----------------------------------
:hidden:`pairwise_distance`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: pairwise_distance
:hidden:`cosine_similarity`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cosine_similarity
Loss functions
--------------
:hidden:`binary_cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: binary_cross_entropy
:hidden:`poisson_nll_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: poisson_nll_loss
:hidden:`cosine_embedding_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cosine_embedding_loss
:hidden:`cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cross_entropy
:hidden:`hinge_embedding_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: hinge_embedding_loss
:hidden:`kl_div`
~~~~~~~~~~~~~~~~
.. autofunction:: kl_div
:hidden:`l1_loss`
~~~~~~~~~~~~~~~~~
.. autofunction:: l1_loss
:hidden:`mse_loss`
~~~~~~~~~~~~~~~~~~
.. autofunction:: mse_loss
:hidden:`margin_ranking_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: margin_ranking_loss
:hidden:`multilabel_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multilabel_margin_loss
:hidden:`multilabel_soft_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multilabel_soft_margin_loss
:hidden:`multi_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multi_margin_loss
:hidden:`nll_loss`
~~~~~~~~~~~~~~~~~~
.. autofunction:: nll_loss
:hidden:`binary_cross_entropy_with_logits`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: binary_cross_entropy_with_logits
:hidden:`smooth_l1_loss`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: smooth_l1_loss
:hidden:`soft_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: soft_margin_loss
:hidden:`triplet_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: triplet_margin_loss
Vision functions
----------------
@ -716,3 +1032,51 @@ Vision functions
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: pixel_shuffle
:hidden:`pad`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: pad
:hidden:`upsample`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample
:hidden:`upsample_nearest`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample_nearest
:hidden:`upsample_bilinear`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample_bilinear
:hidden:`grid_sample`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: grid_sample
:hidden:`affine_grid`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: affine_grid
torch.nn.init
=============
.. currentmodule:: torch.nn.init
.. autofunction:: calculate_gain
.. autofunction:: uniform
.. autofunction:: normal
.. autofunction:: constant
.. autofunction:: eye
.. autofunction:: dirac
.. autofunction:: xavier_uniform
.. autofunction:: xavier_normal
.. autofunction:: kaiming_uniform
.. autofunction:: kaiming_normal
.. autofunction:: orthogonal
.. autofunction:: sparse


@ -67,18 +67,18 @@ model. ``volatile`` also determines that ``requires_grad is False``.
Volatile differs from :ref:`excluding-requires_grad` in how the flag propagates.
If there's even a single volatile input to an operation, its output is also
going to be volatile. Volatility spreads accross the graph much easier than
going to be volatile. Volatility spreads across the graph much easier than
non-requiring gradient - you only need a **single** volatile leaf to have a
volatile output, while you need **all** leaves to not require gradient to
have an output the doesn't require gradient. Using volatile flag you don't
have an output that doesn't require gradient. Using volatile flag you don't
need to change any settings of your model parameters to use it for
inference. It's enough to create a volatile input, and this will ensure that
no intermediate states are saved.
.. code::
>>> regular_input = Variable(torch.randn(5, 5))
>>> volatile_input = Variable(torch.randn(5, 5), volatile=True)
>>> regular_input = Variable(torch.randn(1, 3, 227, 227))
>>> volatile_input = Variable(torch.randn(1, 3, 227, 227), volatile=True)
>>> model = torchvision.models.resnet18(pretrained=True)
>>> model(regular_input).requires_grad
True
@ -86,21 +86,28 @@ no intermediate states are saved.
False
>>> model(volatile_input).volatile
True
>>> model(volatile_input).creator is None
>>> model(volatile_input).grad_fn is None
True
How autograd encodes the history
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each Variable has a ``.creator`` attribute, that points to the function, of
which it is an output. This is an entry point to a directed acyclic graph (DAG)
consisting of :class:`Function` objects as nodes, and references between them
being the edges. Every time an operation is performed, a new :class:`Function`
representing it is instantiated, its :meth:`~torch.autograd.Function.forward`
method is called, and its output :class:`Variable` s creators are set to it.
Then, by following the path from any :class:`Variable` to the leaves, it is
possible to reconstruct the sequence of operations that has created the data,
and automatically compute the gradients.
Autograd is a reverse automatic differentiation system. Conceptually,
autograd records a graph of all of the operations that created
the data as you execute them, giving you a directed acyclic graph
whose leaves are the input variables and whose roots are the output variables.
By tracing this graph from roots to leaves, you can automatically
compute the gradients using the chain rule.
Internally, autograd represents this graph as a graph of
:class:`Function` objects (really expressions), which can be
:meth:`~torch.autograd.Function.apply` ed to compute the result of
evaluating the graph. When computing the forwards pass, autograd
simultaneously performs the requested computations and builds up a graph
representing the function that computes the gradient (the ``.grad_fn``
attribute of each :class:`Variable` is an entry point into this graph).
When the forwards pass is completed, we evaluate this graph in the
backwards pass to compute the gradients.
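For illustration, here is a minimal sketch (the variable names are only examples)
of how ``.grad_fn`` exposes the recorded graph:

.. code::

    >>> x = Variable(torch.ones(2, 2), requires_grad=True)
    >>> y = x + 2                  # y was produced by an addition
    >>> z = (y * y).sum()          # z is the root of the recorded graph
    >>> x.grad_fn is None          # leaf Variables are not produced by a Function
    True
    >>> y.grad_fn is not None      # non-leaf Variables point at their Function
    True
    >>> z.backward()               # evaluate the backward graph from the root
    >>> g = x.grad                 # gradients accumulate on the leaves: 2 * (x + 2)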
An important thing to note is that the graph is recreated from scratch at every
iteration, and this is exactly what allows for using arbitrary Python control

View File

@ -0,0 +1,113 @@
.. _broadcasting-semantics:
Broadcasting semantics
======================
Many PyTorch operations support :any:`NumPy Broadcasting Semantics <numpy.doc.broadcasting>`.
In short, if a PyTorch operation supports broadcast, then its Tensor arguments can be
automatically expanded to be of equal sizes (without making copies of the data).
General semantics
-----------------
Two tensors are "broadcastable" if the following rules hold:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension,
the dimension sizes must be equal, one of them must be 1, or one of them
must not exist.
For Example::
>>> x=torch.FloatTensor(5,7,3)
>>> y=torch.FloatTensor(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)
>>> x=torch.FloatTensor()
>>> y=torch.FloatTensor(2,2)
# x and y are not broadcastable, because x does not have at least 1 dimension
# can line up trailing dimensions
>>> x=torch.FloatTensor(5,3,4,1)
>>> y=torch.FloatTensor( 3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist
# but:
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor( 3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
If two tensors :attr:`x`, :attr:`y` are "broadcastable", the resulting tensor size
is calculated as follows:
- If the number of dimensions of :attr:`x` and :attr:`y` are not equal, prepend 1
to the dimensions of the tensor with fewer dimensions to make them equal length.
- Then, for each dimension size, the resulting dimension size is the max of the sizes of
:attr:`x` and :attr:`y` along that dimension.
For Example::
# can line up trailing dimensions to make reading easier
>>> x=torch.FloatTensor(5,1,4,1)
>>> y=torch.FloatTensor( 3,1,1)
>>> (x+y).size()
torch.Size([5, 3, 4, 1])
# but not necessary:
>>> x=torch.FloatTensor(1)
>>> y=torch.FloatTensor(3,1,7)
>>> (x+y).size()
torch.Size([3, 1, 7])
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor(3,1,1)
>>> (x+y).size()
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
In-place semantics
------------------
One complication is that in-place operations do not allow the in-place tensor to change shape
as a result of the broadcast.
For Example::
>>> x=torch.FloatTensor(5,3,4,1)
>>> y=torch.FloatTensor(3,1,1)
>>> (x.add_(y)).size()
torch.Size([5, 3, 4, 1])
# but:
>>> x=torch.FloatTensor(1,3,1)
>>> y=torch.FloatTensor(3,1,7)
>>> (x.add_(y)).size()
RuntimeError: The expanded size of the tensor (1) must match the existing size (7) at non-singleton dimension 2.
Backwards compatibility
-----------------------
Prior versions of PyTorch allowed certain pointwise functions to execute on tensors with different shapes,
as long as the number of elements in each tensor was equal. The pointwise operation would then be carried
out by viewing each tensor as 1-dimensional. PyTorch now supports broadcasting and the "1-dimensional"
pointwise behavior is considered deprecated and will generate a Python warning in cases where tensors are
not broadcastable, but have the same number of elements.
Note that the introduction of broadcasting can cause backwards incompatible changes in the case where
two tensors do not have the same shape, but are broadcastable and have the same number of elements.
For Example::
>>> torch.add(torch.ones(4,1), torch.randn(4))
would previously produce a Tensor with size: torch.Size([4,1]), but now produces a Tensor with size: torch.Size([4,4]).
In order to help identify cases in your code where backwards incompatibilities introduced by broadcasting may exist,
you may set `torch.utils.backcompat.broadcast_warning.enabled` to `True`, which will generate a python warning
in such cases.
For Example::
>>> torch.utils.backcompat.broadcast_warning.enabled=True
>>> torch.add(torch.ones(4,1), torch.ones(4))
__main__:1: UserWarning: self and other do not have the same shape, but are broadcastable, and have the same number of elements.
Changing behavior in a backwards incompatible manner to broadcasting rather than viewing as 1-dimensional.

View File

@ -1,3 +1,5 @@
.. _cuda-semantics:
CUDA semantics
==============
@ -10,7 +12,7 @@ of your selected device, and the results will be always placed in on the same
device as the tensor.
Cross-GPU operations are not allowed by default, with the only exception of
:meth:`~torch.Tensor.copy_`. Unless you enable peer-to-peer memory accesses
:meth:`~torch.Tensor.copy_`. Unless you enable peer-to-peer memory accesses,
any attempts to launch ops on tensors spread across different devices will
raise an error.
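As a rough sketch of this behaviour (assuming at least two GPUs are visible;
the tensors here are placeholders)::

    x = torch.cuda.FloatTensor(1)        # allocated on the current (default) GPU
    with torch.cuda.device(1):           # switch to the second GPU
        y = torch.cuda.FloatTensor(1)    # allocated on GPU 1
        y.copy_(x)                       # copy_ is allowed to cross devices
        # z = x + y                      # this would raise an error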
@ -60,4 +62,22 @@ Just pass an additional ``async=True`` argument to a :meth:`~torch.Tensor.cuda`
call. This can be used to overlap data transfers with computation.
You can make the :class:`~torch.utils.data.DataLoader` return batches placed in
pinned memory by passing ``pinned=True`` to its constructor.
pinned memory by passing ``pin_memory=True`` to its constructor.
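A minimal sketch of this pattern (the ``dataset``, ``model``, and loop body are
placeholders; ``async=True`` is the argument name used by this release)::

    loader = torch.utils.data.DataLoader(dataset, batch_size=64,
                                         pin_memory=True)
    for input, target in loader:
        # copies out of pinned memory can overlap with computation
        input = input.cuda(async=True)
        target = target.cuda(async=True)
        output = model(input)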
.. _cuda-nn-dataparallel-instead:
Use nn.DataParallel instead of multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Most use cases involving batched input and multiple GPUs should default to using
:class:`~torch.nn.DataParallel` to utilize more than one GPU. Even with the GIL,
a single python process can saturate multiple GPUs.
As of version 0.1.9, large numbers of GPUs (8+) might not be fully utilized.
However, this is a known issue that is under active development. As always,
test your use case.
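For example, a minimal sketch (``MyModel`` and ``input`` are placeholders)::

    import torch.nn as nn

    model = nn.DataParallel(MyModel().cuda())   # replicates the module on all visible GPUs
    output = model(input)                       # the batch is scattered along dimension 0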
There are significant caveats to using CUDA models with
:mod:`~torch.multiprocessing`; unless care is taken to meet the data handling
requirements exactly, it is likely that your program will have incorrect or
undefined behavior.

View File

@ -13,31 +13,28 @@ Extending :mod:`torch.autograd`
Adding operations to :mod:`~torch.autograd` requires implementing a new
:class:`Function` subclass for each operation. Recall that :class:`Function` s
are what :mod:`~torch.autograd` uses to compute the results and gradients, and
encode the operation history. Every new function requires you to implement 3
encode the operation history. Every new function requires you to implement 2
methods:
- ``__init__`` (*optional*) - if your operation is parametrized by/uses
objects different than :class:`Variable` s, you should pass them as arguments
to ``__init__``. For example, ``AddConstant`` function takes a scalar to add,
while ``Transpose`` requires specifying which two dimensions to swap. If your
function doesn't require any additional parameters, you can skip it.
- :meth:`~Function.forward` - the code that performs the operation. It can take
as many arguments as you want, with some of them being
optional, if you specify the default values. Keep in mind that only
:class:`Variable` s will be passed in here. You can return either a single
:class:`Variable` output, or a :class:`tuple` of :class:`Variable` s if there
are multiple. Also, please refer to the docs of :class:`Function` to find
descriptions of useful methods that can be called only from
:meth:`~Function.forward`.
as many arguments as you want, with some of them being optional, if you
specify the default values. All kinds of Python objects are accepted here.
:class:`Variable` arguments will be converted to :class:`Tensor` s before the
call, and their use will be registered in the graph. Note that this logic won't
traverse lists/dicts/any other data structures and will only consider Variables
that are direct arguments to the call. You can return either a single
:class:`Tensor` output, or a :class:`tuple` of :class:`Tensor` s if there are
multiple outputs. Also, please refer to the docs of :class:`Function` to find
descriptions of useful methods that can be called only from :meth:`~Function.forward`.
- :meth:`~Function.backward` - gradient formula. It will be given
as many arguments as there were outputs, with each of them representing
gradient w.r.t. that output. It should return as many :class:`Tensor` s as
there were inputs, with each of them containing the gradient w.r.t.
corresponding input. If you inputs didn't require gradient (see
:attr:`~Variable.needs_input_grad`), or it was non-differentiable, you
can return :class:`None`. Also, if you have optional arguments to
:meth:`~Variable.forward` you can return more gradients than there were
inputs, as long as they're all :any:`python:None`.
as many :class:`Variable` arguments as there were outputs, with each of them
representing gradient w.r.t. that output. It should return as many
:class:`Variable` s as there were inputs, with each of them containing the
gradient w.r.t. its corresponding input. If your inputs didn't require
gradient (see :attr:`~Variable.needs_input_grad`), or were non-:class:`Variable`
objects, you can return :class:`python:None`. Also, if you have optional
arguments to :meth:`~Variable.forward` you can return more gradients than there
were inputs, as long as they're all :any:`python:None`.
Below you can find code for a ``Linear`` function from :mod:`torch.nn`, with
additional comments::
@ -45,22 +42,25 @@ additional comments::
# Inherit from Function
class Linear(Function):
# Note that both forward and backward are @staticmethods
@staticmethod
# bias is an optional argument
def forward(self, input, weight, bias=None):
self.save_for_backward(input, weight, bias)
def forward(ctx, input, weight, bias=None):
ctx.save_for_backward(input, weight, bias)
output = input.mm(weight.t())
if bias is not None:
output += bias.unsqueeze(0).expand_as(output)
return output
# This function has only a single output, so it gets only one gradient
def backward(self, grad_output):
@staticmethod
def backward(ctx, grad_output):
# This is a pattern that is very convenient - at the top of backward
# unpack saved_tensors and initialize all gradients w.r.t. inputs to
# None. Thanks to the fact that additional trailing Nones are
# ignored, the return statement is simple even when the function has
# optional inputs.
input, weight, bias = self.saved_tensors
input, weight, bias = ctx.saved_variables
grad_input = grad_weight = grad_bias = None
# These needs_input_grad checks are optional and there only to
@ -76,15 +76,40 @@ additional comments::
return grad_input, grad_weight, grad_bias
Now, to make it easier to use these custom ops, we recommend wrapping them in
small helper functions::
Now, to make it easier to use these custom ops, we recommend aliasing their
``apply`` method::
def linear(input, weight, bias=None):
# First braces create a Function object. Any arguments given here
# will be passed to __init__. Second braces will invoke the __call__
# operator, that will then use forward() to compute the result and
# return it.
return Linear()(input, weight, bias)
linear = Linear.apply
Here, we give an additional example of a function that is parametrized by
non-Variable arguments::
class MulConstant(Function):
@staticmethod
def forward(ctx, tensor, constant):
# ctx is a context object that can be used to stash information
# for backward computation
ctx.constant = constant
return tensor * constant
@staticmethod
def backward(ctx, grad_output):
# We return as many input gradients as there were arguments.
# Gradients of non-Tensor arguments to forward must be None.
return grad_output * ctx.constant, None
You probably want to check if the backward method you implemented actually
computes the derivatives of your function. It is possible by comparing with
numerical approximations using small finite differences::
from torch.autograd import gradcheck
# gradcheck takes a tuple of tensors as input, checks if your gradient
# evaluated with these tensors is close enough to numerical
# approximations, and returns True if they all verify this condition.
input = (Variable(torch.randn(20,20).double(), requires_grad=True), Variable(torch.randn(30,20).double(), requires_grad=True),)
test = gradcheck(Linear.apply, input, eps=1e-6, atol=1e-4)
print(test)
Extending :mod:`torch.nn`
-------------------------
@ -132,7 +157,7 @@ This is how a ``Linear`` module can be implemented::
# nn.Parameters can never be volatile and, different than Variables,
# they require gradients by default.
self.weight = nn.Parameter(torch.Tensor(input_features, output_features))
if bias is not None:
if bias:
self.bias = nn.Parameter(torch.Tensor(output_features))
else:
# You should always register all possible parameters, but the

View File

@ -33,6 +33,8 @@ by the CUDA runtime.
kinds of data should be done with care. Note that this restriction doesn't
apply to shared CPU memory.
See also: :ref:`cuda-nn-dataparallel-instead`
Best practices and tips
-----------------------
@ -100,11 +102,6 @@ example below as well::
from model import MyModel
def train(model):
# This for loop will break sharing of gradient buffers. It's not
# necessary but it reduces the contention, and has a small memory cost
# (equal to the total size of parameters).
for param in model.parameters():
param.grad.data = param.grad.data.clone()
# Construct data_loader, optimizer, etc.
for data, labels in data_loader:
optimizer.zero_grad()

View File

@ -0,0 +1,34 @@
Serialization semantics
=======================
Best practices
--------------
.. _recommend-saving-models:
Recommended approach for saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are two main approaches for serializing and restoring a model.
The first (recommended) saves and loads only the model parameters::
torch.save(the_model.state_dict(), PATH)
Then later::
the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))
The second saves and loads the entire model::
torch.save(the_model, PATH)
Then later::
the_model = torch.load(PATH)
However in this case, the serialized data is bound to the specific classes
and the exact directory structure used, so it can break in various ways when
used in other projects, or after some serious refactors.
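As a side note, if parameters were saved on a GPU and need to be restored on a
CPU-only machine, ``torch.load`` accepts a ``map_location`` argument to remap
storages; a minimal sketch, reusing the names from above::

    state_dict = torch.load(PATH, map_location=lambda storage, loc: storage)
    the_model.load_state_dict(state_dict)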

View File

@ -114,3 +114,21 @@ Algorithms
:members:
.. autoclass:: SGD
:members:
How to adjust Learning Rate
---------------------------
:mod:`torch.optim.lr_scheduler` provides several methods to adjust the learning
rate based on the number of epochs. :class:`torch.optim.lr_scheduler.ReduceLROnPlateau`
allows dynamic learning rate reduction based on some validation measurements.
.. autoclass:: torch.optim.lr_scheduler.LambdaLR
:members:
.. autoclass:: torch.optim.lr_scheduler.StepLR
:members:
.. autoclass:: torch.optim.lr_scheduler.MultiStepLR
:members:
.. autoclass:: torch.optim.lr_scheduler.ExponentialLR
:members:
.. autoclass:: torch.optim.lr_scheduler.ReduceLROnPlateau
:members:
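A minimal usage sketch (the optimizer, model, and training loop here are
placeholders)::

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    for epoch in range(100):
        scheduler.step()   # decay the learning rate by gamma every step_size epochs
        train(...)
        validate(...)

:class:`ReduceLROnPlateau` is instead stepped with a validation measurement,
e.g. ``scheduler.step(val_loss)``.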

114
docs/source/sparse.rst Normal file
View File

@ -0,0 +1,114 @@
.. currentmodule:: torch.sparse
torch.sparse
============
.. warning::
This API is currently experimental and may change in the near future.
Torch supports sparse tensors in COO(rdinate) format, which can
efficiently store and process tensors for which the majority of elements
are zeros.
A sparse tensor is represented as a pair of dense tensors: a tensor
of values and a tensor of indices. A sparse tensor can be constructed
by providing these two tensors, as well as the size of the sparse tensor
(which cannot be inferred from these tensors!)
>>> i = torch.LongTensor([[0, 1], [2, 0]])
>>> v = torch.FloatTensor([3, 4])
>>> torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()
0 0 3
4 0 0
[torch.FloatTensor of size 2x3]
You can also construct hybrid sparse tensors, where only the first n
dimensions are sparse, and the rest of the dimensions are dense.
>>> i = torch.LongTensor([[2, 4]])
>>> v = torch.FloatTensor([[1, 3], [5, 7]])
>>> torch.sparse.FloatTensor(i, v).to_dense()
0 0
0 0
1 3
0 0
5 7
[torch.FloatTensor of size 5x2]
An empty sparse tensor can be constructed by specifying its size:
>>> torch.sparse.FloatTensor(2, 3)
SparseFloatTensor of size 2x3 with indices:
[torch.LongTensor with no dimension]
and values:
[torch.FloatTensor with no dimension]
.. note::
Our sparse tensor format permits *uncoalesced* sparse tensors, where
there may be duplicate coordinates in the indices; in this case,
the interpretation is that the value at that index is the sum of all
duplicate value entries. Uncoalesced tensors permit us to implement
certain operators more efficiently.
For the most part, you shouldn't have to care whether or not a
sparse tensor is coalesced, as most operations will work
identically given a coalesced or uncoalesced sparse tensor.
However, there are two cases in which you may need to care.
First, if you repeatedly perform an operation that can produce
duplicate entries (e.g., :func:`torch.sparse.FloatTensor.add`), you
should occasionally coalesce your sparse tensors to prevent
them from growing too large.
Second, some operators will produce different values depending on
whether or not they are coalesced (e.g.,
:func:`torch.sparse.FloatTensor._values` and
:func:`torch.sparse.FloatTensor._indices`, as well as
:func:`torch.Tensor._sparse_mask`). These operators are
prefixed by an underscore to indicate that they reveal internal
implementation details and should be used with care, since code
that works with coalesced sparse tensors may not work with
uncoalesced sparse tensors; generally speaking, it is safest
to explicitly coalesce before working with these operators.
For example, suppose that we wanted to implement an operator
by operating directly on :func:`torch.sparse.FloatTensor._values`.
Multiplication by a scalar can be implemented in the obvious way,
as multiplication distributes over addition; however, square root
cannot be implemented directly, since ``sqrt(a + b) != sqrt(a) +
sqrt(b)`` (which is what would be computed if you were given an
uncoalesced tensor.)
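For instance, a small sketch of how duplicate entries behave (the shapes and
values here are only illustrative)::

    >>> i = torch.LongTensor([[0, 0], [2, 2]])   # the coordinate (0, 2) appears twice
    >>> v = torch.FloatTensor([3, 4])
    >>> s = torch.sparse.FloatTensor(i, v, torch.Size([2, 3]))
    >>> s.to_dense()[0, 2]      # duplicate values are summed on densification
    7.0
    >>> s.coalesce()._nnz()     # coalescing merges the duplicate entries
    1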
.. class:: FloatTensor()
.. method:: add
.. method:: add_
.. method:: clone
.. method:: dim
.. method:: div
.. method:: div_
.. method:: get_device
.. method:: hspmm
.. method:: mm
.. method:: mul
.. method:: mul_
.. method:: resizeAs_
.. method:: size
.. method:: spadd
.. method:: spmm
.. method:: sspaddmm
.. method:: sspmm
.. method:: sub
.. method:: sub_
.. method:: t_
.. method:: toDense
.. method:: transpose
.. method:: transpose_
.. method:: zero_
.. method:: coalesce
.. method:: is_coalesced
.. method:: _indices
.. method:: _values
.. method:: _nnz

View File

@ -13,7 +13,7 @@ Data type CPU tensor GPU tensor
======================== =========================== ================================
32-bit floating point :class:`torch.FloatTensor` :class:`torch.cuda.FloatTensor`
64-bit floating point :class:`torch.DoubleTensor` :class:`torch.cuda.DoubleTensor`
16-bit floating point N/A :class:`torch.cuda.HalfTensor`
16-bit floating point :class:`torch.HalfTensor` :class:`torch.cuda.HalfTensor`
8-bit integer (unsigned) :class:`torch.ByteTensor` :class:`torch.cuda.ByteTensor`
8-bit integer (signed) :class:`torch.CharTensor` :class:`torch.cuda.CharTensor`
16-bit integer (signed) :class:`torch.ShortTensor` :class:`torch.cuda.ShortTensor`
@ -196,9 +196,10 @@ view of a storage and defines numeric operations on it.
.. automethod:: lt
.. automethod:: lt_
.. automethod:: map_
.. automethod:: masked_copy_
.. automethod:: masked_scatter_
.. automethod:: masked_fill_
.. automethod:: masked_select
.. automethod:: matmul
.. automethod:: max
.. automethod:: mean
.. automethod:: median

View File

@ -8,6 +8,7 @@ Tensors
.. autofunction:: is_storage
.. autofunction:: set_default_tensor_type
.. autofunction:: numel
.. autofunction:: set_printoptions
Creation Ops
@ -20,6 +21,7 @@ Creation Ops
.. autofunction:: rand
.. autofunction:: randn
.. autofunction:: randperm
.. autofunction:: arange
.. autofunction:: range
.. autofunction:: zeros
@ -38,6 +40,7 @@ Indexing, Slicing, Joining, Mutating Ops
.. autofunction:: t
.. autofunction:: transpose
.. autofunction:: unbind
.. autofunction:: unsqueeze
Random sampling
@ -158,6 +161,8 @@ BLAS and LAPACK Operations
.. autofunction:: addr
.. autofunction:: baddbmm
.. autofunction:: bmm
.. autofunction:: btrifact
.. autofunction:: btrisolve
.. autofunction:: dot
.. autofunction:: eig
.. autofunction:: gels
@ -165,6 +170,7 @@ BLAS and LAPACK Operations
.. autofunction:: ger
.. autofunction:: gesv
.. autofunction:: inverse
.. autofunction:: matmul
.. autofunction:: mm
.. autofunction:: mv
.. autofunction:: orgqr

View File

@ -1,109 +1,112 @@
torchvision.datasets
====================
The following dataset loaders are available:
All datasets are subclasses of :class:`torch.utils.data.Dataset`,
i.e., they have ``__getitem__`` and ``__len__`` methods implemented.
Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using ``torch.multiprocessing`` workers.
For example: ::
imagenet_data = torchvision.datasets.ImageFolder('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
batch_size=4,
shuffle=True,
num_workers=args.nThreads)
- `COCO (Captioning and Detection)`_
- `LSUN Classification`_
- `ImageFolder`_
- `Imagenet-12`_
- `CIFAR10 and CIFAR100`_
The following datasets are available:
Datasets have the API:
.. contents:: Datasets
:local:
- ``__getitem__``
- ``__len__``
They all subclass from ``torch.utils.data.Dataset``
Hence, they can all be multi-threaded (python multiprocessing) using
standard torch.utils.data.DataLoader.
All the datasets have a similar API. They all have two common arguments:
``transform`` and ``target_transform``, to transform the input and target respectively.
For example:
``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``
.. currentmodule:: torchvision.datasets
In the constructor, each dataset has a slightly different API as needed,
but they all take the keyword args:
- ``transform`` - a function that takes in an image and returns a
transformed version
- common stuff like ``ToTensor``, ``RandomCrop``, etc. These can be
composed together with ``transforms.Compose`` (see transforms section
below)
- ``target_transform`` - a function that takes in the target and
transforms it. For example, take in the caption string and return a
tensor of word indices.
MNIST
~~~~~
.. autoclass:: MNIST
COCO
~~~~
This requires the `COCO API to be installed`_
.. note ::
These require the `COCO API to be installed`_
Captions:
.. _COCO API to be installed: https://github.com/pdollar/coco/tree/master/PythonAPI
Captions
^^^^^^^^
.. autoclass:: CocoCaptions
:members: __getitem__
:special-members:
Detection
^^^^^^^^^
``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
Example:
.. code:: python
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
annFile = 'json annotation file',
transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
::
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
Detection:
^^^^^^^^^^
``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
.. autoclass:: CocoDetection
:members: __getitem__
:special-members:
LSUN
~~~~
``dset.LSUN(db_path, classes='train', [transform, target_transform])``
.. autoclass:: LSUN
:members: __getitem__
:special-members:
- db\_path = root directory for the database files
- classes =
- train - all categories, training set
- val - all categories, validation set
- test - all categories, test set
- [bedroom\_train, church\_train, …] : a list of categories to load
ImageFolder
~~~~~~~~~~~
.. autoclass:: ImageFolder
:members: __getitem__
:special-members:
Imagenet-12
~~~~~~~~~~~
This should simply be implemented with an ``ImageFolder`` dataset.
The data is preprocessed `as described
here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__
`Here is an
example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62>`__.
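A minimal sketch (the path and the transform pipeline are placeholders)::

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    imagenet = dset.ImageFolder(
        root='path/to/imagenet/train',
        transform=transforms.Compose([
            transforms.RandomSizedCrop(224),
            transforms.ToTensor(),
        ]))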
CIFAR
~~~~~
``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``
.. autoclass:: CIFAR10
:members: __getitem__
:special-members:
``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``
STL10
~~~~~
- ``root`` : root directory of dataset where there is folder
``cifar-10-batches-py``
- ``train`` : ``True`` = Training set, ``False`` = Test set
- ``download`` : ``True`` = downloads the dataset from the internet and
puts it in root directory. If dataset already downloaded, do
.. _COCO (Captioning and Detection): #coco
.. _LSUN Classification: #lsun
.. _ImageFolder: #imagefolder
.. _Imagenet-12: #imagenet-12
.. _CIFAR10 and CIFAR100: #cifar
.. _COCO API to be installed: https://github.com/pdollar/coco/tree/master/PythonAPI
.. autoclass:: STL10
:members: __getitem__
:special-members:
SVHN
~~~~~
.. autoclass:: SVHN
:members: __getitem__
:special-members:
PhotoTour
~~~~~~~~~
.. autoclass:: PhotoTour
:members: __getitem__
:special-members:

View File

@ -1,11 +1,12 @@
torchvision.models
===================
.. currentmodule:: torchvision.models
.. currentmodule:: torchvision.models
.. automodule:: torchvision.models
:members: alexnet, resnet18, resnet34, resnet50, resnet101, resnet152,
vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19,
vgg19_bn
vgg19_bn, inception_v3, squeezenet1_0, squeezenet1_1, densenet121,
densenet169, densenet201, densenet161
:undoc-members:

View File

@ -3,3 +3,6 @@ torchvision
The :mod:`torchvision` package consists of popular datasets, model
architectures, and common image transformations for computer vision.
.. automodule:: torchvision
:members:

View File

@ -3,6 +3,8 @@ torchvision.transforms
.. currentmodule:: torchvision.transforms
Transforms are common image transforms. They can be chained together using :class:`Compose`
.. autoclass:: Compose
Transforms on PIL.Image
@ -24,16 +26,22 @@ Transforms on torch.\*Tensor
----------------------------
.. autoclass:: Normalize
:members: __call__
:special-members:
Conversion Transforms
---------------------
.. autoclass:: ToTensor
:members: __call__
:special-members:
.. autoclass:: ToPILImage
:members: __call__
:special-members:
Generic Transofrms
Generic Transforms
------------------
.. autoclass:: Lambda

119
setup.py
View File

@ -15,9 +15,23 @@ import os
from tools.setup_helpers.env import check_env_flag
from tools.setup_helpers.cuda import WITH_CUDA, CUDA_HOME
from tools.setup_helpers.cudnn import WITH_CUDNN, CUDNN_LIB_DIR, CUDNN_INCLUDE_DIR
from tools.setup_helpers.split_types import split_types
DEBUG = check_env_flag('DEBUG')
WITH_DISTRIBUTED = check_env_flag('WITH_DISTRIBUTED')
WITH_DISTRIBUTED = not check_env_flag('NO_DISTRIBUTED')
WITH_DISTRIBUTED_MW = WITH_DISTRIBUTED and check_env_flag('WITH_DISTRIBUTED_MW')
WITH_NCCL = WITH_CUDA and platform.system() != 'Darwin'
SYSTEM_NCCL = False
################################################################################
# Workaround setuptools -Wstrict-prototypes warnings
# I lifted this code from https://stackoverflow.com/a/29634231/23845
################################################################################
import distutils.sysconfig
cfg_vars = distutils.sysconfig.get_config_vars()
for key, value in cfg_vars.items():
if type(value) == str:
cfg_vars[key] = value.replace("-Wstrict-prototypes", "")
################################################################################
# Monkey-patch setuptools to compile in parallel
@ -75,6 +89,8 @@ class build_deps(Command):
build_all_cmd = ['bash', 'torch/lib/build_all.sh']
if WITH_CUDA:
build_all_cmd += ['--with-cuda']
if WITH_NCCL and not SYSTEM_NCCL:
build_all_cmd += ['--with-nccl']
if WITH_DISTRIBUTED:
build_all_cmd += ['--with-distributed']
if subprocess.call(build_all_cmd) != 0:
@ -134,6 +150,16 @@ class build_ext(setuptools.command.build_ext.build_ext):
print('-- Detected CUDA at ' + CUDA_HOME)
else:
print('-- Not using CUDA')
if WITH_NCCL and SYSTEM_NCCL:
print('-- Using system provided NCCL library')
elif WITH_NCCL:
print('-- Building NCCL library')
else:
print('-- Not using NCCL')
if WITH_DISTRIBUTED:
print('-- Building with distributed package ')
else:
print('-- Building without distributed package')
# cwrap depends on pyyaml, so we can't import it earlier
from tools.cwrap import cwrap
@ -144,10 +170,15 @@ class build_ext(setuptools.command.build_ext.build_ext):
from tools.cwrap.plugins.KwargsPlugin import KwargsPlugin
from tools.cwrap.plugins.NullableArguments import NullableArguments
from tools.cwrap.plugins.CuDNNPlugin import CuDNNPlugin
from tools.cwrap.plugins.WrapDim import WrapDim
from tools.cwrap.plugins.AssertNDim import AssertNDim
from tools.cwrap.plugins.Broadcast import Broadcast
from tools.cwrap.plugins.ProcessorSpecificPlugin import ProcessorSpecificPlugin
thp_plugin = THPPlugin()
cwrap('torch/csrc/generic/TensorMethods.cwrap', plugins=[
BoolOption(), thp_plugin, AutoGPU(condition='IS_CUDA'),
ArgcountSortPlugin(), KwargsPlugin()
ProcessorSpecificPlugin(), BoolOption(), thp_plugin,
AutoGPU(condition='IS_CUDA'), ArgcountSortPlugin(), KwargsPlugin(),
AssertNDim(), WrapDim(), Broadcast()
])
cwrap('torch/csrc/cudnn/cuDNN.cwrap', plugins=[
CuDNNPlugin(), NullableArguments()
@ -192,12 +223,12 @@ class clean(distutils.command.clean.clean):
################################################################################
include_dirs = []
library_dirs = []
extra_link_args = []
extra_compile_args = ['-std=c++11', '-Wno-write-strings']
if os.getenv('PYTORCH_BINARY_BUILD') and platform.system() == 'Linux':
print('PYTORCH_BINARY_BUILD found. Static linking libstdc++ on Linux')
extra_compile_args += ['-static-libstdc++']
extra_link_args += ['-static-libstdc++']
extra_compile_args = ['-std=c++11', '-Wno-write-strings',
# Python 2.6 requires -fno-strict-aliasing, see
# http://legacy.python.org/dev/peps/pep-3123/
'-fno-strict-aliasing']
cwd = os.path.dirname(os.path.abspath(__file__))
lib_path = os.path.join(cwd, "torch", "lib")
@ -210,9 +241,10 @@ include_dirs += [
tmp_install_path + "/include/TH",
tmp_install_path + "/include/THPP",
tmp_install_path + "/include/THNN",
tmp_install_path + "/include/ATen",
]
extra_link_args.append('-L' + lib_path)
library_dirs.append(lib_path)
# we specify exact lib names to avoid conflict with lua-torch installs
TH_LIB = os.path.join(lib_path, 'libTH.so.1')
@ -222,7 +254,11 @@ THCS_LIB = os.path.join(lib_path, 'libTHCS.so.1')
THNN_LIB = os.path.join(lib_path, 'libTHNN.so.1')
THCUNN_LIB = os.path.join(lib_path, 'libTHCUNN.so.1')
THPP_LIB = os.path.join(lib_path, 'libTHPP.so.1')
THD_LIB = os.path.join(lib_path, 'libTHD.so.1')
ATEN_LIB = os.path.join(lib_path, 'libATen.so.1')
GLOO_LIB = os.path.join(lib_path, 'libgloo.a')
GLOO_CUDA_LIB = os.path.join(lib_path, 'libgloo_cuda.a')
THD_LIB = os.path.join(lib_path, 'libTHD.a')
NCCL_LIB = os.path.join(lib_path, 'libnccl.so.1')
if platform.system() == 'Darwin':
TH_LIB = os.path.join(lib_path, 'libTH.1.dylib')
THS_LIB = os.path.join(lib_path, 'libTHS.1.dylib')
@ -231,38 +267,50 @@ if platform.system() == 'Darwin':
THNN_LIB = os.path.join(lib_path, 'libTHNN.1.dylib')
THCUNN_LIB = os.path.join(lib_path, 'libTHCUNN.1.dylib')
THPP_LIB = os.path.join(lib_path, 'libTHPP.1.dylib')
THD_LIB = os.path.join(lib_path, 'libTHD.1.dylib')
ATEN_LIB = os.path.join(lib_path, 'libATen.1.dylib')
NCCL_LIB = os.path.join(lib_path, 'libnccl.1.dylib')
if WITH_NCCL and subprocess.call('ldconfig -p | grep libnccl >/dev/null', shell=True) == 0:
SYSTEM_NCCL = True
main_compile_args = ['-D_THP_CORE']
main_libraries = ['shm']
main_link_args = [TH_LIB, THS_LIB, THPP_LIB, THNN_LIB]
main_link_args = [TH_LIB, THS_LIB, THPP_LIB, THNN_LIB, ATEN_LIB]
main_sources = [
"torch/csrc/PtrWrapper.cpp",
"torch/csrc/Module.cpp",
"torch/csrc/Generator.cpp",
"torch/csrc/Size.cpp",
"torch/csrc/Exceptions.cpp",
"torch/csrc/Tensor.cpp",
"torch/csrc/Storage.cpp",
"torch/csrc/DynamicTypes.cpp",
"torch/csrc/byte_order.cpp",
"torch/csrc/utils.cpp",
"torch/csrc/expand_utils.cpp",
"torch/csrc/utils/object_ptr.cpp",
"torch/csrc/utils/tuple_parser.cpp",
"torch/csrc/allocators.cpp",
"torch/csrc/serialization.cpp",
"torch/csrc/autograd/init.cpp",
"torch/csrc/autograd/engine.cpp",
"torch/csrc/autograd/function.cpp",
"torch/csrc/autograd/variable.cpp",
"torch/csrc/autograd/grad_buffer.cpp",
"torch/csrc/autograd/input_buffer.cpp",
"torch/csrc/autograd/python_function.cpp",
"torch/csrc/autograd/python_cpp_function.cpp",
"torch/csrc/autograd/python_variable.cpp",
"torch/csrc/autograd/python_engine.cpp",
"torch/csrc/autograd/python_hook.cpp",
"torch/csrc/autograd/functions/batch_normalization.cpp",
"torch/csrc/autograd/functions/convolution.cpp",
"torch/csrc/autograd/functions/basic_ops.cpp",
"torch/csrc/autograd/functions/tensor.cpp",
"torch/csrc/autograd/functions/accumulate_grad.cpp",
"torch/csrc/autograd/functions/utils.cpp",
"torch/csrc/autograd/functions/init.cpp",
"torch/csrc/nn/THNN_generic.cpp",
]
main_sources += split_types("torch/csrc/Tensor.cpp")
try:
import numpy as np
@ -283,8 +331,11 @@ if WITH_DISTRIBUTED:
"torch/csrc/distributed/Tensor.cpp",
"torch/csrc/distributed/Storage.cpp",
]
extra_compile_args += ['-DWITH_DISTRIBUTED_MW']
include_dirs += [tmp_install_path + "/include/THD"]
main_link_args += [THD_LIB]
if platform.system() == 'Linux':
main_link_args += [GLOO_LIB]
if WITH_CUDA:
cuda_lib_dirs = ['lib64', 'lib']
@ -295,30 +346,42 @@ if WITH_CUDA:
break
include_dirs.append(cuda_include_path)
include_dirs.append(tmp_install_path + "/include/THCUNN")
extra_link_args.append('-L' + cuda_lib_path)
library_dirs.append(cuda_lib_path)
extra_link_args.append('-Wl,-rpath,' + cuda_lib_path)
extra_compile_args += ['-DWITH_CUDA']
extra_compile_args += ['-DCUDA_LIB_PATH=' + cuda_lib_path]
main_libraries += ['cudart']
main_libraries += ['cudart', 'nvToolsExt']
main_link_args += [THC_LIB, THCS_LIB, THCUNN_LIB]
if platform.system() == 'Linux':
main_link_args += [GLOO_CUDA_LIB]
main_sources += [
"torch/csrc/cuda/Module.cpp",
"torch/csrc/cuda/Storage.cpp",
"torch/csrc/cuda/Stream.cpp",
"torch/csrc/cuda/Tensor.cpp",
"torch/csrc/cuda/AutoGPU.cpp",
"torch/csrc/cuda/utils.cpp",
"torch/csrc/cuda/expand_utils.cpp",
"torch/csrc/cuda/serialization.cpp",
]
main_sources += split_types("torch/csrc/cuda/Tensor.cpp")
if WITH_NCCL:
if SYSTEM_NCCL:
main_libraries += ['nccl']
else:
main_link_args += [NCCL_LIB]
extra_compile_args += ['-DWITH_NCCL']
if WITH_CUDNN:
main_libraries += ['cudnn']
include_dirs.append(CUDNN_INCLUDE_DIR)
extra_link_args.append('-L' + CUDNN_LIB_DIR)
library_dirs.append(CUDNN_LIB_DIR)
main_sources += [
"torch/csrc/cudnn/BatchNorm.cpp",
"torch/csrc/cudnn/Conv.cpp",
"torch/csrc/cudnn/cuDNN.cpp",
"torch/csrc/cudnn/GridSampler.cpp",
"torch/csrc/cudnn/AffineGridGenerator.cpp",
"torch/csrc/cudnn/Types.cpp",
"torch/csrc/cudnn/Handles.cpp",
]
@ -328,6 +391,18 @@ if DEBUG:
extra_compile_args += ['-O0', '-g']
extra_link_args += ['-O0', '-g']
if os.getenv('PYTORCH_BINARY_BUILD') and platform.system() == 'Linux':
print('PYTORCH_BINARY_BUILD found. Static linking libstdc++ on Linux')
# get path of libstdc++ and link manually.
# for reasons unknown, -static-libstdc++ doesn't fully link some symbols
CXXNAME = os.getenv('CXX', 'g++')
STDCPP_LIB = subprocess.check_output([CXXNAME, '-print-file-name=libstdc++.a'])
STDCPP_LIB = STDCPP_LIB[:-1]
if type(STDCPP_LIB) != str: # python 3
STDCPP_LIB = STDCPP_LIB.decode(sys.stdout.encoding)
main_link_args += [STDCPP_LIB]
version_script = os.path.abspath("tools/pytorch.version")
extra_link_args += ['-Wl,--version-script=' + version_script]
def make_relative_rpath(path):
if platform.system() == 'Darwin':
@ -340,7 +415,7 @@ def make_relative_rpath(path):
################################################################################
extensions = []
packages = find_packages(exclude=('tools.*',))
packages = find_packages(exclude=('tools', 'tools.*',))
C = Extension("torch._C",
libraries=main_libraries,
@ -348,6 +423,7 @@ C = Extension("torch._C",
language='c++',
extra_compile_args=main_compile_args + extra_compile_args,
include_dirs=include_dirs,
library_dirs=library_dirs,
extra_link_args=extra_link_args + main_link_args + [make_relative_rpath('lib')],
)
extensions.append(C)
@ -386,7 +462,7 @@ if WITH_CUDA:
)
extensions.append(THCUNN)
version = '0.1.9'
version = '0.2.0'
if os.getenv('PYTORCH_BUILD_VERSION'):
assert os.getenv('PYTORCH_BUILD_NUMBER') is not None
version = os.getenv('PYTORCH_BUILD_VERSION') \
@ -400,6 +476,7 @@ else:
setup(name="torch", version=version,
description="Tensors and Dynamic neural networks in Python with strong GPU acceleration",
ext_modules=extensions,
cmdclass={
'build': build,
@ -418,5 +495,5 @@ setup(name="torch", version=version,
'lib/*.h',
'lib/include/TH/*.h', 'lib/include/TH/generic/*.h',
'lib/include/THC/*.h', 'lib/include/THC/generic/*.h']},
install_requires=['pyyaml'],
install_requires=['pyyaml', 'numpy'],
)

View File

@ -1,26 +1,42 @@
import sys
import os
import argparse
import unittest
import warnings
import contextlib
from functools import wraps
from itertools import product
from copy import deepcopy
import torch
import torch.cuda
from torch.autograd import Variable, Function
from torch.autograd import Variable
torch.set_default_tensor_type('torch.DoubleTensor')
SEED = 0
SEED_SET = 0
def run_tests():
def parse_set_seed_once():
global SEED
global SEED_SET
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--seed', type=int, default=123)
args, remaining = parser.parse_known_args()
torch.manual_seed(args.seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(args.seed)
if SEED_SET == 0:
torch.manual_seed(args.seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(args.seed)
SEED = args.seed
SEED_SET = 1
remaining = [sys.argv[0]] + remaining
return remaining
def run_tests():
remaining = parse_set_seed_once()
unittest.main(argv=remaining)
@ -30,6 +46,32 @@ try:
except ImportError:
TEST_NUMPY = False
TEST_SCIPY = True
try:
import scipy
except ImportError:
TEST_SCIPY = False
def skipIfNoLapack(fn):
@wraps(fn)
def wrapper(*args, **kwargs):
try:
fn(*args, **kwargs)
except Exception as e:
if 'Lapack library not found' in e.args[0]:
raise unittest.SkipTest('Compiled without Lapack')
raise
return wrapper
def suppress_warnings(fn):
def wrapper(*args, **kwargs):
with warnings.catch_warnings():
warnings.simplefilter("ignore")
fn(*args, **kwargs)
return wrapper
def get_cpu_type(t):
assert t.__module__ == 'torch.cuda'
@ -48,7 +90,7 @@ def to_gpu(obj, type_map={}):
elif torch.is_storage(obj):
return obj.new().resize_(obj.size()).copy_(obj)
elif isinstance(obj, Variable):
assert obj.creator is None
assert obj.is_leaf
t = type_map.get(type(obj.data), get_gpu_type(type(obj.data)))
return Variable(obj.data.clone().type(t), requires_grad=obj.requires_grad)
elif isinstance(obj, list):
@ -89,23 +131,86 @@ def is_iterable(obj):
class TestCase(unittest.TestCase):
precision = 1e-5
def setUp(self):
torch.manual_seed(SEED)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(SEED)
def assertTensorsSlowEqual(self, x, y, prec=None, message=''):
max_err = 0
self.assertEqual(x.size(), y.size())
for index in iter_indices(x):
max_err = max(max_err, abs(x[index] - y[index]))
self.assertLessEqual(max_err, prec, message)
def safeCoalesce(self, t):
tc = t.coalesce()
value_map = {}
for idx, val in zip(t._indices().t(), t._values()):
idx_tup = tuple(idx)
if idx_tup in value_map:
value_map[idx_tup] += val
else:
value_map[idx_tup] = val.clone() if torch.is_tensor(val) else val
new_indices = sorted(list(value_map.keys()))
new_values = [value_map[idx] for idx in new_indices]
if t._values().ndimension() < 2:
new_values = t._values().new(new_values)
else:
new_values = torch.stack(new_values)
new_indices = t._indices().new(new_indices).t()
tg = t.new(new_indices, new_values, t.size())
self.assertEqual(tc._indices(), tg._indices())
self.assertEqual(tc._values(), tg._values())
return tg
def unwrapVariables(self, x, y):
if isinstance(x, Variable) and isinstance(y, Variable):
return x.data, y.data
elif isinstance(x, Variable) or isinstance(y, Variable):
raise AssertionError("cannot compare {} and {}".format(type(x), type(y)))
return x, y
def assertEqual(self, x, y, prec=None, message=''):
if prec is None:
prec = self.precision
if isinstance(x, Variable) and isinstance(y, Variable):
x = x.data
y = y.data
x, y = self.unwrapVariables(x, y)
if torch.is_tensor(x) and torch.is_tensor(y):
max_err = 0
super(TestCase, self).assertEqual(x.size(), y.size())
for index in iter_indices(x):
max_err = max(max_err, abs(x[index] - y[index]))
self.assertLessEqual(max_err, prec, message)
def assertTensorsEqual(a, b):
super(TestCase, self).assertEqual(a.size(), b.size())
if a.numel() > 0:
b = b.type_as(a)
b = b.cuda(device=a.get_device()) if a.is_cuda else b.cpu()
# check that NaNs are in the same locations
nan_mask = a != a
self.assertTrue(torch.equal(nan_mask, b != b))
diff = a - b
diff[nan_mask] = 0
if diff.is_signed():
diff = diff.abs()
max_err = diff.max()
self.assertLessEqual(max_err, prec, message)
self.assertEqual(x.is_sparse, y.is_sparse, message)
if x.is_sparse:
x = self.safeCoalesce(x)
y = self.safeCoalesce(y)
assertTensorsEqual(x._indices(), y._indices())
assertTensorsEqual(x._values(), y._values())
else:
assertTensorsEqual(x, y)
elif type(x) == str and type(y) == str:
super(TestCase, self).assertEqual(x, y)
elif type(x) == set and type(y) == set:
super(TestCase, self).assertEqual(x, y)
elif is_iterable(x) and is_iterable(y):
super(TestCase, self).assertEqual(len(x), len(y))
for x_, y_ in zip(x, y):
self.assertEqual(x_, y_, prec, message)
else:
@ -120,17 +225,22 @@ class TestCase(unittest.TestCase):
if prec is None:
prec = self.precision
if isinstance(x, Variable) and isinstance(y, Variable):
x = x.data
y = y.data
x, y = self.unwrapVariables(x, y)
if torch.is_tensor(x) and torch.is_tensor(y):
max_err = 0
if x.size() != y.size():
super(TestCase, self).assertNotEqual(x.size(), y.size())
for index in iter_indices(x):
max_err = max(max_err, abs(x[index] - y[index]))
self.assertGreaterEqual(max_err, prec, message)
self.assertGreater(x.numel(), 0)
y = y.type_as(x)
y = y.cuda(device=x.get_device()) if x.is_cuda else y.cpu()
nan_mask = x != x
if torch.equal(nan_mask, y != y):
diff = x - y
if diff.is_signed():
diff = diff.abs()
diff[nan_mask] = 0
max_err = diff.max()
self.assertGreaterEqual(max_err, prec, message)
elif type(x) == str and type(y) == str:
super(TestCase, self).assertNotEqual(x, y)
elif is_iterable(x) and is_iterable(y):
@ -149,66 +259,33 @@ class TestCase(unittest.TestCase):
return
raise AssertionError("object not found in iterable")
if sys.version_info < (3, 2):
# assertRaisesRegexp renamed assertRaisesRegex in 3.2
assertRaisesRegex = unittest.TestCase.assertRaisesRegexp
def make_jacobian(input, num_out):
if isinstance(input, Variable) and not input.requires_grad:
return None
if torch.is_tensor(input) or isinstance(input, Variable):
return torch.zeros(input.nelement(), num_out)
def download_file(url, binary=True):
if sys.version_info < (3,):
from urlparse import urlsplit
import urllib2
request = urllib2
error = urllib2
else:
return type(input)(filter(lambda x: x is not None,
(make_jacobian(elem, num_out) for elem in input)))
from urllib.parse import urlsplit
from urllib import request, error
filename = os.path.basename(urlsplit(url)[2])
data_dir = os.path.join(os.path.dirname(__file__), 'data')
path = os.path.join(data_dir, filename)
def iter_tensors(x, only_requiring_grad=False):
if torch.is_tensor(x):
yield x
elif isinstance(x, Variable):
if x.requires_grad or not only_requiring_grad:
yield x.data
else:
for elem in x:
for result in iter_tensors(elem, only_requiring_grad):
yield result
def contiguous(input):
if torch.is_tensor(input):
return input.contiguous()
elif isinstance(input, Variable):
return input.contiguous()
else:
return type(input)(contiguous(e) for e in input)
def get_numerical_jacobian(fn, input, target):
perturbation = 1e-6
# To be able to use .view(-1) input must be contiguous
input = contiguous(input)
output_size = fn(input).numel()
jacobian = make_jacobian(target, output_size)
# It's much easier to iterate over flattened lists of tensors.
# These are reference to the same objects in jacobian, so any changes
# will be reflected in it as well.
x_tensors = [t for t in iter_tensors(target, True)]
j_tensors = [t for t in iter_tensors(jacobian)]
outa = torch.DoubleTensor(output_size)
outb = torch.DoubleTensor(output_size)
# TODO: compare structure
for x_tensor, d_tensor in zip(x_tensors, j_tensors):
flat_tensor = x_tensor.view(-1)
for i in range(flat_tensor.nelement()):
orig = flat_tensor[i]
flat_tensor[i] = orig - perturbation
outa.copy_(fn(input))
flat_tensor[i] = orig + perturbation
outb.copy_(fn(input))
flat_tensor[i] = orig
outb.add_(-1, outa).div_(2 * perturbation)
d_tensor[i] = outb
return jacobian
if os.path.exists(path):
return path
try:
data = request.urlopen(url, timeout=15).read()
with open(path, 'wb' if binary else 'w') as f:
f.write(data)
return path
except error.URLError:
msg = "could not download test file '{}'".format(url)
warnings.warn(msg, RuntimeWarning)
raise unittest.SkipTest(msg)

View File

@ -7,8 +7,8 @@ from itertools import product
import torch
import torch.cuda
from torch.autograd import Variable
from common import TestCase, to_gpu, get_numerical_jacobian, iter_tensors, contiguous, \
freeze_rng_state
from common import TestCase, to_gpu, freeze_rng_state
from torch.autograd.gradcheck import get_numerical_jacobian, iter_tensors, contiguous
import torch.backends.cudnn
# tarfile module tries to obtain a file object name in python 3.3
@ -53,29 +53,31 @@ module_tests = [
dict(
module_name='ReLU',
input_size=(2, 3, 4, 5),
check_inplace=True
check_inplace=True,
),
dict(
module_name='ReLU6',
input_size=(2, 3, 4, 5),
check_inplace=True
check_inplace=True,
),
dict(
module_name='RReLU',
input_size=(1, 2, 2),
test_cuda=False
test_cuda=False,
check_gradgrad=False,
),
dict(
module_name='RReLU',
constructor_args=(0.1, 0.9),
input_size=(4, 4, 5),
desc='with_up_down',
test_cuda=False
test_cuda=False,
check_gradgrad=False,
),
dict(
module_name='Hardtanh',
input_size=(3, 2, 5),
reference_fn=lambda i, _: i.clamp(-1, 1)
reference_fn=lambda i, _: i.clamp(-1, 1),
),
dict(
module_name='Sigmoid',
@ -88,35 +90,35 @@ module_tests = [
dict(
module_name='Softmax',
input_size=(10, 20),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1).expand(10, 20))
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1, True).expand(10, 20)),
),
dict(
module_name='Softmax2d',
input_size=(1, 3, 10, 20),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1).expand_as(i))
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1, False)),
),
dict(
module_name='LogSoftmax',
input_size=(10, 20),
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1).expand(10, 20)).log_()
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1, True).expand(10, 20)).log_(),
),
dict(
module_name='LogSoftmax',
input_size=(1, 3, 10, 20),
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1).expand_as(i)).log_(),
desc='multiparam'
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1, False)).log_(),
desc='multiparam',
),
dict(
module_name='ELU',
constructor_args=(2.,),
input_size=(3, 2, 5),
check_inplace=True
),
# TODO: reference function
dict(
module_name='Hardshrink',
constructor_args=(2.,),
input_size=(4, 3, 2, 4)
input_size=(4, 3, 2, 4),
check_gradgrad=False,
),
dict(
module_name='LeakyReLU',
@ -133,53 +135,89 @@ module_tests = [
dict(
module_name='LogSigmoid',
input_size=(2, 3, 4),
reference_fn=lambda i, _: i.sigmoid().log()
reference_fn=lambda i, _: i.sigmoid().log(),
check_gradgrad=False,
),
dict(
module_name='Softplus',
input_size=(10, 20),
reference_fn=lambda i, _: torch.log(1 + torch.exp(i))
reference_fn=lambda i, _: torch.log(1 + torch.exp(i)),
check_gradgrad=False,
),
dict(
module_name='Softplus',
constructor_args=(2,),
input_size=(10, 20),
reference_fn=lambda i, _: 1. / 2. * torch.log(1 + torch.exp(2 * i)),
desc='beta'
desc='beta',
check_gradgrad=False,
),
dict(
module_name='Softshrink',
input_size=(3, 2, 5)
input_size=(3, 2, 5),
check_gradgrad=False,
),
dict(
module_name='Softshrink',
constructor_args=(1,),
input_size=(3, 2, 5),
desc='lambda'
desc='lambda',
check_gradgrad=False,
),
dict(
module_name='CrossMapLRN2d',
constructor_args=(5, 5e-3, 1e-3, 2),
input_size=(2, 3, 6, 6)
input_size=(2, 3, 6, 6),
check_gradgrad=False,
),
dict(
module_name='PReLU',
input_size=(2, 3, 4, 5)
input_size=(2, 3, 4),
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
desc='1d',
),
dict(
module_name='PReLU',
constructor_args=(3,),
input_size=(2, 3, 4),
desc='1d_multiparam',
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
),
dict(
module_name='PReLU',
input_size=(2, 3, 4, 5),
desc='2d',
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
),
dict(
module_name='PReLU',
constructor_args=(3,),
input_size=(2, 3, 4, 5),
desc='multiparam'
desc='2d_multiparam',
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
),
dict(
module_name='PReLU',
input_size=(2, 3, 4, 5, 6),
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
desc='3d',
),
dict(
module_name='PReLU',
constructor_args=(3,),
input_size=(2, 3, 4, 5, 6),
desc='3d_multiparam',
reference_fn=lambda i, p: torch.clamp(i, min=0) + torch.clamp(i, max=0) * p[0][0],
),
dict(
module_name='Softsign',
input_size=(3, 2, 5),
reference_fn=lambda i, _: i.div(1 + torch.abs(i))
reference_fn=lambda i, _: i.div(1 + torch.abs(i)),
),
dict(
module_name='Softmin',
input_size=(10, 20)
input_size=(10, 20),
check_gradgrad=False,
),
dict(
module_name='Tanhshrink',
@ -187,19 +225,32 @@ module_tests = [
),
]
criterion_tests = [
dict(module_name='L1Loss',
input_size=(2, 3, 4),
target=torch.randn(2, 3, 4),
reference_fn=lambda i, t, _: 1. / i.numel() *
sum((a - b).abs().sum() for a, b in zip(i, t))
sum((a - b).abs().sum() for a, b in zip(i, t)),
),
dict(
module_name='NLLLoss',
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
),
dict(
module_name='NLLLoss',
constructor_args=(None, False),
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='no_size_average'
),
dict(
module_name='NLLLoss',
constructor_args=(None, True, 2),
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='ignore_index'
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10),),
@ -207,113 +258,159 @@ criterion_tests = [
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights',
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10), True, 2),
input=torch.rand(15, 10).add(1e-2).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights_ignore_index'
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10), True, -1),
input=torch.rand(15, 10).add(1e-2).log(),
target=torch.Tensor(15).uniform_().mul(10 + 1).floor().long() - 1,
desc='weights_ignore_index_neg'
),
dict(
module_name='KLDivLoss',
input=torch.rand(10, 10).log(),
target=torch.rand(10, 10)
target=torch.rand(10, 10),
check_gradgrad=False,
),
dict(
module_name='MSELoss',
input=torch.randn(2, 3, 4, 5),
target=torch.randn(2, 3, 4, 5),
reference_fn=lambda i, t, _: (i - t).abs().pow(2).sum() / i.numel()
reference_fn=lambda i, t, _: (i - t).abs().pow(2).sum() / i.numel(),
check_gradgrad=False,
),
dict(
module_name='BCELoss',
input=torch.rand(15, 10).clamp_(1e-2, 1 - 1e-2),
target=torch.randn(15, 10).gt(0).double()
target=torch.randn(15, 10).gt(0).double(),
check_gradgrad=False,
),
dict(
module_name='BCELoss',
constructor_args=(torch.rand(10),),
input=torch.rand(15, 10).clamp_(1e-2, 1 - 1e-2),
target=torch.randn(15, 10).gt(0).double(),
desc='weights'
desc='weights',
check_gradgrad=False,
),
dict(
module_name='CrossEntropyLoss',
input=torch.randn(15, 10),
target=torch.Tensor(15).uniform_().mul(10).floor().long()
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
check_gradgrad=False,
),
dict(
module_name='CrossEntropyLoss',
constructor_args=(torch.rand(10),),
input=torch.randn(15, 10),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights'
desc='weights',
check_gradgrad=False,
),
dict(
module_name='NLLLoss2d',
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(3).floor().long()
target=torch.rand(2, 5, 5).mul(3).floor().long(),
),
dict(
module_name='NLLLoss2d',
constructor_args=(torch.rand(3),),
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(3).floor().long(),
desc='weights',
),
dict(
module_name='NLLLoss2d',
constructor_args=(None, True, 3),
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(4).floor().long(),
desc='ignore_index',
),
dict(
module_name='HingeEmbeddingLoss',
input=torch.rand(10),
target=torch.randn(10).gt(0).double().mul_(2).sub(1)
target=torch.randn(10).gt(0).double().mul_(2).sub(1),
check_gradgrad=False,
),
dict(
module_name='HingeEmbeddingLoss',
constructor_args=(0.5,),
input=torch.rand(10),
target=torch.randn(10).gt(0).double().mul_(2).sub(1),
desc='margin'
desc='margin',
check_gradgrad=False,
),
dict(
module_name='MultiLabelMarginLoss',
input_size=(5, 10),
target=torch.rand(5, 10).mul(10).floor().long()
target=torch.rand(5, 10).mul(10).floor().long(),
check_gradgrad=False,
),
dict(
module_name='MultiLabelSoftMarginLoss',
input_size=(5, 10),
target=torch.rand(5, 10).mul(2).floor()
target=torch.rand(5, 10).mul(2).floor(),
check_gradgrad=False,
),
dict(
module_name='MultiLabelSoftMarginLoss',
constructor_args=(torch.rand(10),),
input_size=(5, 10),
target=torch.rand(5, 10).mul(2).floor(),
desc='weights'
desc='weights',
check_gradgrad=False,
),
dict(
module_name='MultiMarginLoss',
input_size=(5, 10),
target=torch.rand(5).mul(8).floor().long()
target=torch.rand(5).mul(8).floor().long(),
check_gradgrad=False,
),
dict(
module_name='SmoothL1Loss',
input_size=(5, 10),
target=torch.randn(5, 10)
target=torch.randn(5, 10),
check_gradgrad=False,
),
dict(
module_name='SoftMarginLoss',
input_size=(5, 5),
target=torch.randn(5, 5).sign()
target=torch.randn(5, 5).sign(),
check_gradgrad=False,
),
dict(
module_name='CosineEmbeddingLoss',
input=(torch.rand(15, 10), torch.rand(15, 10)),
target=torch.randn(15).sign()
target=torch.randn(15).sign(),
check_gradgrad=False,
),
dict(
module_name='CosineEmbeddingLoss',
constructor_args=(0.7,),
input=(torch.rand(15, 10), torch.rand(15, 10)),
target=torch.randn(15).sign(),
desc='margin'
desc='margin',
check_gradgrad=False,
),
dict(
module_name='MarginRankingLoss',
input=(torch.randn(50).mul(10), torch.randn(50).mul(10)),
target=torch.randn(50).sign()
target=torch.randn(50).sign(),
check_gradgrad=False,
),
dict(
module_name='MarginRankingLoss',
constructor_args=(2,),
input=(torch.randn(50).mul(10), torch.randn(50).mul(10)),
target=torch.randn(50).sign(),
desc='margin'
desc='margin',
check_gradgrad=False,
),
]
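
The NLLLoss entries above exercise the new ignore_index argument. As a hedged illustration, assuming the test harness passes constructor_args positionally as NLLLoss(weight, size_average, ignore_index), targets equal to ignore_index contribute neither loss nor gradient:

import torch
import torch.nn as nn
from torch.autograd import Variable

# Standalone sketch, not part of the test harness above.
loss_fn = nn.NLLLoss(None, True, 2)                    # weight, size_average, ignore_index
log_probs = Variable(torch.rand(4, 10).log(), requires_grad=True)
target = Variable(torch.LongTensor([1, 2, 2, 3]))      # entries equal to 2 are ignored
loss = loss_fn(log_probs, target)
loss.backward()                                        # ignored rows get zero gradient
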
@ -330,16 +427,20 @@ class NNTestCase(TestCase):
def _flatten_tensors(self, x):
if torch.is_tensor(x):
return x.view(-1)
if x.is_sparse:
return x.to_dense().view(-1)
else:
return x.view(-1)
elif isinstance(x, Variable):
return x.data.view(-1)
return self._flatten_tensors(x.data)
else:
return tuple(self._flatten_tensors(a) for a in x)
def _zero_grad_input(self, input):
if isinstance(input, Variable):
if input.requires_grad:
if input.requires_grad and input.grad is not None:
input.grad.data.zero_()
input.grad.detach_()
elif torch.is_tensor(input):
return
else:
@ -400,12 +501,11 @@ class NNTestCase(TestCase):
return out
res = tuple()
# TODO: enable non-contig tests
input = contiguous(input)
if jacobian_input:
res += get_numerical_jacobian(fw, input, input),
res += get_numerical_jacobian(fw, input, input, eps=1e-6),
if jacobian_parameters:
res += torch.cat(list(get_numerical_jacobian(fw, input, p) for p in param), 0),
res += torch.cat(list(get_numerical_jacobian(fw, input, p, eps=1e-6) for p in param), 0),
return res
def check_jacobian(self, module, input, jacobian_input=True):
@ -652,6 +752,7 @@ class CriterionTest(TestBase):
test_case.assertEqual(out, expected_out)
test_case.check_criterion_jacobian(module, input, self.target)
self._do_extra_tests(test_case, module, input, self.target)
def test_cuda(self, test_case):
if not TEST_CUDA or not self.should_test_cuda:
@ -678,3 +779,6 @@ class CriterionTest(TestBase):
test_case.assertEqual(cpu_gradInput, gpu_gradInput, 4e-4)
except NotImplementedError:
pass
def _do_extra_tests(self, test_case, module, input, target):
pass


@ -6,7 +6,7 @@ COVERAGE=0
while [[ "$#" -gt 0 ]]; do
case "$1" in
-p|--python) PYCMD=$2; shift 2 ;;
-c|--coverage) COVERAGE=1; shift 2 ;;
-c|--coverage) COVERAGE=1; shift 1;;
--) shift; break ;;
*) echo "Invalid argument: $1!" ; exit 1 ;;
esac
@ -55,35 +55,39 @@ $PYCMD test_cuda.py $@
echo "Running NCCL tests"
$PYCMD test_nccl.py $@
################################################################################
if [[ "$TEST_DISTRIBUTED" -eq 1 ]]; then
distributed_set_up() {
export TEMP_DIR="$(mktemp -d)"
rm -rf "$TEMP_DIR/"*
mkdir "$TEMP_DIR/barrier"
mkdir "$TEMP_DIR/test_dir"
}
distributed_set_up() {
export TEMP_DIR="$(mktemp -d)"
rm -rf "$TEMP_DIR/"*
mkdir "$TEMP_DIR/barrier"
mkdir "$TEMP_DIR/test_dir"
}
distributed_tear_down() {
rm -rf "$TEMP_DIR"
}
distributed_tear_down() {
rm -rf "$TEMP_DIR"
}
trap distributed_tear_down EXIT SIGHUP SIGINT SIGTERM
trap distributed_tear_down EXIT SIGHUP SIGINT SIGTERM
echo "Running distributed tests for the TCP backend"
distributed_set_up
BACKEND=tcp WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
echo "Running distributed tests for the TCP backend"
distributed_set_up
BACKEND=tcp WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
echo "Running distributed tests for the MPI backend"
distributed_set_up
BACKEND=mpi mpiexec -n 3 $PYCMD ./test_distributed.py
distributed_tear_down
echo "Running distributed tests for the Gloo backend"
distributed_set_up
BACKEND=gloo WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
if [ -x "$(command -v mpiexec)" ]; then
echo "Running distributed tests for the MPI backend"
distributed_set_up
BACKEND=mpi mpiexec -n 3 $PYCMD ./test_distributed.py
distributed_tear_down
else
echo "Skipping MPI backend tests (MPI not found)"
fi
################################################################################
if [ "$1" == "coverage" ];
then
if [[ $COVERAGE -eq 1 ]]; then
coverage combine
coverage html
fi

File diff suppressed because it is too large


@ -7,12 +7,14 @@ import torch
import torch.cuda
import torch.cuda.comm as comm
from test_torch import TestTorch
from common import TestCase, get_gpu_type, to_gpu, freeze_rng_state, run_tests
HAS_CUDA = True
if not torch.cuda.is_available():
print('CUDA not available, skipping tests')
import sys
sys.exit()
TestCase = object # noqa: F811
HAS_CUDA = False
def is_floating(t):
@ -59,6 +61,13 @@ def small_2d_scaled(t, scale=10):
return make_tensor(t, S, S).mul(scale)
def small_2d_oneish(t):
if is_floating(t):
return make_tensor(t, S, S).clamp(min=0.99, max=1.01)
else:
return t(S, S).fill_(1)
def small_3d(t):
return make_tensor(t, S, S, S)
@ -85,23 +94,27 @@ def small_3d_positive(t):
def small_3d_unique(t):
return t(S, S, S).copy_(torch.range(1, S * S * S))
return t(S, S, S).copy_(torch.arange(1, S * S * S + 1).view(S, S, S))
def small_1d_lapack(t):
return t(1, 3).copy_(torch.range(1, 3).view(3))
return t(1, 3).copy_(torch.arange(1, 4).view(3))
def small_2d_lapack(t):
return t(3, 3).copy_(torch.range(1, 9).view(3, 3))
return t(3, 3).copy_(torch.arange(1, 10).view(3, 3))
def small_2d_lapack_skinny(t):
return t(3, 4).copy_(torch.range(1, 12).view(3, 4))
return t(3, 4).copy_(torch.arange(1, 13).view(3, 4))
def small_2d_lapack_fat(t):
return t(4, 3).copy_(torch.range(1, 12).view(4, 3))
return t(4, 3).copy_(torch.arange(1, 13).view(4, 3))
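
The range-to-arange rewrites above (and the similar ones later in this diff) all shift the end point by one because torch.range includes its endpoint while torch.arange is half-open:

import torch

torch.range(1, 3)    # [1, 2, 3]  (inclusive end; deprecated)
torch.arange(1, 4)   # [1, 2, 3]  (exclusive end)
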
def large_2d_lapack(t):
return t(1000, 1000).normal_()
def new_t(*sizes):
@ -146,12 +159,15 @@ tests = [
('fmod', small_3d, lambda t: [small_3d_positive(t)], 'tensor'),
('chunk', medium_2d, lambda t: [4],),
('chunk', medium_2d, lambda t: [4, 1], 'dim'),
('chunk', medium_2d, lambda t: [4, -2], 'neg_dim'),
('clamp', medium_2d_scaled, lambda t: [-1, 5],),
('clone', medium_2d, lambda t: [],),
('contiguous', medium_2d, lambda t: [],),
('cross', new_t(M, 3, M), lambda t: [new_t(M, 3, M)(t)],),
('cumprod', small_3d, lambda t: [1],),
('cumprod', small_3d, lambda t: [-1], 'neg_dim'),
('cumsum', small_3d, lambda t: [1],),
('cumsum', small_3d, lambda t: [-1], 'neg_dim'),
('dim', small_3d, lambda t: [],),
('dist', small_2d, lambda t: [small_2d(t)],),
('dist', small_2d, lambda t: [small_2d(t), 3], '3_norm'),
@ -179,53 +195,75 @@ tests = [
# TODO: positive case
('kthvalue', small_3d_unique, lambda t: [3],),
('kthvalue', small_3d_unique, lambda t: [3, 1], 'dim'),
('kthvalue', small_3d_unique, lambda t: [3, -1], 'neg_dim'),
('lerp', small_3d, lambda t: [small_3d(t), 0.3],),
('max', small_3d_unique, lambda t: [],),
('max', small_3d_unique, lambda t: [1], 'dim'),
('max', small_3d_unique, lambda t: [-1], 'neg_dim'),
('max', medium_2d, lambda t: [medium_2d(t)], 'elementwise'),
('min', small_3d_unique, lambda t: [],),
('min', small_3d_unique, lambda t: [1], 'dim'),
('min', small_3d_unique, lambda t: [-1], 'neg_dim'),
('min', medium_2d, lambda t: [medium_2d(t)], 'elementwise'),
('mean', small_3d, lambda t: [],),
('mean', small_3d, lambda t: [-1], 'neg_dim'),
('mean', small_3d, lambda t: [1], 'dim'),
('mode', small_3d, lambda t: [],),
('mode', small_3d, lambda t: [1], 'dim'),
('mode', small_3d, lambda t: [-1], 'neg_dim'),
('remainder', small_3d, lambda t: [3], 'value'),
('remainder', small_3d, lambda t: [-3], 'negative_value'),
('remainder', small_3d, lambda t: [small_3d_positive(t)], 'tensor'),
('remainder', small_3d, lambda t: [0 - small_3d_positive(t)], 'negative_tensor'),
('std', small_3d, lambda t: [],),
('std', small_3d, lambda t: [1], 'dim'),
('std', small_3d, lambda t: [-1], 'neg_dim'),
('var', small_3d, lambda t: [],),
('var', small_3d, lambda t: [1], 'dim'),
('var', small_3d, lambda t: [-1], 'neg_dim'),
('ndimension', small_3d, lambda t: [],),
('nelement', small_3d, lambda t: [],),
('numel', small_3d, lambda t: [],),
('narrow', small_3d, lambda t: [1, 3, 2],),
('narrow', small_3d, lambda t: [-1, 3, 2], 'neg_dim'),
('nonzero', small_3d, lambda t: [],),
('norm', small_3d, lambda t: [],),
('norm', small_3d, lambda t: [3], '3_norm'),
('norm', small_3d, lambda t: [3, 0], '3_norm_dim'),
('norm', small_3d, lambda t: [3, -2], '3_norm_neg_dim'),
('ones', small_3d, lambda t: [1, 2, 3, 4, 5],),
('permute', new_t(1, 2, 3, 4), lambda t: [2, 1, 3, 0],),
('prod', small_3d, lambda t: [],),
('prod', small_2d_oneish, lambda t: [],),
('prod', small_3d, lambda t: [1], 'dim'),
('prod', small_3d, lambda t: [-1], 'neg_dim'),
('sum', small_2d, lambda t: [],),
('sum', small_3d, lambda t: [1], 'dim'),
('sum', small_3d, lambda t: [-1], 'neg_dim'),
('renorm', small_3d, lambda t: [2, 1, 1], '2_norm'),
('renorm', small_3d, lambda t: [2, -1, 1], '2_norm_neg_dim'),
('renorm', small_3d, lambda t: [1.5, 1, 1], '1_5_norm'),
('repeat', small_2d, lambda t: [2, 2, 2],),
('size', new_t(1, 2, 3, 4), lambda t: [],),
('size', new_t(1, 2, 3, 4), lambda t: [1], 'dim'),
('size', new_t(1, 2, 3, 4), lambda t: [-2], 'neg_dim'),
('sort', small_3d_unique, lambda t: [],),
('sort', small_3d_unique, lambda t: [1], 'dim'),
('sort', small_3d_unique, lambda t: [-1], 'neg_dim'),
('sort', small_3d_unique, lambda t: [1, True], 'dim_descending'),
('sort', small_3d_unique, lambda t: [-1, True], 'neg_dim_descending'),
('split', small_3d, lambda t: [2],),
('split', small_3d, lambda t: [2, 1], 'dim'),
('split', small_3d, lambda t: [2, -3], 'neg_dim'),
('squeeze', new_t(1, 2, 1, 4), lambda t: [],),
('squeeze', new_t(1, 2, 1, 4), lambda t: [2], 'dim'),
('squeeze', new_t(1, 2, 1, 4), lambda t: [-2], 'neg_dim'),
('t', new_t(1, 2), lambda t: [],),
('transpose', new_t(1, 2, 3, 4), lambda t: [1, 2],),
('transpose', new_t(1, 2, 3, 4), lambda t: [-1, -2], 'neg_dim'),
('to_list', small_3d, lambda t: [],),
('topk', small_3d, lambda t: [2, 1, False, True], 'dim_sort'),
('topk', small_3d, lambda t: [2, 1, True, True], 'dim_desc_sort'),
('topk', small_3d_unique, lambda t: [2, 1, False, True], 'dim_sort'),
('topk', small_3d_unique, lambda t: [2, -1, False, True], 'neg_dim_sort'),
('topk', small_3d_unique, lambda t: [2, 1, True, True], 'dim_desc_sort'),
('trace', medium_2d, lambda t: [],),
('tril', medium_2d, lambda t: [],),
('tril', medium_2d, lambda t: [2], 'positive'),
@ -234,6 +272,7 @@ tests = [
('triu', medium_2d, lambda t: [2], 'positive'),
('triu', medium_2d, lambda t: [-2], 'negative'),
('unsqueeze', new_t(2, 3, 4), lambda t: [2],),
('unsqueeze', new_t(2, 3, 4), lambda t: [-2], 'neg_dim'),
('view', small_3d, lambda t: [100, 10],),
('view_as', small_3d, lambda t: [t(100, 10)],),
('zero', small_3d, lambda t: [],),
@ -245,6 +284,8 @@ tests = [
('qr', small_2d_lapack, lambda t: [], 'square', float_types),
('qr', small_2d_lapack_skinny, lambda t: [], 'skinny', float_types),
('qr', small_2d_lapack_fat, lambda t: [], 'fat', float_types),
('qr', large_2d_lapack, lambda t: [], 'big', float_types),
('inverse', new_t(20, 20), lambda t: [], None, float_types),
]
@ -259,6 +300,7 @@ custom_precision = {
'baddbmm': 1e-4,
'rsqrt': 1e-4,
'cumprod': 1e-4,
'qr': 3e-4,
}
simple_pointwise = [
@ -375,7 +417,7 @@ class TestCuda(TestCase):
self.assertEqual(z.get_device(), 0)
self.assertIs(z.cuda(0), z)
def test_serialization(self):
def test_serialization_array_with_storage(self):
x = torch.randn(5, 5).cuda()
y = torch.IntTensor(2, 5).fill_(0).cuda()
q = [x, y, x, y.storage()]
@ -427,6 +469,32 @@ class TestCuda(TestCase):
def test_broadcast_gpu(self):
self._test_broadcast(torch.randn(5, 5))
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_broadcast_coalesced(self):
numel = 5
num_bytes = numel * 8
tensors = [
torch.randn(numel).long().cuda(),
torch.randn(numel).cuda(),
torch.randn(numel).long().cuda(),
torch.randn(numel).long().cuda(),
torch.randn(numel * 2).int().cuda(), # int is 2x shorter
torch.randn(numel).cuda(),
]
b_tensors = [comm.broadcast(t, (0, 1)) for t in tensors]
for (_, bt), t in zip(b_tensors, tensors):
self.assertEqual(bt.get_device(), 1)
self.assertEqual(bt, t)
self.assertIsInstance(bt, type(t))
bc_tensors = comm.broadcast_coalesced(tensors, (0, 1), buffer_size=num_bytes * 5 // 2)
bc_tensors_t = list(zip(*bc_tensors))
self.assertEqual(b_tensors, bc_tensors_t)
for (_, bt), (_, bct) in zip(b_tensors, bc_tensors_t):
self.assertEqual(bt.get_device(), bct.get_device())
self.assertIsInstance(bct, type(bt))
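
The coalesced broadcast being tested batches many small copies into a few large ones. A minimal sketch of the underlying idea, flattening same-typed tensors into one buffer and splitting it back afterwards (illustration only; the real comm.broadcast_coalesced also honours the buffer_size cap used above):

import torch

def flatten(tensors):
    # Concatenate tensors of the same type into one contiguous 1-D buffer.
    return torch.cat([t.contiguous().view(-1) for t in tensors])

def unflatten(flat, tensors):
    # Split the flat buffer back into views shaped like the originals.
    outputs, offset = [], 0
    for t in tensors:
        n = t.numel()
        outputs.append(flat.narrow(0, offset, n).view_as(t))
        offset += n
    return outputs
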
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_reduce_add(self):
x = torch.randn(5, 5)
@ -437,6 +505,32 @@ class TestCuda(TestCase):
self.assertEqual(result.get_device(), 0)
self.assertEqual(result.cpu(), x + y)
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_reduce_add_coalesced(self):
numel = 5
num_bytes = numel * 8
tensors = [
torch.randn(numel).long().cuda(),
torch.randn(numel).cuda(),
torch.randn(numel).long().cuda(),
torch.randn(numel).long().cuda(),
torch.randn(numel * 2).int().cuda(), # int is 2x shorter
torch.randn(numel).cuda(),
]
dup_tensors = [tensors, list(map(lambda t: t.cuda(1), tensors))]
r_tensors = list(map(comm.reduce_add, zip(*dup_tensors)))
for r, t in zip(r_tensors, tensors):
self.assertEqual(r.get_device(), t.get_device())
self.assertEqual(r, t * 2)
self.assertIsInstance(r, type(t))
rc_tensors = comm.reduce_add_coalesced(dup_tensors, buffer_size=num_bytes * 5 // 2)
self.assertEqual(r_tensors, rc_tensors)
for r, rc in zip(r_tensors, rc_tensors):
self.assertEqual(rc.get_device(), r.get_device())
self.assertIsInstance(rc, type(r))
def _test_scatter(self, input, chunk_sizes=None, dim=0):
if torch.cuda.device_count() < 2:
raise unittest.SkipTest("only one GPU detected")
@ -458,6 +552,9 @@ class TestCuda(TestCase):
def test_scatter_cpu_dim(self):
self._test_scatter(torch.randn(4, 4), dim=1)
def test_scatter_cpu_neg_dim(self):
self._test_scatter(torch.randn(4, 4), dim=-2)
def test_scatter_cpu_sizes(self):
self._test_scatter(torch.randn(6, 4), chunk_sizes=(2, 4))
@ -467,6 +564,9 @@ class TestCuda(TestCase):
def test_scatter_gpu_dim(self):
self._test_scatter(torch.randn(4, 4).cuda(), dim=1)
def test_scatter_gpu_neg_dim(self):
self._test_scatter(torch.randn(4, 4).cuda(), dim=-2)
def test_scatter_gpu_sizes(self):
self._test_scatter(torch.randn(6, 4).cuda(), chunk_sizes=(2, 4))
@ -497,11 +597,22 @@ class TestCuda(TestCase):
def test_from_sequence(self):
seq = [list(range(i * 4, i * 4 + 4)) for i in range(5)]
reference = torch.range(0, 19).resize_(5, 4)
reference = torch.arange(0, 20).resize_(5, 4)
for t in types:
cuda_type = get_gpu_type(t)
self.assertEqual(cuda_type(seq), reference)
def test_torch_manual_seed_seeds_cuda_devices(self):
with freeze_rng_state():
x = torch.zeros(4, 4).float().cuda()
torch.manual_seed(2)
self.assertEqual(torch.cuda.initial_seed(), 2)
x.uniform_()
torch.manual_seed(2)
y = x.clone().uniform_()
self.assertEqual(x, y)
self.assertEqual(torch.cuda.initial_seed(), 2)
def test_manual_seed(self):
with freeze_rng_state():
x = torch.zeros(4, 4).float().cuda()
@ -530,7 +641,7 @@ class TestCuda(TestCase):
self.assertIs(type(x_copy), type(x))
self.assertEqual(x_copy.get_device(), x.get_device())
def test_serialization_empty(self):
def test_serialization_array_with_empty(self):
x = [torch.randn(4, 4).cuda(), torch.cuda.FloatTensor()]
with tempfile.NamedTemporaryFile() as f:
torch.save(x, f)
@ -654,6 +765,38 @@ class TestCuda(TestCase):
self.assertTrue(event.query())
self.assertGreater(start_event.elapsed_time(event), 0)
def test_record_stream(self):
cycles_per_ms = get_cycles_per_ms()
t = torch.FloatTensor([1, 2, 3, 4]).pin_memory()
result = torch.cuda.FloatTensor(t.size())
stream = torch.cuda.Stream()
ptr = [None]
# Performs the CPU->GPU copy in a background stream
def perform_copy():
with torch.cuda.stream(stream):
tmp = t.cuda(async=True)
ptr[0] = tmp.data_ptr()
torch.cuda.current_stream().wait_stream(stream)
tmp.record_stream(torch.cuda.current_stream())
torch.cuda._sleep(int(50 * cycles_per_ms)) # delay the copy
result.copy_(tmp)
perform_copy()
with torch.cuda.stream(stream):
tmp2 = torch.cuda.FloatTensor(t.size())
tmp2.zero_()
self.assertNotEqual(tmp2.data_ptr(), ptr[0], 'allocation re-used too soon')
self.assertEqual(result.tolist(), [1, 2, 3, 4])
# Check that the block will be re-used after the main stream finishes
torch.cuda.current_stream().synchronize()
with torch.cuda.stream(stream):
tmp3 = torch.cuda.FloatTensor(t.size())
self.assertEqual(tmp3.data_ptr(), ptr[0], 'allocation not re-used')
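
The pattern exercised by test_record_stream: a tensor allocated on one stream but consumed on another must be recorded on the consumer stream so the caching allocator does not hand its block out again too early. A hedged usage sketch (requires a CUDA device):

import torch

s = torch.cuda.Stream()
with torch.cuda.stream(s):
    buf = torch.cuda.FloatTensor(4).normal_()      # allocated on stream s
torch.cuda.current_stream().wait_stream(s)         # order the current stream after s
buf.record_stream(torch.cuda.current_stream())     # keep the block alive for this stream
out = buf * 2                                      # now safe to use on the current stream
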
def test_caching_pinned_memory(self):
cycles_per_ms = get_cycles_per_ms()
@ -673,40 +816,121 @@ class TestCuda(TestCase):
self.assertNotEqual(t.data_ptr(), ptr, 'allocation re-used too soon')
self.assertEqual(list(gpu_tensor), [1])
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_caching_pinned_memory_multi_gpu(self):
# checks that the events preventing pinned memory from being re-used
# too early are recorded on the correct GPU
cycles_per_ms = get_cycles_per_ms()
for decl in tests:
for t in types:
tensor = t()
gpu_tensor = get_gpu_type(t)()
if len(decl) == 3:
name, constr, arg_constr = decl
desc = ''
elif len(decl) == 4:
name, constr, arg_constr, desc = decl
elif len(decl) == 5:
name, constr, arg_constr, desc, type_subset = decl
if t not in type_subset:
continue
t = torch.FloatTensor([1]).pin_memory()
ptr = t.data_ptr()
gpu_tensor0 = torch.cuda.FloatTensor([0], device=0)
gpu_tensor1 = torch.cuda.FloatTensor([0], device=1)
precision = custom_precision.get(name, TestCuda.precision)
for inplace in (True, False):
if inplace:
name_inner = name + '_'
else:
name_inner = name
if not hasattr(tensor, name_inner):
continue
if not hasattr(gpu_tensor, name_inner):
print("Ignoring {}, because it's not implemented by torch.cuda.{}".format(
name_inner, gpu_tensor.__class__.__name__))
continue
with torch.cuda.device(1):
torch.cuda._sleep(int(50 * cycles_per_ms)) # delay the copy
gpu_tensor1.copy_(t, async=True)
test_name = 'test_' + t.__name__ + '_' + name_inner
if desc:
test_name += '_' + desc
del t
t = torch.FloatTensor([2]).pin_memory()
self.assertNotEqual(t.data_ptr(), ptr, 'allocation re-used too soon')
with torch.cuda.device(0):
gpu_tensor0.copy_(t, async=True)
self.assertEqual(gpu_tensor1[0], 1)
self.assertEqual(gpu_tensor0[0], 2)
@staticmethod
def _select_broadcastable_dims(dims_full=None):
return TestTorch._select_broadcastable_dims(dims_full)
def test_broadcast(self):
TestTorch._test_broadcast(self, lambda t: t.cuda())
def test_broadcast_fallback(self):
TestTorch._test_broadcast_fallback(self, lambda t: t.cuda())
def test_broadcast_fused_matmul(self):
TestTorch._test_broadcast_fused_matmul(self, lambda t: t.cuda())
def test_broadcast_batched_matmul(self):
TestTorch._test_broadcast_batched_matmul(self, lambda t: t.cuda())
def test_advancedindex(self):
TestTorch._test_advancedindex(self, lambda t: t.cuda())
def test_advancedindex_big(self):
TestTorch._test_advancedindex_big(self, lambda t: t.cuda())
def test_btrifact(self):
TestTorch._test_btrifact(self, lambda t: t.cuda())
def test_btrisolve(self):
TestTorch._test_btrisolve(self, lambda t: t.cuda())
def test_tensor_gather(self):
TestTorch._test_gather(self, lambda t: t.cuda(), False)
def test_tensor_scatter(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_', test_bounds=False)
def test_tensor_scatterAdd(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_add_', test_bounds=False)
def test_tensor_scatterFill(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_', True, test_bounds=False)
def test_arange(self):
for t in ['IntTensor', 'LongTensor', 'FloatTensor', 'DoubleTensor']:
a = torch.cuda.__dict__[t]()
torch.arange(0, 10, out=a)
b = torch.__dict__[t]()
torch.arange(0, 10, out=b)
self.assertEqual(a, b.cuda())
def test_nvtx(self):
# Just making sure we can see the symbols
torch.cuda.nvtx.range_push("foo")
torch.cuda.nvtx.mark("bar")
torch.cuda.nvtx.range_pop()
if HAS_CUDA:
for decl in tests:
for t in types:
tensor = t()
gpu_tensor = get_gpu_type(t)()
if len(decl) == 3:
name, constr, arg_constr = decl
desc = ''
elif len(decl) == 4:
name, constr, arg_constr, desc = decl
elif len(decl) == 5:
name, constr, arg_constr, desc, type_subset = decl
if t not in type_subset:
continue
precision = custom_precision.get(name, TestCuda.precision)
for inplace in (True, False):
if inplace:
name_inner = name + '_'
else:
name_inner = name
if not hasattr(tensor, name_inner):
continue
if not hasattr(gpu_tensor, name_inner):
print("Ignoring {}, because it's not implemented by torch.cuda.{}".format(
name_inner, gpu_tensor.__class__.__name__))
continue
test_name = 'test_' + t.__name__ + '_' + name_inner
if desc:
test_name += '_' + desc
assert not hasattr(TestCuda, test_name), "Duplicated test name: " + test_name
setattr(TestCuda, test_name, compare_cpu_gpu(constr, arg_constr, name_inner, t, precision))
assert not hasattr(TestCuda, test_name), "Duplicated test name: " + test_name
setattr(TestCuda, test_name, compare_cpu_gpu(constr, arg_constr, name_inner, t, precision))
if __name__ == '__main__':
run_tests()


@ -3,8 +3,8 @@ import sys
import torch
import traceback
import unittest
from torch.utils.data import Dataset, TensorDataset, DataLoader
from common import TestCase, run_tests
from torch.utils.data import Dataset, TensorDataset, DataLoader, ConcatDataset
from common import TestCase, run_tests, TEST_NUMPY
from common_nn import TEST_CUDA
@ -31,6 +31,38 @@ class TestTensorDataset(TestCase):
self.assertEqual(l[i], source[i][1])
class TestConcatDataset(TestCase):
def test_concat_two_singletons(self):
result = ConcatDataset([[0], [1]])
self.assertEqual(2, len(result))
self.assertEqual(0, result[0])
self.assertEqual(1, result[1])
def test_concat_two_non_singletons(self):
result = ConcatDataset([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
self.assertEqual(10, len(result))
self.assertEqual(0, result[0])
self.assertEqual(5, result[5])
def test_concat_two_non_singletons_with_empty(self):
# Adding an empty dataset somewhere is correctly handled
result = ConcatDataset([[0, 1, 2, 3, 4],
[],
[5, 6, 7, 8, 9]])
self.assertEqual(10, len(result))
self.assertEqual(0, result[0])
self.assertEqual(5, result[5])
def test_concat_raises_index_error(self):
result = ConcatDataset([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
with self.assertRaises(IndexError):
# this one goes to 11
result[11]
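
The ConcatDataset behaviour checked above (lengths add up, lookups cross member boundaries, empty members are tolerated, out-of-range access raises IndexError) can be implemented with cumulative sizes and a binary search. A minimal sketch under those assumptions, not the torch.utils.data source:

import bisect

class SimpleConcatDataset(object):
    def __init__(self, datasets):
        self.datasets = list(datasets)
        self.cumulative_sizes, total = [], 0
        for d in self.datasets:
            total += len(d)
            self.cumulative_sizes.append(total)

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx >= len(self):
            raise IndexError('index {} out of range'.format(idx))
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        start = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - start]
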
class ErrorDataset(Dataset):
def __init__(self, size):
@ -77,7 +109,7 @@ class TestDataLoader(TestCase):
errors = 0
while True:
try:
it.next()
next(it)
except NotImplementedError:
errors += 1
except StopIteration:
@ -91,6 +123,14 @@ class TestDataLoader(TestCase):
def test_sequential_batch(self):
self._test_sequential(DataLoader(self.dataset, batch_size=2))
def test_growing_dataset(self):
dataset = [torch.ones(4) for _ in range(4)]
dataloader_seq = DataLoader(dataset, shuffle=False)
dataloader_shuffle = DataLoader(dataset, shuffle=True)
dataset.append(torch.ones(4))
self.assertEqual(len(dataloader_seq), 5)
self.assertEqual(len(dataloader_shuffle), 5)
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_sequential_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, pin_memory=True)
@ -116,6 +156,29 @@ class TestDataLoader(TestCase):
def test_shuffle_batch_workers(self):
self._test_shuffle(DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4))
def _test_batch_sampler(self, **kwargs):
# [(0, 1), (2, 3, 4), (5, 6), (7, 8, 9), ...]
batches = []
for i in range(0, 100, 5):
batches.append(tuple(range(i, i + 2)))
batches.append(tuple(range(i + 2, i + 5)))
dl = DataLoader(self.dataset, batch_sampler=batches, **kwargs)
self.assertEqual(len(dl), 40)
for i, (input, _target) in enumerate(dl):
if i % 2 == 0:
offset = i * 5 // 2
self.assertEqual(len(input), 2)
self.assertEqual(input, self.data[offset:offset + 2])
else:
offset = i * 5 // 2
self.assertEqual(len(input), 3)
self.assertEqual(input, self.data[offset:offset + 3])
def test_batch_sampler(self):
self._test_batch_sampler()
self._test_batch_sampler(num_workers=4)
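
As the test above shows, batch_sampler hands the DataLoader whole batches of indices at a time, so batch_size, shuffle and sampler are left at their defaults. A small hedged usage sketch with uneven batch sizes:

import torch
from torch.utils.data import TensorDataset, DataLoader

ds = TensorDataset(torch.randn(10, 3), torch.arange(0, 10))
batches = [[0, 1, 2], [3, 4], [5, 6, 7, 8, 9]]     # any iterable of index lists works
loader = DataLoader(ds, batch_sampler=batches)
for data, target in loader:
    print(data.size(0))                            # prints 3, 2, 5
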
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_shuffle_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4, pin_memory=True)
@ -123,6 +186,22 @@ class TestDataLoader(TestCase):
self.assertTrue(input.is_pinned())
self.assertTrue(target.is_pinned())
@unittest.skipIf(not TEST_NUMPY, "numpy unavailable")
def test_numpy(self):
import numpy as np
class TestDataset(torch.utils.data.Dataset):
def __getitem__(self, i):
return np.ones((2, 3, 4)) * i
def __len__(self):
return 1000
loader = DataLoader(TestDataset(), batch_size=12)
batch = next(iter(loader))
self.assertIsInstance(batch, torch.DoubleTensor)
self.assertEqual(batch.size(), torch.Size([12, 2, 3, 4]))
def test_error(self):
self._test_error(DataLoader(ErrorDataset(100), batch_size=2, shuffle=True))
@ -157,6 +236,102 @@ class TestDataLoader(TestCase):
check_len(DataLoader(self.dataset, batch_size=2), 50)
check_len(DataLoader(self.dataset, batch_size=3), 34)
@unittest.skipIf(not TEST_NUMPY, "numpy unavailable")
def test_numpy_scalars(self):
import numpy as np
class ScalarDataset(torch.utils.data.Dataset):
def __init__(self, dtype):
self.dtype = dtype
def __getitem__(self, i):
return self.dtype()
def __len__(self):
return 4
dtypes = {
np.float64: torch.DoubleTensor,
np.float32: torch.FloatTensor,
np.float16: torch.HalfTensor,
np.int64: torch.LongTensor,
np.int32: torch.IntTensor,
np.int16: torch.ShortTensor,
np.int8: torch.CharTensor,
np.uint8: torch.ByteTensor,
}
for dt, tt in dtypes.items():
dset = ScalarDataset(dt)
loader = DataLoader(dset, batch_size=2)
batch = next(iter(loader))
self.assertIsInstance(batch, tt)
class StringDataset(Dataset):
def __init__(self):
self.s = '12345'
def __len__(self):
return len(self.s)
def __getitem__(self, ndx):
return (self.s[ndx], ndx)
class TestStringDataLoader(TestCase):
def setUp(self):
self.dataset = StringDataset()
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_shuffle_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4, pin_memory=True)
for batch_ndx, (s, n) in enumerate(loader):
self.assertIsInstance(s[0], str)
self.assertTrue(n.is_pinned())
class DictDataset(Dataset):
def __len__(self):
return 4
def __getitem__(self, ndx):
return {
'a_tensor': torch.Tensor(4, 2).fill_(ndx),
'another_dict': {
'a_number': ndx,
},
}
class TestDictDataLoader(TestCase):
def setUp(self):
self.dataset = DictDataset()
def test_sequential_batch(self):
loader = DataLoader(self.dataset, batch_size=2, shuffle=False)
batch_size = loader.batch_size
for i, sample in enumerate(loader):
idx = i * batch_size
self.assertEqual(set(sample.keys()), {'a_tensor', 'another_dict'})
self.assertEqual(set(sample['another_dict'].keys()), {'a_number'})
t = sample['a_tensor']
self.assertEqual(t.size(), torch.Size([batch_size, 4, 2]))
self.assertTrue((t[0] == idx).all())
self.assertTrue((t[1] == idx + 1).all())
n = sample['another_dict']['a_number']
self.assertEqual(n.size(), torch.Size([batch_size]))
self.assertEqual(n[0], idx)
self.assertEqual(n[1], idx + 1)
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, pin_memory=True)
for batch_ndx, sample in enumerate(loader):
self.assertTrue(sample['a_tensor'].is_pinned())
self.assertTrue(sample['another_dict']['a_number'].is_pinned())
if __name__ == '__main__':
run_tests()


@ -14,7 +14,12 @@ from common import TestCase
BACKEND = os.environ['BACKEND']
TEMP_DIR = os.environ['TEMP_DIR']
MASTER_PORT = '29500'
MASTER_ADDR = '127.0.0.1:' + MASTER_PORT
MASTER_ADDR = '127.0.0.1'
if not dist.is_available():
print('Distributed not available, skipping tests')
sys.exit(0)
@contextmanager
@ -64,7 +69,7 @@ class Barrier(object):
data = f.read()
if int(data) >= cls.barrier_id:
arrived += 1
if arrived == dist.get_num_processes():
if arrived == dist.get_world_size():
break
if time.time() - start_time > timeout:
@ -87,7 +92,7 @@ class _DistTestBase(object):
return (group, group_id, rank)
def _init_global_test(self):
group = [i for i in range(0, dist.get_num_processes())]
group = [i for i in range(0, dist.get_world_size())]
group_id = dist.group.WORLD
rank = dist.get_rank()
return (group, group_id, rank)
@ -96,7 +101,7 @@ class _DistTestBase(object):
def test_get_rank(self):
test_dir = os.path.join(TEMP_DIR, 'test_dir')
pid = str(os.getpid())
num_processes = dist.get_num_processes()
num_processes = dist.get_world_size()
with open(os.path.join(test_dir, pid), 'w') as f:
f.write(str(dist.get_rank()))
@ -117,15 +122,16 @@ class _DistTestBase(object):
self._barrier()
# SEND RECV
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support send/recv")
def test_send_recv(self):
rank = dist.get_rank()
tensor = _build_tensor(rank + 1)
for dest in range(0, dist.get_num_processes()):
for dest in range(0, dist.get_world_size()):
if dest == rank:
continue
dist.send(tensor, dest)
for src in range(0, dist.get_num_processes()):
for src in range(0, dist.get_world_size()):
if src == rank:
continue
tensor = _build_tensor(src + 1, value=-1)
@ -136,29 +142,32 @@ class _DistTestBase(object):
self._barrier()
# SEND RECV ANY SOURCE
@unittest.skipIf(BACKEND == 'gloo',
"Gloo does not support send/recv from any source")
def test_send_recv_any_source(self):
rank = dist.get_rank()
tensor = _build_tensor(10, rank)
for dest in range(0, dist.get_num_processes()):
for dest in range(0, dist.get_world_size()):
if dest == rank:
continue
dist.send(tensor, dest)
recv_ranks = set()
for src in range(0, dist.get_num_processes()):
for src in range(0, dist.get_world_size()):
if src == rank:
continue
tensor = _build_tensor(10, value=-1)
dist.recv(tensor)
recv_ranks.add(tensor.resize_(1)[0])
self.assertEqual(len(recv_ranks), dist.get_num_processes() - 1)
self.assertEqual(len(recv_ranks), dist.get_world_size() - 1)
self._barrier()
# ISEND
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support isend")
def test_isend(self):
rank = dist.get_rank()
world_size = dist.get_num_processes()
world_size = dist.get_world_size()
if rank == 0:
requests = [
@ -175,9 +184,10 @@ class _DistTestBase(object):
self._barrier()
# IRECV
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support irecv")
def test_irecv(self):
rank = dist.get_rank()
world_size = dist.get_num_processes()
world_size = dist.get_world_size()
if rank == 0:
expected_tensors = [_build_tensor(src, -1) for src in range(1, world_size)]
@ -196,13 +206,17 @@ class _DistTestBase(object):
self._barrier()
# BROADCAST
def _test_broadcast_helper(self, group, group_id, rank):
def _test_broadcast_helper(self, group, group_id, rank, cuda=False):
for src in group:
expected_tensor = _build_tensor(src + 1)
if cuda:
expected_tensor = expected_tensor.cuda()
if rank == src:
dist.broadcast(expected_tensor, src, group_id)
else:
tensor = _build_tensor(src + 1, -1)
if cuda:
tensor = tensor.cuda()
dist.broadcast(tensor, src, group_id)
self.assertEqual(tensor, expected_tensor)
@ -212,6 +226,11 @@ class _DistTestBase(object):
group, group_id, rank = self._init_global_test()
self._test_broadcast_helper(group, group_id, rank)
@unittest.skipIf(BACKEND != 'gloo', "Only Gloo backend supports CUDA allReduce")
def test_broadcast_cuda(self):
group, group_id, rank = self._init_global_test()
self._test_broadcast_helper(group, group_id, rank, True)
def test_broadcast_group(self):
group, group_id, rank = self._init_group_test()
self._test_broadcast_helper(group, group_id, rank)
@ -229,12 +248,14 @@ class _DistTestBase(object):
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_sum(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_product(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
@ -242,24 +263,28 @@ class _DistTestBase(object):
2, 10, reduce((lambda x, y: x * y), [10] * (len(group) - 1), 2)
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_min(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MIN, 1010, 1, 1
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_max(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MAX, -1, 10, 10
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_sum(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_product(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
@ -267,12 +292,14 @@ class _DistTestBase(object):
2, 10, reduce((lambda x, y: x * y), [10] * (len(group) - 1), 2)
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_min(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MIN, 1010, 1, 1
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_max(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
@ -280,14 +307,19 @@ class _DistTestBase(object):
)
# ALL REDUCE
def _test_all_reduce_helper(self, group, group_id, rank, op, master_value, worker_value, expected_value):
def _test_all_reduce_helper(self, group, group_id, rank, op, master_value,
worker_value, expected_value, cuda=False):
for src in group:
if rank == src:
tensor = _build_tensor(src + 1).fill_(master_value)
if cuda:
tensor = tensor.cuda()
dist.all_reduce(tensor, op, group_id)
self.assertEqual(tensor, _build_tensor(src + 1, expected_value))
else:
tensor = _build_tensor(src + 1).fill_(worker_value)
if cuda:
tensor = tensor.cuda()
dist.all_reduce(tensor, op, group_id)
self.assertEqual(tensor, _build_tensor(src + 1, expected_value))
@ -299,6 +331,13 @@ class _DistTestBase(object):
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND != 'gloo', "Only Gloo backend supports CUDA allReduce")
def test_all_reduce_sum_cuda(self):
group, group_id, rank = self._init_global_test()
self._test_all_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1)), True
)
def test_all_reduce_product(self):
group, group_id, rank = self._init_global_test()
self._test_all_reduce_helper(
@ -348,20 +387,18 @@ class _DistTestBase(object):
for dest in group:
tensor = _build_tensor(dest + 1, -1)
expected_tensor = _build_tensor(dest + 1, rank)
if rank == dest:
tensors = [_build_tensor(dest + 1, i) for i in group]
dist.scatter_send(tensors, tensor, group_id)
self.assertEqual(tensor, expected_tensor)
else:
dist.scatter_recv(tensor, dest, group_id)
self.assertEqual(tensor, expected_tensor)
tensors = [_build_tensor(dest + 1, i) for i in group] if rank == dest else []
dist.scatter(tensor, src=dest, scatter_list=tensors, group=group_id)
self.assertEqual(tensor, expected_tensor)
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support scatter")
def test_scatter(self):
group, group_id, rank = self._init_global_test()
self._test_scatter_helper(group, group_id, rank)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support scatter")
def test_scatter_group(self):
group, group_id, rank = self._init_group_test()
self._test_scatter_helper(group, group_id, rank)
@ -370,22 +407,21 @@ class _DistTestBase(object):
def _test_gather_helper(self, group, group_id, rank):
for dest in group:
tensor = _build_tensor(dest + 1, rank)
tensors = [_build_tensor(dest + 1, -1) for i in group] if rank == dest else []
dist.gather(tensor, dst=dest, gather_list=tensors, group=group_id)
if rank == dest:
tensors = [_build_tensor(dest + 1, -1) for i in group]
dist.gather_recv(tensors, tensor, group_id)
expected_tensors = [_build_tensor(dest + 1, i) for i in group]
for t1, t2 in zip(tensors, expected_tensors):
self.assertEqual(t1, t2)
else:
dist.gather_send(tensor, dest, group_id)
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support gather")
def test_gather(self):
group, group_id, rank = self._init_global_test()
self._test_gather_helper(group, group_id, rank)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support gather")
def test_gather_group(self):
group, group_id, rank = self._init_group_test()
self._test_gather_helper(group, group_id, rank)
@ -437,13 +473,13 @@ class _DistTestBase(object):
group, group_id, rank = self._init_group_test()
self._test_barrier_helper(group, group_id, rank)
if BACKEND == 'tcp':
if BACKEND == 'tcp' or BACKEND == 'gloo':
WORLD_SIZE = os.environ['WORLD_SIZE']
class TestTCP(TestCase, _DistTestBase):
class TestTCPOrGloo(TestCase, _DistTestBase):
MANAGER_PROCESS_RANK = -1
JOIN_TIMEOUT = 5
JOIN_TIMEOUT = 10
@staticmethod
def manager_join(fn):
@ -486,7 +522,11 @@ if BACKEND == 'tcp':
def _run(self, rank):
self.rank = rank
dist.init_process_group(backend=BACKEND)
try:
dist.init_process_group(backend=BACKEND)
except RuntimeError as e:
if 'recompile' in e.args[0]:
sys.exit(0)
# self.id() == e.g. '__main__.TestDistributed.test_get_rank'
# We're retrieving a corresponding test and executing it.
getattr(self, self.id().split(".")[2])()


@ -184,16 +184,16 @@ tests = [
OldModuleTest(nn.Sum,
(1,),
input_size=(2, 4, 5),
reference_fn=lambda i, _: i.sum(1).squeeze(1)),
reference_fn=lambda i, _: i.sum(1, keepdim=False)),
OldModuleTest(nn.Sum,
(1, True),
input_size=(2, 4, 5),
reference_fn=lambda i, _: i.sum(1).div(i.size(1)).squeeze(1),
reference_fn=lambda i, _: i.sum(1, keepdim=False).div(i.size(1)),
desc='sizeAverage'),
OldModuleTest(nn.Mean,
(1,),
input_size=(2, 4, 5),
reference_fn=lambda i, _: torch.mean(i, 1).squeeze(1)),
reference_fn=lambda i, _: torch.mean(i, 1, keepdim=False)),
OldModuleTest(lambda: nn.Sequential().add(nn.GradientReversal()).add(nn.GradientReversal()),
input_size=(4, 3, 2, 2),
fullname='GradientReversal'),
@ -233,19 +233,19 @@ tests = [
reference_fn=lambda i, _: torch.bmm(i[0], i[1].view(i[1].size(0), i[1].size(1), 1)).squeeze()),
OldModuleTest(nn.Max,
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.max(i, 0)[0].squeeze()),
reference_fn=lambda i, _: torch.max(i, 0, False)[0]),
OldModuleTest(nn.Max,
(1,),
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.max(i, 1)[0].squeeze(),
reference_fn=lambda i, _: torch.max(i, 1, False)[0],
desc='with_dimension'),
OldModuleTest(nn.Min,
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.min(i, 0)[0].squeeze()),
reference_fn=lambda i, _: torch.min(i, 0, False)[0]),
OldModuleTest(nn.Min,
(1,),
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.min(i, 1)[0].squeeze(),
reference_fn=lambda i, _: torch.min(i, 1, False)[0],
desc='with_dimension'),
OldModuleTest(nn.MixtureTable,
tuple(),
@ -483,14 +483,14 @@ tests = [
input_size=(1, 2, 4, 4, 4)),
OldModuleTest(nn.VolumetricMaxPooling,
(2, 2, 2),
input_size=(2, 3, 5, 5, 5)),
input=(torch.randn(2, 3, 5, 5, 5) * 1000)),
OldModuleTest(nn.VolumetricMaxPooling,
(2, 2, 2, 2, 2, 2),
input_size=(2, 3, 5, 5, 5),
input=(torch.randn(2, 3, 5, 5, 5) * 1000),
desc='stride'),
OldModuleTest(nn.VolumetricMaxPooling,
(2, 2, 2, 2, 2, 2, 1, 1, 1),
input_size=(2, 3, 5, 5, 5),
input=(torch.randn(2, 3, 5, 5, 5) * 1000),
desc='stride_padding'),
OldModuleTest(nn.VolumetricReplicationPadding,
(1, 2, 3, 4, 5, 6),
@ -532,7 +532,7 @@ for p in (1, 2, 1.5):
(p,),
input_size=(4, 5),
# Eh, we need to use p as a default, so it's passed by value
reference_fn=lambda i, _, p=p: i.div(i.norm(p, 1).expand_as(i)),
reference_fn=lambda i, _, p=p: i.div(i.norm(p, 1, True).expand_as(i)),
desc=str(p)),
)
for p in range(1, 4 + 1):
@ -807,14 +807,14 @@ class TestNN(NNTestCase):
str(m)
output = m.forward(input)
output2 = input.sum(1).expand(4, 5).repeat(num_modules, 1)
output2 = input.sum(1, True).expand(4, 5).repeat(num_modules, 1)
self.assertEqual(output2, output)
gradInput = m.backward(input, torch.ones(output2.size()))
gradInput2 = torch.ones(4, 2).fill_(num_modules * 5)
self.assertEqual(gradInput, gradInput2)
gradWeight = input.sum(0).expand(5, 2)
gradWeight = input.sum(0, keepdim=True).expand(5, 2)
for l in linears:
self.assertEqual(gradWeight, l.gradWeight)
@ -884,8 +884,8 @@ class TestNN(NNTestCase):
output2 = [input, input, input]
self.assertEqual(output2, output)
gradInput = module.backward(input, gradOutput)
gradInput2 = [_gradOutput[0].sum(0).squeeze(0), _gradOutput[1].sum(
0).squeeze(0), [_gradOutput[2].sum(0).squeeze(0)]]
gradInput2 = [_gradOutput[0].sum(0, keepdim=False), _gradOutput[1].sum(
0, keepdim=False), [_gradOutput[2].sum(0, keepdim=False)]]
self.assertTrue(isinstance(gradInput, list))
self.assertFalse(isinstance(gradInput[0], list))
self.assertFalse(isinstance(gradInput[1], list))
@ -1251,6 +1251,8 @@ class TestNN(NNTestCase):
self.assertIsInstance(module, type(reference))
prepare_tests()
if __name__ == '__main__':
prepare_tests()
run_tests()


@ -75,13 +75,13 @@ def autograd_sharing(queue, ready, master_modified):
ready.set()
master_modified.wait()
expected_var = torch.range(1, 25).view(5, 5)
expected_var = torch.arange(1, 26).view(5, 5)
expected_var[0, 0] = 1000
is_ok = var.data.equal(expected_var)
var.data[:] = torch.ones(5, 5)
is_ok &= var.grad.data.equal(torch.zeros(5, 5))
var.grad.data[:] = torch.ones(5, 5)
is_ok &= var.grad is None
var._grad = Variable(torch.ones(5, 5), requires_grad=False)
queue.put(is_ok)
@ -112,9 +112,10 @@ class leak_checker(object):
# test is no more than 4 higher than the 10th available at the
# start. This attempts to catch file descriptor leaks, but allows
# one-off initialization that may use up a file descriptor
available_fds = self._get_next_fds(10)
self.test_case.assertLessEqual(
available_fds[-1] - self.next_fds[-1], 5)
# TODO: Disabled because this check is too flaky
# available_fds = self._get_next_fds(10)
# self.test_case.assertLessEqual(
# available_fds[-1] - self.next_fds[-1], 5)
self.test_case.assertFalse(self.has_shm_files())
return False
@ -149,9 +150,6 @@ class leak_checker(object):
class TestMultiprocessing(TestCase):
def __init__(self, *args, **kwargs):
super(TestMultiprocessing, self).__init__(*args, **kwargs)
def _test_sharing(self, ctx=mp, type=torch.FloatTensor, repeat=1):
def test_fill():
x = torch.zeros(5, 5).type(type)
@ -160,9 +158,11 @@ class TestMultiprocessing(TestCase):
data = [x, x[:, 1]]
q.put(data)
p = ctx.Process(target=simple_fill, args=(q, e))
p.daemon = True
lc.check_pid(p.pid)
p.start()
e.wait()
e.wait(10)
self.assertTrue(e.is_set())
self.assertTrue(data[0].eq(4).all())
self.assertTrue(data[1].eq(4).all())
p.join(1)
@ -172,6 +172,7 @@ class TestMultiprocessing(TestCase):
q = ctx.Queue()
e = ctx.Event()
p = ctx.Process(target=send_tensor, args=(q, e, type))
p.daemon = True
lc.check_pid(p.pid)
p.start()
t1 = q.get()
@ -183,7 +184,7 @@ class TestMultiprocessing(TestCase):
self.assertFalse(p.is_alive())
with leak_checker(self) as lc:
for i in range(repeat):
for _ in range(repeat):
test_fill()
test_receive()
@ -193,7 +194,7 @@ class TestMultiprocessing(TestCase):
data = [x.storage(), x.storage()[1:4], x, x[2], x[:, 1]]
q = ctx.Queue()
q.put(data)
new_data = q.get()
new_data = q.get(timeout=1)
self.assertEqual(new_data, data, 0)
storage_cdata = data[0]._cdata
self.assertEqual(new_data[0]._cdata, storage_cdata)
@ -264,15 +265,15 @@ class TestMultiprocessing(TestCase):
q.get()
with fs_sharing(), leak_checker(self) as lc:
for i in range(TEST_REPEATS):
for _ in range(TEST_REPEATS):
queue_put()
def test_inherit_tensor(self):
class SubProcess(mp.Process):
def __init__(self, tensor):
super(SubProcess, self).__init__()
self.tensor = tensor
self.daemon = True
def run(self):
self.tensor.add_(3)
@ -280,7 +281,7 @@ class TestMultiprocessing(TestCase):
t = torch.zeros(5, 5)
p = SubProcess(t.share_memory_())
p.start()
p.join()
p.join(1)
self.assertEqual(t, torch.ones(5, 5) * 3, 0)
@unittest.skipIf(not TEST_CUDA_IPC, 'CUDA IPC not available')
@ -296,7 +297,8 @@ class TestMultiprocessing(TestCase):
ctx = mp.get_context('spawn')
tensors = []
for i in range(5):
tensors += [torch.range(i * 5, (i * 5) + 4).cuda()]
device = i % 2
tensors += [torch.arange(i * 5, (i + 1) * 5).cuda(device)]
inq = ctx.Queue()
outq = ctx.Queue()
@ -311,8 +313,8 @@ class TestMultiprocessing(TestCase):
for i, tensor in enumerate(tensors):
v, device, tensor_size, storage_size = results[i]
self.assertEqual(v, torch.range(i * 5, (i * 5) + 4).sum())
self.assertEqual(device, 0)
self.assertEqual(v, torch.arange(i * 5, (i + 1) * 5).sum())
self.assertEqual(device, i % 2)
self.assertEqual(tensor_size, 5)
self.assertEqual(storage_size, 5)
@ -357,8 +359,9 @@ class TestMultiprocessing(TestCase):
master_modified = mp.Event()
queue = mp.Queue()
p = mp.Process(target=autograd_sharing, args=(queue, ready, master_modified))
p.daemon = True
p.start()
var.grad.data.zero_()
var._grad = Variable(torch.zeros(5, 5), requires_grad=False)
queue.put(var)
ready.wait()
@ -371,7 +374,8 @@ class TestMultiprocessing(TestCase):
self.assertEqual(var.data, torch.ones(5, 5))
self.assertEqual(var.grad.data, torch.ones(5, 5) * 4)
p.join()
p.join(1)
self.assertFalse(p.is_alive())
def test_variable_sharing(self):
configs = [
@ -380,15 +384,19 @@ class TestMultiprocessing(TestCase):
(False, True),
]
for requires_grad, volatile in configs:
var = Variable(torch.range(1, 25).view(5, 5),
var = Variable(torch.arange(1, 26).view(5, 5),
requires_grad=requires_grad,
volatile=volatile)
self._test_autograd_sharing(var)
def test_parameter_sharing(self):
param = Parameter(torch.range(1, 25).view(5, 5))
param = Parameter(torch.arange(1, 26).view(5, 5))
self._test_autograd_sharing(param)
def test_empty_shared(self):
t = torch.Tensor()
t.share_memory_()
def _test_is_shared(self):
t = torch.randn(5, 5)
self.assertFalse(t.is_shared())


@ -6,12 +6,10 @@ import torch.cuda
from common import TestCase, run_tests
if not torch.cuda.is_available():
print('CUDA not available, skipping tests')
import sys
sys.exit()
nGPUs = torch.cuda.device_count()
if nGPUs == 0:
print('CUDA not available, skipping tests')
TestCase = object # noqa: F811
class TestNCCL(TestCase):

File diff suppressed because it is too large


@ -4,8 +4,11 @@ from copy import deepcopy
import torch
import torch.optim as optim
import torch.legacy.optim as old_optim
import torch.nn.functional as F
from torch.optim import SGD
from torch.autograd import Variable
from torch import sparse
from torch.optim.lr_scheduler import LambdaLR, StepLR, MultiStepLR, ExponentialLR, ReduceLROnPlateau
from common import TestCase, run_tests
@ -58,6 +61,49 @@ class TestOptim(TestCase):
self.assertLessEqual(params.data.dist(solution), initial_dist)
def _test_rosenbrock_sparse(self, constructor):
params_t = torch.Tensor([1.5, 1.5])
params = Variable(torch.Tensor([1.5, 1.5]), requires_grad=True)
params_c = Variable(torch.Tensor([1.5, 1.5]), requires_grad=True)
optimizer = constructor([params])
optimizer_c = constructor([params_c])
solution = torch.Tensor([1, 1])
initial_dist = params.data.dist(solution)
def eval(params, sparse_grad, w):
# Depending on w, provide only the x or y gradient
optimizer.zero_grad()
loss = rosenbrock(params)
loss.backward()
grad = drosenbrock(params.data)
# NB: We torture test the optimizer by returning an
# uncoalesced sparse tensor
if w:
i = torch.LongTensor([[0, 0]])
x = grad[0]
v = torch.DoubleTensor([x / 4., x - x / 4.])
else:
i = torch.LongTensor([[1, 1]])
y = grad[1]
v = torch.DoubleTensor([y - y / 4., y / 4.])
x = sparse.DoubleTensor(i, v, torch.Size([2]))
if sparse_grad:
params.grad.data = x
else:
params.grad.data = x.to_dense()
return loss
for i in range(2000):
# Do cyclic coordinate descent
w = i % 2
optimizer.step(functools.partial(eval, params, True, w))
optimizer_c.step(functools.partial(eval, params_c, False, w))
self.assertEqual(params.data, params_c.data)
self.assertLessEqual(params.data.dist(solution), initial_dist)
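
The rosenbrock and drosenbrock helpers used above are defined earlier in the file (outside this hunk). For reference, a standard 2-D Rosenbrock function and its gradient look like the following; this is an illustration, and the file's own definitions apply:

import torch

def rosenbrock_ref(tensor):
    x, y = tensor
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def drosenbrock_ref(tensor):
    x, y = tensor
    return torch.DoubleTensor((-400 * x * (y - x ** 2) - 2 * (1 - x),
                               200 * (y - x ** 2)))
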
def _test_basic_cases_template(self, weight, bias, input, constructor):
weight = Variable(weight, requires_grad=True)
bias = Variable(bias, requires_grad=True)
@ -155,6 +201,9 @@ class TestOptim(TestCase):
def _build_params_dict(self, weight, bias, **kwargs):
return [dict(params=[weight]), dict(params=[bias], **kwargs)]
def _build_params_dict_single(self, weight, bias, **kwargs):
return [dict(params=bias, **kwargs)]
def test_sgd(self):
self._test_rosenbrock(
lambda params: optim.SGD(params, lr=1e-3),
@ -174,6 +223,11 @@ class TestOptim(TestCase):
self._build_params_dict(weight, bias, lr=1e-2),
lr=1e-3)
)
self._test_basic_cases(
lambda weight, bias: optim.SGD(
self._build_params_dict_single(weight, bias, lr=1e-2),
lr=1e-3)
)
def test_adam(self):
self._test_rosenbrock(
@ -236,6 +290,11 @@ class TestOptim(TestCase):
lr=1e-1)
)
def test_adagrad_sparse(self):
self._test_rosenbrock_sparse(
lambda params: optim.Adagrad(params, lr=1e-1)
)
def test_adamax(self):
self._test_rosenbrock(
lambda params: optim.Adamax(params, lr=1e-1),
@ -343,5 +402,157 @@ class TestOptim(TestCase):
optim.SGD(Variable(torch.randn(5, 5)), lr=3)
class SchedulerTestNet(torch.nn.Module):
def __init__(self):
super(SchedulerTestNet, self).__init__()
self.conv1 = torch.nn.Conv2d(1, 1, 1)
self.conv2 = torch.nn.Conv2d(1, 1, 1)
def forward(self, x):
return self.conv2(F.relu(self.conv1(x)))
class TestLRScheduler(TestCase):
def setUp(self):
self.net = SchedulerTestNet()
self.opt = SGD(
[{'params': self.net.conv1.parameters()}, {'params': self.net.conv2.parameters(), 'lr': 0.5}],
lr=0.05)
def test_step_lr(self):
# lr = 0.05 if epoch < 3
# lr = 0.005 if 3 <= epoch < 6
# lr = 0.0005 if 6 <= epoch < 9
# lr = 0.00005 if epoch >= 9
single_targets = [0.05] * 3 + [0.005] * 3 + [0.0005] * 3 + [0.00005] * 3
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = StepLR(self.opt, gamma=0.1, step_size=3)
epochs = 10
self._test(scheduler, targets, epochs)
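
The single_targets list above follows the usual step-decay rule lr(epoch) = base_lr * gamma ** (epoch // step_size); a quick check of the expected values (illustration only):

base_lr, gamma, step_size = 0.05, 0.1, 3
expected = [base_lr * gamma ** (epoch // step_size) for epoch in range(12)]
# expected == [0.05]*3 + [0.005]*3 + [0.0005]*3 + [0.00005]*3 (up to float rounding)
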
def test_multi_step_lr(self):
# lr = 0.05 if epoch < 2
# lr = 0.005 if 2 <= epoch < 5
# lr = 0.0005 if 5 <= epoch < 9
# lr = 0.00005 if epoch >= 9
single_targets = [0.05] * 2 + [0.005] * 3 + [0.0005] * 4 + [0.00005] * 3
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = MultiStepLR(self.opt, gamma=0.1, milestones=[2, 5, 9])
epochs = 10
self._test(scheduler, targets, epochs)
def test_exp_lr(self):
single_targets = [0.05 * (0.9 ** x) for x in range(10)]
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = ExponentialLR(self.opt, gamma=0.9)
epochs = 10
self._test(scheduler, targets, epochs)
def test_reduce_lr_on_plateau1(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [10 - i * 0.0167 for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, threshold_mode='abs', mode='min',
threshold=0.01, patience=5, cooldown=5)
epochs = 10
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau2(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * 7 + [0.005] * 7 + [0.0005] * 2]
metrics = [10 - i * 0.0165 for i in range(22)]
scheduler = ReduceLROnPlateau(self.opt, patience=5, cooldown=0, threshold_mode='abs',
mode='min', threshold=0.1)
epochs = 22
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau3(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * (2 + 6) + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [-0.8] * 2 + [-0.234] * 20
scheduler = ReduceLROnPlateau(self.opt, mode='max', patience=5, cooldown=5,
threshold_mode='abs')
epochs = 22
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau4(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [1.5 * (1.025 ** i) for i in range(20)] # 1.025 > 1.1**0.25
scheduler = ReduceLROnPlateau(self.opt, mode='max', patience=3,
threshold_mode='rel', threshold=0.1)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau5(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [1.5 * (1.005 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='max', threshold_mode='rel',
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau6(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [1.5 * (0.85 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='min', threshold_mode='rel',
threshold=0.1)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau7(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [1] * 7 + [0.6] + [0.5] * 12
scheduler = ReduceLROnPlateau(self.opt, mode='min', threshold_mode='rel',
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau8(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.4] * 14, [0.5] * 6 + [0.3] * 14]
metrics = [1.5 * (1.005 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='max', threshold_mode='rel', min_lr=[0.4, 0.3],
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
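
The plateau tests above hinge on when a metric counts as an improvement. A hedged sketch of that criterion as these tests assume it (the real scheduler additionally tracks patience and cooldown before multiplying the learning rate by its factor):

def is_better(metric, best, mode='min', threshold_mode='rel', threshold=1e-4):
    # Sketch of the improvement test; illustration only.
    if mode == 'min' and threshold_mode == 'rel':
        return metric < best * (1 - threshold)
    if mode == 'min' and threshold_mode == 'abs':
        return metric < best - threshold
    if mode == 'max' and threshold_mode == 'rel':
        return metric > best * (1 + threshold)
    return metric > best + threshold       # mode == 'max', threshold_mode == 'abs'
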
def test_lambda_lr(self):
self.opt.param_groups[0]['lr'] = 0.05
self.opt.param_groups[1]['lr'] = 0.4
targets = [[0.05 * (0.9 ** x) for x in range(10)], [0.4 * (0.8 ** x) for x in range(10)]]
scheduler = LambdaLR(self.opt,
lr_lambda=[lambda x1: 0.9 ** x1, lambda x2: 0.8 ** x2])
epochs = 10
self._test(scheduler, targets, epochs)
def _test(self, scheduler, targets, epochs=10):
for epoch in range(epochs):
scheduler.step(epoch)
for param_group, target in zip(self.opt.param_groups, targets):
self.assertAlmostEqual(target[epoch], param_group['lr'],
msg='LR is wrong in epoch {}: expected {}, got {}'.format(
epoch, target[epoch], param_group['lr']), delta=1e-5)
def _test_reduce_lr_on_plateau(self, scheduler, targets, metrics, epochs=10, verbose=False):
for epoch in range(epochs):
scheduler.step(metrics[epoch])
if verbose:
print('epoch{}:\tlr={}'.format(epoch, self.opt.param_groups[0]['lr']))
for param_group, target in zip(self.opt.param_groups, targets):
self.assertAlmostEqual(target[epoch], param_group['lr'],
msg='LR is wrong in epoch {}: expected {}, got {}'.format(
epoch, target[epoch], param_group['lr']), delta=1e-5)
if __name__ == '__main__':
run_tests()


@ -5,55 +5,129 @@ import itertools
import random
import unittest
from common import TestCase, run_tests
from common_nn import TEST_CUDA
from numbers import Number
SparseTensor = sparse.DoubleTensor
def cpu_only(inner):
def outer(self, *args, **kwargs):
if self.is_cuda:
raise unittest.SkipTest("Test is CPU-only")
inner(self, *args, **kwargs)
return outer
def cuda_only(inner):
def outer(self, *args, **kwargs):
if not self.is_cuda:
raise unittest.SkipTest("Test is GPU-only")
inner(self, *args, **kwargs)
return outer
class TestSparse(TestCase):
@staticmethod
def _gen_sparse(d, nnz, with_size):
v = torch.randn(nnz)
if isinstance(with_size, Number):
i = (torch.rand(d, nnz) * with_size).type(torch.LongTensor)
x = SparseTensor(i, v)
else:
i = torch.rand(d, nnz) * \
torch.Tensor(with_size).repeat(nnz, 1).transpose(0, 1)
i = i.type(torch.LongTensor)
x = SparseTensor(i, v, torch.Size(with_size))
def setUp(self):
# These parameters control the various ways we can run the test.
# We will subclass and override this method to implement CUDA
# tests
self.is_cuda = False
self.is_uncoalesced = False
self.IndexTensor = torch.LongTensor
self.ValueTensor = torch.DoubleTensor
self.SparseTensor = torch.sparse.DoubleTensor
return x, i, v
def _gen_sparse(self, d, nnz, with_size):
# TODO: Consider implementing this in the CUDA case by directly
# performing the operations on the GPU. You won't be able to
# use torch.rand/torch.randn in this case because they are
# CPU-only. If you do this, you can remove the is_cuda branch
# at the end.
#
# If you do this, be sure to update assert_uncoalesced too
if isinstance(with_size, Number):
with_size = [with_size] * d
if self.is_uncoalesced:
# We want to generate a tensor with a lot of uncoalesced
# entries to stress test whether or not we handle this
# (subtle) case correctly
v_size = [nnz * 2] + list(with_size[d:])
v = torch.randn(*v_size)
r = torch.rand(d, nnz)
# Repeat the indexes, so every position shows up twice
i = torch.cat([r, r], dim=1) * \
torch.Tensor(with_size[:d]).repeat(nnz * 2, 1).transpose(0, 1)
i = i.type(torch.LongTensor)
x = torch.sparse.DoubleTensor(i, v, torch.Size(with_size))
self.assert_uncoalesced(x)
else:
# Generate a sparse tensor with d sparse dimensions; the
# rest of the dimensions with_size[d:] are dense.

v_size = [nnz] + list(with_size[d:])
v = torch.randn(*v_size)
i = torch.rand(d, nnz) * \
torch.Tensor(with_size[:d]).repeat(nnz, 1).transpose(0, 1)
i = i.type(torch.LongTensor)
x = torch.sparse.DoubleTensor(i, v, torch.Size(with_size))
if self.is_cuda:
return x.cuda(), i.cuda(), v.cuda()
else:
return x, i.clone(), v.clone()
def assert_uncoalesced(self, x):
"""
Test if a CPU tensor is uncoalesced. This is used to ensure
correctness of the uncoalesced tensor generation algorithm.
"""
assert not x.is_coalesced()
# Strategy: construct a new sparse tensor with the raw value
# field overwritten to a tensor of ones, coalesce it, and then
# check if any value entries are > 1 (which indicates that the
# original was uncoalesced.)
i = x._indices().clone()
v = x._values().clone().fill_(1)
y = torch.sparse.DoubleTensor(i, v, x.size())
z = self.safeCoalesce(y)
assert (z._values() > 1).sum() > 0
def randn(self, *args, **kwargs):
"""
Variant of torch.randn that also works in the TEST_CUDA case.
"""
# TODO: Put this in torch.cuda.randn
return self.ValueTensor(*args, **kwargs).normal_()
def test_basic(self):
x, i, v = self._gen_sparse(3, 10, 100)
self.assertEqual(i, x.indices())
self.assertEqual(v, x.values())
self.assertEqual(i, x._indices())
self.assertEqual(v, x._values())
x, i, v = self._gen_sparse(3, 10, [100, 100, 100])
self.assertEqual(i, x.indices())
self.assertEqual(v, x.values())
self.assertEqual(i, x._indices())
self.assertEqual(v, x._values())
self.assertEqual(x.ndimension(), 3)
self.assertEqual(x.nnz(), 10)
self.assertEqual(x.coalesce()._nnz(), 10)
for i in range(3):
self.assertEqual(x.size(i), 100)
# Make sure we can access empty indices / values
x = SparseTensor()
self.assertEqual(x.indices().numel(), 0)
self.assertEqual(x.values().numel(), 0)
x = self.SparseTensor()
self.assertEqual(x._indices().numel(), 0)
self.assertEqual(x._values().numel(), 0)
def test_to_dense(self):
i = torch.LongTensor([
i = self.IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
v = torch.Tensor([2, 1, 3, 4])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
res = torch.Tensor([
v = self.ValueTensor([2, 1, 3, 4])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
res = self.ValueTensor([
[[2, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
@ -73,58 +147,164 @@ class TestSparse(TestCase):
x.to_dense()
self.assertEqual(res, x.to_dense())
def test_shared(self):
i = self.IndexTensor([[2]])
v = self.ValueTensor([5])
x = self.SparseTensor(i, v, torch.Size([3]))
v[0] = 6
self.assertEqual(self.ValueTensor([0, 0, 6]), x.to_dense())
i[0][0] = 0
self.assertEqual(self.ValueTensor([6, 0, 0]), x.to_dense())
def test_to_dense_hybrid(self):
i = self.IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
])
v = self.ValueTensor([[2, 3], [1, 2], [3, 4], [4, 5]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 2]))
res = self.ValueTensor([
[[2, 3],
[0, 0],
[0, 0],
[0, 0]],
[[1, 2],
[0, 0],
[0, 0],
[0, 0]],
[[3, 4],
[0, 0],
[0, 0],
[4, 5]],
])
x.to_dense() # Tests double to_dense for memory corruption
x.to_dense()
x.to_dense()
self.assertEqual(res, x.to_dense())
def test_contig(self):
i = torch.LongTensor([
i = self.IndexTensor([
[1, 0, 35, 14, 39, 6, 71, 66, 40, 27],
[92, 31, 62, 50, 22, 65, 89, 74, 56, 34],
])
v = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x = SparseTensor(i, v, torch.Size([100, 100]))
exp_i = torch.LongTensor([
v = self.ValueTensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x = self.SparseTensor(i, v, torch.Size([100, 100]))
exp_i = self.IndexTensor([
[0, 1, 6, 14, 27, 35, 39, 40, 66, 71],
[31, 92, 65, 50, 34, 62, 22, 56, 74, 89],
])
exp_v = torch.Tensor([2, 1, 6, 4, 10, 3, 5, 9, 8, 7])
x.contiguous()
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
exp_v = self.ValueTensor([2, 1, 6, 4, 10, 3, 5, 9, 8, 7])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
i = torch.LongTensor([
i = self.IndexTensor([
[2, 0, 2, 1],
[0, 0, 3, 0],
[1, 0, 4, 0],
])
v = torch.Tensor([3, 2, 4, 1])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = torch.LongTensor([
v = self.ValueTensor([3, 2, 4, 1])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = self.IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
exp_v = torch.Tensor([2, 1, 3, 4])
exp_v = self.ValueTensor([2, 1, 3, 4])
x.contiguous()
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
# Duplicate indices
i = torch.LongTensor([
i = self.IndexTensor([
[0, 0, 2, 0],
[0, 0, 3, 0],
[0, 0, 4, 0],
])
v = torch.Tensor([3, 2, 4, 1])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = torch.LongTensor([
v = self.ValueTensor([3, 2, 4, 1])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = self.IndexTensor([
[0, 2],
[0, 3],
[0, 4],
])
exp_v = torch.Tensor([6, 4])
exp_v = self.ValueTensor([6, 4])
x.contiguous()
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
def test_contig_hybrid(self):
i = self.IndexTensor([
[1, 0, 35, 14, 39, 6, 71, 66, 40, 27],
[92, 31, 62, 50, 22, 65, 89, 74, 56, 34],
])
v = self.ValueTensor([
[1, 2], [2, 3], [3, 4], [4, 5], [5, 6],
[6, 7], [7, 8], [8, 9], [9, 10], [10, 11],
])
x = self.SparseTensor(i, v, torch.Size([100, 100, 2]))
exp_i = self.IndexTensor([
[0, 1, 6, 14, 27, 35, 39, 40, 66, 71],
[31, 92, 65, 50, 34, 62, 22, 56, 74, 89],
])
exp_v = self.ValueTensor([
[2, 3], [1, 2], [6, 7], [4, 5], [10, 11],
[3, 4], [5, 6], [9, 10], [8, 9], [7, 8],
])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
i = self.IndexTensor([
[2, 0, 2, 1],
[0, 0, 3, 0],
[1, 0, 4, 0],
])
v = self.ValueTensor([[3, 3, 3], [2, 2, 2], [4, 4, 4], [1, 1, 1]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = self.IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
exp_v = self.ValueTensor([[2, 2, 2], [1, 1, 1], [3, 3, 3], [4, 4, 4]])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
# Duplicate indices
i = self.IndexTensor([
[0, 0, 2, 0],
[0, 0, 3, 0],
[0, 0, 4, 0],
])
v = self.ValueTensor([[3, 2, 3], [2, 1, 1], [4, 3, 4], [1, 1, 1]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = self.IndexTensor([
[0, 2],
[0, 3],
[0, 4],
])
exp_v = self.ValueTensor([[6, 4, 5], [4, 3, 4]])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
def test_clone(self):
x, _, _ = self._gen_sparse(4, 20, 5)
if self.is_uncoalesced:
self.assertFalse(x.is_coalesced())
y = x.clone()
self.assertFalse(y.is_coalesced())
x = x.coalesce()
self.assertTrue(x.is_coalesced())
y = x.clone()
self.assertTrue(y.is_coalesced())
def test_transpose(self):
x = self._gen_sparse(4, 20, 5)[0]
@ -139,6 +319,7 @@ class TestSparse(TestCase):
y = y.transpose(i, j)
self.assertEqual(x.to_dense(), y)
@cpu_only
def test_mm(self):
def test_shape(di, dj, dk):
x, _, _ = self._gen_sparse(2, 20, [di, dj])
@ -147,22 +328,23 @@ class TestSparse(TestCase):
alpha = random.random()
beta = random.random()
expected = torch.addmm(alpha, t, beta, x.to_dense(), y)
res = torch.addmm(alpha, t, beta, x, y)
expected = torch.addmm(alpha, t, beta, x.to_dense(), y)
self.assertEqual(res, expected)
expected = torch.addmm(t, x.to_dense(), y)
res = torch.addmm(t, x, y)
expected = torch.addmm(t, x.to_dense(), y)
self.assertEqual(res, expected)
expected = torch.mm(x.to_dense(), y)
res = torch.mm(x, y)
expected = torch.mm(x.to_dense(), y)
self.assertEqual(res, expected)
test_shape(10, 100, 100)
test_shape(100, 1000, 200)
test_shape(64, 10000, 300)
@cpu_only
def test_saddmm(self):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
@ -171,50 +353,273 @@ class TestSparse(TestCase):
alpha = random.random()
beta = random.random()
expected = torch.addmm(alpha, t.to_dense(), beta, x.to_dense(), y)
res = torch.saddmm(alpha, t, beta, x, y)
expected = torch.addmm(alpha, t.to_dense(), beta, x.to_dense(), y)
self.assertEqual(res.to_dense(), expected)
expected = torch.addmm(t.to_dense(), x.to_dense(), y)
res = torch.saddmm(t, x, y)
expected = torch.addmm(t.to_dense(), x.to_dense(), y)
self.assertEqual(res.to_dense(), expected)
expected = torch.mm(x.to_dense(), y)
res = torch.smm(x, y)
expected = torch.mm(x.to_dense(), y)
self.assertEqual(res.to_dense(), expected)
test_shape(7, 5, 3)
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def test_dsmm(self):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
y = self.randn(dj, dk)
res = torch.dsmm(x, y)
expected = torch.mm(x.to_dense(), y)
self.assertEqual(res, expected)
test_shape(7, 5, 3)
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def test_hsmm(self):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
y = self.randn(dj, dk)
res = torch.hsmm(x, y)
expected = torch.mm(x.to_dense(), y)
self.assertEqual(res.to_dense(), expected)
test_shape(7, 5, 3)
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def _test_spadd_shape(self, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x, _, _ = self._gen_sparse(len(shape_i), 10, shape)
y = self.randn(*shape)
r = random.random()
res = torch.add(y, r, x)
expected = y + r * x.to_dense()
self.assertEqual(res, expected)
# Non contiguous dense tensor
s = list(shape)
s[0] = shape[-1]
s[-1] = shape[0]
y = self.randn(*s)
y.transpose_(0, len(s) - 1)
r = random.random()
res = torch.add(y, r, x)
expected = y + r * x.to_dense()
self.assertEqual(res, expected)
def test_spadd(self):
def test_shape(*shape):
x, _, _ = self._gen_sparse(len(shape), 10, shape)
y = torch.randn(*shape)
r = random.random()
self._test_spadd_shape([5, 6])
self._test_spadd_shape([10, 10, 10])
self._test_spadd_shape([50, 30, 20])
self._test_spadd_shape([5, 5, 5, 5, 5, 5])
expected = y + r * x.to_dense()
res = torch.add(y, r, x)
def test_spadd_hybrid(self):
self._test_spadd_shape([5, 6], [2, 3])
self._test_spadd_shape([10, 10, 10], [3])
self._test_spadd_shape([50, 30, 20], [2])
self._test_spadd_shape([5, 5, 5, 5, 5, 5], [2])
self.assertEqual(res, expected)
def _test_basic_ops_shape(self, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape)
# Non contiguous dense tensor
s = list(shape)
s[0] = shape[-1]
s[-1] = shape[0]
y = torch.randn(*s).transpose_(0, len(s) - 1)
r = random.random()
y1 = x1 + x2
y2 = x1.clone()
y2.add_(x2)
expected = x1.to_dense() + x2.to_dense()
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
expected = y + r * x.to_dense()
res = torch.add(y, r, x)
y1 = x1 - x2
y2 = x1.clone()
y2.sub_(x2)
expected = x1.to_dense() - x2.to_dense()
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
self.assertEqual(res, expected)
y1 = x1 * x2
y2 = x1.clone()
y2.mul_(x2)
expected = x1.to_dense() * x2.to_dense()
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
test_shape(5, 6)
test_shape(10, 10, 10)
test_shape(50, 30, 20)
test_shape(5, 5, 5, 5, 5, 5)
y1 = x1 * 37.5
y2 = x1.clone()
y2.mul_(37.5)
expected = x1.to_dense() * 37.5
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
y1 = x1 / 37.5
y2 = x1.clone()
y2.div_(37.5)
expected = x1.to_dense() / 37.5
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
# TODO: add back inplace support
y1 = x1 ** 2
y2 = x1.clone()
y2 = y2.pow(2)
expected = x1.to_dense() ** 2
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
y = x1.clone()
y.zero_()
expected = torch.zeros(x1.size())
self.assertEqual(y.to_dense(), expected)
self.assertFalse(x1.is_coalesced())
y = x1.coalesce()
z = x1.coalesce()
self.assertFalse(x1.is_coalesced())
self.assertTrue(y.is_coalesced())
self.assertEqual(x1, y)
# check that coalesce is out of place
y._values().add_(1)
self.assertEqual(z._values() + 1, y._values())
def test_basic_ops(self):
self._test_basic_ops_shape([5, 6])
self._test_basic_ops_shape([10, 10, 10])
self._test_basic_ops_shape([50, 30, 20])
self._test_basic_ops_shape([5, 5, 5, 5, 5, 5])
def test_basic_ops_hybrid(self):
self._test_basic_ops_shape([5, 6], [2, 3])
self._test_basic_ops_shape([10, 10, 10], [3])
self._test_basic_ops_shape([50, 30, 20], [2])
self._test_basic_ops_shape([5, 5, 5, 5, 5, 5], [2])
def _test_sparse_mask_shape(self, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape)
y1 = x1 + x2
y2 = x1.clone()
y2.add_(x2)
expected = x1.to_dense() + x2.to_dense()
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
def _test_sparse_mask_fixed(self):
i = self.IndexTensor([
[1, 3, 0, 4],
[2, 1, 2, 3],
])
v = self.ValueTensor([1, 2, 3, 4])
x = self.SparseTensor(i, v, torch.Size([5, 4])).coalesce()
dense = self.ValueTensor([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
])
exp_v = self.ValueTensor([7, 14, 3, 20])
res = dense._sparse_mask(x)
expected = self.SparseTensor(i, exp_v, torch.Size([5, 4]))
self.assertEqual(res, expected)
def test_sparse_mask(self):
self._test_sparse_mask_fixed()
self._test_sparse_mask_shape([5, 6])
self._test_sparse_mask_shape([10, 10, 10])
self._test_sparse_mask_shape([50, 30, 20])
self._test_sparse_mask_shape([5, 5, 5, 5, 5, 5])
def _test_sparse_mask_hybrid_fixed(self):
i = self.IndexTensor([
[1, 3, 0, 4],
[2, 1, 2, 3],
])
v = self.ValueTensor([[1, 2], [2, 3], [3, 4], [4, 5]])
# TODO: This is also testing that, if coalesce is a no-op,
# the indices don't get permuted. I don't know if we actually
# want to give this invariant.
x = self.SparseTensor(i, v, torch.Size([5, 4, 2])).coalesce()
dense = self.ValueTensor([
[[1, 3], [2, 2], [3, 3], [4, 2]],
[[5, 7], [6, 7], [7, 9], [8, 9]],
[[9, 2], [10, 4], [11, 1], [12, 3]],
[[13, 5], [14, 1], [15, 1], [16, 6]],
[[17, 7], [18, 2], [19, 7], [20, 1]],
])
res = dense._sparse_mask(x)
exp_v = self.ValueTensor([[7, 9], [14, 1], [3, 3], [20, 1]])
expected = self.SparseTensor(i, exp_v, torch.Size([5, 4, 2]))
self.assertEqual(res, expected)
def test_sparse_mask_hybrid(self):
self._test_sparse_mask_hybrid_fixed()
self._test_sparse_mask_shape([5, 6], [2, 3])
self._test_sparse_mask_shape([10, 10, 10], [3])
self._test_sparse_mask_shape([50, 30, 20], [2])
self._test_sparse_mask_shape([5, 5, 5, 5, 5, 5], [2])
@cuda_only
def test_storage_not_null(self):
x = torch.cuda.sparse.FloatTensor(2)
self.assertNotEqual(x.get_device(), -1)
@cuda_only
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_same_gpu(self):
i = self.IndexTensor([[2]]).cuda(1)
v = self.ValueTensor([5]).cuda(1)
x = self.SparseTensor(i, v, torch.Size([3]), device=1)
self.assertEqual(x.get_device(), 1)
self.assertEqual(x._values().get_device(), 1)
self.assertEqual(x._indices().get_device(), 1)
x = self.SparseTensor(3, device=1)
self.assertEqual(x.get_device(), 1)
self.assertEqual(x._values().get_device(), 1)
self.assertEqual(x._indices().get_device(), 1)
v = self.ValueTensor([5]).cuda(0)
self.assertRaises(RuntimeError, lambda: self.SparseTensor(i, v, torch.Size([3])))
class TestUncoalescedSparse(TestSparse):
def setUp(self):
super(TestUncoalescedSparse, self).setUp()
self.is_uncoalesced = True
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
class TestCudaSparse(TestSparse):
def setUp(self):
super(TestCudaSparse, self).setUp()
self.is_cuda = True
self.IndexTensor = torch.cuda.LongTensor
self.ValueTensor = torch.cuda.DoubleTensor
self.SparseTensor = torch.cuda.sparse.DoubleTensor
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
class TestCudaUncoalescedSparse(TestCudaSparse):
def setUp(self):
super(TestCudaUncoalescedSparse, self).setUp()
self.is_uncoalesced = True
if __name__ == '__main__':
run_tests()
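The duplicate-index cases above rely on coalescing summing the values that share an index. A minimal standalone sketch of that behaviour, using the same constructors and accessors as the tests (CPU only):

import torch

i = torch.LongTensor([[0, 0, 2],
                      [0, 0, 3]])      # the index (0, 0) appears twice
v = torch.DoubleTensor([3, 2, 4])
x = torch.sparse.DoubleTensor(i, v, torch.Size([3, 4]))
y = x.coalesce()
# y._indices() should be [[0, 2], [0, 3]] and y._values() should be [5, 4]:
# the duplicate entries at (0, 0) are summed, matching the exp_v values asserted above.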

File diff suppressed because it is too large

View File

@ -6,9 +6,9 @@ import shutil
import random
import tempfile
import unittest
import sys
import traceback
import torch
import torch.utils.data
import torch.cuda
import warnings
from torch.autograd import Variable
@ -19,7 +19,7 @@ from torch.utils.serialization import load_lua
HAS_CUDA = torch.cuda.is_available()
from common import TestCase, run_tests
from common import TestCase, run_tests, download_file
try:
import cffi
@ -108,6 +108,44 @@ class DatasetMock(object):
return 10
class TestDataLoader(TestCase):
def setUp(self):
self.dataset = torch.randn(5, 3, 3, 2)
self.batch_size = 3
def test_single_keep(self):
dataloader = torch.utils.data.DataLoader(self.dataset,
batch_size=self.batch_size,
num_workers=0,
drop_last=False)
dataiter = iter(dataloader)
self.assertEqual(len(list(dataiter)), 2)
def test_single_drop(self):
dataloader = torch.utils.data.DataLoader(self.dataset,
batch_size=self.batch_size,
num_workers=0,
drop_last=True)
dataiter = iter(dataloader)
self.assertEqual(len(list(dataiter)), 1)
def test_multi_keep(self):
dataloader = torch.utils.data.DataLoader(self.dataset,
batch_size=self.batch_size,
num_workers=2,
drop_last=False)
dataiter = iter(dataloader)
self.assertEqual(len(list(dataiter)), 2)
def test_multi_drop(self):
dataloader = torch.utils.data.DataLoader(self.dataset,
batch_size=self.batch_size,
num_workers=2,
drop_last=True)
dataiter = iter(dataloader)
self.assertEqual(len(list(dataiter)), 1)
class TestTrainer(TestCase):
intervals = [
@ -296,40 +334,13 @@ class TestLuaReader(TestCase):
self.assertEqual(grad_input, test['grad_input'])
return do_test
@classmethod
def _download_data(cls, test_file_path):
if os.path.exists(test_file_path):
return
print('Downloading test file for TestLuaReader.')
DATA_URL = 'https://s3.amazonaws.com/pytorch/legacy_modules.t7'
urllib = cls._get_urllib('request')
data = urllib.urlopen(DATA_URL, timeout=15).read()
with open(test_file_path, 'wb') as f:
f.write(data)
@staticmethod
def _get_urllib(submodule):
if sys.version_info < (3,):
import urllib2
return urllib2
else:
import urllib.error
import urllib.request
return getattr(urllib, submodule)
@classmethod
def init(cls):
data_dir = os.path.join(os.path.dirname(__file__), 'data')
test_file_path = os.path.join(data_dir, 'legacy_modules.t7')
urllib = cls._get_urllib('error')
try:
cls._download_data(test_file_path)
except urllib.URLError as e:
warnings.warn(("Couldn't download the test file for TestLuaReader! "
"Tests will be incomplete!"), RuntimeWarning)
path = download_file('https://download.pytorch.org/test_data/legacy_modules.t7')
except unittest.SkipTest:
return
tests = load_lua(test_file_path)
tests = load_lua(path)
for name, test in tests['modules'].items():
test_name = 'test_' + name.replace('nn.', '')
setattr(cls, test_name, cls._module_test(name, test))

View File

@ -4,9 +4,12 @@ from string import Template
from copy import deepcopy
from .plugins import ArgcountChecker, OptionalArguments, ArgumentReferences, \
BeforeAfterCall, ConstantArguments, ReturnArguments, GILRelease
from ..shared import cwrap_common
class cwrap(object):
BASE_INDENT_SIZE = 6
RETURN_WRAPPERS = {
'void': Template('Py_RETURN_NONE;'),
'long': Template('return PyLong_FromLong($result);'),
@ -16,24 +19,28 @@ class cwrap(object):
OPTION_TEMPLATE = Template("""
${els}if ($arg_check) {
$pre_arg_assign
$arg_assign
$code
""")
ARG_ASSIGN_TEMPLATE = Template("""${type} ${name} = ${unpack};""")
OPTION_CODE_TEMPLATE = [
'$call',
'$return_result',
]
FUNCTION_CALL_TEMPLATE = Template("$capture_result$cname($arg_unpack);")
FUNCTION_CALL_TEMPLATE = Template("$capture_result$cname($call_arg);")
DEFAULT_PLUGIN_CLASSES = [ArgcountChecker, ConstantArguments, OptionalArguments,
ArgumentReferences, BeforeAfterCall, ReturnArguments, GILRelease]
def __init__(self, source, destination=None, plugins=[], default_plugins=True):
def __init__(self, source, destination=None, plugins=None, default_plugins=True):
if destination is None:
destination = source.replace('.cwrap', '.cpp')
self.plugins = plugins
self.plugins = [] if plugins is None else plugins
if default_plugins:
defaults = [cls() for cls in self.DEFAULT_PLUGIN_CLASSES]
self.plugins = defaults + self.plugins
@ -45,7 +52,10 @@ class cwrap(object):
with open(source, 'r') as f:
declarations = f.read()
# wrap all the declarations in the source .cwrap file
wrapper = self.wrap_declarations(declarations)
# let each plugin do any post-processing of the wrapped file
for plugin in self.plugins:
wrapper = plugin.process_full_file(wrapper)
@ -67,7 +77,7 @@ class cwrap(object):
elif line == ']]':
in_declaration = False
declaration = yaml.load('\n'.join(declaration_lines))
self.set_declaration_defaults(declaration)
cwrap_common.set_declaration_defaults(declaration)
# Pass declaration in a list - maybe some plugins want to add
# multiple wrappers
@ -95,24 +105,6 @@ class cwrap(object):
return '\n'.join(output)
def set_declaration_defaults(self, declaration):
declaration.setdefault('arguments', [])
declaration.setdefault('return', 'void')
if 'cname' not in declaration:
declaration['cname'] = declaration['name']
# Simulate multiple dispatch, even if it's not necessary
if 'options' not in declaration:
declaration['options'] = [{'arguments': declaration['arguments']}]
del declaration['arguments']
# Parse arguments (some of them can be strings)
for option in declaration['options']:
option['arguments'] = self.parse_arguments(option['arguments'])
# Propagate defaults from declaration to options
for option in declaration['options']:
for k, v in declaration.items():
if k != 'name' and k != 'options':
option.setdefault(k, v)
def parse_arguments(self, args):
new_args = []
for arg in args:
@ -130,6 +122,10 @@ class cwrap(object):
return new_args
def search_plugins(self, fnname, args, fallback):
"""Search plugins for the given function to call with args.
If not found, call fallback with args.
"""
for plugin in self.plugins:
wrapper = getattr(plugin, fnname)(*args)
if wrapper is not None:
@ -148,6 +144,9 @@ class cwrap(object):
def get_wrapper_template(self, declaration):
return self.search_plugins('get_wrapper_template', (declaration,), lambda _: None)
def get_assign_args(self, arguments):
return self.search_plugins('get_assign_args', (arguments,), lambda _: arguments)
def get_arg_accessor(self, arg, option):
def wrap_accessor(arg, _):
if arg.get('idx') is None:
@ -178,9 +177,44 @@ class cwrap(object):
res = tmpl.substitute(arg=accessor, idx=arg.get('idx'))
for plugin in self.plugins:
res = getattr(plugin, plugin_fn_name)(res, arg, accessor)
result.append(res)
return result
def build_option_args(self, arguments, arg_unpack):
assignment = []
call_arg = []
# If types or names need to be changed
arguments = self.get_assign_args(arguments)
for arg, unpack in zip(arguments, arg_unpack):
if arg['type'] == 'CONSTANT':
call_arg.append(unpack)
else:
var_name = "arg_" + str(arg.get('assign_name', arg['name']))
res = self.ARG_ASSIGN_TEMPLATE.substitute(
type=arg['type'],
name=var_name,
unpack=unpack)
if var_name not in call_arg:
assignment.append(res)
call_arg.append(var_name)
return assignment, call_arg
def indent_code(self, code):
if code == '':
return code
code_lines = map(lambda s: s.strip(), code.split('\n'))
code = '\n'
depth = self.BASE_INDENT_SIZE
for line in code_lines:
depth -= line.count('}') * 2
code += ' ' * depth + line + '\n'
depth += line.count('{') * 2
depth += line.count('(') * 4
depth -= line.count(')') * 4
return code[:-1]
def generate_option(self, option, is_first):
checked_args = list(filter(
lambda arg: 'ignore_check' not in arg or not arg['ignore_check'],
@ -199,22 +233,29 @@ class cwrap(object):
for plugin in self.plugins:
arg_checks = plugin.process_all_checks(arg_checks, option)
# Generate unpacks
# Generate pre_arg assign
pre_arg_assign = []
for plugin in self.plugins:
pre_arg_assign = plugin.process_pre_arg_assign(pre_arg_assign, option)
# Generate arg assignment and call arguments
arg_unpack = self.map_selected_arguments('get_type_unpack',
'process_single_unpack', option, option['arguments'])
arg_unpack = ', '.join(arg_unpack)
arg_assign, call_arg = self.build_option_args(option['arguments'], arg_unpack)
call_arg = ', '.join(call_arg)
for plugin in self.plugins:
arg_unpack = plugin.process_all_unpacks(arg_unpack, option)
call_arg = plugin.process_all_call_arg(call_arg, option)
# Generate call
try:
return_result = self.get_return_wrapper(option).substitute()
call = self.FUNCTION_CALL_TEMPLATE.substitute(capture_result='',
cname=option['cname'], arg_unpack=arg_unpack)
cname=option['cname'], call_arg=call_arg)
except KeyError:
return_result = self.get_return_wrapper(option).substitute(result='__result')
call = self.FUNCTION_CALL_TEMPLATE.substitute(capture_result=(option['return'] + ' __result = '),
cname=option['cname'], arg_unpack=arg_unpack)
cname=option['cname'], call_arg=call_arg)
code_template = deepcopy(self.OPTION_CODE_TEMPLATE)
for plugin in self.plugins:
@ -222,19 +263,15 @@ class cwrap(object):
option)
code_template = Template('\n'.join(code_template))
code = code_template.substitute(call=call, return_result=return_result)
code_lines = map(lambda s: s.strip(), code.split('\n'))
code = '\n'
depth = 6
for line in code_lines:
depth -= line.count('}') * 2
code += ' ' * depth + line + '\n'
depth += line.count('{') * 2
depth += line.count('(') * 4
depth -= line.count(')') * 4
code = self.indent_code(code)
pre_arg_assign = self.indent_code('\n'.join(pre_arg_assign))
arg_assign = self.indent_code('\n'.join(arg_assign))
# Put everything together
return self.OPTION_TEMPLATE.substitute(
els=('} else ' if not is_first else ''),
arg_check=arg_checks,
pre_arg_assign=pre_arg_assign,
arg_assign=arg_assign,
code=code,
)
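To make the templates above concrete, here is a purely illustrative substitution (the declaration values, the unpack expression, and the cname are hypothetical, not taken from a real .cwrap file):

from string import Template

ARG_ASSIGN_TEMPLATE = Template("""${type} ${name} = ${unpack};""")
FUNCTION_CALL_TEMPLATE = Template("$capture_result$cname($call_arg);")

assign = ARG_ASSIGN_TEMPLATE.substitute(
    type='THTensor*',
    name='arg_self',
    unpack='((THPTensor*)PyTuple_GET_ITEM(args, 0))->cdata')
call = FUNCTION_CALL_TEMPLATE.substitute(
    capture_result='', cname='THTensor_(zero)', call_arg='arg_self')
# assign -> 'THTensor* arg_self = ((THPTensor*)PyTuple_GET_ITEM(args, 0))->cdata;'
# call   -> 'THTensor_(zero)(arg_self);'
# build_option_args produces one such assignment per non-CONSTANT argument and
# passes the generated variable names to the call through call_arg.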

View File

@ -1,4 +1,6 @@
import os
from . import CWrapPlugin
from ...shared import cwrap_common
class ArgcountSortPlugin(CWrapPlugin):
@ -7,8 +9,7 @@ class ArgcountSortPlugin(CWrapPlugin):
self.descending = descending
def process_declarations(self, declarations):
def num_checked_args(option):
return sum(map(lambda a: not a.get('ignore_check', False), option['arguments']))
for declaration in declarations:
declaration['options'].sort(key=num_checked_args, reverse=self.descending)
cwrap_common.sort_by_number_of_options(declaration,
self.descending)
return declarations

View File

@ -0,0 +1,29 @@
from . import CWrapPlugin
from string import Template
class AssertNDim(CWrapPlugin):
PRE_CODE_TEMPLATE = Template(
"""if(THTensor_(nDimension)(LIBRARY_STATE ${arg_op}) != ${dim_value}) {
THError("Expected argument %s to have %d dimension(s), but has %d",
"${op}", ${dim_value}, THTensor_(nDimension)(LIBRARY_STATE ${arg_op}));
}
""")
def process_option_code_template(self, template, option):
new_code_pre = []
for _, arg in enumerate(option['arguments']):
if 'assert_ndim' not in arg:
continue
dim_value = arg.get('assert_ndim')
op = arg.get('assign_name', arg['name'])
arg_op = "arg_" + op
new_code_pre.append(self.PRE_CODE_TEMPLATE.substitute(op=op,
arg_op=arg_op,
dim_value=dim_value))
template = new_code_pre + template
return template
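For illustration, substituting a hypothetical argument (name "mat", assert_ndim: 2) into the template above yields the guard that gets prepended to the option code:

from string import Template

PRE_CODE_TEMPLATE = Template(
    """if(THTensor_(nDimension)(LIBRARY_STATE ${arg_op}) != ${dim_value}) {
THError("Expected argument %s to have %d dimension(s), but has %d",
"${op}", ${dim_value}, THTensor_(nDimension)(LIBRARY_STATE ${arg_op}));
}
""")
print(PRE_CODE_TEMPLATE.substitute(op='mat', arg_op='arg_mat', dim_value=2))
# Emits a C guard that raises a TH error whenever arg_mat is not 2-dimensional.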

View File

@ -15,7 +15,9 @@ class AutoGPU(CWrapPlugin):
#endif
"""
def process_option_code_template(self, template, option):
def process_pre_arg_assign(self, template, option):
if not option.get('auto_gpu', True):
return template
call = 'THCPAutoGPU __autogpu_guard = THCPAutoGPU(args{});'.format(
', (PyObject*)self' if self.has_self else '')

View File

@ -18,6 +18,11 @@ class BeforeAfterCall(CWrapPlugin):
prepend_str = before_call_template.substitute(args)
template.insert(offset, prepend_str)
def process_pre_arg_assign(self, template, option):
if option.get('before_arg_assign'):
self.insert_snippet(template, option, 0, 'before_arg_assign')
return template
def process_option_code_template(self, template, option):
if option.get('before_call') or option.get('after_call'):
call_idx = template.index('$call')

View File

@ -1,6 +1,12 @@
from . import CWrapPlugin
from string import Template
import sys
if sys.version_info[0] == 3:
string_type = str
else:
string_type = basestring
class BoolOption(CWrapPlugin):
@ -9,11 +15,21 @@ class BoolOption(CWrapPlugin):
def is_bool_option(self, arg):
return arg['type'] == 'bool' and 'if_true' in arg and 'if_false' in arg
def process_declarations(self, declarations):
for declaration in declarations:
for option in declaration['options']:
for arg in option['arguments']:
if self.is_bool_option(arg):
arg['is_bool_option'] = True
if isinstance(arg['if_true'], string_type):
arg['type'] = 'const char*'
return declarations
def get_type_check(self, arg, option):
if self.is_bool_option(arg):
if arg.get('is_bool_option', False):
return Template('PyBool_Check($arg)')
def get_type_unpack(self, arg, option):
if self.is_bool_option(arg):
if arg.get('is_bool_option', False):
return Template(self.UNPACK_TEMPLATE.safe_substitute(
if_true=arg['if_true'], if_false=arg['if_false']))

View File

@ -0,0 +1,318 @@
from . import CWrapPlugin
from string import Template
# Arguments to the Broadcast Plugin:
# broadcast: args_to_broadcast_against [inplace] [fallback]
# [args_to_broadcast_against]: either a single argument (e.g. "arg1") or a comma-separated
# list of two arguments (e.g. "tensor1,tensor2") indicating
# arguments to broadcast the specified argument (usually "self") against
# [inplace] will generate code for an in-place function, which doesn't allow the in-place
# argument to be broadcast
# [fallback] if tensors aren't broadcastable, preserves "element number" pointwise behavior,
# where only the number of elements needs to match, and tensors are viewed as 1-dimensional.
# [dims] specify that the tensors shouldn't be broadcast against a specific tensor or tensors, but against a combination
# of individual dimension sizes of a set of tensors. For example: addbmm(C,A,B) a.k.a. [C + A @ B]
# broadcasts C to the first dimension of A and the second dimension of B. Each dimension is specified as
# [arg].dim[#] and dimensions are comma-separated. So, to specify that the tensor should be
# broadcast to 3-dimensions with sizes:
# tensor0->size[0] x tensor1->size[1] x tensor2->size[2]
# you would write:
# dims:tensor0.dim0,tensor1.dim1,tensor2.dim2
# [types] if the tensors should be of types other than THTensor, specify as X where
# the actual type to use is THXTensor (e.g. Byte for THByteTensor). If the type
# should be THTensor, use 'Real'
# For out of place:
# Two args: expand the two args together
# Three args (fused kernels): (e.g. addcmul) expand all three args together
# Sketch of proof that this is the same:
# consider addcmul, under expansion we want: a + (b * c) = (a + b * c) [all expanded together]
# Let e(i, j) be the expansion of i with j, e(i, j, k) be the expansion of i with j,k
#
# Then a + (b * c) = e(a, e(b,c) * e(c,b)) + e(e(b,c) * e(c,b), a)
# = e(a, e(b,c)) + e(e(b,c) * e(c,b), a) (only size matters for second param)
# = e(a,b,c) + e(e(b,c) * e(c,b), a) (by associativity of max in expand)
# = e(a,b,c) + e(b,c,a) * e(c,b,a) (see L1)
# which is a + b * c all expanded together
#
# L1: Show e(i * j, a) = e(i,a) * e(j,a) where i,j have same size
# Consider any index _{ s_0, ..., s_n}
# e(i * j, a) = (i*j)_{f(s_0), ...,f(s_n)} where f is the expansion of that dimension with a
# = i_{f(s_0), ..., f(s_n)} * j_{f(s_0), ..., f(s_n)} by definition of pointwise operator
# = e(i,a) * e(j,a)
class Broadcast(CWrapPlugin):
# Save and restore passed-in arguments in case later plugins use them
POST_TEMPLATE = Template(
"""${arg_op_other} = ${arg_op_other}_save;\n""")
def getPreArgStringTemplate(self, type=None):
if type is None:
ret = """THTensor *${arg_op_other}_save = ${arg_op_other};
THTensorPtr ${arg_op_other}_guard(THTensor_(new)(LIBRARY_STATE_NOARGS));\n"""
else:
cpu_t = "TH" + type + "Tensor"
gpu_t = "THCuda" + type + "Tensor"
ret = ("#if !IS_CUDA\n" +
cpu_t + " *${arg_op_other}_save = ${arg_op_other};\n" +
cpu_t + "Ptr ${arg_op_other}_guard(" + cpu_t + "_new(LIBRARY_STATE_NOARGS));\n" +
"#else\n" +
gpu_t + " *${arg_op_other}_save = ${arg_op_other};\n" +
"THPPointer<" + gpu_t + "> ${arg_op_other}_guard(\n" + gpu_t + "_new(LIBRARY_STATE_NOARGS));\n" +
"#endif\n")
return Template(ret)
def getExpandTemplate(self, expand_call, success_code, raise_errors):
if not raise_errors:
return Template(
"bool expand_success = false;\n" +
"try {\n" +
expand_call +
"\nexpand_success = true;\n" +
"}\n"
"catch (std::exception &e) {}\n" +
"if(expand_success) {\n" +
success_code +
"\n}\n")
else:
return Template(
expand_call + "\n" +
success_code + "\n")
def getOutPlacePreExpand2Template(self, raise_errors):
expand_code = """expand_outplace2(LIBRARY_STATE ${arg_op_a}_guard.get(), ${arg_op_other}_guard.get(),
${arg_op_a}, ${arg_op_other},
\"${op_a}\", \"${op_other}\", !${raise_errors});"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();
${arg_op_other} = ${arg_op_other}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
def getOutPlacePreExpand3Template(self, raise_errors):
expand_code = """expand_outplace3(LIBRARY_STATE ${arg_op_a}_guard.get(),
${arg_op_other1}_guard.get(), ${arg_op_other2}_guard.get(),
${arg_op_a}, ${arg_op_other1}, ${arg_op_other2},
\"${op_a}\", \"${op_other1}\", \"${op_other2}\", !${raise_errors});"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();
${arg_op_other1} = ${arg_op_other1}_guard.get();
${arg_op_other2} = ${arg_op_other2}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
OUT_PLACE_PRE_EXPAND_PRE_DIM_TEMPLATE = Template(
"""if(THTensor_(nDimension)(LIBRARY_STATE ${arg_op_dim}) <= ${arg_op_dim_value}) {
THError("Argument %s requires at least %d dimensions, but only has %d",
"${op_dim}", ${arg_op_dim_value} + 1, THTensor_(nDimension)(LIBRARY_STATE ${arg_op_dim}));
}
long ${arg_op_a}_dim${idx}_size = THTensor_(size)(LIBRARY_STATE ${arg_op_dim}, ${arg_op_dim_value});\n""")
OUT_PLACE_PRE_EXPAND1_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(THLongStorage_newWithSize1(${arg_op_a}_dim0_size));\n""")
OUT_PLACE_PRE_EXPAND2_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(
THLongStorage_newWithSize2(${arg_op_a}_dim0_size, ${arg_op_a}_dim1_size));\n""")
OUT_PLACE_PRE_EXPAND3_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(
THLongStorage_newWithSize3(${arg_op_a}_dim0_size, ${arg_op_a}_dim1_size, ${arg_op_a}_dim2_size));\n""")
def getOutPlacePreExpandPostDimTemplate(self, raise_errors):
expand_code = """expand(LIBRARY_STATE ${arg_op_a}_guard.get(), ${arg_op_a}, ${arg_op_a}_storage);"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
OUT_PLACE_PRE_TEMPLATE = Template(
"""${code_arg_op_a}${code_arg_op_other1}${code_arg_op_other2}
${expand_code}""")
def getInPlacePreExpand1Template(self, raise_errors):
expand_code = """expand_inplace1(LIBRARY_STATE ${arg_op_other}_guard.get(), ${arg_op_other}, ${arg_op_a},
\"${op_other}\", \"${op_a}\", !${raise_errors});"""
success_code = """${arg_op_other} = ${arg_op_other}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
def getInPlacePreExpand2Template(self, raise_errors):
expand_code = """expand_inplace2(LIBRARY_STATE ${arg_op_other1}_guard.get(), ${arg_op_other2}_guard.get(),
${arg_op_other1}, ${arg_op_other2}, ${arg_op_a},
\"${op_other1}\", \"${op_other2}\", \"${op_a}\", !${raise_errors});"""
success_code = """${arg_op_other1} = ${arg_op_other1}_guard.get();
${arg_op_other2} = ${arg_op_other2}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
IN_PLACE_PRE_TEMPLATE = Template(
"""${code_arg_op_other1}${code_arg_op_other2}
${expand_code}""")
def initialize(self, cwrap):
self.cwrap = cwrap
# Arguments:
# [0]: name of tensor to broadcast with (possibly two, comma-separated)
# [1] inplace (optional). In-place operations only broadcast the second tensor argument
# [2] fallback (optional). Will fall back to applying to tensors with an equal number of elements if broadcast fails
def process_option_code_template(self, template, option):
new_code_pre = []
new_code_post = []
for _, arg in enumerate(option['arguments']):
if 'broadcast' not in arg:
continue
params = arg.get('broadcast').split(" ")
op_a = arg.get('assign_name', arg['name'])
in_place = "inplace" in params
raise_errors = "false" if "fallback" in params else "true"
param_others = params[0].split(",")
if len(param_others) > 2:
raise ValueError('Broadcast only supports up to 2 secondary parameters')
op_b = param_others[0]
op_c = param_others[1] if len(param_others) == 2 else None
arg_op_b = "arg_" + op_b
arg_op_a = "arg_" + op_a
arg_op_c = ("arg_" + op_c) if op_c else None
dims_kvs = []
for p in params:
if p.startswith("dims:"):
assert(raise_errors == "true")
if len(dims_kvs) != 0:
raise ValueError("multiple specifications of dims")
dims = p[len("dims:"):].split(",")
for dim in dims:
batchdim = dim.split(".")
assert len(batchdim) == 2
assert batchdim[1].startswith("dim")
dim_val = batchdim[1][len("dim"):]
dims_kvs.append({"op": batchdim[0], "arg_op": "arg_" + batchdim[0], "val": dim_val})
assert len(dims_kvs) <= 3
for p in params[1:]:
if p != "inplace" and p != "fallback" and not p.startswith("dims:") and not p.startswith("types:"):
raise ValueError("invalid parameter {}".format(p))
type_op_b = None
type_op_c = None
for p in params:
if p.startswith("types:"):
if not in_place and len(dims_kvs) > 0:
raise ValueError("type specification not supported yet for out-of-place functions "
"that specify explicit dimensions")
types = p[len("types:"):].split(",")
assert(len(types) == (2 if op_c else 1))
type_op_b = None if types[0] == "Real" else types[0]
if op_c:
type_op_c = None if types[1] == "Real" else types[1]
op_b_mapping = {
"op_a": op_a,
"op_other": op_b,
"arg_op_a": arg_op_a,
"arg_op_other": arg_op_b,
"raise_errors": raise_errors
}
op_c_mapping = {
"op_a": op_a,
"op_other": op_c,
"arg_op_a": arg_op_a,
"arg_op_other": arg_op_c,
"raise_errors": raise_errors
}
if in_place:
code_arg_op_other1 = self.getPreArgStringTemplate(type=type_op_b).substitute(op_b_mapping)
code_arg_op_other2 = (
self.getPreArgStringTemplate(type=type_op_c).substitute(op_c_mapping) if op_c else "")
if op_c:
expand_code = self.getInPlacePreExpand2Template(raise_errors == "true").substitute(
op_b_mapping,
op_other1=op_b,
op_other2=op_c,
arg_op_other1=arg_op_b,
arg_op_other2=arg_op_c)
else:
expand_code = self.getInPlacePreExpand1Template(raise_errors == "true").substitute(op_b_mapping)
new_code_pre.append(self.IN_PLACE_PRE_TEMPLATE.substitute(
arg_op_a=arg_op_a,
code_arg_op_other1=code_arg_op_other1,
code_arg_op_other2=code_arg_op_other2,
expand_code=expand_code,
raise_errors=raise_errors))
new_code_pre.append("")
post_code = self.POST_TEMPLATE.substitute(op_b_mapping)
if op_c:
post_code += self.POST_TEMPLATE.substitute(op_c_mapping)
new_code_post.append(post_code)
new_code_post.append("")
else:
if len(dims_kvs) != 0:
code_arg_op_a = self.getPreArgStringTemplate().substitute(arg_op_other=arg_op_a)
code_arg_op_other1 = ""
code_arg_op_other2 = ""
expand_code = ""
for idx, kv in enumerate(dims_kvs):
expand_code += self.OUT_PLACE_PRE_EXPAND_PRE_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
op_dim=kv["op"],
arg_op_dim=kv["arg_op"],
arg_op_dim_value=kv["val"],
idx=idx)
if len(dims_kvs) == 1:
expand_code += self.OUT_PLACE_PRE_EXPAND1_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"])
elif len(dims_kvs) == 2:
expand_code += self.OUT_PLACE_PRE_EXPAND2_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"],
arg_op_dim1=dims_kvs[1]["arg_op"])
else:
expand_code += self.OUT_PLACE_PRE_EXPAND3_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"],
arg_op_dim1=dims_kvs[1]["arg_op"],
arg_op_dim2=dims_kvs[2]["arg_op"])
expand_code += self.getOutPlacePreExpandPostDimTemplate(raise_errors == "true").substitute(
arg_op_a=arg_op_a,
raise_errors=raise_errors)
post_code = self.POST_TEMPLATE.substitute(arg_op_other=arg_op_a)
else:
code_arg_op_a = self.getPreArgStringTemplate().substitute(arg_op_other=arg_op_a)
code_arg_op_other1 = self.getPreArgStringTemplate(type=type_op_b).substitute(op_b_mapping)
code_arg_op_other2 = (self.getPreArgStringTemplate(type=type_op_c).substitute(op_c_mapping)
if op_c else "")
if op_c:
expand_code = self.getOutPlacePreExpand3Template(raise_errors == "true").substitute(
op_b_mapping,
op_other1=op_b,
op_other2=op_c,
arg_op_other1=arg_op_b,
arg_op_other2=arg_op_c)
else:
expand_code = self.getOutPlacePreExpand2Template(
raise_errors == "true").substitute(op_b_mapping)
post_code = self.POST_TEMPLATE.substitute(arg_op_other=arg_op_a)
post_code += self.POST_TEMPLATE.substitute(op_b_mapping)
post_code += self.POST_TEMPLATE.substitute(op_c_mapping) if op_c else ""
new_code_pre.append(self.OUT_PLACE_PRE_TEMPLATE.substitute(
code_arg_op_a=code_arg_op_a,
code_arg_op_other1=code_arg_op_other1,
code_arg_op_other2=code_arg_op_other2,
expand_code=expand_code))
new_code_pre.append("")
new_code_post.append(post_code)
new_code_post.append("")
template = new_code_pre + template + new_code_post
return template
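A quick numeric sanity check of the expansion argument sketched in the comment block at the top of this plugin (a standalone illustrative snippet, unrelated to the generated wrapper code):

import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
c = torch.randn(1, 4)
# Expanding b and c against each other first, then broadcasting the result with a,
# gives the same answer as expanding all three operands together.
pairwise = a + (b * c)
joint = a.expand(3, 4) + b.expand(3, 4) * c.expand(3, 4)
print((pairwise - joint).abs().max())   # ~0, up to floating point error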

View File

@ -1,4 +1,5 @@
from string import Template
import copy
from copy import deepcopy
from . import CWrapPlugin
from itertools import product
@ -17,6 +18,10 @@ class CuDNNPlugin(CWrapPlugin):
'double': Template('THPDoubleUtils_unpackReal($arg)'),
}
INPUT_ARGUMENT_MAP = {
'THTensor*': 'THVoidTensor*',
}
TYPE_CHECK = {
'Convolution*': Template('THPWrapper_check($arg)'),
'THTensor*': Template('(PyObject*)Py_TYPE($arg) == tensorClass'),
@ -79,6 +84,16 @@ static PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
def get_type_check(self, arg, option):
return self.TYPE_CHECK.get(arg['type'], None)
def get_assign_args(self, arguments):
assign_args = []
for arg in arguments:
arg = copy.copy(arg)
new_type = self.INPUT_ARGUMENT_MAP.get(arg['type'])
if new_type is not None:
arg['type'] = new_type
assign_args.append(arg)
return assign_args
def get_wrapper_template(self, declaration):
arg_desc = []
for option in declaration['options']:
@ -120,7 +135,7 @@ static PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
if arg['name'] in ['self', 'state', 'dataType', 'handle']:
arg['ignore_check'] = True
declaration['options'] = self.filter_unique_options(declaration['options'])
return declarations
return [d for d in declarations if not d.get('only_register', False)]
def filter_unique_options(self, options):
def signature(option):
@ -143,7 +158,7 @@ static PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
return self.preprocessor_guard(code, declaration['defined_if'])
return code
def process_all_unpacks(self, code, option):
def process_all_call_arg(self, code, option):
return 'state, ' + code
def declare_methods(self):

View File

@ -23,6 +23,8 @@ class GILRelease(CWrapPlugin):
]
def process_option_code_template(self, template, option):
if option.get('with_gil', False):
return template
call_idx = template.index('$call')
template.insert(call_idx, self.BEFORE_CALL)
template.insert(call_idx + 2, self.AFTER_CALL)

View File

@ -64,8 +64,9 @@ void $name($args)
'THTensor*': 'thpp::Tensor*',
'THCTensor*': 'thpp::Tensor*',
'THIndexTensor*': 'thpp::Tensor*',
'THCIndexTensor*': 'thpp::Tensor*',
'THIndex_t': 'long',
'real': 'double',
'accreal': 'double',
}
def __init__(self, header=False):
@ -89,8 +90,8 @@ void $name($args)
base_args = declaration['options'][0]['arguments']
for option in declaration['options']:
for idx, arg in enumerate(option['arguments']):
arg['formal_name'] = base_args[idx]['name']
arg['formal_type'] = base_args[idx]['type']
arg['assign_name'] = base_args[idx]['name']
arg['assign_type'] = base_args[idx]['type']
if idx != 1:
arg['ignore_check'] = True
return declarations
@ -98,11 +99,19 @@ void $name($args)
def get_arg_accessor(self, arg, option):
return self.get_type_unpack(arg, option)
def process_pre_arg_assign(self, pre_arg_assign, option):
if option['backend'] == 'cunn':
# Enclose arg_assign with CUDA guard
pre_arg_assign.append('#ifdef WITH_CUDA')
return pre_arg_assign
def process_option_code_template(self, template, option):
code = '// fill me in'
template = []
if option['backend'] == 'cunn':
template.append('#endif')
def base_cast(arg, CReal, real):
name = arg['formal_name']
name = 'arg_' + arg['assign_name']
type = arg['type']
if type in self.REAL_TENSOR_TYPES:
return ('(TH{CReal}Tensor*){name}->cdata()'
@ -120,7 +129,7 @@ void $name($args)
def cast(arg, CReal, real):
expr = base_cast(arg, CReal, real)
if arg.get('optional', False):
name = arg['formal_name']
name = 'arg_' + arg['assign_name']
return '{name} ? {expr} : NULL'.format(name=name, expr=expr)
return expr
@ -135,6 +144,7 @@ void $name($args)
name=option['cname'],
float_args=',\n'.join(float_args),
double_args=',\n'.join(double_args))
template.append(code)
elif option['backend'] == 'cunn':
float_args = []
@ -150,11 +160,13 @@ void $name($args)
float_args=',\n'.join(float_args),
double_args=',\n'.join(double_args),
half_args=',\n'.join(half_args))
template.append(code)
return [code, '']
template.append('')
return template
def get_type_unpack(self, arg, option):
return Template(arg['name'])
return Template(arg.get('assign_name', arg['name']))
def get_type_check(self, arg, option):
if option['backend'] == 'cunn':
@ -162,20 +174,20 @@ void $name($args)
else:
return Template('!is_cuda')
def get_formal_args(self, arguments):
formal_args = []
def get_assign_args(self, arguments):
assign_args = []
for arg in arguments:
arg = copy.copy(arg)
new_type = self.INPUT_ARGUMENT_MAP.get(arg['type'])
if new_type is not None:
arg['type'] = new_type
formal_args.append(arg)
return formal_args
assign_args.append(arg)
return assign_args
def get_wrapper_template(self, declaration):
# get formal arguments string
# get assign arguments string
base_arguments = declaration['options'][0]['arguments']
args = self.get_formal_args(base_arguments)
args = self.get_assign_args(base_arguments)
arg_str = ', '.join([arg['type'] + ' ' + arg['name'] for arg in args])
if self.header:
@ -185,7 +197,7 @@ void $name($args)
checked_args = []
for arg in base_arguments:
if arg['type'] in tensor_types:
name = arg.get('formal_name', arg['name'])
name = arg.get('assign_name', arg['name'])
name_str = name
if arg.get('optional', False):
name_str = '?' + name_str

View File

@ -24,6 +24,14 @@ class KwargsPlugin(CWrapPlugin):
for option in declaration['options']:
for arg in option['arguments']:
arg['no_kwargs'] = True
# we need to use offsets for arg positions in *args if kwarg_only args
# are not at the end
for declaration in declarations:
for option in declaration['options']:
offset = 0
for arg in option['arguments']:
if arg.get('kwarg_only'):
arg['no_idx'] = True
return declarations
def get_arg_accessor(self, arg, option):
@ -53,9 +61,9 @@ class KwargsPlugin(CWrapPlugin):
name not in seen_args):
seen_args.add(name)
args.append(name)
declarations = '\n '.join(['PyObject *__kw_{} = NULL;'.format(name) for name in args])
declarations = '\n '.join(['PyObject *__kw_{} = NULL;'.format(a) for a in args])
lookups = '\n '.join(
['__kw_{name} = PyDict_GetItemString(kwargs, "{name}");'.format(name=name) for name in args])
['__kw_{name} = PyDict_GetItemString(kwargs, "{name}");'.format(name=a) for a in args])
start_idx = code.find('{') + 1
new_code = self.WRAPPER_TEMPLATE.substitute(declarations=declarations, lookups=lookups)
return code[:start_idx] + new_code + code[start_idx:]

View File

@ -1,58 +1,18 @@
import os
from copy import deepcopy
from . import CWrapPlugin
from itertools import product
from ...shared import cwrap_common
class OptionalArguments(CWrapPlugin):
def process_declarations(self, declarations):
new_options = []
for declaration in declarations:
for option in declaration['options']:
optional_args = []
for i, arg in enumerate(option['arguments']):
if 'default' in arg:
optional_args.append(i)
for permutation in product((True, False), repeat=len(optional_args)):
option_copy = deepcopy(option)
for i, bit in zip(optional_args, permutation):
arg = option_copy['arguments'][i]
if not bit:
arg['type'] = 'CONSTANT'
arg['ignore_check'] = True
# PyYAML interprets NULL as None...
arg['name'] = 'NULL' if arg['default'] is None else arg['default']
new_options.append(option_copy)
declaration['options'] = self.filter_unique_options(new_options)
return declarations
cwrap_common.enumerate_options_due_to_default(
declaration,
allow_kwarg=True,
type_to_signature={},
remove_self=False)
def filter_unique_options(self, options):
def signature(option, kwarg_only_count):
if kwarg_only_count == 0:
kwarg_only_count = None
else:
kwarg_only_count = -kwarg_only_count
arg_signature = '#'.join(
arg['type']
for arg in option['arguments'][:kwarg_only_count]
if not arg.get('ignore_check'))
if kwarg_only_count is None:
return arg_signature
kwarg_only_signature = '#'.join(
arg['name'] + '#' + arg['type']
for arg in option['arguments'][kwarg_only_count:]
if not arg.get('ignore_check'))
return arg_signature + "#-#" + kwarg_only_signature
seen_signatures = set()
unique = []
for option in options:
for num_kwarg_only in range(0, len(option['arguments']) + 1):
sig = signature(option, num_kwarg_only)
if sig not in seen_signatures:
if num_kwarg_only > 0:
for arg in option['arguments'][-num_kwarg_only:]:
arg['kwarg_only'] = True
unique.append(option)
seen_signatures.add(sig)
break
return unique
return declarations

View File

@ -0,0 +1,90 @@
from copy import deepcopy
from . import CWrapPlugin
import yaml
class ProcessorSpecificPlugin(CWrapPlugin):
def process_declarations(self, declarations):
# In order to move Torch's random functions into the same cwrap
# declaration, we need to be able to handle the fact that on the CPU
# these functions take a generator argument, while on the GPU, they
# do not. As such, we would like to split those declarations at cwrap
# runtime into two separate declarations, one for the CPU (unchanged),
# and one for the GPU (with the generator argument removed).
#
# For example, the declaration arguments:
# arguments:
# - THTensor* self
# - arg: THGenerator* generator
# default: THPDefaultGenerator->cdata
# kwarg_only: True
#
# Would have the generator argument removed when generating for the GPU
# backend.
def arg_contains_generator(arg):
return (arg['type'] == 'THGenerator*' or (arg.get('default', None)
is not None and 'THPDefaultGenerator' in
str(arg.get('default', ""))))
def split_candidate(declaration):
# First, check and see if it is a declaration for both CPU/GPU
if all([proc in declaration['backends'] for
proc in ['CPU', 'CUDA']]):
for option in declaration['options']:
for argument in option['arguments']:
if arg_contains_generator(argument):
return True
return False
def can_we_handle_the_split(declaration):
# hook into here if the split cannot happen for some reason
return True
def generator_split(declaration):
# the split must make two changes: 1. remove the generator argument
# for the GPU, and 2. assign the correct backends/types to the
# split declaration
dec_cpu = declaration
dec_gpu = deepcopy(declaration)
# Remove GPU backend and types from dec_cpu
dec_cpu['backends'].remove('CUDA')
if dec_cpu.get('backend_type_pairs', False):
dec_cpu['backend_type_pairs'] = (
[pair for pair in dec_cpu['backend_type_pairs'] if
pair[1] == 'CPU'])
# also need to reach into options
for option in dec_cpu['options']:
option['backends'].remove('CUDA')
# Remove CPU backend and types from dec_gpu
dec_gpu['backends'].remove('CPU')
if dec_gpu.get('backend_type_pairs', False):
dec_gpu['backend_type_pairs'] = (
[pair for pair in dec_gpu['backend_type_pairs'] if
pair[1] == 'CUDA'])
# also need to reach into options
for option in dec_gpu['options']:
option['backends'].remove('CPU')
# Remove generator arguments from dec_gpu options
for option in dec_gpu['options']:
option['arguments'] = (
[arg for arg in option['arguments'] if
not arg_contains_generator(arg)])
return [dec_cpu, dec_gpu]
decs = []
for declaration in declarations:
if split_candidate(declaration):
assert(can_we_handle_the_split(declaration))
newdecs = generator_split(declaration)
decs.extend(newdecs)
else:
decs.append(declaration)
return decs
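Schematically, the split above maps one combined declaration to a CPU copy and a CUDA copy (abbreviated, hypothetical fields):

# before: {'backends': ['CPU', 'CUDA'],
#          'options': [{'backends': ['CPU', 'CUDA'],
#                       'arguments': [<tensor arg>, <THGenerator* arg>]}]}
# after:  [{'backends': ['CPU'],  'options': [... arguments unchanged ...]},
#          {'backends': ['CUDA'], 'options': [... generator argument removed ...]}]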

View File

@ -15,9 +15,12 @@ class THPPlugin(CWrapPlugin):
'THTensor*': Template('((THPTensor*)$arg)->cdata'),
'THBoolTensor*': Template('((THPBoolTensor*)$arg)->cdata'),
'THIndexTensor*': Template('((THPIndexTensor*)$arg)->cdata'),
'THIntegerTensor*': Template('((THPIntegerTensor*)$arg)->cdata'),
'THCudaTensor*': Template('((THCPFloatTensor*)$arg)->cdata'),
'THCudaDoubleTensor*': Template('((THCPDoubleTensor*)$arg)->cdata'),
'THCudaIntTensor*': Template('((THCPIntTensor*)$arg)->cdata'),
'THCudaLongTensor*': Template('((THCPLongTensor*)$arg)->cdata'),
'THSFloatTensor*': Template('((THSPFloatTensor*)$arg)->cdata'),
'THSDoubleTensor*': Template('((THSPDoubleTensor*)$arg)->cdata'),
@ -50,9 +53,12 @@ class THPPlugin(CWrapPlugin):
'THTensor*': Template('(PyObject*)Py_TYPE($arg) == THPTensorClass'),
'THBoolTensor*': Template('(PyObject*)Py_TYPE($arg) == THPBoolTensorClass'),
'THIndexTensor*': Template('(PyObject*)Py_TYPE($arg) == THPIndexTensorClass'),
'THIntegerTensor*': Template('(PyObject*)Py_TYPE($arg) == THPIntegerTensorClass'),
'THCudaTensor*': Template('(PyObject*)Py_TYPE($arg) == THCPFloatTensorClass'),
'THCudaDoubleTensor*': Template('(PyObject*)Py_TYPE($arg) == THCPDoubleTensorClass'),
'THCudaIntTensor*': Template('(PyObject*)Py_TYPE($arg) == THCPIntTensorClass'),
'THCudaLongTensor*': Template('(PyObject*)Py_TYPE($arg) == THCPLongTensorClass'),
'THSDoubleTensor*': Template('(PyObject*)Py_TYPE($arg) == THSPDoubleTensorClass'),
'THSFloatTensor*': Template('(PyObject*)Py_TYPE($arg) == THSPFloatTensorClass'),
@ -82,8 +88,11 @@ class THPPlugin(CWrapPlugin):
RETURN_WRAPPER = {
'THTensor*': Template('return THPTensor_(New)($result);'),
'THSTensor*': Template('return THSPTensor_(New)($result);'),
'THIndexTensor*': Template('return THPIndexTensor_(New)($result);'),
'THLongTensor*': Template('return THPLongTensor_New($result);'),
'THLongStorage*': Template('return THPLongStorage_New($result);'),
'THCudaIntTensor*': Template('return THCPIntTensor_New($result);'),
'THCudaLongTensor*': Template('return THCPLongTensor_New($result);'),
# TODO: make it smarter - it should return python long if result doesn't fit into an int
'long': Template('return PyInt_FromLong($result);'),
'accreal': Template('return THPUtils_(newAccreal)($result);'),
@ -118,7 +127,7 @@ PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
""")
ALLOCATE_TMPL = Template("""\
THP${type}TensorPtr _${name}_guard = (THP${type}Tensor*) THP${type}Tensor_NewEmpty();
THP${type}TensorPtr _${name}_guard((THP${type}Tensor*) THP${type}Tensor_NewEmpty());
if (!_${name}_guard.get()) return NULL;
THP${type}Tensor* $name = _${name}_guard.get();
""")
@ -149,6 +158,7 @@ ${cpu}
'THIntTensor*': _allocate('Int', ALLOCATE_TMPL),
'THBoolTensor*': _allocate('Byte', ALLOCATE_TMPL, ALLOCATE_CUDA),
'THIndexTensor*': _allocate('Long', ALLOCATE_TMPL, ALLOCATE_CUDA),
'THIntegerTensor*': _allocate('Int', ALLOCATE_TMPL, ALLOCATE_CUDA),
'THSTensor*': _allocate('', ALLOCATE_TMPL, sparse=True),
}
@ -163,10 +173,13 @@ ${cpu}
'THIntTensor*': '" THPModuleStr "IntTensor',
'THBoolTensor*': '" THPModuleStr "ByteTensor',
'THIndexTensor*': '" THPModuleStr "LongTensor',
'THIntegerTensor*': '" THPModuleStr "IntTensor',
'THFloatTensor*': '" THPModuleStr "FloatTensor',
'THDoubleTensor*': '" THPModuleStr "DoubleTensor',
'THCudaTensor*': 'torch.cuda.FloatTensor',
'THCudaDoubleTensor*': 'torch.cuda.DoubleTensor',
'THCudaIntTensor*': 'torch.cuda.IntTensor',
'THCudaLongTensor*': 'torch.cuda.LongTensor',
'THSize*': 'torch.Size',
'THStride*': 'tuple',
'long': 'int',
@ -174,10 +187,12 @@ ${cpu}
'double': 'float',
'accreal': '" RealStr "',
'bool': 'bool',
'const char*': 'bool', # Can come only from bool option.
}
OUT_INIT = """
__out = kwargs ? PyDict_GetItemString(kwargs, "out") : NULL;
if (__out == Py_None) { __out = NULL; __dictcount--; __argcount--; }
"""
def __init__(self):
@ -303,8 +318,6 @@ ${cpu}
def process_declarations(self, declarations):
new_declarations = []
register_only = [d for d in declarations if d.get('only_register', False)]
declarations = [d for d in declarations if not d.get('only_register', False)]
def has_arg_type(declaration, type_name):
return any(arg['type'] == type_name
@ -321,9 +334,101 @@ ${cpu}
for option in declaration['options']
for arg in option['arguments'])
def backends_types_to_defined_if_string(declaration):
# A declaration has two fields: 'backends', which stores a list of
# backends (currently 'CPU' and 'CUDA') the declaration applies
# to, and 'types', which stores a list of real types the
# declaration applies to. In PyTorch, when a function is only
# supported by a subset of types, we wrap it in macro definition
# checks.
#
# Previously, we manually required the cwrap declaration to
# specify which backend/type combinations a function was
# defined for. Now, we explicitly list the types and backends for
# a declaration, if it should only be supported for a specific
# subset of types, backends, or type-backend pairs.
types = declaration.get('types', [])
backends = declaration['backends']
all_backends = ['CPU', 'CUDA']
def get_defined_string(backend, real):
if backend == 'CUDA':
if real == 'all':
return "IS_CUDA"
else:
return 'CUDA_{0}'.format(real.upper())
else:
if real == 'all':
return "!IS_CUDA"
else:
return 'defined(TH_REAL_IS_{0})'.format(real.upper())
def expand_composite_type(p, t):
if t == 'floating_point':
result = ['double', 'float']
if p == 'CUDA':
result.append('half')
elif t == 'integral':
result = ['byte', 'char', 'short', 'int', 'long']
else:
result = [t]
return result
defineds = []
# The logic below does not handle corner cases well. We allow the
# declaration to have a field 'backend_type_pairs' that stores a
# dictionary from type --> backend representing allowed
# combinations. Let's use these first.
for pair in declaration.get('backend_type_pairs', []):
p, t = pair
defineds.extend([get_defined_string(p, et) for et in
expand_composite_type(p, t)])
# In the base case, types is empty and backends contains both
# 'CPU' and 'CUDA' --> this means we support all types, and our
# string should be empty, or simply the list of explicit
# type-backend pairs
if (len(types) == 0 and all([proc in backends for proc in
all_backends])):
return " || ".join(defineds)
# Case 2: types is empty, but only one backend type is specified
if len(types) == 0 and len(backends) == 1:
defineds.append('IS_CUDA' if backends[0] == 'CUDA' else
"!IS_CUDA")
return " || ".join(defineds)
# Otherwise, we loop over all of the (backend, type) pairs and add
# them
for p in backends:
for t in types:
defineds.extend([get_defined_string(p, et) for et in
expand_composite_type(p, t)])
return " || ".join(defineds)
for declaration in declarations:
# Disable all methods for THHalfTensor, unless cpu_half is True
dfstr = backends_types_to_defined_if_string(declaration)
if len(dfstr) > 0:
# for now, need to check for distributed defined if as well
if 'defined_if' in declaration:
declaration['defined_if'] += ' && (' + dfstr + ')'
else:
declaration['defined_if'] = dfstr
if not declaration.get('cpu_half', False):
defined_if = '!defined(TH_REAL_IS_HALF)'
if 'defined_if' in declaration:
defined_if += ' && (' + declaration['defined_if'] + ')'
declaration['defined_if'] = defined_if
if declaration.get('only_register', False):
continue
declaration.setdefault('python_name', declaration['name'])
declaration.setdefault('variables', [])
if has_arg_type(declaration, 'THSize*'):
@ -334,15 +439,23 @@ ${cpu}
declaration['variables'] += ['PyObject *__out;']
self.generate_out_options(declaration)
if has_long_args(declaration):
declaration['no_kwargs'] = True
for option in declaration['options']:
for arg in option['arguments']:
if arg.get('long_args', False):
arg['no_kwargs'] = True
for option in declaration['options']:
option['cname'] = 'TH{}Tensor_({})'.format(
'S' if option.get('sparse', False) else '', option['cname'])
if declaration.get('with_stateless', False) or declaration.get('only_stateless', False):
if option.get('sparse', False):
defined_if = option.get('defined_if', '')
option['defined_if'] = '!IS_DISTRIBUTED' + (' && ' if defined_if else '') + defined_if
variants = declaration.get('variants', ['method'])
if 'function' in variants:
stateless_declaration = self.make_stateless(declaration)
new_declarations.append(stateless_declaration)
self.stateless_declarations.append(stateless_declaration)
if declaration.get('only_stateless', False):
if 'method' not in variants:
continue
self.declarations.append(declaration)
@ -353,9 +466,15 @@ ${cpu}
if arg['name'] == 'self':
arg['ignore_check'] = True
declarations = [d for d in declarations if not d.get('only_stateless', False)]
self.declarations.extend(filter(lambda x: not x.get('only_stateless', False), register_only))
self.stateless_declarations.extend(filter(lambda x: x.get('only_stateless', False), register_only))
register_only = [d for d in declarations if d.get('only_register', False)]
declarations = [d for d in declarations
if (('method' in d.get('variants', ['method'])) and
(not d.get('only_register', False)))]
self.declarations.extend(filter(lambda x: 'method' in x.get('variants',
['method']), register_only))
self.stateless_declarations.extend(filter(lambda x: 'method' not in
x.get('variants', ['method']),
register_only))
self.process_docstrings()
@ -369,6 +488,7 @@ ${cpu}
for option in declaration['options']:
for arg in option['arguments']:
if arg['name'] == 'self':
arg['assign_name'] = 'self'
arg['name'] = 'source'
return declaration
@ -390,11 +510,14 @@ ${cpu}
if 'defined_if' in declaration:
entry = self.preprocessor_guard(entry, declaration['defined_if'])
tensor_methods += entry
return self.TENSOR_METHODS_DECLARATION.substitute(
generated = self.TENSOR_METHODS_DECLARATION.substitute(
methods=tensor_methods,
stateless=('' if not stateless else 'stateless_'),
sparse=('' if not sparse else 'S'),
)
if sparse:
generated = '#if !defined(TH_REAL_IS_HALF) && !IS_DISTRIBUTED\n' + generated + '\n#endif\n\n'
return generated
def process_full_file(self, code):
# We have to find a place before all undefs
@ -415,7 +538,7 @@ ${cpu}
return self.preprocessor_guard(code, declaration['defined_if'])
return code
def process_all_unpacks(self, code, option):
def process_all_call_arg(self, code, option):
return 'LIBRARY_STATE ' + code
def process_all_checks(self, code, option):
@ -434,12 +557,25 @@ ${cpu}
if any(arg.get('long_args', False) for arg in option['arguments']):
code = code.replace('__argcount ==', '__argcount >=')
expected = str(int(option.get('output_provided', False)))
expected = str(int(option.get('output_provided', False)) +
sum(not arg.get('no_kwargs', False) and not arg.get('ignore_check', False)
for arg in option['arguments']))
code = '__dictcount == ' + expected + ' &&\n ' + code
return code
def process_option_code_template(self, template, option):
def process_option_code(self, code, option):
if option.get('defined_if', ''):
defined_if = option['defined_if']
placeholder = ''
# This means that it's the first option, so we need a dummy if,
# so the next option can be an else if.
if 'else if' not in code:
placeholder = '\n #else\n if (false) {'
return '#if ' + defined_if + '\n ' + code + placeholder + '\n #endif\n'
return code
def process_pre_arg_assign(self, template, option):
new_args = []
for arg in option['arguments']:
if not option.get('output_provided', True) and arg.get('output'):


@ -0,0 +1,40 @@
from . import CWrapPlugin
from string import Template
class WrapDim(CWrapPlugin):
NDIM_TEMPLATE = Template(
"""${arg_tensor}->nDimension""")
CODE_TEMPLATE = Template(
"""THPUtils_assert(${arg_dim} >= -(${ndim}) && ${arg_dim} < (${ndim}),
"dimension out of range (expected to be in range of [%d, %d], but got %d)",
-(${ndim}), (${ndim})-1, ${arg_dim});
if (${arg_dim} < 0) ${arg_dim} += (${ndim});""")
def initialize(self, cwrap):
self.cwrap = cwrap
def process_option_code_template(self, template, option):
new_code = []
for i, arg in enumerate(option['arguments']):
if 'wrap_dim' not in arg:
continue
params = arg.get('wrap_dim').split("+")
arg_tensor = params[0]
arg_tensor = "arg_" + arg_tensor
arg_dim = "arg_" + arg.get('assign_name', arg['name'])
params[0] = self.NDIM_TEMPLATE.substitute(arg_tensor=arg_tensor)
ndim = "+".join(params)
new_code.append(self.CODE_TEMPLATE.substitute(
arg_dim=arg_dim,
ndim=ndim))
new_code.append("")
template = new_code + template
return template
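A small sketch (not part of the diff) of how this template expands. The argument names below are hypothetical and correspond to a cwrap entry whose dim argument is wrapped against self (wrap_dim: self):

from string import Template

# Mirrors CODE_TEMPLATE above; substituting the hypothetical names shows the
# C check that gets prepended to the option's code template.
CODE_TEMPLATE = Template(
    """THPUtils_assert(${arg_dim} >= -(${ndim}) && ${arg_dim} < (${ndim}),
"dimension out of range (expected to be in range of [%d, %d], but got %d)",
-(${ndim}), (${ndim})-1, ${arg_dim});
if (${arg_dim} < 0) ${arg_dim} += (${ndim});""")

print(CODE_TEMPLATE.substitute(arg_dim='arg_dim', ndim='arg_self->nDimension'))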


@ -1,49 +1,422 @@
class CWrapPlugin(object):
"""Base class from which all cwrap plugins should inherit.
Override any of the following methods to implement the desired wrapping
behavior.
"""
def initialize(self, cwrap):
"""Initialize the Plugin class prior to calling any other functions.
It is used to give the Plugin access to the cwrap object's helper
functions and state.
Args:
cwrap: the cwrap object performing the wrapping.
"""
pass
def get_type_check(self, arg, option):
"""Used to generate code for runtime checks of object types.
The type can be found in arg['type']. For example, it could be
THTensor*. If this Plugin recognizes the type in arg, it should
return a Template string containing code that checks whether a
Python object is of this type. For example, the return type in
this case would be:
Template('(PyObject*)Py_TYPE($arg) == THPTensorClass')
As a simpler example, if the type == 'bool' then we would return:
Template('PyBool_Check($arg)')
Note that the name of the identifier that will be substituted must be
$arg.
Args:
arg: a Python object with a 'type' field representing the type
to generate a check string for.
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding type check for the passed type.
"""
pass
def get_type_unpack(self, arg, option):
"""Used to generate code for unpacking Python objects into C types.
Similar to get_type_check, but for unpacking Python objects into their
corresponding C types. The type is once again accessible via
arg['type']. This time we return a Template string that unpacks an
object. For a THTensor*, we know that the corresponding PyTorch type is
a THPTensor*, so we need to get the cdata from the object. So we would
return:
Template('((THPTensor*)$arg)->cdata')
For a simpler type, such as a long, we could do:
Template('PyLong_AsLong($arg)')
though in practice we will use our own custom unpacking code. Once
again, $arg must be used as the identifier.
Args:
arg: a Python object with a 'type' field representing the type
to generate an unpack string for.
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding type unpack for the passed type.
"""
pass
def get_return_wrapper(self, option):
"""Used to generate code wrapping a function's return value.
Wrapped functions should always return a PyObject *. However,
internally, the code will be working with C objects or primitives.
Therefore, if a function has a return value we need to convert it back
to a PyObject * before the function returns. Plugins can override this
function to generate wrapper code for returning specific C types. The
type is accessible via option['return'].
Continuing on with our THTensor* example, we might do something like:
Template('return THPTensor_(New)($result);')
In general, you want to do return <statement>; In this case, we call
into THP's library routine that takes a THTensor* (the $result
identifier) and returns a PyObject *.
For a bool, we could do Template('return PyBool_FromLong($result);').
Note that in other cases, our logic might be more complicated. For
example, if our return value is also an argument to the function call,
we might need to increase the reference count prior to returning.
Args:
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding return wrapper for the function's return
type or specifier.
"""
pass
def get_wrapper_template(self, declaration):
"""Used to create a code template to wrap the options.
This function returns a Template string that contains the function call
for the overall declaration, including the method definition, opening
and closing brackets, and any additional code within the method body.
Look through the examples to get a sense of what this might look like.
The only requirements are that it contains unsubstituted template
identifiers for anything the cwrap engine expects.
Note that for any declaration only one Plugin can generate the wrapper
template.
Args:
declaration: the declaration for the wrapped method.
Returns:
A template string representing the entire function declaration,
with identifiers as necessary.
"""
pass
def get_assign_args(self, arguments):
"""Used to modify argument metadata prior to assignment.
We have already set up argument checking and how to unpack arguments.
This function allows you to modify the metadata of an argument prior to
actually performing the assignment. For example, you might want to
check that an argument is of a specific type, but when unpacking it you
might want to treat it as a different type. This function allows
you to do exactly that --> e.g. you could set the 'type' field for a
particular argument to something else.
Args:
arguments: a list of argument metadata dictionaries.
Returns:
The same list of arguments, with any modifications as you see fit.
"""
pass
def get_arg_accessor(self, arg, option):
"""Used to generate a string for accessing the passed arg.
One of the key components of the YAML definition for a method to be
wrapped is the arguments to that method. Override this function to
show how to access that specific arg in the code. For example, you
might do something different if the argument is a keyword argument, or
a constant, or self. The base cwrap plugin has a fallback arg accessor
for loading elements from the args PyObject * tuple passed to the
function.
It's best to look at some of the existing Plugins to get a sense of what
one might do.
Args:
arg: a dictionary specifying attributes of the arg to be accessed
option: dictionary containing the information for this specific
option.
Returns:
A string (note: not a Template string!) of code that can be used
to access the given arg. If the plugin does not know how to access
the arg, return None.
"""
pass
def process_full_file(self, code):
"""Used to modify the code for the entire output file.
The last thing any plugin can do. Code contains the results of wrapping
all the declarations. The plugin can do things like adding header
guards, include statements, etc.
Args:
code: a string source code for the wrapped declarations.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_single_check(self, code, arg, arg_accessor):
"""Used to postprocess a type check.
Above we defined a function get_type_check that returns a Template
string that allows for type checking a PyObject * for a specific type.
In this function, the passed "code" is a combination of that type check
along with a specific arg_accessor pasted in. For example:
'(PyObject*)Py_TYPE(PyTuple_GET_ITEM(args, 1)) == THPTensorClass'
This function can be overridden to support modifying this check string.
For example, if an argument can be null, we might want to check and see
if the type is Py_None, as well.
Args:
code: The string code representing a type check for a specific
argument being accessed.
arg: dictionary containing properties of that specific argument
arg_accessor: the arg_accessor string for that specific argument.
Note that this is likely also embedded in code, but if you want to
be able to access this arg and throw away the other code, you can
do so.
Returns:
A string representing the processed check/access string for this
arg. If the plugin does not know how to modify a specific input, it
should return the original code.
"""
return code
def process_all_checks(self, code, option):
"""Used to generate additional checks based on all the individual ones.
After individually processing each argument with get_type_check,
get_arg_accessor, process_single_check, this function allows you to
inspect the combined checks and do any additional checking, or modify that
string as you see fit. In particular, the given code is a string like:
CHECK_TYPE(GET_ARG(0)) && CHECK_TYPE(GET_ARG(1)) && ..
We can process it as we see fit. For example, we may want to add a
check at the beginning that we have the specified number of arguments.
Args:
code: A string representing each argument check separated by an
'&&'. code can be None if there are no arguments to be checked.
option: dictionary containing the information for this specific
option.
Returns:
The modified code string with any additional checks, or just the
existing code if no modifications are to be made.
"""
return code
def process_single_unpack(self, code, arg, arg_accessor):
"""Used to postprocess a type unpack.
Same as process_single_check above, but for type unpacking. For example,
the code could be:
PyLong_AsLong(PyTuple_GET_ITEM(args, 0))
And this code could modify that as it sees fit. For example, if the
result of accessing the argument is None, we would not want to call the
unpacking code.
Args:
code: The string code representing a type unpack for a specific
argument being accessed.
arg: dictionary containing properties of that specific argument
arg_accessor: the arg_accessor string for that specific argument.
Note that this is likely also embedded in code, but if you want to
be able to access this arg and throw away the other code, you can
do so.
Returns:
A string representing the processed unpack/access string for this
arg. If the plugin does not know how to modify a specific input, it
should return the original code.
"""
return code
def process_all_unpacks(self, code, option):
def process_all_call_arg(self, code, option):
"""Used to modify the arguments to the underlying C function call.
Code is the string of comma-separated arguments that will be passed to
the wrapped C function. You can use this function to modify that string
as you see fit. For example, THP prepends the LIBRARY_STATE definition
so that the generated code will follow the conventions it uses for
writing one function for both TH/THC calls.
Args:
code: A string as described above.
option: dictionary containing the information for this specific
option.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_option_code(self, code, option):
"""Used to modify the entire code body for an option.
Code in this case is a string containing the entire generated code for
a specific option. Note that this body includes the checks for each
option, i.e. if (type checks for one permutation) { ... } else if (type
checks for another permutation) { ... } etc.
Args:
code: string representing the generated code for the option
option: dictionary containing the information for this specific
option.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_wrapper(self, code, declaration):
"""Used to modify the entire code body for a declaration.
Code in this case is a string containing the entire generated code for
a specific declaration. This code can be modified as the plugin sees
fit. For example, we might want to wrap the function in preprocessor
guards if it is only enabled for floats.
Args:
code: string representing the generated code for the declaration
declaration: the declaration metadata.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_declarations(self, declarations):
"""Used to process/modify the function's declaration.
Cwrap loads the YAML of a function to be cwrap'd into a dictionary.
This is known as the declaration. The cwrap code sets some defaults as
necessary, and then passes this dictionary to process_declarations.
Overriding this code allows the plugin to modify this declaration as it
sees fit prior to any code generation. The plugin may add, remove or
modify the fields of the declaration dictionary. It can also save state
to the Plugin for use in subsequent function overrides.
It's best to look at some of the existing Plugins to get a sense of what
one might do.
Args:
declarations: a list of declarations, i.e. dictionaries that define
the function(s) being wrapped. Note that this can be plural, so the
function must take care to modify each input declaration.
Returns:
Those same declarations, modified as the Plugin sees fit. Note that
you could insert a declaration, if you wanted to take an input
declaration and e.g. wrap it multiple times.
"""
return declarations
def process_option_code_template(self, template, option):
"""Used to modify the code template for the option.
The "code template" can be thought of the actual body implementing the
wrapped function call --> i.e. it is not the argument check,
assignment, etc. but the actual logic of the function. The template is
a list containing two placeholders: $call and $return_result.
These mark the locations where the function call will happen
and where the function will return.
This function can modify the list to insert arbitrary code around the
$call and $return_result. For example, one might want to wrap the code
in a try/catch, or post-process the result in some way. This allows a
plugin to do that.
Args:
template: a list containing $call and $return_result, in addition
to any arbitrary code inserted by other plugins.
option: dictionary containing the information for this specific
option.
Returns:
The same "code template", possibly modified by this plugin.
"""
return template
def process_pre_arg_assign(self, template, option):
"""Used to include any code before argument assignment.
This function can be used to insert any code that will be part of the
resulting function. The code is inserted after argument checks occur,
but before argument assignment.
Args:
template: String representing the code to be inserted. If other
plugins have included code for pre_arg_assign, it will be included
here.
option: dictionary containing the information for this specific
option.
Returns:
template, with any additional code if needed.
"""
return template
@ -59,3 +432,5 @@ from .GILRelease import GILRelease
from .AutoGPU import AutoGPU
from .CuDNNPlugin import CuDNNPlugin
from .GenericNN import GenericNN
from .WrapDim import WrapDim
from .Broadcast import Broadcast
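For orientation, a minimal plugin built on the hooks documented above might look like the following sketch. It is illustrative only and not part of this diff; the plugin name and the handled type are made up.

from string import Template
from . import CWrapPlugin

class BoolArgPlugin(CWrapPlugin):
    """Hypothetical plugin: recognize plain Python bool arguments."""

    def get_type_check(self, arg, option):
        # Only answer for types this plugin understands; returning None lets
        # other plugins (or the core) handle the argument.
        if arg['type'] == 'bool':
            return Template('PyBool_Check($arg)')

    def get_type_unpack(self, arg, option):
        if arg['type'] == 'bool':
            return Template('($arg == Py_True)')

    def process_declarations(self, declarations):
        # Declarations may be adjusted in place before code generation.
        for declaration in declarations:
            declaration.setdefault('python_name', declaration['name'])
        return declarations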


@ -1,40 +0,0 @@
FROM nvidia/cuda:8.0-devel-ubuntu14.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
ca-certificates \
libjpeg-dev \
libpng-dev &&\
rm -rf /var/lib/apt/lists/*
RUN curl -fsSL http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/cudnn-8.0-linux-x64-v6.0-rc.tgz -O && \
tar -xzf cudnn-8.0-linux-x64-v6.0-rc.tgz -C /usr/local && \
rm cudnn-8.0-linux-x64-v6.0-rc.tgz && \
ldconfig
RUN ln -s /usr/local/cuda/lib64/libcudnn.so.6.0.5 /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.5
RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install conda-build && \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy scipy ipython mkl&& \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80
# This must be done before pip so that requirements.txt is available
WORKDIR /opt/pytorch
COPY . .
RUN cat requirements.txt | xargs -n1 pip install --no-cache-dir && \
TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_LIBRARY_PATH=/opt/conda/envs/pytorch-py35/lib \
CMAKE_INCLUDE_PATH=/opt/conda/envs/pytorch-py35/include \
pip install -v .
WORKDIR /workspace
RUN chmod -R a+w /workspace


@ -0,0 +1,27 @@
FROM ubuntu:16.04
LABEL com.nvidia.volumes.needed="nvidia_driver"
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
git \
curl \
ca-certificates \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install conda-build && \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy pyyaml scipy ipython mkl&& \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80 && /opt/conda/bin/conda clean -ya
RUN conda install --name pytorch-py35 pytorch torchvision cuda80 -c soumith && /opt/conda/bin/conda clean -ya
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
WORKDIR /workspace
RUN chmod -R a+w /workspace


@ -1 +1,2 @@
from .generate_wrappers import generate_wrappers, wrap_function, import_module
from .generate_wrappers import generate_wrappers, wrap_function, \
import_module, wrap_generic_function


@ -3,26 +3,13 @@ import sys
from string import Template, ascii_lowercase
from ..cwrap import cwrap
from ..cwrap.plugins import StandaloneExtension, GenericNN, NullableArguments, AutoGPU
from ..shared import import_module
BASE_PATH = os.path.realpath(os.path.join(__file__, '..', '..', '..'))
WRAPPER_PATH = os.path.join(BASE_PATH, 'torch', 'csrc', 'nn')
THNN_UTILS_PATH = os.path.join(BASE_PATH, 'torch', '_thnn', 'utils.py')
def import_module(name, path):
if sys.version_info >= (3, 5):
import importlib.util
spec = importlib.util.spec_from_file_location(name, path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
elif sys.version_info >= (3, 0):
from importlib.machinery import SourceFileLoader
return SourceFileLoader(name, path).load_module()
else:
import imp
return imp.load_source(name, path)
thnn_utils = import_module('torch._thnn.utils', THNN_UTILS_PATH)
FUNCTION_TEMPLATE = Template("""\
@ -52,22 +39,27 @@ TYPE_TRANSFORMS = {
'Float': {
'THTensor*': 'THFloatTensor*',
'real': 'float',
'accreal': 'double',
},
'Double': {
'THTensor*': 'THDoubleTensor*',
'real': 'double',
'accreal': 'double',
},
'CudaHalf': {
'THCTensor*': 'THCudaHalfTensor*',
'real': 'half',
'accreal': 'float',
},
'Cuda': {
'THCTensor*': 'THCudaTensor*',
'real': 'float',
'accreal': 'float',
},
'CudaDouble': {
'THCTensor*': 'THCudaDoubleTensor*',
'real': 'double',
'accreal': 'double',
},
}
for t, transforms in TYPE_TRANSFORMS.items():
@ -83,14 +75,17 @@ def wrap_function(name, type, arguments):
cname = 'THNN_' + type + name
declaration = ''
declaration += 'extern "C" void ' + cname + \
'(' + ', '.join(TYPE_TRANSFORMS[type].get(arg.type, arg.type) for arg in arguments) + ');\n'
'(' + ', '.join(TYPE_TRANSFORMS[type].get(arg.type, arg.type)
for arg in arguments) + ');\n'
declaration += FUNCTION_TEMPLATE.substitute(name=type + name, cname=cname)
indent = ' ' * 4
dict_indent = ' ' * 6
prefix = indent + '- '
for arg in arguments:
if not arg.is_optional:
declaration += prefix + TYPE_TRANSFORMS[type].get(arg.type, arg.type) + ' ' + arg.name + '\n'
declaration += prefix + \
TYPE_TRANSFORMS[type].get(
arg.type, arg.type) + ' ' + arg.name + '\n'
else:
t = TYPE_TRANSFORMS[type].get(arg.type, arg.type)
declaration += prefix + 'type: ' + t + '\n' + \
@ -135,6 +130,7 @@ def wrap_cunn():
AutoGPU(has_self=False),
])
GENERIC_FUNCTION_TEMPLATE = Template("""\
[[
name: $name
@ -163,7 +159,7 @@ def wrap_generic():
defs = OrderedDict()
def should_wrap_function(name):
if name.startswith('LookupTable'):
if name.startswith('LookupTable_'):
return False
return (name.endswith('updateOutput') or
name.endswith('updateGradInput') or

tools/pytorch.version Executable file

@ -0,0 +1,12 @@
{
global:
_TH*;
TH*;
*THP*;
*THCP*;
PyInit*;
init*;
state;
local:
*;
};


@ -1,17 +1,39 @@
import ctypes.util
import os
import platform
import ctypes.util
from subprocess import Popen, PIPE
from .env import check_env_flag
def find_nvcc():
proc = Popen(['which', 'nvcc'], stdout=PIPE, stderr=PIPE)
out, err = proc.communicate()
out = out.decode().strip()
if len(out) > 0:
return os.path.dirname(out)
else:
return None
if check_env_flag('NO_CUDA'):
WITH_CUDA = False
CUDA_HOME = None
else:
CUDA_HOME = os.getenv('CUDA_HOME', '/usr/local/cuda')
if not os.path.exists(CUDA_HOME):
cudart_path = ctypes.util.find_library('cudart')
if cudart_path is not None:
CUDA_HOME = os.path.dirname(cudart_path)
# We use nvcc path on Linux and cudart path on macOS
osname = platform.system()
if osname == 'Linux':
cuda_path = find_nvcc()
else:
cudart_path = ctypes.util.find_library('cudart')
if cudart_path is not None:
cuda_path = os.path.dirname(cudart_path)
else:
cuda_path = None
if cuda_path is not None:
CUDA_HOME = os.path.dirname(cuda_path)
else:
CUDA_HOME = None
WITH_CUDA = CUDA_HOME is not None
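A brief worked example of the fallback order above; the paths are hypothetical.

# CUDA_HOME defaults to /usr/local/cuda. If that directory is missing:
#   * on Linux, `which nvcc` is consulted; e.g. /usr/local/cuda-8.0/bin/nvcc
#     gives cuda_path = '/usr/local/cuda-8.0/bin' and CUDA_HOME = '/usr/local/cuda-8.0';
#   * on macOS, the directory containing libcudart (found via
#     ctypes.util.find_library) is used instead.
# If neither probe succeeds, CUDA_HOME stays None and WITH_CUDA ends up False,
# just as it does when NO_CUDA is set.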


@ -1,4 +1,5 @@
import os
import sys
import glob
from itertools import chain
@ -9,6 +10,8 @@ from .cuda import WITH_CUDA, CUDA_HOME
def gather_paths(env_vars):
return list(chain(*(os.getenv(v, '').split(':') for v in env_vars)))
is_conda = 'conda' in sys.version or 'Continuum' in sys.version
conda_dir = os.path.join(os.path.dirname(sys.executable), '..')
WITH_CUDNN = False
CUDNN_LIB_DIR = None
@ -19,6 +22,7 @@ if WITH_CUDA and not check_env_flag('NO_CUDNN'):
os.path.join(CUDA_HOME, 'lib'),
os.path.join(CUDA_HOME, 'lib64'),
'/usr/lib/x86_64-linux-gnu/',
'/usr/lib/powerpc64le-linux-gnu/',
] + gather_paths([
'LIBRARY_PATH',
])))
@ -31,6 +35,9 @@ if WITH_CUDA and not check_env_flag('NO_CUDNN'):
'C_INCLUDE_PATH',
'CPLUS_INCLUDE_PATH',
])))
if is_conda:
lib_paths.append(os.path.join(conda_dir, 'lib'))
include_paths.append(os.path.join(conda_dir, 'include'))
for path in lib_paths:
if path is None or not os.path.exists(path):
continue


@ -0,0 +1,58 @@
import os
this_file = os.path.dirname(os.path.abspath(__file__))
generated_dir = os.path.abspath(os.path.join(this_file, '..', '..', 'torch', 'csrc', 'generated'))
line_start = '//generic_include '
types = [
'Double',
'Float',
'Half',
'Long',
'Int',
'Short',
'Char',
'Byte'
]
generic_include = '#define {lib}_GENERIC_FILE "{path}"'
generate_include = '#include "{lib}/{lib}Generate{type}Type.h"'
def split_types(file_name):
assert file_name.startswith('torch/csrc/')
if not os.path.exists(generated_dir):
os.makedirs(generated_dir)
with open(file_name, 'r') as f:
lines = f.read().split('\n')
# Find //generic_include
for i, l in enumerate(lines):
if l.startswith(line_start):
args = l[len(line_start):]
lib_prefix, generic_file = filter(bool, args.split())
break
else:
raise RuntimeError("generic include not found")
gen_name_prefix = file_name[len('torch/csrc/'):].replace('/', '_').replace('.cpp', '')
gen_path_prefix = os.path.join(generated_dir, gen_name_prefix)
prefix = '\n'.join(lines[:i])
suffix = '\n'.join(lines[i + 1:])
to_build = []
g_include = generic_include.format(lib=lib_prefix, path=generic_file)
for t in types:
t_include = generate_include.format(lib=lib_prefix, type=t)
gen_path = gen_path_prefix + t + '.cpp'
to_build.append(gen_path)
with open(gen_path, 'w') as f:
f.write(prefix + '\n' +
g_include + '\n' +
t_include + '\n' +
suffix)
return to_build
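A hedged illustration of the strings built above, assuming a hypothetical marker line //generic_include TH generic/Example.cpp in a file under torch/csrc:

# Standalone reconstruction of the two format strings used by split_types;
# the lib prefix, path and type below are hypothetical.
lib_prefix, generic_file, type_name = 'TH', 'generic/Example.cpp', 'Double'

g_include = '#define {lib}_GENERIC_FILE "{path}"'.format(lib=lib_prefix, path=generic_file)
t_include = '#include "{lib}/{lib}Generate{type}Type.h"'.format(lib=lib_prefix, type=type_name)

print(g_include)  # -> #define TH_GENERIC_FILE "generic/Example.cpp"
print(t_include)  # -> #include "TH/THGenerateDoubleType.h"
# split_types writes prefix + g_include + t_include + suffix into one
# generated .cpp file per entry in `types`.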

tools/shared/__init__.py Normal file

@ -0,0 +1,3 @@
from .module_loader import import_module
from .cwrap_common import set_declaration_defaults, \
sort_by_number_of_options, enumerate_options_due_to_default


@ -0,0 +1 @@
../../torch/lib/ATen/common_with_cwrap.py


@ -0,0 +1,16 @@
import sys
def import_module(name, path):
if sys.version_info >= (3, 5):
import importlib.util
spec = importlib.util.spec_from_file_location(name, path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
elif sys.version_info >= (3, 0):
from importlib.machinery import SourceFileLoader
return SourceFileLoader(name, path).load_module()
else:
import imp
return imp.load_source(name, path)
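A small usage sketch; the file path is hypothetical.

# Load a module directly from a file path, independent of sys.path. The
# branches above cover Python 3.5+, 3.0-3.4 and 2.x respectively.
utils = import_module('torch._thnn.utils', '/path/to/torch/_thnn/utils.py')
print(utils.__name__)  # 'torch._thnn.utils'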


@ -5,7 +5,7 @@ Additionally, it provides many utilities for efficient serializing of
Tensors and arbitrary types, and other useful utilities.
It has a CUDA counterpart, that enables you to run your tensor computations
on an NVIDIA GPU with compute capability >= 2.0.
on an NVIDIA GPU with compute capability >= 3.0.
"""
import sys
@ -15,7 +15,7 @@ from .version import __version__
__all__ = [
'typename', 'is_tensor', 'is_storage', 'set_default_tensor_type',
'set_rng_state', 'get_rng_state', 'manual_seed', 'initial_seed',
'save', 'load', 'set_printoptions', 'chunk', 'split', 'stack',
'save', 'load', 'set_printoptions', 'chunk', 'split', 'stack', 'matmul',
'DoubleStorage', 'FloatStorage', 'LongStorage', 'IntStorage',
'ShortStorage', 'CharStorage', 'ByteStorage',
'DoubleTensor', 'FloatTensor', 'LongTensor', 'IntTensor',
@ -31,6 +31,13 @@ __all__ = [
# automatically filled by the dynamic loader.
import os as _dl_flags
# if we have numpy, it *must* be imported before the call to setdlopenflags()
# or there is risk that later c modules will segfault when importing numpy
try:
import numpy as np
except:
pass
# first check if the os package has the required flags
if not hasattr(_dl_flags, 'RTLD_GLOBAL') or not hasattr(_dl_flags, 'RTLD_NOW'):
try:
@ -81,7 +88,7 @@ def is_tensor(obj):
Args:
obj (Object): Object to test
"""
return obj.__class__ in _tensor_classes
return type(obj) in _tensor_classes
def is_storage(obj):
@ -90,7 +97,7 @@ def is_storage(obj):
Args:
obj (Object): Object to test
"""
return obj.__class__ in _storage_classes
return type(obj) in _storage_classes
def set_default_tensor_type(t):
@ -122,6 +129,9 @@ def manual_seed(seed):
Args:
seed (int or long): The desired seed.
"""
if torch.cuda.is_available() and not torch.cuda._in_bad_fork:
torch.cuda.manual_seed_all(seed)
return default_generator.manual_seed(seed)
@ -151,6 +161,10 @@ class FloatStorage(_C.FloatStorageBase, _StorageBase):
pass
class HalfStorage(_C.HalfStorageBase, _StorageBase):
pass
class LongStorage(_C.LongStorageBase, _StorageBase):
pass
@ -191,6 +205,16 @@ class FloatTensor(_C.FloatTensorBase, _TensorBase):
return FloatStorage
class HalfTensor(_C.HalfTensorBase, _TensorBase):
def is_signed(self):
return True
@classmethod
def storage_type(cls):
return HalfStorage
class LongTensor(_C.LongTensorBase, _TensorBase):
def is_signed(self):
@ -244,12 +268,12 @@ class ByteTensor(_C.ByteTensorBase, _TensorBase):
_storage_classes = {
DoubleStorage, FloatStorage, LongStorage, IntStorage, ShortStorage,
CharStorage, ByteStorage,
CharStorage, ByteStorage, HalfStorage
}
_tensor_classes = {
DoubleTensor, FloatTensor, LongTensor, IntTensor, ShortTensor,
CharTensor, ByteTensor,
CharTensor, ByteTensor, HalfTensor
}
@ -261,19 +285,21 @@ set_default_tensor_type('torch.FloatTensor')
from .functional import *
################################################################################
# Initialize extension
################################################################################
def manager_path():
import os
path = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'lib', 'torch_shm_manager')
if not os.path.exists(path):
raise RuntimeError("Unable to find torch_shm_manager at " + path)
return path.encode('utf-8')
# Shared memory manager needs to know the exact location of manager executable
import os
manager_path = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'lib', 'torch_shm_manager')
if sys.version_info[0] >= 3:
manager_path = bytes(manager_path, 'ascii')
_C._initExtension(manager_path)
del os
_C._initExtension(manager_path())
del manager_path
################################################################################
@ -312,7 +338,10 @@ import torch.autograd
import torch.nn
import torch.optim
import torch.multiprocessing
import torch.sparse
import torch.utils.backcompat
_C._init_names(list(torch._tensor_classes) + list(torch._storage_classes))
# attach docstrings to torch and tensor functions
from . import _torch_docs, _tensor_docs
del _torch_docs, _tensor_docs
from . import _torch_docs, _tensor_docs, _storage_docs
del _torch_docs, _tensor_docs, _storage_docs

torch/_six.py Normal file

@ -0,0 +1,31 @@
# Copyright (c) 2010-2017 Benjamin Peterson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
def with_metaclass(meta, *bases):
"""Create a base class with a metaclass."""
# This requires a bit of explanation: the basic idea is to make a dummy
# metaclass for one level of class instantiation that replaces itself with
# the actual metaclass.
class metaclass(meta):
def __new__(cls, name, this_bases, d):
return meta(name, bases, d)
return type.__new__(metaclass, 'temporary_class', (), {})
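For context, a hedged usage sketch of the helper; the metaclass and classes below are made up for illustration.

class Meta(type):
    """Toy metaclass that tags every class it creates."""
    def __new__(mcls, name, bases, namespace):
        cls = super(Meta, mcls).__new__(mcls, name, bases, namespace)
        cls.tagged = True
        return cls

class Base(object):
    pass

# The same definition works on Python 2 and Python 3, which is the point.
class MyClass(with_metaclass(Meta, Base)):
    pass

assert type(MyClass) is Meta and MyClass.tagged and issubclass(MyClass, Base)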

torch/_storage_docs.py Normal file

@ -0,0 +1,43 @@
"""Adds docstrings to Storage functions"""
import torch._C
from torch._C import _add_docstr as add_docstr
storage_classes = [
'DoubleStorageBase',
'FloatStorageBase',
'LongStorageBase',
'IntStorageBase',
'ShortStorageBase',
'CharStorageBase',
'ByteStorageBase',
]
def add_docstr_all(method, docstr):
for cls_name in storage_classes:
cls = getattr(torch._C, cls_name)
try:
add_docstr(getattr(cls, method), docstr)
except AttributeError:
pass
add_docstr_all('from_file',
"""
from_file(filename, shared=False, size=0) -> Storage
If shared is True then memory is shared between all processes. All changes are
written to the file. If shared is False then the changes on the storage do not
affect the file.
Size is the number of elements in the storage. If shared is False then the file
must contain at least `size * sizeof(Type)` bytes (`Type` is the type of
storage). If shared is True the file will be created if needed.
Args:
filename (str): file name to map
shared (bool): whether to share memory
size (int): number of elements in the storage
""")

File diff suppressed because it is too large


@ -67,7 +67,7 @@ def set_printoptions(
def _number_format(tensor, min_sz=-1):
min_sz = max(min_sz, 2)
tensor = torch.DoubleTensor(tensor.nelement()).copy_(tensor).abs_()
tensor = torch.DoubleTensor(tensor.size()).copy_(tensor).abs_().view(tensor.nelement())
pos_inf_mask = tensor.eq(float('inf'))
neg_inf_mask = tensor.eq(float('-inf'))

File diff suppressed because it is too large


@ -1,8 +1,10 @@
import torch
import importlib
def _type(self, new_type=None, async=False):
"""Casts this object to the specified type.
"""Returns the type if `new_type` is not provided, else casts this object to
the specified type.
If this is already of the correct type, no copy is performed and the
original object is returned.
@ -21,6 +23,15 @@ def _type(self, new_type=None, async=False):
new_type = _import_dotted_name(new_type)
if new_type == type(self):
return self
if self.is_sparse:
if not new_type.is_sparse:
raise RuntimeError("Cannot cast sparse tensor to dense tensor")
new_type_name = new_type.__module__ + '.' + new_type.__name__
new_values_type_name = new_type_name.replace('.sparse', '')
new_values = self._values().type(new_values_type_name, async)
return new_type(self._indices(), new_values, self.size())
if new_type.is_sparse:
raise RuntimeError("Cannot cast dense tensor to sparse tensor")
return new_type(self.size()).copy_(self, async)
@ -39,16 +50,27 @@ def _cuda(self, device=None, async=False):
if self.is_cuda:
if device is None:
device = torch.cuda.current_device()
if self.get_device() != device:
with torch.cuda.device(device):
return type(self)(self.size()).copy_(self, async)
else:
if self.get_device() == device:
return self
else:
if device is None:
device = -1
with torch.cuda.device(device):
return self.type(getattr(torch.cuda, self.__class__.__name__), async)
with torch.cuda.device(device):
if self.is_sparse:
new_type = getattr(torch.cuda.sparse, self.__class__.__name__)
indices = self._indices().cuda(device, async)
values = self._values().cuda(device, async)
return new_type(indices, values, self.size())
else:
new_type = getattr(torch.cuda, self.__class__.__name__)
return new_type(self.size()).copy_(self, async)
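# Illustrative sketch (not part of the diff, requires a CUDA build): the
# sparse branch above moves indices and values to the GPU separately and
# rebuilds a cuda sparse tensor of the original size, while the dense branch
# copies into a freshly allocated CUDA tensor.
import torch
if torch.cuda.is_available():
    dense_cuda = torch.FloatTensor(3, 3).zero_().cuda()        # torch.cuda.FloatTensor
    i = torch.LongTensor([[0, 1], [2, 0]])                      # 2 x nnz indices
    v = torch.FloatTensor([3.0, 4.0])
    sparse = torch.sparse.FloatTensor(i, v, torch.Size([3, 3]))
    sparse_cuda = sparse.cuda()                                 # torch.cuda.sparse.FloatTensor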
def _rebuild_tensor(storage, storage_offset, size, stride):
class_name = storage.__class__.__name__.replace('Storage', 'Tensor')
module = importlib.import_module(storage.__module__)
tensor_class = getattr(module, class_name)
return tensor_class().set_(storage, storage_offset, size, stride)
def _range(*args, **kwargs):
@ -77,3 +99,47 @@ def _accumulate(iterable, fn=lambda x, y: x + y):
for element in it:
total = fn(total, element)
yield total
def _flatten_tensors(tensors):
"""Flatten tensors into a single contiguous 1D buffer"""
if len(tensors) == 1:
return tensors[0].contiguous().view(-1)
numels = [tensor.numel() for tensor in tensors]
size = sum(numels)
offset = 0
flat = tensors[0].new(size)
for tensor, numel in zip(tensors, numels):
flat.narrow(0, offset, numel).copy_(tensor, broadcast=False)
offset += numel
return flat
def _unflatten_tensors(flat, tensors):
"""View a flat buffer using the sizes of tensors"""
outputs = []
offset = 0
for tensor in tensors:
numel = tensor.numel()
outputs.append(flat.narrow(0, offset, numel).view_as(tensor))
offset += numel
return tuple(outputs)
def _take_tensors(tensors, size_limit):
"""Groups tensors into lists of up to size_limit bytes"""
buf = []
size = 0
last_type = type(tensors[0]) if len(tensors) > 0 else None
for tensor in tensors:
t = type(tensor)
param_size = tensor.numel() * tensor.element_size()
if t is not last_type or (size + param_size > size_limit and size > 0):
yield buf
last_type = t
size = 0
buf = []
buf.append(tensor)
size += param_size
if len(buf) > 0:
yield buf
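A short sketch of how these helpers combine (e.g. when bucketing gradients): flatten a list of same-typed tensors into one buffer, operate on the buffer, then view it back with the original shapes. It assumes the helpers live in torch._utils, as in this file.

import torch
from torch._utils import _flatten_tensors, _unflatten_tensors, _take_tensors

tensors = [torch.ones(2, 3), torch.zeros(2, 2)]
flat = _flatten_tensors(tensors)             # contiguous 1D buffer of 6 + 4 elements
flat.mul_(2)                                 # operate on the whole bucket at once
views = _unflatten_tensors(flat, tensors)    # views with the original sizes
assert views[0].size() == tensors[0].size()

# _take_tensors yields groups of same-typed tensors whose total size stays
# under size_limit bytes (here 1 MB), e.g. for bucketed reductions.
for bucket in _take_tensors(tensors, 1 << 20):
    print([t.numel() for t in bucket])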


@ -5,21 +5,51 @@ changes to the existing code - you only need to wrap all tensors in
:class:`.Variable` objects.
"""
import torch
import warnings
from .variable import Variable
from .function import Function, NestedIOFunction
from .stochastic_function import StochasticFunction
from .gradcheck import gradcheck
__all__ = ['Variable', 'Function', 'StochasticFunction', 'backward']
def backward(variables, grad_variables, retain_variables=False):
def _make_grads(outputs, grads, user_create_graph):
if user_create_graph is not None:
create_graph = user_create_graph
else:
create_graph = any(isinstance(grad, Variable) and not grad.volatile
for grad in grads)
new_grads = []
for out, grad in zip(outputs, grads):
if isinstance(grad, Variable):
new_grads.append(grad)
elif torch.is_tensor(grad):
new_grads.append(Variable(grad, volatile=not create_graph))
elif grad is None:
if out.requires_grad:
if out.numel() != 1:
raise RuntimeError("grad can be implicitly created only for scalar outputs")
data = out.data
new_grads.append(
Variable(data.new().resize_as_(data).fill_(1), volatile=not create_graph))
else:
new_grads.append(None)
else:
raise TypeError("gradients can be either Tensors, Variables or None, but got " +
type(grad).__name__)
return tuple(new_grads), create_graph
def backward(variables, grad_variables=None, retain_graph=None, create_graph=None, retain_variables=None):
"""Computes the sum of gradients of given variables w.r.t. graph leaves.
The graph is differentiated using the chain rule. If any of ``variables``
are non-scalar (i.e. their data has more than one element) and require
gradient, the function additionally requires specifying ``grad_variables``.
It should be a sequence of matching length, that containins gradient of
It should be a sequence of matching length, that contains gradient of
the differentiated function w.r.t. corresponding variables (``None`` is an
acceptable value for all variables that don't need gradient tensors).
@ -29,15 +59,98 @@ def backward(variables, grad_variables, retain_variables=False):
Arguments:
variables (sequence of Variable): Variables of which the derivative will be
computed.
grad_variables (sequence of Tensor): Gradients w.r.t. each element of
corresponding variables. Required only for non-scalar variables that
require gradient.
retain_variables (bool): If ``True``, buffers necessary for computing
gradients won't be freed after use. It is only necessary to
specify ``True`` if you want to differentiate some subgraph multiple
times.
grad_variables (sequence of (Tensor, Variable or None)): Gradients w.r.t.
each element of corresponding variables. Any tensors will be
automatically converted to Variables that are volatile unless
``create_graph`` is True. None values can be specified for scalar
Variables or ones that don't require grad. If a None value would
be acceptable for all grad_variables, then this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute the grad
will be freed. Note that in nearly all cases setting this option to True
is not needed and often can be worked around in a much more efficient
way. Defaults to the value of ``create_graph``.
create_graph (bool, optional): If true, graph of the derivative will
be constructed, allowing to compute higher order derivative products.
Defaults to False, unless ``grad_variables`` contains at least one
non-volatile Variable.
"""
Variable._execution_engine.run_backward(
tuple(variables), tuple(grad_variables), retain_variables)
variables = (variables,) if isinstance(variables, Variable) else tuple(variables)
assert torch._C._autograd_init()
if grad_variables is None:
grad_variables = [None] * len(variables)
elif isinstance(grad_variables, Variable) or torch.is_tensor(grad_variables):
grad_variables = [grad_variables]
else:
grad_variables = list(grad_variables)
grad_variables, create_graph = _make_grads(variables, grad_variables, create_graph)
if retain_variables is not None:
if retain_graph is not None:
raise ValueError("only one of retain_graph and retain_variables can be specified")
retain_graph = retain_variables
warnings.warn("retain_variables option is deprecated and will be removed in 0.3. "
"Use retain_graph instead.")
elif retain_graph is None:
retain_graph = create_graph
Variable._execution_engine.run_backward(
variables, grad_variables, retain_graph)
def grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=None, only_inputs=True):
"""Computes and returns the sum of gradients of outputs w.r.t. the inputs.
``grad_outputs`` should be a sequence of length matching ``outputs``,
containing the pre-computed gradients w.r.t. each of the outputs. If an
output doesn't require_grad, then the gradient can be ``None``.
Gradients can be given as Tensors when one doesn't need the graph of the
derivative, or as Variables, in which case the graph will be created.
If ``only_inputs`` is True, the function will only return a list of gradients
w.r.t the specified inputs. If it's False, then gradient w.r.t. all remaining
leaves will still be computed, and will be accumulated into their ``.grad``
attribute.
Arguments:
outputs (sequence of Variable): outputs of the differentiated function.
inputs (sequence of Variable): Inputs w.r.t. which the gradient will be
returned (and not accumulated into ``.grad``).
grad_outputs (sequence of Tensor or Variable): Gradients w.r.t. each output.
Any tensors will be automatically converted to Variables that are
volatile unless ``create_graph`` is True. None values can be
specified for scalar Variables or ones that don't require grad.
If a None value would be acceptable for all grad_outputs, then
this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute the grad
will be freed. Note that in nearly all cases setting this option to True
is not needed and often can be worked around in a much more efficient
way. Defaults to the value of ``create_graph``.
create_graph (bool, optional): If True, graph of the derivative will
be constructed, allowing to compute higher order derivative products.
Defaults to False, unless ``grad_outputs`` contains at least one
non-volatile Variable.
only_inputs (bool, optional): If True, gradient w.r.t. leaves that are
part of the graph, but don't appear in ``inputs`` won't be computed
and accumulated. Defaults to True.
"""
outputs = (outputs,) if isinstance(outputs, Variable) else tuple(outputs)
inputs = (inputs,) if isinstance(inputs, Variable) else tuple(inputs)
if grad_outputs is None:
grad_outputs = [None] * len(outputs)
elif isinstance(grad_outputs, Variable) or torch.is_tensor(grad_outputs):
grad_outputs = [grad_outputs]
else:
grad_outputs = list(grad_outputs)
grad_outputs, create_graph = _make_grads(outputs, grad_outputs, create_graph)
if retain_graph is None:
retain_graph = create_graph
return Variable._execution_engine.run_backward(
outputs, grad_outputs, retain_graph,
inputs, only_inputs)
if not torch._C._autograd_init():
raise RuntimeError("autograd initialization failed")
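A hedged usage sketch of the new grad entry point added above; with create_graph=True the first derivative can itself be differentiated.

import torch
from torch.autograd import Variable, grad

x = Variable(torch.randn(3), requires_grad=True)
y = (x * x).sum()

# First derivative, keeping the graph of the derivative itself.
dy_dx, = grad(y, x, create_graph=True)

# Second derivative of y = sum(x^2) w.r.t. x: a vector of 2s.
d2y_dx2, = grad(dy_dx.sum(), x)
print(d2y_dx2)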


@ -1,189 +1,228 @@
import torch
from ..function import Function, InplaceFunction
from .utils import maybe_unexpand, maybe_unexpand_or_view
import math
class Add(InplaceFunction):
def forward(self, a, b):
if self.inplace:
self.mark_dirty(a)
@staticmethod
def forward(ctx, a, b, inplace=False):
ctx.a_size = a.size()
ctx.b_size = b.size()
if inplace:
ctx.mark_dirty(a)
return a.add_(b)
else:
return a.add(b)
def backward(self, grad_output):
return grad_output, grad_output
@staticmethod
def backward(ctx, grad_output):
return maybe_unexpand(grad_output, ctx.a_size), maybe_unexpand_or_view(grad_output, ctx.b_size), None
class Sub(InplaceFunction):
def forward(self, a, b):
if self.inplace:
self.mark_dirty(a)
@staticmethod
def forward(ctx, a, b, inplace=False):
ctx.a_size = a.size()
ctx.b_size = b.size()
if inplace:
ctx.mark_dirty(a)
return a.sub_(b)
else:
return a.sub(b)
def backward(self, grad_output):
return grad_output, grad_output.neg()
@staticmethod
def backward(ctx, grad_output):
return maybe_unexpand(grad_output, ctx.a_size), maybe_unexpand_or_view(grad_output.neg(), ctx.b_size), None
class Mul(Function):
def forward(self, a, b):
self.save_for_backward(a, b)
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
return a.mul(b)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.mul(b), grad_output.mul(a)
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
return maybe_unexpand(grad_output.mul(b), ctx.a_size), maybe_unexpand_or_view(grad_output.mul(a), ctx.b_size)
class Div(Function):
def forward(self, a, b):
self.save_for_backward(a, b)
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
return a.div(b)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.div(b), grad_output.neg().mul(a).div_(b).div_(b)
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
b_rec = b.reciprocal()
grad_a = grad_output.mul(b_rec)
grad_b = grad_output.neg().mul(a).mul(b_rec).mul(b_rec)
return maybe_unexpand(grad_a, ctx.a_size), maybe_unexpand_or_view(grad_b, ctx.b_size)
class Pow(Function):
def forward(self, a, b):
self.save_for_backward(a, b)
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
return a.pow(b)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.mul(b).mul_(a.pow(b - 1)), grad_output.mul(a.pow(b)).mul_(a.log())
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
grad_a = grad_output.mul(b).mul(a.pow(b - 1))
grad_b = grad_output.mul(a.pow(b)).mul(a.log())
return maybe_unexpand(grad_a, ctx.a_size), maybe_unexpand_or_view(grad_b, ctx.b_size)
def sort_args(a, b):
return (a, b, True) if torch.is_tensor(a) else (b, a, False)
class AddConstant(InplaceFunction):
def __init__(self, constant, inplace=False):
super(AddConstant, self).__init__(inplace)
self.constant = constant
def forward(self, a):
if self.inplace:
self.mark_dirty(a)
return a.add_(self.constant)
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, constant, ctx.tensor_first = sort_args(a, b)
if inplace:
ctx.mark_dirty(tensor)
return tensor.add_(constant)
else:
return a.add(self.constant)
return tensor.add(constant)
def backward(self, grad_output):
return grad_output
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output, None, None
else:
return None, grad_output, None
class SubConstant(InplaceFunction):
def __init__(self, constant, sub_tensor=False, inplace=False):
super(SubConstant, self).__init__(inplace)
self.constant = constant
self.sub_tensor = sub_tensor
def forward(self, a):
if self.sub_tensor:
if a.is_signed() and self.inplace:
self.mark_dirty(a)
return a.neg_().add_(self.constant)
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, constant, ctx.tensor_first = sort_args(a, b)
if ctx.tensor_first:
if inplace:
ctx.mark_dirty(tensor)
return tensor.sub_(constant)
else:
assert not self.inplace, "can't perform (constant - tensor) " \
"subtraction in-place on an unsigned type"
return a.new().resize_as_(a).fill_(self.constant).sub_(a)
return tensor.sub(constant)
else:
if self.inplace:
self.mark_dirty(a)
return a.sub_(self.constant)
if inplace:
ctx.mark_dirty(tensor)
return tensor.neg_().add_(constant)
else:
return a.sub(self.constant)
return tensor.neg().add_(constant)
def backward(self, grad_output):
if self.sub_tensor:
return grad_output.neg()
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output, None, None
else:
return grad_output
return None, grad_output.neg(), None
class MulConstant(InplaceFunction):
def __init__(self, constant, inplace=False):
super(MulConstant, self).__init__(inplace)
self.constant = constant
def forward(self, a):
if self.inplace:
self.mark_dirty(a)
return a.mul_(self.constant)
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
if inplace:
ctx.mark_dirty(tensor)
return tensor.mul_(ctx.constant)
else:
return a.mul(self.constant)
return tensor.mul(ctx.constant)
def backward(self, grad_output):
return grad_output.mul(self.constant)
@staticmethod
def backward(ctx, grad_output):
grad_input = grad_output.mul(ctx.constant)
if ctx.tensor_first:
return grad_input, None, None
else:
return None, grad_input, None
class DivConstant(InplaceFunction):
def __init__(self, constant, div_by_tensor=False, inplace=False):
super(DivConstant, self).__init__(inplace)
self.constant = constant
self.div_by_tensor = div_by_tensor
if self.inplace and self.div_by_tensor:
# TODO: actually, as long as the type is floating point, we can
raise RuntimeError("can't perform (constant / tensor) division in-place")
def forward(self, a):
if self.div_by_tensor:
self.save_for_backward(a)
return a.new().resize_as_(a).fill_(self.constant).div_(a)
else:
if self.inplace:
return a.div_(self.constant)
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
ctx.inplace = inplace
if ctx.tensor_first:
if inplace:
ctx.mark_dirty(tensor)
return tensor.div_(ctx.constant)
else:
return a.div(self.constant)
def backward(self, grad_output):
if self.div_by_tensor:
a = self.saved_tensors[0]
return grad_output.neg().mul_(self.constant).div_(a).div_(a)
return tensor.div(ctx.constant)
else:
return grad_output.div(self.constant)
ctx.save_for_backward(tensor)
if inplace:
ctx.mark_dirty(tensor)
return tensor.reciprocal_().mul_(ctx.constant)
else:
return tensor.reciprocal().mul_(ctx.constant)
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output.div(ctx.constant), None, None
else:
v, = ctx.saved_variables
if ctx.inplace:
return None, grad_output.mul(v).mul(v).div_(-ctx.constant), None
else:
v_rep = v.reciprocal()
return None, grad_output.mul(v_rep).mul(v_rep).mul_(-ctx.constant), None
class PowConstant(Function):
def __init__(self, constant, tensor_power=False):
super(PowConstant, self).__init__()
self.constant = constant
self.tensor_power = tensor_power
def forward(self, a):
if self.tensor_power:
self.fw_result = torch.pow(self.constant, a)
return self.fw_result
@staticmethod
def forward(ctx, a, b):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
if ctx.tensor_first:
ctx.save_for_backward(tensor)
return tensor.pow(ctx.constant)
else:
self.save_for_backward(a)
return a.pow(self.constant)
result = torch.pow(ctx.constant, tensor)
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
if self.tensor_power:
return grad_output.mul(self.fw_result).mul_(math.log(self.constant))
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
var, = ctx.saved_variables
return grad_output.mul(ctx.constant).mul(var.pow(ctx.constant - 1)), None
else:
a = self.saved_tensors[0]
return grad_output.mul(self.constant).mul_(a.pow(self.constant - 1))
var_result, = ctx.saved_variables
return None, grad_output.mul(var_result).mul_(math.log(ctx.constant))
class Negate(InplaceFunction):
def forward(self, i):
if self.inplace:
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
return i.neg_()
else:
return i.neg()
def backward(self, grad_output):
return grad_output.neg()
@staticmethod
def backward(ctx, grad_output):
return grad_output.neg(), None
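For reference, a minimal sketch (not part of the diff) of how these new-style functions are defined and invoked: the @staticmethod forward/backward pair is never instantiated, callers go through Function.apply, and every extra non-tensor argument to forward is matched by a trailing None in the tuple returned from backward. The ScaleAndShift name and its arguments are hypothetical.

import torch
from torch.autograd import Variable, Function

class ScaleAndShift(Function):
    # hypothetical example mirroring the ctx-based pattern used above
    @staticmethod
    def forward(ctx, x, scale, shift):
        ctx.scale = scale
        return x.mul(scale).add_(shift)

    @staticmethod
    def backward(ctx, grad_output):
        # one gradient slot per forward argument; None for the two constants
        return grad_output.mul(ctx.scale), None, None

x = Variable(torch.randn(4), requires_grad=True)
ScaleAndShift.apply(x, 2.0, 1.0).sum().backward()
print(x.grad)   # a Variable holding 2.0 in every position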


@ -1,201 +1,224 @@
import torch
from ..function import Function, InplaceFunction
from .utils import maybe_unexpand
# TODO: no need to save all args if the grad w.r.t. some of them is not needed
class _BlasBase(InplaceFunction):
def __init__(self, alpha=1, beta=1, inplace=False):
super(_BlasBase, self).__init__(inplace)
self.alpha = alpha
self.beta = beta
def _get_output(self, arg):
if self.inplace:
self.mark_dirty(arg)
return arg
else:
return arg.new().resize_as_(arg)
def _get_output(ctx, arg, inplace=False):
if inplace:
ctx.mark_dirty(arg)
return arg
else:
return arg.new().resize_as_(arg)
class Addmm(_BlasBase):
class Addmm(InplaceFunction):
def forward(self, add_matrix, matrix1, matrix2):
self.save_for_backward(matrix1, matrix2)
output = self._get_output(add_matrix)
return torch.addmm(self.alpha, add_matrix, self.beta,
@staticmethod
def forward(ctx, add_matrix, matrix1, matrix2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(matrix1, matrix2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addmm(alpha, add_matrix, beta,
matrix1, matrix2, out=output)
def backward(self, grad_output):
matrix1, matrix2 = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
matrix1, matrix2 = ctx.saved_variables
grad_add_matrix = grad_matrix1 = grad_matrix2 = None
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if self.needs_input_grad[1]:
grad_matrix1 = torch.mm(grad_output, matrix2.t())
if self.beta != 1:
grad_matrix1 *= self.beta
if ctx.needs_input_grad[1]:
if matrix1.stride() == (1, matrix1.size(0)):
# column major gradient if input is column major
grad_matrix1 = torch.mm(matrix2, grad_output.t()).t()
else:
grad_matrix1 = torch.mm(grad_output, matrix2.t())
if ctx.beta != 1:
grad_matrix1 *= ctx.beta
if self.needs_input_grad[2]:
grad_matrix2 = torch.mm(matrix1.t(), grad_output)
if self.beta != 1:
grad_matrix2 *= self.beta
if ctx.needs_input_grad[2]:
if matrix2.stride() == (1, matrix2.size(0)):
# column major gradient if input is column major
grad_matrix2 = torch.mm(grad_output.t(), matrix1).t()
else:
grad_matrix2 = torch.mm(matrix1.t(), grad_output)
if ctx.beta != 1:
grad_matrix2 *= ctx.beta
return grad_add_matrix, grad_matrix1, grad_matrix2
return grad_add_matrix, grad_matrix1, grad_matrix2, None, None, None
class Addbmm(_BlasBase):
class Addbmm(InplaceFunction):
def forward(self, add_matrix, batch1, batch2):
self.save_for_backward(batch1, batch2)
output = self._get_output(add_matrix)
return torch.addbmm(self.alpha, add_matrix, self.beta,
@staticmethod
def forward(ctx, add_matrix, batch1, batch2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(batch1, batch2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addbmm(alpha, add_matrix, beta,
batch1, batch2, out=output)
def backward(self, grad_output):
batch1, batch2 = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
batch1, batch2 = ctx.saved_variables
grad_add_matrix = grad_batch1 = grad_batch2 = None
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if any(self.needs_input_grad[1:]):
if any(ctx.needs_input_grad[1:]):
batch_grad_output = (grad_output
.unsqueeze(0)
.expand(batch1.size(0), batch1.size(1), batch2.size(2)))
if self.needs_input_grad[1]:
if ctx.needs_input_grad[1]:
grad_batch1 = torch.bmm(batch_grad_output, batch2.transpose(1, 2))
if self.beta != 1:
grad_batch1 *= self.beta
if ctx.beta != 1:
grad_batch1 *= ctx.beta
if self.needs_input_grad[2]:
if ctx.needs_input_grad[2]:
grad_batch2 = torch.bmm(batch1.transpose(1, 2), batch_grad_output)
if self.beta != 1:
grad_batch2 *= self.beta
if ctx.beta != 1:
grad_batch2 *= ctx.beta
return grad_add_matrix, grad_batch1, grad_batch2
return grad_add_matrix, grad_batch1, grad_batch2, None, None, None
class Baddbmm(_BlasBase):
class Baddbmm(InplaceFunction):
def forward(self, add_batch, batch1, batch2):
self.save_for_backward(batch1, batch2)
output = self._get_output(add_batch)
return torch.baddbmm(self.alpha, add_batch, self.beta,
@staticmethod
def forward(ctx, add_batch, batch1, batch2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_batch_size = add_batch.size()
ctx.save_for_backward(batch1, batch2)
output = _get_output(ctx, add_batch, inplace=inplace)
return torch.baddbmm(alpha, add_batch, beta,
batch1, batch2, out=output)
def backward(self, grad_output):
batch1, batch2 = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
batch1, batch2 = ctx.saved_variables
grad_add_batch = grad_batch1 = grad_batch2 = None
if self.needs_input_grad[0]:
grad_add_batch = grad_output
if self.alpha != 1:
grad_add_batch = grad_add_batch.mul(self.alpha)
if ctx.needs_input_grad[0]:
grad_add_batch = maybe_unexpand(grad_output, ctx.add_batch_size)
if ctx.alpha != 1:
grad_add_batch = grad_add_batch.mul(ctx.alpha)
if self.needs_input_grad[1]:
if ctx.needs_input_grad[1]:
grad_batch1 = torch.bmm(grad_output, batch2.transpose(1, 2))
if self.beta != 1:
grad_batch1 *= self.beta
if ctx.beta != 1:
grad_batch1 *= ctx.beta
if self.needs_input_grad[2]:
if ctx.needs_input_grad[2]:
grad_batch2 = torch.bmm(batch1.transpose(1, 2), grad_output)
if self.beta != 1:
grad_batch2 *= self.beta
if ctx.beta != 1:
grad_batch2 *= ctx.beta
return grad_add_batch, grad_batch1, grad_batch2
return grad_add_batch, grad_batch1, grad_batch2, None, None, None
class Addmv(_BlasBase):
class Addmv(InplaceFunction):
def forward(self, add_vector, matrix, vector):
self.save_for_backward(matrix, vector)
output = self._get_output(add_vector)
return torch.addmv(self.alpha, add_vector, self.beta,
@staticmethod
def forward(ctx, add_vector, matrix, vector, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_vector_size = add_vector.size()
ctx.save_for_backward(matrix, vector)
output = _get_output(ctx, add_vector, inplace=inplace)
return torch.addmv(alpha, add_vector, beta,
matrix, vector, out=output)
def backward(self, grad_output):
matrix, vector = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
matrix, vector = ctx.saved_variables
grad_add_vector = grad_matrix = grad_vector = None
if self.needs_input_grad[0]:
grad_add_vector = grad_output
if self.alpha != 1:
grad_add_vector = grad_add_vector.mul(self.alpha)
if ctx.needs_input_grad[0]:
grad_add_vector = maybe_unexpand(grad_output, ctx.add_vector_size)
if ctx.alpha != 1:
grad_add_vector = grad_add_vector.mul(ctx.alpha)
if self.needs_input_grad[1]:
if ctx.needs_input_grad[1]:
grad_matrix = torch.ger(grad_output, vector)
if self.beta != 1:
grad_matrix *= self.beta
if ctx.beta != 1:
grad_matrix *= ctx.beta
if self.needs_input_grad[2]:
if ctx.needs_input_grad[2]:
grad_vector = torch.mv(matrix.t(), grad_output)
if self.beta != 1:
grad_vector *= self.beta
if ctx.beta != 1:
grad_vector *= ctx.beta
return grad_add_vector, grad_matrix, grad_vector
return grad_add_vector, grad_matrix, grad_vector, None, None, None
class Addr(_BlasBase):
class Addr(InplaceFunction):
def forward(self, add_matrix, vector1, vector2):
self.save_for_backward(vector1, vector2)
output = self._get_output(add_matrix)
return torch.addr(self.alpha, add_matrix, self.beta,
@staticmethod
def forward(ctx, add_matrix, vector1, vector2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(vector1, vector2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addr(alpha, add_matrix, beta,
vector1, vector2, out=output)
def backward(self, grad_output):
vector1, vector2 = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
vector1, vector2 = ctx.saved_variables
grad_add_matrix = grad_vector1 = grad_vector2 = None
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if self.needs_input_grad[1]:
if ctx.needs_input_grad[1]:
grad_vector1 = torch.mv(grad_output, vector2)
if self.beta != 1:
grad_vector1 *= self.beta
if ctx.beta != 1:
grad_vector1 *= ctx.beta
if self.needs_input_grad[2]:
if ctx.needs_input_grad[2]:
# TODO: maybe it's better to do transpose + mv + transpose
grad_vector2 = torch.mm(vector1.unsqueeze(0), grad_output)
if self.beta != 1:
grad_vector2 *= self.beta
grad_vector2 = torch.mm(vector1.unsqueeze(0), grad_output).squeeze(0)
if ctx.beta != 1:
grad_vector2 *= ctx.beta
return grad_add_matrix, grad_vector1, grad_vector2
return grad_add_matrix, grad_vector1, grad_vector2, None, None, None
class Dot(Function):
def forward(self, vector1, vector2):
self.save_for_backward(vector1, vector2)
@staticmethod
def forward(ctx, vector1, vector2):
ctx.save_for_backward(vector1, vector2)
ctx.sizes = (vector1.size(), vector2.size())
return vector1.new((vector1.dot(vector2),))
def backward(self, grad_output):
vector1, vector2 = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
vector1, vector2 = ctx.saved_variables
grad_vector1 = grad_vector2 = None
if self.needs_input_grad[0]:
grad_vector1 = vector2.mul(grad_output[0])
if ctx.needs_input_grad[0]:
grad_vector1 = vector2.mul(grad_output.expand(ctx.sizes[1])).view(ctx.sizes[0])
if self.needs_input_grad[1]:
grad_vector2 = vector1.mul(grad_output[0])
if ctx.needs_input_grad[1]:
grad_vector2 = vector1.mul(grad_output.expand(ctx.sizes[0])).view(ctx.sizes[1])
return grad_vector1, grad_vector2
# TODO: cross
# TODO: diag
# TODO: trace
# TODO: tril
# TODO: triu
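A note on the "column major gradient" branches above: computing grad_matrix1 as mm(matrix2, grad_output.t()).t() yields the same values as mm(grad_output, matrix2.t()), only laid out column-major so the gradient's memory layout matches a column-major input. A hedged sanity check on plain tensors (sizes are arbitrary):

import torch

g = torch.randn(3, 5).double()    # stand-in for grad_output
m2 = torch.randn(4, 5).double()   # stand-in for matrix2
row_major = torch.mm(g, m2.t())
col_major = torch.mm(m2, g.t()).t()
print((row_major - col_major).abs().max())     # 0 up to float round-off
print(row_major.stride(), col_major.stride())  # (4, 1) vs (1, 3): same values, transposed layout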


@ -1,20 +1,29 @@
import torch
from ..function import Function
from .utils import maybe_unexpand, maybe_unexpand_or_view
# TODO: once Cpp-style functions are implemented we can detach a and b
# before calling forward.
class _CompareOp(Function):
def __init__(self, scalar=None):
super(_CompareOp, self).__init__()
self.scalar = scalar
def forward(self, tensor1, tensor2=None):
other = tensor2 if tensor2 is not None else self.scalar
mask = getattr(tensor1, self.fn_name)(other)
self.mark_non_differentiable(mask)
@classmethod
def forward(cls, ctx, a, b):
ctx.a_size = a.size()
ctx.b_tensor = torch.is_tensor(b)
ctx.b_size = b.size() if ctx.b_tensor else None
ctx.input_type = type(a)
mask = getattr(a, cls.fn_name)(b)
ctx.mark_non_differentiable(mask)
return mask
@staticmethod
def backward(ctx, grad_output):
grad_input = (grad_output * 0).type(ctx.input_type)
return (maybe_unexpand(grad_input, ctx.a_size),
maybe_unexpand_or_view(grad_input, ctx.b_size) if ctx.b_tensor else None)
class Eq(_CompareOp):
fn_name = 'eq'
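The rewritten comparison ops mark their boolean masks non-differentiable and hand back all-zero gradients of the input's type, so graphs that use a comparison result as a constant multiplier still backpropagate cleanly through the other operand. A small hedged usage sketch:

import torch
from torch.autograd import Variable

x = Variable(torch.randn(5), requires_grad=True)
mask = x.gt(0).type_as(x)      # comparison output: treated as a constant by autograd
(x * mask).sum().backward()    # gradient flows only through the left-hand x
print(x.grad)                  # 1 where x > 0, 0 elsewhere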


@ -1,44 +1,104 @@
import torch
from ..function import Function
from ..variable import Variable
class Diag(Function):
def __init__(self, diagonal_idx=0):
super(Diag, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def forward(ctx, input, diagonal_idx=0):
ctx.diagonal_idx = diagonal_idx
return input.diag(ctx.diagonal_idx)
def forward(self, input):
return input.diag()
def backward(self, grad_output):
return grad_output.diag()
@staticmethod
def backward(ctx, grad_output):
return grad_output.diag(ctx.diagonal_idx), None
class Tril(Function):
def __init__(self, diagonal_idx=0):
super(Tril, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def forward(ctx, input, diagonal_idx=0):
ctx.diagonal_idx = diagonal_idx
return input.tril(ctx.diagonal_idx)
def forward(self, input):
return input.tril(self.diagonal_idx)
def backward(self, grad_output):
return grad_output.tril(self.diagonal_idx)
@staticmethod
def backward(ctx, grad_output):
return grad_output.tril(ctx.diagonal_idx), None
class Triu(Function):
def __init__(self, diagonal_idx=0):
super(Triu, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def forward(ctx, input, diagonal_idx=0):
ctx.diagonal_idx = diagonal_idx
return input.triu(ctx.diagonal_idx)
def forward(self, input):
return input.triu(self.diagonal_idx)
@staticmethod
def backward(ctx, grad_output):
return grad_output.triu(ctx.diagonal_idx), None
def backward(self, grad_output):
return grad_output.triu(self.diagonal_idx)
# TODO: trace
class Trace(Function):
@staticmethod
def forward(ctx, input):
ctx.isize = input.size()
return input.new((input.trace(), ))
@staticmethod
def backward(ctx, grad_output):
isize = ctx.isize
min_size = min(isize)
grad_input = Variable(grad_output.data.new(isize).zero_()).view(-1)
grad_input[::(isize[1] + 1)] = grad_output.expand(min_size)
return grad_input.view(isize)
class Cross(Function):
@staticmethod
def forward(ctx, input, other, dim=-1):
ctx.dim = dim
ctx.save_for_backward(input, other)
return torch.cross(input, other, ctx.dim)
@staticmethod
def backward(ctx, grad_output):
input, other = ctx.saved_variables
grad_input = other.cross(grad_output, ctx.dim)
grad_other = grad_output.cross(input, ctx.dim)
return grad_input, grad_other, None
class Inverse(Function):
@staticmethod
def forward(ctx, input):
inverse = torch.inverse(input)
ctx.save_for_backward(inverse)
return inverse
@staticmethod
def backward(ctx, grad_output):
inverse, = ctx.saved_variables
return -torch.mm(inverse.t(), torch.mm(grad_output, inverse.t()))
class Gesv(Function):
@staticmethod
def forward(ctx, b, a):
# TODO see if one can backprop through LU
X, LU = torch.gesv(b, a)
ctx.save_for_backward(X, a)
ctx.mark_non_differentiable(LU)
return X, LU
@staticmethod
def backward(ctx, grad_output, grad_LU=None):
X, a = ctx.saved_variables
grad_b, _ = torch.gesv(grad_output, a.t())
grad_a = -torch.mm(grad_b, X.t())
return grad_b, grad_a
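The new Inverse backward implements d tr(G^T A^-1)/dA = -A^-T G A^-T. A hedged finite-difference check of that identity on plain double-precision tensors (sizes are arbitrary):

import torch

A = torch.randn(4, 4).double()
G = torch.randn(4, 4).double()     # stand-in for grad_output
eps = 1e-6
analytic = -torch.mm(A.inverse().t(), torch.mm(G, A.inverse().t()))

numeric = torch.zeros(4, 4).double()
for k in range(A.numel()):
    Ap = A.clone(); Ap.view(-1)[k] += eps
    Am = A.clone(); Am.view(-1)[k] -= eps
    # f(A) = sum(G * A.inverse()) == tr(G^T A^-1)
    numeric.view(-1)[k] = ((G * Ap.inverse()).sum() - (G * Am.inverse()).sum()) / (2 * eps)

print((analytic - numeric).abs().max())   # should be ~1e-8 or smaller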


@ -1,282 +1,351 @@
from itertools import repeat
from ..._thnn import type2backend
from ..function import Function, InplaceFunction
from ..variable import Variable
from .utils import maybe_unexpand, maybe_unexpand_or_view
class Exp(InplaceFunction):
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
result = i.exp_()
else:
result = i.exp()
self.save_for_backward(result)
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
return self.saved_tensors[0] * grad_output
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
return grad_output * result, None
class Log(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.log()
def backward(self, grad_output):
return grad_output.div(self.saved_tensors[0])
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.div(i)
class Log1p(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.log1p()
def backward(self, grad_output):
return grad_output.div(self.saved_tensors[0].add(1))
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.div(i.add(1))
class Tanh(InplaceFunction):
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
result = i.tanh_()
else:
result = i.tanh()
self.save_for_backward(result)
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
result, = self.saved_tensors
return grad_output * (1 - result * result)
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
if grad_output.volatile:
grad_input = Variable(grad_output.data.new(grad_output.size()), volatile=True)
backend = type2backend[type(result.data)]
backend.Tanh_updateGradInput(backend.library_state, None, grad_output.data,
grad_input.data, result.data)
else:
grad_input = grad_output * (1 - result * result)
return grad_input, None
class Sigmoid(InplaceFunction):
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
result = i.sigmoid_()
else:
result = i.sigmoid()
self.save_for_backward(result)
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
result, = self.saved_tensors
return grad_output * ((1 - result) * result)
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
if grad_output.volatile:
grad_input = Variable(grad_output.data.new(grad_output.size()), volatile=True)
backend = type2backend[type(result.data)]
backend.Sigmoid_updateGradInput(backend.library_state, None, grad_output.data,
grad_input.data, result.data)
else:
grad_input = grad_output * ((1 - result) * result)
return grad_input, None
class Sinh(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.sinh()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.cosh()
class Cosh(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.cosh()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.sinh()
class Abs(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.abs()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.sign()
class Clamp(Function):
def __init__(self, min_val, max_val):
super(Clamp, self).__init__()
self.min_val = min_val
self.max_val = max_val
@staticmethod
def forward(ctx, i, min_val, max_val):
ctx._mask = (i.ge(min_val) * i.le(max_val))
return i.clamp(min_val, max_val)
def forward(self, i):
self.save_for_backward(i)
return i.clamp(self.min_val, self.max_val)
def backward(self, grad_output):
i, = self.saved_tensors
mask = i.ge(self.min_val) * i.le(self.max_val)
return grad_output * mask.type_as(grad_output)
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None, None
class Sqrt(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.sqrt()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.mul(i.pow(-0.5)).div(2)
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.mul(i.pow(-0.5)).div_(2)
class Sin(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.sin()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.cos()
class Cos(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.cos()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.mul(i.sin()).neg_()
class Tan(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.tan()
def backward(self, grad_output):
i, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.div(i.cos().pow(2))
class Asin(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.asin()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * (1 - i.mul(i)).sqrt_().reciprocal_()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * (1 - i.mul(i)).sqrt().reciprocal()
class Acos(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.acos()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.mul((1 - i.mul(i)).sqrt_().reciprocal_()).neg_()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.mul((1 - i.mul(i)).sqrt().reciprocal()).neg_()
class Atan(Function):
def forward(self, i):
self.save_for_backward(i)
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
return i.atan()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.mul(i).add_(1).reciprocal_()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.mul(i).add_(1).reciprocal()
class Atan2(Function):
@staticmethod
def forward(ctx, y, x):
ctx.save_for_backward(y, x)
return y.atan2(x)
@staticmethod
def backward(ctx, grad_output):
y, x, = ctx.saved_variables
denominator = y.mul(y).add(x.mul(x)).reciprocal()
return grad_output * x.mul(denominator), grad_output * y.neg().mul(denominator)
# TODO: make inplace and update grad formulas
class Reciprocal(Function):
def forward(self, i):
@staticmethod
def forward(ctx, i):
result = i.reciprocal()
self.save_for_backward(result)
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
result, = self.saved_tensors
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
return grad_output * result.mul(result).neg_()
class Cmax(Function):
def forward(self, a, b):
self._max_buffer = a.gt(b).type_as(a)
@staticmethod
def forward(ctx, a, b):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._mask = a.gt(b)
return a.max(b)
def backward(self, grad_output):
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return (
grad_output * self._max_buffer,
grad_output * self._max_buffer.eq(0).type_as(grad_output)
maybe_unexpand(grad_output * mask, ctx._a_size),
maybe_unexpand_or_view(grad_output * Variable(ctx._mask.eq(0).type_as(grad_output.data)), ctx._b_size)
)
class CmaxConstant(Function):
def __init__(self, constant):
super(CmaxConstant, self).__init__()
self.constant = constant
@staticmethod
def forward(ctx, i, constant):
ctx._mask = i.gt(constant)
return i.clamp(min=constant)
def forward(self, i):
self._max_buffer = i.gt(self.constant).type_as(i)
return i.clamp(min=self.constant)
def backward(self, grad_output):
return grad_output * self._max_buffer
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None
class Cmin(Function):
def forward(self, a, b):
self._min_buffer = a.lt(b).type_as(a)
@staticmethod
def forward(ctx, a, b):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._mask = a.lt(b).type_as(a)
return a.min(b)
def backward(self, grad_output):
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return (
grad_output * self._min_buffer,
grad_output * self._min_buffer.eq(0).type_as(grad_output)
maybe_unexpand(grad_output * mask, ctx._a_size),
maybe_unexpand_or_view(grad_output * Variable(ctx._mask.eq(0).type_as(grad_output.data)), ctx._b_size)
)
class CminConstant(Function):
def __init__(self, constant):
super(CminConstant, self).__init__()
self.constant = constant
@staticmethod
def forward(ctx, i, constant):
ctx._mask = i.lt(constant)
return i.clamp(max=constant)
def forward(self, i):
self._min_buffer = i.lt(self.constant).type_as(i)
return i.clamp(max=self.constant)
def backward(self, grad_output):
return grad_output * self._min_buffer
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None
class _ConstantGrad(Function):
grad_value = 0
def __init__(self, *args):
super(_ConstantGrad, self).__init__()
self.args = args
@classmethod
def forward(cls, ctx, *args):
ctx._num_args = len(args)
ctx._args0_size = args[0].size()
return getattr(args[0], cls.__name__.lower())(*args[1:])
def forward(self, i):
return getattr(i, type(self).__name__.lower())(*self.args)
def backward(self, grad_output):
grad_input = grad_output.new(*repeat(1, grad_output.dim()))
grad_input = grad_input.fill_(self.grad_value).expand_as(grad_output)
return grad_input.mul(grad_output)
@classmethod
def backward(cls, ctx, grad_output):
return (maybe_unexpand(grad_output.mul(cls.grad_value), ctx._args0_size),) + (ctx._num_args - 1) * (None,)
class Floor(_ConstantGrad):
@ -313,91 +382,96 @@ class Remainder(_ConstantGrad):
class Lerp(Function):
def __init__(self, weight):
super(Lerp, self).__init__()
self.weight = float(weight)
@staticmethod
def forward(ctx, a, b, weight):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._weight = float(weight)
return a.lerp(b, ctx._weight)
def forward(self, a, b):
return a.lerp(b, self.weight)
def backward(self, grad_output):
return grad_output.mul(1 - self.weight), grad_output.mul(self.weight)
@staticmethod
def backward(ctx, grad_output):
return (maybe_unexpand(grad_output.mul(1 - ctx._weight), ctx._a_size),
maybe_unexpand_or_view(grad_output.mul(ctx._weight), ctx._b_size), None)
class Rsqrt(InplaceFunction):
def forward(self, input):
if self.inplace:
self.mark_dirty(input)
result = input.rsqrt_()
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
result = i.rsqrt_()
else:
result = input.rsqrt()
self.save_for_backward(result)
result = i.rsqrt()
ctx.save_for_backward(result)
return result
def backward(self, grad_output):
result, = self.saved_tensors
return result.pow(3).div_(-2).mul_(grad_output)
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
return result.pow(3).div_(-2).mul(grad_output), None
class Addcmul(InplaceFunction):
def __init__(self, scale=1, inplace=False):
super(Addcmul, self).__init__(inplace)
self.scale = scale
def forward(self, add_tensor, mul_tensor1, mul_tensor2):
self.save_for_backward(mul_tensor1, mul_tensor2)
if self.inplace:
return add_tensor.addcmul_(self.scale, mul_tensor1, mul_tensor2)
@staticmethod
def forward(ctx, add_tensor, mul_tensor1, mul_tensor2, scale=1.0, inplace=False):
ctx._scale = scale
ctx._add_tensor_size = add_tensor.size()
ctx.save_for_backward(mul_tensor1, mul_tensor2)
if inplace:
ctx.mark_dirty(add_tensor)
return add_tensor.addcmul_(scale, mul_tensor1, mul_tensor2)
else:
return add_tensor.addcmul(self.scale, mul_tensor1, mul_tensor2)
return add_tensor.addcmul(scale, mul_tensor1, mul_tensor2)
def backward(self, grad_output):
@staticmethod
def backward(ctx, grad_output):
grad_add = grad_mul1 = grad_mul2 = None
mul_tensor1, mul_tensor2 = self.saved_tensors
mul_tensor1, mul_tensor2 = ctx.saved_variables
if self.needs_input_grad[0]:
grad_add = grad_output
if ctx.needs_input_grad[0]:
grad_add = maybe_unexpand(grad_output, ctx._add_tensor_size)
if self.needs_input_grad[1]:
grad_mul1 = grad_output.mul(mul_tensor2).mul(self.scale)
if ctx.needs_input_grad[1]:
grad_mul1 = maybe_unexpand_or_view(grad_output.mul(mul_tensor2).mul_(ctx._scale), mul_tensor1.size())
if self.needs_input_grad[2]:
grad_mul2 = grad_output.mul(mul_tensor1).mul(self.scale)
if ctx.needs_input_grad[2]:
grad_mul2 = maybe_unexpand_or_view(grad_output.mul(mul_tensor1).mul_(ctx._scale), mul_tensor2.size())
return grad_add, grad_mul1, grad_mul2
return grad_add, grad_mul1, grad_mul2, None, None
class Addcdiv(InplaceFunction):
def __init__(self, scale=1, inplace=False):
super(Addcdiv, self).__init__(inplace)
self.scale = scale
def forward(self, add_tensor, div_tensor1, div_tensor2):
self.save_for_backward(div_tensor1, div_tensor2)
if self.inplace:
return add_tensor.addcdiv_(self.scale, div_tensor1, div_tensor2)
@staticmethod
def forward(ctx, add_tensor, div_tensor1, div_tensor2, scale=1.0, inplace=False):
ctx._scale = scale
ctx._add_tensor_size = add_tensor.size()
ctx.save_for_backward(div_tensor1, div_tensor2)
if inplace:
ctx.mark_dirty(add_tensor)
return add_tensor.addcdiv_(ctx._scale, div_tensor1, div_tensor2)
else:
return add_tensor.addcdiv(self.scale, div_tensor1, div_tensor2)
return add_tensor.addcdiv(ctx._scale, div_tensor1, div_tensor2)
def backward(self, grad_output):
@staticmethod
def backward(ctx, grad_output):
grad_add = grad_div1 = grad_div2 = None
div_tensor1, div_tensor2 = self.saved_tensors
div_tensor1, div_tensor2 = ctx.saved_variables
if self.needs_input_grad[0]:
grad_add = grad_output
if ctx.needs_input_grad[0]:
grad_add = maybe_unexpand(grad_output, ctx._add_tensor_size)
if self.needs_input_grad[1]:
grad_div1 = grad_output.div(div_tensor2).mul(self.scale)
if ctx.needs_input_grad[1]:
grad_div1 = maybe_unexpand_or_view(grad_output.div(div_tensor2).mul_(ctx._scale), div_tensor1.size())
if self.needs_input_grad[2]:
if ctx.needs_input_grad[2]:
div_tensor2_sq = div_tensor2.mul(div_tensor2)
grad_div2 = grad_output.mul(div_tensor1).div_(div_tensor2_sq)
grad_div2.neg_().mul_(self.scale)
return grad_add, grad_div1, grad_div2
grad_div2 = maybe_unexpand_or_view(grad_output.mul(div_tensor1).div(div_tensor2_sq).mul(-ctx._scale),
div_tensor2.size())
return grad_add, grad_div1, grad_div2, None, None
# TODO: atan2 + inplace
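Most of the element-wise backward formulas above are textbook derivatives restated in terms of ctx-saved results; Sigmoid, for example, reuses its forward output since d sigmoid(x)/dx = sigmoid(x) * (1 - sigmoid(x)). A quick hedged finite-difference check of that formula on plain tensors:

import torch

x = torch.randn(6).double()
eps = 1e-6
s = x.sigmoid()
analytic = s * (1 - s)
numeric = (x.add(eps).sigmoid() - x.add(-eps).sigmoid()) / (2 * eps)
print((analytic - numeric).abs().max())   # ~1e-11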


@ -1,73 +1,141 @@
from functools import reduce
from ..function import Function
from ..variable import Variable
import torch
class _DimReduceFunction(Function):
class Sum(Function):
def __init__(self, dim=None):
super(_DimReduceFunction, self).__init__()
self.dim = dim
def forward(self, input):
self.input_size = input.size()
fn = getattr(input, self.fn_name)
if self.dim is None:
return input.new((fn(),))
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
return input.new((input.sum(),))
else:
return fn(self.dim)
if keepdim is not None:
return input.sum(dim, keepdim=keepdim)
else:
return input.sum(dim)
class Sum(_DimReduceFunction):
fn_name = 'sum'
def backward(self, grad_output):
if self.dim is None:
return grad_output.new(self.input_size).fill_(grad_output[0])
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
return grad_output.expand(ctx.input_size), None, None
else:
repeats = [1 for _ in self.input_size]
repeats[self.dim] = self.input_size[self.dim]
return grad_output.repeat(*repeats),
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
repeats = [1 for _ in ctx.input_size]
repeats[ctx.dim] = ctx.input_size[ctx.dim]
return grad_output.repeat(*repeats), None, None
class Prod(_DimReduceFunction):
class Prod(Function):
def forward(self, input):
self.input_size = input.size()
if self.dim is None:
self.result = input.prod()
self.save_for_backward(input)
return input.new((self.result,))
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
ctx.result = input.prod()
ctx.save_for_backward(input)
return input.new((ctx.result,))
else:
output = input.prod(self.dim)
self.save_for_backward(input, output)
if keepdim is not None:
output = input.prod(dim, keepdim=keepdim)
else:
output = input.prod(dim)
ctx.save_for_backward(input, output)
return output
def backward(self, grad_output):
if self.dim is None:
input, = self.saved_tensors
grad_input = grad_output.new(self.input_size).fill_(self.result)
return grad_input.div(input)
@staticmethod
def backward(ctx, grad_output):
def safe_zeros_backward(inp, dim):
# note that the gradient is equivalent to:
# cumprod(exclusive, normal) * cumprod(exclusive, reverse), e.g.:
# input: [ a, b, c]
# cumprod(exclusive, normal): [1 , a, a * b]
# cumprod(exclusive, reverse): [b * c, c, 1]
# product: [b * c, a * c, a * b]
# and this is safe under input with 0s.
if inp.size(dim) == 1:
return grad_output
ones_size = torch.Size((inp.size()[:dim] + (1,) + inp.size()[dim + 1:]))
ones = Variable(grad_output.data.new(ones_size).fill_(1))
exclusive_normal_nocp = torch.cat((ones, inp.narrow(dim, 0, inp.size(dim) - 1)), dim)
exclusive_normal = exclusive_normal_nocp.cumprod(dim)
def reverse_dim(var, dim):
return var.index_select(dim, Variable(torch.arange(var.size(dim) - 1, -1, -1)).long())
narrow_reverse = reverse_dim(inp.narrow(dim, 1, inp.size(dim) - 1), dim)
exclusive_reverse_nocp = torch.cat((ones, narrow_reverse), dim)
exclusive_reverse = reverse_dim(exclusive_reverse_nocp.cumprod(dim), dim)
grad_input = grad_output.expand_as(exclusive_normal).mul(exclusive_normal.mul(exclusive_reverse))
return grad_input
if ctx.dim is None:
input, = ctx.saved_variables
zero_idx = (input.data == 0).nonzero()
if zero_idx.dim() == 0:
return grad_output.mul(ctx.result).expand_as(input).div(input), None, None
elif zero_idx.size(0) > 1:
return (grad_output * 0).expand_as(input), None, None
else:
return safe_zeros_backward(input.contiguous().view(-1), 0).view_as(input), None, None
else:
input, output = self.saved_tensors
repeats = [1 for _ in self.input_size]
repeats[self.dim] = self.input_size[self.dim]
return output.mul(grad_output).repeat(*repeats).div_(input)
input, output = ctx.saved_variables
dim = ctx.dim if ctx.dim >= 0 else ctx.dim + input.dim()
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(dim)
output = output.unsqueeze(dim)
zero_mask = input == 0
slice_zero_count = zero_mask.sum(dim, True)
total_zeros = slice_zero_count.data.sum()
if total_zeros == 0:
grad_input = grad_output.mul(output).expand_as(input).div(input)
else:
grad_input = safe_zeros_backward(input, dim)
return grad_input, None, None
class Mean(_DimReduceFunction):
fn_name = 'mean'
class Mean(Function):
def backward(self, grad_output):
if self.dim is None:
grad_input_val = grad_output[0]
grad_input_val /= reduce(lambda x, y: x * y, self.input_size, 1)
return grad_output.new(*self.input_size).fill_(grad_input_val)
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
return input.new((input.mean(),))
else:
repeats = [1 for _ in self.input_size]
dim_size = self.input_size[self.dim]
repeats[self.dim] = dim_size
return grad_output.repeat(*repeats).div_(dim_size)
if keepdim is not None:
return input.mean(dim, keepdim=keepdim)
else:
return input.mean(dim)
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
grad_input_val = grad_output / reduce(lambda x, y: x * y, ctx.input_size, 1)
return grad_input_val.expand(ctx.input_size), None, None
else:
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
repeats = [1 for _ in ctx.input_size]
dim_size = ctx.input_size[ctx.dim]
repeats[ctx.dim] = dim_size
return grad_output.repeat(*repeats).div_(dim_size), None, None
class _SelectionFunction(Function):
@ -75,44 +143,53 @@ class _SelectionFunction(Function):
# additional_args is prepended before dim when calling the tensor
# function. It's a no-op for subclasses other than kthvalue.
# kthvalue not only requires us to pass a dim, but also preceed it with k.
additional_args = tuple()
def __init__(self, dim=None):
super(_SelectionFunction, self).__init__()
self.dim = dim
def forward(self, input):
fn = getattr(input, type(self).__name__.lower())
self.input_size = input.size()
if self.dim is None and self.has_all_reduce:
value = fn(*self.additional_args)
self.indices = tuple(input.eq(value).nonzero()[0])
@classmethod
def forward(cls, ctx, input, dim=None, keepdim=None, additional_args=tuple()):
fn = getattr(input, cls.__name__.lower())
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.additional_args = additional_args
ctx.input_size = input.size()
if ctx.dim is None and cls.has_all_reduce:
value = fn(*additional_args)
ctx.indices_tuple = tuple(input.eq(value).nonzero()[0])
return input.new((value,))
else:
if self.dim is None:
if ctx.dim is None:
dim = input.dim() - 1
else:
dim = self.dim
dim = ctx.dim
args = (dim,)
if self.additional_args:
args = self.additional_args + args
output, indices = fn(*args)
self.save_for_backward(indices)
self.mark_non_differentiable(indices)
if additional_args:
args = additional_args + args
if keepdim is not None:
output, indices = fn(*args, keepdim=keepdim)
else:
output, indices = fn(*args)
ctx.save_for_backward(indices)
ctx.mark_non_differentiable(indices)
return output, indices
def backward(self, grad_output, grad_indices=None):
grad_input = grad_output.new(*self.input_size).zero_()
if self.dim is None and self.has_all_reduce:
grad_input[self.indices] = grad_output[0]
@classmethod
def backward(cls, ctx, grad_output, grad_indices=None):
grad_input = Variable(grad_output.data.new(*ctx.input_size).zero_())
if ctx.dim is None and cls.has_all_reduce:
grad_input[ctx.indices_tuple] = grad_output
else:
if self.dim is None:
dim = input.dim() - 1
if ctx.dim is None:
dim = len(ctx.input_size) - 1
else:
dim = self.dim
indices, = self.saved_tensors
dim = ctx.dim
indices, = ctx.saved_variables
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(dim)
grad_indices = grad_indices.unsqueeze(dim)
indices = indices.unsqueeze(dim)
grad_input.scatter_(dim, indices, grad_output)
return grad_input
return grad_input, None, None, None
class Max(_SelectionFunction):
@ -128,53 +205,63 @@ class Mode(_SelectionFunction):
class Median(_SelectionFunction):
has_all_reduce = False
pass
class Kthvalue(_SelectionFunction):
has_all_reduce = False
def __init__(self, k, dim=None):
super(Kthvalue, self).__init__(dim)
self.additional_args = (k,)
@classmethod
def forward(cls, ctx, input, k, dim=None, keepdim=None):
return super(Kthvalue, cls).forward(ctx, input, dim, keepdim, (k,))
class Norm(Function):
def __init__(self, norm_type=2, dim=None):
super(Norm, self).__init__()
self.norm_type = norm_type
self.dim = dim
@staticmethod
def forward(ctx, input, p=2, dim=None, keepdim=None):
ctx.p = p
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
def forward(self, input):
if self.dim is None:
self.norm = input.norm(self.norm_type)
self.save_for_backward(input)
return input.new((self.norm,))
if dim is None:
ctx.norm = input.norm(p)
ctx.save_for_backward(input)
return input.new((ctx.norm,))
else:
output = input.norm(self.norm_type, self.dim)
self.save_for_backward(input, output)
if keepdim is not None:
output = input.norm(p, dim, keepdim=keepdim)
else:
output = input.norm(p, dim)
ctx.save_for_backward(input, output)
return output
def backward(self, grad_output):
if self.dim is None:
input, = self.saved_tensors
if self.norm_type == 2:
return input.mul(grad_output[0] / self.norm)
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
input, = ctx.saved_variables
if ctx.p == 2:
scale_v = (grad_output / ctx.norm).expand_as(input)
return input.mul(scale_v), None, None, None
else:
pow = input.abs().pow(self.norm_type - 2)
scale = grad_output[0] / self.norm ** (self.norm_type - 1)
return input.mul(pow).mul(scale)
pow = input.abs().pow(ctx.p - 2)
scale_v = (grad_output / ctx.norm ** (ctx.p - 1)).expand_as(input)
return input.mul(pow).mul(scale_v), None, None, None
else:
input, output = self.saved_tensors
input, output = ctx.saved_variables
if ctx.keepdim is False and input.dim() != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
output = output.unsqueeze(ctx.dim)
big_grad_output = grad_output.expand_as(input)
if self.norm_type == 2:
if ctx.p == 2:
big_output = output.expand_as(input)
return input.mul(big_grad_output).div(big_output)
return input.mul(big_grad_output).div(big_output), None, None, None
else:
pow = input.abs().pow(self.norm_type - 2)
big_output = output.pow(self.norm_type - 1).expand_as(input)
return input.mul(pow).mul(big_grad_output).div(big_output)
pow = input.abs().pow(ctx.p - 2)
big_output = output.pow(ctx.p - 1).expand_as(input)
return input.mul(pow).mul(big_grad_output).div(big_output), None, None, None
# TODO: renorm
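The exclusive-cumprod trick documented inside Prod.backward above can be stated on a plain Python list; this sketch (not part of the diff) shows why it stays finite when the input contains a zero, where dividing the total product by each element would not:

# grad_i = product over j != i of x_j, built from two exclusive cumulative products
x = [2.0, 0.0, 3.0, 4.0]
n = len(x)
left = [1.0] * n     # left[i]  = x[0] * ... * x[i-1]
right = [1.0] * n    # right[i] = x[i+1] * ... * x[n-1]
for i in range(1, n):
    left[i] = left[i - 1] * x[i - 1]
    right[n - 1 - i] = right[n - i] * x[n - i]
print([l * r for l, r in zip(left, right)])   # [0.0, 24.0, 0.0, 0.0]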

View File

@ -0,0 +1,3 @@
%s/self/ctx/g
%s/\s\+def forward/ @staticmethod\r def forward/g
%s/\s\+def backward/ @staticmethod\r @once_differentiable\r def backward/g
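These three lines read as the vim substitutions used to mechanically port old-style Functions to the ctx-based API: rename self to ctx, then prepend @staticmethod (plus @once_differentiable on backward) to the method definitions. A rough, hedged Python equivalent (the helper name is ours, and the indentation handling is only approximate):

import re

def port_function_source(src):
    src = re.sub(r'\bself\b', 'ctx', src)
    src = re.sub(r'(\n\s*)def forward', r'\1@staticmethod\1def forward', src)
    src = re.sub(r'(\n\s*)def backward', r'\1@staticmethod\1@once_differentiable\1def backward', src)
    return src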


@ -23,8 +23,9 @@ class Multinomial(StochasticFunction):
if probs.dim() == 1:
probs = probs.unsqueeze(0)
samples = samples.unsqueeze(0)
reward = reward.unsqueeze(0)
# normalize probs (multinomial accepts weights)
probs /= probs.sum(1).expand_as(probs)
probs /= probs.sum(1, True).expand_as(probs)
grad_probs = probs.new().resize_as_(probs).zero_()
output_probs = probs.gather(1, samples)
output_probs.add_(1e-6).reciprocal_()
@ -83,8 +84,9 @@ class Normal(StochasticFunction):
stddevs_cb = stddevs_sq * stddevs
stddevs_sq += 1e-6
stddevs_cb += 1e-6
grad_stddevs = (grad_means * grad_means) / stddevs_cb
grad_stddevs = (stddevs - grad_stddevs) * reward
grad_stddevs = (stddevs_sq - (grad_means * grad_means))
grad_stddevs /= stddevs_cb
grad_stddevs *= reward
grad_means /= stddevs_sq
grad_means *= reward
return grad_means, grad_stddevs
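The Normal fix above makes the std-dev update proportional to (sigma^2 - (x - mu)^2) / sigma^3 scaled by the reward, i.e. the negative of the score-function derivative d log N(x; mu, sigma)/d sigma = ((x - mu)^2 - sigma^2) / sigma^3. A hedged numeric check of that derivative in plain Python (the log_normal helper is ours):

import math

def log_normal(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

x, mu, sigma, eps = 1.3, 0.2, 0.7, 1e-6
analytic = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
numeric = (log_normal(x, mu, sigma + eps) - log_normal(x, mu, sigma - eps)) / (2 * eps)
print(abs(analytic - numeric))   # ~1e-9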

File diff suppressed because it is too large


@ -0,0 +1,39 @@
import torch
def maybe_view(variable, size):
if variable.size() == size:
return variable
return variable.contiguous().view(size)
def maybe_unexpand(variable, old_size):
num_unsqueezed = variable.dim() - len(old_size)
expanded_dims = [dim for dim, (expanded, original)
in enumerate(zip(variable.size()[num_unsqueezed:], old_size))
if expanded != original]
for _ in range(num_unsqueezed):
variable = variable.sum(0, keepdim=False)
for dim in expanded_dims:
variable = variable.sum(dim, keepdim=True)
return variable
def variable_expandable(variable, old_size):
try:
torch._C._infer_size(variable.size(), old_size)
except RuntimeError:
return False
return True
def maybe_unexpand_or_view(variable, old_size):
var_expanded = True
if maybe_view:
var_expanded = variable_expandable(variable, old_size)
if var_expanded:
return maybe_unexpand(variable, old_size)
else:
return maybe_view(variable, old_size)
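maybe_unexpand and maybe_unexpand_or_view are the broadcasting counterparts used throughout the backward methods above: a gradient computed at the broadcast shape gets summed back down to the original argument's shape. A hedged usage sketch; the module path is inferred from the relative imports in this diff:

import torch
from torch.autograd import Variable
from torch.autograd._functions.utils import maybe_unexpand

grad = Variable(torch.ones(4, 3))                         # gradient at the broadcast (4, 3) shape
print(maybe_unexpand(grad, torch.Size([1, 3])).size())    # (1, 3): expanded dim summed with keepdim
print(maybe_unexpand(grad, torch.Size([3])).size())       # (3):    prepended dim summed away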

View File

@ -1,85 +0,0 @@
from collections import deque, defaultdict
from torch._C import _ImperativeEngine as ImperativeEngine
from .variable import Variable
class BasicEngine(object):
def _compute_dependencies(self, function):
dependencies = defaultdict(int)
seen = {function}
queue = [function]
while len(queue) > 0:
fn = queue.pop()
for prev_fn, output_nr in fn.previous_functions:
if not prev_fn.requires_grad or isinstance(prev_fn, Variable):
continue
dependencies[prev_fn] += 1
if prev_fn not in seen:
queue.append(prev_fn)
seen.add(prev_fn)
return dependencies
def _free_backward_dependency(self, dependencies, prev_fn):
dependencies[prev_fn] -= 1
if dependencies[prev_fn] == 0:
del dependencies[prev_fn]
return True
return False
def _add_grad(self, need_copy, prev_grad, output_nr, d_prev_fn):
copy_id = (id(prev_grad), output_nr)
if not prev_grad[output_nr]:
prev_grad[output_nr] = d_prev_fn
need_copy.add(copy_id)
else:
grad_tensor = prev_grad[output_nr]
if copy_id in need_copy:
need_copy.remove(copy_id)
grad_tensor = grad_tensor.clone()
prev_grad[output_nr] = grad_tensor
grad_tensor.add_(d_prev_fn)
def run_backward(self, variable, grad, retain_variables):
if variable.creator is None:
variable._do_backward((grad,), retain_variables)
return
initial_grad = [None for _ in range(variable.creator.num_outputs)]
initial_grad[variable.output_nr] = grad
ready = deque([(variable.creator, initial_grad)])
not_ready = {}
need_copy = set()
dependencies = self._compute_dependencies(variable.creator)
while len(ready) > 0:
fn, grad = ready.pop()
grad_input = fn._do_backward(tuple(grad), retain_variables)
for (prev_fn, output_nr), d_prev_fn in zip(fn.previous_functions, grad_input):
if not prev_fn.requires_grad:
# TODO: check that d_prev_fn is None and warn otherwise
continue
if isinstance(prev_fn, Variable):
prev_fn._do_backward((d_prev_fn,), retain_variables)
continue
is_ready = self._free_backward_dependency(dependencies, prev_fn)
if is_ready:
if prev_fn in not_ready:
prev_grad = not_ready[prev_fn]
self._add_grad(need_copy, prev_grad, output_nr, d_prev_fn)
else:
if prev_fn.num_outputs != 1:
raise RuntimeError("one of the function outputs "
"wasn't used - this is an error now, but "
"it's going to be fixed soon")
prev_grad = (d_prev_fn,)
ready.appendleft((prev_fn, prev_grad))
else:
if prev_fn in not_ready:
prev_grad = not_ready[prev_fn]
else:
prev_grad = [None for _ in range(prev_fn.num_outputs)]
self._add_grad(need_copy, prev_grad, output_nr, d_prev_fn)
not_ready[prev_fn] = prev_grad


@ -1,47 +1,12 @@
import torch
import torch._C as _C
import torch.utils.hooks as hooks
from torch._six import with_metaclass
import functools
from collections import OrderedDict
class Function(_C._FunctionBase):
"""Records operation history and defines formulas for differentiating ops.
Every operation performed on :class:`Variable` s creates a new function
object, that performs the computation, and records that it happened.
The history is retained in the form of a DAG of functions, with edges
denoting data dependencies (``input <- output``). Then, when backward is
called, the graph is processed in the topological ordering, by calling
:func:`backward` methods of each :class:`Function` object, and passing
returned gradients on to next :class:`Function` s.
Normally, the only way users interact with functions is by creating
subclasses and defining new operations. This is a recommended way of
extending torch.autograd.
Since Function logic is a hotspot in most scripts, almost all of it
was moved to our C backend, to ensure that the framework overhead is
minimal.
Each function is meant to be used only once (in the forward pass).
Attributes:
saved_tensors: Tuple of Tensors that were saved in the call to
:func:`forward`.
needs_input_grad: Tuple of booleans of length :attr:`num_inputs`,
indicating whether a given input requires gradient. This can be
used to optimize buffers saved for backward, and ignoring gradient
computation in :func:`~Function.backward`.
num_inputs: Number of inputs given to :func:`forward`.
num_outputs: Number of tensors returned by :func:`forward`.
requires_grad: Boolean indicating whether the :func:`backward` will
ever need to be called.
previous_functions: Tuple of (int, Function) pairs of length
:attr:`num_inputs`. Each entry contains a reference to a
:class:`Function` that created corresponding input, and an index
of the previous function output that's been used.
"""
__call__ = _C._FunctionBase._do_forward
class _ContextMethodMixin(object):
def save_for_backward(self, *tensors):
"""Saves given tensors for a future call to :func:`~Function.backward`.
@ -50,9 +15,10 @@ class Function(_C._FunctionBase):
:func:`forward` **method.**
Later, saved tensors can be accessed through the :attr:`saved_tensors`
attribute. Before returning them to the user, a check is made, to
ensure they weren't used in any in-place operation that modified
their content.
attribute; or, if the corresponding Variable is needed (e.g. for double
backwards), those can be accessed through the :attr:`saved_variables`
attribute. Before returning them to the user, a check is made, to ensure
they weren't used in any in-place operation that modified their content.
Arguments can also be ``None``.
"""
@ -65,7 +31,7 @@ class Function(_C._FunctionBase):
:func:`forward` **method, and all arguments should be inputs.**
Every tensor that's been modified in-place in a call to :func:`forward`
should be given to this function, to ensure correcness of our checks.
should be given to this function, to ensure correctness of our checks.
It doesn't matter whether the function is called before or after
modification.
"""
@ -106,14 +72,95 @@ class Function(_C._FunctionBase):
"""
self.non_differentiable = args
def register_hook(self, hook):
if self._backward_hooks is None:
self._backward_hooks = OrderedDict()
handle = hooks.RemovableHandle(self._backward_hooks)
self._backward_hooks[id(handle)] = hook
return handle
def forward(self, *input):
class _HookMixin(object):
@staticmethod
def _register_hook(backward_hooks, hook):
if backward_hooks is None:
backward_hooks = OrderedDict()
handle = hooks.RemovableHandle(backward_hooks)
backward_hooks[handle.id] = hook
return backward_hooks, handle
class BackwardCFunction(_C._FunctionBase, _ContextMethodMixin, _HookMixin):
_is_legacy = False
def apply(self, *args):
return self._forward_cls.backward(self, *args)
class FunctionMeta(type):
"""Function metaclass.
This metaclass sets up the following properties:
_is_legacy: True if forward is not defined as a static method.
_backward_cls: The Function class corresponding to the differentiated
version of this function (which is generated on the fly by this
metaclass).
"""
def __init__(cls, name, bases, attrs):
for super_cls in cls.mro():
forward = super_cls.__dict__.get('forward')
if forward is not None:
has_static_forward = isinstance(forward, staticmethod) or isinstance(forward, classmethod)
break
setattr(cls, '_is_legacy', not has_static_forward)
# old-style functions
if not has_static_forward:
return super(FunctionMeta, cls).__init__(name, bases, attrs)
backward_fn = type(name + 'Backward', (BackwardCFunction,), {'_forward_cls': cls})
setattr(cls, '_backward_cls', backward_fn)
return super(FunctionMeta, cls).__init__(name, bases, attrs)
class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin)):
"""Records operation history and defines formulas for differentiating ops.
Every operation performed on :class:`Variable` s creates a new function
object, that performs the computation, and records that it happened.
The history is retained in the form of a DAG of functions, with edges
denoting data dependencies (``input <- output``). Then, when backward is
called, the graph is processed in the topological ordering, by calling
:func:`backward` methods of each :class:`Function` object, and passing
returned gradients on to next :class:`Function` s.
Normally, the only way users interact with functions is by creating
subclasses and defining new operations. This is a recommended way of
extending torch.autograd.
Since Function logic is a hotspot in most scripts, almost all of it
was moved to our C backend, to ensure that the framework overhead is
minimal.
Each function is meant to be used only once (in the forward pass).
Attributes:
saved_tensors: Tuple of Tensors that were saved in the call to
:func:`forward`.
saved_variables: Tuple of Variables that correspond to the tensors
saved in the call to :func:`forward`.
needs_input_grad: Tuple of booleans of length :attr:`num_inputs`,
indicating whether a given input requires gradient. This can be
used to optimize buffers saved for backward, and ignoring gradient
computation in :func:`~Function.backward`.
num_inputs: Number of inputs given to :func:`forward`.
num_outputs: Number of tensors returned by :func:`forward`.
requires_grad: Boolean indicating whether the :func:`backward` will
ever need to be called.
"""
# only for backward compatibility
__call__ = _C._FunctionBase._do_forward
@staticmethod
def forward(*args, **kwargs):
"""Performs the operation.
This function is to be overridden by all subclasses.
@ -122,7 +169,8 @@ class Function(_C._FunctionBase):
"""
raise NotImplementedError
def backward(self, *grad_output):
@staticmethod
def backward(*grad_outputs):
"""Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
@ -136,6 +184,41 @@ class Function(_C._FunctionBase):
raise NotImplementedError
def once_differentiable(fn):
from .variable import Variable
@functools.wraps(fn)
def wrapper(ctx, *args):
tensor_args = [arg.data if isinstance(arg, Variable) else arg
for arg in args]
outputs = fn(ctx, *tensor_args)
# XXX: this is only an approximation of these flags - there's no way
# to figure out if fn didn't use ctx.saved_variables and as a result
# some Variables might require grad, even if no args do.
# Unfortunately, this leads to unexpected error messages ("no nodes
# require computing gradients"), but I don't have a better idea.
# These functions would raise an error in backward anyway.
volatile = any(arg.volatile if isinstance(arg, Variable) else False
for arg in args)
requires_grad = any(arg.requires_grad if isinstance(arg, Variable) else False
for arg in args)
if volatile:
def err_fn(*args):
return args
kwargs = {'volatile': True}
else:
err_fn = torch._C._functions.DelayedError(
b"trying to differentiate twice a function that was marked"
b"with @once_differentiable")
kwargs = {'requires_grad': requires_grad}
if not isinstance(outputs, tuple):
var = Variable(outputs, **kwargs) if outputs is not None else None
return err_fn(var)
return err_fn(*[Variable(o, **kwargs) if o is not None else None
for o in outputs])
return wrapper
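A hedged sketch (the ClampMin name and its arguments are hypothetical) of how the decorator above is meant to be used: backward receives plain tensors instead of Variables, the outputs are re-wrapped for the graph, and a second differentiation through it trips the DelayedError set up in the wrapper.

import torch
from torch.autograd import Variable
from torch.autograd.function import Function, once_differentiable

class ClampMin(Function):
    @staticmethod
    def forward(ctx, x, min_val):
        ctx._mask = x.gt(min_val)
        return x.clamp(min=min_val)

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        # grad_output is a plain tensor here, courtesy of the wrapper above
        return grad_output * ctx._mask.type_as(grad_output), None

x = Variable(torch.randn(5), requires_grad=True)
ClampMin.apply(x, 0.0).sum().backward()
print(x.grad)   # 1 where x > 0, 0 elsewhere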
class InplaceFunction(Function):
def __init__(self, inplace=False):

torch/autograd/gradcheck.py

@ -0,0 +1,225 @@
import torch
from torch.autograd import Variable
from collections import Iterable
def iter_variables(x):
if isinstance(x, Variable):
if x.requires_grad:
yield (x.grad.data, x.data) if x.grad is not None else (None, None)
elif isinstance(x, Iterable):
for elem in x:
for result in iter_variables(elem):
yield result
def zero_gradients(x):
if isinstance(x, Variable):
if x.grad is not None:
x.grad.detach_()
x.grad.data.zero_()
elif isinstance(x, Iterable):
for elem in x:
zero_gradients(elem)
def make_jacobian(input, num_out):
if isinstance(input, Variable) and not input.requires_grad:
return None
elif torch.is_tensor(input) or isinstance(input, Variable):
return torch.zeros(input.nelement(), num_out)
elif isinstance(input, Iterable):
jacobians = list(filter(
lambda x: x is not None, (make_jacobian(elem, num_out) for elem in input)))
if not jacobians:
return None
return type(input)(jacobians)
else:
return None
def iter_tensors(x, only_requiring_grad=False):
if torch.is_tensor(x):
yield x
elif isinstance(x, Variable):
if x.requires_grad or not only_requiring_grad:
yield x.data
elif isinstance(x, Iterable):
for elem in x:
for result in iter_tensors(elem, only_requiring_grad):
yield result
def contiguous(input):
if torch.is_tensor(input):
return input.contiguous()
elif isinstance(input, Variable):
return input.contiguous()
elif isinstance(input, Iterable):
return type(input)(contiguous(e) for e in input)
return input
def get_numerical_jacobian(fn, input, target, eps=1e-3):
# To be able to use .view(-1) input must be contiguous
input = contiguous(input)
output_size = fn(input).numel()
jacobian = make_jacobian(target, output_size)
# It's much easier to iterate over flattened lists of tensors.
# These are reference to the same objects in jacobian, so any changes
# will be reflected in it as well.
x_tensors = [t for t in iter_tensors(target, True)]
j_tensors = [t for t in iter_tensors(jacobian)]
outa = torch.DoubleTensor(output_size)
outb = torch.DoubleTensor(output_size)
# TODO: compare structure
for x_tensor, d_tensor in zip(x_tensors, j_tensors):
flat_tensor = x_tensor.view(-1)
for i in range(flat_tensor.nelement()):
orig = flat_tensor[i]
flat_tensor[i] = orig - eps
outa.copy_(fn(input), broadcast=False)
flat_tensor[i] = orig + eps
outb.copy_(fn(input), broadcast=False)
flat_tensor[i] = orig
outb.add_(-1, outa).div_(2 * eps)
d_tensor[i] = outb
return jacobian
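The perturbation loop above is a plain central difference; as a rough standalone sketch (f, x and i here are placeholders rather than names from the diff):

# Central-difference estimate of df/dx_i, applied entry-by-entry above.
def central_difference(f, x, i, eps=1e-3):
    orig = x[i]
    x[i] = orig - eps
    lo = f(x)
    x[i] = orig + eps
    hi = f(x)
    x[i] = orig          # restore the perturbed entry
    return (hi - lo) / (2 * eps)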
def get_analytical_jacobian(input, output):
jacobian = make_jacobian(input, output.numel())
jacobian_reentrant = make_jacobian(input, output.numel())
grad_output = output.data.clone().zero_()
flat_grad_output = grad_output.view(-1)
reentrant = True
correct_grad_sizes = True
for i in range(flat_grad_output.numel()):
flat_grad_output.zero_()
flat_grad_output[i] = 1
for jacobian_c in (jacobian, jacobian_reentrant):
zero_gradients(input)
output.backward(grad_output, create_graph=True)
for jacobian_x, (d_x, x) in zip(jacobian_c, iter_variables(input)):
if d_x is None:
jacobian_x[:, i].zero_()
else:
if d_x.size() != x.size():
correct_grad_sizes = False
jacobian_x[:, i] = d_x.to_dense() if d_x.is_sparse else d_x
for jacobian_x, jacobian_reentrant_x in zip(jacobian, jacobian_reentrant):
if (jacobian_x - jacobian_reentrant_x).abs().max() != 0:
reentrant = False
return jacobian, reentrant, correct_grad_sizes
def _as_tuple(x):
if isinstance(x, tuple):
return x
elif isinstance(x, list):
return tuple(x)
else:
return x,
def gradcheck(func, inputs, eps=1e-6, atol=1e-5, rtol=1e-3):
"""Check gradients computed via small finite differences
against analytical gradients
The check between numerical and analytical has the same behaviour as
numpy.allclose https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html
meaning it checks that
absolute(a - n) <= (atol + rtol * absolute(n))
is true for all elements of analytical jacobian a and numerical jacobian n.
Args:
func: Python function that takes Variable inputs and returns
a tuple of Variables
inputs: tuple of Variables
eps: perturbation for finite differences
atol: absolute tolerance
rtol: relative tolerance
Returns:
True if all differences satisfy allclose condition
"""
output = func(*inputs)
output = _as_tuple(output)
for i, o in enumerate(output):
if not o.requires_grad:
continue
def fn(input):
return _as_tuple(func(*input))[i].data
analytical, reentrant, correct_grad_sizes = get_analytical_jacobian(_as_tuple(inputs), o)
numerical = get_numerical_jacobian(fn, inputs, inputs, eps)
for a, n in zip(analytical, numerical):
if not ((a - n).abs() <= (atol + rtol * n.abs())).all():
return False
if not reentrant:
return False
if not correct_grad_sizes:
return False
# check if the backward multiplies by grad_output
zero_gradients(inputs)
output = _as_tuple(func(*inputs))
torch.autograd.backward(output, [o.data.new(o.size()).zero_() for o in output])
var_inputs = list(filter(lambda i: isinstance(i, Variable), inputs))
if not var_inputs:
raise RuntimeError("no Variables found in input")
for i in var_inputs:
if i.grad is None:
continue
if not i.grad.data.eq(0).all():
return False
return True
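A typical call looks like the sketch below; double precision is used because the default eps/atol/rtol assume it, and the function f is only an example:

import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck

x = Variable(torch.randn(3, 4).double(), requires_grad=True)
w = Variable(torch.randn(4, 5).double(), requires_grad=True)

def f(x, w):
    return x.mm(w).tanh()

print(gradcheck(f, (x, w), eps=1e-6, atol=1e-4))  # True when gradients match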
def gradgradcheck(func, inputs, grad_outputs, eps=1e-6, atol=1e-5, rtol=1e-3):
"""Check gradients of gradients computed via small finite differences
against analytical gradients
This function checks that backpropagating through the gradients computed
with the given grad_outputs is correct.
The check between numerical and analytical has the same behaviour as
numpy.allclose https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html
meaning it checks that
absolute(a - n) <= (atol + rtol * absolute(n))
is true for all elements of analytical gradient a and numerical gradient n.
Args:
func: Python function that takes Variable inputs and returns
a tuple of Variables
inputs: tuple of Variables
grad_outputs: tuple of Variables
eps: perturbation for finite differences
atol: absolute tolerance
rtol: relative tolerance
Returns:
True if all differences satisfy allclose condition
"""
def new_func(*input_args):
input_args = input_args[:-len(grad_outputs)]
outputs = func(*input_args)
outputs = _as_tuple(outputs)
input_args = tuple(x for x in input_args if isinstance(x, Variable) and x.requires_grad)
grad_inputs = torch.autograd.grad(outputs, input_args, grad_outputs)
return grad_inputs
return gradcheck(new_func, inputs + grad_outputs, eps, atol, rtol)
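Checking second derivatives only needs an extra tuple of grad_outputs matching the output shape; a minimal sketch under the same assumptions as the gradcheck example above:

import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradgradcheck

x = Variable(torch.randn(3, 4).double(), requires_grad=True)
w = Variable(torch.randn(4, 5).double(), requires_grad=True)
go = Variable(torch.randn(3, 5).double(), requires_grad=True)  # same shape as the output below

def f(x, w):
    return x.mm(w).tanh()

print(gradgradcheck(f, (x, w), (go,)))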


@ -1,3 +1,5 @@
import torch
from numbers import Number
from .function import Function
_NOT_PROVIDED = object()
@ -17,5 +19,26 @@ class StochasticFunction(Function):
self.reward = None
return result
def _do_forward(self, *inputs):
result = super(StochasticFunction, self)._do_forward(*inputs)
# save output type and size, to check the type of reward
assert isinstance(result, torch.autograd.Variable), \
"stochastic functions support only a single output at the moment"
self.reward_info = (type(inputs[0].data), result.size())
return result
__call__ = _do_forward
def _reinforce(self, reward):
is_number = isinstance(reward, Number)
if not is_number and type(reward) != self.reward_info[0]:
raise TypeError("mismatch between reward and output type: got {}, "
"but expected {}".format(torch.typename(reward),
torch.typename(self.reward_info[0])))
if not is_number and reward.size() != self.reward_info[1]:
raise ValueError("got reward of size {}, but expected a tensor of size {}".format(
'x'.join(map(str, reward.size())),
'x'.join(map(str, self.reward_info[1]))))
if self.reward is not _NOT_PROVIDED:
raise RuntimeError("you can only reinforce a stochastic Function once")
self.reward = reward


@ -1,9 +1,11 @@
import sys
import torch
import torch._C as _C
from collections import OrderedDict
import torch.sparse as sparse
import torch.utils.hooks as hooks
from ._functions import *
import warnings
import weakref
class Variable(_C._VariableBase):
@ -12,7 +14,7 @@ class Variable(_C._VariableBase):
Variable is a thin wrapper around a Tensor object, that also holds
the gradient w.r.t. to it, and a reference to a function that created it.
This reference allows retracing the whole chain of operations that
created the data. If the Variable has been created by the user, its creator
created the data. If the Variable has been created by the user, its grad_fn
will be ``None`` and we call such objects *leaf* Variables.
Since autograd only supports scalar valued function differentiation, grad
@ -32,8 +34,9 @@ class Variable(_C._VariableBase):
inference mode, i.e. don't save the history. See
:ref:`excluding-subgraphs` for more details.
Can be changed only on leaf Variables.
creator: Function of which the variable was an output. For leaf
(user created) variables it's ``None``. Read-only attribute.
is_leaf: Boolean indicating if the Variable is a graph leaf (i.e
if it was created by the user).
grad_fn: Gradient function graph trace.
Parameters:
data (any tensor class): Tensor to wrap.
@ -59,29 +62,30 @@ class Variable(_C._VariableBase):
def __getattr__(self, name):
if name in self._fallthrough_methods:
return getattr(self.data, name)
raise AttributeError(name)
return object.__getattribute__(self, name)
def __getitem__(self, key):
if (isinstance(key, Variable) and
type(key.data).__name__ == 'ByteTensor'):
return MaskedSelect()(self, key)
return Index(key)(self)
if torch.is_tensor(key):
key = Variable(key) # auto-wrap tensors
if isinstance(key, Variable):
if type(key.data).__name__ == 'ByteTensor':
return MaskedSelect.apply(self, key)
elif type(key.data).__name__ == 'LongTensor':
return IndexSelect.apply(self, 0, key)
# else fall through and raise an error in Index
return Index.apply(self, key)
def __setitem__(self, key, value):
if (isinstance(key, Variable) and
type(key.data).__name__ == 'ByteTensor'):
if isinstance(key, Variable) and type(key.data).__name__ == 'ByteTensor':
if isinstance(value, Variable):
return MaskedCopy(inplace=True)(self, key, value)
return MaskedScatter.apply(self, key, value, True)
else:
return MaskedFill(value, inplace=True)(self, key)
return MaskedFill.apply(self, key, value, True)
else:
if isinstance(value, Variable):
return SetItem(key)(self, value)
else:
return SetItem(key, value)(self)
return SetItem.apply(self, key, value)
def __deepcopy__(self, memo):
if self.creator is not None:
if not self.is_leaf:
raise RuntimeError("Only Variables created explicitly by the user "
"(graph leaves) support the deepcopy protocol at the moment")
result = type(self)(self.data.clone())
@ -105,44 +109,51 @@ class Variable(_C._VariableBase):
# legacy serialization of Variable
self.data = state[0]
state = (state[3], state[4], state[2])
if self.creator is not None:
if not self.is_leaf:
raise RuntimeError('__setstate__ can be only called on leaf variables')
self.requires_grad, self.volatile, self._backward_hooks = state
def __repr__(self):
return 'Variable containing:' + self.data.__repr__()
def backward(self, gradient=None, retain_variables=False):
def __bool__(self):
if self.data.numel() == 0:
return False
raise RuntimeError("bool value of Variable objects containing non-empty " +
torch.typename(self.data) + " is ambiguous")
__nonzero__ = __bool__
def backward(self, gradient=None, retain_graph=None, create_graph=None, retain_variables=None):
"""Computes the gradient of current variable w.r.t. graph leaves.
The graph is differentiated using the chain rule. If the variable is
non-scalar (i.e. its data has more than one element) and requires
gradient, the function additionally requires specifying ``gradient``.
It should be a tensor of matching type and location, that containins
It should be a tensor of matching type and location, that contains
the gradient of the differentiated function w.r.t. ``self``.
This function accumulates gradients in the leaves - you might need to zero
them before calling it.
This function accumulates gradients in the leaves - you might need to
zero them before calling it.
Arguments:
gradient (Tensor): Gradient of the differentiated function
w.r.t. the data. Required only if the data has more than one
element. Type and location should match these of ``self.data``.
retain_variables (bool): If ``True``, buffers necessary for computing
gradients won't be freed after use. It is only necessary to
specify ``True`` if you want to differentiate some subgraph multiple
times (in some cases it will be much more efficient to use
`autograd.backward`).
grad_variables (Tensor, Variable or None): Gradient w.r.t. the
variable. If it is a tensor, it will be automatically converted
to a Variable that is volatile unless ``create_graph`` is True.
None values can be specified for scalar Variables or ones that
don't require grad. If a None value would be acceptable then
this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute
the grads will be freed. Note that in nearly all cases setting
this option to True is not needed and often can be worked around
in a much more efficient way. Defaults to the value of
``create_graph``.
create_graph (bool, optional): If true, a graph of the derivative will
be constructed, allowing higher order derivative products to be
computed. Defaults to False, unless ``gradient`` is a volatile
Variable.
"""
if self.volatile:
raise RuntimeError('calling backward on a volatile variable')
if gradient is None and self.requires_grad:
if self.data.numel() != 1:
raise RuntimeError(
'backward should be called only on a scalar (i.e. 1-element tensor) '
'or with gradient w.r.t. the variable')
gradient = self.data.new().resize_as_(self.data).fill_(1)
self._execution_engine.run_backward((self,), (gradient,), retain_variables)
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
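Since backward() without arguments is only valid for scalars, a non-scalar Variable needs an explicit gradient of matching shape and type; a minimal sketch:

import torch
from torch.autograd import Variable

x = Variable(torch.randn(2, 3), requires_grad=True)
y = x * 2                       # non-scalar output
y.backward(torch.ones(2, 3))    # gradient w.r.t. y, same shape as y.data
print(x.grad)                   # every entry equals 2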
def register_hook(self, hook):
"""Registers a backward hook.
@ -150,7 +161,7 @@ class Variable(_C._VariableBase):
The hook will be called every time a gradient with respect to the
variable is computed. The hook should have the following signature::
hook(grad) -> Tensor or None
hook(grad) -> Variable or None
The hook should not modify its argument, but it can optionally return
a new gradient which will be used in place of :attr:`grad`.
@ -176,25 +187,12 @@ class Variable(_C._VariableBase):
"doesn't require gradient")
if self._backward_hooks is None:
self._backward_hooks = OrderedDict()
if self.creator is not None:
self.creator._register_hook_dict(self)
if self.grad_fn is not None:
self.grad_fn._register_hook_dict(self)
handle = hooks.RemovableHandle(self._backward_hooks)
self._backward_hooks[id(handle)] = hook
self._backward_hooks[handle.id] = hook
return handle
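A small sketch of the hook mechanism: the hook sees (and may replace) the gradient flowing into the Variable, and the returned handle removes it again.

import torch
from torch.autograd import Variable

v = Variable(torch.ones(3), requires_grad=True)
h = v.register_hook(lambda grad: grad * 2)   # double the incoming gradient
v.sum().backward()
print(v.grad)                                # entries are 2 because of the hook
h.remove()                                   # detach the hook again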
def _do_backward(self, grad_output, retain_variables):
assert len(grad_output) == 1
assert self._version == 0 and self.creator is None, \
"leaf variable was used in an inplace operation"
unpacked_grad = grad_output[0]
if self._backward_hooks:
for hook in self._backward_hooks.values():
result = hook(unpacked_grad)
if result is not None:
unpacked_grad = result
self.grad.data.add_(unpacked_grad)
return tuple()
def reinforce(self, reward):
"""Registers a reward obtained as a result of a stochastic process.
@ -206,10 +204,10 @@ class Variable(_C._VariableBase):
reward(Tensor): Tensor with per-element rewards. It has to match
the device location and shape of Variable's data.
"""
if not isinstance(self.creator, StochasticFunction):
if not isinstance(self.grad_fn, StochasticFunction):
raise RuntimeError("reinforce() can be only called on outputs "
"of stochastic functions")
self.creator._reinforce(reward)
self.grad_fn._reinforce(reward)
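A rough sketch of the REINFORCE-style workflow this enables; the toy policy, state and constant reward below are stand-ins, not part of the diff:

import torch
import torch.nn as nn
from torch.autograd import Variable

policy = nn.Linear(4, 2)                      # toy policy network (hypothetical)
state = Variable(torch.randn(1, 4))
probs = nn.functional.softmax(policy(state))  # action probabilities
action = probs.multinomial()                  # stochastic output; its grad_fn is a StochasticFunction
reward = torch.ones(1, 1)                     # stand-in reward tensor matching the action
action.reinforce(reward)                      # register the reward on the stochastic node
action.backward()                             # accumulates REINFORCE gradients in the policy parameters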
def detach(self):
"""Returns a new Variable, detached from the current graph.
@ -224,32 +222,61 @@ class Variable(_C._VariableBase):
errors in correctness checks.
"""
result = NoGrad()(self) # this is needed, because it merges version counters
result._creator = None
result._grad_fn = None
return result
def detach_(self):
"""Detaches the Variable from the graph that created it, making it a leaf."""
self._creator = None
"""Detaches the Variable from the graph that created it, making it a
leaf.
"""
self._grad_fn = None
self.requires_grad = False
def retain_grad(self):
"""Enables .grad attribute for non-leaf Variables."""
if self.grad_fn is None: # no-op for leaves
return
if not self.requires_grad:
raise RuntimeError("can't retain_grad on Variable that has requires_grad=False")
if hasattr(self, 'retains_grad'):
return
weak_self = weakref.ref(self)
def retain_grad_hook(grad):
var = weak_self()
if var is None:
return
if var._grad is None:
var._grad = grad.clone()
else:
var._grad = var._grad + grad
self.register_hook(retain_grad_hook)
self.retains_grad = True
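A short sketch: without retain_grad() the intermediate (non-leaf) Variable below would not keep a .grad after backward.

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 1            # intermediate, non-leaf Variable
y.retain_grad()
y.sum().backward()
print(y.grad)        # populated because of retain_grad(); otherwise it would stay None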
def contiguous(self):
self.data = self.data.contiguous()
return self
def clone(self):
return Clone()(self)
return Clone.apply(self)
def type(self, t):
if t != type(self.data):
return Type(t)(self)
return Type.apply(self, t)
return self
def type_as(self, t):
if isinstance(t, Variable):
t = t.data
return self.type(type(t))
def _get_type(self, name):
module = torch._import_dotted_name(self.data.__module__)
return getattr(module, name)
def cuda(self, device_id=None, async=False):
return CudaTransfer(device_id, async)(self)
return CudaTransfer.apply(self, device_id, async)
def cpu(self):
return self.type(getattr(torch, type(self.data).__name__))
@ -283,10 +310,10 @@ class Variable(_C._VariableBase):
def _add(self, other, inplace):
if isinstance(other, Variable):
return Add(inplace)(self, other)
return Add.apply(self, other, inplace)
else:
assert not torch.is_tensor(other)
return AddConstant(other, inplace)(self)
return AddConstant.apply(self, other, inplace)
def add(self, other):
return self._add(other, False)
@ -296,10 +323,10 @@ class Variable(_C._VariableBase):
def _sub(self, other, inplace):
if isinstance(other, Variable):
return Sub(inplace=inplace)(self, other)
return Sub.apply(self, other, inplace)
else:
assert not torch.is_tensor(other)
return SubConstant(other, inplace=inplace)(self)
return SubConstant.apply(self, other, inplace)
def sub(self, other):
return self._sub(other, False)
@ -309,219 +336,229 @@ class Variable(_C._VariableBase):
def mul(self, other):
if isinstance(other, Variable):
return Mul()(self, other)
return Mul.apply(self, other)
else:
assert not torch.is_tensor(other)
return MulConstant(other)(self)
return MulConstant.apply(self, other)
def mul_(self, other):
if not isinstance(other, Variable) and not torch.is_tensor(other):
return MulConstant(other, inplace=True)(self)
return MulConstant.apply(self, other, True)
raise RuntimeError("mul_ only supports scalar multiplication")
def div(self, other):
if isinstance(other, Variable):
return Div()(self, other)
return Div.apply(self, other)
else:
assert not torch.is_tensor(other)
return DivConstant(other)(self)
return DivConstant.apply(self, other)
def div_(self, other):
if not isinstance(other, Variable) and not torch.is_tensor(other):
return DivConstant(other, inplace=True)(self)
return DivConstant.apply(self, other, True)
raise RuntimeError("div_ only supports scalar multiplication")
def pow(self, other):
if isinstance(other, Variable):
return Pow()(self, other)
return Pow.apply(self, other)
else:
assert not torch.is_tensor(other)
return PowConstant(other)(self)
return PowConstant.apply(self, other)
def exp(self):
return Exp()(self)
return Exp.apply(self)
def exp_(self):
return Exp(inplace=True)(self)
return Exp.apply(self, True)
def log(self):
return Log()(self)
return Log.apply(self)
def log1p(self):
return Log1p()(self)
return Log1p.apply(self)
def neg(self):
return Negate()(self)
return Negate.apply(self)
def neg_(self):
return Negate(inplace=True)(self)
return Negate.apply(self, True)
def tanh(self):
return Tanh()(self)
return Tanh.apply(self)
def tanh_(self):
return Tanh(True)(self)
return Tanh.apply(self, True)
def sigmoid(self):
return Sigmoid()(self)
return Sigmoid.apply(self)
def sigmoid_(self):
return Sigmoid(True)(self)
return Sigmoid.apply(self, True)
def sin(self):
return Sin()(self)
return Sin.apply(self)
def cos(self):
return Cos()(self)
return Cos.apply(self)
def tan(self):
return Tan()(self)
return Tan.apply(self)
def asin(self):
return Asin()(self)
return Asin.apply(self)
def acos(self):
return Acos()(self)
return Acos.apply(self)
def atan(self):
return Atan()(self)
return Atan.apply(self)
def atan2(self, x):
return Atan2.apply(self, x)
def sinh(self):
return Sinh()(self)
return Sinh.apply(self)
def cosh(self):
return Cosh()(self)
return Cosh.apply(self)
def abs(self):
return Abs()(self)
return Abs.apply(self)
def clamp(self, min=None, max=None):
if min is None and max is None:
raise ValueError("clamp requires specifying at least one of "
"min and max arguments")
elif min is None and max is not None:
return CminConstant(max)(self)
return CminConstant.apply(self, max)
elif min is not None and max is None:
return CmaxConstant(min)(self)
return CmaxConstant.apply(self, min)
else:
return Clamp(min, max)(self)
return Clamp.apply(self, min, max)
def reciprocal(self):
return Reciprocal()(self)
return Reciprocal.apply(self)
def floor(self):
return Floor()(self)
return Floor.apply(self)
def ceil(self):
return Ceil()(self)
return Ceil.apply(self)
def frac(self):
return Frac()(self)
return Frac.apply(self)
def sqrt(self):
return Sqrt()(self)
return Sqrt.apply(self)
def round(self):
return Round()(self)
return Round.apply(self)
def sign(self):
return Sign()(self)
return Sign.apply(self)
def trunc(self):
return Trunc()(self)
def floor(self):
return Floor()(self)
def ceil(self):
return Ceil()(self)
return Trunc.apply(self)
def fmod(self, value):
return Fmod(value)(self)
return Fmod.apply(self, value)
def remainder(self, value):
return Remainder(value)(self)
return Remainder.apply(self, value)
def lerp(self, tensor, weight):
return Lerp(weight)(self, tensor)
return Lerp.apply(self, tensor, weight)
def rsqrt(self):
return Rsqrt()(self)
return Rsqrt.apply(self)
def sum(self, dim=None):
return Sum(dim)(self)
def sum(self, dim=None, keepdim=None):
return Sum.apply(self, dim, keepdim)
def prod(self, dim=None):
return Prod(dim)(self)
def prod(self, dim=None, keepdim=None):
return Prod.apply(self, dim, keepdim)
def mean(self, dim=None):
return Mean(dim)(self)
def mean(self, dim=None, keepdim=None):
return Mean.apply(self, dim, keepdim)
def max(self, dim=None):
def max(self, dim=None, keepdim=None):
if isinstance(dim, Variable):
return Cmax()(self, dim)
return Max(dim)(self)
return Cmax.apply(self, dim)
return Max.apply(self, dim, keepdim)
def min(self, dim=None):
def min(self, dim=None, keepdim=None):
if isinstance(dim, Variable):
return Cmin()(self, dim)
return Min(dim)(self)
return Cmin.apply(self, dim)
return Min.apply(self, dim, keepdim)
def mode(self, dim):
return Mode(dim)(self)
def mode(self, dim=None, keepdim=None):
return Mode.apply(self, dim, keepdim)
def median(self, dim):
return Median(dim)(self)
def median(self, dim=None, keepdim=None):
return Median.apply(self, dim, keepdim)
def kthvalue(self, dim):
return Kthvalue(dim)(self)
def kthvalue(self, k, dim=None, keepdim=None):
return Kthvalue.apply(self, k, dim, keepdim)
def sort(self, dim=None, descending=False):
return Sort(dim, descending)(self)
return Sort.apply(self, dim, descending, True)
def topk(self, k, dim=None, largest=True, sorted=True):
return Topk(k, dim, largest, sorted)(self)
return Topk.apply(self, k, dim, largest, sorted, True)
def view(self, *sizes):
return View(*sizes)(self)
return View.apply(self, sizes)
def view_as(self, tensor):
return View(*tensor.size())(self)
return View.apply(self, tensor.size())
def split(self, split_size, dim=0):
return torch.split(self, split_size, dim)
def chunk(self, n_chunks, dim=0):
return torch.chunk(self, n_chunks, dim)
def repeat(self, *repeats):
if len(repeats) == 1 and isinstance(repeats[0], torch.Size):
repeats = repeats[0]
else:
repeats = torch.Size(repeats)
return Repeat(repeats)(self)
return Repeat.apply(self, repeats)
def var(self, dim=None, unbiased=True):
mean = self.mean(dim)
def cumsum(self, dim):
return Cumsum.apply(self, dim)
def cumprod(self, dim):
return Cumprod.apply(self, dim)
def unfold(self, dim, size, step):
return Unfold.apply(self, dim, size, step)
def var(self, dim=None, keepdim=None, unbiased=True):
keepdim_ = False if keepdim is None else keepdim
mean = self.mean(dim, keepdim)
if dim is None:
mean = mean.view(*(1 for s in self.size()))
# we could just set keepdim to True, but this preserves some fidelity
elif keepdim_ is False and self.dim() != 1:
mean = mean.unsqueeze(dim)
mean_expanded = mean.expand_as(self)
zero_centered = self.sub(mean_expanded)
var = zero_centered.mul(zero_centered).sum(dim)
var = zero_centered.mul(zero_centered).sum(dim, keepdim=keepdim_)
numel = self.numel() if dim is None else self.size(dim)
return var.div(numel - int(unbiased))
def std(self, dim=None, unbiased=True):
return self.var(dim, unbiased).sqrt()
def std(self, dim=None, keepdim=None, unbiased=True):
return self.var(dim, keepdim, unbiased).sqrt()
def renorm(self, norm_type, dim, maxnorm):
def renorm(self, p, dim, maxnorm):
t = self.transpose(dim, 0)
flat = t.contiguous().view(self.size(0), -1)
norms = flat.norm(norm_type, 1)
norms = flat.norm(p, 1, True)
norms = norms.clamp(max=maxnorm).div(norms.add(1e-7))
flat_out = flat.mul(norms.expand_as(flat))
return flat_out.view(t.size()).transpose(dim, 0)
def matmul(self, other):
return torch.matmul(self, other)
@staticmethod
def _static_blas(cls, args, inplace):
num_args = len(args)
@ -532,14 +569,14 @@ class Variable(_C._VariableBase):
alpha, beta = args[1:3]
if num_args == 4:
alpha = args[1]
return cls(alpha, beta, inplace)(*(args[:1] + args[-2:]))
return cls.apply(*(args[:1] + args[-2:] + (alpha, beta, inplace)))
def _blas(self, cls, args, inplace):
return self._static_blas(cls, (self,) + args, inplace)
def mm(self, matrix):
output = Variable(self.data.new(self.data.size(0), matrix.data.size(1)))
return self._static_blas(Addmm, (output, 0, 1, self, matrix), False)
return Addmm.apply(output, self, matrix, 0, 1, True)
def bmm(self, batch):
output = Variable(self.data.new(self.data.size(0), self.data.size(1),
@ -555,10 +592,10 @@ class Variable(_C._VariableBase):
return self._static_blas(Addr, (output, 0, 1, self, vector), False)
def resize(self, *sizes):
return Resize(*sizes)(self)
return Resize.apply(self, sizes)
def resize_as(self, variable):
return Resize(*variable.size())(self)
return Resize.apply(self, variable.size())
def addmm(self, *args):
return self._blas(Addmm, args, False)
@ -591,162 +628,186 @@ class Variable(_C._VariableBase):
return self._blas(Addr, args, True)
def dot(self, other):
return Dot()(self, other)
return Dot.apply(self, other)
def _addcop(self, op, args):
def _addcop(self, op, args, inplace):
if len(args) == 3:
# scale, tensor1, tensor2
return op(args[0])(self, *args[1:])
# args == [scale, tensor1, tensor2]
return op.apply(self, args[1], args[2], args[0], inplace)
else:
# tensor1, tensor2
return op()(self, *args)
# args == [tensor1, tensor2]
return op.apply(self, args[0], args[1], 1.0, inplace)
def addcmul(self, *args):
return self._addcop(Addcmul, args)
return self._addcop(Addcmul, args, False)
def addcdiv(self, *args):
return self._addcop(Addcdiv, args)
return self._addcop(Addcdiv, args, False)
def norm(self, norm_type=2, dim=None):
return Norm(norm_type, dim)(self)
def addcmul_(self, *args):
return self._addcop(Addcmul, args, True)
def dist(self, tensor, norm_type=2):
return Norm(norm_type)(self - tensor)
def addcdiv_(self, *args):
return self._addcop(Addcdiv, args, True)
def norm(self, p=2, dim=None, keepdim=None):
return Norm.apply(self, p, dim, keepdim)
def dist(self, tensor, p=2):
return Norm.apply(self - tensor, p)
def index_add(self, dim, index, tensor):
return IndexAdd(dim)(self, index, tensor)
return IndexAdd.apply(self, dim, index, tensor)
def _advanced_index_add(self, index, tensor):
return AdvancedIndexAdd.apply(self, index, tensor)
def index_add_(self, dim, index, tensor):
return IndexAdd(dim, True)(self, index, tensor)
return IndexAdd.apply(self, dim, index, tensor, True)
def index_copy(self, dim, index, tensor):
return IndexCopy(dim)(self, index, tensor)
return IndexCopy.apply(self, dim, index, tensor)
def index_copy_(self, dim, index, tensor):
return IndexCopy(dim, True)(self, index, tensor)
return IndexCopy.apply(self, dim, index, tensor, True)
def index_fill(self, dim, index, value):
return IndexFill(dim, value)(self, index)
return IndexFill.apply(self, dim, index, value)
def index_fill_(self, dim, index, value):
return IndexFill(dim, value, True)(self, index)
return IndexFill.apply(self, dim, index, value, True)
def index_select(self, dim, index):
return IndexSelect(dim)(self, index)
return IndexSelect.apply(self, dim, index)
def gather(self, dim, index):
return Gather(dim)(self, index)
return Gather.apply(self, dim, index)
def scatter(self, dim, index, source):
return Scatter(dim)(self, index, source)
return Scatter.apply(self, dim, index, source)
def scatter_(self, dim, index, source):
return Scatter(dim, True)(self, index, source)
return Scatter.apply(self, dim, index, source, True)
def scatter_add(self, dim, index, source):
return ScatterAdd.apply(self, dim, index, source)
def scatter_add_(self, dim, index, source):
return ScatterAdd.apply(self, dim, index, source, True)
def masked_copy(self, mask, variable):
return MaskedCopy()(self, mask, variable)
warnings.warn("masked_copy is deprecated and renamed to masked_scatter, and will be removed in v0.3")
return MaskedScatter.apply(self, mask, variable)
def masked_copy_(self, mask, variable):
return MaskedCopy(True)(self, mask, variable)
warnings.warn("masked_copy_ is deprecated and renamed to masked_scatter_, and will be removed in v0.3")
return MaskedScatter.apply(self, mask, variable, True)
def masked_scatter(self, mask, variable):
return MaskedScatter.apply(self, mask, variable)
def masked_scatter_(self, mask, variable):
return MaskedScatter.apply(self, mask, variable, True)
def masked_fill(self, mask, value):
return MaskedFill(value)(self, mask)
return MaskedFill.apply(self, mask, value)
def masked_fill_(self, mask, value):
return MaskedFill(value, True)(self, mask)
return MaskedFill.apply(self, mask, value, True)
def masked_select(self, mask):
return MaskedSelect()(self, mask)
return MaskedSelect.apply(self, mask)
def expand(self, *sizes):
if isinstance(sizes[0], torch.Size):
if len(sizes) > 1:
raise ValueError("expand expects a several ints or a single "
"torch.Size argument")
sizes = sizes[0]
return Expand(sizes)(self)
return Expand.apply(self, sizes)
def expand_as(self, tensor):
return Expand(tensor.size())(self)
return Expand.apply(self, (tensor.size(),))
def t(self):
return Transpose(0, 1)(self)
if self.dim() != 2:
raise RuntimeError("t() expects a 2D Variable, but self is {}D".format(self.dim()))
return Transpose.apply(self, 0, 1)
def transpose(self, dim1, dim2):
return Transpose(dim1, dim2)(self)
return Transpose.apply(self, dim1, dim2)
def select(self, dim, _index):
dim = dim if dim >= 0 else dim + self.dim()
index = tuple(slice(None, None) for _ in range(dim)) + (_index,)
return Index(index)(self)
return Index.apply(self, index)
def narrow(self, dim, start_index, length):
dim = dim if dim >= 0 else dim + self.dim()
index = tuple(slice(None, None) for _ in range(dim)) + \
(slice(start_index, start_index + length),)
return Index(index)(self)
return Index.apply(self, index)
def chunk(self, num_chunks, dim=0):
return Chunk(num_chunks, dim)(self)
return Chunk.apply(self, num_chunks, dim)
def squeeze(self, dim=None):
return Squeeze(dim)(self)
return Squeeze.apply(self, dim)
def squeeze_(self, dim=None):
return Squeeze.apply(self, dim, True)
def unsqueeze(self, dim):
return Unsqueeze(dim)(self)
return Unsqueeze.apply(self, dim)
def permute(self, *permutation):
return Permute(permutation)(self)
return Permute.apply(self, permutation)
def diag(self, diagonal_idx=0):
return Diag(diagonal_idx)(self)
def diag(self, diagonal=0):
return Diag.apply(self, diagonal)
def tril(self, diagonal_idx=0):
return Tril(diagonal_idx)(self)
def tril(self, diagonal=0):
return Tril.apply(self, diagonal)
def triu(self, diagonal_idx=0):
return Triu(diagonal_idx)(self)
def triu(self, diagonal=0):
return Triu.apply(self, diagonal)
def multinomial(self, num_samples=1, with_replacement=False):
return Multinomial(num_samples, with_replacement)(self)
def trace(self):
return Trace.apply(self)
def cross(self, other, dim=-1):
return Cross.apply(self, other)
def inverse(self):
return Inverse.apply(self)
def gesv(self, a):
return Gesv.apply(self, a)
def multinomial(self, num_samples=1, replacement=False):
return Multinomial(num_samples, replacement)(self)
def bernoulli(self):
return Bernoulli()(self)
def eq(self, other):
if isinstance(other, Variable):
return Eq()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Eq(other)(self)
return Eq.apply(self, other)
def ne(self, other):
if isinstance(other, Variable):
return Ne()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Ne(other)(self)
return Ne.apply(self, other)
def gt(self, other):
if isinstance(other, Variable):
return Gt()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Gt(other)(self)
return Gt.apply(self, other)
def ge(self, other):
if isinstance(other, Variable):
return Ge()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Ge(other)(self)
return Ge.apply(self, other)
def lt(self, other):
if isinstance(other, Variable):
return Lt()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Lt(other)(self)
return Lt.apply(self, other)
def le(self, other):
if isinstance(other, Variable):
return Le()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Le(other)(self)
return Le.apply(self, other)
def __add__(self, other):
return self.add(other)
@ -762,7 +823,7 @@ class Variable(_C._VariableBase):
return self.sub_(other)
def __rsub__(self, other):
return SubConstant(other, sub_tensor=True)(self)
return SubConstant.apply(other, self)
def __mul__(self, other):
return self.mul(other)
@ -772,28 +833,16 @@ class Variable(_C._VariableBase):
return self.mul_(other)
def __matmul__(self, other):
dim_self = self.dim()
try:
dim_other = other.dim()
except AttributeError: # not a Variable
if not isinstance(other, Variable):
return NotImplemented
if dim_self == 1 and dim_other == 1:
return self.dot(other)
if dim_self == 2 and dim_other == 1:
return self.mv(other)
if dim_self == 1 and dim_other == 2:
return self.unsqueeze(0).mm(other).squeeze(0)
elif dim_self == 2 and dim_other == 2:
return self.mm(other)
raise ValueError("both arguments to __matmul__ need to be 1D or 2D, "
"but they are {}D and {}D".format(dim_self, dim_other))
return self.matmul(other)
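With this change the @ operator (Python 3 syntax) simply defers to torch.matmul, which already handles the vector/matrix combinations that used to be special-cased here; a quick sketch:

import torch
from torch.autograd import Variable

a = Variable(torch.randn(2, 3))
b = Variable(torch.randn(3, 4))
v = Variable(torch.randn(3))

print((a @ b).size())   # torch.Size([2, 4])  matrix @ matrix
print((a @ v).size())   # torch.Size([2])     matrix @ vector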
def __div__(self, other):
return self.div(other)
__truediv__ = __div__
def __rdiv__(self, other):
return DivConstant(other, div_by_tensor=True)(self)
return DivConstant.apply(other, self)
__rtruediv__ = __rdiv__
def __idiv__(self, other):
@ -806,10 +855,10 @@ class Variable(_C._VariableBase):
raise NotImplementedError("in-place pow not implemented")
def __rpow__(self, other):
return PowConstant(other, tensor_power=True)(self)
return PowConstant.apply(other, self)
def __neg__(self):
return Negate()(self)
return Negate.apply(self)
def __len__(self):
return len(self.data)
@ -845,7 +894,7 @@ class Variable(_C._VariableBase):
@staticmethod
def cat(iterable, dim=0):
return Concat(dim)(*iterable)
return Concat.apply(dim, *iterable)
@staticmethod
def normal(means, std=1):
@ -868,7 +917,7 @@ class Variable(_C._VariableBase):
tensors = args[1:]
else:
tensors = args
return cls(alpha, beta, inplace)(*tensors)
return cls.apply(*(tensors + (alpha, beta, inplace)))
@classmethod
def addmm(cls, *args):
@ -902,5 +951,6 @@ for method in dir(Variable):
setattr(Variable._torch, method, as_static)
from .engine import ImperativeEngine
from ._functions import *
from torch._C import _ImperativeEngine as ImperativeEngine
Variable._execution_engine = ImperativeEngine()


@ -1,45 +1,37 @@
import torch._C as _C
import ctypes
import warnings
import torch.cuda
import sys
import os.path as path
import torch
import warnings
enabled = True # set to False to globally disable cuDNN
lib = None
# TODO: fix libname for Windows
__cudnn_version = None
# TODO: dynamic version checks via cudnnGetVersion
# TODO: load 5.1.3 if using CUDA 7.5 and 5.1.5 if using CUDA 8.0
thisdir = path.dirname(__file__)
libpaths = ['', path.join(thisdir, '../../lib')]
if sys.platform.startswith('linux'):
libnames = ['libcudnn.so.6.0.5', 'libcudnn.so.6.0.10', 'libcudnn.so.5.1.5', 'libcudnn.so.5.1.3',
'libcudnn.so.5.0.5', 'libcudnn.so.5.1.10']
elif sys.platform == 'darwin':
libnames = ['libcudnn.6.dylib', 'libcudnn.5.dylib']
else:
libnames = []
def _loadlib():
global lib
loaded = False
for libpath in libpaths:
for libname in libnames:
try:
lib = ctypes.cdll.LoadLibrary(path.join(libpath, libname))
loaded = True
break
except OSError:
continue
if loaded:
break
if loaded:
lib.cudnnGetErrorString.restype = ctypes.c_char_p
else:
lib = None
raise OSError("Could not load cuDNN")
def _libcudnn():
global lib, __cudnn_version
if lib is None:
lib = ctypes.cdll.LoadLibrary(None)
if hasattr(lib, 'cudnnGetErrorString'):
lib.cudnnGetErrorString.restype = ctypes.c_char_p
__cudnn_version = lib.cudnnGetVersion()
compile_version = torch._C._cudnn_version()
# Check that cuDNN major and minor versions match
if (__cudnn_version // 100) != (compile_version // 100):
raise RuntimeError(
'cuDNN version mismatch: PyTorch was compiled against {} '
'but linked against {}'.format(compile_version, __cudnn_version))
else:
lib = None
return lib
def version():
if _libcudnn() is None:
return None
return __cudnn_version
def is_acceptable(tensor):
@ -49,59 +41,30 @@ def is_acceptable(tensor):
isinstance(tensor, torch.cuda.FloatTensor) or
isinstance(tensor, torch.cuda.DoubleTensor)):
return False
if lib is None:
try:
_loadlib()
except Exception:
warnings.warn('cuDNN library not found. Check your {libpath}'.format(
libpath={
'darwin': 'DYLD_LIBRARY_PATH',
'win32': 'PATH'
}.get(sys.platform, 'LD_LIBRARY_PATH')))
return False
if not _C.has_cudnn:
warnings.warn("cuDNN library has been detected, but your pytorch "
"installation was compiled without support for it. You "
"might want to rebuild pytorch, making sure the library "
"is visible to the build system.")
if not torch._C.has_cudnn:
warnings.warn(
"PyTorch was compiled without cuDNN support. To use cuDNN, rebuild "
"PyTorch making sure the library is visible to the build system.")
return False
if _libcudnn() is None:
warnings.warn('cuDNN library not found. Check your {libpath}'.format(
libpath={
'darwin': 'DYLD_LIBRARY_PATH',
'win32': 'PATH'
}.get(sys.platform, 'LD_LIBRARY_PATH')))
return False
return True
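In practice this lets user code probe cuDNN availability at runtime; a minimal sketch (the printed values are only illustrative):

import torch
import torch.backends.cudnn as cudnn

if torch.cuda.is_available():
    t = torch.cuda.FloatTensor(1)
    print(cudnn.is_acceptable(t))   # False when cuDNN is missing or support was not compiled in
    print(cudnn.version())          # e.g. 6021, or None if the library cannot be loaded
else:
    print("CUDA not available; skipping cuDNN checks")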
__cudnn_version = []
def version():
if not lib:
raise RuntimeError("cuDNN not initialized")
if len(__cudnn_version) == 0:
__cudnn_version.append(lib.cudnnGetVersion())
return __cudnn_version[0]
_handles = {}
benchmark = False
verbose = False
workspace_limit = None
CUDNN_DATA_FLOAT = 0
CUDNN_DATA_DOUBLE = 1
CUDNN_DATA_HALF = 2
CUDNN_CONVOLUTION = 0
CUDNN_CROSS_CORRELATION = 1
CUDNN_CONVOLUTION_FWD_NO_WORKSPACE = 0
CUDNN_CONVOLUTION_FWD_PREFER_FASTEST = 1
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT = 2
CUDNN_CONVOLUTION_BWD_FILTER_NO_WORKSPACE = 0
CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST = 1
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT = 2
CUDNN_CONVOLUTION_BWD_DATA_NO_WORKSPACE = 0
CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST = 1
CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT = 2
CUDNN_TENSOR_NCHW = 0
CUDNN_TENSOR_NHWC = 1
@ -113,16 +76,12 @@ CUDNN_GRU = 3
CUDNN_LINEAR_INPUT = 0
CUDNN_SKIP_INPUT = 1
CUDNN_NON_DETERMINISTIC = 0
CUDNN_DETERMINISTIC = 1
CUDNN_RNN_ALGO_STANDARD = 0
CUDNN_RNN_ALGO_PERSIST_STATIC = 1
CUDNN_RNN_ALGO_PERSIST_DYNAMIC = 2
class CuDNNHandle:
def __init__(self):
ptr = ctypes.c_void_p()
check_error(lib.cudnnCreate(ctypes.byref(ptr)))
@ -133,7 +92,6 @@ class CuDNNHandle:
class CuDNNError(RuntimeError):
def __init__(self, status):
self.status = status
msg = '{}: {}'.format(status, get_error_string(status))
@ -141,7 +99,6 @@ class CuDNNError(RuntimeError):
class TensorDescriptor(object):
def __init__(self):
ptr = ctypes.c_void_p()
check_error(lib.cudnnCreateTensorDescriptor(ctypes.byref(ptr)))
@ -164,7 +121,6 @@ class TensorDescriptor(object):
class TensorDescriptorArray(object):
def __init__(self, N):
self.ptrs = (ctypes.c_void_p * N)()
for i in range(N):
@ -179,44 +135,22 @@ class TensorDescriptorArray(object):
def __getitem__(self, key):
return ctypes.c_void_p(self.ptrs[key])
def set(self, tensor):
self._type = tensor.type()
self._size = tensor.size()
self._stride = tensor.stride()
def set_all(self, tensor):
_type = _typemap[tensor.type()]
_ndim = tensor.dim()
_size = int_array(tensor.size())
_stride = int_array(tensor.stride())
for ptr in self.ptrs:
check_error(lib.cudnnSetTensorNdDescriptor(
ctypes.c_void_p(ptr), _typemap[tensor.type()], tensor.dim(),
int_array(tensor.size()), int_array(tensor.stride())))
ctypes.c_void_p(ptr), _type, _ndim, _size, _stride))
def as_tuple(self):
return (self._type, tuple(self._size), tuple(self._stride))
class ConvolutionDescriptor(object):
def __init__(self):
ptr = ctypes.c_void_p()
check_error(lib.cudnnCreateConvolutionDescriptor(ctypes.byref(ptr)))
self._as_parameter_ = ptr
def __del__(self):
check_error(lib.cudnnDestroyConvolutionDescriptor(self._as_parameter_))
del self._as_parameter_
def set(self, typename, pad, stride):
self._pad = pad
self._stride = stride
upscale = int_array([1, 1])
check_error(lib.cudnnSetConvolutionNdDescriptor(
self, 2, int_array(pad), int_array(stride), upscale,
CUDNN_CROSS_CORRELATION, _typemap[typename]))
def as_tuple(self):
return (self._pad, self._stride)
def set_raw(self, i, _type, _ndim, _size, _stride):
ptr = self.ptrs[i]
check_error(lib.cudnnSetTensorNdDescriptor(
ctypes.c_void_p(ptr), _type, _ndim, _size, _stride))
class FilterDescriptor(object):
def __init__(self):
ptr = ctypes.c_void_p()
check_error(lib.cudnnCreateFilterDescriptor(ctypes.byref(ptr)))
@ -230,41 +164,58 @@ class FilterDescriptor(object):
self._size = weight.size()
datatype = _typemap[weight.type()]
check_error(lib.cudnnSetFilterNdDescriptor(
self, datatype, CUDNN_TENSOR_NCHW, weight.ndimension(), int_array(weight.size())))
self, datatype, CUDNN_TENSOR_NCHW, weight.ndimension(),
int_array(weight.size())))
def as_tuple(self):
return tuple(self._size)
class DropoutDescriptor(object):
def __init__(self, handle, dropout, seed):
ptr = ctypes.c_void_p()
check_error(lib.cudnnCreateDropoutDescriptor(ctypes.byref(ptr)))
self._as_parameter_ = ptr
self.state = None
self.dropout = dropout
self.handle = handle
dropout_states_size = ctypes.c_long()
check_error(lib.cudnnDropoutGetStatesSize(
handle,
ctypes.byref(dropout_states_size)))
self._set(dropout, seed)
self.state = torch.cuda.ByteTensor(dropout_states_size.value)
def set_dropout(self, dropout, seed):
if dropout != self.dropout:
self._set(dropout, seed)
def _set(self, dropout, seed):
if self.state is None and dropout > 0:
dropout_states_size = ctypes.c_long()
check_error(lib.cudnnDropoutGetStatesSize(
self.handle,
ctypes.byref(dropout_states_size)))
self.state = torch.cuda.ByteTensor(dropout_states_size.value)
state_ptr = self.state.data_ptr()
state_size = self.state.size(0)
else:
state_ptr = None
state_size = 0
check_error(lib.cudnnSetDropoutDescriptor(
self,
handle,
self.handle,
ctypes.c_float(dropout),
ctypes.c_void_p(self.state.data_ptr()),
ctypes.c_size_t(self.state.size(0)),
ctypes.c_void_p(state_ptr),
ctypes.c_size_t(state_size),
ctypes.c_ulonglong(seed),
))
self.dropout = dropout
def __del__(self):
check_error(lib.cudnnDestroyDropoutDescriptor(self))
class RNNDescriptor(object):
def __init__(self, handle, hidden_size, num_layers, dropout_desc, input_mode,
bidirectional, mode, datatype):
ptr = ctypes.c_void_p()
@ -299,26 +250,6 @@ class RNNDescriptor(object):
check_error(lib.cudnnDestroyRNNDescriptor(self))
class ConvolutionAlgoPerf_v5(ctypes.Structure):
_fields_ = [
("algo", ctypes.c_int),
("status", ctypes.c_int),
("time", ctypes.c_float),
("memory", ctypes.c_size_t),
]
class ConvolutionAlgoPerf_v6(ctypes.Structure):
_fields_ = [
("algo", ctypes.c_int),
("status", ctypes.c_int),
("time", ctypes.c_float),
("memory", ctypes.c_size_t),
("determinism", ctypes.c_int),
("reserved", ctypes.c_int * 4)
]
def check_error(status):
if status != 0:
raise CuDNNError(status)
@ -329,8 +260,8 @@ def get_error_string(status):
def get_handle():
if lib is None:
_loadlib()
if _libcudnn() is None:
raise RuntimeError('cuDNN not available')
current_device = torch.cuda.current_device()
handle = _handles.get(current_device, None)
if handle is None:
@ -338,6 +269,7 @@ def get_handle():
_handles[current_device] = handle
return handle
_typemap = {
'torch.cuda.HalfTensor': CUDNN_DATA_HALF,
'torch.cuda.FloatTensor': CUDNN_DATA_FLOAT,
@ -368,136 +300,28 @@ def int_array(itr):
def descriptor(tensor, N=None):
padded_size = tensor.size() + ((1,) * (5 - tensor.dim()))
tensor = tensor.view(padded_size)
if N is not None:
descriptor = TensorDescriptorArray(N)
descriptor.set_all(tensor)
else:
descriptor = TensorDescriptor()
if tensor.dim() == 2:
tensor = tensor.view(tensor.size(0), tensor.size(1), 1, 1)
elif tensor.dim() == 3:
tensor = tensor.view(tensor.size(0), tensor.size(1), tensor.size(2), 1)
descriptor.set(tensor)
descriptor.set(tensor)
return descriptor
_autotuner_forward = {}
_autotuner_backward_data = {}
_autotuner_backward_filter = {}
def convolution_autotuner_key(idesc, weight_desc, conv_desc):
return (idesc.as_tuple(), weight_desc.as_tuple(), conv_desc.as_tuple())
def convolution_forward_algorithm(idesc, weight_desc, conv_desc, odesc):
k = convolution_autotuner_key(idesc, weight_desc, conv_desc)
if k in _autotuner_forward:
return _autotuner_forward[k]
if benchmark:
if version() < 6000:
perf_results = ConvolutionAlgoPerf_v5()
else:
perf_results = ConvolutionAlgoPerf_v6()
algo_count = ctypes.c_int()
check_error(lib.cudnnFindConvolutionForwardAlgorithm(
get_handle(), idesc, weight_desc, conv_desc, odesc, 1,
ctypes.byref(algo_count), ctypes.byref(perf_results)))
_autotuner_forward[k] = perf_results.algo
return perf_results.algo
search_mode = CUDNN_CONVOLUTION_FWD_PREFER_FASTEST
wlimit = 0
if workspace_limit is not None:
wlimit = workspace_limit
search_mode = CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT
fwd_alg = ctypes.c_int()
check_error(lib.cudnnGetConvolutionForwardAlgorithm(
get_handle(), idesc, weight_desc, conv_desc, odesc, search_mode,
wlimit, ctypes.byref(fwd_alg)))
return fwd_alg
def convolution_forward_workspace_size(*args):
check_error(lib.cudnnGetConvolutionForwardWorkspaceSize(*args))
def convolution_forward(*args):
check_error(lib.cudnnConvolutionForward(*args))
def convolution_backward_data(*args):
return check_error(lib.cudnnConvolutionBackwardData(*args))
def convolution_backward_data_algorithm(weight_desc, odesc, conv_desc, idesc):
k = convolution_autotuner_key(idesc, weight_desc, conv_desc)
if k in _autotuner_backward_data:
return _autotuner_backward_data[k]
if benchmark:
perf_results = ConvolutionAlgoPerf()
algo_count = ctypes.c_int()
check_error(lib.cudnnFindConvolutionBackwardDataAlgorithm(
get_handle(), weight_desc, odesc, conv_desc, idesc, 1,
ctypes.byref(algo_count), ctypes.byref(perf_results)))
_autotuner_backward_data[k] = perf_results.algo
return perf_results.algo
search_mode = CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST
wlimit = 0
if workspace_limit is not None:
wlimit = workspace_limit
search_mode = CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT
bwd_data_alg = ctypes.c_int()
check_error(lib.cudnnGetConvolutionBackwardDataAlgorithm(
get_handle(), weight_desc, odesc, conv_desc, idesc, search_mode,
wlimit, ctypes.byref(bwd_data_alg)))
return bwd_data_alg
def convolution_backward_data_workspace_size(*args):
return check_error(lib.cudnnGetConvolutionBackwardDataWorkspaceSize(*args))
def convolution_backward_filter(*args):
return check_error(lib.cudnnConvolutionBackwardFilter(*args))
def convolution_backward_filter_algorithm(idesc, odesc, conv_desc, weight_desc):
k = convolution_autotuner_key(idesc, weight_desc, conv_desc)
if k in _autotuner_backward_filter:
return _autotuner_backward_filter[k]
if benchmark:
perf_results = ConvolutionAlgoPerf()
algo_count = ctypes.c_int()
check_error(lib.cudnnFindConvolutionBackwardFilterAlgorithm(
get_handle(), idesc, odesc, conv_desc, weight_desc, 1,
ctypes.byref(algo_count), ctypes.byref(perf_results)))
_autotuner_backward_filter[k] = perf_results.algo
return perf_results.algo
search_mode = CUDNN_CONVOLUTION_BWD_FILTER_PREFER_FASTEST
wlimit = 0
if workspace_limit is not None:
wlimit = workspace_limit
search_mode = CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT
bwd_filter_alg = ctypes.c_int()
check_error(lib.cudnnGetConvolutionBackwardFilterAlgorithm(
get_handle(), idesc, odesc, conv_desc, weight_desc, search_mode,
wlimit, ctypes.byref(bwd_filter_alg)))
return bwd_filter_alg
def convolution_backward_filter_workspace_size(*args):
return check_error(lib.cudnnGetConvolutionBackwardFilterWorkspaceSize(*args))
def convolution_backward_bias(*args):
check_error(lib.cudnnConvolutionBackwardBias(*args))
def descriptor_sequence(tensor, batch_sizes):
descriptors = TensorDescriptorArray(len(batch_sizes))
_type = _typemap[tensor.type()]
_ndim = 5
dim_pad = (1,) * (5 - tensor.dim())
_size = int_array(tensor.size() + dim_pad)
_stride = int_array(tensor.stride() + dim_pad)
for i, batch_size in enumerate(batch_sizes):
_size[0] = batch_size
descriptors.set_raw(i, _type, _ndim, _size, _stride)
return descriptors
def add_tensor(*args):


@ -34,20 +34,20 @@ class Unserializable(object):
self.inner = None
def init_dropout_descriptor(fn, handle):
return cudnn.DropoutDescriptor(
handle,
fn.dropout,
fn.dropout_seed
)
def init_rnn_descriptor(fn, handle):
dropout_desc_name = 'desc_' + str(torch.cuda.current_device())
dropout_p = fn.dropout if fn.train else 0
if (dropout_desc_name not in fn.dropout_state) or (fn.dropout_state[dropout_desc_name].get() is None):
fn.dropout_state[dropout_desc_name] = Unserializable(
cudnn.DropoutDescriptor(handle, dropout_p, fn.dropout_seed)
)
dropout_desc = fn.dropout_state[dropout_desc_name].get()
dropout_desc.set_dropout(dropout_p, fn.dropout_seed)
return cudnn.RNNDescriptor(
handle,
fn.hidden_size,
fn.num_layers,
fn.dropout_state['desc'].get(),
dropout_desc,
fn.input_mode,
fn.bidirectional,
fn.mode,
@ -62,16 +62,22 @@ def init_weight_descriptor(fn, weight):
return w_desc
def _input_size(fn):
return (fn.seq_length, fn.mini_batch, fn.input_size)
def _input_size(fn, input):
if fn.batch_sizes is not None:
return (input.size(0), fn.input_size)
else:
return (fn.seq_length, fn.mini_batch, fn.input_size)
def _hidden_size(fn):
return (fn.num_layers * fn.num_directions, fn.mini_batch, fn.hidden_size)
def _output_size(fn):
return (fn.seq_length, fn.mini_batch, fn.hidden_size * fn.num_directions)
def _output_size(fn, input):
if fn.batch_sizes is not None:
return (input.size(0), fn.hidden_size * fn.num_directions)
else:
return (fn.seq_length, fn.mini_batch, fn.hidden_size * fn.num_directions)
def get_num_weights(handle, rnn_desc, x_desc, datatype):
@ -157,9 +163,9 @@ def get_parameters(fn, handle, weight_buf):
# might as well merge the CUDNN ones into a single tensor as well
if linear_id == 0 or linear_id == num_linear_layers / 2:
assert filter_dim_a.prod() == filter_dim_a[0]
size = (filter_dim_a[0] * num_linear_layers // 2, filter_dim_a[2])
param = fn.weight_buf.new().set_(
weight_buf.storage(), offset,
filter_dim_a[0] * num_linear_layers // 2, filter_dim_a[2])
weight_buf.storage(), offset, size)
layer_params.append(param)
else:
assert cur_offset == offset
@ -172,10 +178,13 @@ def get_parameters(fn, handle, weight_buf):
def _copyParams(params_from, params_to):
assert len(params_from) == len(params_to)
for layer_params_from, layer_params_to in zip(params_from, params_to):
# NOTE: these lists have all weights before all biases, so if the layer doesn't
# use biases, zip will terminate once layer_params_from ends and ignore them.
for param_from, param_to in zip(layer_params_from, layer_params_to):
assert param_from.type() == param_to.type()
param_to.copy_(param_from)
param_to.copy_(param_from, broadcast=False)
def forward(fn, input, hx, weight, output, hy):
@ -183,6 +192,7 @@ def forward(fn, input, hx, weight, output, hy):
lib = cudnn.lib
handle = cudnn.get_handle()
fn.datatype = cudnn._typemap[input.type()]
is_input_packed = fn.batch_sizes is not None
if fn.mode == cudnn.CUDNN_LSTM:
hx, cx = hx
@ -190,22 +200,27 @@ def forward(fn, input, hx, weight, output, hy):
else:
cx, cy = None, None
if fn.batch_first:
if fn.batch_first and not is_input_packed:
input = input.transpose(0, 1)
if input.dim() != 3:
if (not is_input_packed and input.dim() != 3) or (is_input_packed and input.dim() != 2):
raise RuntimeError(
'input must have 3 dimensions (2 for packed input), got {}'.format(input.dim()))
if fn.input_size != input.size(2):
raise RuntimeError('input.size(2) must be equal to input_size. Expected {}, got {}'.format(
fn.input_size, input.size(2)
if fn.input_size != input.size(-1):
raise RuntimeError('input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
fn.input_size, input.size(-1)
))
if fn.dropout != 0 and cudnn.version() < 5103:
raise RuntimeError('dropout supported only in cudnn v5.1 and above')
fn.seq_length, fn.mini_batch, fn.input_size = input.size()
if is_input_packed:
fn.seq_length = len(fn.batch_sizes)
fn.mini_batch = fn.batch_sizes[0]
fn.input_size = input.size(-1)
else:
fn.seq_length, fn.mini_batch, fn.input_size = input.size()
hidden_size = _hidden_size(fn)
output_size = _output_size(fn)
output_size = _output_size(fn, input)
assert hx.is_contiguous()
assert cx is None or cx.is_contiguous()
@ -217,30 +232,34 @@ def forward(fn, input, hx, weight, output, hy):
y = output
# init descriptors
if ('desc' not in fn.dropout_state) or (fn.dropout_state['desc'].get() is None):
fn.dropout_state['desc'] = Unserializable(
init_dropout_descriptor(fn, handle)
)
fn.rnn_desc = init_rnn_descriptor(fn, handle)
fn.x_descs = cudnn.descriptor(x[0], fn.seq_length)
fn.y_descs = cudnn.descriptor(y[0], fn.seq_length)
if is_input_packed:
fn.x_descs = cudnn.descriptor_sequence(x, fn.batch_sizes)
fn.y_descs = cudnn.descriptor_sequence(y, fn.batch_sizes)
else:
fn.x_descs = cudnn.descriptor(x[0], fn.seq_length)
fn.y_descs = cudnn.descriptor(y[0], fn.seq_length)
fn.hx_desc = cudnn.descriptor(hx)
fn.hy_desc = cudnn.descriptor(hx)
fn.cx_desc = cudnn.descriptor(cx) if cx is not None else None
fn.cy_desc = cudnn.descriptor(cx) if cx is not None else None
# create the weight buffer and copy the weights into it
num_weights = get_num_weights(
handle, fn.rnn_desc, fn.x_descs[0], fn.datatype)
fn.weight_buf = input.new(num_weights)
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
# this zero might not seem necessary, but it is in the case
# where biases are disabled; then they won't be copied and must be zero'd.
# Alternatively, _copyParams could be written more carefully.
w.zero_()
params = get_parameters(fn, handle, w)
_copyParams(weight, params)
if fn.weight_buf is None:
num_weights = get_num_weights(
handle, fn.rnn_desc, fn.x_descs[0], fn.datatype)
fn.weight_buf = x.new(num_weights)
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
# this zero might not seem necessary, but it is in the case
# where biases are disabled; then they won't be copied and must be zero'd.
# Alternatively, _copyParams could be written more carefully.
w.zero_()
params = get_parameters(fn, handle, w)
_copyParams(weight, params)
else:
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
if tuple(hx.size()) != hidden_size:
raise RuntimeError('Expected hidden size {}, got {}'.format(
@ -257,8 +276,10 @@ def forward(fn, input, hx, weight, output, hy):
fn.x_descs,
ctypes.byref(workspace_size)
))
fn.workspace = torch.cuda.ByteTensor(workspace_size.value)
if fn.train:
fn.workspace_size = workspace_size.value
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
if fn.requires_grad:
reserve_size = ctypes.c_long()
check_error(lib.cudnnGetRNNTrainingReserveSize(
handle,
@ -280,7 +301,7 @@ def forward(fn, input, hx, weight, output, hy):
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
fn.hy_desc, ctypes.c_void_p(hy.data_ptr()),
fn.cy_desc, ctypes.c_void_p(cy.data_ptr()) if cx is not None else None,
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))
else: # inference
@ -295,15 +316,16 @@ def forward(fn, input, hx, weight, output, hy):
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
fn.hy_desc, ctypes.c_void_p(hy.data_ptr()),
fn.cy_desc, ctypes.c_void_p(cy.data_ptr()) if cx is not None else None,
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0)
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0)
))
if fn.batch_first:
output = output.transpose_(0, 1)
if fn.batch_first and not is_input_packed:
output.transpose_(0, 1)
def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_input, grad_hx):
with torch.cuda.device_of(input):
is_input_packed = fn.batch_sizes is not None
handle = cudnn.get_handle()
if fn.mode == cudnn.CUDNN_LSTM:
@ -313,14 +335,14 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
else:
cx, grad_cx, grad_cy = None, None, None
if fn.batch_first:
if fn.batch_first and not is_input_packed:
input = input.transpose(0, 1)
grad_output = grad_output.transpose(0, 1)
output = output.transpose(0, 1)
input_size = _input_size(fn)
input_size = _input_size(fn, input)
hidden_size = _hidden_size(fn)
output_size = _output_size(fn)
output_size = _output_size(fn, input)
assert hx.is_contiguous()
assert cx is None or cx.is_contiguous()
@ -336,12 +358,12 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
if fn.dropout != 0 and cudnn.version() < 5103:
raise RuntimeError('dropout supported only in cudnn v 5.1 and above')
if not fn.train:
raise RuntimeError('backward_grad can only be called when training!')
if not fn.requires_grad:
raise RuntimeError('backward_grad can only be called when the function requires grad!')
if tuple(input.size()) != input_size:
raise RuntimeError('Expected input size {}, got {}'.format(
input_size, tuple(input.size())))
if tuple(output.size()) != _output_size(fn):
if tuple(output.size()) != output_size:
raise RuntimeError('Expected output size {}, got {}'.format(
output_size, output.size()))
if hx is not None and tuple(hx.size()) != hidden_size:
@ -359,6 +381,8 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
if not dhy.is_cuda or not dy.is_cuda or (dcy is not None and not dcy.is_cuda):
raise RuntimeError('Gradients aren\'t CUDA tensors')
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
check_error(cudnn.lib.cudnnRNNBackwardData(
handle,
fn.rnn_desc,
@ -373,11 +397,11 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
fn.x_descs, ctypes.c_void_p(dx.data_ptr()),
fn.hx_desc, ctypes.c_void_p(dhx.data_ptr()),
fn.cx_desc, ctypes.c_void_p(dcx.data_ptr()) if cx is not None else None,
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))
if fn.batch_first:
if fn.batch_first and not is_input_packed:
grad_input = grad_input.transpose_(0, 1)
@ -396,6 +420,7 @@ def _num_linear_layers(fn):
def backward_weight(fn, input, hx, output, weight, grad_weight):
with torch.cuda.device_of(input):
is_input_packed = fn.batch_sizes is not None
handle = cudnn.get_handle()
if fn.mode == cudnn.CUDNN_LSTM:
@ -403,13 +428,13 @@ def backward_weight(fn, input, hx, output, weight, grad_weight):
else:
cx = None
if fn.batch_first:
if fn.batch_first and not is_input_packed:
input = input.transpose(0, 1)
output = output.transpose(0, 1)
input_size = _input_size(fn)
input_size = _input_size(fn, input)
hidden_size = _hidden_size(fn)
if not fn.train:
raise RuntimeError('backward_weight can only be called when training!')
if not fn.requires_grad:
raise RuntimeError('backward_weight can only be called when the function requires grad!')
if fn.dropout != 0 and cudnn.version() < 5103:
raise RuntimeError('dropout supported only in cudnn v 5.1 and above')
if tuple(input.size()) != input_size:
@ -425,6 +450,8 @@ def backward_weight(fn, input, hx, output, weight, grad_weight):
y = output
dw = fn.weight_buf.new().resize_as_(fn.weight_buf).zero_()
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
check_error(cudnn.lib.cudnnRNNBackwardWeights(
handle,
fn.rnn_desc,
@ -432,7 +459,7 @@ def backward_weight(fn, input, hx, output, weight, grad_weight):
fn.x_descs, ctypes.c_void_p(x.data_ptr()),
fn.hx_desc, ctypes.c_void_p(hx.data_ptr()),
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
fn.w_desc, ctypes.c_void_p(dw.data_ptr()),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))

Some files were not shown because too many files have changed in this diff.