Commit Graph

118 Commits

Author SHA1 Message Date
5ee8312b63 sparse.mm(), reland #14526 (#14661)
Summary:
- reland the reverted PR #14526 with doc fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14661

Differential Revision: D13289047

Pulled By: weiyangfb

fbshipit-source-id: 5b843a11a58b56aeada3af2680a27cf89ecef4d8
2018-12-03 10:39:27 -08:00
1c21dc6e16 Revert D13252990: [pytorch][PR] [sparse] sparse.mm(S, D)
Differential Revision:
D13252990

Original commit changeset: 8fdb14144405

fbshipit-source-id: 49b8b0759a6e647854689962ffa72a205b4a2088
2018-11-30 18:53:47 -08:00
c3a2b1e155 sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax the input constraint so that the sparse matrix input doesn't have to be coalesced
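
A minimal usage sketch of the API described above (shapes and values are illustrative, not part of this PR):

```
import torch

# Sparse 3x2 matrix S in COO format (illustrative values).
i = torch.tensor([[0, 1, 2], [0, 1, 1]])
v = torch.tensor([1.0, 2.0, 3.0])
S = torch.sparse_coo_tensor(i, v, (3, 2))

# Dense 2x4 matrix D; requires_grad so the new backward is exercised.
D = torch.randn(2, 4, requires_grad=True)

out = torch.sparse.mm(S, D)   # dense 3x4 result
out.sum().backward()          # gradient flows back to D
print(D.grad.shape)           # torch.Size([2, 4])
```
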
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526

Reviewed By: ezyang

Differential Revision: D13252990

Pulled By: weiyangfb

fbshipit-source-id: 8fdb14144405a2122d4b8447ad4055cd0330e6e8
2018-11-30 14:15:34 -08:00
be7c618fd7 torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen and expose it as `torch.sparse.sum()`; `SparseTensor.sum()` is not supported currently (see the usage sketch after this list)
- this PR depends on #11253 and will need to be updated once it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
  - summing all dims is fastest for sparse tensors
  - when the input is sparse enough (nnz = 0.1%), summing a sparse tensor is faster than dense on CPU, but not necessarily on CUDA
  - CUDA backward is comparable (<2x) between `sum several dims` and `sum all dims` in sparse
  - CPU backward (which uses binary search) is still slow in sparse: `sum [0, 2, 3] dims` takes `5x` the time of `sum all dims`
    - optimize CUDA backward for now
      - tried thrust for sort and binary search, but the runtime did not improve
  - both CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`): at most `20x` slower on CPU and `10x` on CUDA
    - improve CPU and CUDA forward kernels
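
Before the benchmark tables, a small usage sketch of the exposed API (shapes follow the benchmark sizes; values are illustrative, and the `dim` argument is assumed to accept a list of dims as in the labels below):

```
import torch

# Hybrid sparse tensor: 2 sparse dims + 2 dense dims, as in the benchmark sizes.
nnz, d = 1000, [1000, 1000, 2, 2]
i = torch.stack([torch.randint(0, d[0], (nnz,)),
                 torch.randint(0, d[1], (nnz,))])
v = torch.randn(nnz, d[2], d[3])
S = torch.sparse_coo_tensor(i, v, d)

full = torch.sparse.sum(S)                 # sum over all dims -> dense scalar
part = torch.sparse.sum(S, dim=[0, 2, 3])  # sum over some dims -> sparse over dim 1
print(full.shape, part.shape)
```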

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA (sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs

- after improving CPU and CUDA forward kernels
  - in the `(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent in `coalesce()`; for CUDA, the total time is ~~`331 µs`~~, of which `141 µs` is spent in `coalesce()`. We need to reduce the time spent outside `coalesce()`.
  - after a few simple tweaks, the forward is now at most `10x` slower on CPU and `7x` on CUDA. The time taken by `sum dense dims only [2, 3]` is `~2x` that of `sum all dims`, and the speed of `sum all sparse dims [0, 1]` is on par with `sum all dims`.

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA (sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs

- the time taken by the CUDA backward of sparse ops is very long, with large variance (for nnz=10000 it typically takes 6-7 ms). To improve the backward of sparse ops, we will need to debug at places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
               torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430

Differential Revision: D12878313

Pulled By: weiyangfb

fbshipit-source-id: e16dc7681ba41fdabf4838cf05e491ca9108c6fe
2018-11-28 02:19:12 -08:00
50bc9dc9c3 fix doc for sparse.addmm (#14403)
Summary:
- fix the doc issue in `sparse.addmm`

================ before change ==================
![image](https://user-images.githubusercontent.com/38509346/49063994-2f10fe80-f1ce-11e8-9ccc-54241bc45f0b.png)
![image](https://user-images.githubusercontent.com/38509346/49064064-641d5100-f1ce-11e8-865a-7227be7156ef.png)

================ post change ==================
![image](https://user-images.githubusercontent.com/38509346/49064078-76978a80-f1ce-11e8-8f38-f1f8ac9ce63b.png)
![image](https://user-images.githubusercontent.com/38509346/49064085-7bf4d500-f1ce-11e8-8a0d-bf9e5460d21f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14403

Differential Revision: D13216582

Pulled By: weiyangfb

fbshipit-source-id: 52e0a20c6b341c37cfb31f281be3afe2a52ca532
2018-11-27 10:24:18 -08:00
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for the sparse matrix input, addressing https://github.com/pytorch/pytorch/issues/12308
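
A minimal sketch of the added op and its backward (shapes and `beta`/`alpha` values are illustrative):

```
import torch

# out = beta * D1 + alpha * (S @ D2), with backward through the sparse input S.
i = torch.tensor([[0, 1, 2], [1, 0, 2]])
v = torch.tensor([1.0, 2.0, 3.0])
S = torch.sparse_coo_tensor(i, v, (3, 3), requires_grad=True)
D1 = torch.randn(3, 4, requires_grad=True)
D2 = torch.randn(3, 4, requires_grad=True)

out = torch.sparse.addmm(D1, S, D2, beta=0.5, alpha=2.0)  # dense 3x4 result
out.sum().backward()
print(S.grad.is_sparse, D2.grad.shape)  # True torch.Size([3, 4])
```
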
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
48a3349c29 Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
30ec06c140 Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
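
A minimal sketch of the merged behavior on a post-merge build (factory functions return autograd-capable tensors directly; no `torch.autograd.Variable` wrapping is needed):

```
import torch

# torch.randn now returns a tensor that participates in autograd directly.
x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(type(x), x.grad)  # <class 'torch.Tensor'> and a gradient of 2s
```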
2018-02-23 18:03:31 -05:00
af58bfbb1b Make integer parameters and buffers immune to float(), double() and half() (#3820)
* Avoid casting integer params and buffers to float(), double() and half()

* Add test for immune integer buffers

* Fix documentation for float(), double() and half()

* Fix test
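
A small sketch of the intended behavior (module and buffer names are illustrative, not from this PR):

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        # Integer buffer (e.g. a step counter) alongside a float parameter.
        self.register_buffer('step', torch.zeros(1, dtype=torch.long))
        self.weight = nn.Parameter(torch.randn(4))

m = M()
m.half()               # casts only floating-point params/buffers
print(m.weight.dtype)  # torch.float16
print(m.step.dtype)    # torch.int64 -- integer buffer left untouched
```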
2017-11-22 18:34:53 -05:00
4ec0435b39 Report overall size of sparse tensors. (#1461)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 17:51:56 -04:00
743e4894d2 Prefix values/indices/sparse_mask/nnz with underscore (#1457)
As discussed in #1441.

I also added some docs giving clear guidance about coalescing
in sparse tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 11:14:10 -04:00
f2903332c7 Make coalesce() out of place 2017-04-28 17:11:05 -04:00
4f09461d24 Rename sparse tensor contiguous() to coalesce() 2017-04-28 17:11:05 -04:00
bafb2e5cc2 Implement sparse pow. (#1387) 2017-04-28 23:06:09 +02:00
701e63107f speed improvements, fix tests 2017-04-18 12:46:54 -07:00
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
59d66e6963 Sparse Library (#333) 2017-01-05 00:43:41 +01:00