Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and translates to Triton-based mm templates under max-autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
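For illustration, a minimal sketch of how the primitive is invoked (shapes here are hypothetical; the CUDA path has size/alignment requirements, so arbitrary shapes may be rejected):

```python
import torch

# Hypothetical shapes; the CUDA kernel imposes size/alignment constraints.
a = torch.randint(-128, 127, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (64, 48), dtype=torch.int8, device="cuda")

c = torch._int_mm(a, b)   # int8 @ int8 accumulated in int32
assert c.dtype == torch.int32 and c.shape == (32, 48)
```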
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
Currently, if we multiply a transposed batch of matrices of shape
[b, m, n] by a matrix of shape [n, k], computing the gradient of the
matrix instantiates an intermediate tensor of shape [b, n, k]. This can
be very large. Instead, we fold the batch of matrices into a single
matrix, which avoids creating any large intermediary tensor.
Note that multiplying a batch of matrices by a matrix occurs naturally
within an attention module, so this case certainly shows up in the wild.
In particular, this issue was found while investigating the OOMs caused by the
improved folding algorithm in the next PR of this stack. See https://github.com/pytorch/pytorch/pull/76828#issuecomment-1432359980
This PR fixes those OOMs and decreases the memory footprint of the
backward of matmul.
I understand this is a tricky one, so I put it on its own PR to discuss it.
Differential Revision: [D43541495](https://our.internmc.facebook.com/intern/diff/D43541495)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95261
Approved by: https://github.com/ezyang
Follow-up of #89582 to drop flags like `CUDA11OrLater` in tests. Note that in some places `TEST_WITH_ROCM` appears to be _implicitly_ guarded against via the `CUDA11OrLater` version check (based on my best guess of how `torch.version.cuda` behaves in ROCm builds), so I've added `not TEST_WITH_ROCM` in cases where ROCm wasn't previously explicitly allowed.
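As a rough illustration of the pattern (the test and class names below are made up), the explicit guard looks roughly like this:

```python
import unittest
from torch.testing._internal.common_utils import TEST_WITH_ROCM

class TestExample(unittest.TestCase):
    # Where the CUDA11OrLater check used to exclude ROCm implicitly,
    # the exclusion is now spelled out with an explicit skip.
    @unittest.skipIf(TEST_WITH_ROCM, "not run on ROCm builds")
    def test_cuda_only_feature(self):
        ...
```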
CC @ptrblck @malfet @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92605
Approved by: https://github.com/ngimel
This achieves the same thing as https://github.com/pytorch/pytorch/pull/85908 but using backends instead of kwargs (which unfortunately break torchscript). This does mean we let go of numpy compatibility, BUT the win is that users can control which opt_einsum strategy they want!
The backend allows for... well, you should just read the docs:
```
.. attribute:: torch.backends.opteinsum.enabled
A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so,
torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html)
to calculate an optimal path of contraction for faster performance.
.. attribute:: torch.backends.opteinsum.strategy
A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True.
By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are
also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all
possible paths. See more details in opt_einsum's docs
(https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
```
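A short usage sketch (note that in released PyTorch the backend module is exposed as `torch.backends.opt_einsum`, with an underscore; treat the exact spelling as an assumption against whichever build you use):

```python
import torch

# Sketch of the backend controls; attribute names otherwise match the docs quoted above.
if torch.backends.opt_einsum.is_available():        # opt_einsum package installed?
    torch.backends.opt_einsum.enabled = True        # on by default
    torch.backends.opt_einsum.strategy = "greedy"   # "auto" (default), "greedy", or "optimal"

x, y, z = torch.randn(8, 16), torch.randn(16, 32), torch.randn(32, 4)
out = torch.einsum("ij,jk,kl->il", x, y, z)         # contraction order picked by opt_einsum when enabled
```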
In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args nor kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this?
## Test plan:
- added tests to CI
- locally tested that trying to set the strategy to something invalid will error properly
- locally tested that tests will pass even if you don't have opt-einsum
- locally tested that setting the strategy when opt-einsum is not there will also error properly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219
Approved by: https://github.com/soulitzer, https://github.com/malfet
## This PR seeks to:
- [x] add c++ support for an optimize path
- [x] add python opt_einsum path passthrough
- [x] add opt_einsum to OSS requirements, but as a soft (optional) dependency
- [x] show benchmark results here
Additional things I've explored + their conclusions:
- **Delaying the summing over dimensions** => added!
- The idea here is to avoid extra kernel calls to `sum` when einsum sums dimensions out early. Instead of summing as we go, we collect all the dimensions that need to be summed within one contraction and sum them at the end. This optimization did not noticeably speed up the random cases we selected (they all summed only one dim per contraction), but it is a good principle and should help the more common use cases that reduce multiple dimensions at a time, like `bxy,xyi,xyj->bij` (see the sketch after this list).
- **Caching contract_path based on equation and tensor sizes** => dropped :(
- The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation.
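For reference, a small example of the multi-dimension-reduction case called out in the first bullet above (sizes are hypothetical):

```python
import torch

# Hypothetical sizes for an equation that reduces several dimensions at once,
# the case where delaying the sums should pay off.
b, x, y, i, j = 4, 8, 8, 16, 16
A = torch.randn(b, x, y)
B = torch.randn(x, y, i)
C = torch.randn(x, y, j)

# Both x and y must be summed out; collecting them into the contractions
# avoids paying a separate `sum` kernel at each step.
out = torch.einsum("bxy,xyi,xyj->bij", A, B, C)
print(out.shape)  # torch.Size([4, 16, 16])
```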
## Not a part of this PR (but are next steps):
- adding opt_einsum package to OSS CI
- adding it to internal CI
- potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input.
## Testing:
- Added more tests to CI
## Benchmarking:
**TL;DRs**
- **torch.einsum with opt_einsum is a definite win for the production case**.
- **torch.einsum with opt_einsum installed is consistently fast, but has an overhead** of needing to find the path. If the path is already found/optimal, it will be slightly slower.
- The einsum overhead decreases for bigger dimensions.
- **torch.einsum without opt_einsum installed is comparable to before this commit**, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end.
- For many of the randomly generated cases, the dimensions were too similar and too small for an optimal order to be much better than simply going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge).
- **torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case**. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input.
- **torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case**. This is because numpy doesn't take advantage of GPUs.
The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 .
Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context):
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png">
Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191)
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png">
Open below to see old + not super relevant benchmarking results:
<details>
Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later):
<img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png">
Benchmark results with the code on this PR (on my x86 mac):
For the CPU internal use case --

For the general use case --
It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum.

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890
Approved by: https://github.com/albanD, https://github.com/soulitzer
Summary: test_inverse_errors_large and test_linalg_solve_triangular fail for dtype=float64 when invoked on GPUs on Meta internal testing infra. Skip in Meta internal testing.
Test Plan: (observe tests skipped on Meta internal infra)
Reviewed By: mikekgfb
Differential Revision: D39785331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85577
Approved by: https://github.com/malfet
Summary:
Re-submit for approved PR that was then reverted: https://github.com/pytorch/pytorch/pull/85084
Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations
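A rough sketch (not the actual test) of the comparison performed: run the same addmm on CPU and on GPU and flag large discrepancies, which would point at cuBLAS breakage. Size, dtype, and tolerances below are hypothetical.

```python
import torch

def check_cublas_addmm(size, dtype=torch.float32):
    m = torch.randn(size, size, dtype=dtype)
    mat1 = torch.randn(size, size, dtype=dtype)
    mat2 = torch.randn(size, size, dtype=dtype)

    out_cpu = torch.addmm(m, mat1, mat2)
    out_gpu = torch.addmm(m.cuda(), mat1.cuda(), mat2.cuda())

    # Loose tolerances: we only care about gross breakage, not ULP-level noise.
    torch.testing.assert_close(out_gpu.cpu(), out_cpu, rtol=1e-2, atol=1e-2)

check_cublas_addmm(1000)
```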
Test Plan:
Sample unit test output --
[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]
Reviewed By: mikekgfb
Differential Revision: D39433029
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85432
Approved by: https://github.com/zrphercule
Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations
Test Plan:
Sample unit test output --
[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]
Reviewed By: mikekgfb
Differential Revision: D39433029
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85084
Approved by: https://github.com/zrphercule
`torch.norm` is very odd. Some notable issues are:
- The default value of `"fro"` in `torch.norm` has an odd behaviour when `dim=None`. This is handled in the new dispatch.
- The treatment of the `dtype` argument in `torch.norm` was completely wrong. This should fix it.
- Some `out=` variants in the previous implementation were also wrong. This should fix those.
- This new dispatch should make some paths much faster. For example, `torch.norm(x)` where `x` is complex. A few calls of the affected kind are sketched below.
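Illustrative calls of the kind the new dispatch touches (these are not taken from the PR's tests; sizes are hypothetical):

```python
import torch

x = torch.randn(64, 64, dtype=torch.complex64)

torch.norm(x)                                   # complex input: should now hit a faster path
torch.norm(x, "fro")                            # the default "fro" with dim=None
torch.norm(x.abs(), dtype=torch.float64)        # the dtype argument, handled by the new dispatch
torch.norm(x.real, dim=1, out=torch.empty(64))  # an out= variant
```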
I'll try to make the changes in these PRs as incremental as possible as this is a tricky one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81761
Approved by: https://github.com/ngimel
As per title. I corrected a thing or two from my previous implementation
to give better errors in some weird edge cases and to make it clearer
when this function supports low-precision types and when it doesn't.
We also use the bfloat16 optimisation from `vector_norm` within this function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81113
Approved by: https://github.com/ngimel
This PR also adds complex support for `logdet`, and makes all these
functions support `out=` and be composite, depending on a single
function. We also improve the docs of all these functions.
We also use `linalg_lu_factor_ex` in these functions, so we remove the
synchronisation present before.
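Illustrative only (size is hypothetical): the log-determinant of a complex matrix, which these changes extend support to.

```python
import torch

A = torch.randn(4, 4, dtype=torch.complex128)
A = A @ A.mH + 4 * torch.eye(4, dtype=torch.complex128)  # Hermitian and well-conditioned

print(torch.logdet(A))                   # complex result for a complex input
sign, logabsdet = torch.linalg.slogdet(A)
print(sign * torch.exp(logabsdet))       # recovers det(A)
```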
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79742
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD