Commit Graph

1074 Commits

Author SHA1 Message Date
ada65fdd67 [complex32] fft support (cuda only) (#74857)
`half` and `complex32` support for `torch.fft.{fft, fft2, fftn, hfft, hfft2, hfftn, ifft, ifft2, ifftn, ihfft, ihfft2, ihfftn, irfft, irfft2, irfftn, rfft, rfft2, rfftn}`

* We only add support for `CUDA` as `cuFFT` supports these precisions.
* We still error out on `CPU` and `ROCm` as their respective backends don't support these precisions.

For `cuFFT`, the constraints for these precisions are:
* Minimum GPU architecture is SM_53
* Sizes are restricted to powers of two only
* Strides on the real part of real-to-complex and complex-to-real transforms are not supported
* More than one GPU is not supported
* Transforms spanning more than 4 billion elements are not supported

Ref: https://docs.nvidia.com/cuda/cufft/#half-precision-transforms
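
A minimal usage sketch, assuming a CUDA device that satisfies the constraints above (SM_53+, power-of-two sizes):

```
import torch

x = torch.randn(16, device='cuda', dtype=torch.half)
Xf = torch.fft.rfft(x)                      # half input -> complex32 output
x_back = torch.fft.irfft(Xf, n=16)          # back to half

y = torch.randn(16, device='cuda', dtype=torch.cfloat).to(torch.complex32)
Yf = torch.fft.fft(y)                       # stays complex32
```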

TODO:
* [x] Update docs about the restrictions
* [x] Check the correct way to check for `hip` device. (seems like `device.is_cuda()` is true for hip as well) (Thanks @peterbell10 )

Ref for second point in TODO: e424e7d214/aten/src/ATen/native/SpectralOps.cpp (L31)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74857
Approved by: https://github.com/anjali411, https://github.com/peterbell10
2022-05-12 04:28:55 +00:00
c25bdeea26 Added logsumexp decomposition (#77219)
Pretty simple.

cc: @jansel who mentioned this.
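
For reference, a sketch of the usual numerically stable decomposition (illustrative only, not necessarily the exact decomposition added here):

```
import torch

def logsumexp_decomp(x, dim, keepdim=False):
    m = x.amax(dim, keepdim=True)
    m = torch.where(m.isinf(), torch.zeros_like(m), m)   # guard against +/-inf maxima
    out = (x - m).exp().sum(dim, keepdim=True).log() + m
    return out if keepdim else out.squeeze(dim)

x = torch.randn(4, 5)
torch.allclose(logsumexp_decomp(x, 1), torch.logsumexp(x, 1))   # True
```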
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77219
Approved by: https://github.com/jansel
2022-05-12 02:01:31 +00:00
cc9d0f309e lshift and rshift stop supporting floating types (#77146)
Fixes #74358
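
Illustrative expectation after this change (the exact error message may differ):

```
import torch

ints = torch.tensor([1, 2, 4])
ints << 1                          # tensor([2, 4, 8]) -- integer dtypes still work

floats = torch.tensor([1.0, 2.0])
# floats << 1                      # now raises an error instead of shifting floats
```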

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
21d4281b1d Simplify the OpInfos for norm / linalg_norm
This will be helpful later on if we want to start testing dtype (for
which the gradient formula is currently wrong).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76545

Approved by: https://github.com/mruberry
2022-05-11 18:53:53 +00:00
420b49c3ef [complex32] add, sub, neg (#77179)
Ref: https://github.com/pytorch/pytorch/issues/74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77179
Approved by: https://github.com/anjali411
2022-05-11 17:20:42 +00:00
0a14a4c280 Register prims as operators.
This makes prims look as if they were defined in native_functions.yaml
but they're still all written in Python.  You now need to give a full
schema string for your prims.  The returned prim object is now a
torch.ops.prim overload (prims are not allowed to be overloaded,
so we return the overload, not the overload packet, for speed).
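
A minimal sketch of what calling a registered prim looks like (the torch.ops namespace and overload name below are assumptions, not confirmed by this commit):

```
import torch
import torch._prims as prims

a = torch.randn(3)
b = torch.randn(3)

# Prims do no broadcasting or type promotion, so shapes and dtypes must match.
prims.add(a, b)
# Assumed equivalent call through the registered operator:
# torch.ops.prims.add.default(a, b)
```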

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77117

Approved by: https://github.com/mruberry, https://github.com/albanD
2022-05-11 16:38:14 +00:00
140c8168c4 [composite compliance] backward, forward: nn.linear (#77151)
Reference : https://github.com/pytorch/pytorch/issues/69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77151
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-05-11 14:39:21 +00:00
4a45b88d6d [composite compliance] backward: gather, take_along_dim (#77152)
Reference : https://github.com/pytorch/pytorch/issues/69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77152
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-05-11 14:39:14 +00:00
299ebf1ec8 [composite compliance] backward: masked_select, combinations (#76794)
Reference : https://github.com/pytorch/pytorch/issues/69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76794
Approved by: https://github.com/soulitzer, https://github.com/Lezcano
2022-05-11 14:38:49 +00:00
afd8bd772c nn.functional.glu: forward AD support (#77186)
To knock out functions in https://github.com/pytorch/pytorch/issues/71117.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77186
Approved by: https://github.com/soulitzer
2022-05-10 23:58:35 +00:00
00fb828276 [chalf] update type promotion table (#76893)
Reference #74537

TODO:
* [x] Add tests
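
Illustrative expectations from the updated table (`chalf` is `complex32`); treat the exact results as assumptions:

```
import torch

torch.promote_types(torch.chalf, torch.half)     # torch.complex32
torch.promote_types(torch.chalf, torch.float32)  # torch.complex64
torch.promote_types(torch.chalf, torch.int64)    # torch.complex32
```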
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76893
Approved by: https://github.com/anjali411
2022-05-10 19:51:33 +00:00
767af8e335 Add meta tensor support for some operations using python registration
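A hedged sketch of what a Python-registered meta kernel can look like; the op and registration target here are illustrative and may not match the ones covered by this PR:

```
import torch
from torch.library import Library

meta_lib = Library("aten", "IMPL", "Meta")

def tril_meta(self, diagonal=0):
    # Meta kernels only compute output metadata (shape/dtype), no real data.
    return torch.empty_like(self)

meta_lib.impl("tril", tril_meta)

x = torch.empty(4, 4, device="meta")
torch.tril(x).shape   # torch.Size([4, 4]), computed without touching real memory
```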
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76916

Approved by: https://github.com/ezyang
2022-05-10 17:55:06 +00:00
8d4e069e66 add BFloat16 support for UpSample on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76935

Approved by: https://github.com/frank-wei
2022-05-10 16:56:41 +00:00
890bdf13e1 Remove deprecated torch.solve (#70986)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.solve`.
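
Migration sketch: `torch.solve(B, A)` returned `(solution, LU)`; the replacement is `torch.linalg.solve(A, B)` (note the swapped argument order):

```
import torch

A = torch.randn(3, 3)
B = torch.randn(3, 2)
X = torch.linalg.solve(A, B)      # replaces: X = torch.solve(B, A).solution
torch.allclose(A @ X, B)          # True (up to numerical error)
```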

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70986
Approved by: https://github.com/Lezcano, https://github.com/albanD
2022-05-10 13:44:07 +00:00
417676720e [JIT] fix opinfo utils to handle tensor kwargs
Previously, the traced function would only take tensor args (and not tensor kwargs). Then the tensor kwargs would be inlined as constants into the graph during tracing, and changes to the kwarg inputs wouldn't be reflected in new calls to the traced function.
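
An illustrative sketch of the underlying tracing behavior (the names here are hypothetical):

```
import torch

w = torch.ones(3)
traced = torch.jit.trace(lambda x: x * w, (torch.randn(3),))

w = torch.full((3,), 2.0)   # rebinding w has no effect on the trace
traced(torch.ones(3))       # still tensor([1., 1., 1.]) -- the old w was baked in as a constant
```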

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76768

Approved by: https://github.com/mruberry
2022-05-10 03:30:07 +00:00
949cbf1d65 [NVFuser] Opinfos for extremal values in binary ufuncs
Added slow tests for comparing the eager & fused outputs for given extremal inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75917

Approved by: https://github.com/jjsjann123, https://github.com/eellison
2022-05-10 03:22:20 +00:00
da565c07e1 [composite compliance] backward: value selecting reduction ops (#76731)
Reference: https://github.com/pytorch/pytorch/issues/69991
Fixes backward composite compliance for value-selecting reduction ops like `topk`, `kthvalue`, `min`, `max`, `mode`, `msort`, `median`, `nanmedian`, `nanquantile`, `quantile`, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76731
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-05-09 20:05:31 +00:00
4ebc4890dd Revert "Add linalg.lu_solve"
This reverts commit fc5b4a5a33f1906ca335c26ec4da9357ed196419.

Reverted https://github.com/pytorch/pytorch/pull/72935 on behalf of https://github.com/malfet
2022-05-09 19:12:30 +00:00
1467e0dd5d Revert "Deprecate torch.lu"
This reverts commit a5bbfd94fb91c078416a99b95eb7b45d3ea81b6f.

Reverted https://github.com/pytorch/pytorch/pull/73804 on behalf of https://github.com/malfet
2022-05-09 19:06:44 +00:00
4ded63e2ac updates and encodes clamp xfails (#77077)
This should resolve the clamp CI test failures that aren't handled via disabled-test issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77077
Approved by: https://github.com/janeyx99
2022-05-09 13:50:55 +00:00
bb8baea932 [primTorch] flatten, squeeze, unsqueeze... (#77043)
This PR ...

Makes the following testing changes:

- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
  - https://github.com/pytorch/pytorch/issues/77046

Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module

Adds the following prims:
- expand_dims
- view_of
- rev
- clone

Adds the following references:
- flatten
- squeeze
- unsqueeze
- special.i0e
- special.i1e
- logical_or
- logical_and
- isclose
- flip
- stack
- nn.functional.elu
- chunk
- clone
- narrow

Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055
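
A brief usage sketch of two of the new references (assuming they are exposed under `torch._refs`, as in the primTorch prototype):

```
import torch
import torch._refs as refs

x = torch.randn(2, 3, 4)
refs.flatten(x, start_dim=1, end_dim=2).shape   # torch.Size([2, 12])
refs.chunk(x, 2, dim=2)                         # tuple of two (2, 3, 2) tensors
```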

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
2022-05-09 11:24:55 +00:00
362525724b type promote clamp (#77035)
Fixes #76630
Once clamp(Tensor, Tensor) is made structured, big parts of this PR won't be needed, but for now let's fix type promotion to make the behavior more regular.
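
Illustrative expectation after the fix (treat the exact resulting dtype as an assumption):

```
import torch

x = torch.arange(5)                        # int64
x.clamp(min=0.5, max=3.5).dtype            # a floating-point dtype once clamp promotes
```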
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77035
Approved by: https://github.com/mruberry
2022-05-09 05:54:17 +00:00
a585df6664 xfails bool add nvfuser test (#77031)
Per title, the test doesn't handle boolean alpha properly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77031
Approved by: https://github.com/ezyang
2022-05-08 02:31:03 +00:00
60f131fb6c Add OpInfo based meta tensor tests [RELAND]
PR #75994 was taking too long to ship so I extracted out the CrossRef gadget and
had it run on a simple OpInfo invocation only.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77008

Approved by: https://github.com/ngimel
2022-05-07 12:15:10 +00:00
c031643e39 Adds decorators for Python References and extends Python Reference testing (#76945)
This PR does the following...

Tests:
- fixes test_type_promotion in test_binary_ufuncs to correctly generate scalar cpu tensors
- fixes test_python_reference_consistency to use the Python Reference's reference inputs
- extends Python reference testing to test_conj_view, test_neg_view, and test_neg_conj_view
- adds a NaN propagation sample input for elementwise unary and binary operations
- fixes the UnaryUfuncInfo class to properly register its reference inputs
- Updates the Python Reference OpInfos to skip error inputs when their behavior on scalar inputs is inconsistent with their reference operators

Code organization:
- moves elementwise type promotion functionality to prims.utils

Prims & Refs:
- fixes scalar cpu tensor handling by having them pass through broadcasting and device and shape checks
- adds two decorators, `elementwise_type_promotion_wrapper` and `out_wrapper`, the former allows for elementwise type promotion to be automated and the latter automatically adds the out kwarg and handles it properly
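
A hedged sketch of how the two decorators might be applied to a reference; the module paths, parameter names, and enum below are assumptions about the API rather than its exact form:

```
import torch._prims as prims
from torch._prims.utils import ELEMENTWISE_TYPE_PROMOTION_KIND
from torch._prims.wrappers import elementwise_type_promotion_wrapper, out_wrapper

@out_wrapper
@elementwise_type_promotion_wrapper(
    type_promoting_args=("a", "b"),
    type_promotion_kind=ELEMENTWISE_TYPE_PROMOTION_KIND.DEFAULT,
)
def add(a, b):
    # Type promotion and the out= kwarg are handled by the decorators,
    # so the body can call the prim directly.
    return prims.add(a, b)
```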

cc @ezyang who also had some thoughts on cpu scalar tensor handling
cc @chillee -- might want to use this new decorator as we converge decompositions and references
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76945
Approved by: https://github.com/ngimel
2022-05-07 03:42:24 +00:00
901cb7c2e4 Skip TestCudaFuserOpInfo for Jiterator (#76995)
Fixes test_nvfuser_correctness_jiterator_*

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76995
Approved by: https://github.com/ngimel
2022-05-07 00:26:48 +00:00
828fb8c620 Revert "Add OpInfo based meta tensor tests"
This reverts commit d9fda18c4b21fe16e9413942d6ff420359a1e9bf.

Reverted https://github.com/pytorch/pytorch/pull/76905 on behalf of https://github.com/ezyang
2022-05-06 23:11:35 +00:00
d9fda18c4b Add OpInfo based meta tensor tests
https://github.com/pytorch/pytorch/pull/75994 was taking too long to
ship so I extracted out the CrossRef gadget and had it run on a simple
OpInfo invocation only.

TODO: There are failures that correspond to known bugs and need to be
skipped.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76905

Approved by: https://github.com/anjali411, https://github.com/mruberry, https://github.com/albanD
2022-05-06 20:12:28 +00:00
b08633917d Revert D29463782: optimize ConvTransposedND with mkldnn float32 and bfloat16 on CPU
Test Plan: revert-hammer

Differential Revision:
D29463782 (479e0d64e6)

Original commit changeset: 74b3d6138945

Original Phabricator Diff: D29463782 (479e0d64e6)

fbshipit-source-id: a9765f67f9c8c01faad82450e3c6a8d0c0abbe4b
(cherry picked from commit 12ce4ef02a13da85aa9bfe6c92ac41d4e0b8d2b0)
2022-05-06 19:34:41 +00:00
8b6a78f39f Python Interface for Jiterator
This PR allows users to author a CUDA kernel in Python.

```
import torch
from torch.cuda.jiterator import create_jit_fn

# The kernel source is a C++ template string; kwargs like `alpha` must be
# declared with defaults when creating the jitted function.
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)

a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```

Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (empty input, e.g. factory methods, is not supported)
- input tensors must live on a CUDA device
- CPU scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, one of float64, int64_t, bool (complex is not supported for now)

TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in python interface e.g <8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
2022-05-06 18:44:28 +00:00
479e0d64e6 optimize ConvTransposedND with mkldnn float32 and bfloat16 on CPU (#58348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58348

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D29463782

Pulled By: VitalyFedyunin

fbshipit-source-id: 74b3d613894526280996c8211e0df918ac09364d
(cherry picked from commit 2db963bfaee7823bf5ecb2ef909405eb02db0613)
2022-05-06 17:19:05 +00:00
0adf070574 Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454

Approved by: https://github.com/cpuhrsch
2022-05-06 15:40:22 +00:00
621ff0f973 Add linalg.vander
This PR adds `linalg.vander`, the linalg version of `torch.vander`.

We add autograd support and support for batched inputs.
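
Usage sketch with a batched input (batching is part of what this PR adds):

```
import torch

x = torch.tensor([[1., 2., 3.],
                  [2., 3., 5.]])          # batch of two vectors
torch.linalg.vander(x).shape              # torch.Size([2, 3, 3])
```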

We also take this chance to improve the docs (TODO: Check that they
render correctly!) and add an OpInfo.

**Discussion**: The current default for the `increasing` kwarg is extremely
odd, as it is the opposite of the classical definition (see
[wiki](https://en.wikipedia.org/wiki/Vandermonde_matrix)). This is
reflected in the docs, where I spell out both the odd default that we
use and the classical definition. See also [this stackoverflow
post](https://stackoverflow.com/a/71758047/5280578), which shows how
people are confused by these defaults.

My take on this would be to correct the default to be `increasing=True`
and document the divergence with NumPy (as we do for other `linalg`
functions) as:

- It is what people expect
- It gives the correct determinant called "the Vandermonde determinant" rather than (-1)^{n-1} times the Vandermonde det (ugh).
- [Minor] It is more efficient (no `flip` needed)
- Since it's under `linalg.vander`, it's strictly not a drop-in replacement for `np.vander`.

We will deprecate `torch.vander` in a PR after this one in this stack
(once we settle on what's the correct default).

Thoughts? mruberry

cc kgryte rgommers as they might have some context for the defaults of
NumPy.

Fixes https://github.com/pytorch/pytorch/issues/60197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76303

Approved by: https://github.com/albanD, https://github.com/mruberry
2022-05-06 08:44:14 +00:00
465e0ae266 Bugfix scatter_reduce backward formulas
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76523

Approved by: https://github.com/albanD
2022-05-05 20:22:39 +00:00
a5bbfd94fb Deprecate torch.lu
**BC-breaking note**:

This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`.
An upgrade guide is added to the documentation for `torch.lu`.

Note this PR DOES NOT remove `torch.lu`.
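
Migration sketch in the direction the upgrade guide points to:

```
import torch

A = torch.randn(3, 3)
LU, pivots = torch.linalg.lu_factor(A)    # replaces: LU, pivots = torch.lu(A)
```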

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73804

Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-05-05 19:17:11 +00:00
fc5b4a5a33 Add linalg.lu_solve
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
that by solving the system via two triangular solves.

We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy backend from MAGMA for this function.

We added tests exercising this function left and right, as well as tests
for the different backends. We also activated the tests for AMD, as
those should work as well.

Fixes https://github.com/pytorch/pytorch/issues/61657
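
Usage sketch: reuse a factorization to solve A X = B.

```
import torch

A = torch.randn(3, 3)
B = torch.randn(3, 2)
LU, pivots = torch.linalg.lu_factor(A)
X = torch.linalg.lu_solve(LU, pivots, B)
torch.allclose(A @ X, B)                  # True (up to numerical error)
```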

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935

Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-05-05 19:02:13 +00:00
1c776d209c Adds amax and amin references
Also extends reference testing to error inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76855
Approved by: https://github.com/mruberry
2022-05-05 15:53:09 +00:00
33fabe9a2e functional.max_unpool: OpInfo tests + simpler backward + forward ad + fwad over backward ad
Resolves https://github.com/pytorch/pytorch/issues/67657, https://github.com/pytorch/pytorch/issues/67658, https://github.com/pytorch/pytorch/issues/67660.

These are not necessarily bugs because we cannot produce arbitrary samples coming from `max_pool` to the gradcheck's eternal satisfaction.

This PR also replaces low-level complicated backward kernels with much simpler high-level and well-tested counterparts. The replacement is also faster (before: parallel for loop, after: memory layout optimized TensorIterator's parallelization coming from `gather`).

cc @albanD @mruberry @jbschlosser @walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68625
Approved by: https://github.com/albanD
2022-05-05 10:13:51 +00:00
7cb7cd5802 Add linalg.lu
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction by `-1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to a structured kernel so that its `_out` version
does not incur extra copies

Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.

This new function and `lu_unpack` come with everything they can:
forward and backward AD, decent docs, correctness tests, an OpInfo, complex support,
support for meta tensors, and support for vmap and vmap over the gradients.

I really hope we don't continue adding more features.

This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward`, and makes some other general improvements here and there
to the forward and backward AD formulae of other related functions.
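
Usage sketch of the new function:

```
import torch

A = torch.randn(3, 3)
P, L, U = torch.linalg.lu(A)
torch.allclose(P @ L @ U, A)              # True (up to numerical error)
```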

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833

Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
2022-05-05 09:17:05 +00:00
1a4eea57be Improve derivative of QR decomposition
We derive and implement a more concise rule for the forward and backward
derivatives of the QR decomposition. While doing this we:
- Fix the composite compliance of `linalg.qr` and make it support batches
- Improve the performance and simplify the implementation of both forward and backward
- Avoid saving the input matrix for the backward computation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76115

Approved by: https://github.com/nikitaved, https://github.com/albanD
2022-05-05 09:14:57 +00:00
0f1618ef76 [complex32] real and imag (also remove unused real and imag kernels)
Reference: #74537

Removes unused kernels for `real` and `imag` for CPU and CUDA

Also adds `complex_types_and` to `common_dtype`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76615
Approved by: https://github.com/anjali411
2022-05-05 04:36:58 +00:00
c59d5f17d9 Remove pow and float_power TestGradient Skips
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76825

Approved by: https://github.com/soulitzer
2022-05-04 22:36:19 +00:00
381e08309f Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)"
This reverts commit fc2a2e8b7271b258f5f394c94e9154ebef4769e4.

Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI
2022-05-04 22:31:31 +00:00
8a2f207de8 [complex32] enable testing for multiple ops
Reference #74537

Ops `block_diag`, `chunk`, `clone`, `contiguous`, `diag_embed`, `diagonal`, `as_strided`, `column_stack`, `T`, `H`, `mT`, `mH`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76724
Approved by: https://github.com/anjali411
2022-05-04 21:32:21 +00:00
1335512056 Sparse CSR: Add CPU fallback for sampled_addmm
The `torch.sparse.sampled_addmm` function is used in the backward of
`torch.sparse.addmm` and `torch.sparse.mm`, therefore we need a CPU
implementation.
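
Usage sketch on CPU now that the fallback exists; it computes `mat1 @ mat2` only at the sparsity pattern of the CSR input:

```
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 3)
mask = torch.eye(3).to_sparse_csr()
out = torch.sparse.sampled_addmm(mask, A, B)   # sparse CSR result
```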

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76589

Approved by: https://github.com/cpuhrsch
2022-05-04 21:30:43 +00:00
b557e102d8 Fixes prim type promotion and updates type promotion testing
This PR fixes prim elementwise type promotion, tests elementwise binary references using `test_type_promotion` in the elementwise binary test suite, and updates that test with additional cases for float x complex and scalar type promotion.

The following issues were discovered while working on this PR:

- https://github.com/pytorch/pytorch/issues/76806
- https://github.com/pytorch/pytorch/issues/76805
- https://github.com/pytorch/pytorch/issues/76804
- https://github.com/pytorch/pytorch/issues/76803
- https://github.com/pytorch/pytorch/issues/76801

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76809
Approved by: https://github.com/ngimel
2022-05-04 17:58:10 +00:00
ef9f56eb0b [primTorch] slice and transpose & etc.
This PR...

Adds the following prims:
- slice
- slice_in_dim
- transpose

Adds the following refs:
- cat
- permute
- transpose
- swap_axes (alias for transpose)
- tensor_split

Makes the following test improvements:
- adds reference inputs for torch.permute
- adds a NumPy reference for torch.permute
- adds reference inputs for torch.cat

Fixes the following bugs:
- adds support for scalars to the min and max prims
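
A short usage sketch of two of the new refs (assuming they live under `torch._refs`, as in the primTorch prototype):

```
import torch
import torch._refs as refs

x = torch.randn(2, 3, 4)
refs.transpose(x, 0, 2).shape             # torch.Size([4, 3, 2])
refs.cat([x, x], dim=1).shape             # torch.Size([2, 6, 4])
```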

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76727
Approved by: https://github.com/ngimel
2022-05-04 05:38:33 +00:00
c51b53d4ef [WIP] sum reference
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76714
Approved by: https://github.com/mruberry
2022-05-04 02:50:00 +00:00
fc2a2e8b72 Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454

Approved by: https://github.com/cpuhrsch
2022-05-03 23:17:07 +00:00
47e7b12d39 Update isfinite for complex to avoid overflow
Fixes: https://github.com/pytorch/pytorch/issues/66402
`abs` is at least as large as both the real and imaginary parts, so it can overflow: both parts may be finite while `abs` is infinite. We should avoid this case. See also https://github.com/pytorch/pytorch/pull/76598/files#r862257429
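
An illustrative case where both parts are finite in float32 but the magnitude overflows:

```
import torch

z = torch.complex(torch.tensor(3e38), torch.tensor(3e38))
z.abs()              # inf -- the magnitude overflows float32
torch.isfinite(z)    # tensor(True) after the fix: real and imag are checked directly
```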
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76606
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-05-03 17:14:30 +00:00