Summary:
Both CPU and CUDA versions of PowKernel reimplement functionality that
already exists in UnaryOps, such as sqrt, rsqrt, and reciprocal.
Found this out while looking at the sluggish compilation of PowKernel.cu:
- Before the change, compilation took 11m5s and produced a 7.6 MB .o file
- After the change, compilation finished in 10m20s and produced a 6.4 MB .o file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57873
Reviewed By: ezyang
Differential Revision: D28304929
Pulled By: malfet
fbshipit-source-id: ac499476280de55a92044b1b041b1246eea74c64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749
Add to an FX test.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D28425974
fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58040
This PR uses `torch.linalg.inv_ex` to detect non-invertible
inputs and return a condition number of infinity for them.
Added an OpInfo entry for `torch.linalg.cond`.
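A minimal sketch of the behavior this enables, assuming the current `torch.linalg.cond` signature (the norm order is passed as `p`; the example matrix is made up):

```python
import torch

# Rank-deficient (hence non-invertible) input: for norm orders that require
# inverting the matrix, cond now returns inf instead of raising an error.
A = torch.tensor([[1., 2.], [2., 4.]], dtype=torch.float64)
print(torch.linalg.cond(A, p=1))  # tensor(inf, dtype=torch.float64)
```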
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405146
Pulled By: mruberry
fbshipit-source-id: 524b9a38309851fa6461cb787ef3fba5aa7d5328
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039
The new function has the following signature
`inv_ex(Tensor input, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`.
When `check_errors=True`, an error is thrown if the matrix is not invertible; with `check_errors=False`, responsibility for checking the result is on the user.
`linalg_inv` is now implemented using calls to `linalg_inv_ex`.
Resolves https://github.com/pytorch/pytorch/issues/25095
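A minimal usage sketch of the new API, assuming the signature above (the exception type caught below is an assumption):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
inverse, info = torch.linalg.inv_ex(A)   # no error checking by default
if info.item() == 0:                     # info == 0 signals a successful inversion
    assert torch.allclose(A @ inverse, torch.eye(3, dtype=torch.float64))

# With check_errors=True, a non-invertible input raises instead.
singular = torch.zeros(3, 3, dtype=torch.float64)
try:
    torch.linalg.inv_ex(singular, check_errors=True)
except RuntimeError as err:
    print("not invertible:", err)
```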
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405148
Pulled By: mruberry
fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57701
The new OpInfo flag has the following semantics:
- If it says that it supports forward AD, we run gradcheck with forward AD to ensure it is correct
- If it says that it does not support it, we check that the corresponding error is raised
All the added tests take 3s to run for CPU builds and 1min for GPU builds, which should be pretty negligible compared to the test_ops runtime for each of these architectures.
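As a rough illustration of what the flag drives, a sketch using `gradcheck`'s `check_forward_ad` argument on a stock op (the OpInfo plumbing itself is omitted):

```python
import torch
from torch.autograd import gradcheck

# Check an op's forward-mode AD formula against numerical derivatives.
x = torch.randn(3, dtype=torch.float64, requires_grad=True)
print(gradcheck(torch.sin, (x,), check_forward_ad=True))  # True if the formula is correct
```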
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28387767
Pulled By: albanD
fbshipit-source-id: 369d76921c8460aa4548f9b5159b7297994672f5
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57719.
This PR fixes `torch.Tensor{__rsub__, __rdiv__, __rtruediv__, __pow__, __rmatmul__}` to return `NotImplemented` instead of raising a `TypeError`.
cc mruberry: The first commit of this PR is the same as 1d209db1cc except for the commit message.
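An illustrative sketch of why returning `NotImplemented` matters for interoperability (the `Scalar` class below is hypothetical):

```python
import torch

class Scalar:
    def __init__(self, value):
        self.value = value

    def __rpow__(self, base):
        # Reached once Tensor.__pow__ returns NotImplemented for a Scalar operand.
        return base ** self.value

x = torch.full((2,), 2.0)
# Previously Tensor.__pow__ raised a TypeError here; with this fix it returns
# NotImplemented, so Python falls back to Scalar.__rpow__.
print(x ** Scalar(3))  # tensor([8., 8.])
```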
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57934
Reviewed By: mruberry
Differential Revision: D28351931
Pulled By: albanD
fbshipit-source-id: 985457a44dba24d2496794dfb8c1661cbcd4ff8f
Summary:
Enabled BFloat16 for `nan_to_num` on CUDA. For comparison with NumPy, a [workaround suggested](https://github.com/pytorch/pytorch/issues/57982#issuecomment-839150556) by ngimel is used: the OpInfo's `sample.kwargs` is used to set the two corresponding NumPy kwargs, viz. `posinf` & `neginf`, for `BFloat16`.
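A small sketch of the enabled op (requires a CUDA device; the replacement values are arbitrary):

```python
import torch

x = torch.tensor([float("nan"), float("inf"), -float("inf")],
                 dtype=torch.bfloat16, device="cuda")
# Explicit posinf/neginf replacements, mirroring the kwargs used in the test.
print(torch.nan_to_num(x, nan=0.0, posinf=100.0, neginf=-100.0))
# tensor([   0.,  100., -100.], device='cuda:0', dtype=torch.bfloat16)
```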
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58063
Reviewed By: mruberry
Differential Revision: D28373478
Pulled By: ngimel
fbshipit-source-id: 6493b560d83632a8519c1d3bfc5c54be9b935fb9
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function and cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR adds a native (ATen) `lu_unpack` version. With this function it also becomes possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT support for `autograd.Function`).
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
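A minimal round-trip sketch of the functionality involved, using the public Python API (not the new ATen internals):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
LU, pivots = torch.lu(A)               # packed LU factorization
P, L, U = torch.lu_unpack(LU, pivots)  # unpack into explicit P, L, U factors
assert torch.allclose(P @ L @ U, A)
```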
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: albanD
Differential Revision: D28355725
Pulled By: mruberry
fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
Summary:
This PR adds a note to the documentation that `torch.svd` is deprecated, together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with the new one from the torch.linalg module, except for the `at::linalg_pinv` function, which fails the XLA CI build (https://github.com/pytorch/xla/issues/2755; see the failure in draft PR https://github.com/pytorch/pytorch/pull/57772).
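A migration sketch along the lines of the upgrade guide (the exact doc wording may differ):

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)

# Deprecated API: reduced SVD by default, returns V (not its conjugate transpose).
U, S, V = torch.svd(A)

# New API: returns Vh; pass full_matrices=False for the reduced form.
U2, S2, Vh = torch.linalg.svd(A, full_matrices=False)
assert torch.allclose(V, Vh.transpose(-2, -1).conj())

# If only singular values are needed:
assert torch.allclose(S, torch.linalg.svdvals(A))
```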
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981
Reviewed By: ngimel
Differential Revision: D28345558
Pulled By: mruberry
fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
Summary:
Enabled `dot` for BFloat16 on CUDA (version 11+).
It also enabled `matmul` & `vdot` for BFloat16.
Backward for `matmul` isn't supported for `BFloat16`.
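A quick sketch of the newly enabled paths (requires a GPU and CUDA 11 or newer):

```python
import torch

a = torch.randn(4, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4, device="cuda", dtype=torch.bfloat16)
print(torch.dot(a, b))
print(torch.vdot(a, b))

m = torch.randn(2, 4, device="cuda", dtype=torch.bfloat16)
n = torch.randn(4, 3, device="cuda", dtype=torch.bfloat16)
print(torch.matmul(m, n).shape)  # torch.Size([2, 3])
```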
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57903
Reviewed By: mruberry
Differential Revision: D28346031
Pulled By: ngimel
fbshipit-source-id: 0917e9e0d6cf3694f45fe1c7e76370581502036a
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function and cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR adds a native (ATen) `lu_unpack` version. With this function it also becomes possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT support for `autograd.Function`).
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: astaff
Differential Revision: D28117714
Pulled By: mruberry
fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4
Summary:
This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment.
The main difference between the norms is when `dim=None`. In this case
- `linalg.norm` will compute a vector norm on the flattened input if `ord=None`; otherwise it requires the input to be either 1D or 2D in order to disambiguate between a vector and a matrix norm
- `linalg.vector_norm` will flatten the input
- `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as a batch of matrices
In future PRs, the computations will be moved to `torch.linalg.matrix_norm`, and `torch.norm` and `torch.linalg.norm` will delegate to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided.
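A small sketch of the `dim=None` behaviors listed above, assuming the current `torch.linalg` defaults (the example matrix is arbitrary):

```python
import torch

A = torch.arange(12, dtype=torch.float64).reshape(3, 4)

print(torch.linalg.matrix_norm(A))  # Frobenius norm over the last two dims
print(torch.linalg.vector_norm(A))  # 2-norm of the flattened input (same value here)
print(torch.linalg.norm(A))         # 2D input with ord=None -> Frobenius norm
```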
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127
Reviewed By: mrshenli
Differential Revision: D28186736
Pulled By: mruberry
fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50903
First part of #50010. Also fixes #51127.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27911345
Pulled By: mruberry
fbshipit-source-id: 7138fddc935802918ab9ff19f4bc1b9f4d745d41
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56820
The test only fails for inverse n-dim functions with `norm="forward"`. The relative error isn't actually any bigger than for other norm modes, though. It's just that the magnitude of the result is bigger, so the absolute tolerance is smaller relative to each element. So, I just increase the relative tolerance to compensate.
This `precisionOverride` is already applied to `fftn` and `rfftn` for exactly the same reason.
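A tiny sketch of the magnitude argument (made-up sizes, not the actual test):

```python
import torch

x = torch.randn(8, 8, dtype=torch.complex128)
backward = torch.fft.ifftn(x)                 # default norm="backward": inverse scaled by 1/n
forward = torch.fft.ifftn(x, norm="forward")  # inverse is unscaled, so values are ~n times larger
print((forward / backward).real.mean().item())  # ~64.0 == x.numel()
```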
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57576
Reviewed By: albanD
Differential Revision: D28249222
Pulled By: mruberry
fbshipit-source-id: 734c7c1ae8236b253d6e3cd2218c05d21901c567
Summary:
As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215
Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor, which cannot be easily captured by a value other than `None`.
### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The new default of `ord=2` is equivalent to the previous default of `None`.
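A small sketch of the change (output values are approximate):

```python
import torch

x = torch.arange(4, dtype=torch.float64)
print(torch.linalg.vector_norm(x))         # default ord=2 -> tensor(3.7417, dtype=torch.float64)
print(torch.linalg.vector_norm(x, ord=2))  # identical result
# torch.linalg.vector_norm(x, ord=None)    # now rejected; previously equivalent to ord=2
```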
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662
Reviewed By: albanD, mruberry
Differential Revision: D28228870
Pulled By: heitorschueroff
fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57316
CUDA support is implemented using cuSOLVER.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242071
Pulled By: mruberry
fbshipit-source-id: 6f0a1c50c21c376d2ee2907bddb618c6a600db1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57315
This PR ports `torch.ormqr` from TH to ATen.
The CUDA path will be implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
The tests are rewritten and an OpInfo entry is added.
We can implement the least squares solver with geqrf + ormqr +
triangular_solve, so it's useful to have this function modernized, at least for
internal code.
Resolves https://github.com/pytorch/pytorch/issues/24748
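A rough sketch of the geqrf + ormqr + triangular_solve least-squares recipe mentioned above (an illustration, not the code added in this PR):

```python
import torch

# Solve min ||A x - b|| for a tall, full-rank A via its QR factorization.
A = torch.randn(5, 3, dtype=torch.float64)
b = torch.randn(5, 1, dtype=torch.float64)

a, tau = torch.geqrf(A)                       # Q is stored implicitly in (a, tau)
qtb = torch.ormqr(a, tau, b, transpose=True)  # apply Q^T to b without forming Q
R = a[:3, :3].triu()                          # upper-triangular factor R
x = torch.triangular_solve(qtb[:3], R, upper=True).solution

# Cross-check against the normal equations.
expected = torch.linalg.solve(A.T @ A, A.T @ b)
assert torch.allclose(x, expected)
```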
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242070
Pulled By: mruberry
fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57189
`torch.linalg.eigvalsh` now supports autograd. This is achieved by
computing the eigenvectors internally if the input requires grad;
otherwise the eigenvectors are not computed and the operation is faster.
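A minimal differentiability sketch (a random symmetric matrix, not from the test suite):

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
A = (A + A.transpose(-2, -1)).requires_grad_(True)  # symmetric input

w = torch.linalg.eigvalsh(A)
w.sum().backward()   # works now; eigenvectors are computed internally
print(A.grad.shape)  # torch.Size([4, 4])
```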
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28199708
Pulled By: albanD
fbshipit-source-id: 12ac56f50137398613e186abd49f82c8ab83532e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57188
`torch.linalg.svdvals` now supports autograd. This is achieved by
computing the singular vectors internally if the input requires grad;
otherwise the vectors are not computed and the operation is faster.
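A quick gradcheck sketch (random double-precision input, which almost surely has distinct singular values):

```python
import torch
from torch.autograd import gradcheck

A = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
print(gradcheck(torch.linalg.svdvals, (A,)))  # True now that backward is supported
```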
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28199709
Pulled By: albanD
fbshipit-source-id: cf39cf40965c606927db5331ce16743178fa711f