Commit Graph

46987 Commits

642fc94501 Update extending.rst (#78707)
Follow-up fix for https://github.com/pytorch/pytorch/pull/78073 : https://github.com/pytorch/pytorch/pull/78073#discussion_r887621219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78707
Approved by: https://github.com/albanD
2022-06-02 17:24:00 +00:00
79ddc32b6a Add a check to ensure input func to Library.impl is callable
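
A minimal sketch of the failure mode this guards against, using a hypothetical namespace and op; the exact exception type raised by the new check is an assumption here:

```python
import torch

# Hypothetical namespace and operator, used only for illustration.
lib = torch.library.Library("demo_ns", "DEF")
lib.define("my_op(Tensor x) -> Tensor")

try:
    # A string is not callable; with this check the mistake is reported up front
    # instead of surfacing later during dispatch.
    lib.impl("my_op", "not_a_function", "CPU")
except Exception as e:
    print(type(e).__name__, e)
```
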
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77990

Approved by: https://github.com/albanD
2022-06-02 16:55:39 +00:00
ebc4cfe3aa Add __all__ definition in torch.profiler to fix Pylance type check er… (#78553)
- Declare `__all__` to make sure all the imports are marked as public, so Pylance won't complain (a minimal sketch of the pattern follows this list).
- Import modules directly from torch._C._autograd to suppress Pylance warnings.
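
A minimal sketch of the pattern (the actual `__all__` list in `torch/profiler/__init__.py` is longer than shown here):

```python
# Re-export the public names and declare them in __all__ so static checkers
# such as Pylance treat them as part of the module's public API.
from torch.profiler import ProfilerActivity, profile, schedule

__all__ = ["ProfilerActivity", "profile", "schedule"]
```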

Fixes #76652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78553
Approved by: https://github.com/albanD, https://github.com/robieta
2022-06-02 16:48:36 +00:00
b0814b63df Reenable assert after test update
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78658

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-06-02 16:40:06 +00:00
308d813d45 Add nonuniform observer class and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78680

Approved by: https://github.com/dzdang
2022-06-02 16:29:21 +00:00
eb88ea01b5 Cleanup impl_nvfuser for unary ops (#78670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78670
Approved by: https://github.com/mruberry, https://github.com/IvanYashchuk
2022-06-02 16:17:47 +00:00
7fc73285da Added a function that prints the check statuses run on a given commit SHA (#78663)
Relates to #76700

Gets the commit SHAs from the past M minutes and prints each SHA along with the status checks for all of that commit's jobs. The current output lists each SHA and the conclusions of all of its workflow jobs.
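
A rough sketch of the idea, using the public GitHub check-runs REST API and the `requests` package; the actual script in this PR may pull the data from a different source, and the helper below is hypothetical:

```python
import requests

def print_check_statuses(sha: str) -> None:
    # Hypothetical helper: list the conclusion of every check run on one commit.
    url = f"https://api.github.com/repos/pytorch/pytorch/commits/{sha}/check-runs"
    resp = requests.get(url, headers={"Accept": "application/vnd.github+json"})
    for run in resp.json().get("check_runs", []):
        print(f"{sha[:10]}  {run['name']}: {run['conclusion']}")
```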

Example output:
![Screen Shot 2022-06-01 at 4 51 07 PM](https://user-images.githubusercontent.com/24441980/171499216-59f6d2f2-01b3-4d01-a7ae-5215b4ac4e5c.png)

**Test Plan:** compare output with HUD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78663
Approved by: https://github.com/seemethere
2022-06-02 15:05:14 +00:00
4a5381ab40 Bessel functions (#78451)
Adds:

```Python
bessel_j0(input, *, out=None) -> Tensor
```

Bessel function of the first kind of order $0$, $J_{0}(\text{input})$.

```Python
bessel_j1(input, *, out=None) -> Tensor
```

Bessel function of the first kind of order $1$, $J_{1}(\text{input})$.

```Python
bessel_y0(input, *, out=None) -> Tensor
```

Bessel function of the second kind of order $0$, $Y_{0}(\text{input})$.

```Python
bessel_y1(input, *, out=None) -> Tensor
```

Bessel function of the second kind of order $1$, $Y_{1}(\text{input})$.

```Python
modified_bessel_i0(input, *, out=None) -> Tensor
```

Modified Bessel function of the first kind of order $0$, $I_{0}(\text{input})$.

```Python
modified_bessel_i1(input, *, out=None) -> Tensor
```

Modified Bessel function of the first kind of order $1$, $I_{1}(\text{input})$.

```Python
modified_bessel_k0(input, *, out=None) -> Tensor
```

Modified Bessel function of the second kind of order $0$, $K_{0}(\text{input})$.

```Python
modified_bessel_k1(input, *, out=None) -> Tensor
```

Modified Bessel function of the second kind of order $1$, $K_{1}(\text{input})$.
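
Assuming these land in the `torch.special` namespace (as the related special-functions work does), a quick usage sketch:

```python
import torch

x = torch.linspace(0.1, 10.0, 5)

# First-kind, second-kind, and modified Bessel functions applied elementwise.
print(torch.special.bessel_j0(x))
print(torch.special.bessel_y1(x))
print(torch.special.modified_bessel_k0(x))
```
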
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78451
Approved by: https://github.com/mruberry
2022-06-02 14:06:20 +00:00
78824a7d54 Revert "Always convert truthy booleans to 1"
This reverts commit 3c3c6cd9821dc48182cbfb96572cc562b76a375e.

Reverted https://github.com/pytorch/pytorch/pull/77122 on behalf of https://github.com/mruberry because it broke some jobs, like https://github.com/pytorch/pytorch/runs/6706333043?check_suite_focus=true
2022-06-02 13:45:54 +00:00
ce7c7bb2a9 Fix embedding jvp support by making embedding_renorm ignore forward mode AD (#78560)
On functorch, we started seeing [embedding forward mode fail](https://github.com/pytorch/functorch/pull/816). From looking at it, we figured out that [embedding got forward mode support enabled](369d9f4137) recently, and that [max_norm doesn't work with gradcheck](https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L8877-L8881), so forward mode with embedding and max_norm was never actually checked.

What was happening is that `embedding_renorm` was using `torch.no_grad()`, which only turns off backward-mode AD, so functorch's jvp tests were still using forward-mode AD during the `embedding_renorm` call. This change makes sure forward mode is also disabled during the `embedding_renorm` call.
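
A minimal sketch of the jvp path that was affected (the exact reproducer in functorch differs; this just exercises `max_norm` under forward-mode AD):

```python
import torch
import torch.nn.functional as F
import torch.autograd.forward_ad as fwAD

weight = torch.randn(10, 3)
tangent = torch.randn(10, 3)
idx = torch.tensor([0, 2, 4])

with fwAD.dual_level():
    dual_weight = fwAD.make_dual(weight, tangent)
    # max_norm triggers embedding_renorm_ internally; with this fix the renorm
    # runs without forward-mode AD, so the jvp below no longer breaks.
    out = F.embedding(idx, dual_weight, max_norm=1.0)
    primal, jvp = fwAD.unpack_dual(out)
```
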
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78560
Approved by: https://github.com/soulitzer, https://github.com/albanD
2022-06-02 13:40:21 +00:00
0be4672a9d [primTorch] Use the same error message as in ATen for canonicalize_dim (#78541)
Fixes https://github.com/pytorch/pytorch/issues/78252.

Locally, nothing seems to break when changing the error type and the error message, which means there were no tests covering them.
At least one xfailed test from https://github.com/pytorch/pytorch/pull/78080 wouldn't pass with this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78541
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-06-02 12:10:41 +00:00
48256f3cbb Reference implementations for rot90, roll, atleast_1d,2d,3d (#78080)
This PR adds the following references:

- `rot90`
- `roll`
- `atleast_1d`
- `atleast_2d`
- `atleast_3d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78080
Approved by: https://github.com/mruberry
2022-06-02 09:05:11 +00:00
fea909b43e [primTorch] Adds broadcast_shapes reference (#78612)
1. Added reference `_refs.broadcast_shapes`
2. Added OpInfo test for `torch.broadcast_shapes`

A few minor changes:
- `test_python_ref_meta` and `_ref_test_helper` updated to avoid non-tensor outputs
- type annotation update for `_resize_meta`
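
For context, the public op the reference mirrors behaves like this:

```python
import torch

# Broadcasting (2, 1) against (3, 1, 7) aligns trailing dimensions and
# expands size-1 dims, giving (3, 2, 7).
print(torch.broadcast_shapes((2, 1), (3, 1, 7)))  # torch.Size([3, 2, 7])
```
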
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78612
Approved by: https://github.com/mruberry
2022-06-02 08:56:37 +00:00
4858c56334 MPS: Fix issues with view tensors and linspace. (#78690)
Fixes: https://github.com/pytorch/pytorch/issues/78642, https://github.com/pytorch/pytorch/issues/78511
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78690
Approved by: https://github.com/razarmehr, https://github.com/DenisVieriu97
2022-06-02 06:17:19 +00:00
3c3c6cd982 Always convert truthy booleans to 1
Ref #54789

A `bool` has only two valid values, 1 or 0. Any in-memory value
outside of those leads to undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*` I introduce `c10::load<scalar_t>`
which will read as `unsigned char` and convert to a valid `bool`.

This gets >90% of operators working, but the remaining operators where
skips and xfails have been added will require individual attention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122

Approved by: https://github.com/mruberry
2022-06-02 04:18:34 +00:00
388d44314d Fix docs for torch.real (#78644)
Non-complex types are supported

```python
>>> import torch
>>> z = torch.zeros(5)
>>> torch.real(z.float())
tensor([0., 0., 0., 0., 0.])
>>> torch.real(z.int())
tensor([0, 0, 0, 0, 0], dtype=torch.int32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78644
Approved by: https://github.com/mruberry, https://github.com/anjali411
2022-06-02 04:17:03 +00:00
b651148fc3 remove prims::square (#78627)
because it is just `x * x`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78627
Approved by: https://github.com/mruberry
2022-06-02 02:18:17 +00:00
876c359347 Generalize sizes and strides policy on _make_wrapper_subclass
Previously, there was a `dispatch_strides` boolean arg. Change this to
a string argument that directly maps onto `SizesStridesPolicy`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78646

Approved by: https://github.com/ezyang
2022-06-02 02:06:38 +00:00
64a01f12ad Revert "[complex32, jiterator] cos, sinh, cosh, tanh (#78458)"
This reverts commit 5fbec86faef07d66ab696bc4c4edbaf6259a2189.

Reverted https://github.com/pytorch/pytorch/pull/78458 on behalf of https://github.com/malfet because it broke Windows CI, see 5fbec86fae
2022-06-02 01:01:13 +00:00
02273f056b Norm decomposition (#78582)
A decomposition for torch.ops.aten.norm.
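
A rough sketch of what such a decomposition looks like for the default vector 2-norm (the actual decomposition also handles `dim`, `keepdim`, dtype promotion, and other `ord` values):

```python
import torch

def norm_decomp(x: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    # norm(x, p) expressed in terms of existing ops: sum(|x|^p)^(1/p).
    return torch.sum(torch.abs(x) ** p) ** (1.0 / p)

x = torch.randn(3, 4)
print(torch.allclose(norm_decomp(x), torch.norm(x)))  # True
```
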
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78582
Approved by: https://github.com/Chillee
2022-06-02 00:25:43 +00:00
cfc968956c [ONNX] Update CI test script to run parallel by default (#78200)
Also update the default process count to auto, matching the CI machine's CPU core count.

Fixes #77678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78200
Approved by: https://github.com/garymm
2022-06-02 00:25:17 +00:00
bf629642ff remove math kernels that have derivative formulas in core
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78183

Approved by: https://github.com/ezyang
2022-06-01 23:53:13 +00:00
575c420287 [DataPipe] Lazily generate exception message for performance
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78673

Approved by: https://github.com/ejguan
2022-06-01 23:19:31 +00:00
7dc5b5bf10 move generated_srcs_list.bzl into caffe2/build.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77680

This is only used by ATen code generation and libraries. These are
about to move into the shared build structure, so let's move this
cleanly first.

Differential Revision: [D36455725](https://our.internmc.facebook.com/intern/diff/D36455725/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36455725/)!

Approved by: https://github.com/kit1980
2022-06-01 23:03:54 +00:00
d90652db65 Docs: build with Sphinx 5 (#70309)
Fixes #60979. Also see #61045 and https://github.com/sphinx-doc/sphinx/issues/9395 for discussion.

I _believe_ we were previously pinning to Sphinx 3 because of issues with pytorch_sphinx_theme and Sphinx 4 support, but these seem to have been resolved now. See https://torchgeo.readthedocs.io/ for an example of docs built with pytorch_sphinx_theme and Sphinx 4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70309
Approved by: https://github.com/albanD
2022-06-01 22:28:29 +00:00
22fd2f2e05 [quant] Factor out common operator configs from native.py (#78407)
Summary:
Some helper functions that generate operator configs based on dtype_configs are reused in the native backend and TensorRT, so we factor this part out into a util file: common_operator_configs.py

Test Plan: buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt

Differential Revision: D36728359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78407
Approved by: https://github.com/vkuzo, https://github.com/andrewor14
2022-06-01 22:24:36 +00:00
634954c55c [MPS] Do not pass linker command to a compiler (#78630)
`-weak_framework` is a linker rather than a compiler option, and as such it should not be passed as a CXX flag.
Also, use `string(APPEND ...)` rather than `set(FOO "${FOO} ...")`.

Likely fixes our ability to use `sccache` for MacOS CI builds, see https://github.com/pytorch/pytorch/issues/78375#issuecomment-1143697183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78630
Approved by: https://github.com/albanD
2022-06-01 22:08:54 +00:00
ca7f948806 Don't include libiomp with conda install on MacOS (#78632)
Fixes #78490

The following command:
```
conda install pytorch torchvision torchaudio -c pytorch-nightly
```

installs libiomp, so we don't want to also package libiomp with conda installs. However, we still keep it for libtorch and wheels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78632
Approved by: https://github.com/malfet
2022-06-01 22:06:16 +00:00
6671b504f7 Modernize FakeTensorMode, throw on non-fake inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78516

Approved by: https://github.com/samdow
2022-06-01 21:43:59 +00:00
24b7142d7a Update distributed/CONTRIBUTING.md to remove ProcessGroupAgent references and add test instructions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78625

Approved by: https://github.com/mrshenli, https://github.com/albanD
2022-06-01 21:31:12 +00:00
5874a31169 [quant][core][better-engineering] Rename files in quantized/cpu directory to conform with non-quantized counterpart filenames
Summary:
Names of analogous files in the quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (Pascal case). This is the second of a series of PRs that changes
all files in the quantized dir (and its sub-directories) to Pascal case.

Some files have not been renamed, as renaming them causes issues related to
custom classes with `import torch` at runtime. See
https://github.com/pytorch/pytorch/pull/77037 for additional details.

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77422

Approved by: https://github.com/jerryzh168
2022-06-01 21:20:30 +00:00
aa06d05297 enable with semantics
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78214

Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-06-01 21:14:45 +00:00
9b81e81771 [PyTorchEdge] Extend Flatbuffer to get mobile_info for NMLML workflows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78306

Extending the feature available from pickle that helps the NMLML system get info about mobile models from the `extra_files` dir.

Differential Revision: [D36609548](https://our.internmc.facebook.com/intern/diff/D36609548/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36609548/)!

Approved by: https://github.com/iseeyuan
2022-06-01 20:09:09 +00:00
5fbec86fae [complex32, jiterator] cos, sinh, cosh, tanh (#78458)
Follows: #74537 and #74748

cc @kshitij12345!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78458
Approved by: https://github.com/kshitij12345, https://github.com/anjali411
2022-06-01 19:42:51 +00:00
4bb8db85e9 Revert "[chalf] where(cpu and cuda), pow(cuda) (#77640)"
This reverts commit 3697cf7f76fcad845a1f38643d8b92febf5bc5a3.

Reverted https://github.com/pytorch/pytorch/pull/77640 on behalf of https://github.com/mruberry because it broke ROCm on trunk
2022-06-01 19:39:38 +00:00
272193d026 Move THPStorage definitions out of torch/csrc/generic (#78032)
Fixes #77908

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78032
Approved by: https://github.com/ezyang
2022-06-01 19:00:58 +00:00
6a4997e66a [Profiler] Weaken ordering check during post processing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78563

The profiler assembles a call hierarchy by replaying recorded events. There is an assert to ensure that the events form a well-structured tree; however, many of the inputs come from external sources, and small differences (e.g. recording time at a lower precision) lead to traces which violate that assumption. For now this is acceptable; the post processing can resolve these discrepancies. As a result, I am relaxing the assert to only check event types where we expect the framework to be able to enforce these strong structural requirements.

Differential Revision: [D36787787](https://our.internmc.facebook.com/intern/diff/D36787787/)

Approved by: https://github.com/suo
2022-06-01 18:55:19 +00:00
5aa2ed1922 Remove call to .contiguous() for local_shard_t.
The call to contiguous was probably left over from a previous
implementation and is no longer needed.

Had to adjust atol for one of the tests to accommodate this.

Differential Revision: [D36797942](https://our.internmc.facebook.com/intern/diff/D36797942/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78598

Approved by: https://github.com/kumpera
2022-06-01 18:50:10 +00:00
497ae27050 [chalf] warn once on creating a chalf tensor (#78245)
`chalf` is experimental as the op coverage is low.

The following script raises 6 warnings if `set_warn_always(True)` is set, and only 1 warning otherwise.
```python
import torch
torch.set_warn_always(True)
device='cpu'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)
# Allocates new tensor for result
t + y

device='cuda'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)

# Allocates new tensor for result
t + y

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78245
Approved by: https://github.com/anjali411
2022-06-01 18:38:31 +00:00
3697cf7f76 [chalf] where(cpu and cuda), pow(cuda) (#77640)
Ref: #74537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77640
Approved by: https://github.com/anjali411, https://github.com/ngimel
2022-06-01 18:35:53 +00:00
cd4ffc865b Skip test_fn_gradgrad_linalg_pinv_singular_cuda_complex128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78623

Approved by: https://github.com/albanD
2022-06-01 18:17:16 +00:00
7390658e80 Add APoT tensor class and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78577

Approved by: https://github.com/dzdang
2022-06-01 18:14:06 +00:00
d990277908 Make lintrunner compatible with M1 (#78628)
numpy-1.20 is not available on the platform, so change the pinned version to 1.21.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78628
Approved by: https://github.com/suo, https://github.com/ZainRizvi, https://github.com/janeyx99, https://github.com/seemethere
2022-06-01 17:44:09 +00:00
03cf01bdc0 index_select for COO CUDA tensors. (#77551)
Brings a native CUDA implementation for `index_select`. On master, CUDA tensors are silently converted to CPU to provide CUDA support.

The `nnz >> size` case could be optimized similarly to how https://github.com/pytorch/pytorch/pull/72710 does it.
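
A small usage sketch (assumes a CUDA device is available):

```python
import torch

indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(indices, values, (2, 3), device="cuda")

# With this PR the selection runs natively on the GPU instead of silently
# falling back to a CPU round-trip.
out = s.index_select(0, torch.tensor([1, 0], device="cuda"))
print(out.to_dense())
```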

Some benchmarks:
<details>

<summary>PR/torch_sparse/master</summary>

```
[------------------------------- cuda coo.index_select -------------------------------]
                                                    |   PR   |  torch_sparse  |  master
32 threads: ---------------------------------------------------------------------------
      n=10000, nnz=100, index_len=100, dim=0        |    96  |       327      |     70
      n=10000, nnz=100, index_len=100, dim=1        |   120  |       505      |     74
      n=10000, nnz=100, index_len=1000, dim=0       |    90  |       333      |     93
      n=10000, nnz=100, index_len=1000, dim=1       |   120  |       499      |     98
      n=10000, nnz=100, index_len=10000, dim=0      |    92  |       331      |    350
      n=10000, nnz=100, index_len=10000, dim=1      |   100  |       506      |    352
      n=100000, nnz=1000, index_len=100, dim=0      |    53  |       274      |     60
      n=100000, nnz=1000, index_len=100, dim=1      |    90  |       368      |     71
      n=100000, nnz=1000, index_len=1000, dim=0     |    93  |       332      |    100
      n=100000, nnz=1000, index_len=1000, dim=1     |   130  |       501      |    140
      n=100000, nnz=1000, index_len=10000, dim=0    |   100  |       341      |    522
      n=100000, nnz=1000, index_len=10000, dim=1    |   130  |       530      |    549
      n=1000000, nnz=10000, index_len=100, dim=0    |    90  |       429      |    110
      n=1000000, nnz=10000, index_len=100, dim=1    |   296  |       810      |    355
      n=1000000, nnz=10000, index_len=1000, dim=0   |   100  |       435      |    170
      n=1000000, nnz=10000, index_len=1000, dim=1   |   309  |       830      |    548
      n=1000000, nnz=10000, index_len=10000, dim=0  |   110  |       446      |    750
      n=1000000, nnz=10000, index_len=10000, dim=1  |   310  |       830      |   1000
      n=10, nnz=100, index_len=100, dim=0           |    90  |       333      |     74
      n=10, nnz=100, index_len=100, dim=1           |   100  |       497      |     78
      n=10, nnz=100, index_len=1000, dim=0          |    90  |       329      |    140
      n=10, nnz=100, index_len=1000, dim=1          |   100  |       800      |    100
      n=10, nnz=100, index_len=10000, dim=0         |    93  |       340      |    900
      n=10, nnz=100, index_len=10000, dim=1         |   120  |       800      |    489
      n=10, nnz=1000, index_len=100, dim=0          |    90  |       321      |    140
      n=10, nnz=1000, index_len=100, dim=1          |   100  |       680      |    140
      n=10, nnz=1000, index_len=1000, dim=0         |   110  |       349      |    670
      n=10, nnz=1000, index_len=1000, dim=1         |   130  |       740      |    800
      n=10, nnz=1000, index_len=10000, dim=0        |   302  |       503      |   4882
      n=10, nnz=1000, index_len=10000, dim=1        |   325  |      2257      |   5262
      n=10, nnz=10000, index_len=100, dim=0         |   229  |       349      |    810
      n=10, nnz=10000, index_len=100, dim=1         |   433  |       870      |    700
      n=10, nnz=10000, index_len=1000, dim=0        |   666  |       502      |   5581
      n=10, nnz=10000, index_len=1000, dim=1        |   826  |      2379      |   4820
      n=10, nnz=10000, index_len=10000, dim=0       |  2534  |      2700      |  80000
      n=10, nnz=10000, index_len=10000, dim=1       |  2723  |     18540      |  80000
      n=100, nnz=1000, index_len=100, dim=0         |    94  |       324      |    110
      n=100, nnz=1000, index_len=100, dim=1         |   100  |       499      |    110
      n=100, nnz=1000, index_len=1000, dim=0        |    96  |       337      |    150
      n=100, nnz=1000, index_len=1000, dim=1        |   130  |       800      |    140
      n=100, nnz=1000, index_len=10000, dim=0       |   100  |       346      |    900
      n=100, nnz=1000, index_len=10000, dim=1       |   130  |       760      |    900
      n=100, nnz=10000, index_len=100, dim=0        |    90  |       323      |    190
      n=100, nnz=10000, index_len=100, dim=1        |   279  |       800      |    180
      n=100, nnz=10000, index_len=1000, dim=0       |   110  |       339      |    781
      n=100, nnz=10000, index_len=1000, dim=1       |   294  |       870      |    800
      n=100, nnz=10000, index_len=10000, dim=0      |   315  |       505      |   6264
      n=100, nnz=10000, index_len=10000, dim=1      |   497  |      2398      |   5404
      n=1000, nnz=10000, index_len=100, dim=0       |    90  |       333      |    160
      n=1000, nnz=10000, index_len=100, dim=1       |   279  |       635      |    150
      n=1000, nnz=10000, index_len=1000, dim=0      |   100  |       328      |    215
      n=1000, nnz=10000, index_len=1000, dim=1      |   287  |       810      |    207
      n=1000, nnz=10000, index_len=10000, dim=0     |   100  |       339      |    900
      n=1000, nnz=10000, index_len=10000, dim=1     |   291  |       880      |   1000
      n=1000, nnz=100000, index_len=100, dim=0      |    92  |       358      |    435
      n=1000, nnz=100000, index_len=100, dim=1      |   302  |       900      |    530
      n=1000, nnz=100000, index_len=1000, dim=0     |   130  |       360      |   1000
      n=1000, nnz=100000, index_len=1000, dim=1     |   329  |       930      |   1200
      n=1000, nnz=100000, index_len=10000, dim=0    |   343  |       530      |   7000
      n=1000, nnz=100000, index_len=10000, dim=1    |   545  |      2446      |   6100
      n=1000, nnz=1000000, index_len=100, dim=0     |   355  |       394      |   2210
      n=1000, nnz=1000000, index_len=100, dim=1     |  1660  |      2276      |   2674
      n=1000, nnz=1000000, index_len=1000, dim=0    |   877  |       574      |   6700
      n=1000, nnz=1000000, index_len=1000, dim=1    |  2449  |      3782      |   9000
      n=1000, nnz=1000000, index_len=10000, dim=0   |  3112  |      2931      |  57000
      n=1000, nnz=1000000, index_len=10000, dim=1   |  7340  |     20220      |  65700

Times are in microseconds (us).

```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551
Approved by: https://github.com/cpuhrsch
2022-06-01 17:39:03 +00:00
de5a2320f2 Mark more methods of DispatchKeySet as constexpr
Added operator-, DispatchKeySet::add, and DispatchKeySet::remove.
I wanted to use these in functorch to make a constexpr DispatchKeySet.

Also adds C10_NODISCARD to DispatchKeySet::remove to make it
consistent with DispatchKeySet::add (this will raise a
warning if someone calls remove without assigning the result to a
variable; remove is NOT mutable and this is a pitfall that I run into a
lot)

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78558

Approved by: https://github.com/bdhirsh
2022-06-01 17:29:03 +00:00
ffaee6619c tools: Add ability to grab release versions
Adds the ability for generate_torch_version to grab release versions
based on the current tag. Also includes a regex to check if the tagged
version matches our release pattern (vX.Y.Z) so we don't collide with
ciflow tags.
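
A minimal sketch of the kind of check described (the exact pattern used by `generate_torch_version` may differ):

```python
import re

# Accept release tags like "v1.12.0" but not ciflow tags such as "ciflow/trunk/78584".
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")

print(bool(RELEASE_TAG.match("v1.12.0")))          # True
print(bool(RELEASE_TAG.match("ciflow/trunk/1")))   # False
```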

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78584

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Approved by: https://github.com/janeyx99
2022-06-01 17:19:17 +00:00
44aa4ad894 Use _all_gather_base and fuse matmul for sharded linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78477

Use `_all_gather_base` instead of all_gather for col-wise sharding
since `_all_gather_base` returns a single fused tensor that can be used to
perform a single matmul instead of looping through and performing multiple
matmuls.

This improves performance for col-wise sharding.
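
A minimal sketch of the idea, assuming an already-initialized process group and CUDA tensors; this is an illustrative helper, not the actual sharded-linear implementation in `torch.distributed._shard`:

```python
import torch
import torch.distributed as dist

def gather_then_matmul(local: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # _all_gather_base writes all ranks' shards into one contiguous tensor,
    # so a single matmul can consume it instead of one matmul per gathered shard.
    world_size = dist.get_world_size()
    gathered = torch.empty(world_size * local.size(0), local.size(1),
                           dtype=local.dtype, device=local.device)
    dist._all_gather_base(gathered, local.contiguous())
    return gathered @ weight
```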

Differential Revision: [D36754385](https://our.internmc.facebook.com/intern/diff/D36754385/)

Approved by: https://github.com/aazzolini, https://github.com/wanchaol
2022-06-01 17:17:34 +00:00
effd270986 Fuse row-wise sharded linear matmul to increase perf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78449

Instead of looping through and performing a matmul separately, we can
just perform a single matmul to ensure we launch a single CUDA kernel for this
operation.

Differential Revision: [D36743354](https://our.internmc.facebook.com/intern/diff/D36743354/)

Approved by: https://github.com/aazzolini, https://github.com/wanchaol
2022-06-01 17:13:48 +00:00
93d5a722b1 [coreml] Introducing Quantization (#78108)
Summary: Adding a quantization mode to preprocess, which allows us to run quantization for Core ML models.

Test Plan:
https://fburl.com/anp/r0ntsbq0

Notebook running through the quantization workflow:

Created a custom Bento kernel to run it through Core ML:

```bento_kernel(
    name = "coreml",
    deps = [
        "fbsource//third-party/pypi/coremltools:coremltools",
        "//caffe2:coreml_backend",
        "//caffe2:coreml_backend_cpp",
        "//caffe2:torch",
        "//caffe2/torch/fb/mobile/model_exporter:model_exporter",
    ],
)
```

Initial benchmarks on iPhone 11:

FP32 Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/203998485252700

Quantized Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/927584023592505

High End Quantized Model:
https://our.intern.facebook.com/intern/aibench/details/396271714697929

Summarized Results
| Backend | Quantization | p50 net latency | Model Size |
|---------|--------------|-----------------|------------|
| Core ML | No           | 1.2200          | 1.2mb      |
| Core ML | Yes          | 1.2135          | 385kb      |
| CPU     | Yes          | 3.1720          | 426kb      |

Reviewed By: SS-JIA

Differential Revision: D36559966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78108
Approved by: https://github.com/jmdetloff
2022-06-01 17:10:17 +00:00
2d5eac48d5 Revert "Reference implementations for rot90, roll, atleast_1d,2d,3d (#78080)"
This reverts commit 96c134854d4dbc418cdc0ec82959476ddac8068e.

Reverted https://github.com/pytorch/pytorch/pull/78080 on behalf of https://github.com/malfet because it broke XLA on trunk (see https://github.com/pytorch/pytorch/runs/6678429656?check_suite_focus=true) and the same pattern was observable on PR CI: https://github.com/pytorch/pytorch/runs/6672733779?check_suite_focus=true
2022-06-01 16:52:25 +00:00