Commit Graph

648 Commits

9ea009fe8b Add as_quantized_tensor (#20740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740

Provide a way to assemble quantized Tensor from int8 Tensor, scale and zero point.
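
A hypothetical usage sketch; the function name and signature below are assumptions inferred from the commit title, not confirmed API:

```python
import torch

# Hypothetical sketch: `as_quantized_tensor` and its signature are assumed
# from the commit title, not verified against the actual change.
int_repr = torch.randint(-128, 127, (2, 3), dtype=torch.int8)
q = torch.as_quantized_tensor(int_repr, scale=0.1, zero_point=0)  # assumed API
```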

Differential Revision: D15232416

fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
2019-05-22 15:19:45 -07:00
7a0c6d528a Fix copy_transpose_valid check (#20759)
Summary:
Fixes #20755

(Was broken in #20685)

cc vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20759

Differential Revision: D15433712

Pulled By: colesbury

fbshipit-source-id: 29f612f7d4d7b73158d6f5dc1e46fd2f8fb09a2f
2019-05-21 15:37:37 -07:00
cca923c481 Add dequantize_linear for JIT pass (#20107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20107

As titled.

Reviewed By: nishantpdce

Differential Revision: D15202187

fbshipit-source-id: 7d6274a67fcca695c0425587f35046fecbc2ccdc
2019-05-21 12:26:48 -07:00
987f1ccf49 Add "ndim" property to tensor (#20565)
Summary:
For compatibility with numpy.
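
A minimal sketch of the added property:

```python
import torch

t = torch.zeros(2, 3, 4)
assert t.ndim == 3             # alias for t.dim(), mirroring numpy
assert t.ndim == len(t.shape)
```
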
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20565

Differential Revision: D15374390

Pulled By: umanwizard

fbshipit-source-id: 4ab209a5fb27d8ba27ee7eb6b67b858ce2480594
2019-05-20 16:10:50 -07:00
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
85fad0597c Add qint8 type (int8_t) (#19984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19984

Add qint8 for QTensor, with underlying type of int8_t

Reviewed By: jianyuh

Differential Revision: D15150715

fbshipit-source-id: 57580f599d46f9323af5ce462dbbc464b25e40d7
2019-05-17 20:35:05 -07:00
8c9f4c560a Add matmul optimization for the case A.ndim <= 2 && B.ndim >= 3 (#20448)
Summary:
This addresses #18862.
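
A sketch of the shape combination the fast path targets; the broadcasting semantics shown are standard `matmul` behavior and unchanged by this optimization:

```python
import torch

A = torch.randn(3, 4)       # A.ndim == 2
B = torch.randn(5, 4, 6)    # B.ndim == 3 (batched)
out = torch.matmul(A, B)    # A is broadcast over B's batch dimension
assert out.shape == (5, 3, 6)
```
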
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20448

Differential Revision: D15393465

Pulled By: ezyang

fbshipit-source-id: 87e5b0ed8253ea00365f420d98ac96dd4e934028
2019-05-17 09:44:26 -07:00
690efa5220 Remove checks for CUDA 8 in LU-based tests (#20482)
Summary:
CUDA 8 is no longer supported and has been removed from CI, so these checks are irrelevant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482

Differential Revision: D15393438

Pulled By: ezyang

fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
2019-05-17 08:51:56 -07:00
220e6894c5 Rename qint8 data type (#19932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932

In preparation for adding the int8_t data type for QTensor

Reviewed By: zafartahirov

Differential Revision: D15137838

fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
2019-05-16 18:09:28 -07:00
a837c00acd Removing unnecessary comments (+fix flake8)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20589

Differential Revision: D15373655

Pulled By: VitalyFedyunin

fbshipit-source-id: 25277648d3e8f8a09cec7569ceda56e74c2ef0b1
2019-05-16 09:19:34 -07:00
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was split into 2 PRs.

This one:

Introduces a MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands the functionality of two tensor functions, `.is_contiguous` and `.contiguous` (both Python and C++ APIs).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in NHWC (or the analogous format for 3d/5d tensors).

Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
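
A minimal sketch of the expanded API (current PyTorch spellings):

```python
import torch

x = torch.randn(2, 3, 4, 5)                          # NCHW tensor
assert x.is_contiguous()                             # default contiguous_format

y = x.contiguous(memory_format=torch.channels_last)  # NHWC memory layout
assert y.is_contiguous(memory_format=torch.channels_last)
assert y.shape == x.shape                            # same NCHW semantics
assert torch.equal(x, y)                             # only the strides differ
```
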
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
abb3698976 Add QInt32 ScalarType and qint32 data type (#19816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816

We need this for quantizing bias.
Adds a third argument of type ScalarType to `quantize_linear`.
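
A hedged sketch of the call described above; `quantize_linear` was later renamed `quantize_per_tensor`, and the exact argument order here is an assumption:

```python
import torch

bias = torch.randn(8)
# Assumed signature quantize_linear(input, scale, zero_point, dtype);
# the function was later renamed quantize_per_tensor.
q_bias = torch.quantize_linear(bias, 0.25, 0, torch.qint32)
```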

Differential Revision: D15094174

fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
2019-05-15 18:50:18 -07:00
4c23c34e79 Computing var/stddev and mean at the same time (#18731)
Summary:
The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
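
A sketch assuming the functionality surfaces as `torch.var_mean` / `torch.std_mean`, the names this work eventually shipped under:

```python
import torch

t = torch.randn(4, 5)
var, mean = torch.var_mean(t, dim=1)  # one fused pass instead of two kernels
std, _ = torch.std_mean(t, dim=1)
assert torch.allclose(std ** 2, var)
```
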
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
2019-05-15 16:42:38 -07:00
72bb84c518 Provide a few default args for numpy translation (#20451)
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.

For now, they are as follows:

* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`

Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768,  4.3619, -0.9105,  1.1804,  1.0837, -0.9036,  0.2365,
         1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```

The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451

Differential Revision: D15337521

Pulled By: umanwizard

fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
2019-05-15 10:13:17 -07:00
f23fb66e6e Fix in file position logic: file descriptor and Python-side handle (#20270)
Summary:
This addresses #18436

The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)

This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270

Differential Revision: D15275902

Pulled By: soumith

fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
2019-05-09 08:20:01 -07:00
c406bf20a0 error instead of crashing on attempt to subclass typed tensors (#20283)
Summary:
https://github.com/pytorch/pytorch/issues/20052

Typed tensors (e.g. torch.FloatTensor) can't be subclassed. This was causing
crashes and other errors.
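
A minimal sketch of the intended behavior (assuming the rejection surfaces as a `TypeError` at class-definition time):

```python
import torch

try:
    class MyTensor(torch.FloatTensor):  # typed tensors are not subclassable
        pass
except TypeError as e:
    print("refused:", e)
```
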
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20283

Differential Revision: D15278138

Pulled By: nairbv

fbshipit-source-id: 8493eac4d34dfb76b054362bf0acec02146cd0e2
2019-05-09 07:10:38 -07:00
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
17268a9225 Add print function for QTensor (#19513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19513

Add support for printing a QTensor in python frontend

Differential Revision: D15017168

fbshipit-source-id: 312d1f18e6ca3c9eb4a5b8bb1c64f7cc8bc1dcf5
2019-05-06 13:12:43 -07:00
ca57dd9332 Fixed log_normal and geometric for CPU (#19938)
Summary:
log_normal_ and geometric_ were disabled for CPU by mistake in [this PR](bc53805f2e); this PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19938

Differential Revision: D15143404

Pulled By: izdeby

fbshipit-source-id: 41c7bd29f046b5a3ac6d601de8c64ab553771d19
2019-04-30 12:18:10 -07:00
aa6403bae6 Added .bool() method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19928
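
A minimal sketch:

```python
import torch

t = torch.tensor([0, 1, 2])
b = t.bool()                  # analogous to .float(), .long(), ...
assert b.dtype == torch.bool
assert b.tolist() == [False, True, True]
```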

Differential Revision: D15131923

Pulled By: izdeby

fbshipit-source-id: 3909cf4623fe85e98ceaf57fbb57745919899445
2019-04-30 10:34:31 -07:00
de19eeee99 Enabled masked for a bool tensor (#19140)
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
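
A minimal sketch of the masked methods driven by a bool mask:

```python
import torch

x = torch.arange(6).reshape(2, 3)
mask = x > 2                      # bool tensor
print(x.masked_select(mask))      # tensor([3, 4, 5])
x.masked_fill_(mask, -1)          # in-place masked methods accept it too
```
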
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140

Differential Revision: D14888021

Pulled By: izdeby

fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
2019-04-29 10:40:12 -07:00
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
6ec55c13a9 Enable assignment for QTensor in pytorch frontend (#19676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19676

Make copy work with QTensor and enable assignment of QTensor in the PyTorch frontend.

Differential Revision: D15064710

fbshipit-source-id: 04f2dc02a825695d41fa1114bfca49e92108fef3
2019-04-24 16:05:34 -07:00
4a65ee95cc Make torch.equal work with boolean CPU tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19604
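
A minimal sketch:

```python
import torch

a = torch.tensor([True, False])
assert torch.equal(a, torch.tensor([True, False]))
assert not torch.equal(a, torch.tensor([True, True]))
```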

Differential Revision: D15056022

Pulled By: li-roy

fbshipit-source-id: 1309b107b2d4ee0a490bce1b43c3c175180a1580
2019-04-24 15:51:10 -07:00
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
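
A minimal sketch of the ported function:

```python
import torch

# Write ten float32 values, then map the file back as a tensor directly,
# without going through a Storage first.
torch.arange(10, dtype=torch.float32).numpy().tofile("/tmp/demo.bin")
t = torch.from_file("/tmp/demo.bin", shared=False, size=10,
                    dtype=torch.float32)
assert t.tolist() == list(range(10))
```
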
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
c42f3f9055 Revert D15008160: Enable assignment for QTensor in pytorch frontend
Differential Revision:
D15008160

Original commit changeset: 5f1166246d76

fbshipit-source-id: 24c7350431ae6a87199d6e3f7ffbbc8ec7d3c28b
2019-04-24 06:58:13 -07:00
309c15e2df Enable assignment for QTensor in pytorch frontend (#19530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19530
Make copy work with QTensor and enable assignment of QTensor in the PyTorch frontend.

Differential Revision: D15008160

fbshipit-source-id: 5f1166246d768b23f009cde1fa03e8952368a332
2019-04-23 21:29:31 -07:00
9b272affde Add base support to torch.logspace, default base=10 (#19542)
Summary:
Add base support for torch.logspace. See #19220 for details.
SsnL, can you give feedback? Thanks a lot.
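
A minimal sketch of the new argument:

```python
import torch

print(torch.logspace(0, 3, steps=4))           # default base=10
# tensor([   1.,   10.,  100., 1000.])
print(torch.logspace(0, 3, steps=4, base=2))   # the new base argument
# tensor([1., 2., 4., 8.])
```
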
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542

Differential Revision: D15028484

Pulled By: soumith

fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
2019-04-23 15:06:34 -07:00
f767c9ac76 Add docs and test guaranteeing indices from torch.nonzero ordered C-style (#19539)
Summary:
See #17556.
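
A minimal sketch of the guaranteed ordering:

```python
import torch

x = torch.tensor([[0, 1],
                  [1, 1]])
print(torch.nonzero(x))
# tensor([[0, 1],
#         [1, 0],
#         [1, 1]])  <- indices emitted in C-style (row-major) order
```
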
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19539

Differential Revision: D15030151

Pulled By: ezyang

fbshipit-source-id: d46ee56a66d89b0113f86e3f8693dc1680d0adb9
2019-04-23 09:29:21 -07:00
3b4d4ef503 Remove unnecessary printing from tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19606

Differential Revision: D15046583

Pulled By: ezyang

fbshipit-source-id: ea9bb691d23855e7eddbabe68bf112a726641ba4
2019-04-23 09:24:08 -07:00
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
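
A minimal sketch of the renamed function, using `torch.cholesky` (the factorization entry point of this era) to produce the factor:

```python
import torch

a = torch.randn(3, 3)
spd = a @ a.t() + 3 * torch.eye(3)   # symmetric positive definite
u = torch.cholesky(spd)              # Cholesky factor (lower, by default)
inv = torch.cholesky_inverse(u)      # formerly `potri`
assert torch.allclose(inv, spd.inverse(), atol=1e-4)
```
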
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
fc1aadec3b Make empty_affine_quantized private (#19446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19446

change empty_affine_quantized to _empty_affine_quantized

Reviewed By: dzhulgakov

Differential Revision: D15008757

fbshipit-source-id: c7699ac0c208a8f17d88e95193970c75ba7219d3
2019-04-19 11:21:44 -07:00
e1750754c8 Step 4: add support for unique with dim=None (#18651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18651
ghimport-source-id: e11988130a3f9a73529de0b0d08b4ec25fbc639c

Differential Revision: D15000463

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e258e473dea6a3fc2307da2119b887ba3f7934a
2019-04-18 18:28:07 -07:00
88f70a1670 Fix pickling torch.float32 (#18045)
Summary:
Attempted fix for #14057. This PR fixes the example script in the issue.
The old behavior is a bit confusing here. What happened is that, during pickling, Python 2 failed to recognize that `torch.float32` lives in module `torch`, and thus looked for `torch.float32` in module `__main__`. Python 3 is smart enough to handle it.
According to the doc [here](https://docs.python.org/2/library/pickle.html#object.__reduce__), it seems `__reduce__` should return `float32` instead of the old name `torch.float32`. This way Python 2 is able to find `float32` in the `torch` module.
> If a string is returned, it names a global variable whose contents are pickled as normal. The string returned by __reduce__() should be the object’s local name relative to its module
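
A standalone illustration of the string-return form of `__reduce__` that this fix relies on (plain Python, not PyTorch internals):

```python
import pickle

class _Dtype:
    def __init__(self, name):
        self._name = name

    def __reduce__(self):
        # Returning a string tells pickle to treat this object as a global:
        # "the object's local name relative to its module".
        return self._name

# The instance must itself be reachable as a module-level global under the
# returned name, just as torch.float32 lives in the torch module.
float32 = _Dtype("float32")

restored = pickle.loads(pickle.dumps(float32))
assert restored is float32
```
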
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18045

Differential Revision: D14990638

Pulled By: ailzhang

fbshipit-source-id: 816b97d63a934a5dda1a910312ad69f120b0b4de
2019-04-18 12:28:10 -07:00
ad8f34fcca Add empty_quantized (#18960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18960

empty_affine_quantized creates an empty affine quantized Tensor from scratch.
We might need this when we implement quantized operators.
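
A hedged sketch; the underscore-prefixed name matches the rename in #19446 above, and the exact signature is an assumption:

```python
import torch

# Name follows the rename in #19446; exact signature is an assumption.
q = torch._empty_affine_quantized((2, 3), scale=0.5, zero_point=0,
                                  dtype=torch.quint8)
print(q.shape, q.q_scale(), q.q_zero_point())
```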

Differential Revision: D14810261

fbshipit-source-id: f07d8bf89822d02a202ee81c78a17aa4b3e571cc
2019-04-17 16:17:40 -07:00
eaa14f5f59 Error out on in-place binops on tensors with internal overlap (#19317)
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
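
A minimal sketch of the new check (the exact error text is not quoted here):

```python
import torch

x = torch.randn(1, 3).expand(4, 3)   # stride-0 rows: internal overlap
try:
    x.add_(1)                        # in-place binop on overlapping memory
except RuntimeError as e:
    print("rejected:", e)
```
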
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317

Differential Revision: D14972399

Pulled By: zou3519

fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
2019-04-17 13:02:07 -07:00
33443d083e Fix python lint (#19331)
Summary:
VitalyFedyunin jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19331

Differential Revision: D14969435

Pulled By: bddppq

fbshipit-source-id: c1555c52064758ecbe668f92b837f2d7524f6118
2019-04-16 21:47:30 -07:00
06c28d8a12 Add slicing and int_repr() to QTensor (#19296)
Summary:
Stack:
- **#19296 [pt1][quant] Add slicing and int_repr() to QTensor** ([D14756833](https://our.intern.facebook.com/intern/diff/D14756833/))
- #18960 [pt1][quant] Add empty_quantized ([D14810261](https://our.intern.facebook.com/intern/diff/D14810261/))
- #19312 Use the QTensor with QReLU ([D14819460](https://our.intern.facebook.com/intern/diff/D14819460/))
- #19319 [RFC] Quantized SumRelu ([D14866442](https://our.intern.facebook.com/intern/diff/D14866442/))

Methods added to pytorch python frontend:
- int_repr() returns a CPUByte Tensor which copies the data of the QTensor.
- Added as_strided for QTensorImpl, which provides support for slicing a QTensor (see test_torch.py).
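
A hedged sketch using `torch.quantize_per_tensor`, the later public name of the quantization entry point (the commit-era spelling differed):

```python
import torch

x = torch.randn(4, 2)
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=128,
                              dtype=torch.quint8)
print(q[:2])            # slicing, backed by as_strided on QTensorImpl
print(q.int_repr())     # uint8 tensor holding the raw quantized values
```
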
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19296

Differential Revision: D14756833

Pulled By: jerryzh168

fbshipit-source-id: 6f4c92393330e725c4351d6ff5f5fe9ac7c768bf
2019-04-16 20:17:21 -07:00
df67969e6b Step 3: Add support for return_counts to torch.unique for dim not None (#18650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18650
ghimport-source-id: 75759c95e6c48e27c172b919097dbc40c6bfb5e6

Differential Revision: D14892319

Pulled By: VitalyFedyunin

fbshipit-source-id: ec5d1b80fc879d273ac5a534434fd648468dda1e
2019-04-16 14:06:45 -07:00
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger `Remove Storage` plan.

Now compatible with both TorchScript forms:
`_1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`_1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

The same is checked for all similar functions (`rand_like`, `empty_like`, and others).

This is a fixed version of #18455.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
e1f38a847d Fix type conversion in dequant and add a test (#19226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19226

Type conversion was wrong previously. Thanks zafartahirov for finding it!

Differential Revision: D14926610

fbshipit-source-id: 6824f9813137a3d171694d743fbb437a663b1f88
2019-04-16 10:52:44 -07:00
1c836e7bb9 Add Quantized Backend (#18546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18546

We'll expose all combinations of the various ways of quantization in the top-level dispatch key; that is, we have AffineCPUTensor, PerChannelAffineCUDATensor, etc.

QTensor methods added:
- is_quantized()
- item()

Differential Revision: D14637671

fbshipit-source-id: 346bc6ef404a570f0efd34e8793056ad3c7855f5
2019-04-12 12:55:49 -07:00
3f7ddd269c Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim (#18649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18649
ghimport-source-id: 3411d240a6af5fe299a889667964730184e30645

Differential Revision: D14888292

Pulled By: VitalyFedyunin

fbshipit-source-id: 80da83c264598f74ab8decb165da4a1ce2b352bb
2019-04-12 12:41:20 -07:00
507fe66bea Enable comp ops for bool tensor (#19109)
Summary:
Enabled comparison ops for bool tensors
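
A minimal sketch:

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
print(a == b)   # tensor([ True, False, False])
print(a < b)    # tensor([False,  True, False])
```
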
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19109

Differential Revision: D14871187

Pulled By: izdeby

fbshipit-source-id: cf9951847d69124a93e5e21dd0a39c9568b1037d
2019-04-11 14:37:10 -07:00
1858773c0c Fixed bool Tensor value change bug (#19096)
Summary:
Fixes #19077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19096

Differential Revision: D14871044

Pulled By: izdeby

fbshipit-source-id: 61b12559c8c5b9613e00ba5933f478321ea80469
2019-04-10 11:09:07 -07:00
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent of the #18649 series. It will cause merge conflicts in the #18649 series, but please merge this first, and I will resolve the conflicts there.

The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, but not through `torch.unique` yet. I will take care of the API after the #18649 series is merged completely.
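
A hedged sketch of the behavior using `torch.unique_consecutive`, the public name this functionality was eventually exposed under (an assumption relative to this commit, which only adds the temporary entry points):

```python
import torch

x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2])
print(torch.unique_consecutive(x))
# tensor([1, 2, 3, 1, 2])  (only adjacent duplicates are collapsed)
print(torch.unique_consecutive(x, return_counts=True)[1])
# tensor([2, 2, 1, 2, 1])
```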

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow; see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about that issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
82b570528d Move abs, frac, reciprocal, and neg to TensorIterator (#19041)
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.

Benchmark script:

```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']

x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')

for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')

```

Before this change (on my mac with a skylake):
```
op      time per iter (ms)      gops/s  GB/s
abs     0.9730974197387695      1.0775652866097343      8.620522292877874
neg     1.0723679780960083      0.9778136063534356      7.822508850827485
reciprocal      1.2610594034194946      0.8315040490215421      6.6520323921723366
frac    1.1681334018707275      0.8976509004200546      7.181207203360437
```

After this change:
```
op      time per iter (ms)      gops/s  GB/s
abs     0.5031076192855835      2.084198210889721       16.673585687117768
neg     0.4433974027633667      2.3648672578256087      18.91893806260487
reciprocal      0.47145988941192624     2.2241043693195985      17.79283495455679
frac    0.5036592721939087      2.0819154096627024      16.65532327730162
```

So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041

Differential Revision: D14862037

Pulled By: jamesr66a

fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
2019-04-09 21:55:00 -07:00
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
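
A minimal sketch of the renamed function (assuming `torch.lu`, the factorization name in use around this time, for producing the inputs):

```python
import torch

A = torch.randn(3, 3) + 3 * torch.eye(3)
b = torch.randn(3, 1)
LU, pivots = torch.lu(A)             # assuming torch.lu for factorization
x = torch.lu_solve(b, LU, pivots)    # formerly `btrisolve`
assert torch.allclose(A @ x, b, atol=1e-4)
```
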
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
29ea08616b Add torch.__config__.show(), reporting detailed version of all libraries. (#18579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18579
ghimport-source-id: 65124c95e49423de4ad1008c65e75057fea09b94
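
A minimal sketch:

```python
import torch

print(torch.__config__.show())   # compiler, CUDA/cuDNN, BLAS details, etc.
```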

Differential Revision: D14778507

Pulled By: ezyang

fbshipit-source-id: 1e4bb79f4800a116ce8fb7af2fefbd34da8d102c
2019-04-09 11:13:24 -07:00
89145e602b Namedtuple return for gels, triangular_solve, and test refactor (#17195)
Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now return namedtuples
- refactored the namedtuple API tests for better coverage and maintainability
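
A minimal sketch of the namedtuple return for `triangular_solve` (field names follow the current API):

```python
import torch

A = torch.triu(torch.randn(3, 3)) + 3 * torch.eye(3)   # upper triangular
b = torch.randn(3, 1)
res = torch.triangular_solve(b, A)    # returns a namedtuple
x = res.solution                      # fields addressable by name
assert torch.allclose(A @ x, b, atol=1e-4)
```
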
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
2019-04-09 09:13:26 -07:00