Commit Graph

2642 Commits

Author SHA1 Message Date
c3d05e86cc Resend "Split ATen/Parallel into interface and backend" (#20825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20825
ghimport-source-id: 0371fbd37cb37635647d473d5ac9f2859e787061

Differential Revision: D15458073

Pulled By: ilia-cher

fbshipit-source-id: cd27d0da1691f6be1183cd152348ac0d93a53996
2019-05-24 02:03:06 -07:00
6b74856747 Fix init_thread calls in thread pool initialization (#20848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20848
ghimport-source-id: e542858a198252838c1f3100dbfbe90fd3960f07

Differential Revision: D15466918

Pulled By: ilia-cher

fbshipit-source-id: e75d38f51edd5b508c4ca28a292e4141e90f209f
2019-05-24 01:14:31 -07:00
1bb728fe14 Change the quantizer to match the behavior of the FBGEMM implementation (#20892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20892

FBGEMM uses 64 bit values. Need to change our implementation to match

Reviewed By: jerryzh168

Differential Revision: D15487664

fbshipit-source-id: 29cba26093c6f9aeafce14982c1ae12149e63562
2019-05-24 00:46:08 -07:00
fc941d3bca Catchall kernels instead of fallback kernels (#20773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20773

This removes the feature to register fallback kernels that are called when no other kernel matches.
Instead, we introduce the concept of catchall kernels that are always called independent of inputs.
If you only have a fallback/catchall kernel and no kernels with concrete dispatch keys, then both concepts behave in the same way.
The difference is that we now disallow operators from having both a catchall kernel and kernels with concrete dispatch keys.
This was possible before, when they were fallback kernels.

The reason for this change is that we anticipate needing a method_missing feature in backends, i.e. a backend-wide fallback to call when the backend doesn't specify a kernel for an operator.
We are not yet clear on the precedence between this backend-wide fallback and an operator-level fallback. Disallow fallbacks for now so we are free to choose later without breaking backwards compatibility.
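
For readers skimming the log, a minimal sketch of the two registration styles, written against the c10::RegisterOperators options API that appears later in this log (#20514); the include path and the spelling of the CPU dispatch key are assumptions:

```cpp
#include <ATen/ATen.h>
#include <ATen/core/op_registration/op_registration.h>  // assumed include path for c10::RegisterOperators

namespace {

at::Tensor identity_kernel(const at::Tensor& t) {
  return t;
}

// Kernel with a concrete dispatch key: only runs for inputs carrying that key.
// (The exact spelling of the CPU dispatch key is assumed here.)
static auto keyed = c10::RegisterOperators().op(
    "sketch::keyed_op",
    c10::RegisterOperators::options()
        .kernel<decltype(identity_kernel), &identity_kernel>()
        .dispatchKey(c10::CPUTensorId()));

// Catchall kernel: registered without a dispatch key, called for any input.
// After this change, registering both a catchall kernel and keyed kernels
// under the same operator name is rejected.
static auto catchall =
    c10::RegisterOperators().op("sketch::catchall_op", &identity_kernel);

} // namespace
```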

Reviewed By: dzhulgakov

Differential Revision: D15438977

fbshipit-source-id: cb3aa764a1659d909ee21a7bd8ec3d32438aafaa
2019-05-23 23:47:51 -07:00
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235 and thus we can afford to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
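
The actual trigger macro isn't shown in this log; below is a generic sketch of the at-most-once pattern it describes (all names are hypothetical, not the PyTorch API):

```cpp
#include <atomic>
#include <functional>
#include <string>

// Hypothetical handler hook: a custom build environment could install its own
// callback to forward events to central telemetry.
inline std::function<void(const std::string&)>& api_usage_handler() {
  static std::function<void(const std::string&)> handler =
      [](const std::string&) {};
  return handler;
}

// At-most-once trigger point: the first invocation at a given call site logs;
// every later invocation is a single relaxed atomic exchange.
#define SKETCH_LOG_API_USAGE_ONCE(event)                        \
  do {                                                          \
    static std::atomic<bool> logged{false};                     \
    if (!logged.exchange(true, std::memory_order_relaxed)) {    \
      api_usage_handler()(event);                               \
    }                                                           \
  } while (0)

void some_api_entry_point() {
  SKETCH_LOG_API_USAGE_ONCE("tensor.created");
  // ... real work ...
}
```
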
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
8cde4c4d22 Remove Variable::Impl and DifferentiableViewImpl (#17072)
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.

**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072

Differential Revision: D14075257

Pulled By: yf225

fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
2019-05-23 21:09:04 -07:00
d5b7138a2c Dict is a reference type (#20669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20669

Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.
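
A minimal sketch of what reference semantics means for callers, assuming a c10::Dict that exposes insert(), size(), and an explicit copy():

```cpp
#include <ATen/core/Dict.h>  // assumed include path for c10::Dict
#include <cassert>
#include <cstdint>
#include <string>

int main() {
  c10::Dict<std::string, int64_t> a;
  a.insert("answer", 42);

  // Reference semantics: copying the Dict object aliases the same underlying
  // storage, matching how IValues behave.
  c10::Dict<std::string, int64_t> b = a;
  b.insert("other", 1);
  assert(a.size() == 2);

  // A deep copy now has to be requested explicitly (method name assumed).
  auto c = a.copy();
  c.insert("third", 3);
  assert(a.size() == 2 && c.size() == 3);
}
```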

Reviewed By: dzhulgakov

Differential Revision: D15404911

fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
2019-05-23 15:24:31 -07:00
b6d0f6c85a Move THCTensor_{random, clampedRandom, cappedRandom} to ATen (#20620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125

Differential Revision: D15454050

Pulled By: ezyang

fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
2019-05-23 13:44:16 -07:00
48424a6c94 Avoid dynamic dispatch inside the omp loop in AdaptiveAvgPool2d (#20366)
Summary:
This PR changes the CPU implementation of `AdaptiveAveragePool2D` by
- moving dispatch outside the OpenMP loop (see the sketch below)
- supporting fp16
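
A sketch of the dispatch-hoisting pattern (not the actual pooling kernel), assuming the AT_DISPATCH/at::parallel_for helpers:

```cpp
#include <ATen/ATen.h>
#include <ATen/Parallel.h>

// Illustrative pattern only: hoist the dtype dispatch out of the parallel
// region so the inner loop is monomorphic.
void fill_ones(at::Tensor& out) {
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(out.scalar_type(), "fill_ones", [&] {
    scalar_t* data = out.data<scalar_t>();
    const int64_t n = out.numel();
    // Dispatch happened once above; the loop below runs on a concrete
    // scalar_t with no per-element dynamic dispatch.
    at::parallel_for(0, n, /*grain_size=*/2048, [&](int64_t begin, int64_t end) {
      for (int64_t i = begin; i < end; ++i) {
        data[i] = scalar_t(1);
      }
    });
  });
}
```
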
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20366

Differential Revision: D15456069

Pulled By: ezyang

fbshipit-source-id: 00fa2916f8b136af9f5c8b5db0eca4619f9f5bac
2019-05-23 13:29:29 -07:00
cf0268e51c Modify cos to cosh in Vec256 (#20797)
Summary:
Minor typo fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20797

Differential Revision: D15469308

Pulled By: ezyang

fbshipit-source-id: 3288ad69316e296e46d861737c5b09e0ea1e694b
2019-05-23 13:24:09 -07:00
70caa2efe2 Add mkldnn sigmoid operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20820

Reviewed By: dzhulgakov

Differential Revision: D15455866

fbshipit-source-id: 712b06dfbd441051dc284a1acdf94926df09bc1d
2019-05-23 12:51:57 -07:00
5f83c5d834 Fix build error with MSVC (#20853)
Summary:
Close #20642

Possibly broken by #19816
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20853

Differential Revision: D15474620

Pulled By: jerryzh168

fbshipit-source-id: 99b52d92a93bac7cab52537f1ebdbd286d4b2cfe
2019-05-23 12:11:29 -07:00
ec57d1f18a Port dilated_max_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20691

Differential Revision: D15435960

Pulled By: ezyang

fbshipit-source-id: 548b7cc42e52ad2c641ec7d9cf78028d9411d02e
2019-05-23 09:04:04 -07:00
f039401bf2 Add back at::_copy_from for use by XLA (#20783)
Summary:
XLA needs a way to override CPUTensor.copy_(XLATensor), but we only
dispatch on the "self" argument. This inverts the dispatch order when
"src" is an unhandled type.

Note that things like XLATensor.copy_(CPUTensor) never enter this
implementation.
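
A hypothetical sketch of the inverted dispatch, under assumed names and signatures (this is not the actual Copy code):

```cpp
#include <ATen/ATen.h>

// copy_ dispatches on `self`, so when `src` comes from a backend the native
// kernel doesn't handle (e.g. XLA), re-dispatch on `src` via at::_copy_from
// and let that backend perform the copy. The signature of at::_copy_from and
// the backend test below are assumptions.
at::Tensor& copy_sketch(at::Tensor& self, const at::Tensor& src, bool non_blocking) {
  const auto src_type = src.device().type();
  if (src_type != at::kCPU && src_type != at::kCUDA) {
    at::_copy_from(src, self, non_blocking);
    return self;
  }
  // ... native CPU/CUDA copy path ...
  return self;
}
```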

cc dlibenzi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20783

Differential Revision: D15443187

Pulled By: colesbury

fbshipit-source-id: 4ee93ba598ef0fed2a99c0683aae30cb50a1f99c
2019-05-23 08:47:20 -07:00
80aed36fb6 fix a couple of typos in README markdown (#20819)
Summary:
was reading the README on github and came across a couple of typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20819

Differential Revision: D15469603

Pulled By: nairbv

fbshipit-source-id: 0ed7868de2d4e6d82557a8c170783966f8a1afd7
2019-05-23 08:11:25 -07:00
d35a587958 Remove cpu_half, cpu_bool, cuda_bool from native_functions.yaml (#20552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20552
ghimport-source-id: 0ef4e85b40f3b927564257f44f72f671251acaf1

Differential Revision: D15362154

Pulled By: li-roy

fbshipit-source-id: b2477582389099c6696dca33f1371e8e136e32b6
2019-05-22 22:58:40 -07:00
41100d4027 Add PerChannelAffineQuantizer (#20764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20764

att

Reviewed By: dskhudia

Differential Revision: D15367364

fbshipit-source-id: 1d3ebf356ceac73b0fa4493209839d1c66d4d5b3
2019-05-22 19:16:52 -07:00
cde611a66c Quantized Conv2d operator (#20772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20772

Copy of D15178352

A conflicting commit that removed registering kernels using IntArrayRef landed at the same time as D15178352; hence, D15178352 was reverted. This version uses std::vector instead.

Reviewed By: zafartahirov

Differential Revision: D15437237

fbshipit-source-id: cd2f1caebcc720352b48ce25d716cb1ca49a5197
2019-05-22 17:53:24 -07:00
90910fc6cb Mark values entering containers as wildcards (#20556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20556
ghimport-source-id: d7c62e38a2f6928f6f8d988c26a38ea8f8cff8b6

Reviewed By: jamesr66a

Differential Revision: D15447568

Pulled By: suo

fbshipit-source-id: 77ebc11b571b8517d3bad3ee1b3ee5ac037542b2
2019-05-22 16:50:06 -07:00
9ea009fe8b Add as_quantized_tensor (#20740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740

Provide a way to assemble a quantized Tensor from an int8 Tensor, a scale, and a zero point.

Differential Revision: D15232416

fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
2019-05-22 15:19:45 -07:00
12bc81ae2a Change comparison ops result dtype to bool [Part1] (#20767)
Summary:
This is the first part of the planned changes to switch the result dtype of comparison operations from Byte to Bool. You can see the whole list of changes (not cleaned up) [here](https://github.com/pytorch/pytorch/pull/19332). As the PR is too big for a single review, I'm breaking it into pieces.

**Changes in this PR:**
1. Enable these methods for bool tensors:
- maskedSelect
- maskedSelectBool
- bitand
- cbitand
- bitor
- cbitor
- bitxor
- cbitxor
- sign
- equal
- neg

2. Add a bool clause for the TH version of the sign method.
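
A hedged sketch of the intended end state from the C++ side; which dtype the mask actually has depends on how much of the series has landed:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor a = torch::tensor({1, 2, 3});
  torch::Tensor b = torch::tensor({2, 2, 2});

  // Comparison results have historically been Byte tensors; the end state of
  // this series is Bool.
  torch::Tensor mask = a.lt(b);
  std::cout << mask << std::endl;               // printed tensor shows its dtype

  // masked_select is one of the methods being enabled for bool masks.
  std::cout << a.masked_select(mask) << std::endl;
}
```
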
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20767

Differential Revision: D15436446

Pulled By: izdeby

fbshipit-source-id: 8d2494b5f4873cd79c7f1a40d2cb045cadfad51a
2019-05-22 15:12:46 -07:00
05543153dd CUDA implementation of fakequant (#20252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20252

Add CUDA implementation for fakequant op for quantization aware training.
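
For context, the per-tensor affine fake-quantization formula the op computes, written out as plain C++ (parameter names are illustrative; the CUDA kernel itself is not shown here):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Fake quantization = quantize then immediately dequantize, so the forward
// pass sees the rounding error that real quantization would introduce.
float fake_quantize(float x, float scale, int64_t zero_point,
                    int64_t quant_min, int64_t quant_max) {
  int64_t q = static_cast<int64_t>(std::nearbyint(x / scale)) + zero_point;
  q = std::min(std::max(q, quant_min), quant_max);  // clamp to the quantized range
  return static_cast<float>(q - zero_point) * scale;
}
```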

Reviewed By: zafartahirov

Differential Revision: D15243386

fbshipit-source-id: 37610ab046786ffc69aaec5235e5df8304c353d6
2019-05-22 14:46:39 -07:00
fd95947e68 Revert D15248618: Split ATen/Parallel into interface and backend
Differential Revision:
D15248618

Original commit changeset: 060879266bc8

fbshipit-source-id: fc5cbb030b87613c9e15100118c3d4a064097c20
2019-05-22 09:55:51 -07:00
4a85e7955c Rename FC to Linear in the test routine (#20716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20716

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410823

fbshipit-source-id: e82fc241ee288b41304675cb087c0cdcd60d7148
2019-05-21 19:58:19 -07:00
77651615c8 fbgemm precision argument (#20790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20790

att

Reviewed By: jianyuh

Differential Revision: D15445903

fbshipit-source-id: fd338aea55e40eecc780be881e67417679e2ea35
2019-05-21 19:26:15 -07:00
c4a3b4d528 Split ATen/Parallel into interface and backend (#20057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20057
ghimport-source-id: c583f61bf661c994eb4d0625748a299e892a7246

Differential Revision: D15248618

Pulled By: ilia-cher

fbshipit-source-id: 060879266bc8616916fe220adef6ae6c0b076fbd
2019-05-21 19:15:47 -07:00
adbab82846 int_repr for different quantized types (#20656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20656

att

Reviewed By: zafartahirov

Differential Revision: D15398134

fbshipit-source-id: b02899d4ff33598416f65cf76b2ecc62adee243b
2019-05-21 17:57:42 -07:00
c9da01194a Optimize pytorch layer_norm forward (#20345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20345

Separated from D15194600.
Optimize the PyTorch layer_norm op, part 1:
  optimize layer_norm_forward_cpu
  use Eigen Maps to speed up the reductions (see the sketch below)
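
A sketch of the Eigen-Map-over-a-raw-buffer reduction pattern referenced above (not the actual layer_norm kernel):

```cpp
#include <Eigen/Core>
#include <cstdint>

// Mapping an existing buffer avoids a copy and lets Eigen vectorize the
// mean/variance reductions over each normalized row.
void row_mean_var(const float* row, int64_t n, float* mean, float* var) {
  Eigen::Map<const Eigen::ArrayXf> x(row, n);
  const float m = x.mean();
  *mean = m;
  *var = (x - m).square().mean();
}
```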

Reviewed By: zheng-xq

Differential Revision: D15290608

fbshipit-source-id: cf2c208dfd6fbcbc4c69db3ed60278d9bee156b5
2019-05-21 15:59:49 -07:00
9cec8ae146 use tensoriterator instead of th for fill_ implementation. (#20719)
Summary:
Moves fill_ to aten as suggested in:
https://github.com/pytorch/pytorch/pull/20336#issuecomment-493260729

borrows from cpuhrsch's PR: https://github.com/pytorch/pytorch/pull/18876/files#diff-0d1178f1a4ce15aeb760d251974e6924

Co-authored-by: cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20719

Differential Revision: D15439420

Pulled By: nairbv

fbshipit-source-id: cbcc313cda61a528cecc4a28d601871565e6110c
2019-05-21 15:45:45 -07:00
7a0c6d528a Fix copy_transpose_valid check (#20759)
Summary:
Fixes #20755

(Was broken in #20685)

cc vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20759

Differential Revision: D15433712

Pulled By: colesbury

fbshipit-source-id: 29f612f7d4d7b73158d6f5dc1e46fd2f8fb09a2f
2019-05-21 15:37:37 -07:00
e6f22e1b89 Change Bias to QTensor with qint32(int32_t) (#20713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20713

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410734

fbshipit-source-id: c00f409278736cf9e3205f7d36dda1b96120f47d
2019-05-21 14:17:37 -07:00
b9a150ede0 Change Weight to QTensor with qint8(int8_t) (#20712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20712

As Title says.

Differential Revision: D15410696

fbshipit-source-id: 48147a79d8cc47a724eb473796a37a1c64f8e883
2019-05-21 14:17:34 -07:00
ac2314fdeb Fix a bug in quantize_linear (#20711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20711

For uint8_t, ```std::numeric_limits<uint8_t>::digits``` returns 8;
for int8_t, ```std::numeric_limits<int8_t>::digits``` returns 7.

FBGEMM expects ```qparams.precision``` to always be 8 for both int8_t and uint8_t.
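
The digits behaviour in a nutshell, plus one illustrative way to compute the full storage width (not necessarily how the fix is written):

```cpp
#include <cstdint>
#include <limits>

// std::numeric_limits<T>::digits counts value bits only, so the sign bit of
// int8_t is excluded:
static_assert(std::numeric_limits<uint8_t>::digits == 8, "8 value bits");
static_assert(std::numeric_limits<int8_t>::digits == 7, "7 value bits + sign bit");

// One way to get the bit width of the storage type regardless of signedness:
template <typename T>
constexpr int storage_bits() {
  return std::numeric_limits<T>::digits + (std::numeric_limits<T>::is_signed ? 1 : 0);
}
static_assert(storage_bits<int8_t>() == 8, "");
static_assert(storage_bits<uint8_t>() == 8, "");
```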

Reviewed By: jerryzh168

Differential Revision: D15410695

fbshipit-source-id: 17dc3842d7c426947454c201bcb167b87b7301ce
2019-05-21 14:17:31 -07:00
cca923c481 Add dequantize_linear for JIT pass (#20107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20107

att

Reviewed By: nishantpdce

Differential Revision: D15202187

fbshipit-source-id: 7d6274a67fcca695c0425587f35046fecbc2ccdc
2019-05-21 12:26:48 -07:00
cc02a1af61 Throw error if multiple kernels registered (#20737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20737

If someone tries to register multiple kernels in the same .op() call, we're now throwing an error.
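
A hedged sketch of the now-rejected pattern, using the options API from #20514 (include path assumed):

```cpp
#include <ATen/ATen.h>
#include <ATen/core/op_registration/op_registration.h>  // assumed include path

namespace {

at::Tensor k1(const at::Tensor& t) { return t; }
at::Tensor k2(const at::Tensor& t) { return t + 1; }

// Passing two kernels in one .op() call is now rejected at registration time
// (the exact error type isn't shown in the log):
static auto bad = c10::RegisterOperators().op(
    "sketch::op",
    c10::RegisterOperators::options()
        .kernel<decltype(k1), &k1>()
        .kernel<decltype(k2), &k2>());  // second kernel in the same call -> error

} // namespace
```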

Differential Revision: D15425660

fbshipit-source-id: 6d2f1444da3e16a6a98863d847965c2aa211e046
2019-05-21 12:17:01 -07:00
fac307a5cf Revert D15178352: [pt1][quant] Quantized Conv2d operator
Differential Revision:
D15178352

Original commit changeset: 2e5453283137

fbshipit-source-id: 73cf64c483eedbd41a047e7593c0c92bbd33008c
2019-05-21 09:59:57 -07:00
29b1b59449 Quantized Conv2d operator (#20064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20064

Initial implementation of quantized convolution operator using fbgemm.

Reviewed By: zafartahirov

Differential Revision: D15178352

fbshipit-source-id: 2e5453283137dc165e9a20164ffc138fa8caf88a
2019-05-21 09:13:42 -07:00
320c38555e Refactor CUDA copy and general copy dispatch (#20685)
Summary:
Copy.cu goes from 308 to 190 lines of code. In general it uses the same
copy strategies as before: cudaMemcpyAsync, a pointwise kernel, or a copy
through temporary buffers. The pointwise kernel has slightly improved
performance when broadcasting due to faster index calculation.

This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only
entry-point now is "`copy_`".

A mini-benchmark is here:
https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992

Before:
https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18
After:
https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc

Results were measured on 2.2 GHz Broadwell; no-turbo; one thread;
compiled with GCC 7.3.0. (Results are slower than typical usage due to
turbo being off.)

The only significant difference is in the CUDA [1024] -> [1024, 1024]
broadcasting copy, which is ~25% faster. I don't expect a noticeable
difference in real programs.

CPU copy overhead is a tiny bit (~200 ns) faster, but I don't expect
anyone to notice that.
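
A small usage illustration of copy_ as the single entry point (shapes chosen to mirror the broadcasting case above):

```cpp
#include <torch/torch.h>

int main() {
  torch::Tensor dst = torch::empty({1024, 1024});
  torch::Tensor src = torch::randn({1024});
  dst.copy_(src);  // [1024] source is broadcast into [1024, 1024] during the copy

  if (torch::cuda::is_available()) {
    torch::Tensor gpu = torch::empty({1024, 1024}, torch::kCUDA);
    gpu.copy_(dst, /*non_blocking=*/true);  // host -> device copy
  }
}
```
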
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685

Differential Revision: D15414819

Pulled By: colesbury

fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440
2019-05-20 17:09:44 -07:00
cf548ba683 De-deprecate old list and dict APIs (#20709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20709

- Remove ArrayRef based API. This is neither the old nor the planned new API.
- De-deprecate kernels based on std::vector and std::unordered_map. We don't have the Dict/List based API figured out entirely yet, so we shouldn't push people towards it (see the sketch below).
  std::vector and std::unordered_map will get deprecated again once we have figured out List/Dict.
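
A minimal sketch of a kernel that stays on the un-deprecated std::vector path; the schema-inference behaviour and include path are assumptions:

```cpp
#include <ATen/core/op_registration/op_registration.h>  // assumed include path
#include <cstdint>
#include <vector>

namespace {

// Kernel written against plain std::vector; schema inference from the
// function signature is assumed to map std::vector<int64_t> to int[].
int64_t sum_list(std::vector<int64_t> xs) {
  int64_t s = 0;
  for (int64_t x : xs) s += x;
  return s;
}

static auto reg = c10::RegisterOperators().op("sketch::sum_list", &sum_list);

} // namespace
```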

Reviewed By: dzhulgakov

Differential Revision: D15417025

fbshipit-source-id: bfbb33c762e43487bb499bc8cc36d515e678f8fc
2019-05-20 13:53:00 -07:00
10445c0404 Finish removal of AT_CHECK, officially deprecate the macro. (#20600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20600

All future uses of AT_CHECK will fail our CI.
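
The replacement is the TORCH_CHECK family; a typical migration looks like:

```cpp
#include <c10/util/Exception.h>
#include <cstdint>

// AT_CHECK is deprecated; TORCH_CHECK is the replacement (with
// TORCH_INTERNAL_ASSERT for internal invariants).
void check_positive(int64_t n) {
  // old: AT_CHECK(n > 0, "n must be positive, got ", n);
  TORCH_CHECK(n > 0, "n must be positive, got ", n);
}
```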

Reviewed By: jerryzh168

Differential Revision: D15375397

fbshipit-source-id: 5582664d6c7c4f1a56ae45647eb1bca49fed2866
2019-05-20 11:57:15 -07:00
f215db9b92 InsertGuards pass
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20438

Differential Revision: D15342655

Pulled By: Krovatkin

fbshipit-source-id: a193e582d621b99f848573fb4478e7b62265dc9f
2019-05-20 10:49:19 -07:00
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
9b1dbffba5 Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
e74869473d De-deprecate parts of the legacy API (#20561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20561

We previously planned to deprecate the direct passing of a kernel function or lambda to the op() call, e.g.

    static auto registry = RegisterOperators().op("my::op", &func);

and push users towards the options based API:

    static auto registry = RegisterOperators().op("my::op", RegisterOperators::options().kernel<decltype(func), &func>());

because that has a slightly lower performance overhead when calling the kernel.

However, that overhead is negligible for all but exotic use cases, so there's no reason to push users towards a more verbose API.
This diff removes the deprecation warning from that API.

However, if you use the API together with deprecated types like std::unordered_map, you will now get a deprecation warning there.

Reviewed By: zdevito

Differential Revision: D15364271

fbshipit-source-id: 56dae0c5870bbab16ad19ba5178f4bea9eafed9f
2019-05-17 20:54:45 -07:00
cb6be42403 Options based registration API (#20514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514

Change API from

    static auto registry = c10::RegisterOperators()
      .op("my::op",
        c10::kernel(...),
        c10::dispatchKey(...)
      );

to

    static auto registry = c10::RegisterOperators()
      .op("my::op", c10::RegisterOperators::options()
        .kernel(...)
        .dispatchKey(...)
      );

because this allows better discoverability. People looking for the available options will find them more easily, and IDE autocompletion will work better.

Reviewed By: zdevito

Differential Revision: D15346348

fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
2019-05-17 20:54:42 -07:00
85fad0597c Add qint8 type (int8_t) (#19984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19984

Add qint8 for QTensor, with underlying type of int8_t

Reviewed By: jianyuh

Differential Revision: D15150715

fbshipit-source-id: 57580f599d46f9323af5ce462dbbc464b25e40d7
2019-05-17 20:35:05 -07:00
e42665cf39 Some small performance fixes for c10 dispatcher (#20472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20472
ghimport-source-id: d118bf8d48eea3faf241a7288fcad1bb6a5f051f

Differential Revision: D15332284

Pulled By: li-roy

fbshipit-source-id: a8d9e50a440a7ad3ee730f70c0fcae06ae848cbd
2019-05-17 17:35:56 -07:00
26dfeffacd Remove TH/THC link for single matrix inverse (#20534)
Summary:
- Earlier, we had to use the legacy implementation of `getri` for single matrix inverse from TH and THC
- Now, this has been moved to ATen

Changelog:
- Move single matrix inverse implementation to ATen
- Remove unused code in TH and THC resulting from the change
- Minor modifications made to single matrix CPU function implementations in ATen to avoid redundancy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20534

Differential Revision: D15393383

Pulled By: ezyang

fbshipit-source-id: 81972111cd9757d15f1d634f294c93fd0f35636c
2019-05-17 10:03:36 -07:00
8c9f4c560a Add matmul optimization for the case A.ndim <= 2 && B.ndim >= 3 (#20448)
Summary:
This addresses #18862.
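
An illustration of the shapes that hit this branch (torch::matmul usage; the output shape follows the usual broadcasting rules):

```cpp
#include <torch/torch.h>

int main() {
  // The case this change targets: A.ndim <= 2 and B.ndim >= 3.
  torch::Tensor A = torch::randn({64, 128});      // 2-D
  torch::Tensor B = torch::randn({10, 128, 32});  // 3-D (batched)
  torch::Tensor C = torch::matmul(A, B);          // -> [10, 64, 32]
}
```
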
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20448

Differential Revision: D15393465

Pulled By: ezyang

fbshipit-source-id: 87e5b0ed8253ea00365f420d98ac96dd4e934028
2019-05-17 09:44:26 -07:00