Commit Graph

1506 Commits

26b5986297 ReflectionPad supports BFloat16 (#84949)
Looking through past commits, I could not find a reason why BFloat16 support was missing.
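A small usage sketch of what this enables (assuming the 2d variant on CPU is covered by the change):

```py
import torch

# ReflectionPad modules should now accept bfloat16 inputs like other floating dtypes
pad = torch.nn.ReflectionPad2d(1)
out = pad(torch.randn(1, 1, 4, 4, dtype=torch.bfloat16))
print(out.dtype)  # torch.bfloat16
```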
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84949
Approved by: https://github.com/ngimel
2022-09-14 00:01:06 +00:00
36d79143ce Revert "[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)"
This reverts commit bb4e96c9644a034e593085026b781ee78a4d6a77.

Reverted https://github.com/pytorch/pytorch/pull/84675 on behalf of https://github.com/osalpekar due to causing asan xplat link-time errors like ld.lld: error: undefined symbol: torch::jit::has_jit_decomposition(c10::FunctionSchema const&)
2022-09-13 22:54:54 +00:00
d09e8b23bf [primTorch] Add repeat and unfold_copy references (#81374)
Add References:

- repeat
- unfold
- expand_as
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81374
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-09-12 22:19:06 +00:00
bb4e96c964 [reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)
This reverts commit acb4a09628284201281e262aaee58e3dc6be9c2b.

In addition, we also fix a memory leak in layer norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84675
Approved by: https://github.com/zou3519
2022-09-12 20:33:14 +00:00
4f6027b78a [opinfo] narrow: add new sample for Tensor overload (#84785)
`narrow` accepts its `start` argument as a Tensor. This adds a sample to test that overload.

NOTE: This leads to a bunch of failing tests, hence the added skips and xfails.
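A minimal illustration of the Tensor overload (a sketch, not taken from the PR):

```py
import torch

x = torch.arange(9).reshape(3, 3)
# `start` passed as a 0-dim integer tensor instead of a plain int
rows = x.narrow(0, torch.tensor(1), 2)  # selects rows 1 and 2
```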
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84785
Approved by: https://github.com/zou3519
2022-09-12 16:59:08 +00:00
8cdc0679b9 [ROCm][jiterator] unskip additional tests (#84371)
Follow-up to #77982.  Unskip additional jiterator tests for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84371
Approved by: https://github.com/ngimel, https://github.com/SherlockNoMad
2022-09-12 15:20:51 +00:00
01c54ad6de Remove deprecated torch.eig (#70982)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.eig`.

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
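The documented replacement is `torch.linalg.eig`; a minimal migration sketch:

```py
import torch

A = torch.randn(3, 3)
# torch.eig(A, eigenvectors=True) is gone; torch.linalg.eig returns complex
# eigenvalues/eigenvectors instead of the old packed real representation.
eigenvalues, eigenvectors = torch.linalg.eig(A)
```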
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70982
Approved by: https://github.com/Lezcano, https://github.com/malfet
2022-09-09 21:31:57 +00:00
2614079f89 OpInfo: Prevent clamp sample inputs from sharing tensors (#84696)
As per the comment, re-using tensors between sample inputs is strongly
discouraged.
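A hedged sketch of the convention (the helper below is illustrative, not the actual OpInfo code): each sample builds its own tensors instead of reusing one across samples.

```py
import torch
from torch.testing import make_tensor

def sample_inputs_clamp_sketch(device, dtype):
    for low, high in [(-0.5, 0.5), (0.0, None), (None, 0.5)]:
        # a fresh tensor per sample, so mutating tests cannot leak state between samples
        x = make_tensor((3, 4), device=device, dtype=dtype)
        yield x, low, high

samples = list(sample_inputs_clamp_sketch("cpu", torch.float32))
```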
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84696
Approved by: https://github.com/ngimel
2022-09-09 19:58:08 +00:00
e8b9501861 test: adding uniform (#84292)
Adding OpInfo for uniform

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84292
Approved by: https://github.com/amjames, https://github.com/ngimel
2022-09-09 18:54:49 +00:00
acb4a09628 Revert "Call jit decomposition in VariableType to increase forward AD coverage (#84151)"
This reverts commit 42d99e6f196233627a28b8e9efb26a0a166fa370.

Reverted https://github.com/pytorch/pytorch/pull/84151 on behalf of https://github.com/malfet due to Regressed test_jvpvjp_nn_functional_layer_norm_cuda_float32, see 42d99e6f19
2022-09-07 18:02:27 +00:00
42d99e6f19 Call jit decomposition in VariableType to increase forward AD coverage (#84151)
This PR:
- updates forward AD codegen in core to generate code that tries calling into decompositions registered to jit when
  - (1) the function is not an in-place or out= variant
  - AND (2) the function is differentiable (requires_derivative=True)
  - AND (3) there are no forward AD formulas registered
  - To simplify things we always generate the if/else (as long as (1) is true), but generate 'false' when either (2) or (3) is false (see the sketch below).
- removes the mechanism from functorch
  - (follow up) some functorch tests should be updated here so they no longer have to compute the Jacobian with vjp
- factors out some logic to generate the any_has_forward_grad condition
  - (bc-breaking) when TensorList inputs unexpectedly have forward grad, the error will no longer contain the name

See https://github.com/pytorch/pytorch/pull/84151#issuecomment-1238519247 for codegen output and more discussion.
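A rough sketch of that decision in illustrative Python (the real logic lives in the autograd codegen, and the attribute names here are made up):

```py
def emit_forward_ad_decomposition_guard(fn) -> str:
    """Mirrors conditions (1)-(3) above; not the actual codegen."""
    if fn.is_inplace_or_out_variant:  # (1) fails: no if/else is generated at all
        return ""
    # The if/else is always emitted once (1) holds; the guard degenerates to a
    # constant false unless (2) and (3) both hold.
    use_decomp = fn.requires_derivative and not fn.has_forward_ad_formula
    return "has_jit_decomposition(schema)" if use_decomp else "false"
```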
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84151
Approved by: https://github.com/samdow, https://github.com/albanD, https://github.com/zou3519
2022-09-07 15:31:46 +00:00
166dec74b5 Revert "Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761)"
This reverts commit 65beff5acb0d7c0c484bd0558bcaf8ddc9c96aab.

Reverted https://github.com/pytorch/pytorch/pull/81761 on behalf of https://github.com/mehtanirav due to Breakages in pytorch/glow
2022-09-06 22:31:14 +00:00
752c3bcb47 Enable nvfuser tests for refs.broadcast_to and refs.broadcast_tensors (#84337)
Previously these tests failed because they required some other op to be executed alongside prims.broadcast_in_dim; now it works standalone.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84337
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-09-06 22:08:13 +00:00
88b1cc885c Removed tri[lu]* tests, superseded by OpInfos (#84256)
triu, tril, triu_indices and tril_indices had some tests in test_tensor_creation_ops.py and test_cuda.py that were redundant with the ones done by OpInfos for those ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84256
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-09-06 18:54:10 +00:00
c771d73461 [composite compliance] fix max_pool1d (#84127)
max_pool1d has a fast path for CPU tensors that do not require grad, which
directly accesses the data_ptr. This PR changes that: if the input Tensor is
a Tensor subclass, we now walk through the "slow path" of calling
max_pool1d_with_indices.

Test Plan:
- wait for tests
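In pseudocode, the dispatch described above looks roughly like this (a sketch, not the ATen source):

```py
import torch

def max_pool1d_sketch(x: torch.Tensor, kernel_size: int) -> torch.Tensor:
    # The CPU fast path in ATen reads data_ptr directly, which breaks tensor
    # subclasses; the new check routes them to the composite slow path instead.
    takes_fast_path = (
        x.device.type == "cpu"
        and not x.requires_grad
        and type(x) is torch.Tensor  # conceptually, the check added by this PR
    )
    if takes_fast_path:
        return torch.max_pool1d(x, kernel_size)              # fast kernel
    return torch.max_pool1d_with_indices(x, kernel_size)[0]  # composite slow path

y = max_pool1d_sketch(torch.randn(2, 4, 16), kernel_size=2)
```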
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84127
Approved by: https://github.com/kshitij12345, https://github.com/samdow, https://github.com/malfet
2022-09-06 17:13:09 +00:00
139599ba95 Contiguify bias in slow_conv_transpose3d kernel (#84125)
Users never run into this because PyTorch now comes with cudnn by
default, and cudnn has a better conv_transpose implementation. However,
we seem to test without cudnn in our CI; also, ROCm goes down this
path.

The .contiguous() call does not regress anything because previously it
was a runtime error. Because this kernel is the "slow conv transpose3d
kernel", we don't care much for its performance.

Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84125
Approved by: https://github.com/ngimel
2022-09-06 17:13:09 +00:00
91a5f52f51 Decomp for nn.functional.grid_sampler_2d (#84350)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84350
Approved by: https://github.com/jansel, https://github.com/Lezcano
2022-09-05 21:33:26 +00:00
65beff5acb Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761)
`torch.norm` is very odd. Some notable issues are:

- The default value of `"fro"` in `torch.norm` has an odd behaviour when `dim=None`. This is handled in the new dispatch
- The treatment of the `dtype` argument in `torch.norm` was completely wrong. This should fix it
- Some `out=` variants in the previous implementation were also wrong. This should fix those.
- This new dispatch should make some paths much faster. For example, `torch.norm(x)` where `x` is complex.

I'll try to make the changes in these PRs as incremental as possible as this is a tricky one.
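Roughly, the correspondence being targeted (a sketch of the mapping, not the exact dispatch rules):

```py
import torch

x = torch.randn(4, 5)
# vector-style norms map onto linalg.vector_norm ...
torch.norm(x)            # ~ torch.linalg.vector_norm(x) over all elements
torch.norm(x, dim=1)     # ~ torch.linalg.vector_norm(x, dim=1)
# ... and matrix-style norms onto linalg.matrix_norm
torch.norm(x, p="fro")   # ~ torch.linalg.matrix_norm(x, "fro") for 2-D input
```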
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81761
Approved by: https://github.com/ngimel
2022-09-02 19:12:25 +00:00
5cfe769387 [primTorch] Add refs for reshape_as, view_as, unify tests (#84222)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84222
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-09-01 16:14:34 +00:00
65e887c041 Remove unnecessary copy from torch._refs.to, add OpInfo for torch.Tensor.to (#84270)
This PR removes an unnecessary copy from `torch._refs.to` and adds an OpInfo for `torch.Tensor.to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84270
Approved by: https://github.com/ngimel
2022-09-01 07:18:42 +00:00
85b889fa5f [primTorch] Add ref for poisson_nll_loss (#83805)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83805
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-31 17:39:34 +00:00
71ce9cd072 [primTorch] Add decomp for soft_margin_loss (#83804)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83804
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-31 17:39:34 +00:00
305af90d0f [primTorch] Add docstring and promotion for l1_loss ref (#83803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83803
Approved by: https://github.com/Lezcano, https://github.com/ngimel
2022-08-31 17:39:31 +00:00
9c452abcf1 Use reentrant mode when invoking prims, delete global prim_fake_mode (#84090)
Maybe I should be using the meta_impl instead of the prim_impl, but it's not terribly clear why, since the prim impl will be better tested and should work under the re-entrant FakeTensorMode.

Fixes https://github.com/pytorch/pytorch/issues/78613 in the process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84090
Approved by: https://github.com/ezyang, https://github.com/samdow
2022-08-31 01:58:44 +00:00
db7784e722 [Static Runtime] Schema checks for index_put (#84152)
Summary:
`index_put` can take a list of tensors, but Static Runtime always tries to convert its argument to a list of optional tensors. This was causing crashes for some users. Add some schema checks to prevent this, and add a new overload for the new case.
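For reference, the two call shapes involved, sketched at the Python level (the overloads differ in whether the index list is `Tensor[]` or `Tensor?[]`):

```py
import torch

x = torch.zeros(3, 3)
idx = torch.tensor([0, 2])
# an all-tensor index list (Tensor[])
x.index_put_((idx,), torch.ones(2, 3))
# advanced indexing with a leading full slice reaches index_put with an
# optional-tensor index list (Tensor?[]) containing a None entry
x[:, idx] = torch.ones(3, 2)
```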

Also, I found a clear bug in the JIT interpreter (mutating the argument when it's not supposed to), so I fixed that too.

Test Plan: New unit test

Differential Revision: D39072214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84152
Approved by: https://github.com/tenpercent
2022-08-31 01:20:14 +00:00
d09486ab23 [ROCm] enable nvfuser (#82498)
### Description
The nvfuser is enabled for ROCm.

### Testing
CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82498
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
2022-08-30 21:50:39 +00:00
90161c23cf Add nvfuser support for squeeze (#84117)
`_refs.squeeze` and `_refs.unsqueeze` now work with the nvfuser executor tests.

Similarly to `_refs.reshape` we need to explicitly save the concrete shape on the trace to pass that info to nvfuser, as it gets lost in translation (https://github.com/pytorch/pytorch/pull/83739#discussion_r950352124).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84117
Approved by: https://github.com/ngimel
2022-08-30 20:36:11 +00:00
b106a04d76 Fix the edge case when y = 0 in kl_div (#82714)
Brought up in https://github.com/pytorch/pytorch/pull/80334#issuecomment-1193600883

We also prepare its opinfo to fix https://github.com/pytorch/pytorch/issues/80488
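The convention is that a zero target contributes nothing, since 0 * log(0) is taken to be 0; a quick sketch of the intended behaviour:

```py
import torch
import torch.nn.functional as F

inp = torch.tensor([0.3, 0.7]).log()   # kl_div expects log-probabilities as input
tgt = torch.tensor([0.0, 1.0])
print(F.kl_div(inp, tgt, reduction="none"))
# the y = 0 entry should come out as 0.0 rather than nan
```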
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82714
Approved by: https://github.com/albanD
2022-08-30 18:18:25 +00:00
7088a98fba conv2d: require bias to have the same dtype as input and weight on cpu (#83686)
Fixes https://github.com/pytorch/pytorch/issues/83505

BC-breaking message:
- Previously we only required input and weight to have the same dtype on cpu (when the input is non-complex). After this change, the bias is now also expected to have that same dtype. This change was necessary to improve the error message for certain combinations of inputs. This behavior now also matches that of convolution on cuda (see the sketch below).
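
A small sketch of what the new check means in practice (cast the bias explicitly when dtypes differ):

```py
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)              # float32 input
w = torch.randn(4, 3, 3, 3)              # float32 weight
b = torch.randn(4, dtype=torch.float64)  # mismatched bias dtype
# F.conv2d(x, w, b) now raises a dtype error on CPU; match the dtypes instead:
out = F.conv2d(x, w, b.to(x.dtype))
```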

<details>
<summary>
Old plan
</summary>
Previously convolution (at least for slow_conv2d) did not perform type promotion, i.e. the output of `conv(int, int, float)` is an int, and that leads to the autograd assert.

This PR adds type promotion handling at the `at::native::conv2d` (this is a composite) level. We also need to correct or remove many tests that assume that conv errors when input types are mixed

Pros:
- Doing type promotion at this level avoids the complex path from having any special handling for mixed dtypes, and can potentially speed up mixed dtype inputs to now dispatch to faster kernels which are only capable of handling floats.

Cons:
- Doing type promotion at this level has the risk of introducing extra overhead when we would've dispatched to a kernel capable of handle mixed type anyway. I don't know if any of these exist at all though - it is possible that inputs with any non-float arguments are dispatched to the slow path.

If this approach is OK, we can proceed with the other convolutions as well:
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83686
Approved by: https://github.com/ngimel
2022-08-29 16:41:17 +00:00
3aae6ff1e1 Add nvprims.var_mean (#83508)
This PR adds an nvfuser-specific primitive, `var_mean`.
The interpretation of `torch.var_mean` as `torch.ops.nvprims.var_mean` is handled by the `TorchRefsNvfuserCapabilityMode` context manager.

I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean")`).

Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
2022-08-28 18:45:25 +00:00
b159a5230f Revert "Add nvprims.var_mean (#83508)"
This reverts commit 7e7694b6615fbf46abfab234615fa891c2819eb7.

Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 11:30:27 +00:00
c9b144ff47 Replace assertEqualIgnoreTypes from common_methods_invocations.py (#84076)
This addresses TODO:38095. More details at https://github.com/pytorch/pytorch/issues/38095


Pull Request resolved: https://github.com/pytorch/pytorch/pull/84076
Approved by: https://github.com/kit1980
2022-08-28 01:25:07 +00:00
7e7694b661 Add nvprims.var_mean (#83508)
This PR adds an nvfuser-specific primitive, `var_mean`.
The interpretation of `torch.var_mean` as `torch.ops.nvprims.var_mean` is handled by the `TorchRefsNvfuserCapabilityMode` context manager.

I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean")`).

Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
2022-08-27 09:05:20 +00:00
65ea3d0621 [composite compliance] cov, corrcoef (#82954)
Ref: #69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82954
Approved by: https://github.com/zou3519
2022-08-26 15:14:37 +00:00
f5a3515083 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls into `getrs` internally, so we can do so explicitly
ourselves and save having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve`, and `inv`
will be as efficient as it can be, as it will choose the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.
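Conceptually the composite boils down to solving A X = I; a minimal numerical sketch:

```py
import torch

A = torch.randn(4, 4, dtype=torch.float64) + 4 * torch.eye(4, dtype=torch.float64)  # well conditioned
X = torch.linalg.solve(A, torch.eye(4, dtype=torch.float64))
torch.testing.assert_close(X, torch.linalg.inv(A))
```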

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-25 09:28:55 +00:00
5321bf52f2 Revert "Make linalg.inv composite of linalg.solve (#80074)"
This reverts commit 4737b3361479f4104efaa3bfa2ea517eaacb60fb.

Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628
2022-08-25 00:43:00 +00:00
4737b33614 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls into `getrs` internally, so we can do so explicitly
ourselves and save having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve`, and `inv`
will be as efficient as it can be, as it will choose the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
591222f5d9 Fix use-dict-literal lint (#83718)
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
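For reference, the pattern being rewritten:

```py
# flagged by pylint's use-dict-literal check
settings = dict(retries=3, verbose=True)
# preferred spelling
settings = {"retries": 3, "verbose": True}
```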
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
2022-08-24 00:26:46 +00:00
a802603ef7 [complex] conv_transpose1d (#79694)
Reference: https://github.com/pytorch/pytorch/issues/71108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79694
Approved by: https://github.com/ngimel
2022-08-23 19:31:22 +00:00
9095030239 [fix] edge case in MaxPool1d and add ErrorInputs (#83553)
Fixes #83224

cc @kshitij12345 @albanD!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83553
Approved by: https://github.com/albanD
2022-08-23 19:23:39 +00:00
cb488e6d2f Allow None arguments for elementwise type promotion wrapper and fix clamp with None arguments (#83586)
Fixes https://github.com/pytorch/torchdynamo/issues/759
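The case being fixed, sketched briefly: `clamp` must accept `None` for either bound, as long as at least one of `min`/`max` is given.

```py
import torch

x = torch.randn(4)
torch.clamp(x, min=None, max=0.5)  # upper bound only
torch.clamp(x, min=-0.5)           # lower bound only
```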
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83586
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-08-23 17:47:10 +00:00
91eb1b9bb9 Move _masked opinfos to opinfo/definitions/_masked.py (#83763)
Ref #82518
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83763
Approved by: https://github.com/albanD
2022-08-22 19:08:41 +00:00
7656ef73f1 Move torch.special OpInfos into opinfo/definitions/special.py (#83762)
Ref #82518

As with `linalg` this doesn't include ops with an alias in special,
only the ones where `special.foo` is the actual name of the opinfo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83762
Approved by: https://github.com/albanD
2022-08-22 19:08:41 +00:00
1f38225b56 [primTorch] Add ref for new_empty_strided (#82466)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82466
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-08-19 18:51:57 +00:00
1407e6728c Nvfuser python api patch take 2 (#83684)
Landing #83645 again.

Previously we were breaking on codegen of bf16 kernels for the CUDA 10.2 toolkit. Added a short-cut to disable bf16 tests on pre-CUDA-11 builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83684
Approved by: https://github.com/ngimel
2022-08-19 16:05:39 +00:00
8788e92f0f Move torch.linalg opinfos to opinfo.definitions (2/2) (#83554)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83554
Approved by: https://github.com/albanD
2022-08-19 12:26:01 +00:00
8dbb0990bc Move torch.linalg opinfos to opinfo.definitions (1/2) (#83547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83547
Approved by: https://github.com/albanD
2022-08-19 12:26:01 +00:00
4aeb98dee9 Move RefInfo classes into opinfo.refs (#83563)
Given that there is already a clear `op_db`/`python_ref_db` split, I
think it makes sense to have the `RefInfo` classes be defined in a
different file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83563
Approved by: https://github.com/albanD
2022-08-19 12:25:59 +00:00
f4caeb25e9 Move gradcheck_wrapper and clone_sample funcs into opinfo.core (#83560)
The linalg OpInfos need these, so moving them into core to prevent
circular dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83560
Approved by: https://github.com/albanD
2022-08-19 12:25:58 +00:00
b4bc0d249f [composite compliance] batch_norm (#79990)
Fixes https://github.com/pytorch/pytorch/issues/76283
Ref: #69991

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79990
Approved by: https://github.com/zou3519
2022-08-19 11:59:31 +00:00