115 Commits

Author SHA1 Message Date
677e67c399 Update nn.Module._apply to not gate on should_use_set_data when swap_tensors is set (#120659)
This updates the nesting of if statements in `nn.Module._apply` such that if `torch.__future__.set_swap_module_params_on_conversion(True)` has been called, we always try to swap, regardless of whether
- `torch._has_compatible_shallow_copy_type(param, fn(param))` returns `True`
- `torch.__future__.set_overwrite_module_params_on_conversion` is set

This means that `meta_module.to_empty('device')` can now use the swap_tensors path cc @awgu
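
The branch order described above can be pictured with a small stand-alone model. This is only a sketch: `choose_conversion_path` and its three boolean inputs are hypothetical names, and the two non-swap branches are an assumption about the pre-existing behaviour, not a copy of the code in `nn.Module._apply`.

```python
def choose_conversion_path(swap_enabled: bool,
                           overwrite_enabled: bool,
                           compatible_shallow_copy: bool) -> str:
    """Toy model of the if-nesting described in this commit (not the real code)."""
    if swap_enabled:
        # After this change the swap flag wins, whatever the other two inputs are.
        return "swap_tensors"
    if compatible_shallow_copy and not overwrite_enabled:
        # Assumed legacy in-place path (assigning to `param.data`).
        return "set_data"
    # Assumed fallback: wrap fn(param) in a fresh parameter.
    return "new_parameter"


# With the swap flag set, an incompatible shallow-copy type no longer matters.
print(choose_conversion_path(True, False, False))  # swap_tensors
```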

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120659
Approved by: https://github.com/albanD
2024-02-28 00:59:34 +00:00
b3df3e4e94 Restore OpInfo/ModuleInfo tests in Inductor-wrapped tests (#119693)
I accidentally disabled this without realizing it. It turns out that
PYTORCH_TEST_WITH_INDUCTOR=1 implies PYTORCH_TEST_WITH_DYNAMO=1, which
activates skipIfTorchDynamo decorators.
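
A minimal sketch of the implication, assuming the flags are read as the string `"1"` from the environment. `dynamo_enabled` and `should_skip` are hypothetical helpers written for illustration; they are not the functions in the test utilities.

```python
def dynamo_enabled(env: dict) -> bool:
    """Dynamo counts as enabled if either flag is set (modelled implication)."""
    inductor = env.get("PYTORCH_TEST_WITH_INDUCTOR") == "1"
    dynamo = env.get("PYTORCH_TEST_WITH_DYNAMO") == "1"
    return dynamo or inductor


def should_skip(env: dict, has_skip_if_dynamo: bool) -> bool:
    """A test carrying a skipIfTorchDynamo-style marker is skipped under Dynamo."""
    return has_skip_if_dynamo and dynamo_enabled(env)


# Setting only the Inductor flag still triggers the Dynamo skip.
print(should_skip({"PYTORCH_TEST_WITH_INDUCTOR": "1"}, True))  # True
```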

Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119693
Approved by: https://github.com/bdhirsh
2024-02-12 22:44:45 +00:00
2c91e13afc Add lowerings to special functions (#119187)
As in the title.

In addition, the PR introduces infrastructure for lowerings of pointwise functions that have both cpp and triton implementations available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119187
Approved by: https://github.com/peterbell10
2024-02-11 16:35:40 +00:00
db1a4dcb5a [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, and almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`.
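
The idea of a per-device dtype override can be sketched without PyTorch. Everything here is hypothetical (`ModuleInfoSketch`, `dtypes_if_mps`, and the string dtype names); it only illustrates the lookup pattern, not the real `ModuleInfo` class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Stand-in for the floating_types() default mentioned above.
DEFAULT_FLOATING = ("float32", "float64")


@dataclass
class ModuleInfoSketch:
    name: str
    dtypes: Tuple[str, ...] = DEFAULT_FLOATING
    dtypes_if_mps: Optional[Tuple[str, ...]] = None  # per-device override

    def supported_dtypes(self, device_type: str) -> Tuple[str, ...]:
        # Use the device-specific list when one is given, else the default.
        if device_type == "mps" and self.dtypes_if_mps is not None:
            return self.dtypes_if_mps
        return self.dtypes


info = ModuleInfoSketch("nn.Linear", dtypes_if_mps=("float16", "float32"))
print(info.supported_dtypes("mps"))  # ('float16', 'float32')
print(info.supported_dtypes("cpu"))  # ('float32', 'float64')
```

With an override like this, a float64 test is never generated for the device, so no skip decorator is needed for it.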

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-08 20:35:32 +00:00
d5a718d27b Add swap_tensors path to nn.Module._apply (#117167)
Added `torch.__future__.{get/set}_swap_module_params_on_conversion`, which defaults to `False` for now, but we probably want to override this and default to `True` in `nn.Module._apply` if the input is a tensor subclass.

From offline discussion, for now we are **not** allowing `swap_tensors` after the first module forward has been run*** if the autograd graph is still alive. The reason is that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1. The first forward pass will install `AccumulateGrad` nodes on each param, which [bump the refcount of the associated TensorImpl](6cf1fc66e3/torch/csrc/autograd/variable.cpp (L307)). **Future work might be to swap the refs that the `AccumulateGrad` nodes hold if it is necessary.**

***From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge case that the grads are set via `p.grad = grad` OR the autograd graph is no longer alive because the output has been garbage collected.

If any `swap_tensors` fails on any of the parameters in the `nn.Module` we raise an error.

**`RNNBase` overrides `nn.Module._apply()` and installs weakrefs on some parameters. As a result, all modules that inherit from `RNNBase` (`RNN`, `GRU` and `LSTM`) cannot use the `swap_tensors` path as of now.**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117167
Approved by: https://github.com/albanD
ghstack dependencies: #118028
2024-02-07 18:55:44 +00:00
c0164f2393 Revert "[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)"
This reverts commit 04d52d5399ad4abb8af9e8405be79e2a7f8b4c7a.

Reverted https://github.com/pytorch/pytorch/pull/119039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing MPS test in trunk 04d52d5399,  may be a landrace ([comment](https://github.com/pytorch/pytorch/pull/119039#issuecomment-1928595240))
2024-02-06 01:13:28 +00:00
04d52d5399 [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, and almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-05 23:19:01 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
06576d859d Stop running ModuleInfo tests under Dynamo (#117318)
This is a policy decision, similar to the OpInfo one. The problem is
that they just take too long to run when we reset() before and after
each.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117318
Approved by: https://github.com/voznesenskym
2024-01-12 22:17:59 +00:00
d0cf2182ea Fix TransformerEncoderLayer for bias=False (#116760)
Fixes https://github.com/pytorch/pytorch/issues/116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False`

`bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with. It was my bad that this wasn't tested, as I approved https://github.com/pytorch/pytorch/pull/101687.

`bias=False` was causing the `tensor_args` in [`TransformerEncoder`](a17de2d645/torch/nn/modules/transformer.py (L663-L677)) to contain `None`s and error on checks for the fastpath like `t.requires_grad for t in tensor_args`.
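
The failure mode can be reproduced without PyTorch. In this sketch `StubTensor` is a hypothetical stand-in for a tensor, and the filtering variant only illustrates the "filter `tensor_args`" idea; it is not the fix this commit applies (the commit avoids the fastpath when `bias=False`).

```python
class StubTensor:
    """Hypothetical stand-in exposing only the attribute the check reads."""

    def __init__(self, requires_grad: bool):
        self.requires_grad = requires_grad


# The None entry models a missing bias when bias=False.
tensor_args = (StubTensor(False), None, StubTensor(True))


def any_requires_grad(args):
    # Mirrors the shape of the fastpath check; fails on a None entry.
    return any(t.requires_grad for t in args)


def any_requires_grad_skipping_none(args):
    # Illustrative alternative: ignore entries that are not tensors.
    return any(t.requires_grad for t in args if t is not None)


try:
    any_requires_grad(tensor_args)
except AttributeError as exc:
    print("fastpath check failed:", exc)

print(any_requires_grad_skipping_none(tensor_args))  # True
```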

Alternative fixes would be to:
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if these approaches are preferable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116760
Approved by: https://github.com/jbschlosser
2024-01-05 00:13:10 +00:00
3acb7972b0 [BE] Test CrossEntropyLoss for torch.half (#116681)
To test it on MPS and CUDA devices
Also, move some float64 skip-tests for MPS to xfail, same as CPU tests for torch.half
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116681
Approved by: https://github.com/xuzhao9, https://github.com/mikaylagawarecki
2024-01-04 02:16:09 +00:00
ac60a70e06 Migrated loss functions to ModuleInfos (#115584)
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.

**I can split this up if it is too large to review**

What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112)
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128)

#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```

```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```
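
As a quick check of the "~58s" figure, the three timings quoted above can be combined directly. The variable names are made up for this sketch; the numbers are the ones in the logs above.

```python
# Wall-clock times (seconds) copied from the runs quoted above.
before_test_nn = 28.977
after_test_nn = 23.073
after_test_modules = 63.900

# After the migration the Loss tests are split across two files.
increase = (after_test_nn + after_test_modules) - before_test_nn
print(f"{increase:.3f}s")  # 57.996s

# Test counts: 1003 before, 368 + 836 after.
total_after = 368 + 836
print(total_after)  # 1204
```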

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-14 16:21:05 +00:00
626b7dc847 Revert "Migrated loss functions to ModuleInfos (#115584)"
This reverts commit f138b08d2e9c8d676f2a404e97d773f42132b0c7.

Reverted https://github.com/pytorch/pytorch/pull/115584 on behalf of https://github.com/atalman due to OSS CI oncall, breaks slow test ([comment](https://github.com/pytorch/pytorch/pull/115584#issuecomment-1854855080))
2023-12-13 23:34:30 +00:00
f138b08d2e Migrated loss functions to ModuleInfos (#115584)
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.

**I can split this up if it is too large to review**

What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112)
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128)

#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```

```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-12 22:20:20 +00:00
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add Python and C++ support for LPPool3d. Fixes #114114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambda calls in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
7c9052165a add fp16 support for native conv and deconv on CPU (#99497)
### Testing

Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)

Single core:

Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC:   64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC:   128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1,   N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1,   N: 1, L: 64 | 15334.68 | 167.264 | 91.67952

56 cores:

Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC:   64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC:   128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC:   256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1,   N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1,   N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
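
The "Speed up" column appears to be the naïve time divided by the oneDNN time. The sketch below recomputes it for the single-core rows; the tuple layout and the `speedup` helper are assumptions made for this example.

```python
# (naive_us, onednn_us, reported_speedup) for the single-core rows above.
single_core_rows = [
    (34676789, 524199.8, 66.15185),
    (33454125, 349844.4, 95.62573),
    (317650.1, 2317.677, 137.0554),
    (15334.68, 167.264, 91.67952),
]


def speedup(naive_us: float, onednn_us: float) -> float:
    """Ratio of the slower time to the faster time."""
    return naive_us / onednn_us


for naive_us, onednn_us, reported in single_core_rows:
    computed = speedup(naive_us, onednn_us)
    # Allow a small relative difference for rounding in the table.
    matches = abs(computed - reported) / reported < 1e-3
    print(f"{computed:.2f} vs reported {reported} -> {matches}")
```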

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
003c5bb156 Add checks to num_layers for RNN, LSTM, GRU (#108853)
Fixes #108223

As the title shows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
8f02884569 add Half support for GroupNorm on CPU (#100234)
### Testing
Single socket (28 cores):

* Contiguous:

shape | forward / s (fp32) | forward / s (mixed fp32 fp16) | backward / s (fp32) | backward / s (mixed fp32 fp16)
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10,   128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112

* Channels Last:

shape | forward / s (fp32) | forward / s (mixed fp32 fp16) | backward / s (fp32) | backward / s (mixed fp32 fp16)
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10,   128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735

Single core:

* Contiguous:

shape | forward / s (fp32) | forward / s (mixed fp32 fp16) | backward / s (fp32) | backward / s (mixed fp32 fp16)
-- | -- | -- | -- | --
[10,   128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10,   128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369

* Channels Last:

shape | forward / s (fp32) | forward / s (mixed fp32 fp16) | backward / s (fp32) | backward / s (mixed fp32 fp16)
-- | -- | -- | -- | --
[10,   128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10,   128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
584a01b650 Fix LayerNorm(bias=False) error (#108060)
Fixes #108048

- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:23:13 +00:00
3267996372 add channel last 3d support for maxpool3d on CPU (#97775)
### Testing
Single socket (28 cores):

shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms  | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig  | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364

Single core:

shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
3992450e8d Add backward check for test_memory_format (#106104)
Add backward check for test_memory_format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-25 18:11:54 +00:00
3022a395f3 test_memory_format test now passes on rocm (#107696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107696
Approved by: https://github.com/pruthvistony, https://github.com/albanD
2023-08-23 16:39:19 +00:00
71632d4d24 [cpu] add sdpa choice and UT (#105131)
Feature RFC: https://github.com/pytorch/rfcs/pull/56.

Write an SDPA selection function for CPU that automatically chooses one of several SDPA implementations. There are two CPU implementations to choose from: the unfused SDPA and flash attention. In general, flash attention has higher priority than the unfused SDPA. When flash attention is not applicable, for example when it has been manually disabled or the inputs are not 4-dimensional, the unfused SDPA is chosen.
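
The priority rule in the paragraph above can be written as a tiny selector. This is a simplified sketch: `select_sdpa_backend` and its two parameters are hypothetical, and the real selector presumably checks more conditions than the two named here.

```python
def select_sdpa_backend(input_ndim: int, flash_enabled: bool = True) -> str:
    """Prefer flash attention; fall back to the unfused path otherwise."""
    if flash_enabled and input_ndim == 4:
        return "flash_attention"
    # Flash attention was disabled or the inputs are not 4-dimensional.
    return "unfused"


print(select_sdpa_backend(4))                       # flash_attention
print(select_sdpa_backend(4, flash_enabled=False))  # unfused
print(select_sdpa_backend(3))                       # unfused
```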

## Performance of the stack

### NanoGPT's SDPA kernel
Using benchmark [repo](https://github.com/mingfeima/bench_sdpa/blob/main/README.md), with one socket.
Shape: Batch size 1, Sequence length 1024, Head number 25, Head size 64.
Machine: SPR.

| Dtype    | Causal   | Mode      | SDPA            | Time (ms per iter) | Speedup |
| -------- | -------- | -------   | -------         | -------            | ------- |
| float32  | FALSE    | Inference | Unfused         | 3.081              |         |
|          |          |           | Flash attention | 1.665              | **1.85045** |
| float32  | TRUE     | Inference | Unfused         | 3.463              |         |
|          |          |           | Flash attention | 1.662              | **2.083634**|
| bfloat16 | FALSE    | Inference | Unfused         | 1.203              |         |
|          |          |           | Flash attention | 1.154              | **1.042461**|
| bfloat16 | TRUE     | Inference | Unfused         | 1.543              |         |
|          |          |           | Flash attention | 1.154              | **1.337088**|
| float32  | FALSE    | Training  | Unfused         | 54.938             |         |
|          |          |           | Flash attention | 23.029             | **2.385601**|
| float32  | TRUE     | Training  | Unfused         | 58.266             |         |
|          |          |           | Flash attention | 17.835             | **3.266947**|
| bfloat16 | FALSE    | Training  | Unfused         | 18.924             |         |
|          |          |           | Flash attention | 18.886             | **1.002012**|
| bfloat16 | TRUE     | Training  | Unfused         | 21.08              |         |
|          |          |           | Flash attention | 14.172             | **1.48744** |

### Stable Diffusion
Following model's [BKM](https://github.com/intel-innersource/frameworks.ai.models.intel-models/blob/develop/quickstart/diffusion/pytorch/stable_diffusion/inference/cpu/README.md).
Mode: Inference; Machine: SPR.

| Dtype    | SDPA                    | Throughput (fps) | Speedup SDPA | Total Time (ms) | Speedup |
| -------- | --------                | -------          | -------      | -------         | ------- |
| float32  | Unfused                 | 1.63             |              | 1139            |         |
|          | Flash attention         | 1.983            | 1.216564     | 547.488         | **2.080411**|
| bfloat16 | Flash attention in IPEX | 4.784            |              | 429.051         |         |
|          | Flash attention         | 4.857            | 1.015259     | 408.823         | **1.049479**|

### LLM models of Torchbench

Dtype: float32; Mode: Inference, single socket; Machine: CPX.

Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio (old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.048629309 | 0.05591545 | **1.14983024**
hf_Bert | Unfused -> Flash attention | 0.053156243 | 0.060732115 | **1.142520841**
hf_Bert_large | Unfused -> Flash attention | 0.141089502 | 0.155190077 | **1.099940636**
llama | Unfused -> Flash attention | 0.033250106 | 0.033720745 | **1.01415451**

Dtype: bfloat16; Mode: Inference, single socket; Machine: SPR.

Model name | SDPA | Inductor_new | Inductor_old | Inductor Ratio (old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.020681298 | 0.020718282 | **1.001788324**
hf_Bert | Unfused -> Flash attention | 0.019932816 | 0.019935424 | **1.000130842**
hf_Bert_large | Unfused -> Flash attention | 0.047949174 | 0.048312502 | **1.007577355**
llama | Unfused -> Flash attention | 0.018528057 | 0.01861126 | **1.0044907**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105131
Approved by: https://github.com/drisspg
ghstack dependencies: #104583, #104584, #103826, #104693, #104863, #107128
2023-08-20 08:56:21 +00:00
2d2d43d9fb add more check on LSTMCell (#107380)
Just like ``GRUCell`` in #107223, the ``LSTMCell`` operator has the same problems. This PR adds checks and tests to fix them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107380
Approved by: https://github.com/ezyang
2023-08-18 20:44:17 +00:00
02bcaf45f6 Revert "Add backward check for test_memory_format (#106104)"
This reverts commit 2e44adb06608d09a36b899ffdb375cb7d46a78d2.

Reverted https://github.com/pytorch/pytorch/pull/106104 on behalf of https://github.com/huydhn due to Sorry for reverting this but it is failing inductor job in trunk 2e44adb066.  I will add ciflow/inductor label to the PR make sure that the test runs there ([comment](https://github.com/pytorch/pytorch/pull/106104#issuecomment-1683119990))
2023-08-17 23:45:31 +00:00
2e44adb066 Add backward check for test_memory_format (#106104)
Add backward check for test_memory_format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-17 21:19:34 +00:00
a4229690e3 Add Some Checks about dim (#107223)
Fixes #106769

As mentioned in [GRUCell](https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html#grucell), `hidden` should have the same dimension as `input`, and the dimension should be either `1D` or `2D`.

Other aspects are already verified in `C++`, for example that `input` and `hidden` have the same batch size, that dim 1 of `input` matches `input_size`, and that dim 1 of `hidden` matches `hidden_size`.
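
A rough sketch of the dimension rule described above, using plain tuples as shapes. `check_cell_input_dims` and its error messages are hypothetical; they are not the checks or messages added by this PR.

```python
from typing import Optional, Tuple


def check_cell_input_dims(input_shape: Tuple[int, ...],
                          hidden_shape: Optional[Tuple[int, ...]] = None) -> None:
    """Raise ValueError unless input is 1D/2D and hidden has the same rank."""
    if len(input_shape) not in (1, 2):
        raise ValueError(f"expected 1D or 2D input, got {len(input_shape)}D")
    if hidden_shape is not None and len(hidden_shape) != len(input_shape):
        raise ValueError(
            f"hidden is {len(hidden_shape)}D but input is {len(input_shape)}D"
        )


check_cell_input_dims((3, 10), (3, 20))  # batched input and hidden: accepted
check_cell_input_dims((10,), (20,))      # unbatched input and hidden: accepted
try:
    check_cell_input_dims((3, 10), (20,))
except ValueError as exc:
    print("rejected:", exc)
```
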
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107223
Approved by: https://github.com/albanD
2023-08-16 22:03:31 +00:00
1317dbf176 Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)" (#106632)
The previous one was reverted because the PR stacked under it, https://github.com/pytorch/pytorch/pull/106147, which added error-checking to the Pad variants, was reverted: internally, some people pass 2D inputs to ZeroPad2d (which should actually take 3D or 4D inputs). As far as I understand, this PR itself was not breaking anything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
dfcfd5cedb Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)"
This reverts commit 87d253697116eee12d6010233d0a57fd5b152e9e.

Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543))
2023-08-02 14:46:00 +00:00
d83b887f2a Revert "Add error checking for padding modules (#106147)"
This reverts commit 0547b6279d6f7249c0e588508c2561589514d3aa.

Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970))
2023-08-02 09:37:40 +00:00
87d2536971 Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)
Fixes #105749 https://github.com/pytorch/pytorch/issues/95320

(tl;dr: the input should always be `[N, C, H, (W, D)]`, where only the H, W and D dimensions get circular padding. In the 2D case where the user wants both dimensions to be padded, they should `.unsqueeze(0)` (as is the case for `Reflection/ReplicationPad`), but we didn't document this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689)) that was somehow lost.)

Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860

- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this
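
For readers unfamiliar with circular padding, here is a pure-Python sketch of the wrap-around rule along a single dimension. `circular_pad_1d` is a hypothetical helper working on lists, and the restriction that each pad is at most the sequence length is an assumption of this sketch.

```python
def circular_pad_1d(values: list, left: int, right: int) -> list:
    """Pad by wrapping around: the left pad comes from the end, the right pad from the start."""
    n = len(values)
    if not (0 <= left <= n and 0 <= right <= n):
        raise ValueError("each pad must be between 0 and len(values)")
    # values[n - left:] is empty when left == 0, unlike values[-left:].
    return values[n - left:] + values + values[:right]


print(circular_pad_1d([0, 1, 2, 3], 1, 1))  # [3, 0, 1, 2, 3, 0]
print(circular_pad_1d([0, 1, 2, 3], 0, 2))  # [0, 1, 2, 3, 0, 1]
```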

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325, #106147
2023-08-01 12:49:58 +00:00
0547b6279d Add error checking for padding modules (#106147)
Fixes https://github.com/pytorch/pytorch/issues/105627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147
Approved by: https://github.com/albanD
ghstack dependencies: #106325
2023-08-01 12:49:58 +00:00
c9be60cd0e Add error inputs to ModuleInfo (mirroring OpInfo) (#106325)
Add infra for error inputs to ModuleInfos, migrate first few error inputs tests from test_nn.py (more to come!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325
Approved by: https://github.com/albanD
2023-08-01 12:49:56 +00:00
e18d53e2df Added ModuleInfo test for meta device ctx init (#105871)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105871
Approved by: https://github.com/albanD
2023-07-26 01:57:54 +00:00
be03a56955 [BE] Enable ruff's UP rules and autoformat testing/ (#105425)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105425
Approved by: https://github.com/malfet
2023-07-18 21:04:39 +00:00
a66f08d626 enable channels last for replication padding on CPU (#102597)
Enable channels last support for replication padding on CPU. This patch adds channels last support for ReplicationPad2d/3d on the CPU backend. The following test cases will pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad3d_cpu_float32
```

The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.

### single core inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) , NHWC: 0.339 ms
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) , NHWC: 82.935 ms

(after)
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.324 ms
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) ,  NHWC: 16.717 ms
```

### single socket inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) , NHWC: 0.135 ms
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) , NHWC: 7.203 ms

(after)
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.029 ms
ReplicationPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) ,  NHWC: 3.174 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102597
Approved by: https://github.com/CaoE, https://github.com/cpuhrsch
2023-07-14 03:44:55 +00:00
f73757d551 enable channels last for reflection padding on CPU (#102518)
Add channels last support for reflection padding on CPU. The following test cases will pass with this patch:
```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```

The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.

### single core inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) ,  NHWC: 86.821 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) ,  NHWC: 16.806 ms
```

### single socket inference
```
(before)
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) ,  NHWC: 7.367 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([1, 3, 224, 224]) ,  NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size:  torch.Size([128, 64, 56, 56]) , NHWC: 3.181 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102518
Approved by: https://github.com/CaoE, https://github.com/cpuhrsch
2023-07-13 16:22:31 +00:00
86e0eda18d Add partial derivative unit tests (#103809)
Adds the unit tests requested in #95810

This PR also addresses a gap in unit testing of gradients, as `gradcheck` always performs total derivatives w.r.t. all arguments and module parameters. Some modules have different code paths for partial derivatives, e.g. `LayerNorm`, and those should be tested separately.

The PR has the following limitations:
- it does not test partial derivatives w.r.t. every combination of arguments, which would exponentially increase CI time.
- it does not implement the same logic for Hessians, where the increase in CI time would be quadratic in the number of arguments.
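
To make "partial derivative w.r.t. one argument" concrete, here is a small numerical sketch using central differences. It is not how `gradcheck` or the new tests are implemented; `partial_derivative` and the example function `f` are made up for illustration.

```python
def partial_derivative(f, args, index, eps=1e-6):
    """Central-difference estimate of df/d(args[index]), other arguments held fixed."""
    plus = list(args)
    minus = list(args)
    plus[index] += eps
    minus[index] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)


def f(x, w):
    # Example function: df/dx = w and df/dw = x + 2*w.
    return x * w + w ** 2


point = (3.0, 2.0)
d_dx = partial_derivative(f, point, 0)  # close to 2.0
d_dw = partial_derivative(f, point, 1)  # close to 7.0
print(round(d_dx, 4), round(d_dw, 4))
```

Differentiating with respect to one argument at a time keeps the number of checks linear in the number of arguments, whereas covering every non-empty subset of n arguments would need 2**n - 1 checks.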

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103809
Approved by: https://github.com/kit1980
2023-06-25 00:36:10 +00:00
cecfcf1e17 [MPS] Handle MPS failures of test_modules.py in common_modules.py (#95334)
- Also cleaned up `test_modules.py` from skipMPS code.
- Added `skipMPS` for unsupported or failing tests on MPS backend in common_modules.py.
   (We'll remove `skipMPS` from those tests once a fix is available for them.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-05-09 03:55:16 +00:00
2c6c7deeb3 Added ModuleInfos for Pooling ops (#98358)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98358
Approved by: https://github.com/albanD
2023-04-05 19:39:07 +00:00
3a0ad3c194 [easy] Remove large LayerNorm sample input causing OOM from ModuleInfo (#98424)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98424
Approved by: https://github.com/huydhn, https://github.com/albanD
2023-04-05 19:38:15 +00:00
96ad739ddc Added ModuleInfos for {*}Norm modules (#97919)
Not adding Lazy variants yet pending investigation of #97915

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97919
Approved by: https://github.com/albanD
2023-04-04 01:15:25 +00:00
6871665a97 Avoid copies in matmul (no ghstack) (#97355)
Resubmit of https://github.com/pytorch/pytorch/pull/76828 without using ghstack so that @ngimel can import it and help me debug why it was reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97355
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-29 06:54:09 +00:00
1a2dcff127 Added ModuleInfos for remaining activation functions (#97704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97704
Approved by: https://github.com/albanD
2023-03-28 17:11:41 +00:00
a283c15e34 Added ModuleInfos for {*}LU modules (#97375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97375
Approved by: https://github.com/albanD, https://github.com/jbschlosser
2023-03-28 00:36:31 +00:00
236bac811a Add ModuleInfos for Adaptive{Max/Avg}Pool ops (#97291)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97291
Approved by: https://github.com/albanD
2023-03-27 19:45:37 +00:00
0b094ca37f Add gradcheck_nondet_tol to a few padding moduleinfos (#97265)
Fixes #96739, see https://github.com/pytorch/pytorch/issues/96739#issuecomment-1478327704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97265
Approved by: https://github.com/albanD
2023-03-21 23:46:28 +00:00
152c1529ca Add tests for all padding layers to module_db in common_modules.py (#96641)
Adding the PR discussed in #96295.

- Adds tests for all current padding layers to `module_db` in `torch/testing/_internal/common_modules.py` ( `nn.ReflectionPad`, `nn.ReplicationPad`, `nn.ZeroPad`, `nn.ConstantPad` ) for 1D, 2D, and 3D variants.
- Removes tests for the same padding layers from `torch/testing/_internal/common_nn.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96641
Approved by: https://github.com/albanD
2023-03-14 17:42:10 +00:00
8c8148c887 Revert D43643526: Multisect successfully blamed D43643526 for test or build failures (#96126)
Summary:
This diff is reverting D43643526
Depends on D43693521
D43643526: Avoid copies in matmul (#76828) by generatedunixname499836121 has been identified to be causing the following test or build failures:

Tests affected:
- [mle/favour:tests - favour_test.py::TestLinears::test_psd](https://www.internalfb.com/intern/test/562950027104300/)

Here's the Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/1611690
Here are the tasks that are relevant to this breakage:
T146911536: 5 tests started failing for oncall prob in the last 2 weeks
We're generating a revert to back out the changes in this diff, please note the backout may land if someone accepts it.

Test Plan: NA

Differential Revision: D43693526

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96126
Approved by: https://github.com/weiwangmeta
2023-03-06 22:30:07 +00:00