Commit Graph

132 Commits

Author SHA1 Message Date
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
db9c6aeec6 Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit 0a9c6e92f8d1a35f33042c8dab39f23b7f39d6e7.

Enable the test since it's fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594
Approved by: https://github.com/huydhn
ghstack dependencies: #126593
2024-05-25 01:27:02 +00:00
df4b7cb5f7 Reapply "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit ce6e36bf8b524c3f4b07605c5b3af2b7d5ba8fd9.

Reverted https://github.com/pytorch/pytorch/pull/126594 on behalf of https://github.com/clee2000 due to broke tests on inductor? test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CTCLoss_cuda_float64 43f2f43eb3 https://github.com/pytorch/pytorch/actions/runs/9200644034/job/25308511495 ([comment](https://github.com/pytorch/pytorch/pull/126586#issuecomment-2126228689))
2024-05-23 04:54:28 +00:00
ce6e36bf8b Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit 0a9c6e92f8d1a35f33042c8dab39f23b7f39d6e7.

Enable the test since it's fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594
Approved by: https://github.com/huydhn
ghstack dependencies: #126586, #126593
2024-05-22 22:43:09 +00:00
0a9c6e92f8 Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)
Skipping the test in the context of https://github.com/pytorch/pytorch/issues/125967 until the issue is root caused and fixed properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125970
Approved by: https://github.com/clee2000
2024-05-11 04:11:18 +00:00
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with `operator.itemgetter`, which is more efficient (it's a builtin function). This is particularly useful when lambdas are used as 'key' functions.
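
For illustration, a minimal sketch of the rewrite this rule performs (the example data is hypothetical):

```
from operator import itemgetter

pairs = [("b", 2), ("a", 1)]

# Before: a lambda as the key function (a Python-level call per element)
sorted(pairs, key=lambda p: p[0])

# After: operator.itemgetter, a builtin that avoids the lambda call overhead
sorted(pairs, key=itemgetter(0))
```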

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
6fcbeb3489 [ATen] Add CPU fp16 support for nll_loss and cross_entropy_loss (#123256)
Add CPU FP16 support for nll_loss and cross_entropy_loss.
Resolves issue #123328.
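
A usage sketch of the added support, with illustrative shapes:

```
import torch
import torch.nn.functional as F

# fp16 logits on CPU now work with cross_entropy (previously unsupported)
logits = torch.randn(8, 10, dtype=torch.float16)
target = torch.randint(0, 10, (8,))
loss = F.cross_entropy(logits, target)
```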

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123256
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-04-18 11:44:38 +00:00
487b6d40ec Add RMSNorm module (#121364)
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)

**The implementation here is not optimized and we welcome pull requests to improve this**

- Use `normalized_shape` instead of a singular integer `dim` to align with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))
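
A usage sketch of the new module, with illustrative sizes:

```
import torch
from torch import nn

# normalized_shape mirrors nn.LayerNorm; here we normalize over the last dim
rms = nn.RMSNorm(64)
x = torch.randn(2, 8, 64)
y = rms(x)
```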

Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-29 18:05:28 +00:00
8698121636 Revert "Add RMSNorm module (#121364)"
This reverts commit a7306de0dc96cda8b698d19680a88d27aa45a31d.

Reverted https://github.com/pytorch/pytorch/pull/121364 on behalf of https://github.com/atalman due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/121364#issuecomment-2025502007))
2024-03-28 15:31:10 +00:00
cc12668053 Fix swap_tensors path in _apply for modules that inherit from RNNBase (RNN, GRU, LSTM) (#122800)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122800
Approved by: https://github.com/albanD
2024-03-27 23:34:16 +00:00
a7306de0dc Add RMSNorm module (#121364)
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)

**The implementation here is not optimized and we welcome pull requests to improve this**

- Use `normalized_shape` instead of a singular integer `dim` to align with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-27 21:39:30 +00:00
d621e3e3b8 Add exhaustive module and optimizer tests for torch.load(state_dict, weights_only=True) (#121049)
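
A minimal sketch of the path being tested (the file name is illustrative):

```
import torch
from torch import nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "sd.pt")

# weights_only=True restricts unpickling to tensors and other safe types
state_dict = torch.load("sd.pt", weights_only=True)
model.load_state_dict(state_dict)
```
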
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121049
Approved by: https://github.com/janeyx99
2024-03-05 14:27:50 +00:00
bfa71b523d add complex32 to v3_dtypes (#120388)
Fixes [#120290](https://github.com/pytorch/pytorch/issues/120290)
Fixes https://github.com/pytorch/pytorch/issues/73502

Use `v3_dtypes` and `torch._utils._rebuild_tensor_v3` to handle `torch.save` of complex32 tensors.
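
A round-trip sketch of the fix (the file name is illustrative):

```
import torch

t = torch.tensor([1 + 2j, 3 - 4j], dtype=torch.complex32)
torch.save(t, "chalf.pt")  # previously raised for complex32

loaded = torch.load("chalf.pt")
assert loaded.dtype == torch.complex32
```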

result:
![image](https://github.com/pytorch/pytorch/assets/37650440/18b6cbb3-fb3f-4855-9d48-374014647988)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120388
Approved by: https://github.com/albanD
2024-02-28 02:32:29 +00:00
677e67c399 Update nn.Module._apply to not gate on should_use_set_data when swap_tensors is set (#120659)
This updates the nesting of if statements in `nn.Module._apply` such that if `torch.__future__.set_swap_module_params_on_conversion(True)` has been called, we always try to swap, regardless of whether
- `torch._has_compatible_shallow_copy_type(param, fn(param))` holds
- `torch.__future__.set_overwrite_module_params_on_conversion` is set

This means that `meta_module.to_empty(device=...)` can now use the swap_tensors path. cc @awgu
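
A sketch of the now-enabled path, assuming a meta-device module:

```
import torch
from torch import nn

torch.__future__.set_swap_module_params_on_conversion(True)

with torch.device("meta"):
    m = nn.Linear(8, 8)

# Materialize the meta module; this conversion can now swap the tensors
m = m.to_empty(device="cpu")
```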

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120659
Approved by: https://github.com/albanD
2024-02-28 00:59:34 +00:00
b3df3e4e94 Restore OpInfo/ModuleInfo tests in Inductor-wrapped tests (#119693)
I accidentally disabled this without realizing it. It turns out that
PYTORCH_TEST_WITH_INDUCTOR=1 implies PYTORCH_TEST_WITH_DYNAMO=1, which
activates skipIfTorchDynamo decorators.

Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119693
Approved by: https://github.com/bdhirsh
2024-02-12 22:44:45 +00:00
2c91e13afc Add lowerings to special functions (#119187)
As in the title.

In addition, the PR introduces infrastructure for lowerings of pointwise functions that have both cpp and triton implementations available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119187
Approved by: https://github.com/peterbell10
2024-02-11 16:35:40 +00:00
db1a4dcb5a [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-08 20:35:32 +00:00
d5a718d27b Add swap_tensors path to nn.Module._apply (#117167)
Added `torch.__future__.{get/set}_swap_module_params_on_conversion`, which defaults to `False` for now; we probably want to modify `nn.Module._apply` to override this and default to `True` when the input is a tensor subclass.

From offline discussion, for now we are **not** allowing `swap_tensor` after the first module forward has been run*** if the autograd graph is still alive. The reason being that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1.  The first forward pass will install `AccumulateGrad` nodes on each param, which [bump the refcount of the associated TensorImpl](6cf1fc66e3/torch/csrc/autograd/variable.cpp (L307)). **Future work might be to swap the refs that the `AccumulateGrad` nodes hold if it is necessary.**

***From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge case that the grads are set via `p.grad = grad` OR the autograd graph is no longer alive because the output has been garbage collected.

If any `swap_tensors` fails on any of the parameters in the `nn.Module` we raise an error.

**`RNNBase` overrides `nn.Module._apply()` and installs weakrefs on some parameters. As a result, all modules that inherit from `RNNBase` (`RNN`, `GRU` and `LSTM`) cannot use the`swap_tensors` path as of now**
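
For reference, a minimal sketch of the underlying primitive:

```
import torch

t1 = torch.ones(2)
t2 = torch.zeros(2)

# Exchanges the TensorImpls behind t1 and t2; requires both use_counts to be 1
torch.utils.swap_tensors(t1, t2)
print(t1)  # tensor([0., 0.])
```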

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117167
Approved by: https://github.com/albanD
ghstack dependencies: #118028
2024-02-07 18:55:44 +00:00
c0164f2393 Revert "[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)"
This reverts commit 04d52d5399ad4abb8af9e8405be79e2a7f8b4c7a.

Reverted https://github.com/pytorch/pytorch/pull/119039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing MPS test in trunk 04d52d5399,  may be a landrace ([comment](https://github.com/pytorch/pytorch/pull/119039#issuecomment-1928595240))
2024-02-06 01:13:28 +00:00
04d52d5399 [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-05 23:19:01 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

# Globs that were previously excluded from mypy's file list
exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)  # rewind so the directive is prepended
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
06576d859d Stop running ModuleInfo tests under Dynamo (#117318)
This is a policy decision, similar to the OpInfo one. The problem is
that they just take too long to run when we reset() before and after
each test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117318
Approved by: https://github.com/voznesenskym
2024-01-12 22:17:59 +00:00
d0cf2182ea Fix TransformerEncoderLayer for bias=False (#116760)
Fixes https://github.com/pytorch/pytorch/issues/116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False`

`bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with; it was my bad that this wasn't tested when I approved https://github.com/pytorch/pytorch/pull/101687.

`bias=False` was causing the `tensor_args` in [`TransformerEncoder`](a17de2d645/torch/nn/modules/transformer.py (L663-L677)) to contain `None`s, which errored on fastpath checks like `t.requires_grad for t in tensor_args`.

An alternative fix would be to either:
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if these approaches are preferable
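
A usage sketch of the fixed configuration (sizes are illustrative):

```
import torch
from torch import nn

# bias=False now skips torch._transformer_encoder_layer_fwd instead of erroring
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, bias=False, batch_first=True)
x = torch.randn(2, 10, 64)
out = layer(x)
```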

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116760
Approved by: https://github.com/jbschlosser
2024-01-05 00:13:10 +00:00
3acb7972b0 [BE] Test CrossEntropyLoss for torch.half (#116681)
To test it on MPS and CUDA devices
Also, move some float64 skip-tests for MPS to xfail, same as CPU tests for torch.half
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116681
Approved by: https://github.com/xuzhao9, https://github.com/mikaylagawarecki
2024-01-04 02:16:09 +00:00
ac60a70e06 Migrated loss functions to ModuleInfos (#115584)
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.

**I can split this up if it is too large to review**

What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112)
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128)

#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```

```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-14 16:21:05 +00:00
626b7dc847 Revert "Migrated loss functions to ModuleInfos (#115584)"
This reverts commit f138b08d2e9c8d676f2a404e97d773f42132b0c7.

Reverted https://github.com/pytorch/pytorch/pull/115584 on behalf of https://github.com/atalman due to OSS CI oncall, breaks slow test ([comment](https://github.com/pytorch/pytorch/pull/115584#issuecomment-1854855080))
2023-12-13 23:34:30 +00:00
f138b08d2e Migrated loss functions to ModuleInfos (#115584)
Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos.

**I can split this up if it is too large to review**

What this PR does not include:
- [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112)
- [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128)

#### On test times
This PR increases test time by ~58s locally
Before this PR:
```
>>> python test/test_nn.py -k Loss
Ran 1003 tests in 28.977s
```
After this PR
```
>>> python test/test_nn.py -k Loss
Ran 368 tests in 23.073s
```

```
>>> python test/test_modules.py -k Loss
Ran 836 tests in 63.900s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584
Approved by: https://github.com/janeyx99
ghstack dependencies: #115617
2023-12-12 22:20:20 +00:00
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add Python and C++ support for LPPool3d. Fixes #114114.
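
A usage sketch of the new module (sizes are illustrative):

```
import torch
from torch import nn

# Power-average pooling over a 3D volume
pool = nn.LPPool3d(norm_type=2, kernel_size=3, stride=2)
x = torch.randn(1, 4, 8, 8, 8)
y = pool(x)
```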

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambda calls in Python. The rule is in preview, so it is not yet ready to be enabled by default. These are the autofixes from the rule.
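
A hypothetical example of the pattern the rule autofixes:

```
ids = ["3", "1", "2"]

# Before: the lambda only forwards its argument
sorted(ids, key=lambda s: int(s))

# After: pass the callable directly
sorted(ids, key=int)
```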

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
7c9052165a add fp16 support for native conv and deconv on CPU (#99497)
### Testing

Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)

Single core:

Input | Naïve impl   / us | oneDNN /   us | Speed up
-- | -- | -- | --
IC:   64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC:   128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1,   N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1,   N: 1, L: 64 | 15334.68 | 167.264 | 91.67952

56 cores:
Input | Naïve impl   / us | oneDNN /   us | Speed up
-- | -- | -- | --
IC:   64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC:   128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC:   256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1,   N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1,   N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027
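
A usage sketch of the added support, with a shape adapted from the tables above (batch size reduced):

```
import torch
from torch import nn

# Native fp16 convolution on CPU
conv = nn.Conv2d(64, 256, kernel_size=1).to(torch.float16)
x = torch.randn(1, 64, 56, 56, dtype=torch.float16)
y = conv(x)
```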

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
003c5bb156 Add checks to num_layers for RNN, LSTM, GRU (#108853)
Fixes #108223

As the title says.
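
A sketch of the added validation; the exact exception type and message are assumptions:

```
import torch
from torch import nn

# num_layers must be positive; this now fails fast with a clear error
try:
    nn.LSTM(input_size=10, hidden_size=20, num_layers=0)
except (ValueError, RuntimeError) as e:
    print(e)
```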

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
8f02884569 add Half support for GroupNorm on CPU (#100234)
### Testing
Single socket (28cores):

* Contiguous:

shape | fp32 forward / s | mixed fp32/fp16 forward / s | fp32 backward / s | mixed fp32/fp16 backward / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10,   128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112

* Channels Last:

shape | fp32 forward / s | mixed fp32/fp16 forward / s | fp32 backward / s | mixed fp32/fp16 backward / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10,   128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735

Single core:

* Contiguous:

shape | fp32 forward / s | mixed fp32/fp16 forward / s | fp32 backward / s | mixed fp32/fp16 backward / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10,   128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369

* Channels Last:

shape | fp32 forward / s | mixed fp32/fp16 forward / s | fp32 backward / s | mixed fp32/fp16 backward / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10,   128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03
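
A sketch of the mixed-precision path benchmarked above, assuming fp16 input with fp32 parameters is accepted after this change:

```
import torch
from torch import nn

gn = nn.GroupNorm(num_groups=8, num_channels=128)  # fp32 weight/bias
x = torch.randn(10, 128, 10, 10, dtype=torch.float16)
y = gn(x)  # mixed fp32/fp16 on CPU
```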

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
584a01b650 Fix LayerNorm(bias=False) error (#108060)
Fixes #108048
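
A sketch of the construction that previously errored (size is illustrative):

```
import torch
from torch import nn

ln = nn.LayerNorm(64, bias=False)  # no learnable bias
y = ln(torch.randn(2, 64))
```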

- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:23:13 +00:00
3267996372 add channel last 3d support for maxpool3d on CPU (#97775)
### Testing
Single socket (28 cores):

shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms  | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig  | 10.63426 | 15.28637 | 2.67656 | 1.71365
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364

Single core:

shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms
-- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229
size: (4, 19, 10, 16, 16), kernel:   3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361
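
A usage sketch with a shape from the tables above:

```
import torch
import torch.nn.functional as F

# Channels-last 3D memory format on CPU
x = torch.randn(4, 19, 10, 16, 16).to(memory_format=torch.channels_last_3d)
y = F.max_pool3d(x, kernel_size=3, stride=1)
```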

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-08-26 00:21:27 +00:00
3992450e8d Add backward check for test_memory_format (#106104)
Add backward check for test_memory_format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-25 18:11:54 +00:00
3022a395f3 test_memory_format test now passes on rocm (#107696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107696
Approved by: https://github.com/pruthvistony, https://github.com/albanD
2023-08-23 16:39:19 +00:00
71632d4d24 [cpu] add sdpa choice and UT (#105131)
Feature RFC: https://github.com/pytorch/rfcs/pull/56.

Write an SDPA selection function for CPU that automatically chooses one SDPA implementation among several. There are two CPU implementations to choose from: the unfused SDPA and flash attention. In general, flash attention has higher priority than the unfused SDPA. For cases where flash attention is not applicable, such as when flash attention is manually disabled or the inputs are not 4-dimensional, the unfused SDPA is chosen.
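
A sketch of the user-facing entry point, using the NanoGPT benchmark shape listed below:

```
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) = (1, 25, 1024, 64); the CPU selection
# logic picks flash attention here when it is applicable
q = k = v = torch.randn(1, 25, 1024, 64)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```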

## Performance of the stack

### NanoGPT's SDPA kernel
Using benchmark [repo](https://github.com/mingfeima/bench_sdpa/blob/main/README.md), with one socket.
Shape: Batch size 1, Sequence length 1024, Head number 25, Head size 64.
Machine: SPR.

| Dtype    | Causal   | Mode      | SDPA            | Time (ms per iter) | Speedup |
| -------- | -------- | -------   | -------         | -------            | ------- |
| float32  | FALSE    | Inference | Unfused         | 3.081              |         |
|          |          |           | Flash attention | 1.665              | **1.85045** |
| float32  | TRUE     | Inference | Unfused         | 3.463              |         |
|          |          |           | Flash attention | 1.662              | **2.083634**|
| bfloat16 | FALSE    | Inference | Unfused         | 1.203              |         |
|          |          |           | Flash attention | 1.154              | **1.042461**|
| bfloat16 | TRUE     | Inference | Unfused         | 1.543              |         |
|          |          |           | Flash attention | 1.154              | **1.337088**|
| float32  | FALSE    | Training  | Unfused         | 54.938             |         |
|          |          |           | Flash attention | 23.029             | **2.385601**|
| float32  | TRUE     | Training  | Unfused         | 58.266             |         |
|          |          |           | Flash attention | 17.835             | **3.266947**|
| bfloat16 | FALSE    | Training  | Unfused         | 18.924             |         |
|          |          |           | Flash attention | 18.886             | **1.002012**|
| bfloat16 | TRUE     | Training  | Unfused         | 21.08              |         |
|          |          |           | Flash attention | 14.172             | **1.48744** |

### Stable Diffusion
Following model's [BKM](https://github.com/intel-innersource/frameworks.ai.models.intel-models/blob/develop/quickstart/diffusion/pytorch/stable_diffusion/inference/cpu/README.md).
Mode: Inference; Machine: SPR.

| Dtype    | SDPA                    | Throughput (fps) | Speedup SDPA | Total Time (ms) | Speedup |
| -------- | --------                | -------          | -------      | -------         | ------- |
| float32  | Unfused                 | 1.63             |              | 1139            |         |
|          | Flash attention         | 1.983            | 1.216564     | 547.488         | **2.080411**|
| bfloat16 | Flash attention in IPEX | 4.784            |              | 429.051         |         |
|          | Flash attention         | 4.857            | 1.015259     | 408.823         | **1.049479**|

### LLM models of Torchbench

Dtype: float32; Mode: Inference, single socket; Machine: CPX.
Model   name | SDPA | Inductor_new | Inductor_old | Inductor   Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.048629309 | 0.05591545 | **1.14983024**
hf_Bert | Unfused -> Flash attention | 0.053156243 | 0.060732115 | **1.142520841**
hf_Bert_large | Unfused -> Flash attention | 0.141089502 | 0.155190077 | **1.099940636**
llama | Unfused -> Flash attention | 0.033250106 | 0.033720745 | **1.01415451**

Dtype: bfloat16; Mode: Inference, single socket; Machine: SPR.
Model   name | SDPA | Inductor_new | Inductor_old | Inductor   Ratio(old/new)
-- | -- | -- | -- | --
hf_Albert | Unfused -> Flash attention | 0.020681298 | 0.020718282 | **1.001788324**
hf_Bert | Unfused -> Flash attention | 0.019932816 | 0.019935424 | **1.000130842**
hf_Bert_large | Unfused -> Flash attention | 0.047949174 | 0.048312502 | **1.007577355**
llama | Unfused -> Flash attention | 0.018528057 | 0.01861126 | **1.0044907**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105131
Approved by: https://github.com/drisspg
ghstack dependencies: #104583, #104584, #103826, #104693, #104863, #107128
2023-08-20 08:56:21 +00:00
2d2d43d9fb add more check on LSTMCell (#107380)
Just like #107223, the ``LSTMCell`` operator has the same problems as ``GRUCell``; this adds the corresponding checks and tests to fix them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107380
Approved by: https://github.com/ezyang
2023-08-18 20:44:17 +00:00
02bcaf45f6 Revert "Add backward check for test_memory_format (#106104)"
This reverts commit 2e44adb06608d09a36b899ffdb375cb7d46a78d2.

Reverted https://github.com/pytorch/pytorch/pull/106104 on behalf of https://github.com/huydhn due to Sorry for reverting this but it is failing inductor job in trunk 2e44adb066.  I will add ciflow/inductor label to the PR make sure that the test runs there ([comment](https://github.com/pytorch/pytorch/pull/106104#issuecomment-1683119990))
2023-08-17 23:45:31 +00:00
2e44adb066 Add backward check for test_memory_format (#106104)
Add backward check for test_memory_format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106104
Approved by: https://github.com/mikaylagawarecki
2023-08-17 21:19:34 +00:00
a4229690e3 Add Some Checks about dim (#107223)
Fixes #106769

As mentioned in [GRUCell](https://pytorch.org/docs/stable/generated/torch.nn.GRUCell.html#grucell), `hidden` should have the same dimension as `input`, and the dimension should be either `1D` or `2D`.

Other aspects have already been verified in `C++`: that the batch sizes of `input` and `hidden` match, that `input`'s dim 1 matches `input_size`, that `hidden`'s dim 1 matches `hidden_size`, etc.
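
A sketch of the checked invariants (sizes are illustrative):

```
import torch
from torch import nn

cell = nn.GRUCell(input_size=10, hidden_size=20)
x = torch.randn(3, 10)   # 2D input: (batch, input_size)
hx = torch.randn(3, 20)  # hidden must have the same dimensionality as input
h = cell(x, hx)
# Mixing a 1D input with a 2D hidden (or vice versa) is now rejected
```
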
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107223
Approved by: https://github.com/albanD
2023-08-16 22:03:31 +00:00
1317dbf176 Reland "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)" (#106632)
The previous one was reverted because the PR stacked under it, https://github.com/pytorch/pytorch/pull/106147 (which added error-checking to the Pad variants), was reverted: internally, some people pass 2D inputs to ZeroPad2d (which should actually take 3D or 4D inputs :)). As far as I understand, there wasn't actually anything this PR itself was breaking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106632
Approved by: https://github.com/albanD
2023-08-07 20:10:25 +00:00
dfcfd5cedb Revert "Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)"
This reverts commit 87d253697116eee12d6010233d0a57fd5b152e9e.

Reverted https://github.com/pytorch/pytorch/pull/106148 on behalf of https://github.com/malfet due to Reverting as dependent PR https://github.com/pytorch/pytorch/pull/106147 was reverted as well ([comment](https://github.com/pytorch/pytorch/pull/106148#issuecomment-1662344543))
2023-08-02 14:46:00 +00:00
d83b887f2a Revert "Add error checking for padding modules (#106147)"
This reverts commit 0547b6279d6f7249c0e588508c2561589514d3aa.

Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970))
2023-08-02 09:37:40 +00:00
87d2536971 Add nn.CircularPad{*}d for consistency + fix no_batch_dim support (#106148)
Fixes #105749 and https://github.com/pytorch/pytorch/issues/95320

(tl;dr: input should always be `(N, C, H[, W, D])`, where only the H, W, and D dimensions get circular padding, so in the 2D case where the user wants both dimensions padded, they should `.unsqueeze(0)` first (as is the case for `Reflection/ReplicationPad`), but we didn't document this for circular padding. [This seems to be the old docstring](277b05014a/torch/nn/functional.py (L4689)) that was somehow lost.)
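
A usage sketch of the new module and the no-batch-dim behavior:

```
import torch
from torch import nn

pad = nn.CircularPad2d(1)
y = pad(torch.randn(3, 8, 8))  # no-batch-dim input: (C, H, W)

# For a bare 2D tensor, add the channel dim first,
# as with Reflection/ReplicationPad:
y2 = pad(torch.randn(8, 8).unsqueeze(0))
```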

Fixes no_batch_dim support https://github.com/pytorch/pytorch/issues/104860

- Adds missing documentation for circular padding
- Adds missing CircularPad modules
- Migrates legacy test_nn tests from circular padding to ModuleInfo
- Adds no_batch_dim support + sample inputs that test this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106148
Approved by: https://github.com/albanD
ghstack dependencies: #106325, #106147
2023-08-01 12:49:58 +00:00
0547b6279d Add error checking for padding modules (#106147)
Fixes https://github.com/pytorch/pytorch/issues/105627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147
Approved by: https://github.com/albanD
ghstack dependencies: #106325
2023-08-01 12:49:58 +00:00