Commit Graph

155 Commits

Author SHA1 Message Date
0f47e76937 [MPS] Implement hardshrink metal kernel (#155304)
Implements the forward and backward hardshrink operators as Metal kernels.
In order to support the lambda parameter, we extend the `exec_unary_kernel` and `exec_binary_kernel` methods. They now take an optional Scalar and an optional ScalarType argument. When the optional ScalarType is provided, it overrides the type of the Scalar.
We add a new `REGISTER_UNARY_ALPHA_OP` macro, and modify the existing `REGISTER_BINARY_ALPHA_OP` to support the new feature.
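
For reference, a minimal Python-level exercise of the op pair these kernels back (this is the public `torch.nn.functional.hardshrink` API, not the new Metal internals):
```python
import torch
import torch.nn.functional as F

x = torch.linspace(-1.0, 1.0, steps=5, requires_grad=True)
y = F.hardshrink(x, lambd=0.5)   # zeroes out elements with |x| <= lambd (forward kernel)
y.sum().backward()               # exercises the backward kernel as well
print(y, x.grad)
# On an MPS build, moving x to 'mps' would dispatch to the Metal kernels instead.
```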

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155304
Approved by: https://github.com/malfet
2025-06-10 18:20:27 +00:00
e7698ff5cf [MPS] Move abs op to Metal (#155474)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155474
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-06-10 00:23:59 +00:00
cyy
970fefcc53 Remove outdated skipCUDAIfCudnnVersionLessThan decoration (#148940)
Test conditions for CUDNN 7 and 8 were removed because we have moved to CUDNN 9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148940
Approved by: https://github.com/mikaylagawarecki
2025-03-13 18:02:50 +00:00
8f71d4563e Fix rms_norm in fp16/bf16 (#147203)
Fixes #134106. This PR moves the `upcasted_result` down-cast until after all computation is done.

Since the multiplication with the `weight_opt` input is not done in half precision, the current code path performs fp16 -> fp32 -> fp16 -> fp32 -> fp16. What we want is to avoid the intermediate down-casts, so this PR changes the path to fp16 -> fp32 -> fp16. This results in better accuracy as it avoids truncation.
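
A rough sketch of the intended casting order (a hand-written reference, not the actual kernel code): keep everything in fp32, including the weight multiply, and down-cast once at the end.
```python
import torch

def rms_norm_ref(x, weight, eps=1e-6):
    upcast = x.float()                                    # fp16 -> fp32
    inv_rms = upcast.pow(2).mean(-1, keepdim=True).add(eps).rsqrt()
    out = upcast * inv_rms * weight.float()               # stay in fp32 for the weight multiply
    return out.to(x.dtype)                                # single fp32 -> fp16 down-cast at the end

x = torch.randn(2, 8, dtype=torch.float16)
w = torch.ones(8, dtype=torch.float16)
print(rms_norm_ref(x, w).dtype)  # torch.float16
```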
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147203
Approved by: https://github.com/eqy
2025-03-08 04:43:18 +00:00
edd640a95a [BE][Ez]: Use itertools.chain.from_iterable when possible (#148190)
Often makes the code more readable, more efficient, and adds support for infinite iterables.
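
Illustrative example of the pattern being adopted (generic, not a specific call site from this PR):
```python
import itertools

nested = [[1, 2], [3], [4, 5, 6]]
flat = list(itertools.chain.from_iterable(nested))   # [1, 2, 3, 4, 5, 6]
# Unlike chain(*nested), from_iterable consumes the outer iterable lazily,
# so it also works when the outer iterable is infinite.
```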

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148190
Approved by: https://github.com/jansel, https://github.com/malfet
2025-03-06 20:37:06 +00:00
c219c5ca38 Fix code descriptions in the test package. (#148145)
The parameter and function descriptions contained errors; this corrects them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148145
Approved by: https://github.com/janeyx99
2025-03-04 19:14:41 +00:00
c7ca1df37e Disable slow gradcheck for nn.Transformer ModuleInfo (#145531)
Fixes https://github.com/pytorch/pytorch/issues/117140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145531
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #145520
2025-01-25 00:58:03 +00:00
dea7ad3371 PEP585 update - torch/testing (#145200)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145200
Approved by: https://github.com/bobrenjc93
2025-01-20 22:42:42 +00:00
3b6b306b71 Migrate from Tuple -> tuple in torch/testing (#144256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144256
Approved by: https://github.com/aorenste
2025-01-10 06:37:55 +00:00
2d52f7946b [BE] Use torch.log1p(x) instead of torch.log(1+x) (#141167)
To fix TOR107 linter violations, found while trying to migrate PyTorch to the latest torchfix.
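
Why the linter cares, in a nutshell (illustrative, not code from this PR): for small x, `1 + x` rounds away the small value before the log is taken.
```python
import torch

x = torch.tensor([1e-8], dtype=torch.float32)
print(torch.log(1 + x))   # tensor([0.]) -- 1 + 1e-8 rounds to 1.0 in float32
print(torch.log1p(x))     # ~1e-8, computed without forming 1 + x first
```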
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141167
Approved by: https://github.com/kit1980, https://github.com/Skylion007
2024-11-21 00:36:20 +00:00
9c88b08ac9 [BE] Replace skipIfMPS with expectedFailureMPS (#139940)
Functionally, the two decorators are very similar, but one should rely on expectedFailure as much as possible to get a signal when something is fixed (see the sketch after this list).
- Move `product_version` variable from `test_mps` to common_utils, but call it `MACOS_VERSION`
- Introduce `skipIfMPSOnMacOS13` to decorate the hard crashes that happen only on macOS 13 (which at this point will not get any fixes and will be deprecated soon)
- Add `device_type='mps'` to all `skipIfMPS` per https://github.com/pytorch/pytorch/issues/140560
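
The rationale in plain-unittest terms (a generic sketch; the PyTorch decorators above wrap the same mechanism for device-specific tests):
```python
import unittest

def broken_mps_op():
    return False   # stand-in for an op that is currently broken on MPS

class MPSRegressionTests(unittest.TestCase):
    @unittest.skip("broken on MPS")        # stays silent forever, even after the op is fixed
    def test_skipped(self):
        self.assertTrue(broken_mps_op())

    @unittest.expectedFailure              # reports an "unexpected success" once the op is
    def test_xfailed(self):                # fixed, which is the signal to drop the decorator
        self.assertTrue(broken_mps_op())

if __name__ == "__main__":
    unittest.main()
```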
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139940
Approved by: https://github.com/janeyx99, https://github.com/huydhn
2024-11-15 03:48:37 +00:00
0f739b8f66 [Codemod] skipIfMps->skipIfMPS (#140562)
As `MPS` is an acronym that stands for Metal Performance Shaders.
This also aligns more closely with `skipCUDAIf` rather than `skipCudaIf`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562
Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes
2024-11-13 19:45:08 +00:00
68ef445c33 [MPS][Perf] Dispatch to SDP-math-mps for non-contig Tensors (#139791)
As macOS 15 or newer supports those out of the box. This significantly reduces memory requirements and improves performance for some Stable Diffusion networks.

Test plan: Run
```python
from diffusers import StableDiffusionXLPipeline, AutoencoderKL, EulerAncestralDiscreteScheduler
import torch
import time

vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                    subfolder='vae',
                                    torch_dtype=torch.bfloat16,
                                    force_upcast=False).to('mps')

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", vae=vae,
                                                 torch_dtype=torch.bfloat16, variant="fp16").to('mps')
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

start_time = time.time()
start_mps_mem = torch.mps.driver_allocated_memory()
image = pipe(prompt="Spherical cow in vacuum",
             num_inference_steps=10,
             guidance_scale=8,
             generator=torch.Generator("mps").manual_seed(42),
             ).images[0]
end_mps_mem = torch.mps.driver_allocated_memory()
run_time = time.time() - start_time
print(f"run time in {run_time:.2f} sec, end_mps_mem {end_mps_mem/1024.0**2:.2f} Mb mem increase {(end_mps_mem-start_time)/1024.0**2:.2f} Mb")
image.save(f'bfloat16.png')
```

Before the change, total memory use was 16 GB and the run took 65 sec to complete; after it, memory drops to 14 GB and the run takes 50 sec on an M2 Pro, while the generated image remains the same:
![image](https://github.com/user-attachments/assets/1a35efef-9f80-4cd0-ac9c-30203eab6bb1)

Fixes https://github.com/pytorch/pytorch/issues/139389
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139791
Approved by: https://github.com/drisspg, https://github.com/Skylion007
ghstack dependencies: #139788, #139784, #139763
2024-11-06 16:25:39 +00:00
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
b6d6aa49b8 Revert "Validate input types for torch.nn.Linear and torch.nn.Bilinear (#135596)"
This reverts commit e157ce3ebbb3f30d008c15914e82eb74217562f0.

Reverted https://github.com/pytorch/pytorch/pull/135596 on behalf of https://github.com/malfet due to It's too restrictive, should allow other int-like types, such as `numpy.int64` ([comment](https://github.com/pytorch/pytorch/pull/135596#issuecomment-2349714104))
2024-09-13 18:06:56 +00:00
e157ce3ebb Validate input types for torch.nn.Linear and torch.nn.Bilinear (#135596)
Adds validation checks for the input types and displays better error messages for them.
Fixes https://github.com/pytorch/pytorch/issues/135463
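
A hypothetical sketch of the kind of check added (the actual change lives inside the modules; note the revert above points out that a plain `isinstance(..., int)` check is too strict for int-like types such as `numpy.int64`):
```python
import torch.nn as nn

def make_linear(in_features, out_features):
    for name, value in (("in_features", in_features), ("out_features", out_features)):
        if not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    return nn.Linear(in_features, out_features)

make_linear(8, 4)            # ok
try:
    make_linear(8.0, 4)      # raises a clear TypeError instead of failing later
except TypeError as e:
    print(e)
```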

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135596
Approved by: https://github.com/malfet
2024-09-12 21:28:37 +00:00
9a04cfbeff fix for fp16 (#134106)
This PR is a replacement for https://github.com/pytorch/pytorch/pull/133085 for pushing a quick fix for RMSNorm.
The original author is @kkontny

Previous PR summary:
Since FP16 has quite a small dynamic range, it is very easy to overflow while computing `at::pow(input, 2)`, and this happens in real-world computation.

I've tried to use the fused `nn.RMSNorm` implementation instead of `LlamaRMSNorm` inside the `transformers` implementation of Llama (`src/transformers/models/llama/modeling_llama.py`). It started to give wrong answers in FP16 while still giving good ones in FP32. I figured out this happens due to overflow while computing the square of the input tensor.

The original `LlamaRMSNorm` implementation upcasts the input to fp32 to prevent this and give better numerical stability.

```python
class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        LlamaRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)
```

The proposed commit fixes the issue. FP16 in RMSNorm has to be treated in a special way to be usable in real-world implementations.
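
A quick demonstration of the overflow being described (illustrative):
```python
import torch

x = torch.full((4,), 300.0, dtype=torch.float16)
print(x.pow(2))                 # inf everywhere: 300**2 = 90000 exceeds fp16's max (~65504)
print(x.float().pow(2).mean())  # tensor(90000.) -- upcasting before squaring avoids the overflow
```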

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134106
Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy
2024-09-11 22:02:07 +00:00
71383dd3da [MPS] Fix bachnorm_2d for channels last (#134618)
By skipping the gather of the input tensor if the memory layout is channels_last, which is a first step towards fixing https://github.com/pytorch/pytorch/issues/134580

Though the underlying problem is much more interesting: MPS does not have generic support for channels last, yet `c10::is_contiguous()` is true for the channels-last layout.
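
For orientation, how channels-last looks from the Python side (a generic illustration, not MPS-specific code):
```python
import torch

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)
print(x.stride())                                           # NHWC-style strides, e.g. (60, 1, 15, 3)
print(x.is_contiguous())                                    # False for this shape
print(x.is_contiguous(memory_format=torch.channels_last))   # True
```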

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134618
Approved by: https://github.com/albanD
2024-09-03 19:20:11 +00:00
f95085fd91 [BE][MPS] Prefer xfail to skip (#134858)
This essentially undoes the large skips applied to nn.modules on everything but macOS Sequoia by https://github.com/pytorch/pytorch/pull/128393

Instead, it uses the existing `xfail`, but guards it on the `_macos15_or_newer` boolean

Before the change if run on MacOS 14:
```
 % python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.053s

OK (skipped=32)
```
After
```
% python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.229s

OK (skipped=10, expected failures=2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134858
Approved by: https://github.com/janeyx99
2024-08-31 00:29:48 +00:00
8de0d7690c Use newer toAccumulateType signature in Normalization.cpp (#134540)
This fixes BatchNorm behavior when called with empty tensors on the MPS backend. Removed `expectedFailureMPS` in test_nn.py, deleted the expected failure in `test_mps.py`, and adjusted `skipIfMPS` to `expectedFailureMPS` in the BatchNorm2d OpInfo decorator, restricting it to the memory-format tests only

Test Plan: CI + `python3 -c "import torch; print(torch.nn.BatchNorm2d(3, device='mps')(torch.rand(0, 3, 2, 2, device='mps')))"`

Fixes https://github.com/pytorch/pytorch/issues/134423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134540
Approved by: https://github.com/Skylion007, https://github.com/albanD
2024-08-27 18:09:20 +00:00
d028b810fe Fix flaky GroupNorm ModuleInfo test (#133899)
Fixes https://github.com/pytorch/pytorch/issues/98677

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133899
Approved by: https://github.com/albanD
2024-08-27 14:45:51 +00:00
861bdf96f4 [MPS] Add native strided API for MPSNDArray starting with macOS 15 (#128393)
Add support for native strides in MPS starting with macOS Sequoia. This will get rid of the additional gather and scatter operations needed to solve the strides or storage offsets of the tensors.

Summary of changes (starting with macOS 15):
- Add support for **MPS strided API** (strides/storage offsets etc):
   - [initWithBuffer:offset:descriptor:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4391636-initwithbuffer?language=objc)
   - [arrayViewWithCommandBuffer:descriptor:aliasing:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/3114040-arrayviewwithcommandbuffer?language=objc)
   - [arrayViewWithShape:strides:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4408694-arrayviewwithshape?language=objc)
   - [reshapeWithCommandBuffer:sourceArray:shape:destinationArray:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarrayidentity/4438557-reshapewithcommandbuffer?language=objc)
- Add native support for NHWC convolutions (without incurring any extra copy from NCHW -> NHWC -> NCHW).
- Add support for strided output buffers (previously we would create a contiguous buffer)

OSes older than macOS 15 will run the old gather/scatter code path to solve strides/storage offsets.

---

Couple performance stats collected from torchbench comparing macOS 15 vs macOS 14:
```
- test_train[functorch_maml_omniglot-mps]: 27% faster
- test_train[timm_vision_transformer-mps]: 12% faster
- test_train[hf_T5-mps]: 9.46% faster
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128393
Approved by: https://github.com/albanD

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
2024-08-16 21:07:50 +00:00
ebc012ace6 Add hooks for execution on intel gaudi devices - 1 (#128584)
## Motivation
This is a follow-up to PR https://github.com/pytorch/pytorch/pull/126970 to support Gaudi devices for PyTorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend the onlyNativeDevices decorator functionality to support additional devices

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128584
Approved by: https://github.com/albanD
2024-07-20 05:03:36 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
db9c6aeec6 Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit 0a9c6e92f8d1a35f33042c8dab39f23b7f39d6e7.

Enable the test since it's fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594
Approved by: https://github.com/huydhn
ghstack dependencies: #126593
2024-05-25 01:27:02 +00:00
df4b7cb5f7 Reapply "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit ce6e36bf8b524c3f4b07605c5b3af2b7d5ba8fd9.

Reverted https://github.com/pytorch/pytorch/pull/126594 on behalf of https://github.com/clee2000 due to broke tests on inductor? test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CTCLoss_cuda_float64 43f2f43eb3 https://github.com/pytorch/pytorch/actions/runs/9200644034/job/25308511495 ([comment](https://github.com/pytorch/pytorch/pull/126586#issuecomment-2126228689))
2024-05-23 04:54:28 +00:00
ce6e36bf8b Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)" (#126594)
This reverts commit 0a9c6e92f8d1a35f33042c8dab39f23b7f39d6e7.

Enable the test since it's fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594
Approved by: https://github.com/huydhn
ghstack dependencies: #126586, #126593
2024-05-22 22:43:09 +00:00
0a9c6e92f8 Skip test_memory_format_nn_BatchNorm2d in inductor (#125970)
Skipping the test in the context of https://github.com/pytorch/pytorch/issues/125967 until the issue is root caused and fixed properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125970
Approved by: https://github.com/clee2000
2024-05-11 04:11:18 +00:00
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with operator.itemgetter, which is more efficient (as it's a builtin function). Particularly useful when lambdas are used as 'key' functions.
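
The pattern FURB118 pushes towards (generic example):
```python
import operator

pairs = [("b", 2), ("a", 1), ("c", 3)]
pairs.sort(key=operator.itemgetter(0))             # instead of key=lambda p: p[0]
keys = list(map(operator.itemgetter(0), pairs))    # instead of map(lambda p: p[0], pairs)
print(pairs, keys)
```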

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
6fcbeb3489 [ATen] Add CPU fp16 support for nll_loss and cross_entropy_loss (#123256)
Add CPU FP16 support for nll_loss and cross_entropy_loss.
Resolve issue #123328.
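
A minimal exercise of the newly supported dtype (assumes a build that includes this change):
```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, dtype=torch.float16, requires_grad=True)   # CPU tensor
target = torch.randint(0, 10, (4,))
loss = F.cross_entropy(logits, target)   # previously unsupported for fp16 on CPU
loss.backward()
print(loss.dtype, logits.grad.dtype)
```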

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123256
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-04-18 11:44:38 +00:00
487b6d40ec Add RMSNorm module (#121364)
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)

**The implementation here is not optimized and we welcome pull requests to improve this**

- Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))
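
A brief usage sketch of the module added here (assuming a build that includes it):
```python
import torch
import torch.nn as nn

rms = nn.RMSNorm(normalized_shape=64, eps=1e-6)   # takes normalized_shape, like nn.LayerNorm
x = torch.randn(2, 10, 64)
print(rms(x).shape)                               # torch.Size([2, 10, 64])
```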

Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-29 18:05:28 +00:00
8698121636 Revert "Add RMSNorm module (#121364)"
This reverts commit a7306de0dc96cda8b698d19680a88d27aa45a31d.

Reverted https://github.com/pytorch/pytorch/pull/121364 on behalf of https://github.com/atalman due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/121364#issuecomment-2025502007))
2024-03-28 15:31:10 +00:00
cc12668053 Fix swap_tensors path in _apply for modules that inherit from RNNBase (RNN, GRU, LSTM) (#122800)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122800
Approved by: https://github.com/albanD
2024-03-27 23:34:16 +00:00
a7306de0dc Add RMSNorm module (#121364)
Similar to dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)

**The implementation here is not optimized and we welcome pull requests to improve this**

- Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation
- Remove the [upcast to float and downcast](dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-27 21:39:30 +00:00
d621e3e3b8 Add exhaustive module and optimizer tests for torch.load(state_dict, weights_only=True) (#121049)
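
The call pattern being exercised exhaustively here, for reference:
```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "linear.pt")
# weights_only=True restricts unpickling to tensors and simple containers,
# avoiding arbitrary code execution from untrusted checkpoints.
state = torch.load("linear.pt", weights_only=True)
model.load_state_dict(state)
```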
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121049
Approved by: https://github.com/janeyx99
2024-03-05 14:27:50 +00:00
bfa71b523d add complex32 to v3_dtypes (#120388)
Fixes [#120290](https://github.com/pytorch/pytorch/issues/120290)
Fixes https://github.com/pytorch/pytorch/issues/73502

Use `v3_dtypes` and `torch._utils._rebuild_tensor_v3` to handle `torch.save(complex32)`.
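
Round-trip sketch of what now works (assuming a build with this fix):
```python
import torch

t = torch.ones(3, dtype=torch.cfloat).to(torch.complex32)
torch.save(t, "chalf.pt")
loaded = torch.load("chalf.pt")
print(loaded.dtype)   # torch.complex32
```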

result:
![image](https://github.com/pytorch/pytorch/assets/37650440/18b6cbb3-fb3f-4855-9d48-374014647988)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120388
Approved by: https://github.com/albanD
2024-02-28 02:32:29 +00:00
677e67c399 Update nn.Module._apply to not gate on should_use_set_data when swap_tensors is set (#120659)
This updates the nesting of if statements in `nn.Module._apply` such that if

`torch.__future__.set_swap_module_params_on_conversion(True)` has been set, we always try to swap regardless of whether
- `torch._has_compatible_shallow_copy_type(param, fn(param))`
- `torch.__future__.set_overwrite_module_params_on_conversion` is set

This means that `meta_module.to_empty('device')` can now use the swap_tensors path cc @awgu
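
A small sketch of the newly enabled path (materializing a meta-device module; the calls shown are the public APIs named above):
```python
import torch
import torch.nn as nn

torch.__future__.set_swap_module_params_on_conversion(True)

with torch.device("meta"):
    m = nn.Linear(8, 8)        # parameters allocated on the meta device, no real storage

m.to_empty(device="cpu")       # with the flag set, this goes through the swap_tensors path
print(m.weight.device)         # cpu
```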

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120659
Approved by: https://github.com/albanD
2024-02-28 00:59:34 +00:00
b3df3e4e94 Restore OpInfo/ModuleInfo tests in Inductor-wrapped tests (#119693)
I accidentally disabled this without realizing it. It turns out that
PYTORCH_TEST_WITH_INDUCTOR=1 implies PYTORCH_TEST_WITH_DYNAMO=1, which
activates skipIfTorchDynamo decorators.

Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119693
Approved by: https://github.com/bdhirsh
2024-02-12 22:44:45 +00:00
2c91e13afc Add lowerings to special functions (#119187)
As in the title.

In addition, the PR introduces infrastructure for lowerings of pointwise functions that have both cpp and triton implementations available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119187
Approved by: https://github.com/peterbell10
2024-02-11 16:35:40 +00:00
db1a4dcb5a [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-08 20:35:32 +00:00
d5a718d27b Add swap_tensors path to nn.Module._apply (#117167)
Added `torch.__future__.{get/set}_swap_module_params_on_conversion`, which defaults to `False` for now, but we probably want to override this and default to `True` in `nn.Module._apply` if the input is a tensor subclass.

From offline discussion, for now we are **not** allowing `swap_tensor` after the first module forward has been run*** if the autograd graph is still alive. The reason being that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1.  The first forward pass will install `AccumulateGrad` nodes on each param, which [bump the refcount of the associated TensorImpl](6cf1fc66e3/torch/csrc/autograd/variable.cpp (L307)). **Future work might be to swap the refs that the `AccumulateGrad` nodes hold if it is necessary.**

***From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge case that the grads are set via `p.grad = grad` OR the autograd graph is no longer alive because the output has been garbage collected.

If any `swap_tensors` fails on any of the parameters in the `nn.Module` we raise an error.

**`RNNBase` overrides `nn.Module._apply()` and installs weakrefs on some parameters. As a result, all modules that inherit from `RNNBase` (`RNN`, `GRU` and `LSTM`) cannot use the`swap_tensors` path as of now**
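
For context, the primitive involved (a minimal standalone use of `torch.utils.swap_tensors`, not the module path itself):
```python
import torch

t1 = torch.zeros(3)
t2 = torch.ones(3)
# Both tensors are freshly created, so each TensorImpl has use_count == 1 as required.
torch.utils.swap_tensors(t1, t2)
print(t1)   # tensor([1., 1., 1.])
print(t2)   # tensor([0., 0., 0.])
```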

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117167
Approved by: https://github.com/albanD
ghstack dependencies: #118028
2024-02-07 18:55:44 +00:00
c0164f2393 Revert "[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)"
This reverts commit 04d52d5399ad4abb8af9e8405be79e2a7f8b4c7a.

Reverted https://github.com/pytorch/pytorch/pull/119039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing MPS test in trunk 04d52d5399,  may be a landrace ([comment](https://github.com/pytorch/pytorch/pull/119039#issuecomment-1928595240))
2024-02-06 01:13:28 +00:00
04d52d5399 [BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039)
Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, almost no ModuleInfos override this (so only `float32` and `float64` are tested).

This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626.

Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039
Approved by: https://github.com/janeyx99
2024-02-05 23:19:01 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```python
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
06576d859d Stop running ModuleInfo tests under Dynamo (#117318)
This is a policy decision, similar to the OpInfo one. The problem is
that they just take too long to run when we reset() before and after
each test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117318
Approved by: https://github.com/voznesenskym
2024-01-12 22:17:59 +00:00
d0cf2182ea Fix TransformerEncoderLayer for bias=False (#116760)
Fixes https://github.com/pytorch/pytorch/issues/116385

Don't call `torch._transformer_encoder_layer_fwd` when `bias=False`

`bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with; it was my bad that this wasn't tested when I approved https://github.com/pytorch/pytorch/pull/101687.

`bias=False` was causing the `tensor_args` in [`TransformerEncoder`](a17de2d645/torch/nn/modules/transformer.py (L663-L677)) to contain `None`s and error on checks for the fastpath like `t.requires_grad for t in tensor_args`.
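
A minimal repro-style snippet of the configuration this fixes (illustrative):
```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, bias=False, batch_first=True)
x = torch.randn(2, 5, 32)
out = layer(x)        # with the fix, bias=False simply stays off the fused fastpath
print(out.shape)      # torch.Size([2, 5, 32])
```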

Alternative fix would be to
1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate
2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate

Let me know if these approaches are preferable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116760
Approved by: https://github.com/jbschlosser
2024-01-05 00:13:10 +00:00