311 Commits

45bf3f6216 Optimized EMA implementation (#94820)
This PR proposes an optimized way to do Exponential Moving Average (EMA) that is faster than the current approach of using `swa_utils.AveragedModel` described in https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies.

The implementation is asynchronous and built as an optimizer wrapper, so the EMA weight update happens right after each optimizer step without any additional CPU/GPU sync and with limited code changes.

Example usage:
```python
model = Model().to(device)
opt = torch.optim.Adam(model.parameters())

opt = EMAOptimizer(opt, device, 0.9999)

for epoch in range(epochs):
    training_loop(model, opt)

    regular_eval_accuracy = evaluate(model)

    with opt.swap_ema_weights():
        ema_eval_accuracy = evaluate(model)
```
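
A minimal sketch of the core update such a wrapper performs after each optimizer step (an illustration of the idea, not the PR's actual code; the helper name and `decay` value are assumptions, while the `torch._foreach_*` ops shown exist in PyTorch's private API):

```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay=0.9999):
    # ema = decay * ema + (1 - decay) * param, applied to all tensors at once
    torch._foreach_mul_(ema_params, decay)
    torch._foreach_add_(ema_params, model_params, alpha=1.0 - decay)
```

Because the update is a handful of foreach kernels launched right after `opt.step()`, it overlaps with the rest of the iteration and needs no extra CPU/GPU synchronization.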

Here are some benchmarks (time per iteration) on various torchvision models:

|model|this PR iteration time|`swa_utils.AveragedModel` iteration time|iteration speedup|
|-----|----------------------|----------------------------------------|-----------------|
|regnet_x_1_6gf|62.73                        |67.998                 |1.08                                         |
|regnet_x_3_2gf|101.75                       |109.422                |1.08                                         |
|regnet_x_400mf|25.13                        |32.005                 |1.27                                         |
|regnet_x_800mf|33.01                        |37.466                 |1.13                                         |
|regnet_x_8gf|128.13                       |134.868                |1.05                                         |
|regnet_y_16gf|252.91                       |261.292                |1.03                                         |
|regnet_y_1_6gf|72.14                        |84.22                  |1.17                                         |
|regnet_y_3_2gf|99.99                        |109.296                |1.09                                         |
|regnet_y_400mf|29.53                        |36.506                 |1.24                                         |
|regnet_y_800mf|37.82                        |43.634                 |1.15                                         |
|regnet_y_8gf|196.63                       |203.317                |1.03                                         |
|resnet101|128.80                       |137.434                |1.07                                         |
|resnet152|182.85                       |196.498                |1.07                                         |
|resnet18|29.06                        |29.975                 |1.03                                         |
|resnet34|50.73                        |53.443                 |1.05                                         |
|resnet50|76.88                        |80.602                 |1.05                                         |
|resnext101_32x8d|277.29                       |280.759                |1.01                                         |
|resnext101_64x4d|269.56                       |281.052                |1.04                                         |
|resnext50_32x4d|100.73                       |101.102                |1.00                                         |
|shufflenet_v2_x0_5|10.56                        |15.419                 |1.46                                         |
|shufflenet_v2_x1_0|13.11                        |18.525                 |1.41                                         |
|shufflenet_v2_x1_5|18.05                        |23.132                 |1.28                                         |
|shufflenet_v2_x2_0|25.04                        |30.008                 |1.20                                         |
|squeezenet1_1|14.26                        |14.325                 |1.00                                         |
|swin_b|264.52                       |274.613                |1.04                                         |
|swin_s|180.66                       |188.914                |1.05                                         |
|swin_t|108.62                       |112.632                |1.04                                         |
|swin_v2_s|220.29                       |231.153                |1.05                                         |
|swin_v2_t|127.27                       |133.586                |1.05                                         |
|vgg11|95.52                        |103.714                |1.09                                         |
|vgg11_bn|106.49                       |120.711                |1.13                                         |
|vgg13|132.94                       |147.063                |1.11                                         |
|vgg13_bn|149.73                       |165.256                |1.10                                         |
|vgg16|158.19                       |172.865                |1.09                                         |
|vgg16_bn|177.04                       |192.888                |1.09                                         |
|vgg19|184.76                       |194.194                |1.05                                         |
|vgg19_bn|203.30                       |213.334                |1.05                                         |
|vit_b_16|217.31                       |219.748                |1.01                                         |
|vit_b_32|69.47                        |75.692                 |1.09                                         |
|vit_l_32|223.20                       |258.487                |1.16                                         |
|wide_resnet101_2|267.38                       |279.836                |1.05                                         |
|wide_resnet50_2|145.06                       |154.918                |1.07                                         |

You can see that in all cases it is faster than using `AveragedModel`. In fact, in many cases adding EMA does not add any overhead, since the computation is hidden behind the usual iteration flow.

This is a similar implementation to the one currently in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).

If the team is interested in merging this, let me know and I'll add some documentation similar to `swa_utils` and tests.

Credits to @szmigacz for the implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94820
Approved by: https://github.com/janeyx99
2023-04-26 18:02:11 +00:00
22ea21da3d Change 1D Tensor of 1 element to 0D Tensor (#96994)
Adds a 0D tensor to the graphed Adam/AdamW tests.

Affected:
- `torch.cuda.amp.GradScaler`'s `found_inf`, `_scale`, and `_growth_tracker`
- `step` of Adam & AdamW when `capturable=True`

Fixes #96776 🤞
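
For illustration, the difference between the old 1-element 1D tensors and the new 0D tensors (a hedged sketch, not the exact diff):

```python
import torch

one_d = torch.full((1,), 65536.0)   # old: shape (1,) -- a 1D tensor holding one element
zero_d = torch.full((), 65536.0)    # new: shape () -- a 0D (scalar) tensor
assert one_d.dim() == 1 and zero_d.dim() == 0
```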

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96994
Approved by: https://github.com/janeyx99
2023-03-21 18:24:19 +00:00
e8b0f504e2 Fix unpicklable object in AveragedModel (#95979)
Fixes #95376

Don't store the callable `avg_fn`; instead, test whether `avg_fn` is None and call
the default implementation when it is.
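
A minimal sketch of that pattern (not the actual source; the helper below is hypothetical):

```python
def average(averaged_param, current_param, num_averaged, avg_fn=None):
    if avg_fn is None:
        # default: equal-weight running average, kept as inline logic so the
        # module stores no callable and stays picklable
        return averaged_param + (current_param - averaged_param) / (num_averaged + 1)
    return avg_fn(averaged_param, current_param, num_averaged)
```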
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95979
Approved by: https://github.com/janeyx99
2023-03-12 05:13:22 +00:00
7d765cdc66 Fix wrong handling of grad_scale & found_inf in fused optimizers (#95847)
Fixes #95781.
The cause seems to be that the current implementation doesn't correctly pass `found_inf` when `grad_scale` is `None`. Therefore parameters can get mistakenly updated by gradients in which some elements are invalid, i.e. NaN or inf.

Related #94060

I forgot about this wrong handling after #94344
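
A Python-level illustration of the intended semantics (the real logic lives inside the fused CUDA kernel; this sketch is an assumption, not the actual code): `found_inf` must gate the update even when `grad_scale` is `None`.

```python
import torch

def apply_update(param, update, grad_scale=None, found_inf=None):
    if grad_scale is not None:
        update = update / grad_scale
    if found_inf is not None and found_inf.item():
        return  # some gradient elements were inf/nan: skip instead of applying a bad update
    param.add_(update, alpha=-1.0)
```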

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95847
Approved by: https://github.com/janeyx99
2023-03-04 01:21:21 +00:00
75cb99e549 [optim] Widen the cases for defaulting to foreach (#95820)
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.
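
A rough sketch of the widened check (an assumption about the shape of the logic, not the actual helper):

```python
def can_default_to_foreach(params, grads, state_steps):
    # before: required *every* tensor, including state_steps, to be on CUDA
    # after: only the params/grads must be on CUDA; state_steps may live on the CPU,
    #        since the foreach kernels can handle CPU state_steps
    return all(t.is_cuda for t in list(params) + list(grads) if t is not None)
```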

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
2023-03-02 04:15:33 +00:00
cece63f197 Add warn-once deprecation warning to legacy sparse constructors (#94850)
Addresses https://github.com/pytorch/pytorch/issues/68323#issuecomment-1425174341

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94850
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-02-23 15:05:12 +00:00
3b966a6ce3 [autograd] disable backward/grad for complex scalar output (#92753)
Fixes https://github.com/pytorch/pytorch/issues/92750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
e0a954f531 call zero_grad in foreach/fused optimizers tests (#94724)
The tests calling this method had not been failing because `iter` is the name of a Python built-in function.

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94724
Approved by: https://github.com/Skylion007
2023-02-15 04:14:34 +00:00
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the `set` call.
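
Illustrative examples of the kind of rewrite applied (not taken from the actual diff):

```python
items = [1, 2, 2, 3]
pairs = [("a", 1), ("b", 2)]

squares = [x * x for x in range(10)]   # instead of list(x * x for x in range(10))
unique = set(items)                    # instead of set(a for a in items)
lookup = {k: v for k, v in pairs}      # instead of dict((k, v) for k, v in pairs)
```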

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
9171f7d4cd [BE] Modernize PyTorch even more for 3.8 with pyupgrade (#94520)
Applies some more pyupgrade fixits to PyTorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94520
Approved by: https://github.com/ezyang
2023-02-10 18:02:50 +00:00
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplifies and speeds up `isinstance` calls by checking for multiple types in a single call.
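
An illustrative example of the rewrite (not the actual diff):

```python
def is_number(x):
    # before: isinstance(x, int) or isinstance(x, float)
    return isinstance(x, (int, float))  # after: one call with a tuple of types
```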

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
6ba041fcae Look up group["capturable"], not defaults["capturable"] in Adam(W) (#94149)
We can set different values in each `param_group` when calling the dunder init of `torch.optim` optimizers, as in e.g. https://github.com/pytorch/pytorch/issues/89987.

So check whether `capturable` is `True` in each `param_group`, not just in `defaults`.
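
A minimal sketch of the corrected lookup (the setup below is illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.Adam([
    {"params": [model.weight], "capturable": True},
    {"params": [model.bias]},   # inherits the default (False)
])

# consult every param_group, not only defaults["capturable"]
any_capturable = any(group["capturable"] for group in opt.param_groups)
```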
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94149
Approved by: https://github.com/albanD
2023-02-07 00:24:35 +00:00
a23ed38f9a [mta][foreach] Implement fused adamw (#88015)
related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167
possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88015
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-02-01 19:32:29 +00:00
de0375e79d [optim][foreach] Do NOT inplace modify gradients (#92706)
SGD and ASGD already had out-of-place grads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92706
Approved by: https://github.com/ngimel, https://github.com/albanD
2023-01-21 00:12:28 +00:00
2b885e1f6c [optim][NAdam] Fix discrepancy between mt vs st impl (#92699)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92699
Approved by: https://github.com/albanD
2023-01-21 00:12:28 +00:00
3ba5eae72a [optim][radam] fix eps discrepancy for foreach (#92551)
Will likely race with https://github.com/pytorch/pytorch/pull/92365

eps was not being used at all in the mta/foreach impl. There was also a discrepancy between the docs and the implementation: the implementation was doing `sqrt(x) + eps` while the docs described `sqrt(x + eps)`.

I've fixed the docs + extended the current multi_tensor test case to capture this issue.

![image](https://user-images.githubusercontent.com/31798555/213300617-61cbb763-da2d-48e0-b3b6-0190594dd049.png)
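
The numerical difference in question, for illustration:

```python
import torch

x = torch.tensor([1e-8, 4.0])
eps = 1e-6
sqrt_then_add = x.sqrt() + eps    # sqrt(x) + eps: what the implementation does (and the docs now say)
add_then_sqrt = (x + eps).sqrt()  # sqrt(x + eps): what the docs previously described
```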

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92551
Approved by: https://github.com/albanD
2023-01-19 14:38:59 +00:00
4af5939d7a [optim] Improve adadelta foreach, group tensors to maximize fast path (#92048)
The old behavior would send tensors to the slow path in the Adadelta foreach implementation if they were not all of the same dtype and on the same device.

This PR adds grouping for adadelta optimizer so that it would run foreach in batches, allowing more users to benefit from foreach perf.

Of course, we should ensure that the new implementation works, so there are new tests to ensure this behavior is not broken.
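
A rough sketch of the grouping idea (an assumption, not the actual helper):

```python
from collections import defaultdict

def group_by_device_and_dtype(tensors):
    # tensors sharing a device and dtype can be batched through one foreach call
    buckets = defaultdict(list)
    for t in tensors:
        buckets[(t.device, t.dtype)].append(t)
    return buckets
```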
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92048
Approved by: https://github.com/albanD
2023-01-14 00:35:14 +00:00
7f2b5ea1e1 Revert "Avoid device casting for all singleton tensors in optimizer states (#91454)"
This reverts commit 1e725c97470d8cf74e85984ca997e77c76e91a18.

Reverted https://github.com/pytorch/pytorch/pull/91454 on behalf of https://github.com/janeyx99 due to Likely caused regression where checkpoint resume fails during training
2023-01-10 18:57:50 +00:00
1e725c9747 Avoid device casting for all singleton tensors in optimizer states (#91454)
Fixes #75224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91454
Approved by: https://github.com/janeyx99
2023-01-04 17:55:00 +00:00
f5e20d6060 Make the state dict of CyclicLR scheduler pickleable (#91400)
Fixes #90414

This PR drops the unpicklable `weakref.WeakMethod` object from the CyclicLR scheduler's state dict and re-initializes it once the state dict is loaded. This makes the state picklable so you can include it in your checkpoint. Also fixes https://github.com/Lightning-AI/lightning/issues/15901

A simple test was added that calls `pickle.dumps` on the state.
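
A minimal sketch of the pattern (attribute and method names are hypothetical, not the scheduler's real ones):

```python
import weakref

class SchedulerLike:
    def __init__(self):
        self._fn_ref = weakref.WeakMethod(self._default_fn)  # unpicklable

    def _default_fn(self, x):
        return x

    def state_dict(self):
        # drop the weakref so the state can be pickled
        return {k: v for k, v in self.__dict__.items() if k != "_fn_ref"}

    def load_state_dict(self, state):
        self.__dict__.update(state)
        self._fn_ref = weakref.WeakMethod(self._default_fn)  # rebuild after loading
```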

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91400
Approved by: https://github.com/albanD
2022-12-28 18:05:24 +00:00
e3383d296f [optim][fix] test_fused_optimizers did not test fused before (#91228)
I realized test_fused_optimizers used a helper that was written for foreach, so we were not testing fused at all. This PR fixes that test so we actually test fused adam.

Explicitly adding `fused=False` sets the stage for my later changes (but should be a no-op here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91228
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-12-21 19:42:24 +00:00
1accd915a4 Re-enable optimizers (#90709)
Fixes
https://github.com/pytorch/pytorch/issues/90165
https://github.com/pytorch/torchdynamo/issues/328

Re-enables optimizer capture + compilation now that the dynamo slowdowns have been fixed.

It also brings speedups; numbers to come soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90709
Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/yanboliang
2022-12-19 04:07:41 +00:00
6f4dea562d Implement post and pre hooks for optimizer (#89176)
Fixes #88446
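
A usage sketch: the registration methods shown below match the step-hook API that exists on `torch.optim.Optimizer` today, and the training setup is illustrative.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def pre_hook(optimizer, args, kwargs):
    print("about to step")

def post_hook(optimizer, args, kwargs):
    print("stepped")

opt.register_step_pre_hook(pre_hook)
opt.register_step_post_hook(post_hook)

model(torch.randn(2, 4)).sum().backward()
opt.step()  # prints "about to step" then "stepped"
```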

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89176
Approved by: https://github.com/albanD
2022-12-02 07:03:45 +00:00
903ae4570e Disable optimizer tracing, enable for tests only (#89500)
Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
Approved by: https://github.com/anijain2305
2022-11-24 04:15:34 +00:00
0a69c50a46 Publicly expose _LRScheduler as LRScheduler (#88503)
Fixes #61232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88503
Approved by: https://github.com/soulitzer
2022-11-07 21:15:10 +00:00
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac33be0c8ad9956e3fc15e78ddb6cb95, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
512a3a48e3 sync AveragedModel buffers when use_buffers=False (#84054)
Fixes #84053

As described in the issue, AveragedModel deep-copies the model during initialization, which means the buffers in the averaged model cannot be updated together with the source model.

One solution is to copy the source model's buffers into the averaged model every time `update_parameters` is called.
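
A minimal sketch of that solution (not the actual diff):

```python
import torch

def sync_buffers(averaged_model, source_model):
    # mirror the source model's buffers (e.g. BatchNorm running stats) into the averaged copy
    for avg_buf, src_buf in zip(averaged_model.buffers(), source_model.buffers()):
        avg_buf.detach().copy_(src_buf.to(avg_buf.device))
```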
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84054
Approved by: https://github.com/samdow
2022-10-24 16:03:14 +00:00
1b43883fd6 Make AdamW, NAdam & RAdam differentiable (#86183)
Blocked by #86096
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86183
Approved by: https://github.com/albanD
2022-10-17 04:32:08 +00:00
d29c8c0ffa enable optim tests on dynamo to test flaky bot (#86976)
will link the issue that disabled them if this gets approved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86976
Approved by: https://github.com/albanD
2022-10-14 21:44:13 +00:00
7dcfbedce0 Fix LinearLR scheduler start_factor (#86695)
Fixes #86454

The `start_factor` must lie in the interval (0, 1] instead of [0, 1] to avoid division by zero. This PR changes the lower-limit check on the parameter.
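
A sketch of the tightened check (the exact error-message wording is an assumption):

```python
def _validate_start_factor(start_factor: float) -> None:
    # exclusive lower bound: start_factor == 0 would divide by zero later on
    if start_factor > 1.0 or start_factor <= 0:
        raise ValueError("Starting multiplicative factor expected to be greater than 0 and less or equal to 1.")
```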

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86695
Approved by: https://github.com/albanD
2022-10-13 17:31:36 +00:00
cb4867a71a Make ASGD & RProp differentiable (#86258)
Blocked by #86183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86258
Approved by: https://github.com/albanD
2022-10-13 04:06:13 +00:00
aacb9f3ac6 Make Adadelta,Adagrad & Adamax differentiable (#86096)
Continuing the differentiable optimizers support
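
A usage sketch of the `differentiable` flag on these optimizers (the setup is illustrative; in practice the parameter must be a non-leaf copy of an outer tensor so the in-place update can stay on the graph):

```python
import torch

outer = torch.randn(3, requires_grad=True)
p = outer.clone()                 # non-leaf copy the optimizer updates in-place
p.grad = torch.ones_like(p)       # would normally come from autograd.grad(..., create_graph=True)

opt = torch.optim.Adadelta([p], differentiable=True)
opt.step()                        # the parameter update is recorded on the autograd graph

(p ** 2).sum().backward()
print(outer.grad)                 # gradients flow back through the optimizer step to `outer`
```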

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86096
Approved by: https://github.com/janeyx99
2022-10-12 23:16:29 +00:00
9eb4f9dd17 Tweak test tolerances to be compatible with A10G (#86538)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86538
Approved by: https://github.com/ngimel
2022-10-11 23:31:48 +00:00
a079dad7cf Skip dynamo for all optim test as they are all flaky otherwise (#86482)
Fixes https://github.com/pytorch/pytorch/issues/86433
Fixes https://github.com/pytorch/pytorch/issues/86435
Fixes https://github.com/pytorch/pytorch/issues/86432
Fixes https://github.com/pytorch/pytorch/issues/86389
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86482
Approved by: https://github.com/ezyang
2022-10-07 22:47:48 +00:00
82229d1e33 [optim] fix: empty grad support for SparseAdam (#86459)
Fixes #82486

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86459
Approved by: https://github.com/albanD
2022-10-07 19:24:59 +00:00
b3fdb02fb2 Fix memory leak in _LRScheduler.step() (#85602)
Fixes #85410

This diff removed the cyclic references in `_LRScheduler.step()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602
Approved by: https://github.com/albanD
2022-10-07 15:55:55 +00:00
233d6f195a Revert "Fix memory leak in _LRScheduler.step() (#85602)"
This reverts commit eb32330d6b3709dc8910eb298d8802fbca57b05c.

Reverted https://github.com/pytorch/pytorch/pull/85602 on behalf of https://github.com/albanD due to newly added test is flaky
2022-10-06 22:02:02 +00:00
eb32330d6b Fix memory leak in _LRScheduler.step() (#85602)
Fixes #85410

This diff removed the cyclic references in `_LRScheduler.step()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602
Approved by: https://github.com/albanD
2022-10-06 17:07:36 +00:00
4c04fa9587 Remove optim_mt from test/test_optim.py (#83549)
As per title, this updates `test_optim.py` so that `foreach` optimizers are constructed using the `foreach` keyword argument of `torch.optim` optimizers.

Also, this makes some cosmetic changes to remove `torch.autograd.Variable`, `.data` calls, and `torch._six`.
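
The construction pattern the updated tests use, for illustration:

```python
import torch

model = torch.nn.Linear(4, 4)
single_tensor = torch.optim.Adam(model.parameters(), foreach=False)
multi_tensor = torch.optim.Adam(model.parameters(), foreach=True)  # replaces the old torch.optim._multi_tensor variants
```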

Related: https://github.com/pytorch/pytorch/pull/81705#discussion_r939440776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83549
Approved by: https://github.com/ngimel
2022-09-30 20:32:05 +00:00
5f26df0345 resubmit: "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)" (#85739)
Embarrassingly, this just moves the pow implementations at [ATen/native/cuda/PowKernel.cu#L21-L66](849b08f14b/aten/src/ATen/native/cuda/PowKernel.cu (L21-L66)) into a new header file and lets FusedAdam use them, hopefully taming MSVC.

cc @ngimel @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85739
Approved by: https://github.com/ngimel
2022-09-29 16:58:59 +00:00
9f1468ae6c CyclicLR memory leak fix (#85462)
Hi, we noticed in our team that using CyclicLR causes a problem with memory clearance on the GPU (it is probably the case without the GPU as well, but that was our use case). After initializing CyclicLR, GPU memory is not cleared even after the model, optimizer, and scheduler go out of scope (i.e. their reference counts drop to zero). This is because the `__init__` method inside `CyclicLR` creates a reference to its own methods, which is not removed until `gc.collect()` is called manually. This is a problem if people want to test multiple models in one run of a script: after testing the first model, the second one will fail with a `CUDA out of memory` error because the first one has not been cleared from memory.

I propose a simple fix using `weakref`, similarly to the `_LRScheduler` base class, but I am happy to change it if you have any comments.

Here is the code to reproduce the bug:

```python
import torch
import weakref
from transformers import DetrForObjectDetection

class X:
    def __init__(self, optimizer):
        self.optimizer = optimizer

        # Will cause cyclic reference.
        self.func = self.dummy

        # Will work as expected, memory cleared after instance count is zero.
        # self.func = weakref.WeakMethod(self.dummy)

    def dummy(self, x):
        return 1.

def test():
    model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-50')
    model.to('cuda')
    optimizer = torch.optim.Adam(model.parameters())
    x = X(optimizer)

test()
print(f'{torch.cuda.memory_reserved()}, {torch.cuda.memory_allocated()}')  # Should print (<some memory>, 0), but with cyclic reference, it will print (<some memory>, <some memory>).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85462
Approved by: https://github.com/albanD
2022-09-27 17:41:58 +00:00
7167996346 Revert "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)"
This reverts commit 4615d1bcfa0915a992e7445086ba559ca7441607.

Reverted https://github.com/pytorch/pytorch/pull/85507 on behalf of https://github.com/atalman due to Break internal windows builds
2022-09-27 16:59:35 +00:00
4615d1bcfa resubmit: [mta] APEX style Fused Adam (#81705) (#85507)
This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.

related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436
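
A usage sketch of the fused path together with AMP gradient scaling (assumes a CUDA device; the loop itself is illustrative):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 8, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.step(opt)   # gradient unscaling happens inside the fused kernel
    scaler.update()
```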

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705
Approved by: https://github.com/ngimel

cc @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85507
Approved by: https://github.com/ngimel
2022-09-23 18:56:00 +00:00
e505360eb8 Revert "[mta] APEX style Fused Adam (#81705)"
This reverts commit 7a6c4d0c50dd0670d87bc39d53292cf8cb90ca04.

Reverted https://github.com/pytorch/pytorch/pull/81705 on behalf of https://github.com/dagitses due to broke internal builds, details to come
2022-09-22 19:37:29 +00:00
7a6c4d0c50 [mta] APEX style Fused Adam (#81705)
This PR implements an APEX style FusedAdam in PyTorch.
This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.

related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167
possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436

cc @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705
Approved by: https://github.com/ngimel
2022-09-20 17:18:33 +00:00
faac3dbce2 [optim] asgd : handle complex params as independent real params (#84472)
Ref: #65711
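
The general idea, as a sketch (not the actual kernel change): view each complex parameter/gradient as a real tensor with a trailing dimension of 2 and run the ordinary real-valued update on that view.

```python
import torch

p = torch.randn(3, dtype=torch.complex64)
g = torch.randn(3, dtype=torch.complex64)

p_real = torch.view_as_real(p)   # shape (3, 2), real dtype, shares storage with p
g_real = torch.view_as_real(g)
p_real.add_(g_real, alpha=-0.1)  # the real-valued update modifies p through the view
```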
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84472
Approved by: https://github.com/Lezcano, https://github.com/soulitzer
2022-09-06 16:58:42 +00:00
e72256604f Enhance add_out_dense_sparse_cpu for hybrid sparse tensor (#23057)
This improves the performance of hybrid sparse COO tensors on the CPU path. The case appears in the DLRM terabyte test. With this fix, according to the previous performance test data, DLRM execution gets a ~10x performance improvement.

Without this change, DLRM runs at:
Finished training it 100/1000 of epoch 0, 2969.25 ms/it, loss 0.220505, accuracy 0.000 %

With it:
Finished training it 100/1000 of epoch 0, 270.71 ms/it, loss 0.220505, accuracy 0.000 %
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23057
Approved by: https://github.com/VitalyFedyunin, https://github.com/malfet
2022-08-24 22:42:53 +00:00
7c20ad3dfa [optim] rprop: handle complex params as independent real params (#83858)
Ref #65711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83858
Approved by: https://github.com/albanD
2022-08-23 08:39:35 +00:00
09331c947c [optim] rmsprop: handle complex params as independent real params (#83860)
Ref: #65711
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83860
Approved by: https://github.com/albanD
2022-08-22 21:55:01 +00:00