95 Commits

Author SHA1 Message Date
de0375e79d [optim][foreach] Do NOT inplace modify gradients (#92706)
SGD and ASGD already had out-of-place grads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92706
Approved by: https://github.com/ngimel, https://github.com/albanD
2023-01-21 00:12:28 +00:00
b2ca2c8662 [optim][adagrad] group tensors in foreach to maximize perf (#92362)
another one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92362
Approved by: https://github.com/albanD
2023-01-20 16:24:39 +00:00
0070c546b5 [BE][optim] abstract out docstrings, add differentiable docs (#92336)
1. abstract out common doc strings --> I'm sure there are more, but let this be a first step.
2. Add differentiable docs to those who are actually differentiable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92336
Approved by: https://github.com/albanD
2023-01-18 15:09:28 +00:00
06326a7721 [optim] skip .item calls in all optimizers when compiling with dynamo (#88173)
@mlazos: skips `item()` calls if compiling with dynamo, by defining a helper function `_get_value` which either returns the result of `.item()` or the scalar cpu tensor if compiling with dynamo. This was done because removing `item()` calls significantly regresses eager perf. Additionally, `_dispatch_sqrt` calls the appropriate sqrt function (math.sqrt, or torch.sqrt).

Fixes https://github.com/pytorch/torchdynamo/issues/1083

This PR will no longer be needed once symint support is default.

This PR closes all remaining graph breaks in the optimizers (!!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88173
Approved by: https://github.com/albanD
2022-12-12 17:32:35 +00:00
c63afb283c Disable dynamo on optimizer lazy initialization (#89902)
Helps with https://github.com/pytorch/torchdynamo/issues/1803

Separate out the group initialization and disable dynamo on it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89902
Approved by: https://github.com/soumith, https://github.com/albanD
2022-12-02 01:15:11 +00:00
3d47c74cfe Update code style for optimizer code (#89862)
Separating out whitespace-only changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89862
Approved by: https://github.com/albanD, https://github.com/soumith
2022-11-30 00:53:05 +00:00
aacb9f3ac6 Make Adadelta,Adagrad & Adamax differentiable (#86096)
Continuing the differentiable optimizers support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86096
Approved by: https://github.com/janeyx99
2022-10-12 23:16:29 +00:00
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` for variable types. The Callable should be capitalized as we are referring to the `Callable` type, and not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
de7219e8a7 Use generators with all/any in torch/optim (#78142)
Generator comprehensions with any/all are less verbose and potentially help to save memory/CPU : https://eklitzke.org/generator-comprehensions-and-using-any-and-all-in-python

To make JIT work with this change, I added code to convert GeneratorExp to ListComp. So the whole PR is basically NoOp for JIT, but potentially memory and speed improvement for eager mode.

Also I removed a test from test/jit/test_parametrization.py. The test was bad and had a TODO to actually implement and just tested that UnsupportedNodeError is thrown, and with GeneratorExp support a different error would be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78142
Approved by: https://github.com/malfet, https://github.com/albanD
2022-06-24 17:23:45 +00:00
6642e88ad2 Adding maximize flag to Adagrad
This adds maximize to Adagrad (#68052) along with updates the respective tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75968
Approved by: https://github.com/albanD
2022-04-20 08:29:03 +00:00
dabfea8363 Optim foreach cleanup for Adagrad (#69981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69981

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767863

Pulled By: mikaylagawarecki

fbshipit-source-id: 1c99abe4ac4eb2a9eb896dff4837b539b94f68e7
(cherry picked from commit 61c28d0645046b67050efaf0d4617203126cbd30)
2022-02-09 16:52:12 +00:00
7176c92687 [optim] update step in functional and pass state_steps instead of state (#71333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71333

Updated
- Adagrad
- Adamax
- Adam
- AdamW
- RAdam
make multi_tensor functionals take `state_steps: List[Tensor]` instead of taking `states: List[Dict]`
make `state_steps: List[int]s -> state_steps:List[Tensor]` where each is a Singleton tensor so step can be updated within the functional

(NAdam and ASGD) were updated in separate diffs to fold their handling of state into the functionals

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767872

Pulled By: mikaylagawarecki

fbshipit-source-id: 9baa7cafb6375eab839917df9287c65a437891f2
(cherry picked from commit 831c02b3d0f585f61165ead368213f94b97a99ee)
2022-02-08 16:51:19 +00:00
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step function of the vectorized and non-vectorized adagrad optimizers to handle complex numbers as two real numbers as per 65711 on github
ghstack-source-id: 141442350

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Adagrad to the documentation.  For more details, we refer to the paper
http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
4611387608 [optim] take kw-only argument for functional optim APIs (#56185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56185

ghstack-source-id: 126670123

Reviewed By: albanD

Differential Revision: D27802169

fbshipit-source-id: f5e1cb2046dcdeecf5f6b0f70892828bf0adb22f
2021-04-15 20:08:04 -07:00
50d903f19f [optim] make functional api be private (#51316) (#51665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51665

This reverts commit 896f82aa92eb7557229053a21da786f5927e64e0.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26232608

Pulled By: vincentqb

fbshipit-source-id: ca006baf4fb672c11c1bb003c39a29cbadb63dd3
2021-02-03 17:59:05 -08:00
896f82aa92 [optim] make functional api be private (#51316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51316

Make optim functional API be private until we release with beta

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26213469

fbshipit-source-id: b0fd001a8362ec1c152250bcd57c7205ed893107
2021-02-03 09:29:33 -08:00
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
0444c372e1 [optimizer] introduce optimizer functional API, refactor Adagrad (#44715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715

We have provided a nice and intuitive API in Python. But in the context of large scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training as it provides better resource utilization and efficiency.

This PR introduces functional optimizer concept (that is similar to the concept of `nn.functional`), we split optimizer into two parts: 1. optimizer state management 2. optimizer computation. We expose the computation part as a separate functional API that is available to be used by internal and OSS developers, the caller of the functional API will maintain their own states in order to directly calls the functional API. While maintaining the end user API be the same, the functional API is TorchScript friendly, and could be used by the distributed optimizer to speed up the training without GIL.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23935258

Pulled By: wanchaol

fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
2020-09-25 17:10:26 -07:00
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used.

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
14ac7a1d87 Add epsilon argument to Adagrad optimizer (#24980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980

We'll need this internally, so just updating the open source version. the other optimizers have this argument anyways.

Test Plan: Imported from OSS

Differential Revision: D16945279

Pulled By: li-roy

fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
2019-08-21 16:36:51 -07:00
75754beca3 Revert D14577575: [pytorch][PR] Fix lack of state init for adagrad and add share_memory flag
Differential Revision:
D14577575

Original commit changeset: 12440079ac96

fbshipit-source-id: 935106385e608471dc280fc61cfedf19d330812d
2019-04-26 15:43:04 -07:00
444f792fa6 Fix lack of state init for adagrad and add share_memory flag (#17679)
Summary:
The current code initialize the `state` in `__init__` method, but the initialization process is not invoked in `add_parameter_group`.

I followed the same approach in other Optimizers to init the `state`.

```python
import torch

emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)

optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight])  # already initialized

optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight])  # empty dict

loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step()  # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679

Differential Revision: D14577575

Pulled By: ezyang

fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
2019-04-23 12:22:19 -07:00
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
3e83e3abfe Adding initial_accumulator_value parameter to Adagrad (#6616) 2018-04-16 22:12:36 +02:00
063946d2b3 Added parameter range checks for all optimizers (#6000) 2018-03-28 11:22:23 +02:00
f76d6c029c Sparse Adam optimizer for sparse gradients (#3137)
* sparse adam

* Favor dense addition over sparse_mask
2017-11-06 14:20:51 -05:00
46a868dab7 [Ready] Limit docs line length (#1900)
* some docs are ready

* docs

* docs

* fix some more

* fix some more
2017-07-10 10:24:54 -04:00
743e4894d2 Prefix values/indices/sparse_mask/nnz with underscore (#1457)
As discussed in #1441.

I also added some docs giving clear guidance about how to coalescing
in sparse tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-03 11:14:10 -04:00
699755e04f Convert contiguous() call in adagrad to out-of-place coalesce. (#1446)
We missed this one in f2903332c7dce1fbb7d7d9f18dcfba8e853581df!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-05-02 16:51:54 -04:00
cd3bbc9dfd more operations and optimizations (hspmm, reorder, ...) 2017-04-18 12:46:54 -07:00
1018b238ac make gradients contiguous in adagrad 2017-04-18 12:46:54 -07:00
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
ecfcf39f30 Improve optimizer serialization
Also, add optimizer.load_state_dict
2017-01-24 17:30:50 -05:00
95f0fa8a92 Change .grad attribute of Variables to be a Variable 2017-01-16 12:59:47 -05:00
604e13775f Add optim docs 2017-01-16 12:59:47 -05:00
09493603f6 Change optimizer API 2016-11-08 18:12:56 +01:00
df59b89fbb Add more optimizers 2016-11-07 22:50:56 +01:00
2f342af22f Move optim to legacy 2016-08-01 12:01:46 -04:00
554a1d8336 Add optim 2016-07-21 16:42:06 -04:00