Commit Graph

37 Commits

Author SHA1 Message Date
695eef05a4 optimizer exploration - v1 and v2 + fix position_weighted optimizer + decoupled weight decay (#54042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881

1. Fix the position_weighted optimizer: the position-weighted layer uses the default optimizer but is actually gradient_slice, which causes problems if we do not handle it properly in the new optimizer. The solution is to use SparseAdagrad when it is gradient_slices.
2. Optimizer implementation of v1 and v2: use the 1st momentum with/without bias_correction.
3. Also implemented decoupled weight decay in the new optimizer (see the sketch below).
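
A minimal NumPy sketch of the update pieces described above (1st momentum with optional bias correction, plus decoupled weight decay); the function and argument names are illustrative only, not the actual operator interface:

```
import numpy as np

def step(w, g, m, t, lr=0.01, beta1=0.9, weight_decay=0.01, bias_correction=True):
    # 1st momentum; v1 vs. v2 is essentially with/without the bias correction below
    m = beta1 * m + (1.0 - beta1) * g
    m_hat = m / (1.0 - beta1 ** t) if bias_correction else m
    # decoupled weight decay: applied directly to the weights,
    # not folded into the gradient that feeds the moment estimate
    w = w - lr * m_hat - lr * weight_decay * w
    return w, m
```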

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization

buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad

buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test

ctr_mbl_feed work flow: f255731660
oc work flow: f255739503

Reviewed By: 0x10cxR1

Differential Revision: D26839668

fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199
2021-03-27 23:03:29 -07:00
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
The `2to3` tool has a `future` fixer that you can target specifically to remove these; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```
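
For reference, the `future` fixer simply deletes legacy imports such as the following (hypothetical file contents):

```
# before
from __future__ import absolute_import, division, print_function, unicode_literals
from caffe2.python import core

# after running `2to3 -f future -w`
from caffe2.python import core
```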

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
9d8dc0318b [pruning] add rowwise counter to sparse adagrad
Summary: Use the newly added counter op in sparse adagrad
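
A rough NumPy sketch of a row-wise sparse Adagrad step that also maintains the per-row update counter; the names and exact counter semantics are illustrative assumptions, not the operator's contract:

```
import numpy as np

def rowwise_sparse_adagrad(W, moment, counter, indices, grads, lr=0.1, eps=1e-6):
    # W: [rows, dim] embedding table; moment/counter: one scalar per row
    for i, g in zip(indices, grads):
        moment[i] += np.mean(g * g)      # a single accumulator per row
        counter[i] += 1                  # how many times this row has been updated
        W[i] -= lr * g / (np.sqrt(moment[i]) + eps)
    return W, moment, counter
```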

Reviewed By: chocjy, ellie-wen

Differential Revision: D19221100

fbshipit-source-id: d939d83e3b5b3179f57194be2e8864d0fbbee2c1
2020-06-30 14:40:02 -07:00
cf27d07e04 Implementation of STORM optimizer caffe2 python wrapper (#36399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36399

Added a caffe2 python wrapper and a unit test for the STORM C++ operator.
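
For context, the textbook STORM (STochastic Recursive Momentum) recursion is roughly the following sketch; the C++ operator's actual inputs and outputs may differ:

```
# d_t = g(x_t) + (1 - a) * (d_prev - g_prev), where g_prev is the gradient at the
# previous iterate evaluated on the *current* minibatch (variance-reduced momentum).
def storm_step(x, d_prev, grad_now, grad_prev_point, a=0.1, lr=0.01):
    d = grad_now + (1.0 - a) * (d_prev - grad_prev_point)
    return x - lr * d, d
```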

Test Plan:
All newly added unit tests passed using "buck test //caffe2/caffe2/python:optimizer_test -- TestStorm"

{F233644598}

Reviewed By: chocjy

Differential Revision: D18841013

fbshipit-source-id: f692bc18412839db140202ec9a971e556db0e54f
2020-04-14 23:05:45 -07:00
5b6dd52e3c Build Unit Test of SparseRAdam
Summary: We added a caffe2 python wrapper and a unit test for the SparseRAdam C++ operator.

Test Plan:
The unit test is constructed following the design pattern of the [Wngrad optimizer](https://our.intern.facebook.com/intern/diff/D8655724/). The test passed smoothly.
buck test //caffe2/caffe2/python:optimizer_test -- TestSparseRAdam

Test result:
{F221144048}

Reviewed By: wx1988

Differential Revision: D18330650

fbshipit-source-id: e0f4724c2b616b665e2a0fe2e5c3430696cca7ee
2019-11-18 15:22:37 -08:00
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make the corresponding changes in python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify the cuda and hip device types in the caffe2 python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
a2fcd4dee5 Ensure FP16 rowwise Adagrad can be run
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317

Reviewed By: hyuen

Differential Revision: D10190778

fbshipit-source-id: 720a9aaa4e6b1736023d8c6326a613e4ea592b31
2018-11-28 02:15:36 -08:00
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta (standard update sketched below).
- Added corresponding Python unit tests.
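
For reference, the standard Adadelta update (Zeiler, 2012) in sketch form; argument names are illustrative and the operators may take an additional learning-rate input:

```
import numpy as np

def adadelta_step(w, g, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * g
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
    return w + delta, avg_sq_grad, avg_sq_delta
```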

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
099a6d5e08 Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (#9001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9001

Closes https://github.com/pytorch/pytorch/pull/9001

We added a caffe2 python wrapper and a unit test for the Wngrad C++ operator.

Reviewed By: chocjy

Differential Revision: D8655724

fbshipit-source-id: fb259afd6fd50231691bd75c52852b20a1e1aec8
2018-07-13 18:54:52 -07:00
4e5369349f Add FTRL Optimzier with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer based on the FTRL optimizer which supports a Group Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf
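
For intuition, the key step of a group-lasso solver is block soft-thresholding of each parameter group (e.g. an embedding row); a minimal sketch, independent of the FTRL bookkeeping itself:

```
import numpy as np

def group_soft_threshold(v, lam):
    # prox of lam * ||.||_2: shrink the whole group, zeroing it out
    # entirely when its norm falls below lam (group sparsity)
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v
```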

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
1d5780d42c Remove Apache headers from source.
* The LICENSE file contains the details, so removing the headers from individual source files.
2018-03-27 13:10:18 -07:00
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smoothing

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want access to the NetDef-only interface ASAP, I'm looking towards removing all references to LMH where we don't really need them.

* fix bugs in LambdaRankNdcgOp

The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probabilities, not the log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, the fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, like we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
d013e16cf4 [C2] Enable LARS on GPU (#2115) 2018-03-02 18:06:19 -08:00
7cafdab69b [C2] Implement Layer-wise Adaptive Rate Scaling (LARS) (#2034)
* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)

* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)

* add unit test for Lars

* set default value for lars to be None

* remove lars for subclasses of SgdOptimizer
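
For reference, the layer-wise scaling LARS applies per parameter blob is roughly the following (the paper's formulation; the Lars op's exact arguments may differ):

```
import numpy as np

def lars_local_lr(w, g, trust=0.001, weight_decay=0.0, eps=1e-9):
    # layer-wise coefficient that rescales the base learning rate
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g) + weight_decay * w_norm
    return trust * w_norm / (g_norm + eps)

# effective update for the blob:
# w -= base_lr * lars_local_lr(w, g) * (g + weight_decay * w)
```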
2018-02-25 14:58:31 -08:00
bcc8c8f696 Support RMSProp in Caffe2.
Summary:
Add `RmsPropOptimizer` to `optimizer.py` so RMSProp can be used as an optimizer.

`RmsPropOptimizer` uses `RmsPropOp` to adjust the gradient and `MomentumSGDUpdateOp` to update the model parameters.
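
In textbook form, that combination behaves roughly like this sketch (hedged; the actual op signatures and argument names differ):

```
import numpy as np

def rmsprop_momentum_step(w, g, mean_sq, mom, lr=0.01, decay=0.9, momentum=0.9, eps=1e-5):
    mean_sq = decay * mean_sq + (1 - decay) * g * g   # RmsProp-style accumulator
    adj_grad = g / (np.sqrt(mean_sq) + eps)           # normalized gradient
    mom = momentum * mom + lr * adj_grad              # MomentumSGD-style update
    return w - mom, mean_sq, mom
```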

Differential Revision: D6118279

fbshipit-source-id: e38b8380ff74c1d1bb1e87fc300b6b55e32cd2e0
2017-11-08 16:43:18 -08:00
ce62c65c18 momentum sgd
Summary: Add support for SparseMomentumSGDUpdate and tests for momentum SGD in both dense and sparse cases

Reviewed By: akyrola

Differential Revision: D6234834

fbshipit-source-id: 9848c29ea06794ef35f1ebaff0f5e81eac4f4db9
2017-11-03 16:17:17 -07:00
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
d4336edb05 Disabled test for equivalency between Caffe2's and Numpy's YellowFin
Summary: According to GitHub issue #1168, the agreement between the Caffe2 and Numpy YellowFin models in the tests is not good enough in some environments. Results were very close on my machine, but GitHub's Travis failed on some tests, which I later disabled. The difference therefore doesn't come from logical differences but from loss of precision on some machines. It is safe to disable the equivalency test if equivalency has already been tested once.

Reviewed By: akyrola

Differential Revision: D5777049

fbshipit-source-id: c249a205d94b52c3928c37481f15227d500aafd0
2017-09-06 13:47:45 -07:00
a1992e81b3 Replaced std::copysign(x) with (x > 0 ? 1 : -1)
Summary:
Replaced std::copysign(x) with (x > 0 ? 1 : -1).
std::copysign is not available on some Android platforms which was detected in GitHub's Travis tests:
"/home/travis/build/caffe2/caffe2/caffe2/sgd/yellowfin_op.cc:57:23: error: 'copysign' is not a member of 'std'"

Reviewed By: akyrola

Differential Revision: D5756384

fbshipit-source-id: 56bc220d2c6216ff45b9cc47ed02aebf6ad439a5
2017-09-01 11:52:44 -07:00
925cfc0d90 Disabling test for YellowFin
Summary: Disabling the YellowFin test that does not pass in Travis. The difference comes from numerical reasons; the test passes with my CPU / math libraries. Decide whether to merge it.

Reviewed By: Yangqing

Differential Revision: D5754144

fbshipit-source-id: b6ed6628f962d6904a8d522f0cf4080d7878acad
2017-09-01 10:35:48 -07:00
fefd5479a3 Initial implementation of YellowFin algorithm
Summary:
Added YellowFin optimizer to Caffe2.
This implementation is different from the original: it has separate alpha and mu for each parameter and it uses a different version of Momentum SGD.
Tests / benchmarks for the optimizer are still to be done. Some refactoring of the code is to be done before pushing. This is still a working version.
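
The inner loop YellowFin drives is plain momentum SGD, just with per-parameter (alpha, mu) chosen by the tuner from its curvature and variance estimates; a sketch of that inner step only (the tuner is omitted, and the exact momentum formulation here is an assumption):

```
def momentum_sgd_step(w, g, v, alpha, mu):
    # alpha (learning rate) and mu (momentum) are tuned per parameter by YellowFin
    v = mu * v - alpha * g
    return w + v, v
```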

Reviewed By: akyrola

Differential Revision: D5652689

fbshipit-source-id: c10dc0424f47c3051b454aede1d121902cb759a8
2017-08-30 18:53:46 -07:00
cc3662e939 Added support for scaling learning rate of Caffe2 optimizers during training
Summary: While there is currently support for scaling the base learning rate when loading the model, there is no support for scaling the base learning rate during training. This is needed for LATTE's seq2seq translation models, as the learning schedule is not predefined and is modified at runtime.

Reviewed By: jhcross

Differential Revision: D5701391

fbshipit-source-id: ae3bec45f238db1a2be7af9c04d720067e9095d5
2017-08-25 19:04:47 -07:00
ad07f5f05d Added norm-based gradient clipping to optimizer library
Summary: Moved code for global norm-based gradient clipping from fb specific workflows (seq2seq) to the open-source caffe2 optimizer library
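
A minimal sketch of global norm-based gradient clipping as typically implemented (names illustrative):

```
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # scale every gradient by the same factor when the combined norm exceeds clip_norm
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= clip_norm:
        return grads
    return [g * (clip_norm / global_norm) for g in grads]
```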

Reviewed By: jhcross

Differential Revision: D5637453

fbshipit-source-id: 7e73c9a1c97c28a152c188467b27a6449f79242e
2017-08-24 10:17:50 -07:00
cc2c4d07d6 Always use assertAlmostEqual for floats when crossing python and C boundaries
Summary:
This fixes the Travis numerical issue.
Closes https://github.com/caffe2/caffe2/pull/1024
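
For example, a tolerance-based comparison avoids spurious failures from float32 round-off when values cross the Python/C boundary (hypothetical snippet):

```
import unittest

class FloatCompare(unittest.TestCase):
    def test_lr(self):
        expected, actual = 0.1, 0.10000000149011612   # float32 round-trip
        # self.assertEqual(expected, actual) would fail
        self.assertAlmostEqual(expected, actual, places=6)
```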

Differential Revision: D5571340

Pulled By: Yangqing

fbshipit-source-id: 097e6f91da68cc3eacf21fe109f342e0dddea189
2017-08-06 14:51:11 -07:00
bcce1bd04a Fix optimizer_context OSS test
Summary:
This will fix the test by querying how many instances of the optimizer have already been created.
Because OSS tests don't run in isolation, the number of previously created optimizer instances can be >= 0.

Reviewed By: akyrola

Differential Revision: D5462433

Tags: easy

fbshipit-source-id: 7a9ab4fe5345f5d5138abb461ba7a990d9ace840
2017-07-20 12:21:09 -07:00
13980d2bb5 Set device to the default device(CPU) when DeviceContext is None.
Summary:
Fix the case when the optimizer isn't called within a device scope context.
Fix OptimizerContext lr blob names.

Reviewed By: volkhin

Differential Revision: D5421046

fbshipit-source-id: 186a0d05f40d4442c5ba5736084626da73a0c0f1
2017-07-13 17:54:36 -07:00
3faca65adf Add a unit-test to validate sharing learning rate between
Reviewed By: kennyhorror

Differential Revision: D5413387

fbshipit-source-id: ff4022375183394ca9cee6faea5ac46e56079b86
2017-07-12 21:53:25 -07:00
b9e64ecef1 allow param_info to set optimizer
Summary: This diff adds an optimizer field to param_info, and the associated implementations for modelhelper and brew to set the optimizer for each individual parameter.

Reviewed By: kennyhorror

Differential Revision: D5385432

fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
2017-07-12 08:49:48 -07:00
401908d570 add_weight_decay + restore weight decay to resnet50_trainer
Summary:
Add add_weight_decay to optimizer + test.

In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
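
Conceptually, optimizer-level weight decay folds an L2 penalty into the gradients of the weight parameters (biases are typically excluded); a hedged sketch, not the actual optimizer.py signature:

```
def apply_weight_decay(named_params, weight_decay):
    # named_params: iterable of (name, weight, gradient) triples
    for name, w, g in named_params:
        if name.endswith("_b"):        # skip bias blobs (illustrative naming convention)
            continue
        g += weight_decay * w          # g is modified in place
```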

Reviewed By: asaadaldien

Differential Revision: D5173594

fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
2017-06-02 14:16:56 -07:00
58874ad5bf Fp16 training initializers
Summary:
Re-open for re-importing :)
Closes https://github.com/caffe2/caffe2/pull/721

Differential Revision: D5164345

Pulled By: akyrola

fbshipit-source-id: e80b32556cd25610602df91a4225b93edc0ca40b
2017-06-01 08:34:46 -07:00
0f8c8f37a8 Revert D5159712: [caffe2][PR] Fp16 training initializers
Summary: This reverts commit 60a889494d2e2f4df1d720331e19f638c5eb95cc

Differential Revision: D5159712

fbshipit-source-id: 16040c911b260648857f656f92b165f92c2daae0
2017-06-01 00:17:14 -07:00
2bfacff426 Fp16 training initializers
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697

Differential Revision: D5159712

Pulled By: salexspb

fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
2017-05-31 17:46:58 -07:00
44257ea5ed automatically infer device scope for param
Summary:
hankun is using the optimizer but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device a param belongs to by looking at the device option in the param_init_net.

Added a test as well.

Reviewed By: salexspb

Differential Revision: D5133652

fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
2017-05-30 12:02:19 -07:00
7270471ed6 Returns auxiliary parameters in the optimizers.
Summary:
1. Adds a function to return the auxiliary parameters of each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes the bug that the iteration blob is not incremented by one each iteration: with k parameters using the Adam learning-rate optimizer, the original implementation incremented the iteration blob by k (illustrated below).
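
A toy illustration of the iteration-blob fix, assuming k parameters share one Adam iteration blob:

```
params = ["w1", "w2", "w3"]      # k = 3 parameters sharing one iteration blob

# buggy: increment once per parameter -> iteration == 3 after a single step
iteration = 0
for _ in params:
    iteration += 1

# fixed: increment once per optimization step -> iteration == 1
iteration = 0
iteration += 1
```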

Reviewed By: azzolini

Differential Revision: D4872397

fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
2017-04-17 10:16:32 -07:00
014d1fe5c4 Allow test discovery in caffe2/python/
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.

With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```

Future work:

* Get all of the tests to pass
  * Some seem to be testing operations which don't have GPU implementations
  * I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
  * Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199

Reviewed By: pietern

Differential Revision: D4704504

Pulled By: Yangqing

fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
2017-03-14 18:16:41 -07:00
83437853ad refactor and modulize optimizers
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot be configured per param blob;
(2) sgd.py is a bad file name; optimizer.py is a better name;
(3) layer_model_helper.py has another set of optimizer code (which does support per-param-blob optimizers).

This diff does the following:
(1) creates optimizer objects so that we can configure an optimizer per param blob, while staying compatible with the existing optimizer code;
(2) makes the new optimizer code much more modular;
(3) moves the optimizer code to a file with a better name (optimizer.py);
(4) replaces the optimizer imports in the existing code.

Will do in next diffs:
(1) optimizers with structured parameters for dper2;
(2) get rid of the optimizer code in layer_model_helper.py.

Reviewed By: salexspb

Differential Revision: D4609013

fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
2017-03-07 18:46:47 -08:00