Commit Graph

59 Commits

Author SHA1 Message Date
9945fd7253 Drop unused imports from caffe2/python (#49980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49980

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727359

fbshipit-source-id: c4f60005b10546423dc093d31d46deb418352286
2021-01-05 13:17:46 -08:00
46b83212d1 Remove unused six code for Python 2/3 compatibility (#48077)
Summary:
This is basically a reborn version of https://github.com/pytorch/pytorch/issues/45254 .

Ref: https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48077

Reviewed By: ngimel

Differential Revision: D25687042

Pulled By: bugra

fbshipit-source-id: 05f20a6f3c5212f73d0b1505b493b720e6cf74e5
2020-12-22 18:07:08 -08:00
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug detected by this where the argument order of `map` was confused: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)
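For illustration (not code from the diff), the shape of the rewrite, and why the comprehension is harder to get wrong:

```python
# Before: list(map(...)) -- the argument order is easy to confuse, and
# map(range(3), str) only fails once the result is actually consumed.
labels = list(map(str, range(3)))     # ['0', '1', '2']

# After: the comprehension names the element and makes the call explicit.
labels = [str(x) for x in range(3)]   # ['0', '1', '2']
```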

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these imports; the `caffe2` directory has the most redundant ones:

```2to3 -f future -w caffe2```
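For illustration, a hypothetical module before and after the fixer runs:

```python
# Before the codemod:
from __future__ import absolute_import, division, print_function, unicode_literals

print("hello")

# After `2to3 -f future -w`: the __future__ import line is simply
# deleted, since Python 3 has these behaviors by default.
print("hello")
```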

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
ad17dafc50 [caffe2] Remove python2 from operator_test (#33977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33977

Removing python2 from operator_test so we can retire python2 support for PyTorch.

Test Plan: waitforsandcastle

Reviewed By: seemethere

Differential Revision: D20129500

fbshipit-source-id: d4c82e4acfc795be9bec6a162c713e37ffb9f5ff
2020-03-02 08:55:53 -08:00
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify the cuda and hip device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
189c1e1afb Rewrite http://pytorch.org -> https://pytorch.org throughout project (#12636)
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636

Differential Revision: D10377099

Pulled By: soumith

fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
2018-10-15 13:03:27 -07:00
197412fa8f Fix typo in comment (#7183) 2018-05-02 11:58:30 -07:00
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
37dec493a5 Scope MultiRNN blobs with name as well as layers (#2025)
* Scope MultiRNN blobs with name as well as layers

Also don't double-scope MultiRNN in the case of multiple layers.

* Scope input projection of first layer with name

We don't scope it with layers because the projection is done
outside of the layer.

* Avoid scoping input blob in MemongerTest.test_rnn

* Rectify input_blob in prepare_input

Revert change in memonger_test because rectifying input will solve the problem.
2018-03-02 22:21:07 -08:00
e0e124e617 Fix RNN scoping situation
Summary:
There is a long-standing scoping problem that was introduced in the original Python wrappers early in H1. Basically, each RNNCell implementation has to manually scope the outputs of each of its operators. If somebody forgets, there can be weird bugs with layers, etc.

The approach is the following: the user has to explicitly specify the current scope when using the apply_over_sequence function (and others) if the function is going to be called several times (e.g., for stacking layers). This way we use Caffe2's native scoping approach instead of inventing one extra API people have to use (i.e., passing a scope name as an argument to the RNNCell constructor).
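A minimal sketch of the Caffe2-native scoping in question (a toy example, not code from this diff):

```python
from caffe2.python import core, model_helper

model = model_helper.ModelHelper(name="rnn_demo")

# Stacking layers: the caller supplies the current scope explicitly,
# so each layer's blobs get distinct, predictable names.
for layer in range(2):
    with core.NameScope("layer_{}".format(layer)):
        # Blobs created here are prefixed, e.g. "layer_0/hidden".
        hidden = model.net.ConstantFill([], "hidden", shape=[1, 4], value=0.0)
```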
Closes https://github.com/caffe2/caffe2/pull/1681

Differential Revision: D6777536

Pulled By: salexspb

fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
2018-02-07 17:35:29 -08:00
6a02cb2844 implement sequence length support for BasicRNN
Summary: Closes https://github.com/caffe2/caffe2/pull/1843

Differential Revision: D6839575

Pulled By: anderspapitto

fbshipit-source-id: efdf00f1c5cfb0d63f1992028a796c8277b76688
2018-02-05 21:05:51 -08:00
d8748a9d53 GRU sequence lengths: allow unspecified sequence lengths
Summary:
modeled after the earlier change for LSTM
Closes https://github.com/caffe2/caffe2/pull/1841

Differential Revision: D6837461

Pulled By: anderspapitto

fbshipit-source-id: de4e787019fa30f813a4b29f14b7000ce9d22d8e
2018-02-05 13:20:05 -08:00
33d2212751 LSTM sequence lengths: allow unspecified sequence lengths
Summary:
In this case, each sequence is treated as having a length equal to the
first dimension of the input tensor. This matches the semantics of
ONNX when the sequence length input is left out.
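A hedged usage sketch; the positional arguments follow my reading of the `rnn_cell.LSTM` helper, so treat the exact signature as an assumption:

```python
from caffe2.python import model_helper, rnn_cell

model = model_helper.ModelHelper(name="lstm_demo")
# Input is (T, N, D). Passing seq_lengths=None (assumed allowed by this
# change) treats every sequence as having length T, matching the ONNX
# semantics described above.
lstm_outputs = rnn_cell.LSTM(
    model, "input", None,                  # seq_lengths left unspecified
    ("hidden_init", "cell_init"),
    dim_in=8, dim_out=16, scope="lstm",
)
```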
Closes https://github.com/caffe2/caffe2/pull/1764

Reviewed By: dzhulgakov

Differential Revision: D6751219

Pulled By: anderspapitto

fbshipit-source-id: 89e0efd12339157627494e2b8c83e952bdd8a9f8
2018-01-26 16:32:56 -08:00
e3e6680b48 Add ElmanCell and ElmanRNN
Summary: Closes https://github.com/caffe2/caffe2/pull/1742

Reviewed By: dzhulgakov

Differential Revision: D6706809

Pulled By: anderspapitto

fbshipit-source-id: 15a05786a26aeb719ea4377f4dbbb62738d9e697
2018-01-18 12:14:02 -08:00
12309f4aa6 GRU cell: add linear_before_reset boolean parameter
Summary:
This matches the semantics of cudnn (and others, like pytorch)
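A numpy sketch of the semantic difference (names illustrative, not from the diff): with linear_before_reset, the reset gate scales the result of the hidden-to-hidden linear transform instead of the hidden state itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D = 4
h_prev = np.random.randn(D)
x_proj = np.random.randn(D)                  # input contribution W_h @ x
U_h, b_h = np.random.randn(D, D), np.random.randn(D)
r = sigmoid(np.random.randn(D))              # reset gate

# linear_before_reset=False: reset is applied to h_prev first.
h_cand_default = np.tanh(x_proj + U_h @ (r * h_prev) + b_h)

# linear_before_reset=True (cudnn semantics): the linear transform runs
# on h_prev, and the reset gate scales its output.
h_cand_cudnn = np.tanh(x_proj + r * (U_h @ h_prev + b_h))
```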
Closes https://github.com/caffe2/caffe2/pull/1695

Reviewed By: dzhulgakov

Differential Revision: D6658208

Pulled By: anderspapitto

fbshipit-source-id: 00e1716fba47b0ac296d1e9e0131165f4997ac7d
2018-01-08 13:22:56 -08:00
ca44c16e72 LayerConfigMILSTMCell
Summary: A version of MILSTMCell which uses layer normalization (see https://arxiv.org/pdf/1607.06450.pdf). There's a lot of copypasta because we don't want to make the existing RNNCell classes harder to approach / understand by adding new options.

Differential Revision: D6564208

fbshipit-source-id: 0bc43e12b6c08ebdf5ea6af2c631f785c302bdb4
2017-12-14 10:17:53 -08:00
540a9c279e Add LayerNormLSTM
Summary:
Adds a new `LSTMCell` subclass to the `rnn_cell` module that performs layer normalization on the fused input matrix. Moves around some code in `rnn_cell.py` to avoid copy-pasta. Adds relevant test cases to `rnn_cell_test.py`.

Had to fix `brew.layer_norm` first. See T24013870.
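A minimal numpy sketch of the layer normalization applied to the fused gate matrix (standard formulation from the paper; gain/bias names are illustrative):

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    # Normalize each row to zero mean and unit variance, then apply a
    # learned elementwise affine transform.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gain + bias

fused = np.random.randn(2, 4 * 8)            # batch of fused LSTM gate inputs
normed = layer_norm(fused, gain=np.ones(4 * 8), bias=np.zeros(4 * 8))
```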

Reviewed By: jhcross

Differential Revision: D6454883

fbshipit-source-id: 0f4ea7a778cc5be6a7274f7b28c793f5dd7c6095
2017-12-04 10:48:37 -08:00
995c83f945 Disable cudnn dropout
Summary: The cudnn version of the DropoutOp was taking a significant (and unwarranted) amount of time in our RNN training. Further investigation showed that setting the cudnn dropout descriptors was an extremely expensive operation (https://pxl.cl/99nT), much more so than the dropout operation itself. This diff adds to the DropoutCell the option to disable cudnn. The non-cudnn version uses a raw curand call that elides all of the expensive descriptor setting.

Reviewed By: jmp84, akyrola

Differential Revision: D5972022

fbshipit-source-id: 6325ec5d6569f8b94d776cbb2554cc8ddb28f699
2017-10-04 17:24:09 -07:00
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
d9b0bcd7a4 Make all existing (except in RoIPool) "is_test" arguments required
Reviewed By: akyrola

Differential Revision: D5830168

fbshipit-source-id: 8634e9cfe308ba0ee90cd8a5c4b09a47b0b5f015
2017-09-25 23:46:12 -07:00
6b44a00c71 remove in-place Dropout from rnn_cell (bug in PR-1185)
Summary: This caused gradient generation problems. Output was made in-place in PR-1185, by mistake, I believe.

Differential Revision: D5844825

fbshipit-source-id: 4ad84d0fb468aafde9f78463b9acf89316e633ca
2017-09-15 14:03:33 -07:00
c313855523 Use brew in rnn_cell.py
Summary:
Was https://github.com/caffe2/caffe2/pull/1151.
Closes https://github.com/caffe2/caffe2/pull/1185

Differential Revision: D5794716

Pulled By: akyrola

fbshipit-source-id: c27d30d5d6dd7dacc47610150dcfef03343a7120
2017-09-13 12:02:57 -07:00
ceb13bf3fb Fix cell/hidden init issue, add copy states to test
Summary: As title. I wonder why this had not been encountered before. It only affects cases where the states are copied over, though.

Reviewed By: Yangqing

Differential Revision: D5777314

fbshipit-source-id: 8aef435c832e4ead5bb3d3e35bb065c734a2af5f
2017-09-06 14:16:17 -07:00
53ccbd9a6e soft-coverage attention
Summary:
Implementation of a new variant of the attention module, which contains a recurrent decoder state with vectors corresponding to each source-side word and strictly increasing values, thus enabling it to model the degree to which source words have been translated.

The approach is a variant of the approaches described in https://arxiv.org/pdf/1601.04811.pdf. We simply include the sum of all previous attention weights for encoder words as a new recurrent state (coverage_t). A new linear transform on encoder_outputs is used to produce coverage_weights, which has the same dimensionality as encoder_outputs and implicitly models the fertility of source-side words (putting this extra informational burden on the encoder network).

Thus the encoder output, the decoder state, and the coverage weights have the same dimensionality for a given source word, and attention logits are calculated as v * tanh(coverage * coverage_weights + encoder_output + decoder_state).

Note: the entire coverage state for each translation instance is of shape (encoder_length, coverage_units), but the states for the RecurrentNetwork operator, used to train the decoder, must be flat in the data dimension. This state is therefore initialized with shape (encoder_length * coverage_units) [not shown in the open-source library] and reshaped appropriately within the apply_soft_coverage_attention() function.
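A numpy sketch of the logit computation described above for a single translation instance (shapes illustrative, not from the diff):

```python
import numpy as np

enc_len, units = 7, 16
encoder_outputs = np.random.randn(enc_len, units)
coverage_weights = np.random.randn(enc_len, units)  # linear transform of encoder_outputs
decoder_state = np.random.randn(units)
coverage = np.zeros((enc_len, 1))                   # running sum of past attention weights
v = np.random.randn(units)

# attention logits = v * tanh(coverage * coverage_weights + encoder_output + decoder_state)
logits = np.tanh(coverage * coverage_weights + encoder_outputs + decoder_state) @ v
attention = np.exp(logits) / np.exp(logits).sum()
coverage += attention[:, None]                      # update the recurrent coverage state
```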

Differential Revision: D5593617

fbshipit-source-id: 7d0522b5eb0b26f22e8429e4461a459f2f16ed46
2017-08-31 21:21:54 -07:00
7eba614503 RNNCell: Initializers interface, simplify _LSTM helper
Summary:
_LSTM helper is a legacy piece we had before all the RNNCell awesomeness landed. Now we need to pull it apart and create separate building blocks that people can use for any RNNs.

Please note the changes to a test with double scoping. That should go away once we change the RNNCell scoping logic in such a way that each cell adds its own name to the scope for all of its outputs (see another diff: D5613139).

Reviewed By: jhcross

Differential Revision: D5632276

fbshipit-source-id: 1cb568ab995c4c0b3dd1b4bad2d028e34bded9c1
2017-08-25 12:01:24 -07:00
e89474c496 fix forward_only mode
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter through to recurrent.py, and recurrent.py itself was broken when forward_only=True after the python3 codemod.

Added a test to rnn_cell_test that actually checks the forward-only parameter is passed, to prevent future breakage.

Reviewed By: jmp84

Differential Revision: D5639306

fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
2017-08-17 10:19:04 -07:00
a7be496fe2 Revert D5589309: modify _LSTM into _RNN to adapt GRU
Summary:
This reverts commit f5af67dfe0842acd68223f6da3e96a81639e8049

bypass-lint

Differential Revision: D5589309

fbshipit-source-id: 79b0a3a9455829c3899472a1368ef36dc75f6e14
2017-08-10 16:42:41 -07:00
7b86a34610 modify _LSTM into _RNN to adapt GRU
Summary: GRU differs from LSTM in that it only has hidden states, no cell states. Reusing the code of _LSTM is therefore problematic: we need to delete the part that creates the cell state and change many other places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU breaks during the backward pass, when the optimizer tries to apply a gradient to each of the parameters: the cell state is never used, so there are no gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).

Differential Revision: D5589309

fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
2017-08-09 13:24:45 -07:00
4d8a8c2e1e Implement dot attention
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025
This saves the computation of weighted encoder outputs in `rnn_cell.py`.
When the encoder and decoder dimensions are different, we apply an FC, which corresponds to the 'general' case below Figure 2 of the paper.
Refactored unit tests.
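A numpy sketch of the two cases (dot when dimensions match, 'general' with an FC when they differ; names illustrative):

```python
import numpy as np

T, d_enc = 6, 8
encoder_outputs = np.random.randn(T, d_enc)

# Dot attention: score(h_t, h_s) = h_t . h_s, so the precomputed
# "weighted encoder outputs" blob is no longer needed.
decoder_state = np.random.randn(d_enc)
scores_dot = encoder_outputs @ decoder_state

# General case: when dimensions differ, an FC (weight W) first maps the
# decoder state into the encoder dimension.
d_dec = 4
decoder_state_small = np.random.randn(d_dec)
W = np.random.randn(d_dec, d_enc)
scores_general = encoder_outputs @ (decoder_state_small @ W)
```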

Reviewed By: jhcross

Differential Revision: D5486976

fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
2017-08-06 11:50:16 -07:00
5449afa855 use model.create_param instead of using param_init_net directly
Summary: When creating parameters for a ModelHelper, we should use create_param instead of using param_init_net and model.params directly. This diff rewrites some of these cases in rnn_cell.py in order to keep model._parameter_info and model.params consistent.

Reviewed By: kittipatv

Differential Revision: D5477724

fbshipit-source-id: 28c4aaf8f98d9d89125af6a42ad328008f0079e1
2017-07-24 21:17:24 -07:00
0eda7955bd use internal cell for DropoutCell output prep methods
Summary:
In order to get dimensions right, correctly identify gradients, etc., DropoutCell should call the _prepare_output and _prepare_output_sequence methods of its internal cell when implementing its own versions of those methods.

This bug was identified by NVIDIA intern Syed Tousif Ahmed.

Reviewed By: akyrola

Differential Revision: D5483082

fbshipit-source-id: f6df5b4a0502ed0771056638aab219fb5cc7d964
2017-07-24 14:53:11 -07:00
99e79a616b attention with encoder_lengths
Summary:
For RNN attention, we should not include the invalid parts of the encoder output (based on encoder_lengths) in the computation. This diff accomplishes that by forcing logits for those positions to be negative infinity.

Note that this step can be bypassed by passing encoder_lengths=None, which is what we do for beam search, thus incurring no extra overhead for inference.
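A numpy sketch of the masking (shapes illustrative):

```python
import numpy as np

T, N = 6, 2                            # encoder length, batch size
logits = np.random.randn(N, T)
encoder_lengths = np.array([6, 3])     # second sequence is padded

# Force logits at padded positions to -inf so softmax assigns them
# exactly zero attention weight.
positions = np.arange(T)[None, :]
logits = np.where(positions < encoder_lengths[:, None], logits, -np.inf)

weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
```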

Reviewed By: jamesr66a

Differential Revision: D5402547

fbshipit-source-id: 1863d6050b5129e4df829c6357f0aa9ded0715dc
2017-07-23 10:06:01 -07:00
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
29887f556f Unrolled test for AttentionCell
Summary: Adding a test to check computational integrity of networks constructed with AttentionCell using UnrolledCell.

Reviewed By: salexspb

Differential Revision: D5306915

fbshipit-source-id: 02acfd1011f7d3ee5fac21cc2778c4a486190c43
2017-06-25 17:21:24 -07:00
ccc46229af Fix residual connections
Summary:
This diff fixes gradient computation of residual connections for a training network constructed with MultiRNNCell.

It addresses a logic bug in _prepare_output() and _prepare_output_sequence() by keeping track internally of which layers have consecutive residual connections before the output, and then reconstructing the final residual output by (re-)preparing the output of each of those layers and then combining them with a Sum operation. This also involves keeping track of which states contribute toward the reconstruction of the final sequence output so that outputs_with_grads can be correctly passed to apply_over_sequence().

Differential Revision: D5300520

fbshipit-source-id: f37d800c909e631175de7045abe192351cc11c41
2017-06-23 11:36:22 -07:00
eefd4b0bb2 Static RNN: gpu support and lstm_benchmark integration
Summary:
While this is not intended to be the most performant and general solution, we can see from the test plan that in some cases a static DAG RNN can perform better than our own implementation. Hopefully we will get dynamic RNN DAG execution at least as fast as this one; then we will not need this one in production, only for testing.

Still putting it into our benchmark for comparison purposes

Reviewed By: akyrola

Differential Revision: D5210038

fbshipit-source-id: fa44baf51c455872abd6ec5f5d151cf06e15b1fa
2017-06-16 11:31:43 -07:00
2a9cb7d4a9 use brew for Transpose --> major perf regression fix
Summary: I accidentally noticed that we were calling the non-CUDNN version of Transpose with attention, and it is super slow. This broke when rnn_cell was changed to use ModelHelper instead of CNNModelHelper in D5062963, but the calls to transpose were not "brewed".

Reviewed By: jamesr66a

Differential Revision: D5264248

fbshipit-source-id: b61494ae210f34597245f1195d20547f5b5cd8b5
2017-06-16 11:02:48 -07:00
df72826ead Static RNN
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all the existing cell abstractions. In this diff I introduce several new tests that have already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it against an unrolled version.

Another use case is performance: potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. The same goes for memonger. But that work is not part of this diff.

Reviewed By: akyrola

Differential Revision: D5200943

fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
2017-06-08 17:48:48 -07:00
d524d5b481 Fixes zip/izip for Python 3
Summary: As title

Reviewed By: salexspb

Differential Revision: D5154186

fbshipit-source-id: 2ef24557d82ae16d3bdfbc90a4cc96be8e2dc6c3
2017-06-07 00:04:26 -07:00
4bed0c6d41 Update RNN Seq2SeqModelCaffe2EnsembleDecoder to reflect training network structure
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.

Reviewed By: urikz

Differential Revision: D5167145

fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
2017-06-01 23:32:35 -07:00
03503140fd DropoutCell as wrapper for another RNNCell
Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).
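A minimal Python sketch of the wrapper idea (not the Caffe2 class; the method names mirror the ones mentioned in this log):

```python
import numpy as np

class IdentityCell:
    # Stand-in internal cell for the sketch.
    def get_output_state_index(self):
        return 0

    def apply(self, x, states):
        return [states[0] + x]

class DropoutCellSketch:
    # Wraps a cell and applies dropout only to its primary output,
    # delegating everything else to the internal cell.
    def __init__(self, internal_cell, dropout_ratio=0.5):
        self.internal_cell = internal_cell
        self.dropout_ratio = dropout_ratio

    def get_output_state_index(self):
        return self.internal_cell.get_output_state_index()

    def apply(self, x, states):
        states = self.internal_cell.apply(x, states)
        i = self.get_output_state_index()
        keep = np.random.rand(*states[i].shape) >= self.dropout_ratio
        states[i] = states[i] * keep / (1.0 - self.dropout_ratio)
        return states

cell = DropoutCellSketch(IdentityCell(), dropout_ratio=0.3)
out = cell.apply(np.ones(4), [np.zeros(4)])
```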

Reviewed By: salexspb

Differential Revision: D5084871

fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
2017-05-24 11:36:45 -07:00
c39f6cf2d0 gradient accumulation fix
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but rather were writing to the same blob, thus overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output, and then accumulating it into the gradient coming from the recurrent connection.

Reviewed By: salexspb

Differential Revision: D5110059

fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
2017-05-24 10:33:32 -07:00
83f6dceaa6 remove forget_bias as argument to AttentionCell constructor
Summary: The argument was unused.

Differential Revision: D5096088

fbshipit-source-id: fcda8a1d2b0d7c85182ab5bc002c86640b443f97
2017-05-19 16:53:40 -07:00
f27c9eea20 dropout for C2 multilayer
Summary:
Incorporate arbitrary dropout for encoder and decoder layers in Caffe2 NMT models using the current configuration. This involves separate output processing (_prepare_output() and _prepare_output_sequence()) for the final layer in a MultiRNNCell.

Switching to using the newly introduced forward_only switch for RNN cells revealed an unrelated bug in our NetGradientChecker test, which urikz is investigating.

Reviewed By: salexspb

Differential Revision: D5031964

fbshipit-source-id: 19b49607d551aa3e2140041ef4e585f128c8f178
2017-05-17 11:32:47 -07:00
37c06a3ba8 residual connections in multilayer C2 ('add' only)
Summary:
Residual connections for the multilayer RNN encoder/decoder of the Caffe2 NMT model. Only supporting 'add' connections (the standard approach, which ves's TF experiments concluded was at least as good as other approaches), and only implementing them for residual_level >= 1 (which also fits our use case).

It is the responsibility of the config to ensure dimension compatibility: each level at and beyond residual_level (in both the encoder and decoder) should have the same number of units, with the exception that a bidirectional initial encoder layer should have half the number of units of the succeeding layer if that next layer is a residual layer.
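A numpy sketch of the 'add' connection and its dimension requirement (illustrative):

```python
import numpy as np

def layer(x, W):
    return np.tanh(x @ W)

units = 16
x = np.random.randn(3, units)
W = np.random.randn(units, units)      # same number of units in and out

# Residual 'add': the layer's input is summed with its output, which is
# why every level at and beyond residual_level must keep the same size.
y = layer(x, W) + x
```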

Differential Revision: D5023160

fbshipit-source-id: f38c1b140638fee78cf3ef7d6b4602dd462484ee
2017-05-16 17:04:58 -07:00
a28b01c155 rnn with brew
Summary:
Update rnn_cell.py and char_rnn.py example with new `brew` model.

- Deprecate CNNModelHelper
- Replace all helper functions with brew helper functions
- Use the `model.net.<SingleOp>` format to create bare-bones operators for better clarity.

Reviewed By: salexspb

Differential Revision: D5062963

fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
2017-05-16 13:33:44 -07:00
e8c274cf16 Optimize memory usage for MI-LSTM
Summary:
Use ElementwiseLinearOps instead of manual Mul + Sum. That saves intermediate blobs.

For NMT use case

Before: https://our.intern.facebook.com/intern/fblearner/details/18060753
Time per step: 0.072
memory usage (per each of 2 GPUs): 9041MiB

After:https://our.intern.facebook.com/intern/fblearner/details/18107583
Time per step: 0.0715
Memory (per each GPU): 8560MiB
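A numpy sketch of the fusion (illustrative): an ElementwiseLinear-style op computes y = a*x + b per column in one step, so the intermediate blob from the manual Mul disappears:

```python
import numpy as np

N, D = 8, 32
x = np.random.randn(N, D)
a, b = np.random.randn(D), np.random.randn(D)

# Before: separate Mul and Sum, materializing an intermediate blob.
intermediate = x * a
y_before = intermediate + b

# After: one fused ElementwiseLinear-style computation.
y_after = a * x + b
assert np.allclose(y_before, y_after)
```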

Reviewed By: akyrola

Differential Revision: D5038785

fbshipit-source-id: 4bc8155dbd0c87729e17236d68d62ca530aadb53
2017-05-10 16:53:43 -07:00
ae924be3ac Removing extra Reshapes in MILSTM with new broadcasted ops
Summary: D4873222 introduced SumReduceLike and removed the use_grad_hack ... hack. Remove unnecessary reshapes and kill use_grad_hack parameters.

Reviewed By: jamesr66a

Differential Revision: D4894243

fbshipit-source-id: c4f3f84abf95572d436b58bbdc2b18b21583c2f1
2017-05-09 14:11:04 -07:00