Summary:
The `2to3` tool has a `future` fixer that can remove these automatically; the `caffe2` directory has the most redundant imports:
```2to3 -f future -w caffe2```
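For illustration, a minimal sketch of a file the fixer would touch (hypothetical example): the `__future__` imports below are no-ops on Python 3, where those behaviors are already the default, and `2to3 -f future -w` simply deletes the import line.

```python
# Redundant on Python 3: these behaviors are the default there, so the
# `future` fixer removes this line and leaves the rest of the file intact.
from __future__ import absolute_import, division, print_function

def ratio(a, b):
    # True division is the default on Python 3 regardless of the import.
    return a / b
```

The fixer makes no other changes, so behavior is identical before and after.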
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
* [easy] allow empty tensor in cuda relu op
This diff does not enable the unit test for empty tensors, because the MKL version of ReluOp needs extra work to support them.
* Make blob norm plotting work with distributed trainer when the old framework is used
Ignore the backward step when there is no loss function.
Some customized models encode the update directly in the forward step and have no backward step.
Summary: Save two nets during offline training and load the correct net the user wants. Setting keep_device=false will let us load GPU blobs into CPU memory.
Reviewed By: dzhulgakov
Differential Revision: D5396689
fbshipit-source-id: ff26bf3759856b07f3a1bbefac4a1e613a8a02e1
Summary:
===Update log 7/10===
We are currently blocked by a connection problem. Will post an update if it is not fixed within 2 hours.
===Update 7/6===
Luke is experimenting with the convergence of this diff. Hopefully he can present results next week.
Right now this does not affect our original CPU training pipeline, because the loading op is still correct in the CPU case.
I still need a final test to make sure, but that is currently blocked by the log device issue t19952135.
I will handle saving CPU/GPU nets in a separate diff.
===Update before 7/4===
It's actually working! Local run screenshot included:
{F67959016}
Reviewed By: dzhulgakov
Differential Revision: D5307058
fbshipit-source-id: cad5d9324c239419530f4b120392ec2ccbb72280
Summary: These return views in Python 3, so many existing usages in Caffe2 would silently do nothing. This diff removes (almost) all usages of these two in Caffe2 and its subprojects in favor of comprehensions, which are also easier to read and understand.
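The summary does not name the two calls here; assuming they are Python 2-style `map`/`filter` (which return lazy iterators rather than lists on Python 3), the migration looks roughly like this sketch:

```python
nums = [1, 2, 3, 4]

# Python 2 returned a list here; Python 3 returns a lazy iterator, so code
# that expected an eager list (or called map purely for its side effects)
# silently stops doing work until the iterator is consumed.
doubled_lazy = map(lambda x: x * 2, nums)

# Comprehension replacements are eager on both versions and easier to read:
doubled = [x * 2 for x in nums]
evens = [x for x in nums if x % 2 == 0]
```

The comprehension forms behave identically on Python 2 and 3, which is why they were preferred over sprinkling `list(...)` around every call site.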
Reviewed By: akyrola
Differential Revision: D5142049
fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
Summary: In some cases (for example, when the include_tags option is used) output_schema contains blobs that aren't produced by the generated net. In that case we want to filter them out of output_schema as well.
Differential Revision: D5120115
fbshipit-source-id: f98ea3f747589390b039d1e1987becec3980634c
Summary:
Layer that allows a model to follow different paths for each instantiation context and join them later. Together with the tagging-system cleanup (a separate issue), this should reduce the need to write a layer just to differentiate between contexts.
Re: the tagging-system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_* tags except EXCLUDE_FROM_TRAIN.
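The proposed tag scheme can be sketched in plain Python; the context names below are hypothetical stand-ins, not the actual Caffe2 identifiers:

```python
# Hypothetical instantiation contexts; the real set lives in the layers code.
CONTEXTS = ("TRAIN", "EVAL", "PREDICTION")

# One explicit exclusion tag per context, per the EXCLUDE_FROM_<CONTEXT> idea.
EXCLUDE_TAGS = {"EXCLUDE_FROM_" + c for c in CONTEXTS}

# TRAIN_ONLY = excluded from every context except training.
TRAIN_ONLY = EXCLUDE_TAGS - {"EXCLUDE_FROM_TRAIN"}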
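The proposed tag scheme can be sketched in plain Python; the context names below are hypothetical stand-ins, not the actual Caffe2 identifiers:

```python
# Hypothetical instantiation contexts; the real set lives in the layers code.
CONTEXTS = ("TRAIN", "EVAL", "PREDICTION")

# One explicit exclusion tag per context, per the EXCLUDE_FROM_<CONTEXT> idea.
EXCLUDE_TAGS = {"EXCLUDE_FROM_" + c for c in CONTEXTS}

# TRAIN_ONLY = excluded from every context except training.
TRAIN_ONLY = EXCLUDE_TAGS - {"EXCLUDE_FROM_TRAIN"}
```

Making each exclusion explicit keeps a layer tagged for one context from accidentally leaking into another.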
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: This diff allows exporting a model partially, filtering layers by tags.
Reviewed By: kittipatv
Differential Revision: D4885610
fbshipit-source-id: 65394c5c9119d57a4d0703aa67ad8e79e4370e3b
Summary:
This diff adds eval nets to the layer model helper. It should be useful in
cases where train/eval nets need some extra input (usually some supervision)
for training/evaluation, for example various sampled layers.
Differential Revision: D4769453
fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
Summary: Currently we cannot have layer constants, because layer params are required to have a gradient and an optimizer. Global constants don't cut it here because each can be added only once; therefore, a layer that adds any global constant can be used only once.
Differential Revision: D4773212
fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3
Summary: This diff gets rid of the old metrics interface in realtime training.
Reviewed By: xianjiec
Differential Revision: D4649734
fbshipit-source-id: de4af85eb5476df9790ebd3915625bf8beee65af
Summary: The evaluation part of the two-tower workflow is missing; this diff completes it. Some of the newly added functions can be used in other workflows, e.g., feed. Since the eval workflows of different pipelines overlap, a generic eval workflow will be added in a separate diff.
Reviewed By: kennyhorror
Differential Revision: D4646880
fbshipit-source-id: 4d6eb35df10f6f613533d442f2a04dc0332386f8
Summary:
This first version displays only the forward part of the training net. I want to refactor the local/distributed code to share graph initialization and then visualize all nets individually.
The graphs don't look pretty because of the many DotProducts; we need to refactor that.
Reviewed By: xianjiec
Differential Revision: D4514479
fbshipit-source-id: 156bb07c62118b15022c87f197b5e378a7ef3b9f