Summary: Net construct bench was using old version of data_parallel_model API.
Reviewed By: bddppq
Differential Revision:
D5453281
Tags: easy
fbshipit-source-id: 93e1ba58511c7b25235ee50d9862fd0614b344c9
Summary:
Migrate experiments folder to fb/sparse folder. Keep FunHashOp and SparseFunHashOp because they are now assumed as a default Op in depr. What I did
# Migrate FunHashOp and SparseFunHashOp and their unitests to core-caffe2, make sure tests are passed.
# Migrate other Ops in experiment folder to fb/sparse folder. Write new TARGETS files for them. Make sure tests are passed.
# Make sure all related tests passed.
# Fix MKL definition btw. Make sure that FC_Sparse is not compiled when there is no MKL support
Reviewed By: salexspb
Differential Revision: D4952993
fbshipit-source-id: 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0
Summary:
I have noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 gpus. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff.
Profiling (using Python's cProfile), I was able to see that the most of the time is used in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs. Thus it gets slower and slower with large nets. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to keep this separate data structure, but I setup the unit tests to ensure things are doing correctly over Clones.
After the optimization, the net construction drops from 95 secs to 8.2 secs!
Reviewed By: azzolini
Differential Revision: D4288307
fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e