(1) Registry now uses std::function for more flexible use cases.
(2) dropout adds an "is_test" keyword.
(3) Making all gradient registered via C++. Python still provides gradient wrapper.
TODO item is to make the autograd SSA in C++ if possible. Problem is if we want to dynamically
register python gradients we will be sort of screwed because in c++ things are registered
via static variables.
(1) Loss: do not coerce a gradient output. Although it may be numerically more efficient to do so, it makes the definition of a loss kind of funny if one does not really want to run backward pass.
(2) Autodifferentiation: allow more explicit in-place check, in-place is now opt-in, and implemented a simple SSA/IR gradient generation scheme. Also added some core gradient tests.
Misc bugfixes as well.
(1) cudnn for conv
(2) cublas: after going through the work I feel it's beter to use HOST pointer mode, so changed it.
(3) storage order: despite that googlenet and multibox uses NHWC, it seems better to be still using
NCHW as default to be consistent with caffe and cudnn; moved to NCHW as default.
(1) various bugfixes.
(2) Tensor is now a class independent from its data type. This allows us
to write easier type-independent operators.
(3) code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with
compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
like CHECK, DCHECK now have prefix CAFFE_, and LOG(*) now becomes
CAFFE_LOG_*.
(7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
in build_env.py.